CN111832282A - External knowledge fused BERT model fine-tuning method and apparatus, and computer device - Google Patents

External knowledge fused BERT model fine-tuning method and apparatus, and computer device

Info

Publication number
CN111832282A
Authority
CN
China
Prior art keywords
vector
sentence
bert model
speech
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010688347.5A
Other languages
Chinese (zh)
Other versions
CN111832282B (en)
Inventor
阮鸿涛
郑立颖
徐亮
阮晓雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010688347.5A priority Critical patent/CN111832282B/en
Priority to PCT/CN2020/118744 priority patent/WO2021139266A1/en
Publication of CN111832282A publication Critical patent/CN111832282A/en
Application granted granted Critical
Publication of CN111832282B publication Critical patent/CN111832282B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method, an apparatus and a computer device for fine-tuning a BERT model fused with external knowledge, wherein the method comprises the following steps: if an input Chinese sentence is received, obtaining a sentence vector and a part-of-speech vector of the Chinese sentence according to the BERT model; extracting a sememe set of the Chinese sentence from a preset external knowledge base; inputting the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set; screening out the sememe vector of the Chinese sentence from the sememe vector set; and fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model. Based on natural language processing technology in artificial intelligence, the invention not only successfully integrates external knowledge into the BERT model to complete its fine-tuning, improving the accuracy of the BERT model in text analysis, but also allows the fields of text that the BERT model can analyze to be expanded according to the external knowledge.

Description

External knowledge fused BERT model fine-tuning method and apparatus, and computer device
Technical Field
The invention relates to the technical field of classification models, and in particular to a method, an apparatus and a computer device for fine-tuning a BERT model fused with external knowledge.
Background
A pre-training model (PTM) is a network model that is trained on a large data set and then saved. Used for transfer learning, a pre-training model can serve as a feature extractor. Applying a pre-training model generally follows this process: first, provided the computing resources allow, a model is pre-trained on a fairly large data set; the model is then modified for different downstream tasks; finally, the modified model is fine-tuned on the data set of the new task. The advantages of a pre-training model are low training cost and faster convergence when paired with a downstream task, which effectively improves model performance; it works especially well on tasks with scarce training data, because learning starts from a better initial state and can therefore reach better performance. With a pre-training model, the whole network structure does not need to be retrained; only a few network layers need to be trained.
Currently, there are two methods for applying a pre-training model to a downstream classification task: the feature-based method and the fine-tuning method. The feature-based method uses word vectors produced by a pre-trained language model as features that are fed into the downstream target task; it mainly relies on the ELMo (Embeddings from Language Models) model, whose network structure is a two-layer bidirectional LSTM. ELMo provides a feature representation for each word, namely a word embedding that carries context information, which is then fed directly into other downstream network structures (GRU, CNN); the only parameters to be trained in the downstream task are the weights over the embeddings of each layer. Fine-tuning adds a small number of task-specific parameters on top of a trained language model and then retrains on the new corpus; a representative is the BERT (Bidirectional Encoder Representations from Transformers) model. Feature extraction in the BERT model uses the Transformer, which has a stronger feature extraction capability than the LSTM, and BERT uses MLM (Masked Language Model) pre-training so that the Transformer encoder fuses bidirectional features. However, even after pre-training on a large-scale unlabeled corpus, the BERT model still cannot fully handle, in the subsequent fine-tuning process, analysis tasks that require knowledge information of a specific field.
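As a concrete illustration of the fine-tuning paradigm (not part of the patent disclosure), the following sketch assumes the Hugging Face transformers library and the bert-base-chinese checkpoint, neither of which is named in the patent; it adds a small task-specific classification head on top of the pre-trained encoder and retrains on a new example.

```python
# A minimal sketch of fine-tuning: a fresh task head plus the pre-trained
# encoder are both updated on the new corpus. Library and checkpoint are
# assumptions for illustration only.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # adds an untrained classification head

inputs = tokenizer("这是一条用于微调的中文语句", return_tensors="pt")
labels = torch.tensor([1])

loss = model(**inputs, labels=labels).loss  # task loss on the new corpus
loss.backward()  # gradients flow into both the head and the encoder
```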
Disclosure of Invention
The embodiments of the invention provide a method, an apparatus and a computer device for fine-tuning a BERT model fused with external knowledge, which solve the problem that, in the subsequent fine-tuning process, the BERT model is not fully competent for analysis tasks involving knowledge information of a specific field.
In a first aspect, an embodiment of the present invention provides a method for fine tuning a BERT model fused with external knowledge, which includes:
if an input Chinese sentence is received, obtaining a sentence vector and a part-of-speech vector of the Chinese sentence according to the BERT model;
extracting a sememe set of the Chinese sentence from a preset external knowledge base;
inputting the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set;
screening out the sememe vector of the Chinese sentence from the sememe vector set;
and fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model.
In a second aspect, an embodiment of the present invention provides a fine tuning apparatus for a BERT model fusing external knowledge, including:
the first obtaining unit is used for obtaining a sentence vector and a part-of-speech vector of the Chinese sentence according to the BERT model if the input Chinese sentence is received;
the sememe extraction unit is used for extracting a sememe set of the Chinese sentence from a preset external knowledge base;
a second obtaining unit, configured to input the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set;
the screening unit is used for screening out the sememe vector of the Chinese sentence from the sememe vector set;
and the fusion unit is used for fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule so as to complete fine-tuning of the BERT model.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the method for fine tuning of the external knowledge fused BERT model according to the first aspect.
In a fourth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the method for fine tuning a BERT model that fuses external knowledge according to the first aspect.
The embodiments of the invention provide a method, an apparatus and a computer device for fine-tuning a BERT model fused with external knowledge. The method successfully fuses external knowledge into the BERT model to complete its fine-tuning, improves the accuracy of the BERT model in analyzing text, and can also expand, according to the external knowledge, the fields of text that the BERT model can analyze.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic flow chart of a fine tuning method of a BERT model with external knowledge fused according to an embodiment of the present invention;
FIG. 2 is a sub-flow diagram of a method for fine tuning a BERT model with external knowledge incorporated therein according to an embodiment of the present invention;
FIG. 3 is a schematic view of another sub-flow of a fine tuning method of a BERT model with external knowledge fused according to an embodiment of the present invention;
FIG. 4 is a schematic view of another sub-flow of a fine tuning method of a BERT model with external knowledge fused according to an embodiment of the present invention;
FIG. 5 is a schematic view of another sub-flow of a fine tuning method of a BERT model with external knowledge fused according to an embodiment of the present invention;
FIG. 6 is a schematic view of another sub-flow of a fine tuning method of a BERT model with external knowledge fused according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a fine tuning apparatus of a BERT model fusing external knowledge according to an embodiment of the present invention;
FIG. 8 is a block diagram of a sub-unit schematic diagram of a fine tuning apparatus of a BERT model fusing external knowledge provided by an embodiment of the present invention;
FIG. 9 is a schematic block diagram of another sub-unit of a fine tuning apparatus of a BERT model fusing external knowledge according to an embodiment of the present invention;
FIG. 10 is a schematic block diagram of another sub-unit of a fine tuning apparatus of a BERT model fusing external knowledge according to an embodiment of the present invention;
FIG. 11 is a schematic block diagram of another sub-unit of a fine tuning apparatus of a BERT model fusing external knowledge according to an embodiment of the present invention;
FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for fine-tuning a BERT model fused with external knowledge according to an embodiment of the present invention. The method is built and run in a server. During fine-tuning of the BERT model in the server, after an input Chinese sentence is received, a sentence vector and a part-of-speech vector of the Chinese sentence are obtained through a pre-trained BERT model; at the same time, the sememes of all the words in the Chinese sentence are extracted from a preset external knowledge base; the extracted sememes are then screened and input into the pre-trained BERT model to obtain the sememe vector of the Chinese sentence; finally, the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence are fused, which not only updates the external network of the pre-trained BERT model but also completes its fine-tuning.
As shown in fig. 1, the method includes steps S110 to S150.
S110, if an input Chinese sentence is received, a sentence vector and a part-of-speech vector of the Chinese sentence are obtained according to the BERT model.
If an input Chinese sentence is received, a sentence vector and a part-of-speech vector of the Chinese sentence are obtained according to the BERT model. The Chinese sentence is a sentence used to fine-tune the BERT model, the BERT model is a language model that has been trained in advance, and the part-of-speech vector of the Chinese sentence is a sentence vector that carries the parts of speech of the words in the Chinese sentence. The Chinese sentence is input into the server as a character string, and when the server receives the character string, the sentence vector and the part-of-speech vector of the Chinese sentence can be obtained according to the BERT model in either of two ways. The first way: perform word segmentation on the Chinese sentence to obtain its words, tag the words with their parts of speech, then input the Chinese sentence and the part-of-speech-tagged words into the BERT model respectively to obtain the sentence vector of the Chinese sentence and the part-of-speech vectors of the words, and finally construct the part-of-speech vector of the Chinese sentence from the part-of-speech vectors of the words. The second way: perform word segmentation on the Chinese sentence to obtain its words, tag the words with their parts of speech, then input the words and the part-of-speech-tagged words into the BERT model respectively to obtain the word vectors and the part-of-speech vectors of the words, and finally construct the sentence vector and the part-of-speech vector of the Chinese sentence from the word vectors and the part-of-speech vectors of the words.
In one embodiment, as shown in fig. 2, the step S110 of obtaining a sentence vector of the chinese sentence according to the BERT model includes steps S111 and S112.
S111, performing word segmentation on the Chinese sentence to obtain the words in the Chinese sentence.
Word segmentation is performed on the Chinese sentence to obtain the words in it. Specifically, there are generally three approaches to segmenting a Chinese sentence: segmentation based on character-string matching, segmentation based on statistics, and segmentation based on understanding. Character-string-based segmentation matches a Chinese character string against the entries of a dictionary; statistics-based segmentation counts the frequency of adjacent co-occurring character combinations in a corpus to compute their co-occurrence probability and thereby judge whether adjacent characters form a word; understanding-based segmentation lets the computer simulate a human's understanding of the Chinese sentence to recognize words. In this embodiment of the invention, the Chinese sentence is segmented with the reverse maximum matching method, a character-string-based method, whose procedure is as follows: let L be the number of Chinese characters in the longest entry of a preset dictionary, and start processing from the tail of the character string of the Chinese sentence. At the beginning of each cycle, the last L characters of the string are taken as the processing object and looked up in the dictionary. If the dictionary contains such an L-character word, the match succeeds and the processing object is cut off as one word; if not, the first character of the processing object is removed, the remaining string becomes the new processing object, and matching is performed again until a cut succeeds, which completes one round of matching and cuts one word. This loops until all words in the Chinese sentence are cut.
For example, suppose the longest word in the dictionary is 6 characters long and the Chinese sentence is the character string "computer science and technology". First the last 6 characters are taken as the string to be processed; that string is not in the dictionary, so the match fails. Its first character is removed, the remainder becomes the new string to be processed, and the match fails again; proceeding this way, "technology" finally remains as the matching field. That word is in the dictionary, so the match succeeds and the first word, "technology", is cut. Then the remaining character string of the Chinese sentence, "computer science and", is taken and the second word, "and", is cut. The loop continues, and the final segmentation result is: "computer", "science", "and", "technology".
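The reverse maximum matching procedure described above can be sketched as follows; this is a minimal illustration, and the toy dictionary and the example sentence 计算机科学和技术 ("computer science and technology") are assumptions for demonstration.

```python
def reverse_max_match(sentence, dictionary, max_len):
    """Segment a Chinese sentence by reverse maximum matching."""
    words = []
    while sentence:
        # Take at most max_len trailing characters as the processing object.
        piece = sentence[-max_len:]
        # Shrink from the left until the candidate is in the dictionary,
        # or only one character remains (kept as a single-character word).
        while len(piece) > 1 and piece not in dictionary:
            piece = piece[1:]
        words.insert(0, piece)            # cut the word from the tail
        sentence = sentence[:-len(piece)]
    return words

dictionary = {"计算机", "科学", "和", "技术"}
print(reverse_max_match("计算机科学和技术", dictionary, max_len=6))
# ['计算机', '科学', '和', '技术']  ("computer", "science", "and", "technology")
```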
S112, inputting the words into the BERT model to obtain word vectors of the words, and constructing a sentence vector of the Chinese sentence according to the word vectors.
The words are input into the BERT model to obtain word vectors of the words, and a sentence vector of the Chinese sentence is constructed from the word vectors. Specifically, the BERT model is a sentence-level language model; unlike the ELMo model, which, when spliced with a downstream NLP task, adds weights to every layer for global pooling, BERT can directly obtain a unique vector representation of an entire sentence. It prepends a special token [CLS] to every input and lets the Transformer encode it deeply; since the Transformer can encode global information into every position regardless of space and distance, the top hidden state of [CLS], directly connected to a softmax output layer as the representation of the sentence or sentence pair, acts as a "checkpoint" on the gradient back-propagation path and can thus learn the upper-layer features of the whole input. The sentence vector of the Chinese sentence is constructed from the word vectors as follows: all the words of the Chinese sentence are input into the BERT model to obtain their word vectors, and these word vectors are then superposed, in the order in which the characters appear in the Chinese sentence, to obtain the sentence vector of the Chinese sentence.
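A minimal sketch of this construction, assuming the Hugging Face transformers library and mean-pooling of a word's character states as its word vector; the patent fixes neither choice.

```python
# Word vectors from BERT, superposed in sentence order to form the
# sentence vector. Library, checkpoint and pooling are assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def word_vector(word):
    # Encode one segmented word; average its character states, dropping
    # the [CLS] and [SEP] positions added by the tokenizer.
    states = bert(**tokenizer(word, return_tensors="pt")).last_hidden_state
    return states[0, 1:-1].mean(dim=0)

words = ["计算机", "科学", "和", "技术"]
sentence_vector = torch.stack([word_vector(w) for w in words]).sum(dim=0)
```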
In one embodiment, as shown in fig. 3, the step S110 of obtaining the part-of-speech vector of the chinese sentence according to the BERT model includes steps S113 and S114.
S113, performing part-of-speech tagging on the words according to a preset part-of-speech tagging rule to obtain part-of-speech-tagged words.
The words are tagged with their parts of speech according to a preset part-of-speech tagging rule to obtain part-of-speech-tagged words. The part-of-speech tagging rule is the rule information for tagging the words with parts of speech. Part-of-speech tagging is the procedure of labeling each word with its correct part of speech, that is, determining whether the word is a noun, a verb, an adjective, or another part of speech. Part of speech is a grammatical attribute of a word, determined by the word's grammatical function in combination. The grammatical attributes of Chinese words cover fourteen categories: nouns, verbs, adjectives, numerals, classifiers, pronouns, distinguishing words, adverbs, prepositions, conjunctions, auxiliary words, interjections, onomatopoeic words, and modal particles. Specifically, during part-of-speech tagging the Chinese sentence is syntactically analyzed to determine the positional relationships of the words in the sentence, the part-of-speech information of each word is acquired from a preset part-of-speech tag set according to those positional relationships, and the words are then labeled according to the BIES tagging standard: the character at the beginning of a word is marked B, the character at the end of a word is marked E, a character in the middle of a word is marked I, a single character that forms a word by itself is marked S, and X stands for the part-of-speech information of the word. Thus a two-character word is represented as B-E-X; a single-character word as S-X; a three-character word as B-I-E-X; and a four-character word as B-I-I-E-X.
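A small sketch of the BIES labeling scheme just described; the concrete part-of-speech tag strings such as "n" (noun) and "c" (conjunction) are illustrative assumptions.

```python
def bies_tag(word, pos):
    """Label a word's characters under the BIES scheme, suffixed with the
    part-of-speech tag X (here the pos argument), as described above."""
    if len(word) == 1:
        return [f"S-{pos}"]                    # single-character word
    return ([f"B-{pos}"]                       # beginning character
            + [f"I-{pos}"] * (len(word) - 2)   # middle characters
            + [f"E-{pos}"])                    # ending character

print(bies_tag("计算机", "n"))  # ['B-n', 'I-n', 'E-n']
print(bies_tag("和", "c"))      # ['S-c']
```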
S114, inputting the part-of-speech-tagged words into the BERT model to obtain part-of-speech vectors of the words, and constructing the part-of-speech vector of the Chinese sentence according to the part-of-speech vectors of the words.
The part-of-speech-tagged words are input into the BERT model to obtain part-of-speech vectors of the words, and the part-of-speech vector of the Chinese sentence is constructed from them. Specifically, a part-of-speech-tagged word carries the part-of-speech information of the sentence, and the part-of-speech vector of a word is the tagged word vector obtained after the word is vectorized in the BERT model. Constructing the part-of-speech vector of the Chinese sentence from the part-of-speech vectors of the words means superposing the part-of-speech vectors of all the words, in the order in which the characters appear in the Chinese sentence, to obtain the part-of-speech vector of the Chinese sentence.
S120, extracting the sememe set of the Chinese sentence from a preset external knowledge base.
A sememe set of the Chinese sentence is extracted from a preset external knowledge base. Specifically, the external knowledge base is the open-source knowledge network OpenHowNet, a common-sense knowledge base that takes concepts represented by Chinese and English words as its description objects and whose basic content is the relationships between concepts and between the attributes of concepts. A sememe is the smallest indivisible semantic unit in linguistics; one word can therefore correspond to several sememes, and the sememe set is the set of the sememes of the Chinese sentence. The sememe set is extracted from the external knowledge base as follows: perform word segmentation on the Chinese sentence to obtain its words, obtain all the sememes corresponding to each word from OpenHowNet according to the word-to-sememe mapping in OpenHowNet, and take all the sememes corresponding to the words as the sememe set of the Chinese sentence.
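A sketch of the sememe-set extraction, assuming the OpenHowNet Python package; the get_sememes_by_word interface and its parameters may differ between package versions.

```python
# Collect all sememes mapped to each segmented word in OpenHowNet.
# API names are assumptions taken from the OpenHowNet package; the data
# files must be fetched once with OpenHowNet.download() before first use.
import OpenHowNet

hownet = OpenHowNet.HowNetDict()

def sememe_set(words):
    sememes = {}
    for w in words:
        # merge=True is assumed to union the sememes over all senses of w.
        sememes[w] = hownet.get_sememes_by_word(w, merge=True)
    return sememes

print(sememe_set(["计算机", "科学"]))
```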
S130, inputting the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set.
The sememes in the sememe set are input into the BERT model to obtain a sememe vector set of the sememe set. Specifically, the sememe vector set is the set of vectors of the sememes in the sememe set. Before being input into the BERT model, the sememes in the sememe set are stored in the server as character strings, so all the sememes in the sememe set are input into the BERT model for vectorization, yielding the sememe vectors of all the sememes in the sememe set, that is, the sememe vector set.
S140, screening out the sememe vector of the Chinese sentence from the sememe vector set.
The sememe vector of the Chinese sentence is screened out from the sememe vector set. Specifically, each word in the Chinese sentence may correspond to several sememes in the sememe set, and since the sememe vectors are obtained by inputting those sememes into the BERT model, each word vector of the Chinese sentence likewise corresponds to several sememe vectors in the sememe vector set. Screening is therefore needed so that each word in the Chinese sentence corresponds to only one sememe vector in the sememe vector set; that is, the sememe vector set is screened according to the part of speech of each word in the Chinese sentence to obtain the sememe vector of the Chinese sentence.
In an embodiment, as shown in fig. 4, step S140 includes sub-steps S141 and S142.
S141, calculating the similarity between the sememe vectors in the sememe vector set and the part-of-speech vectors of the words.
The similarity between the sememe vectors in the sememe vector set and the part-of-speech vectors of the words is calculated. Specifically, the similarity reflects the distances between the several sememe vectors corresponding to a word in the sememe vector set and the part-of-speech vector of that word: the greater the distance, the lower the similarity between a sememe vector and the part-of-speech vector of the word, and conversely, the smaller the distance, the higher the similarity; the sememe vector with the highest similarity is taken as the sememe vector of the corresponding word in the Chinese sentence. Applicable distance measures include the Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, standardized Euclidean distance, Mahalanobis distance, cosine of the included angle, Hamming distance, Jaccard similarity coefficient, correlation coefficient, information entropy, and the like. In this embodiment, the similarity score is obtained with the Euclidean distance. The Euclidean distance is a commonly used distance definition, referring to the true distance between two points in n-dimensional space, or the natural length of a vector. The Euclidean distance between a sememe vector in the sememe vector set and the part-of-speech vector of a word is:

$$d = \sqrt{\sum_{k=1}^{n}\left(x_{1k} - x_{2k}\right)^{2}}$$

where $n$ denotes the dimension of the vectors, $x_{1k}$ is the $k$-th component of the part-of-speech vector of the word, and $x_{2k}$ is the $k$-th component of the sememe vector in the sememe vector set.
S142, obtaining the sememe vector of the Chinese sentence from the sememe vector set according to the similarity.
The sememe vector of the Chinese sentence is obtained from the sememe vector set according to the similarity. Because the similarity reflects the distances between the several sememe vectors corresponding to a word and the part-of-speech vector of that word, where a greater distance means lower similarity and a smaller distance means higher similarity, the similarity is computed for all the sememe vectors in the sememe vector set; the sememe vector whose distance to the part-of-speech vector of the word is shortest has the highest similarity with it and becomes the sememe vector of that word in the Chinese sentence.
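A minimal sketch of this screening step with toy vectors; NumPy is assumed and the numbers are illustrative only.

```python
import numpy as np

def select_sememe_vector(pos_vector, candidate_sememe_vectors):
    """Pick the candidate sememe vector closest (Euclidean distance) to the
    word's part-of-speech vector, i.e. the one with the highest similarity."""
    distances = [np.linalg.norm(pos_vector - s) for s in candidate_sememe_vectors]
    return candidate_sememe_vectors[int(np.argmin(distances))]

# Toy example: three candidate sememe vectors for one word.
pos_vec = np.array([0.2, 0.1, 0.7])
candidates = [np.array([0.9, 0.0, 0.1]),
              np.array([0.2, 0.2, 0.6]),   # closest, so it is selected
              np.array([0.5, 0.5, 0.5])]
print(select_sememe_vector(pos_vec, candidates))
```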
S150, fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model.
The sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence are fused according to a preset fusion rule to complete fine-tuning of the BERT model. The fusion rule is the rule information for fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence to complete fine-tuning of the BERT model. Specifically, the fusion rule not only completes fine-tuning of the BERT model but also updates the external network of the BERT model, improving the accuracy of the BERT model in classifying the sentences of a text.
In one embodiment, as shown in FIG. 5, step S150 includes sub-steps S151 and S152.
S151, splicing the part-of-speech vector, the sememe vector and the sentence vector of the Chinese sentence to obtain a spliced sentence vector.
The part-of-speech vector, the sememe vector and the sentence vector of the Chinese sentence are spliced to obtain a spliced sentence vector. Specifically, splicing the three vectors means superposing the part-of-speech vector, the sememe vector and the sentence vector of the Chinese sentence, yielding a spliced sentence vector that contains not only the semantic information of the Chinese sentence but also the sememe information corresponding to it.
S152, inputting the spliced sentence vector into a preset first recurrent neural network to complete fine-tuning of the BERT model.
The spliced sentence vector is input into a preset first recurrent neural network to complete fine-tuning of the BERT model. Specifically, the first recurrent neural network is the trained external network of the BERT model and may be a GRU or a BiLSTM recurrent neural network; inputting the spliced sentence vector into it updates the first recurrent neural network, that is, the external network of BERT, thereby completing fine-tuning of the BERT model.
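A sketch of this first fusion variant, assuming PyTorch and a GRU as the external network; the patent names GRU and BiLSTM as options but no framework, and "splicing" is realized as vector superposition per S151.

```python
import torch
import torch.nn as nn

hidden = 768  # BERT-base hidden size (an assumption)

# Placeholder vectors standing in for the outputs of the previous steps.
sentence_vec = torch.randn(1, 1, hidden)
pos_vec = torch.randn(1, 1, hidden)
sememe_vec = torch.randn(1, 1, hidden)

# Splicing as vector superposition, as described above.
spliced = sentence_vec + pos_vec + sememe_vec

# The first recurrent neural network: the BERT model's external network.
external_net = nn.GRU(input_size=hidden, hidden_size=hidden, batch_first=True)
output, _ = external_net(spliced)  # updating this network completes fine-tuning
```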
In another embodiment, as shown in fig. 6, step S150 includes sub-steps S1501 and S1502.
S1501, inputting the part-of-speech vector and the sememe vector of the Chinese sentence respectively into a preset second recurrent neural network to obtain the part-of-speech sentence vector and the sememe sentence vector of the Chinese sentence.
The part-of-speech vector and the sememe vector of the Chinese sentence are respectively input into a preset second recurrent neural network to obtain the part-of-speech sentence vector and the sememe sentence vector of the Chinese sentence. Specifically, the second recurrent neural network is the trained external network of the BERT model and may be a GRU or a BiLSTM recurrent neural network; inputting the part-of-speech vector and the sememe vector of the Chinese sentence into it updates its semantics, and the part-of-speech sentence vector and the sememe sentence vector it outputs can be used to analyze the Chinese sentence subsequently.
S1502, splicing the part-of-speech sentence vector, the sememe sentence vector and the sentence vector of the Chinese sentence to complete fine-tuning of the BERT model.
The part-of-speech sentence vector, the sememe sentence vector and the sentence vector of the Chinese sentence are spliced to complete fine-tuning of the BERT model. Specifically, splicing them means superposing the part-of-speech sentence vector, the sememe sentence vector and the sentence vector of the Chinese sentence to obtain the semantic vector of the Chinese sentence, which completes fine-tuning of the BERT model and enables subsequent analysis of Chinese sentences in the same field.
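A sketch of this second fusion variant under the same PyTorch and GRU assumptions.

```python
import torch
import torch.nn as nn

hidden = 768  # BERT-base hidden size (an assumption)
second_net = nn.GRU(input_size=hidden, hidden_size=hidden, batch_first=True)

pos_vec = torch.randn(1, 1, hidden)       # placeholder part-of-speech vector
sememe_vec = torch.randn(1, 1, hidden)    # placeholder sememe vector
sentence_vec = torch.randn(1, 1, hidden)  # placeholder sentence vector

# Pass the part-of-speech and sememe vectors through the second network.
pos_sentence_vec, _ = second_net(pos_vec)
sememe_sentence_vec, _ = second_net(sememe_vec)

# Splice (superpose) the three vectors into the semantic vector of the sentence.
semantic_vec = pos_sentence_vec + sememe_sentence_vec + sentence_vec
```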
The embodiment of the invention also provides an apparatus 100 for fine-tuning a BERT model fused with external knowledge, configured to execute any embodiment of the aforementioned method. Specifically, referring to fig. 7, fig. 7 is a schematic block diagram of the fine-tuning apparatus 100 for a BERT model fused with external knowledge according to an embodiment of the present invention.
As shown in fig. 7, the fine-tuning apparatus 100 for a BERT model fused with external knowledge includes a first obtaining unit 110, a sememe extraction unit 120, a second obtaining unit 130, a screening unit 140, and a fusion unit 150.
The first obtaining unit 110 is configured to, if an input chinese sentence is received, obtain a sentence vector and a part-of-speech vector of the chinese sentence according to the BERT model.
In another embodiment of the present invention, as shown in fig. 8, the first obtaining unit 110 includes a word segmentation unit 111, a third obtaining unit 112, a part-of-speech tagging unit 113, and a fourth obtaining unit 114.
The word segmentation unit 111 is configured to perform word segmentation on the Chinese sentence to obtain the words in the Chinese sentence.
The third obtaining unit 112 is configured to input the words into the BERT model to obtain word vectors of the words and to construct a sentence vector of the Chinese sentence from the word vectors.
The part-of-speech tagging unit 113 is configured to perform part-of-speech tagging on the words according to a preset part-of-speech tagging rule to obtain part-of-speech-tagged words.
The fourth obtaining unit 114 is configured to input the part-of-speech-tagged words into the BERT model to obtain part-of-speech vectors of the words and to construct the part-of-speech vector of the Chinese sentence from the part-of-speech vectors of the words.
The sememe extraction unit 120 is configured to extract a sememe set of the Chinese sentence from a preset external knowledge base.
The second obtaining unit 130 is configured to input the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set.
The screening unit 140 is configured to screen out the sememe vector of the Chinese sentence from the sememe vector set.
In other embodiments of the invention, as shown in fig. 9, the screening unit 140 includes a calculating unit 141 and a selecting unit 142.
The calculating unit 141 is configured to calculate the similarity between the sememe vectors in the sememe vector set and the part-of-speech vectors of the words.
The selecting unit 142 is configured to obtain the sememe vector of the Chinese sentence from the sememe vector set according to the similarity.
The fusion unit 150 is configured to fuse the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model.
In other embodiments of the invention, as shown in fig. 10, the fusion unit 150 includes a first splicing unit 151 and a first generating unit 152.
The first splicing unit 151 is configured to splice the part-of-speech vector, the sememe vector and the sentence vector of the Chinese sentence to obtain a spliced sentence vector.
The first generating unit 152 is configured to input the spliced sentence vector into a preset first recurrent neural network to complete fine-tuning of the BERT model.
In other embodiments of the invention, as shown in fig. 11, the fusion unit 150 includes a second generating unit 1501 and a second splicing unit 1502.
The second generating unit 1501 is configured to input the part-of-speech vector and the sememe vector of the Chinese sentence respectively into a preset second recurrent neural network to obtain the part-of-speech sentence vector and the sememe sentence vector of the Chinese sentence.
The second splicing unit 1502 is configured to splice the part-of-speech sentence vector, the sememe sentence vector and the sentence vector of the Chinese sentence to complete fine-tuning of the BERT model.
The apparatus 100 for fine-tuning a BERT model fused with external knowledge provided by the embodiment of the invention is configured to execute the above steps: if an input Chinese sentence is received, obtaining a sentence vector and a part-of-speech vector of the Chinese sentence according to the BERT model; extracting a sememe set of the Chinese sentence from a preset external knowledge base; inputting the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set; screening out the sememe vector of the Chinese sentence from the sememe vector set; and fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model.
Referring to fig. 12, fig. 12 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Referring to fig. 12, the device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a method of fine tuning a BERT model that incorporates external knowledge.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 may be caused to execute a method for tuning the BERT model that incorporates external knowledge.
The network interface 505 is used for network communication, such as providing transmission of data information. It will be appreciated by those skilled in the art that the configuration shown in fig. 12 is a block diagram of only a portion of the configuration associated with the inventive arrangements and is not intended to limit the apparatus 500 to which the inventive arrangements may be applied, and that a particular apparatus 500 may include more or less components than those shown, or some components may be combined, or have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: if an input Chinese sentence is received, obtaining a sentence vector and a part-of-speech vector of the Chinese sentence according to the BERT model; extracting a sememe set of the Chinese sentence from a preset external knowledge base; inputting the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set; screening out the sememe vector of the Chinese sentence from the sememe vector set; and fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model.
Those skilled in the art will appreciate that the embodiment of the apparatus 500 shown in fig. 12 does not constitute a limitation on the specific construction of the apparatus 500, and in other embodiments, the apparatus 500 may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the apparatus 500 may only include the memory and the processor 502, and in such embodiments, the structure and function of the memory and the processor 502 are the same as those of the embodiment shown in fig. 12, and are not repeated herein.
It should be understood that in this embodiment, the processor 502 may be a Central Processing Unit (CPU), and the processor 502 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In another embodiment of the present invention, a computer storage medium is provided. The storage medium may be a non-volatile computer-readable storage medium. The storage medium stores a computer program 5032, and when the computer program 5032 is executed by the processor 502, the following steps are implemented: if an input Chinese sentence is received, obtaining a sentence vector and a part-of-speech vector of the Chinese sentence according to the BERT model; extracting a sememe set of the Chinese sentence from a preset external knowledge base; inputting the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set; screening out the sememe vector of the Chinese sentence from the sememe vector set; and fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only a logical division, and there may be other divisions when the actual implementation is performed, or units having the same function may be grouped into one unit, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a device 500 (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for fine-tuning a BERT model fused with external knowledge, characterized by comprising the following steps:
if an input Chinese sentence is received, obtaining a sentence vector and a part-of-speech vector of the Chinese sentence according to the BERT model;
extracting a sememe set of the Chinese sentence from a preset external knowledge base;
inputting the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set;
screening out the sememe vector of the Chinese sentence from the sememe vector set;
and fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model.
2. The method for fine-tuning a BERT model fused with external knowledge according to claim 1, wherein obtaining a sentence vector of the Chinese sentence according to the BERT model comprises:
performing word segmentation on the Chinese sentence to obtain words in the Chinese sentence;
and inputting the words into the BERT model to obtain word vectors of the words and constructing a sentence vector of the Chinese sentence according to the word vectors.
3. The method for fine-tuning a BERT model fused with external knowledge according to claim 2, wherein obtaining the part-of-speech vector of the Chinese sentence according to the BERT model comprises:
performing part-of-speech tagging on the words according to a preset part-of-speech tagging rule to obtain part-of-speech-tagged words;
and inputting the part-of-speech-tagged words into the BERT model to obtain part-of-speech vectors of the words and constructing the part-of-speech vector of the Chinese sentence according to the part-of-speech vectors of the words.
4. The method for fine-tuning a BERT model fused with external knowledge according to claim 3, wherein screening out the sememe vector of the Chinese sentence from the sememe vector set comprises:
calculating the similarity between the sememe vectors in the sememe vector set and the part-of-speech vectors of the words;
and obtaining the sememe vector of the Chinese sentence from the sememe vector set according to the similarity.
5. The method for fine-tuning a BERT model fused with external knowledge according to claim 1, wherein fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model comprises:
splicing the part-of-speech vector, the sememe vector and the sentence vector of the Chinese sentence to obtain a spliced sentence vector;
and inputting the spliced sentence vector into a preset first recurrent neural network to complete fine-tuning of the BERT model.
6. The method for fine-tuning a BERT model fused with external knowledge according to claim 1, wherein fusing the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model comprises:
respectively inputting the part-of-speech vector and the sememe vector of the Chinese sentence into a preset second recurrent neural network to obtain the part-of-speech sentence vector and the sememe sentence vector of the Chinese sentence;
and splicing the part-of-speech sentence vector, the sememe sentence vector and the sentence vector of the Chinese sentence to complete fine-tuning of the BERT model.
7. An apparatus for fine-tuning a BERT model fused with external knowledge, characterized by comprising:
a first obtaining unit, configured to obtain, if an input Chinese sentence is received, a sentence vector and a part-of-speech vector of the Chinese sentence according to the BERT model;
a sememe extraction unit, configured to extract a sememe set of the Chinese sentence from a preset external knowledge base;
a second obtaining unit, configured to input the sememes in the sememe set into the BERT model to obtain a sememe vector set of the sememe set;
a screening unit, configured to screen out the sememe vector of the Chinese sentence from the sememe vector set;
and a fusion unit, configured to fuse the sentence vector, the part-of-speech vector and the sememe vector of the Chinese sentence according to a preset fusion rule to complete fine-tuning of the BERT model.
8. The apparatus for fine-tuning a BERT model fused with external knowledge according to claim 7, wherein the first obtaining unit comprises:
a word segmentation unit, configured to perform word segmentation on the Chinese sentence to obtain words in the Chinese sentence;
and a third obtaining unit, configured to input the words into the BERT model to obtain word vectors of the words and to construct a sentence vector of the Chinese sentence according to the word vectors.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method for fine-tuning a BERT model fused with external knowledge according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the method for fine-tuning a BERT model fused with external knowledge according to any one of claims 1 to 6.
CN202010688347.5A 2020-07-16 2020-07-16 External knowledge fused BERT model fine-tuning method and apparatus, and computer device Active CN111832282B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010688347.5A CN111832282B (en) 2020-07-16 2020-07-16 External knowledge fused BERT model fine-tuning method and apparatus, and computer device
PCT/CN2020/118744 WO2021139266A1 (en) 2020-07-16 2020-09-29 Fine-tuning method and apparatus for external knowledge-fusing bert model, and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010688347.5A CN111832282B (en) 2020-07-16 2020-07-16 External knowledge fused BERT model fine-tuning method and apparatus, and computer device

Publications (2)

Publication Number Publication Date
CN111832282A true CN111832282A (en) 2020-10-27
CN111832282B CN111832282B (en) 2023-04-14

Family

ID=72924335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010688347.5A Active CN111832282B (en) 2020-07-16 2020-07-16 External knowledge fused BERT model fine adjustment method and device and computer equipment

Country Status (2)

Country Link
CN (1) CN111832282B (en)
WO (1) WO2021139266A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113589957A (en) * 2021-07-30 2021-11-02 广州赛宸信息技术有限公司 Method and system for rapidly inputting professional words of laws and regulations
CN114611524B (en) * 2022-02-08 2023-11-17 马上消费金融股份有限公司 Text error correction method and device, electronic equipment and storage medium
CN116630386B (en) * 2023-06-12 2024-02-20 新疆生产建设兵团医院 CTA scanning image processing method and system thereof
CN117034961B (en) * 2023-10-09 2023-12-19 武汉大学 BERT-based Chinese-French translation quality assessment method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102184278B1 (en) * 2018-11-21 2020-11-30 한국과학기술원 Method and system for transfer learning into any target dataset and model structure based on meta-learning
CN109740163A (en) * 2019-01-09 2019-05-10 安徽省泰岳祥升软件有限公司 Semantic expressiveness resource generation method and device applied to deep learning model
CN110147446A (en) * 2019-04-19 2019-08-20 中国地质大学(武汉) A kind of word embedding grammar based on the double-deck attention mechanism, equipment and storage equipment
CN111078836B (en) * 2019-12-10 2023-08-08 中国科学院自动化研究所 Machine reading understanding method, system and device based on external knowledge enhancement

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080270384A1 (en) * 2007-04-28 2008-10-30 Raymond Lee Shu Tak System and method for intelligent ontology based knowledge search engine
CN106681981A (en) * 2015-11-09 2017-05-17 北京国双科技有限公司 Chinese part-of-speech tagging method and device
US20170206437A1 (en) * 2016-01-14 2017-07-20 Canon Kabushiki Kaisha Recognition training apparatus, recognition training method, and storage medium
CN107239444A (en) * 2017-05-26 2017-10-10 华中科技大学 A kind of term vector training method and system for merging part of speech and positional information
CN107368466A (en) * 2017-06-27 2017-11-21 成都准星云学科技有限公司 A kind of name recognition methods and its system towards elementary mathematics field
WO2019024704A1 (en) * 2017-08-03 2019-02-07 阿里巴巴集团控股有限公司 Entity annotation method, intention recognition method and corresponding devices, and computer storage medium
US20200065389A1 (en) * 2017-10-10 2020-02-27 Tencent Technology (Shenzhen) Company Limited Semantic analysis method and apparatus, and storage medium
CN108170674A (en) * 2017-12-27 2018-06-15 东软集团股份有限公司 Part-of-speech tagging method and apparatus, program product and storage medium
CN108717406A (en) * 2018-05-10 2018-10-30 平安科技(深圳)有限公司 Text mood analysis method, device and storage medium
CN108664473A (en) * 2018-05-11 2018-10-16 平安科技(深圳)有限公司 Recognition methods, electronic device and the readable storage medium storing program for executing of text key message
CN109992671A (en) * 2019-04-10 2019-07-09 出门问问信息科技有限公司 Intension recognizing method, device, equipment and storage medium
CN109992648A (en) * 2019-04-10 2019-07-09 北京神州泰岳软件股份有限公司 The word-based depth text matching technique and device for migrating study
CN110347843A (en) * 2019-07-10 2019-10-18 陕西师范大学 A kind of Chinese tour field Knowledge Service Platform construction method of knowledge based map
CN110516055A (en) * 2019-08-16 2019-11-29 西北工业大学 A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT
CN110580339A (en) * 2019-08-21 2019-12-17 华东理工大学 Method and device for perfecting medical term knowledge base
CN111125356A (en) * 2019-11-29 2020-05-08 江苏艾佳家居用品有限公司 Text classification method and system
CN111125318A (en) * 2019-12-27 2020-05-08 北京工业大学 Method for improving knowledge graph relation prediction performance based on sememe-semantic item information
CN111198939A (en) * 2019-12-27 2020-05-26 北京健康之家科技有限公司 Statement similarity analysis method and device and computer equipment
CN111209728A (en) * 2020-01-13 2020-05-29 深圳市企鹅网络科技有限公司 Automatic test question labeling and inputting method
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111401077A (en) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 Language model processing method and device and computer equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YIHONG GU ET AL.: "Language Modeling with Sparse Product of Sememe Experts", arXiv:1810.12387v1 *
YIJIANG LIU ET AL.: "End to End Chinese Lexical Fusion Recognition with Sememe Knowledge", arXiv:2004.05456v1 *
ZHOU YEHENG ET AL.: "Text Matching Method Combining Pre-trained Models and Language Knowledge Bases", Journal of Chinese Information Processing *
SUN MAOSONG ET AL.: "Vector Representations of Words and Word Senses Leveraging Manually Built Knowledge Bases: A Case Study of HowNet", Journal of Chinese Information Processing *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022088444A1 (en) * 2020-11-02 2022-05-05 之江实验室 Multi-task language model-oriented meta-knowledge fine tuning method and platform
US11354499B2 (en) 2020-11-02 2022-06-07 Zhejiang Lab Meta-knowledge fine tuning method and platform for multi-task language model
GB2609768A (en) * 2020-11-02 2023-02-15 Zhejiang Lab Multi-task language model-oriented meta-knowledge fine tuning method and platform
JP7283836B2 (en) 2020-11-02 2023-05-30 之江実験室 Meta-knowledge fine-tuning method and platform for multitasking language models

Also Published As

Publication number Publication date
CN111832282B (en) 2023-04-14
WO2021139266A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN111832282B (en) External knowledge fused BERT model fine adjustment method and device and computer equipment
CN109840287B (en) Cross-modal information retrieval method and device based on neural network
US20210141863A1 (en) Multi-perspective, multi-task neural network model for matching text to program code
JP5128629B2 (en) Part-of-speech tagging system, part-of-speech tagging model training apparatus and method
US10719668B2 (en) System for machine translation
CN112906392B (en) Text enhancement method, text classification method and related device
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
US11232358B1 (en) Task specific processing of regulatory content
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
JP2005158010A (en) Apparatus, method and program for classification evaluation
CN109783806A (en) A kind of text matching technique using semantic analytic structure
CN111985228A (en) Text keyword extraction method and device, computer equipment and storage medium
US20240111956A1 (en) Nested named entity recognition method based on part-of-speech awareness, device and storage medium therefor
CN117151220A (en) Industry knowledge base system and method based on entity link and relation extraction
Tapsai et al. Thai Natural Language Processing: Word Segmentation, Semantic Analysis, and Application
CN115688885A (en) Method and system for fine tuning BERT model based on convolutional neural network
CN114065741B (en) Method, device, apparatus and medium for verifying authenticity of a representation
CN114298031A (en) Text processing method, computer device and storage medium
Muaidi Levenberg-Marquardt learning neural network for part-of-speech tagging of Arabic sentences
CN114676684B (en) Text error correction method and device, computer equipment and storage medium
CN113012685A (en) Audio recognition method and device, electronic equipment and storage medium
CN113158675B (en) Entity extraction method, device, equipment and medium based on artificial intelligence
CN115982389B (en) Knowledge graph generation method, device and equipment
CN116227484B (en) Model training method, apparatus, device, storage medium and computer program product
Ghukasyan et al. Research and development of a deep learning-based lemmatizer for the Armenian language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant