CN112749549B - Chinese entity relation extraction method based on incremental learning and multi-model fusion - Google Patents

Info

Publication number
CN112749549B
CN112749549B
Authority
CN
China
Prior art keywords
model
relation
entity
training
word vector
Prior art date
Legal status
Active
Application number
CN202110091226.7A
Other languages
Chinese (zh)
Other versions
CN112749549A (en)
Inventor
金康荣
胡岩峰
刘洋
时聪
顾爽
刘午凌
付啟明
Current Assignee
Suzhou Research Institute, Institute of Electronics, Chinese Academy of Sciences
Original Assignee
Suzhou Research Institute, Institute of Electronics, Chinese Academy of Sciences
Priority date
Filing date
Publication date
Application filed by Suzhou Research Institute, Institute of Electronics, Chinese Academy of Sciences
Priority to CN202110091226.7A
Publication of CN112749549A
Application granted
Publication of CN112749549B
Legal status: Active

Links

Classifications

    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F16/355 Class or cluster creation or modification
    • G06F40/194 Calculation of difference between files
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/295 Named entity recognition
    • G06F40/30 Semantic analysis
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a Chinese entity relation extraction method based on incremental learning and multi-model fusion. The method comprises: pre-training a word vector model, an entity recognition model and a dependency syntax analysis model, and initializing relational data clusters; obtaining an incremental learning sample set for expanding the relational data clusters, obtaining the entity set of each sample with the entity recognition model, extracting the subject, predicate and object of each sentence in the sample with the dependency syntax analysis model, converting the predicates into word vectors with the word vector model, projecting the word vectors into the relational data clusters, and continuously expanding the data volume of each relational data cluster through incremental learning to finally obtain the expanded relational data clusters; and obtaining a test sample set for Chinese entity relation extraction, and determining the relationship category of each test sample by combining the pre-trained models and the expanded relational data clusters, thereby completing Chinese entity relation extraction. The application requires no large amount of manual labeling, and has strong expansion capability and a high degree of generalization.

Description

Chinese entity relation extraction method based on incremental learning and multi-model fusion
Technical Field
The application relates to the field of natural language processing, in particular to a Chinese entity relation extraction method based on incremental learning and multi-model fusion.
Background
In the Internet era, large amounts of information appear at every moment, so users face huge and disordered data and are sometimes overwhelmed; they usually have to spend time carefully reading and understanding it to extract valuable information from unstructured content. An automatic extraction method that helps users quickly find information beneficial to them is therefore urgently needed, and information extraction technology emerged against this background.
Information extraction refers to extracting valuable information from large amounts of unstructured text and converting it into structured data for storage, so that users can conveniently analyze and use it further. Relation extraction is a very important technology in the field of information extraction: it can automatically extract entity pairs in texts and the relations between them to form triplets, help users obtain high-value information from massive data, and let them quickly understand the interrelationships among pieces of information; it is of great significance for the construction of knowledge graphs and question-answering systems.
Most relation extraction is based on supervised learning or rule-based methods, which usually require professionals to manually label data; this often costs a great deal of time and labor, and the labeled data usually contain errors that affect subsequent model training. Moreover, the training data sets used in conventional relation extraction methods are generally specific to a particular field, cannot be reused generically, and are difficult to apply in large-scale engineering. In addition, relation extraction models generated in the traditional way are often limited by the original training data; they cannot effectively utilize ever-growing new data and lack the ability to be updated and expanded.
Disclosure of Invention
The application aims to provide a Chinese entity relation extraction method based on incremental learning and multi-model fusion, which aims to solve the problems that the existing relation extraction method needs a large amount of manual labeling, is limited to a specific field, does not have continuous expansibility, has poor generalization capability and is low in prediction accuracy.
The technical solution for realizing the purpose of the application is as follows: a Chinese entity relation extraction method based on incremental learning and multi-model fusion specifically comprises the following steps:
step 1: acquiring an external corpus of a Word2Vec pre-training model, and training by using a neural network algorithm to obtain a Word vector model;
step 2: obtaining an external corpus of an entity recognition pre-training model, and generating an entity recognition model by combining BiLSTM and CRF algorithms;
step 3: obtaining an external corpus of a dependency syntax analysis pre-training model, and generating a dependency syntax analysis model based on a dependency syntax analysis algorithm;
step 4: initializing a plurality of relation data clusters according to predefined basic relation categories among entities and basic relation words under each category;
step 5: obtaining an incremental learning sample set of an expanded relation data cluster, obtaining an entity set of the sample by utilizing an entity recognition model, extracting subjects, predicates and objects of each sentence in the sample by utilizing a dependency syntactic analysis model, converting predicates in the sentences into word vectors by utilizing a word vector model, projecting the word vectors into a plurality of relation data clusters initialized in the step 4, continuously expanding the data quantity of each relation data cluster by utilizing an incremental learning mode, and finally obtaining a plurality of relation data clusters which are expanded;
step 6: and 5, acquiring a test sample set extracted by Chinese entity relation, combining an entity set obtained by utilizing an entity recognition model, extracting subjects, predicates and objects of each sentence in the test sample by utilizing a dependency syntactic analysis model, converting predicates in the sentences into word vectors by utilizing a word vector model, projecting the word vectors into a plurality of relation data clusters expanded in the step 5, determining corresponding relation categories, and completing Chinese entity relation extraction.
Further, in step 1, an external corpus of a Word2Vec pre-training model is obtained, and a word vector model is obtained through training with a neural network algorithm and is recorded as M_w2v. The specific method comprises the following steps:
1.1, the training corpus is the Chinese Wikipedia corpus; a training data set is generated by performing text content extraction, data processing and word segmentation on the corpus;
1.2, based on the training data set, the Skip-gram model (Continuous Skip-gram Model) in the word2vec algorithm is used for training. The model comprises an input layer, a projection layer and an output layer, predicts the semantic information of the context from the current vocabulary, and calculates the vocabulary probability through formula (1):
P(w_{n-c}, w_{n-c+1}, …, w_{n+c-1}, w_{n+c} | w_n)    (1)
wherein w_n represents the nth vocabulary and c is the size of the sliding window. In the training parameters, the word vector dimension is set to 250 and the window size to 5; a word2vec word vector model is finally generated through training and is recorded as M_w2v.
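The windowed context described by formula (1) can be sketched as follows. This is an illustrative sketch, not the patent's code: it only enumerates the (current word, context word) training pairs that a skip-gram model would be trained on; the token list and the function name `skipgram_pairs` are assumptions for the example.

```python
# Sketch of skip-gram pair generation: for each center word w_n, every word
# within a sliding window of size c on either side becomes a context target.
def skipgram_pairs(tokens, c=5):
    """Return (center, context) pairs for all context words within +-c."""
    pairs = []
    for n, center in enumerate(tokens):
        lo, hi = max(0, n - c), min(len(tokens), n + c + 1)
        for j in range(lo, hi):
            if j != n:  # the center word itself is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# With a real corpus, a library such as gensim could then train the model
# with parameters matching the patent: vector_size=250, window=5, sg=1.
pairs = skipgram_pairs(["深度", "学习", "实体", "关系", "抽取"], c=2)
```

In practice these pairs are fed to the three-layer (input/projection/output) network to maximize the probability of each context word given the center word.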
Further, in step 2, an external corpus of the entity recognition pre-training model is obtained, and an entity recognition model is generated by combining the BiLSTM and CRF algorithms and is recorded as M_ee. The specific method comprises the following steps:
2.1, training is performed on the MSRA_NER training data set by combining the BiLSTM algorithm and the CRF algorithm. The BiLSTM algorithm, also called the bidirectional LSTM algorithm, takes as input the output of the word embedding layer, i.e. the word vectors (w_1, w_2, …, w_n) obtained by embedding-layer conversion after text word segmentation, where w_n represents the nth vocabulary. The output of the forward LSTM is denoted h_t→ and the output of the reverse LSTM is denoted h_t←; the output of the final hidden layer is calculated according to formula (2):
h_t = [h_t→ ; h_t←]    (2)
2.2, a CRF layer is arranged after the BiLSTM layer, and the output of the BiLSTM is constrained by learning a tag state transition probability matrix;
2.3, an entity recognition model is finally generated through training and is recorded as M_ee.
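The concatenation in formula (2) can be sketched numerically. This is a shape-level illustration only: the arrays stand in for the forward and reverse LSTM outputs (a real model would compute them with a deep-learning framework), and the sequence length and hidden size are arbitrary.

```python
import numpy as np

# Sketch of formula (2): the final hidden output at each time step
# concatenates the forward and reverse LSTM states, h_t = [h_t->; h_t<-].
T, H = 4, 3                          # sequence length, per-direction hidden size
h_forward = np.random.randn(T, H)    # placeholder forward LSTM outputs
h_backward = np.random.randn(T, H)   # placeholder reverse LSTM outputs
h = np.concatenate([h_forward, h_backward], axis=1)  # (T, 2H) BiLSTM output
```

The (T, 2H) matrix h is what the CRF layer then scores against the tag transition matrix.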
Further, in step 3, an external corpus of the dependency syntax analysis pre-training model is obtained, and a dependency syntax analysis model is generated based on a dependency syntax analysis algorithm, and the specific method comprises the following steps:
the training corpus is a Ha-Gong Chinese dependency corpus, the corpus is trained by using a dependency syntax analysis algorithm, the interdependence relationship among grammar components in sentences is learned, and finally a dependency syntax analysis model is generated and recorded as M dp
Further, in step 4, according to the predefined categories of basic relationships between entities and basic relationship vocabulary under each category, the relationship data cluster is initialized, and the specific method is as follows:
4.1, predefine basic relationship category labels C = (c_1, c_2, …, c_m) between entities, wherein m is the number of relationship categories;
4.2, collect and sort the basic relationship vocabularies under each category, with no fewer than 20 vocabularies per category; the vocabulary numbers are recorded as P = (p_1, …, p_i, …, p_m), wherein p_i represents the vocabulary number of the ith category;
4.3, using the word vector model M_w2v generated in step 1, convert the basic vocabulary under each relationship category into word vectors, recorded as v_j^(i) (the word vector of the jth vocabulary under the ith category); m relational data clusters are finally formed and recorded as CU = (cu_1, …, cu_i, …, cu_m), wherein cu_i ∈ R^(p_i × l) represents the relational data cluster of the ith category, the data amount in the cluster is p_i, and l is the data dimension of the word vector;
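The cluster initialization can be sketched as one (p_i × l) matrix per category. This is a minimal sketch under stated assumptions: `to_vec` is a random stand-in for the pretrained M_w2v lookup, and the category names and seed words are invented for illustration.

```python
import numpy as np

L = 250  # word vector dimension (the patent uses 250)

def to_vec(word, dim=L):
    # Placeholder for M_w2v[word]: deterministic pseudo-random vector per word.
    rng = np.random.default_rng(abs(hash(word)) % (2**32))
    return rng.standard_normal(dim)

# Hypothetical predefined categories and seed relation vocabularies (>= 20
# words each in the patent; shortened here for the sketch).
seed_words = {
    "located_in": ["位于", "坐落", "地处"],
    "member_of": ["加入", "隶属", "属于"],
}

# Each cluster cu_i is a (p_i, L) matrix of seed-word vectors.
clusters = {cat: np.stack([to_vec(w) for w in ws])
            for cat, ws in seed_words.items()}
```

Each matrix row is one seed word vector; incremental learning later appends rows to these matrices.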
further, in step 5, an incremental learning sample set of the expanded relational data clusters is obtained, an entity recognition model is utilized to obtain an entity set of the samples, subjects, predicates and objects of each sentence in the samples are extracted by utilizing a dependency syntax analysis model, predicates in the sentences are converted into word vectors by utilizing a word vector model, the word vectors are projected into the plurality of relational data clusters initialized in step 4, then the data volume of each relational data cluster is continuously expanded by an incremental learning mode, and finally a plurality of expanded relational data clusters are obtained, and the specific method is as follows:
5.1, take the Sohu News Chinese text corpus as the incremental learning sample set for expanding the relational data clusters; the content is stored in TXT format and recorded as Φ = (T_1, T_2, …, T_n), wherein n is the number of samples;
5.2, for each text T_i in the sample set, extract the entities using the entity recognition model M_ee generated in step 2, and perform de-duplication and stop-word filtering to obtain an entity set, recorded as E;
5.3, split the text T_i into sentences;
5.4, for each sentence in the text, extract the subject, predicate and object of the sentence using the dependency syntax analysis model M_dp generated in step 3 to form a triplet, recorded as (S, V, O);
5.5, judge whether the subject S and the object O in the triplet exist in the entity set E; if so, continue; if not, skip;
5.6, convert the predicate V into a word vector v using the M_w2v model generated in step 1 and match it against the m relational data clusters CU; if the relational word vector data already exists, skip; if not, continue;
5.7, calculate the similarity between the word vector v and the ith relational data cluster according to formula (3):
sim_i = (1/p_i) · Σ_{j=1}^{p_i} cos(v, v_j^(i))    (3)
wherein cos(·) represents the cosine similarity function between vectors and v_j^(i) represents the word vector converted from the jth vocabulary under the ith relationship category;
5.8, obtain the relational data cluster category index î corresponding to the maximum similarity according to formula (4):
î = argmax_{i ∈ {1, …, m}} sim_i    (4)
If the maximum similarity sim_î is greater than or equal to the set similarity threshold θ, the word vector v is extended into the relational data cluster cu_î, i.e. cu_î = cu_î ∪ {v}; if the maximum similarity is smaller than the threshold θ, skip;
5.9, continue executing in this incremental learning mode until all texts in the sample set Φ have been processed; store all data and parameters, exit the iteration, and finally obtain the m expanded relational data clusters CU.
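The core of the incremental expansion (steps 5.6–5.9) can be sketched as follows. This is an illustrative sketch, assuming average cosine similarity per cluster as in formula (3) and a similarity threshold θ; the cluster names, toy 2-dimensional vectors, and function names are assumptions, not the patent's code.

```python
import numpy as np

def cosine(a, b):
    # cos(.) of formula (3): cosine similarity between two vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand(clusters, v, theta=0.5):
    """Append predicate vector v to its best-matching cluster if the average
    cosine similarity clears theta; return the matched category or None."""
    sims = {c: np.mean([cosine(v, row) for row in M])   # formula (3)
            for c, M in clusters.items()}
    best = max(sims, key=sims.get)                      # formula (4): argmax
    if sims[best] >= theta:
        clusters[best] = np.vstack([clusters[best], v]) # incremental growth
        return best
    return None  # below threshold: skip, per step 5.8

# Toy 2-D clusters for the sketch (real vectors are 250-dimensional).
clusters = {"located_in": np.array([[1.0, 0.0], [0.9, 0.1]]),
            "member_of": np.array([[0.0, 1.0]])}
matched = expand(clusters, np.array([0.95, 0.05]), theta=0.5)
```

Because accepted vectors join the cluster they matched, later predicates are compared against an ever-richer vocabulary, which is what lets the clusters generalize beyond the seed words.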
Further, in step 6, a test sample set for Chinese entity relation extraction is obtained; in combination with the entity set obtained from the entity recognition model, the subject, predicate and object of each sentence in the test sample are extracted with the dependency syntax analysis model, the predicates in the sentences are converted into word vectors with the word vector model, and the word vectors are projected into the plurality of relational data clusters expanded in step 5 to determine the corresponding relationship categories, thereby completing Chinese entity relation extraction. The specific method is as follows:
6.1, obtain the test sample set for Chinese entity relation extraction, recorded as Ψ = (T_1, T_2, …, T_q), wherein q is the number of test samples;
6.2, for each text T_i in the test sample set, extract the entities using the entity recognition model M_ee generated in step 2, and perform de-duplication and stop-word filtering to obtain an entity set, recorded as E;
6.3, split the text T_i into sentences;
6.4, for each sentence in the text, extract the subject, predicate and object of the sentence using the dependency syntax analysis model M_dp generated in step 3 to form a triplet, recorded as (S, V, O);
6.5, judge whether the subject S and the object O in the triplet exist in the entity set E; if so, continue; if not, skip the triplet;
6.6, convert the predicate V into a word vector v using the word vector model M_w2v generated in step 1, project it into the m relational data clusters CU obtained in step 5, and calculate the relational cluster category index î corresponding to the maximum similarity according to formula (3) and formula (4); then the relational cluster category c_î is taken as the relationship between entity S and entity O, and the relationship triplet (S, c_î, O) existing in the sentence is returned;
6.7, continue extracting relations from the test data in this manner until all texts have been processed, and return all extraction results.
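The prediction step can be sketched in the same terms. This is a minimal illustrative sketch: an already-extracted (S, V, O) triplet is labeled with the category of the most similar expanded cluster per formulas (3) and (4); the cluster contents, entity strings, and function names are assumptions for the example.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_relation(clusters, s, v, o):
    """Return (S, category, O), labeling the pair with the category whose
    cluster has the highest average cosine similarity to predicate vector v."""
    sims = {c: np.mean([cosine(v, row) for row in M])  # formula (3)
            for c, M in clusters.items()}
    best = max(sims, key=sims.get)                     # formula (4)
    return (s, best, o)

# Toy 2-D expanded clusters for the sketch (real vectors are 250-dimensional).
clusters = {"located_in": np.array([[1.0, 0.0]]),
            "member_of": np.array([[0.0, 1.0]])}
triple = predict_relation(clusters, "苏州研究院", np.array([0.9, 0.1]), "苏州")
```

Unlike the expansion phase, prediction always returns the argmax category; the patent applies the threshold θ only when growing the clusters.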
A Chinese entity relation extraction system based on incremental learning and multi-model fusion performs Chinese entity relation extraction according to any one of the above methods.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements any one of the above methods for Chinese entity relation extraction based on incremental learning and multi-model fusion.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the above methods for Chinese entity relation extraction based on incremental learning and multi-model fusion.
Compared with the prior art, the application has the following remarkable advantages: (1) the application learns a large amount of semantic information from an external Chinese corpus to generate a word vector model, enabling high-quality semantic understanding of the vocabulary in relation extraction; this enhances the accuracy and generalization capability of relation extraction and solves the problem that traditional relation extraction methods learn only the semantic information of the training set, leading to insufficient semantic understanding and generalization capability.
(2) The application extracts the entity pairs and relations of the text through multi-model fusion, replacing the traditional approach based on a specific training set. By integrating the advantages of multiple models, the accuracy of relation extraction is remarkably improved; the relation training set does not need to be labeled manually, greatly reducing time and labor costs and avoiding the low extraction accuracy caused by manual labeling errors.
(3) The application adopts an incremental learning algorithm to continuously expand and optimize the relational data clusters, remedying the problems of the traditional approach, whose dependence on a specific training data set limits relation extraction to a particular field and leads to insufficient generalization capability and low accuracy in other fields.
Drawings
FIG. 1 is a schematic flow chart of the method of the present application;
FIG. 2 is a schematic diagram of an initialization relational data cluster of the present application;
FIG. 3 is a schematic diagram of an expanded relational data cluster using incremental learning in accordance with the present application;
FIG. 4 is a schematic diagram of the present application for prediction in combination with multiple models and relational data clusters.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
As shown in FIG. 1, the present application is mainly divided into six steps:
step I, pre-training a word vector model based on external corpus;
step II, pre-training an entity recognition model based on external corpus;
step III, pre-training a dependency syntactic analysis model based on an external corpus;
step IV, initializing a relational data cluster;
step V, using incremental learning to expand the relational data cluster;
step VI, combining multiple models and relational data clusters to make predictions.
The technical scheme of the application and the scientific principle according to the technical scheme are described in detail below.
The word vector model pre-training specific process in the step I is as follows:
1.1, the training corpus is the Chinese Wikipedia corpus; a training data set is generated by performing text content extraction, data processing, word segmentation and similar operations on the corpus.
1.2, based on the training data set, the Skip-gram model (Continuous Skip-gram Model) in the word2vec algorithm is used for training. The model comprises an input layer, a projection layer and an output layer, predicts the semantic information of the context from the current vocabulary, and calculates the vocabulary probability through formula (1):
P(w_{n-c}, w_{n-c+1}, …, w_{n+c-1}, w_{n+c} | w_n)    (1)
wherein w_n represents the nth vocabulary and c is the size of the sliding window.
1.3, in the training parameters, the word vector dimension is set to 250 and the window size to 5; a word2vec word vector model is finally generated through training and recorded as M_w2v.
The specific process of pre-training the entity recognition model in the step II is as follows:
2.1, training is performed on the MSRA_NER training data set by combining the BiLSTM algorithm and the CRF algorithm. The BiLSTM algorithm, also called the bidirectional LSTM algorithm, can fully capture the semantic information of the context; its input is the output of the word embedding layer, i.e. the word vectors (w_1, w_2, …, w_n) obtained by embedding-layer conversion after text word segmentation, where w_n represents the nth vocabulary. The output of the forward LSTM is denoted h_t→ and the output of the reverse LSTM is denoted h_t←; the output of the final hidden layer is calculated according to formula (2):
h_t = [h_t→ ; h_t←]    (2)
2.2, in order to make full use of the tag state transition information, a CRF layer, i.e. a conditional random field, is added after the BiLSTM layer; the output of the BiLSTM is constrained by learning the tag state transition probability matrix, further improving the rationality of the predicted tags.
2.3, an entity recognition model is finally generated through training and recorded as M_ee.
The pre-training specific process of the dependency syntactic analysis model in the step III is as follows:
3.1, the training corpus is the Harbin Institute of Technology (HIT) Chinese dependency treebank; the corpus is trained with a dependency syntax analysis algorithm to learn the interdependence relationships among grammatical components in sentences, and a dependency syntax analysis model is finally generated and recorded as M_dp.
The specific process of initializing the relational data cluster in step IV is as follows (as shown in fig. 2):
4.1, assuming m is the number of relationship categories, first predefine basic relationship category labels C = (c_1, c_2, …, c_m) between entities, covering the relationship types that can occur between entities in most application scenarios.
4.2, collect and sort the basic relationship vocabularies under each category, with no fewer than 20 vocabularies per category; the vocabulary numbers are recorded as P = (p_1, …, p_i, …, p_m), wherein p_i represents the vocabulary number of the ith category.
4.3, using the M_w2v word vector model generated in step I, convert the basic vocabulary under each relationship category into word vectors (250 dimensions), recorded as v_j^(i); m relational data clusters are finally formed and recorded as CU = (cu_1, …, cu_i, …, cu_m), wherein cu_i represents the relational data cluster of the ith category, the data amount in the cluster is p_i, and each data item is 250-dimensional.
The specific process of using incremental learning to augment the relational data clusters in step V is as follows (as shown in fig. 3):
5.1, the Sohu News Chinese text corpus is used as the incremental learning sample set for expanding the relational data clusters; the content is stored in TXT format and recorded as Φ = (T_1, T_2, …, T_n), where n is the number of samples.
5.2, for each text T_i in the sample set, extract the entities using the entity recognition model M_ee generated in step II, where the entity types include: person, institution, country and place. Then perform de-duplication, stop-word filtering and similar operations on the extracted entities to obtain an entity set, recorded as E.
5.3, split the text T_i into sentences.
5.4, for each sentence in the text, extract the subject, predicate and object of the sentence using the dependency syntax analysis model M_dp generated in step III to form a triplet, recorded as (S, V, O).
5.5, judge whether the subject S and the object O in the triplet exist in the entity set E. If so, continue; if not, skip.
5.6, convert the predicate V into a word vector v using the M_w2v model generated in step I and match it against the m relational data clusters CU; if the relational word vector data already exists, skip; if not, continue.
5.7, calculate the similarity between the word vector v and the ith relational data cluster according to formula (3):
sim_i = (1/p_i) · Σ_{j=1}^{p_i} cos(v, v_j^(i))    (3)
where cos(·) represents the cosine similarity function between vectors.
5.8, obtain the relational data cluster category index î corresponding to the maximum similarity according to formula (4):
î = argmax_{i ∈ {1, …, m}} sim_i    (4)
If the maximum similarity sim_î is greater than or equal to the set similarity threshold θ, the word vector v is extended into the relational data cluster cu_î, i.e. cu_î = cu_î ∪ {v}; if the maximum similarity is smaller than the threshold θ, skip.
5.9, continue executing in this incremental learning mode until all texts in the sample set Φ have been processed; store all data and parameters, exit the iteration, and finally obtain the m expanded relational data clusters CU.
The specific process of predicting in step VI in combination with multiple models and relational clusters is as follows (as shown in fig. 4):
6.1, assume the test sample set for Chinese entity relation extraction is Ψ = (T_1, T_2, …, T_q), where q is the number of test samples.
6.2, for each text T_i, use the same procedure as steps 5.2–5.4 of step V to obtain the entity set E and the subject-predicate-object triplet of each sentence, recorded as (S, V, O).
6.3, judge whether the subject S and the object O in the triplet exist in the entity set E. If so, continue; if not, skip.
6.4, convert the predicate V into a word vector v using the M_w2v word vector model generated in step I, project it into the relational data clusters CU expanded in step V, and calculate the relational cluster category index î corresponding to the maximum similarity according to formula (3) and formula (4) of step V; then the relational cluster category c_î is taken as the relationship between entity S and entity O, and the relationship triplet (S, c_î, O) is returned.
6.5, continue extracting relations from the test data in this manner until all texts have been processed, and return all extraction results.
The application also provides a Chinese entity relation extraction system based on incremental learning and multi-model fusion, which extracts Chinese entity relations according to the above method.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above method for Chinese entity relation extraction based on incremental learning and multi-model fusion when executing the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above method for Chinese entity relation extraction based on incremental learning and multi-model fusion.
Examples
In order to verify the effectiveness of the scheme of the present application, the following simulation experiments were performed.
Input: the external corpora of the three pre-training models (word vector model, entity recognition model, dependency syntax analysis model); the incremental learning sample set Φ = (T_1, T_2, …, T_n) for expanding the relational data clusters, where n is the number of samples; the test sample set Ψ = (T_1, T_2, …, T_q) for Chinese entity relation extraction, where q is the number of test samples; the predefined basic relation categories between entities C = (c_1, c_2, …, c_m), where m is the number of relation categories; and the basic relation vocabulary under each category.
Step 1: based on the three external corpora for the word vector model, the entity recognition model and the dependency syntax analysis model, generate the three pre-training models by the methods in steps I-III, namely the word vector model M_w2v, the entity recognition model M_ee, and the dependency syntax analysis model M_dp.
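The word2vec training in Step 1 learns to predict each word's context within a sliding window (the Skip-gram objective). A minimal, illustrative sketch of generating the (center, context) training pairs follows; the real model additionally trains an input-layer/projection-layer/output-layer network over these pairs, and the toy tokens are purely for illustration.

```python
def skipgram_pairs(tokens, c=2):
    """Generate (center, context) pairs for Skip-gram training:
    for word w_n, the contexts are w_{n-c} .. w_{n+c}, excluding w_n."""
    pairs = []
    for n, center in enumerate(tokens):
        for m in range(max(0, n - c), min(len(tokens), n + c + 1)):
            if m != n:
                pairs.append((center, tokens[m]))
    return pairs

# Window size 1 over a four-token segmented sentence
pairs = skipgram_pairs(["中国", "科学院", "苏州", "研究院"], c=1)
# → 6 pairs, e.g. ("中国", "科学院"), ("科学院", "中国"), ...
```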
Step 2: based on the basic relation categories between entities C = (c_1, c_2, …, c_m), initialize the relational data clusters according to the following steps:
step 2.1: obtaining m relational data clusters Cu= (CU) by using the word vector conversion method of 4.2 in the step IV 1 ,…,cu i ,…,cu m ) Wherein, the method comprises the steps of, wherein,relational data cluster representing the ith category, the data amount in the cluster being p i Each data is 250 dimensions.
Step 3: incremental learning sample set phi= [ T ] based on extended data cluster 1 ,T 2 ,…,T n ]Expanding the data cluster according to the following steps:
step 3.1: obtaining each text T by using the recognition mode of 5.2 in the step V i Entity set E in (a).
Step 3.2: for text T i And carrying out clauses.
Step 3.3: using the extraction method of 5.4 in step V, obtain the subject, predicate and object of each sentence, recorded as an (S, V, O) triple.
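Step 3.3 can be sketched as selecting the root verb and its subject and object from the parser's dependency arcs. Treating the parser output as a list of (head, label, dependent) triples is a simplification, and the HED/SBV/VOB label names follow the convention of common Chinese dependency parsers such as HIT's LTP; both are assumptions here.

```python
def extract_svo(arcs):
    """Given dependency arcs as (head, label, dependent) triples, pick the
    root predicate (HED arc) and its subject (SBV) and object (VOB) to
    form an (S, V, O) triple; missing components come back as None."""
    root = next(dep for head, label, dep in arcs if label == "HED")
    subj = next((dep for head, label, dep in arcs
                 if head == root and label == "SBV"), None)
    obj = next((dep for head, label, dep in arcs
                if head == root and label == "VOB"), None)
    return (subj, root, obj)

arcs = [("ROOT", "HED", "出生于"),
        ("出生于", "SBV", "张三"),
        ("出生于", "VOB", "苏州")]
svo = extract_svo(arcs)   # → ("张三", "出生于", "苏州")
```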
Step 3.4: it is determined whether both subject S and object O in the triplet are present in the resulting set of entities E in Step 3.1. If so, entering Step 3.5; if not, ignore and proceed to Step 3.3 for continued analysis.
Step 3.5: using M generated in Step 1 w2v The predicate V is converted into a word vector V by the word vector model, and the word vector V is matched with m relational data clusters CU, if the relational word vector data exists, the predicate V is ignoredSlightly and continuously analyzing in Step 3.3; if not, go to Step 3.6.
Step 3.6: and (3) obtaining the similarity between the predicate vector V and the ith relation cluster by using the calculation mode of 5.7 in the step V.
Step 3.7: obtaining the relation data cluster category index corresponding to the maximum similarity by using the calculation mode of 5.8 in the step VIf its maximum similarity->Greater than or equal to the set threshold->The predicate word vector v is extended to the relational data cluster +.>In (i.e.)>If its maximum similarity is smaller than the threshold +.>Then ignore and go to Step 3.3 for continued analysis.
Step 3.8: and continuously executing according to the incremental learning modes Step 3.1-Step 3.7 until all texts in the sample set phi are learned, storing all data and parameters, and exiting iteration to obtain m expanded relational data clusters CU. Otherwise, go on to Step 3.1.
Step 4: based on the test sample set ψ= (T 1 ,T 2 ,…,T q ) The model M is extracted using the relationship according to the following steps RE And (3) predicting:
step 4.1: for each text T i The same procedure as in Steps 3.1 to 3.3 is used to obtain the real set E and the main predicate (S, V, O) triples of each sentence.
Step 4.2: it is determined whether subject S and object O in the triplet exist in the entity set E generated in Step 4.1. If so, entering Step 4.3; if not, the method is ignored.
Step 4.3: using M generated in Step 1 w2v The predicate is converted into a word vector V by the word vector model, and the word vector V is projected into a relational data cluster obtained by Step 3.8, and the category index of the relational cluster corresponding to the maximum similarity is calculated according to the modes of the formulas (3) and (4) in the Step VThen the relation cluster category->As a relation between entity S and entity O, a relation triplet is returned in the form +.>
Step 4.4: and continuously extracting the relation of the test data according to the modes Step 4.1-Step 4.3 until all the test texts are extracted, exiting, and returning all extraction results.
Output: all relation extraction results of the test sample set.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A Chinese entity relation extraction method based on incremental learning and multi-model fusion is characterized by comprising the following steps:
step 1: acquiring an external corpus of a Word2Vec pre-training model, and training by using a neural network algorithm to obtain a Word vector model;
step 2: obtaining an external corpus of an entity recognition pre-training model, and generating an entity recognition model by combining BiLSTM and CRF algorithms;
step 3: obtaining an external corpus of a dependency syntax analysis pre-training model, and generating a dependency syntax analysis model based on a dependency syntax analysis algorithm;
step 4: initializing a plurality of relation data clusters according to predefined basic relation categories among entities and basic relation words under each category;
step 5: obtaining an incremental learning sample set of an expanded relation data cluster, obtaining an entity set of the sample by utilizing an entity recognition model, extracting subjects, predicates and objects of each sentence in the sample by utilizing a dependency syntactic analysis model, converting predicates in the sentences into word vectors by utilizing a word vector model, projecting the word vectors into a plurality of relation data clusters initialized in the step 4, continuously expanding the data quantity of each relation data cluster by utilizing an incremental learning mode, and finally obtaining a plurality of relation data clusters which are expanded;
step 6: and 5, acquiring a test sample set extracted by Chinese entity relation, combining an entity set obtained by utilizing an entity recognition model, extracting subjects, predicates and objects of each sentence in the test sample by utilizing a dependency syntactic analysis model, converting predicates in the sentences into word vectors by utilizing a word vector model, projecting the word vectors into a plurality of relation data clusters expanded in the step 5, determining corresponding relation categories, and completing Chinese entity relation extraction.
2. The method for extracting Chinese entity relations based on incremental learning and multi-model fusion as defined in claim 1, wherein in step 1, an external corpus for the Word2Vec pre-training model is obtained and a neural network algorithm is used for training to obtain a word vector model, denoted M_w2v; the specific method comprises the following steps:
1.1, the training corpus is a Chinese Wikipedia corpus, and a training data set is generated by performing text content extraction, data processing and word segmentation on the corpus;
1.2, based on the training data set, a Skip-gram model (Continuous Skip-gram Model) in the word2vec algorithm is used for training; the model comprises an input layer, a projection layer and an output layer, predicts the semantic information of the context from the current vocabulary, and calculates the vocabulary probability through formula (1):
P(w_{n-c}, w_{n-c+1}, …, w_{n+c-1}, w_{n+c} | w_n) (1)
where w_n represents the n-th vocabulary item and c is the size of the sliding window; in the training parameters, the word vector dimension is set to 250 and the window size to 5; a word2vec word vector model is finally generated through training, denoted M_w2v;
3. The method for extracting Chinese entity relations based on incremental learning and multi-model fusion according to claim 1, wherein in step 2, an external corpus for the entity recognition pre-training model is obtained, and an entity recognition model, denoted M_ee, is generated by combining the BiLSTM and CRF algorithms; the specific method comprises the following steps:
2.1, training based on the MSRA_NER training data set by combining the BiLSTM algorithm and the CRF algorithm, wherein the BiLSTM algorithm, also called the bidirectional LSTM algorithm, takes as input the output of the word embedding layer, namely the word vectors obtained by the embedding layer after text word segmentation, denoted (w_1, w_2, …, w_n), where w_n represents the n-th vocabulary item; the output of the forward LSTM at time t is denoted h_t^f and the output of the reverse LSTM is denoted h_t^b, and the output of the final hidden layer is their concatenation, calculated according to formula (2):

h_t = [h_t^f ; h_t^b] (2)
2.2, the CRF layer is arranged behind the BiLSTM layer, and the output of the BiLSTM is constrained by learning a tag state transition probability matrix;
2.3, finally generating an entity recognition model through training, denoted M_ee.
4. The method for extracting Chinese entity relations based on incremental learning and multi-model fusion as defined in claim 1, wherein in step 3, an external corpus of the dependency syntax analysis pre-training model is obtained, and a dependency syntax analysis model is generated based on a dependency syntax analysis algorithm; the method comprises the steps of:
the training corpus is the Harbin Institute of Technology (HIT) Chinese dependency treebank; the corpus is trained with a dependency syntax analysis algorithm to learn the interdependency relations among the grammatical components of sentences, finally generating a dependency syntax analysis model, denoted M_dp.
5. The method for extracting Chinese entity relations based on incremental learning and multi-model fusion according to claim 1, wherein in step 4, a plurality of relational data clusters are initialized according to the predefined basic relation categories between entities and the basic relation vocabulary under each category; the specific method comprises:
4.1, predefine the basic relation category labels between entities C = (c_1, c_2, …, c_m), where m is the number of relation categories;
4.2, collect and sort the basic relation vocabulary under each category, with no fewer than 20 words per category; the vocabulary count of each category is recorded as P = (p_1, …, p_i, …, p_m), where p_i represents the vocabulary count of the i-th category;
4.3, using the word vector model M_w2v generated in step 1, convert the basic vocabulary under each relation category into word vectors, denoted v_{i,j}; finally m relational data clusters are formed, denoted CU = (cu_1, …, cu_i, …, cu_m), where cu_i represents the relational data cluster of the i-th category, the amount of data in the cluster is p_i, and L is the data dimension of the word vectors.
6. The method for extracting Chinese entity relations based on incremental learning and multi-model fusion according to claim 1, wherein in step 5, an incremental learning sample set for expanding the relational data clusters is obtained, an entity set of the samples is obtained by using the entity recognition model, the subject, predicate and object of each sentence in the samples are extracted by using the dependency syntax analysis model, the predicates in the sentences are converted into word vectors by using the word vector model, the word vectors are projected into the plurality of relational data clusters initialized in step 4, the data amount of each relational data cluster is then continuously expanded by incremental learning, and a plurality of fully expanded relational data clusters are finally obtained; the method comprises the following specific steps:
5.1, a Sohu News Chinese text corpus is taken as the incremental learning sample set for expanding the relational data clusters; the content is stored in TXT format and denoted Φ = (T_1, T_2, …, T_n), where n is the number of samples;
5.2, for each text T_i in the sample set, extract the entities using the entity recognition model M_ee generated in step 2, perform de-duplication and stop-word filtering, and obtain the entity set, denoted E;
5.3, split the text T_i into sentences;
5.4, for each sentence in the text, extract the subject, predicate and object using the dependency syntax analysis model M_dp generated in step 3 to form a triple, denoted (S, V, O);
5.5, judging whether the subject S and the object O in the triplet exist in the entity set E or not, and if so, continuing; if not, skipping;
5.6, using the model M_w2v generated in step 1, convert the predicate V into a word vector v and match it against the m relational data clusters CU; if the word vector already exists in the relational data, skip it; if not, continue;
5.7, calculate the similarity sim_i between the word vector v and the i-th relation cluster according to formula (3):

sim_i = (1/p_i) Σ_{j=1}^{p_i} cos(v, v_{i,j}) (3)

where cos(·) represents the cosine similarity function between vectors, and v_{i,j} represents the word vector converted from the j-th word under the i-th relation category;
5.8, obtain the relation data cluster category index k* corresponding to the maximum similarity according to formula (4):

k* = argmax_{i∈{1,…,m}} sim_i (4)
if the maximum similarity sim_{k*} is greater than or equal to the set similarity threshold θ, the word vector v is extended into the relational data cluster cu_{k*}, i.e. cu_{k*} = cu_{k*} ∪ {v}; if the maximum similarity is smaller than the threshold θ, it is skipped;
and 5.9, continue executing in this incremental learning manner until all texts in the sample set Φ have been processed; save all data and parameters, exit the iteration, and finally obtain m fully expanded relational data clusters CU.
7. The method for extracting Chinese entity relations based on incremental learning and multi-model fusion according to claim 1, wherein in step 6, a test sample set for Chinese entity relation extraction is obtained, an entity set of the test samples is obtained by the entity recognition model, the subject, predicate and object of each sentence in the test samples are extracted by using the dependency syntax analysis model, the predicates in the sentences are converted into word vectors by using the word vector model, the word vectors are projected into the plurality of relational data clusters expanded in step 5, the corresponding relation categories are determined, and Chinese entity relation extraction is completed; the method comprises the following specific steps:
6.1, obtain the test sample set for Chinese entity relation extraction, denoted Ψ = (T_1, T_2, …, T_q), where q is the number of test samples;
6.2, for each text T_i in the test sample set, extract the entities using the entity recognition model M_ee generated in step 2, perform de-duplication and stop-word filtering, and obtain the entity set, denoted E;
6.3, split the text T_i into sentences;
6.4, for each sentence in the text, extract the subject, predicate and object using the dependency syntax analysis model M_dp generated in step 3 to form a triple, denoted (S, V, O);
6.5, judging whether the subject S and the object O in the triplet exist in the entity set E or not, and if so, continuing; if not, skipping the triplet;
6.6, using the word vector model M_w2v generated in step 1, convert the predicate V into a word vector v and project it into the m relational data clusters CU obtained in step 5; calculate the relation cluster category index k* corresponding to the maximum similarity according to formula (3) and formula (4), then take the relation cluster category c_{k*} as the relation between entity S and entity O, and return the relation triple (S, c_{k*}, O) present in the sentence;
and 6.7, continue extracting relations from the test data in this manner until all texts have been processed, and return all extraction results.
8. A Chinese entity relation extraction system based on incremental learning and multi-model fusion, wherein Chinese entity relation extraction based on incremental learning and multi-model fusion is performed based on the method of any one of claims 1-7.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-7 for Chinese entity relation extraction based on incremental learning and multi-model fusion when the computer program is executed.
10. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1-7 for Chinese entity relation extraction based on incremental learning and multi-model fusion.
CN202110091226.7A 2021-01-22 2021-01-22 Chinese entity relation extraction method based on incremental learning and multi-model fusion Active CN112749549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110091226.7A CN112749549B (en) 2021-01-22 2021-01-22 Chinese entity relation extraction method based on incremental learning and multi-model fusion


Publications (2)

Publication Number Publication Date
CN112749549A CN112749549A (en) 2021-05-04
CN112749549B true CN112749549B (en) 2023-10-13

Family

ID=75652977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110091226.7A Active CN112749549B (en) 2021-01-22 2021-01-22 Chinese entity relation extraction method based on incremental learning and multi-model fusion

Country Status (1)

Country Link
CN (1) CN112749549B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360641B (en) * 2021-05-07 2023-05-30 内蒙古电力(集团)有限责任公司乌兰察布电业局 Deep learning-based power grid fault handling plan semantic modeling system and method
CN113705196A (en) * 2021-08-02 2021-11-26 清华大学 Chinese open information extraction method and device based on graph neural network
CN113657116B (en) * 2021-08-05 2023-08-08 天津大学 Social media popularity prediction method and device based on visual semantic relationship

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170089142A (en) * 2016-01-26 2017-08-03 경북대학교 산학협력단 Generating method and system for triple data
CN109241538A (en) * 2018-09-26 2019-01-18 上海德拓信息技术股份有限公司 Based on the interdependent Chinese entity relation extraction method of keyword and verb
CN111223539A (en) * 2019-12-30 2020-06-02 同济大学 Method for extracting relation of Chinese electronic medical record


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research progress on patent intelligence methods, tools and applications, and application trends of new technologies (专利情报方法、工具、应用研究进展及新技术应用趋势); 吕璐成, 罗文馨, 许景龙, 王莉莉, 马丽婧, 赵亚娟; Progress in Informatics (情报学进展), Issue 00; full text *
A survey of scholar profiling techniques on the open Internet (开放互联网中的学者画像技术综述); 袁莎, 唐杰, 顾晓韬; Journal of Computer Research and Development (计算机研究与发展), Issue 09; full text *


Similar Documents

Publication Publication Date Title
CN112749549B (en) Chinese entity relation extraction method based on incremental learning and multi-model fusion
CN111581961B (en) Automatic description method for image content constructed by Chinese visual vocabulary
CN109871535B (en) French named entity recognition method based on deep neural network
CN107748757B (en) Question-answering method based on knowledge graph
CN110209836B (en) Remote supervision relation extraction method and device
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN111950287B (en) Entity identification method based on text and related device
CN111797241B (en) Event Argument Extraction Method and Device Based on Reinforcement Learning
TW201432669A (en) Acoustic language model training method and apparatus
CN110888980A (en) Implicit discourse relation identification method based on knowledge-enhanced attention neural network
CN111400455A (en) Relation detection method of question-answering system based on knowledge graph
CN111881256B (en) Text entity relation extraction method and device and computer readable storage medium equipment
CN111242033A (en) Video feature learning method based on discriminant analysis of video and character pairs
CN111274829A (en) Sequence labeling method using cross-language information
CN113392265A (en) Multimedia processing method, device and equipment
CN114743143A (en) Video description generation method based on multi-concept knowledge mining and storage medium
CN113065349A (en) Named entity recognition method based on conditional random field
CN111340006A (en) Sign language identification method and system
Hassani et al. LVTIA: A new method for keyphrase extraction from scientific video lectures
CN117132923A (en) Video classification method, device, electronic equipment and storage medium
CN113096687B (en) Audio and video processing method and device, computer equipment and storage medium
CN113377844A (en) Dialogue type data fuzzy retrieval method and device facing large relational database
CN113535928A (en) Service discovery method and system of long-term and short-term memory network based on attention mechanism
CN112766368A (en) Data classification method, equipment and readable storage medium
CN117407532A (en) Method for enhancing data by using large model and collaborative training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant