CN116415005B

CN116415005B - Relationship extraction method for academic network construction of scholars

Info

Publication number: CN116415005B
Application number: CN202310684297.7A
Authority: CN
Inventors: 费洪晓; 谭杨盈; 杨柳; 龙军; 王子冬; 黄文体
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2023-06-12
Filing date: 2023-06-12
Publication date: 2023-08-18
Anticipated expiration: 2043-06-12
Also published as: CN116415005A

Abstract

The application discloses a relationship extraction method for academic network construction of scholars, which comprises the following steps: step S1: constructing a multi-teacher model comprising at least two teacher models, and calculating the weights of the teacher models; step S2: constructing a student model, calculating distillation target distribution according to flexible temperature, calculating knowledge distillation loss, and calculating total loss by combining the knowledge distillation loss and remote supervision loss; and S3, performing relation extraction, training the student model to obtain a relation extraction model, and performing relation extraction on the input data set by using the relation extraction model. The method has the advantages that the method sets flexible temperature in consideration of the difference of samples, uses the relative difference value of the information entropy of the samples in the hidden layer and the classifying layer, retains the effective information of the samples as far as possible, and reduces the softening uncertainty of the sample labels; in addition, the application introduces the thought of multiple teachers into relation extraction, and propagates richer and effective knowledge by constructing global relations of different teachers.

Description

Relationship extraction method for academic network construction of scholars

Technical Field

The application relates to the technical field of deep learning, in particular to a relationship extraction method for academic network construction of scholars.

Background

With iterative updates of academic research, the amount of academic text data is also growing explosively, and the way of manually labeling text data requires a significant time cost. Therefore, the automatic relation extraction of the multi-source academic text and the scientific construction of the academic network of the scholars are the problems to be solved urgently.

The concept of knowledge graph is proposed by google in 2012, and the effective data organization mode is widely applied to various knowledge fields such as finance, information, medical treatment and the like. Meanwhile, in order to integrate massive scholars information and academic resources, some academic knowledge maps are proposed, such as Aminer, aceMap, etc., and the academic knowledge maps integrate unstructured multisource information into a structured scholars academic network, so as to help to mine and integrate academic knowledge from massive academic texts, including papers, projects, subject information, etc. of scientific researchers; meanwhile, scientific and technological entities and semantic relations among data are mined from multi-source academic texts, and unified data structure organization forms among different data forms are constructed.

Constructing an academic network of the scholars requires organizing academic text information. The academic information of the scholars is widely existed in the real world, and is often obtained from homepages of the scholars, wikipedia knowledge base and various reports on the Internet, and most of data of the academic text is unstructured and cannot be subjected to data mining by using a unified template. The data has the characteristics of different data structures, various data types, data burst type iterative update and the like. The method comprises the steps of establishing a academic network of a scholars, wherein the problems of difficult relation extraction and sparse map information exist, a model for processing a natural language task at present is a single target task, available knowledge is limited, and the method has no good generalization capability in practical use. The problems of label noise and low data utilization rate mainly exist in the method.

In view of the foregoing, there is a great need for a relationship extraction method for academic network construction of scholars to solve the problems in the prior art.

Disclosure of Invention

The application aims to provide a relationship extraction method for academic network construction of scholars, which comprises the following specific technical scheme:

a relationship extraction method for academic network construction of scholars comprises the following steps:

step S1: constructing a multi-teacher model, wherein the multi-teacher model comprises at least two teacher models, the teacher models are trained through cross entropy loss, and the weight of the teacher models is calculated through the cross entropy loss and F1 score;

step S2: constructing a student model, calculating distillation target distribution of the student model according to the flexible temperature, calculating knowledge distillation loss by combining the distillation target distribution and the teacher model weight in the step S1, and calculating total loss of the student model by using the knowledge distillation loss and the remote supervision loss of the student model;

and S3, performing relation extraction, training the student model based on the total loss of the student model in the step S2 to obtain a relation extraction model, and performing relation extraction on the input data set by using the relation extraction model.

Preferably, in step S1, the cross entropy loss expression of the teacher model is as follows:

；

wherein ,cross entropy loss representing teacher model, +.>Representing the number of sentences>Representing the number of sentence categories>Representing remote supervision comments,/->Representing predictions of the teacher model.

Preferably, in step S1, the expression of the teacher model weight is as follows:

；

wherein ,is->Weights of bit teacher model, +.>Representing the number of teacher models, +.>Representing the super-parameters; />Indicate->Cross entropy loss of bit teacher model, +.>Indicate->F1 score for the bit teacher model.

Preferably, in step S2, the distillation target distribution includes sentence representation and prediction thereof by the teacher model, and the expression is as follows:

；

wherein ,representing a distillation target distribution of the student model; />Is a superparameter and->；/>Is->Sentence representations of the individual sentences; />Represent teacher model pair->Prediction of individual sentences.

Preferably, in step S2, a flexible temperature-based is usedThe prediction function computes the prediction of the sentence, expressed as follows:

；

wherein ,representation->A predictive function; />Representing a flexible temperature; />Representation->Vector;indicating teacher model predicts +.>Flexible temperature of individual sentences; />Indicating teacher model predicts +.>+.>Vector.

Preferably, in step S2, the flexible temperature is combined by the difference between the hidden layer information entropy and the classified layer information entropyThe function calculation and flexible temperature expression are as follows:

；

wherein , and />All represent superparameters, ">，/>；/>Indicate->Hidden layer information entropy of each sentence; />Indicate->Classification layer information entropy of each sentence.

Preferably, in step S2, the information entropy is calculated as follows:

；

wherein ,representing information entropy; />Is indicated at->Back->Predicting vector obtaining; />Representation->Vector.

Preferably, in step S2, the expression of the knowledge distillation loss is as follows:

；

wherein ,representing knowledge distillation loss; />Representing a distillation target distribution of the student model;representing student model vs. sentence->Is predicted by the computer; />Representing student model predictive sentences +.>Is->Vector; />Representing student model predictive sentences +.>Is provided.

Preferably, in step S2, the remote supervision loss calculation mode of the student model is the same as the cross entropy loss of the teacher model.

Preferably, in step S2, the total loss expression of the student model is as follows:

；

wherein ,representing the total loss of the student model; />Representing superparameters, wherein->；/>Representing the remote supervision loss of the student model.

The technical scheme of the application has the following beneficial effects:

(1) The application provides a multi-view flexible temperature calculation mode by utilizing the relative difference value of information of a hidden layer and classified layer. According to the application, the elastic temperature is set in consideration of different information amounts of each sample, so that the uncertainty of the traditional temperature on softening of the sample label is reduced while the effective information of the sample is reserved to the greatest extent.

(2) The application builds a multi-teacher model, and builds a global relation between teachers by adopting F1 score and cross entropy loss as indexes in the training process of the teacher model, so as to avoid misleading student models due to inaccurate knowledge; according to the application, a knowledge distillation model based on an attention mechanism is carried out by adopting a multi-teacher model and a student model, so that effective knowledge in the teacher model is extracted, the training speed of the model is improved, and the characteristic learning capacity of the model is improved.

(3) The application provides a machine learning architecture for parallel training of a relation extraction model and a relation extraction framework for remote supervision, solves the problem of efficient extraction of multi-source data on a pre-training language model, and reduces the computational power requirement of the model.

In addition to the objects, features and advantages described above, the present application has other objects, features and advantages. The present application will be described in further detail with reference to the drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:

FIG. 1 is a flow chart of the steps of a relationship extraction method in a preferred embodiment of the present application;

FIG. 2 is a schematic diagram of remote supervision relation extraction in accordance with a preferred embodiment of the present application.

Detailed Description

Embodiments of the present application will be described in detail below with reference to the accompanying drawings.

Examples:

referring to fig. 1, a relationship extraction method for academic network construction of a learner includes the following steps:

The concrete explanation is as follows:

in this embodiment, the loss value and the F1 score of each teacher model need to be considered when constructing the multi-teacher model, and these two values reflect the degree of approach of each teacher model to the predicted value and the actual value of the sentence, so that the multi-teacher model of this embodiment can make more accurate judgment in the process of transmitting knowledge to the student model on the noise problem in the academic text facing massive remote supervision and labeling.

The teacher model in the present embodiment is extracted based on the sentence-level relationship, specifically, in step S1, the cross entropy loss expression of the teacher model is as follows:

；

Further, in order to maximize the knowledge of different teacher models, the present embodiment assigns weights of different teacher models in the multi-viewpoint calculation by cross entropy loss and F1 score of the teacher models, specifically, in step S1, the expression of the teacher model weights is as follows:

；

wherein ,is->Weights of bit teacher model, +.>Representing the number of teacher models, +.>Representing the super-parameters; />Indicate->Position teachingCross entropy loss of the engineer model, < >>Indicate->F1 score for the bit teacher model.

As can be seen from the above expression, largerCorresponding->The smaller and the larger +.>Corresponding->The larger the cross entropy loss and F1 fraction pair are, the more the cross entropy loss and F1 fraction pair can be flexibly controlled according to training results>Is a contribution of (a).

It should be noted that, in this embodiment, the calculation of the F1 score belongs to a conventional technical means, which is not described herein.

Further, the present embodiment predicts sentences in the package using the teacher model, and aggregates the sentence representations in the package, specifically, in step S2, the distillation target distribution includes the sentence representations and the prediction thereof by the teacher model, and the expression of the distillation target distribution is as follows:

；

wherein ,representing a distillation target distribution of the student model; />Is super-parameterAnd->；/>Is->Sentence representations of the individual sentences; />Represent teacher model pair->Prediction of individual sentences.

Further, the use of fixed temperature settings for different instances may result in predictions that are too gentle while feature information is lost, and may result in difficult predictions that are not sufficiently softened. In this embodiment, it is preferable to use a flexible temperature-based in step S2The prediction function computes the prediction of the sentence, expressed as follows:

；

From the saidThe predictive function shows that the higher the flexible temperature, the flatter the predictive result.

Further, in step S2, the flexible temperature is combined by the difference between the hidden layer information entropy and the classified layer information entropyThe function calculation and flexible temperature expression are as follows:

；

Preferably, in step S2, the information entropy is calculated as follows:

；

In order that the prediction of the student model can approach the soft distribution of the teacher model, the present embodiment adopts Kullback-Leibler (KL) divergence as the knowledge distillation loss, and the present embodiment calculates the knowledge distillation loss in combination with the multi-teacher weight, specifically, in step S2, the expression of the knowledge distillation loss is as follows:

；

wherein ,representing knowledge distillation loss; />Representing a distillation target distribution of the student model;representing student model vs. sentence->Is predicted by the computer; />Representing student model predictive sentences +.>Is->Vector (S)>，/>Representation->Real set of dimensions, +.>Representing the number of sentence categories; />Representing student model predictive sentences +.>Flexible temperature->，/>Representing a set of real numbers.

In step S2, the remote supervision loss calculation mode of the student model is the same as the cross entropy loss of the teacher model, which is not described in detail in this embodiment.

Further, in step S2, the total loss expression of the student model is as follows:

；

As shown in fig. 2, the embodiment realizes knowledge distillation through a multi-teacher model, learns different characteristic information of academic samples through different teacher models, obtains more accurate knowledge from knowledge of a plurality of teacher sources, and calculates distillation loss in the training process according to prediction values imitating the teacher.

The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. The relationship extraction method for academic network construction of the scholars is characterized by comprising the following steps:

step S3, performing relation extraction, training the student model based on the total loss of the student model in the step S2 to obtain a relation extraction model, and performing relation extraction on the input data set by using the relation extraction model;

in step S2, the distillation target distribution includes sentence representation and prediction thereof by the teacher model, and the expression is as follows:

wherein ,representing a distillation target distribution of the student model; beta is super parameter and beta is E [0,1 ]]；/>Sentence representation for the kth sentence; />Representation ofPredicting a kth sentence by the teacher model;

the prediction of sentences is calculated using a flexible temperature-based Softmax prediction function, expressed as follows:

wherein ,representing a Softmax prediction function; />Representing a flexible temperature; z _k Representing a logic vector; />The flexible temperature of the kth sentence predicted by the teacher model is represented; />A logic vector representing a kth sentence predicted by the teacher model;

the flexible temperature is calculated by combining the difference between the information entropy of the hidden layer and the information entropy of the classified layer and a sigmoid function, and the expression of the flexible temperature is as follows:

wherein, eta and mu both represent super parameters, eta>0，μ∈(0,1)；Hidden layer information representing kth sentenceEntropy; e, e _k And (5) representing the information entropy of the classification layer of the kth sentence.

2. The relationship extraction method according to claim 1, wherein in step S1, a cross entropy loss expression of the teacher model is as follows:

wherein ,L^T Represents the cross entropy loss of the teacher model, N represents the number of sentences, O represents the number of sentence categories,representing remote supervision comments,/->Representing predictions of the teacher model.

3. The relationship extraction method according to claim 2, wherein in step S1, the expression of the teacher model weight is as follows:

wherein ,w_q The weight of the Q-th teacher model is Q, the number of the teacher models is Q, and alpha represents the super parameter;represents cross entropy loss of the qth teacher model, f _q The F1 score for the qth teacher model is represented.

4. A relation extracting method according to claim 3, wherein in step S2, the information entropy is calculated as follows:

e＝-y·log(y)；

wherein e represents information entropy; y represents the prediction of the logit vector after softmax; z represents the logic vector.

5. The relationship extraction method according to claim 4, wherein in step S2, the expression of the knowledge distillation loss is as follows:

wherein ,L_KD Representing knowledge distillation loss;representing a distillation target distribution of the student model; /> Representing a student model's prediction of sentence k; />A logic vector representing a student model predicted sentence k; />Representing the flexible temperature of the student model predictive sentence k.

6. The method according to claim 5, wherein in step S2, the remote supervision loss calculation method of the student model is the same as the cross entropy loss of the teacher model.

7. The relationship extraction method according to claim 6, wherein in step S2, the total loss expression of the student model is as follows:

L＝λL ^S +(1-λ)L _KD ；

wherein L represents the total loss of the student model; lambda represents a superparameter where lambda e 0,1]；L ^S Representing the remote supervision loss of the student model.