CN115392256A

CN115392256A - Drug adverse event relation extraction method based on semantic segmentation

Info

Publication number: CN115392256A
Application number: CN202211040440.0A
Authority: CN
Inventors: 崔少国; 陈俊桦
Original assignee: Chongqing Normal University
Current assignee: Chongqing Normal University
Priority date: 2022-08-29
Filing date: 2022-08-29
Publication date: 2022-11-25

Abstract

The invention provides a method for extracting adverse drug event relations based on semantic segmentation. The method comprises: establishing a drug adverse event relation extraction model that has a local context information feature extractor, a semantic feature fusion device, a classifier and a sample imbalance processor; preprocessing the data; training the model and optimizing its parameters; and extracting the adverse drug event relations. By inserting special symbols before and after each drug mention and appending the adverse event mention markers after the text as suspension markers, the method identifies mention boundaries more reliably. A U-shaped semantic segmentation network is introduced to fuse local context information and capture the global interdependencies among adverse drug events, so that key information is located more accurately. In addition, a balanced softmax method handles the imbalanced relation distribution, which avoids the influence of irrelevant triples on the model, so that adverse drug event relations in medical text are extracted more accurately.

Description

Drug adverse event relation extraction method based on semantic segmentation

Technical Field

The invention relates to the technical field of medical text data mining, and in particular to a method for extracting adverse drug event relations based on semantic segmentation.

Background

An adverse drug event (ADE) is an adverse clinical event that occurs during drug therapy and is not necessarily caused by the drug. Adverse drug events have two main causes: problems with drug quality and medication errors. They seriously endanger patients' health and impose heavy economic losses on the medical system and on society. According to statistics, the emergency rate of medical adverse events accounts for 28% of the total rate of medical adverse events, and because of the importance and harmfulness of such events, researchers in fields such as life sciences, biology and general medicine have taken an interest in them. In addition, although the ultimate goal of drug discovery is to develop chemicals that treat specific diseases, identifying the correspondence between a chemical and the adverse drug reactions it causes is critical to improving chemical safety and toxicity studies and to facilitating methods for screening new drug compounds.

After long-term exploration by researchers, text-mining-based research on adverse drug events has gradually developed from early template-based and rule-based methods to data-driven methods based on traditional machine learning, with major breakthroughs in both theory and practice. With the rise of deep learning, neural-network-based frameworks have also opened a new direction for text mining. Because a neural network model trained on large-scale data can automatically learn the internal features of the raw data, such models have achieved breakthroughs in speech and image recognition and show great potential in natural language processing. Text mining based on deep learning will therefore become a trend in future research, and applying it to the study of adverse drug events is of great value in advancing related biomedical research.

The inventors of the present application have found through research that, because of the specific definition of adverse drug events, identifying all drug and adverse event mentions in natural language text and extracting the relations between drugs and their corresponding adverse events faces the following problems: (1) As drug development in the biomedical field accelerates, the limits of test conditions in pre-market clinical trials make it difficult to discover the adverse events of many drugs and to list them in adverse event reports. In addition, because some adverse drug events occur only some time after the drug is taken, or only in specific populations, many potential adverse events are not covered by existing dictionaries or databases, and such mentions are difficult to find with dictionary and rule methods alone. (2) The same condition mention may be an adverse drug event in one context and an indication in another, so identifying adverse drug event mentions depends heavily on understanding contextual semantic relations. (3) There is no uniform naming convention for adverse drug events, and the same disease may be expressed in several ways; mention names are therefore sparse, hard to learn fully from a limited labeled corpus, and hard to identify. (4) In some natural language texts, adverse drug events are expressed in non-medical terms, which are often joined with neighboring common words or adjectives to form a single mention, so the boundaries of adverse drug event mentions are difficult to determine and recognition becomes inaccurate.

Disclosure of Invention

To address the technical problems in extracting drugs and their corresponding adverse event relations, the invention provides a method for extracting adverse drug event relations based on semantic segmentation. The method inserts special symbols before and after each drug mention and appends the adverse event mention markers after the text as suspension markers, so that the boundaries of drug and adverse event mentions are identified more reliably. A U-shaped semantic segmentation network is introduced to fuse local context information and capture the global interdependencies among adverse drug events, so that key information is located more accurately. In addition, a balanced softmax method handles the imbalanced relation distribution, which avoids the influence of irrelevant triples on the model, so that adverse drug event relations in medical text are extracted more accurately.

To solve the above technical problems, the invention adopts the following technical solution:

A method for extracting adverse drug event relations based on semantic segmentation, comprising the following steps:

S1. Establishing a drug adverse event relation extraction model:

The drug adverse event relation extraction model is used for extracting the drugs in a medical text and the adverse events caused by those drugs. The model structure comprises a local context information feature extractor, a semantic feature fusion device, a classifier and a sample imbalance processor, wherein:

The local context information feature extractor is used for extracting the local context features of the different mentions from the input medical text, specifically as follows. Given a drug adverse event document containing N text tokens, fixed markers <s> and </s> are first inserted at the beginning and end of each drug mention to mark its position. The corresponding candidate adverse event mentions are then appended after the text as suspension markers <o> and </o>, where <o> and </o> share the position encoding of the corresponding adverse event mention. The combined sequence of text tokens and inserted suspension markers is fed to the BERT pre-training model to obtain the local context representation e^s of the drug mention markers and the local context representation e^o of the adverse event mention markers, and e^s and e^o are concatenated as the embedded representation H of the corresponding drug mention and adverse event mention pair, where M denotes the maximum number of mention pairs formed by drug mentions and adverse event mentions in a sample.

Finally, an attention representation A is obtained from the BERT pre-training model, where A is the average of the attention heads in the last Encoder layer. The attention matrix A and an affine transformation are then used to obtain the mention pair relation matrix F(s, o) of the drug and the adverse event, wherein ⊙ denotes the Hadamard product, W_1 is a learnable parameter matrix, H is the embedded representation of the drug mention and adverse event mention pair, A_s denotes the attention of drug mention e^s to all tokens of the document, obtained by averaging the attention heads of the drug mention in the last Encoder layer, A_o denotes the attention of adverse event mention e^o to all tokens of the document, obtained by averaging the attention heads of the adverse event mention in the last Encoder layer, and F(s, o) denotes the relation matrix of the drug and adverse event mention pair (e^s, e^o);
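The formula for F(s, o) is not reproduced in this text, so the following is only a minimal sketch of one plausible reading, under stated assumptions: the two attention rows are combined with a Hadamard product, renormalized, used to pool token embeddings into a context vector, and passed through an affine map W1. The pooling over token embeddings, the function names and the toy numbers are illustrative assumptions, not the patented formula.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]


def normalize(v):
    """Rescale a non-negative vector so that its entries sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def pair_feature(A_s, A_o, token_embeddings, W1):
    """Assumed form: F(s, o) = W1 * sum_k a_k h_k, with a = normalize(A_s * A_o)."""
    a = normalize(hadamard(A_s, A_o))
    dim = len(token_embeddings[0])
    # Attention-weighted sum of the token embeddings (the pooled context vector).
    context = [sum(a[k] * token_embeddings[k][d] for k in range(len(a)))
               for d in range(dim)]
    # Affine map without bias: one output value per row of W1.
    return [sum(w * c for w, c in zip(row, context)) for row in W1]


# Toy example: 4 tokens, 2-dimensional embeddings, 3 output channels.
A_s = [0.5, 0.25, 0.25, 0.0]   # attention of the drug mention over the tokens
A_o = [0.0, 0.5, 0.25, 0.25]   # attention of the adverse event mention
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feature = pair_feature(A_s, A_o, tokens, W1)
print(feature)
```

Only tokens that both mentions attend to contribute to the pooled vector, which is the intuition behind multiplying the two attention rows. The sketch does not guard against the case where the product is zero everywhere.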

The semantic feature fusion device is used for fusing the local context information with the global dependencies between mentions through an encoding module and a U-shaped semantic segmentation network, specifically as follows. The mention pair relation matrix F ∈ R^{M×M×D} containing local context information is treated as a D-channel image and passed through the encoding module, and rich global features are then obtained with the U-shaped semantic segmentation network. The U-shaped semantic segmentation network comprises, arranged in sequence, a global feature extraction block, two upsampling blocks with skip connections, and a feature output layer, which yields the local context and global dependency information matrix:

Y = U(W_2 F)

wherein Y ∈ R^{M×M×D'} denotes the local context and global dependency information matrix, U ∈ R^{M×M×D'} denotes the U-shaped semantic segmentation network, W_2 is a learnable weight matrix used to reduce the dimension of F, D' is much smaller than D, and W_2 F denotes the encoding module;

The classifier is used for predicting adverse drug event relations from the local context and global dependency information matrix and the smooth embedded representations, specifically as follows. From the local context embeddings m of a mention at its different positions in the document, a smooth embedded representation E_i of that mention is first obtained using a smooth version of max pooling, wherein E_i denotes the smooth embedded representation of mention e_i, and the pooling runs over the total number of occurrences of the drug or adverse event mention e_i in the document;
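The pooling formula itself is not reproduced in this text. A common smooth version of max pooling is log-sum-exp over the occurrences of a mention, and the sketch below assumes that form, E_i = log Σ_j exp(m_j), applied per embedding dimension. The function name and the toy vectors are illustrative.

```python
import math


def smooth_max_pool(mention_embeddings):
    """Log-sum-exp over all occurrences of one mention, per dimension."""
    dim = len(mention_embeddings[0])
    pooled = []
    for d in range(dim):
        column = [m[d] for m in mention_embeddings]
        peak = max(column)  # subtracted inside exp() for numerical stability
        pooled.append(peak + math.log(sum(math.exp(x - peak) for x in column)))
    return pooled


# Three occurrences of the same mention, each with a 2-dimensional embedding.
occurrences = [[0.0, 2.0], [0.0, -1.0], [0.0, 5.0]]
E_i = smooth_max_pool(occurrences)
print(E_i)
```

Log-sum-exp is always at least the plain maximum and exceeds it by at most the logarithm of the number of occurrences, so a dominant occurrence drives the result while the others still contribute.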

After the smooth embedded representations E_s and E_o of the drug and the adverse event and the local context and global dependency information matrix Y have been obtained, the classifier first uses a feedforward neural network to map E_s, E_o and Y to hidden representations z, and then obtains the relation probability through a bilinear function, as follows:

z_s = tanh(W_s E_s + Y_{s,o})

z_o = tanh(W_o E_o + Y_{s,o})

P(r | E_s, E_o) = σ(z_s W_r z_o + b_r)

wherein z_s is the hidden representation of the drug, z_o is the hidden representation of the adverse event, P is the relation probability, Y_{s,o} is the local context and global dependency information representation of the drug and adverse event mention pair (e^s, e^o) in matrix Y, tanh is a nonlinear activation function, σ converts the output of the bilinear function into a probability value, and W_s, W_o, W_r and b_r are learnable parameters;
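The three classifier equations can be traced with a small pure-Python sketch for a single relation type r. It assumes σ is the logistic sigmoid and that W_s E_s has the same dimension as Y_{s,o}; the helper names and the toy values are illustrative only.

```python
import math


def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relation_probability(E_s, E_o, Y_so, W_s, W_o, W_r, b_r):
    """P(r | E_s, E_o) = sigmoid(z_s W_r z_o + b_r) for one relation type."""
    z_s = [math.tanh(a + y) for a, y in zip(matvec(W_s, E_s), Y_so)]
    z_o = [math.tanh(a + y) for a, y in zip(matvec(W_o, E_o), Y_so)]
    # Bilinear form: sum over i, j of z_s[i] * W_r[i][j] * z_o[j].
    score = sum(z_s[i] * W_r[i][j] * z_o[j]
                for i in range(len(z_s)) for j in range(len(z_o))) + b_r
    return sigmoid(score)


identity = [[1.0, 0.0], [0.0, 1.0]]
p = relation_probability([0.5, -0.2], [0.1, 0.4], [0.3, 0.0],
                         identity, identity, identity, b_r=0.0)
print(p)
```

With several relation types, W_r and b_r would be indexed by r and the same computation repeated per type.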

The sample imbalance processor is used for handling the class imbalance in the sample set by training with a balanced softmax method and introducing an additional class 0, such that the scores of the target classes are all expected to be greater than a threshold t_0 and the scores of the non-target classes are all less than t_0, wherein L denotes the target loss function, log denotes the natural logarithm (base e), e is the mathematical constant, t_i denotes the probability of the i-th positive label, t_j denotes the probability of the j-th negative label, Ω_pos denotes the relations between a drug and its corresponding adverse event mentions, i.e. the positive labels, and Ω_neg denotes the relations between a drug and non-corresponding adverse event mentions, i.e. the negative labels;
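The loss formula is not reproduced in this text. A widely used threshold-class form that matches the definitions above is L = log(e^{-t_0} + Σ_{i∈Ω_pos} e^{-t_i}) + log(e^{t_0} + Σ_{j∈Ω_neg} e^{t_j}); the sketch below assumes that form and treats t_i and t_j as raw scores. The function name and the sample scores are hypothetical.

```python
import math


def balanced_softmax_loss(pos_scores, neg_scores, t0=0.0):
    """Assumed loss: pushes positive scores above t0 and negative scores below t0."""
    pos_term = math.log(math.exp(-t0) + sum(math.exp(-t) for t in pos_scores))
    neg_term = math.log(math.exp(t0) + sum(math.exp(t) for t in neg_scores))
    return pos_term + neg_term


well_separated = balanced_softmax_loss(pos_scores=[5.0], neg_scores=[-5.0, -4.0])
badly_separated = balanced_softmax_loss(pos_scores=[-5.0], neg_scores=[5.0, 4.0])
print(well_separated, badly_separated)
```

Because the negative labels enter through a single log-sum term rather than one term per label, a large number of negative pairs does not swamp the few positive ones, which is the point of using this loss on an imbalanced relation distribution.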

S2. Preprocessing the data, with mention unification performed as follows:

First, stop words are removed from the mentions in the medical text; regular-expression matching is then performed, and mentions whose matching degree is higher than 90% are classified as the same mention;
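The text does not say how the matching degree is computed. As a rough sketch, the example below removes stop words with a small hypothetical list, tokenizes with a regular expression, and uses the similarity ratio from Python's `difflib.SequenceMatcher` as a stand-in for the matching degree; the stop-word list, the ratio and the grouping strategy are all assumptions.

```python
import re
from difflib import SequenceMatcher

STOP_WORDS = {"the", "a", "an", "of", "and"}  # hypothetical stop-word list


def normalize(mention):
    """Lowercase, keep alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", mention.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def same_mention(a, b, threshold=0.9):
    """True when the normalized strings match by more than the threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() > threshold


def unify(mentions, threshold=0.9):
    """Greedily group mentions; the first member of each group is its key."""
    groups = []
    for mention in mentions:
        for group in groups:
            if same_mention(group[0], mention, threshold):
                group.append(mention)
                break
        else:
            groups.append([mention])
    return groups


print(unify(["Aspirin", "the aspirin", "ibuprofen"]))
```

Abbreviations and initials, which the description also names as a source of variation, would need an extra lookup step that this sketch leaves out.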

S3. Model training and parameter optimization: the extraction model is trained with the processed data, and an objective optimization function is designed to optimize the network parameters and generate the optimal extraction model, specifically comprising the following steps:

S31. Dividing the data set into a training set, a validation set and a test set in the ratio 7:2:1;

S32. Adopting a balanced softmax categorical cross-entropy loss function as the optimization objective, wherein the objective function uses the same formula as the target loss function L calculated in the sample imbalance processor of step S1;

S33. Optimizing the objective function with the stochastic gradient descent algorithm, and updating the network model parameters by error back-propagation;
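The 7:2:1 split of step S31 can be sketched as below, using the 505 documents mentioned in the detailed description as the example size. The shuffling, the seed and the rounding rule (floor for training and validation, remainder to test) are assumptions; the text only gives the ratio.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle and split into training, validation and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(505))
print(len(train), len(val), len(test))  # prints "353 101 51"
```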

S4. Extracting adverse drug event relations:

S41. Preprocessing the medical text data to be extracted to obtain standardized sample data, and defining the category of relation pairs between a drug and non-corresponding adverse event mentions as 0;

S42. Forming a training sample from a medical sample and all the drug mentions and adverse event mentions it contains: the two fixed markers <s> and </s> are inserted directly before and after every drug mention, and the adverse event mentions are appended after the text as suspension markers represented by <o> and </o>;

S43. Feeding the sample into the BERT pre-training model and, for each pair of drug and adverse event mention markers, concatenating the local context representation of the drug mention markers with the local context representation of the adverse event mention markers as the embedded representation of the corresponding drug mention and adverse event mention pair;

S44. After the embedded representations of all drug mention and adverse event mention pairs of the sample, containing local context information, have been obtained, applying an affine transformation to them together with the attention layer of the BERT pre-training model to obtain the mention pair relation matrix of the drugs and adverse events;

S45. Passing the mention pair relation matrix containing local context information through the encoding module, and obtaining rich global features with the U-shaped semantic segmentation network, so as to output the local context and global dependency information matrix for all pairs;

S46. Obtaining the smooth embedded representations of the drug and the adverse event, mapping them together with the local context and global dependency information matrix to hidden representations with a feedforward neural network, and then obtaining the relation probability, i.e. the relation score, through a bilinear function;

S47. Calculating the scores of the positive sample relations and the negative sample relations by introducing the softmax method, such that the scores of the positive sample relations are greater than 0.
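The final decision of step S47 can be pictured as a threshold test on the relation scores: a pair is kept as an adverse drug event relation only when its score exceeds the threshold 0, and otherwise falls into category 0. This is a minimal sketch; the dictionary layout and the example drug and event names are hypothetical.

```python
def extract_relations(pair_scores, threshold=0.0):
    """Keep (drug, adverse_event) pairs whose score exceeds the threshold."""
    return [pair for pair, score in pair_scores.items() if score > threshold]


scores = {
    ("aspirin", "nausea"): 2.3,
    ("aspirin", "headache"): -1.7,   # negative pair, category 0
    ("warfarin", "bleeding"): 0.4,
}
relations = extract_relations(scores)
print(relations)
```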

Further, in the U-shaped semantic segmentation network used by the semantic feature fusion device in step S1, the global feature extraction block comprises three convolution modules and two max pooling layers. The first max pooling layer follows the first convolution module and the second max pooling layer follows the second convolution module; each convolution module comprises two convolution layers, and the number of channels is doubled from one convolution module to the next. Each of the two upsampling blocks comprises a deconvolution layer followed by two convolution layers; the first upsampling block follows the third convolution module, the second upsampling block follows the first upsampling block, and the number of channels is halved in each upsampling block. The output of the deconvolution layer in the second upsampling block is skip-connected with the output of the second convolution layer in the first convolution module, and the output of the deconvolution layer in the first upsampling block is skip-connected with the output of the second convolution layer in the second convolution module.

Further, the two convolution layers in each convolution module have a 3 × 3 kernel and a stride of 1; the two max pooling layers have a 2 × 2 kernel and a stride of 2; in the two upsampling blocks, the deconvolution layer has a 2 × 2 kernel and a stride of 2 and the two convolution layers have a 3 × 3 kernel and a stride of 1; and the feature output layer has a 1 × 1 kernel and a stride of 1.

Further, in the sample imbalance processor of step S1, the threshold t_0 is set to 0, and the formula for calculating the target loss function L simplifies as follows:
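The simplified formula is not reproduced in this text. Assuming the general loss has the threshold-class form shown in the first line below (an assumption consistent with the definitions of t_i, t_j, Ω_pos and Ω_neg given for the sample imbalance processor), substituting t_0 = 0 turns both threshold terms into 1:

```latex
\begin{align}
L &= \log\Bigl(e^{-t_0} + \sum_{i \in \Omega_{pos}} e^{-t_i}\Bigr)
   + \log\Bigl(e^{t_0} + \sum_{j \in \Omega_{neg}} e^{t_j}\Bigr) \\
  &\overset{t_0 = 0}{=} \log\Bigl(1 + \sum_{i \in \Omega_{pos}} e^{-t_i}\Bigr)
   + \log\Bigl(1 + \sum_{j \in \Omega_{neg}} e^{t_j}\Bigr)
\end{align}
```

In this form the loss approaches 0 when every positive score is far above 0 and every negative score is far below 0.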

Compared with the prior art, the method for extracting adverse drug event relations based on semantic segmentation has the following beneficial effects:

1. When obtaining the embedded representations, the invention uses suspension marker representations, which distinguish the embedded representations of different mentions more effectively and can significantly improve prediction accuracy.

2. The U-shaped semantic segmentation network captures the global interdependencies among triples (drug, adverse event, drug adverse event relation), so the extraction model more effectively addresses the problem that key information cannot be found when the mentions in a pair are far apart.

3. An encoding module is introduced to capture the local context information of mentions, and the global interdependencies are fused in, so that the extraction model understands global semantics more fully and distinguishes specific adverse drug events better.

4. A balanced softmax method handles the imbalanced relation distribution and avoids "undersampling" by the extraction model, thereby improving the accuracy of relation classification.

Drawings

Fig. 1 is a schematic flow chart of the drug adverse event relation extraction system provided by the invention.

Fig. 2 is a schematic diagram of the span of the suspension markers provided by the invention.

Fig. 3 is a schematic diagram of the mention pair relation matrix provided by the invention.

Fig. 4 is a schematic diagram of the drug adverse event relation extraction network provided by the invention.

Fig. 5 is a schematic diagram of the structure of the U-shaped semantic segmentation network provided by the invention.

Detailed Description

To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further explained below with reference to the drawings.

Referring to Figs. 1 to 5, the invention provides a method for extracting adverse drug event relations based on semantic segmentation, comprising the following steps:

S1. Establishing a drug adverse event relation extraction model:

The local context information feature extractor is used for extracting the local context features of the different mentions from the input medical text, specifically as follows. Thanks to the parallelism of the suspension markers, a series of related mentions can be flexibly packed into one training instance. A document is given that contains N text tokens (used to distinguish the ordinary characters of the text, which carry no definition, from the characters that belong to a drug or adverse event mention).

First, as shown in Fig. 2, fixed markers <s> and </s> are inserted at the beginning and end of each drug mention to mark its position. The corresponding candidate adverse event mentions are then appended after the text as suspension markers <o> and </o>, that is, the <o> and </o> markers of each adverse event mention are placed after the text. The markers <o> and </o> share the position encoding of the corresponding adverse event mention, that is, the same position encoding is used to associate <o> and </o> with that mention. The combined sequence of text tokens and inserted suspension markers is then fed to the BERT pre-training model to obtain the local context representation e^s of the drug mention markers and the local context representation e^o of the adverse event mention markers, and e^s and e^o are concatenated as the embedded representation of the corresponding drug mention and adverse event mention pair,

wherein ⊙ denotes the Hadamard product, W_1 is a learnable parameter matrix, H is the embedded representation of the drug mention and adverse event mention pair, A_s denotes the attention of drug mention e^s to all tokens of the document, obtained by averaging the attention heads of the drug mention in the last Encoder layer, A_o denotes the attention of adverse event mention e^o to all tokens of the document, obtained by averaging the attention heads of the adverse event mention in the last Encoder layer, and F(s, o) denotes the relation matrix of the drug and adverse event mention pair (e^s, e^o), as shown in Fig. 2;
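The marker scheme described above can be sketched on a tokenized sentence: <s> and </s> are inserted around the drug mention, and one <o> </o> pair per candidate adverse event is appended after the text, reusing the position ids of that mention's first and last tokens. This is a minimal sketch that assumes token-level spans with inclusive ends; the function name and the example sentence are illustrative, and the attention-mask handling that such markers usually need is not shown.

```python
def build_marked_input(tokens, drug_span, event_spans):
    """Return (sequence, position_ids) with fixed and suspension markers."""
    d_start, d_end = drug_span
    seq, pos, new_index = [], [], {}
    for i, tok in enumerate(tokens):
        if i == d_start:
            seq.append("<s>")
            pos.append(len(pos))
        new_index[i] = len(seq)      # index of this token after insertion
        seq.append(tok)
        pos.append(len(pos))
        if i == d_end:
            seq.append("</s>")
            pos.append(len(pos))
    for start, end in event_spans:
        # Suspension markers share the position ids of the mention boundaries.
        seq.append("<o>")
        pos.append(new_index[start])
        seq.append("</o>")
        pos.append(new_index[end])
    return seq, pos


tokens = ["Aspirin", "caused", "severe", "nausea", "."]
seq, pos = build_marked_input(tokens, drug_span=(0, 0), event_spans=[(2, 3)])
print(seq)
print(pos)
```

Because the suspension markers sit after the text and only borrow position ids, several candidate adverse events can be packed into one input without changing the original token sequence.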

The semantic feature fusion device is used for fusing the local context information with the global dependencies between mentions through the encoding module and the U-shaped semantic segmentation network (see Table 1 below), specifically as follows. The mention pair relation matrix F ∈ R^{M×M×D} containing local context information is treated as a D-channel image, that is, document-level relation prediction is formulated as predicting a pixel-level mask over F, and rich global features are obtained with the U-shaped semantic segmentation network. The U-shaped semantic segmentation network comprises, arranged in sequence, a global feature extraction block (sequence numbers 1-8), two upsampling blocks with skip connections (sequence numbers 9-14) and a feature output layer (sequence number 15).

As a specific embodiment, the global feature extraction block comprises three convolution modules and two max pooling layers. The first max pooling layer follows the first convolution module and the second max pooling layer follows the second convolution module. Each convolution module comprises two convolution layers with a 3 × 3 kernel and a stride of 1, and both max pooling layers have a 2 × 2 kernel and a stride of 2. The number of channels doubles through the feature extraction block: the first convolution module and the first max pooling layer have 64 channels, the second convolution module and the second max pooling layer have 128 channels, and the third convolution module has 256 channels. A segmented region in the mention pair relation matrix corresponds to the co-occurrence of relations between mention pairs, and the U-shaped semantic segmentation network promotes the exchange of information between mention pairs within the receptive field, similar to implicit reasoning. Specifically, the feature extraction block enlarges the receptive field of the current mention pair embedding in F, thereby providing rich global information for representation learning.

Each of the two upsampling blocks comprises a deconvolution layer followed by two convolution layers. The first upsampling block follows the third convolution module and the second upsampling block follows the first upsampling block. In both upsampling blocks the deconvolution layer has a 2 × 2 kernel and a stride of 2, and the two convolution layers have a 3 × 3 kernel and a stride of 1. The number of channels is halved in each upsampling block, which distributes the aggregated information to each pixel: the first upsampling block has 128 channels and the second has 64. The output of the deconvolution layer in the second upsampling block (sequence number 12) is skip-connected with the output of the second convolution layer in the first convolution module (sequence number 2), and the output of the deconvolution layer in the first upsampling block (sequence number 9) is skip-connected with the output of the second convolution layer in the second convolution module (sequence number 5). Taking the last upsampling block as an example, its features come not only from the output of the first convolution module (same-scale features) but also from the output of the first upsampling block (large-scale features), so multi-scale features are effectively fused.

The feature output layer has a 1 × 1 kernel, a stride of 1 and 1 channel. The specific parameters of the U-shaped semantic segmentation network model are shown in Table 1 below.

TABLE 1 Hyperparameter List of Global dependent network model architecture
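The body of Table 1 is not reproduced in this text. The sketch below rebuilds the layer list that the surrounding description implies (sequence numbers 1-15, kernel sizes, strides and channel counts) and traces the side length of the M × M feature map. The helper names, the example value M = 40 and the assumption that the 3 × 3 convolutions preserve spatial size (which the skip connections require) are mine, not the patent's.

```python
def build_unet_layers(base_channels=64):
    """Layer list implied by the description: 8 encoder, 6 decoder, 1 output."""
    layers = []

    def add(kind, kernel, stride, channels):
        layers.append({"no": len(layers) + 1, "type": kind,
                       "kernel": kernel, "stride": stride, "channels": channels})

    c = base_channels
    for module in range(3):                 # three convolution modules
        add("conv", 3, 1, c)
        add("conv", 3, 1, c)
        if module < 2:                      # max pooling after modules 1 and 2
            add("maxpool", 2, 2, c)
            c *= 2
    for _ in range(2):                      # two upsampling blocks
        c //= 2
        add("deconv", 2, 2, c)
        add("conv", 3, 1, c)
        add("conv", 3, 1, c)
    add("conv", 1, 1, 1)                    # feature output layer
    return layers


def trace_side(layers, side):
    """Side length after each layer, assuming size-preserving convolutions."""
    sizes = []
    for layer in layers:
        if layer["type"] == "maxpool":
            side //= 2
        elif layer["type"] == "deconv":
            side *= 2
        sizes.append(side)
    return sizes


layers = build_unet_layers()
for layer, side in zip(layers, trace_side(layers, 40)):
    print(layer["no"], layer["type"], f'{layer["kernel"]}x{layer["kernel"]}',
          layer["stride"], layer["channels"], f"{side}x{side}")
```

The listing puts the deconvolution layers at sequence numbers 9 and 12 and the second convolution layers of the first two modules at 2 and 5, which agrees with the skip connections named in the description.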

The encoding module and the U-shaped semantic segmentation network are then combined to obtain the local context and global dependency information matrix Y:

Y = U(W_2 F)

wherein Y ∈ R^{M×M×D'} denotes the local context and global dependency information matrix, U ∈ R^{M×M×D'} denotes the U-shaped semantic segmentation network, W_2 is a learnable weight matrix used to reduce the dimension of F, D' is much smaller than D, and W_2 F denotes the encoding module;

The classifier is used for predicting adverse drug event relations from the local context and global dependency information matrix and the smooth embedded representations, specifically as follows. The same mention may occur several times in the document, so a smooth embedded representation is first obtained from the local context embeddings m of the mention at its different positions in the document, wherein E_i denotes the smooth embedded representation of mention e_i.

After the smooth embedded representations E_s and E_o of the drug and the adverse event and the local context and global dependency information matrix Y have been obtained, the classifier uses a feedforward neural network to map E_s, E_o and Y to hidden representations z, and then obtains the relation probability through a bilinear function, as follows:

z_s = tanh(W_s E_s + Y_{s,o})

z_o = tanh(W_o E_o + Y_{s,o})

P(r | E_s, E_o) = σ(z_s W_r z_o + b_r)

wherein z_s is the hidden representation of the drug, z_o is the hidden representation of the adverse event, P is the relation probability, Y_{s,o} is the local context and global dependency information representation of the drug and adverse event mention pair (e^s, e^o) in matrix Y, tanh is a nonlinear activation function used mainly for nonlinear transformation, σ converts the output of the bilinear function into the probability value of the prediction result, and W_s, W_o, W_r and b_r are learnable parameters;

The sample imbalance processor is used for handling the class imbalance in the sample set by training with a balanced softmax method and introducing an additional class 0, such that the scores of the target classes are all expected to be greater than a threshold t_0 and the scores of the non-target classes are all less than t_0.

As a specific embodiment, for simplicity the threshold t_0 is set to 0, and the formula for calculating the target loss function L simplifies as follows:

S2. Data preprocessing: in medical texts the same mention may be written in different ways, for example only by its initials or by a letter abbreviation, so the names of entity mentions need to be unified. The unification is performed as follows:

First, stop words are removed from the mentions in the medical text; regular-expression matching is then performed, and mentions whose matching degree is higher than 90% are classified as the same mention.

S3. Model training and parameter optimization: the model is trained with the processed data, and an objective optimization function is designed to optimize the network parameters and generate the optimal extraction model, specifically comprising the following steps:

S31. Dividing the data set into a training set, a validation set and a test set in the ratio 7:2:1; as a specific embodiment, the inventors of the present application obtained 505 pieces of medical document data in total;

S32. Adopting a categorical cross-entropy loss function as the optimization objective, wherein the objective function uses the same formula as the target loss function L calculated in the sample imbalance processor of step S1, namely:

S33. Optimizing the objective function with the conventional stochastic gradient descent algorithm, and updating the network model parameters by error back-propagation.

S4, extracting adverse drug event relations:

s41, preprocessing medical text data to be extracted to obtain standardized sample data (see data preprocessing step), and defining the medicine and the non-corresponding adverse event mention relation pair category as 0;

s42, forming a training sample for a medical sample and all drug mentions and adverse event mentions contained in the medical sample, wherein the drug mentions adopt fixed marks, namely two fixed marks of < S > and </S > are directly inserted before and after all drug mentions, and the adverse event mentions are spliced behind a text in a suspension mark mode represented by < o > and </o >;

s43, feeding the sample into a BERT pre-training model, and for each pair of drug and adverse event mention mark pairs, splicing the local context representation of the drug mention mark and the local context representation of the adverse event mention mark together, namely splicing the characterization of the drug mention mark and the characterization of the adverse event mention mark together as corresponding embedded representations or characterizations of the drug mention mark and the adverse event mention pair;

S44, after obtaining the embedded representations, containing local context information, of all drug mention and adverse event mention pairs in the sample, applying an affine transformation to these embedded representations together with the attention from the attention layer of the BERT pre-training model to obtain the mention pair relation matrix of the drug and the adverse event;
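The marking scheme of step S42 can be sketched as follows. This minimal Python example works on a pre-tokenized text and inclusive token spans; the function name build_marked_sequence, the span format, and the choice to give <o> and </o> the position indices of the first and last token of the adverse event mention are assumptions made for illustration. A real implementation would presumably pass such position indices to the position embedding of the BERT model.

```python
def build_marked_sequence(tokens, drug_spans, event_spans):
    """Insert <s>/</s> around drug mentions and append <o>/</o> suspension marks.

    tokens      -- list of token strings
    drug_spans  -- list of (start, end) token indices, end inclusive
    event_spans -- list of (start, end) token indices, end inclusive
    Returns the marked token list and one position index per token.
    """
    drug_starts = {start for start, _ in drug_spans}
    drug_ends = {end for _, end in drug_spans}

    marked = []
    new_index = {}  # original token index -> index in the marked text
    for i, token in enumerate(tokens):
        if i in drug_starts:
            marked.append("<s>")
        new_index[i] = len(marked)
        marked.append(token)
        if i in drug_ends:
            marked.append("</s>")

    # Tokens of the marked text keep their natural positions.
    position_ids = list(range(len(marked)))

    # Suspension marks go behind the text but reuse the positions of the
    # adverse event mention they stand for (assumption, see lead-in).
    for start, end in event_spans:
        marked += ["<o>", "</o>"]
        position_ids += [new_index[start], new_index[end]]

    return marked, position_ids


marked, position_ids = build_marked_sequence(
    ["aspirin", "caused", "gastric", "bleeding"],
    drug_spans=[(0, 0)],
    event_spans=[(2, 3)],
)
print(marked)
print(position_ids)  # [0, 1, 2, 3, 4, 5, 4, 5]
```

Because the suspension marks are appended rather than inserted, the text itself is only altered around drug mentions, and several candidate adverse event mentions can be attached to the same sample.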

1. According to the invention, the suspension mark representation is used when obtaining the embedded representations, so that the embedded representations of different mentions can be distinguished more effectively and the prediction accuracy can be significantly improved.

Finally, the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications and substitutions shall be covered by the claims of the present invention.

Claims

1. A method for extracting adverse drug event relations based on semantic segmentation, characterized by comprising the following steps:

S1, establishing a drug adverse event relation extraction model:

Firstly, fixed marks <s> and </s> are inserted at the beginning and the end of each drug mention to mark the drug mention locations; the corresponding candidate adverse event mentions are then spliced behind the text using suspension marks <o> and </o>, where <o> and </o> are encoded with the same position as the corresponding adverse event mention; the combined sequence of text tokens and inserted suspension marks is then provided to the BERT pre-training model to obtain the drug mention mark local context representation e^s and the adverse event mention mark local context representation e^o; e^s and e^o are spliced together as the embedded representation of the corresponding drug mention and adverse event mention pair:

wherein M represents the maximum number of pairs formed by drug mentions and adverse event mentions in the sample; finally, the attention representation is obtained using the BERT pre-training model:

wherein A is the average of the attention heads in the last Encoder layer of the BERT pre-training model; the attention matrix A from the BERT pre-training model and an affine transformation are used to obtain the mention pair relation matrix of the drug and the adverse event:

wherein the product operator is the Hadamard product, W_1 is a learnable parameter matrix, H is the embedded representation of the drug mention and adverse event mention pairs, A_s denotes the attention of drug mention e^s to all tokens of the document, obtained by averaging the attention heads in the last Encoder layer for the drug mention, A_o denotes the attention of adverse event mention e^o to all tokens of the document, obtained by averaging the attention heads in the last Encoder layer for the adverse event mention, and F(s, o) denotes the relation matrix of the drug and adverse event mention pair (e^s, e^o);

Y = U(W_2 F)

wherein Y ∈ R^{M×M×D'} represents the local context and global dependency information matrix, U ∈ R^{M×M×D'} represents the U-shaped semantic segmentation network, W_2 is a learnable weight matrix that reduces the dimension of F, D' is much smaller than D, and W_2 F represents the encoding module;

obtaining the smooth embedded representation E_i of the same mention using a smooth version of max pooling:

wherein E_i denotes the smooth embedded representation of mention e_i,

z_s = tanh(W_s E_s + Y_{s,o})

z_o = tanh(W_o E_o + Y_{s,o})

P(r | E_s, E_o) = σ(z_s W_r z_o + b_r)

wherein z_s is the hidden representation of the drug, z_o is the hidden representation of the adverse event, P is the relation probability, Y_{s,o} is the local context and global dependency information representation of the drug and adverse event mention pair (e^s, e^o) in matrix Y, tanh is a nonlinear activation function, σ is a bilinear function, and W_s, W_o, W_r and b_r are learnable parameter matrices;

the sample imbalance processor introduces a balanced softmax method for training and an additional class 0 to handle the class imbalance problem in the sample set, such that the scores of the target classes are expected to be greater than a threshold t_0 and the scores of the non-target classes are all less than the threshold t_0:

wherein L denotes the target loss function, log denotes the natural logarithm (base e), e denotes the natural constant, t_i denotes the probability of the i-th positive label, t_j denotes the probability of the j-th negative label, Ω_pos denotes the relations between a drug and its corresponding adverse event mentions, i.e. the positive labels, and Ω_neg denotes the relations between a drug and its non-corresponding adverse event mentions, i.e. the negative labels;

S4, extracting adverse drug event relations:

S42, forming a training sample from a medical sample and all drug mentions and adverse event mentions contained in it, inserting the two fixed marks <s> and </s> directly before and after every drug mention, and splicing the adverse event mentions behind the text as suspension marks represented by <o> and </o>;

S43, feeding the sample into the BERT pre-training model and, for each pair of drug mention mark and adverse event mention mark, splicing the local context representation of the drug mention mark and the local context representation of the adverse event mention mark together as the embedded representation of the corresponding drug mention and adverse event mention pair;

S46, obtaining the smooth embedded representations of the drug and the adverse event, mapping these smooth embedded representations together with the local context and global dependency information matrix to hidden representations using a feedforward neural network, and then obtaining the relation probability, i.e. the relation score, through a bilinear function;

2. The method for extracting adverse drug event relations based on semantic segmentation according to claim 1, wherein, in the U-shaped semantic segmentation network used by the semantic feature fusion device in step S1, the global feature extraction block comprises three convolution modules and two max pooling layers, the first max pooling layer being located after the first convolution module and the second max pooling layer being located after the second convolution module, each convolution module comprising two convolution layers, and the number of channels in the feature extraction block being doubled; the two up-sampling blocks each comprise a deconvolution layer and two convolution layers arranged in sequence, the first up-sampling block being located after the third convolution module and the second up-sampling block being located after the first up-sampling block, and the number of channels in each up-sampling block being halved; the output of the deconvolution layer in the second up-sampling block is joined by a skip connection to the output of the second convolution layer in the first convolution module, and the output of the deconvolution layer in the first up-sampling block is joined by a skip connection to the output of the second convolution layer in the second convolution module.

3. The method according to claim 2, wherein the two convolution layers in each convolution module have a convolution kernel size of 3 × 3 and a stride of 1, the two max pooling layers have a kernel size of 2 × 2 and a stride of 2, the deconvolution layers in the two up-sampling blocks have a convolution kernel size of 2 × 2 and a stride of 2, the two convolution layers in each up-sampling block have a convolution kernel size of 3 × 3 and a stride of 1, and the feature output layer has a convolution kernel size of 1 × 1 and a stride of 1.

4. The method for extracting adverse drug event relations based on semantic segmentation according to claim 1, wherein the sample imbalance processor in step S1 sets the threshold t_0 to 0, so that the formula for calculating the target loss function L is simplified as follows: