CN113221567A - Judicial domain named entity and relationship combined extraction method

Judicial domain named entity and relationship combined extraction method

Info

Publication number
CN113221567A
CN113221567A (application CN202110505601.8A)
Authority
CN
China
Prior art keywords
entity
relationship
relationships
vector
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110505601.8A
Other languages
Chinese (zh)
Inventor
毛松
李振伟
程佳
张文静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING AEROSPACE INTELLIGENCE AND INFORMATION INSTITUTE
Original Assignee
BEIJING AEROSPACE INTELLIGENCE AND INFORMATION INSTITUTE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING AEROSPACE INTELLIGENCE AND INFORMATION INSTITUTE filed Critical BEIJING AEROSPACE INTELLIGENCE AND INFORMATION INSTITUTE
Priority to CN202110505601.8A
Publication of CN113221567A
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions

Abstract

The invention discloses a named entity and relationship joint extraction method for the judicial field. The method is an entity-relationship extraction approach that combines a BILSTM network and an attention mechanism on top of a BERT pre-trained language model; it realizes joint learning of the two tasks through parameter sharing and fully exploits the connection between the tasks to optimize the results. A BERT pre-trained language model is first selected to train word vectors and convert the data set into word-vector form; a BILSTM neural network is then used to obtain more complete contextual feature information and thereby extract deep word-vector features of the text; finally, a softmax classifier assigns a category label to each character to realize entity recognition, while an attention mechanism judges the association between the current character and the preceding characters to realize joint extraction of entities and multiple relationships.

Description

Judicial domain named entity and relationship combined extraction method
Technical Field
The invention relates to the technical field of information extraction, in particular to a named entity and relationship combined extraction method in the judicial field.
Background
With the rapid development of the Internet and the explosive growth of information, how to efficiently acquire the required information has become a hot research problem, and information extraction technology has emerged in response. Information extraction can be subdivided into three subtasks: named entity recognition, entity relationship extraction, and event extraction. Semantic triples obtained through entity recognition and entity relationship extraction are an important precondition for constructing knowledge graphs and understanding natural language. The judicial field is a typical knowledge-intensive industry; in the big-data era of information explosion, judicial work produces laws and regulations, guiding cases, legal documents and the like, so substantial and massive judicial data is available to the public, the parties, and judicial authorities. Texts in the judicial field are mainly professional descriptions, written by legal personnel, of the past conduct of offenders or suspects, and they contain a large number of entities and entity relationships related to case details. The traditional approach of extracting, integrating, and managing this information purely by hand can no longer meet current information extraction needs. Therefore, designing models that extract information automatically has become a hot issue in the judicial industry, and effectively recognizing named entities and classifying relationships in judicial data is a key step toward automated adjudication. Existing related studies usually separate the two subtasks of entity extraction and relationship extraction and run them in a pipeline. Although the pipeline framework is flexible in integrating different data sources and learning algorithms, it has certain problems: the connection between the two tasks is lost, which leads to error propagation, and the common overlapping relationships among judicial entities are not considered.
Therefore, how to take the connection between the tasks and the overlapping relationships among judicial named entities into account, eliminate error propagation, and jointly extract judicial named entities and their relationships is a problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a named entity and relationship joint extraction method for the judicial field. The method is an entity-relationship extraction approach that combines a BILSTM network and an attention mechanism on top of a BERT pre-trained language model: the text feature parameters extracted by the bottom-layer BILSTM network are shared, joint learning of the two tasks is realized through parameter sharing, the connection between the tasks is fully exploited to optimize the results, and the interaction information between the tasks is used to improve the performance of entity recognition and relationship extraction without splitting the model.
In order to achieve the purpose, the invention adopts the following technical scheme:
a judicial domain named entity and relationship combined extraction method comprises the following steps:
Step 1: selecting judicial domain data, setting a labeling strategy, labeling the judicial domain data according to the labeling strategy, and constructing a relation extraction data set;
Step 2: inputting the relation extraction data set into a BERT pre-trained language model to obtain a vector representation of each character in the data set and generate a word vector sequence;
Step 3: inputting the word vector sequence into a BILSTM network to extract feature information and obtain semantic vectors;
Step 4: classifying the semantic vectors with a softmax classifier to obtain the entity label of each character, thereby realizing entity recognition;
Step 5: extracting the relationship labels between entities from the semantic vectors with an attention mechanism, thereby realizing entity relationship extraction.
Preferably, the labeling strategy specifies that an entity label is composed of four parts: an entity boundary, an entity category, a relationship category, and an entity position. The entity boundary follows the BIO tagging principle; the entity categories include legal documents, legal subjects, legal objects and legal facts; the judicial relationships comprise construction, subject, reason, result, tool, mode, place, time, purpose, engagement, reception and parallel relationships; the entity position is denoted 1, 2 or M, where 1 indicates that the character belongs to the 1st entity in the relationship, 2 indicates the 2nd entity, and M indicates that the character participates in overlapping relationships and occupies different positions in them.
Preferably, the BERT pre-trained language model adopts a bidirectional Transformer as the encoder, represents the specific semantics of a character in context according to the semantic relationships of that context, and adds a residual network and layer normalization; specifically,
the attention unit is calculated as:
Attention(Q, K, V) = softmax(QK^T / √d_k) · V        (1)
where Q, K and V are the input word-vector matrices, namely the Query, Keys and Value matrices; d_k is the input vector dimension; QK^T, the dot product of the Q and K input word-vector matrices, expresses the relationships between the input word vectors and determines how much attention is paid to the other parts of the input sentence when the word at a given position is encoded. After scaling by √d_k, softmax normalization yields the weight representation, and the current output is the weighted sum of all word vectors in the sentence, so that the representation of each word contains information about the other words in the sentence; it is context-dependent and more global than a traditional word-vector representation;
a multi-head (MultiHead) mode is adopted to expand the model's ability to focus on different positions, with the formula:
MultiHead(Q, K, V) = Concat(head_1, ..., head_k)W_0        (2)
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)        (3)
where W_0 is an additional weight matrix; head_i denotes each subspace of the multi-head self-attention model; W_i^Q, W_i^K and W_i^V are the matrix mapping parameters obtained by training for the Query, Keys and Value matrices, respectively; and i denotes the order of the characters input at different moments, with i = 1 for the first character, and so on;
a residual network and layer normalization are added to alleviate the degradation problem:
LN(x_i) = α · (x_i − μ) / √(σ² + ε) + β        (4)
FFN = max(0, xW_1 + b_1)W_2 + b_2        (5)
where α and β are parameters to be learned; μ and σ are the mean and variance of the input layer; W_1 and W_2 are weight matrices; and b_1 and b_2 are bias vectors. The input of the BERT pre-trained language model is the labeled judicial text of the relation extraction data set, and the output is a distributed representation of each word in the text; the aim of the BERT pre-trained language model is to convert the judicial text into a low-dimensional, dense vector representation.
Preferably, both the encoding layer and the decoding layer of the BILSTM network adopt a bidirectional LSTM structure; each sentence is processed in order and in reverse order to obtain two different hidden-layer representations, which are then concatenated into the final hidden-layer representation;
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t
O_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = O_t ⊙ tanh(C_t)
where h_{t−1} is the contextual input (the previous hidden state); x_t is the current input; σ is the element-wise Sigmoid function; b_f, b_i, b_c and b_o are the bias vectors of the forget gate, input gate, cell state and output gate, respectively; C̃_t is the candidate cell memory; ⊙ is the element-wise product; W_f, W_i, W_c and W_o are the weight matrices of the forget gate, input gate, cell state and output gate, respectively; tanh is the hyperbolic tangent function; f_t, i_t, C_t and O_t denote the forget gate, input gate, cell state and output gate, respectively; C_t is the new cell state obtained by updating the previous cell state C_{t−1}; and h_t is the hidden-layer representation.
The forward hidden-layer representation h_t^f and the reverse hidden-layer representation h_t^b are concatenated into the encoded information of the single character t, denoted h_t' = [h_t^f ; h_t^b]; h_t' denotes the final output of the BILSTM layer, and for the decoding layer this output, h_t^d, is the semantic vector obtained by analyzing the context information at time t.
Preferably, a softmax classifier classifies the label corresponding to each character over the label sets of the four entity categories of legal documents, legal subjects, legal objects and legal facts, with label probability y_t:
y_t = softmax(U · h_t^d + b)
where h_t^d is the semantic vector extracted by the BILSTM network; U is a parameter matrix; b is a bias term. U and b are obtained through network iteration.
Preferably, in step 5, when relationships are extracted for the character input at time t, the corresponding semantic vector h_t^d is taken as the input of the attention mechanism, and the association between the character at time t and the preceding characters that are not labeled O is judged. The relation extraction is calculated as:
s_{≤t} = tanh(W_1 · H_{≤t}^d + W_2 · h_t^d + b_{≤t})
p_t = softmax(s_{≤t} + b_t)
where H_{≤t}^d denotes the stack of the semantic vectors decoded and output by the BILSTM decoding layer at time t and at all preceding moments; b_{≤t} and b_t are bias vectors; W_1 and W_2 are weight matrices; and p_t denotes the probability of association between characters.
According to the above technical scheme, compared with the prior art, the invention discloses a named entity and relationship joint extraction method for the judicial field, which is an entity-relationship extraction algorithm combining a BILSTM network and an attention mechanism on top of a BERT pre-trained language model; it realizes joint learning of the two tasks through parameter sharing and fully exploits the connection between the tasks to optimize the results. A BERT pre-trained language model is selected to train word vectors and convert the data set into word-vector form; a BILSTM neural network is then used to obtain more complete contextual feature information and thereby extract deep word-vector features of the text; finally, a softmax classifier assigns a category label to each character to realize entity recognition, while an attention mechanism judges the association between the current character and the preceding characters to realize joint extraction of entities and multiple relationships.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic diagram of the method of the present invention;
FIG. 2 is a schematic illustration of an exemplary annotation provided by the present invention;
FIG. 3 is a schematic diagram of a BERT pre-training language model provided by the present invention;
FIG. 4 is a diagram of a Transformer coding unit according to the present invention;
fig. 5 is a schematic diagram of a data acquisition and preprocessing process provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a named entity and relationship combined extraction method in the judicial field.
1. When an entity is labeled, a label consists of four parts: the entity boundary, the entity category, the relationship category, and the entity position.
1) Entity boundary. The entity boundary follows the BIO tagging principle: B marks the first character of an entity, I marks a middle or final character of an entity, and non-entity characters in the text are uniformly marked O;
2) Entity category and relationship category labels. Following the guidance of practitioners in the field, the entities to be extracted are divided into four categories: legal documents, legal subjects, legal objects, and legal facts, with {DOC, SUB, OBJ, FAC} used as the type tags of the four entity-name categories. The labeled entity classes are shown in Table 1 below:
TABLE 1 Entity categories
Legal document: DOC; legal subject: SUB; legal object: OBJ; legal fact: FAC
The relationship category labels are subdivided into 12 types: the construction relationship (agent), the subject relationship (task), the reason relationship (cause), the result relationship (outcome), the tool relationship (tool), the mode relationship (means), the place relationship (place), the time relationship (time), the purpose relationship (purpose), the engagement relationship (engage), the reception relationship (stress), and the parallel relationship (juxtap), as summarized in Table 2 below:
TABLE 2 Judicial concept relationship classification
agent, task, cause, outcome, tool, means, place, time, purpose, engage, stress, juxtap
3) Entity position. The entity position indicates where the entity stands in the relationship and is denoted 1, 2 or M, where 1 means the character belongs to the 1st entity in the relationship, 2 means the 2nd entity, and M means the character participates in overlapping relationships and occupies different positions in them. The final relationship extraction result can be represented as a triple {entity 1, relationship category, entity 2}; FIG. 2 shows an annotated example sentence illustrating judicial entity and relationship labeling.
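As an illustration of the above labeling strategy, a minimal Python sketch is given below; the hyphen-joined serialization of the four label parts and the example characters are assumptions chosen for demonstration, since this description specifies the components of a label but not their exact written form.

# Minimal sketch of the four-part character tagging scheme described above.
# The hyphen-joined form "B-SUB-agent-1" is an assumed serialization for
# illustration; the description only specifies the four components.

def make_tag(boundary: str, category: str, relation: str, position: str) -> str:
    """boundary: B/I/O; category: DOC/SUB/OBJ/FAC;
    relation: one of the 12 relationship tags; position: 1/2/M."""
    if boundary == "O":              # non-entity characters carry only the tag O
        return "O"
    return f"{boundary}-{category}-{relation}-{position}"

# Hypothetical example: the first entity (a legal subject) of an "agent" relation.
chars = ["张", "三"]
tags = [make_tag("B", "SUB", "agent", "1"), make_tag("I", "SUB", "agent", "1")]
print(list(zip(chars, tags)))   # [('张', 'B-SUB-agent-1'), ('三', 'I-SUB-agent-1')]

# The final extraction result is expressed as a triple (values are hypothetical):
triple = {"entity 1": "张三", "relationship": "agent", "entity 2": "某地"}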
2. BERT pre-training language model
In order to avoid boundary-segmentation errors to the greatest extent, character-level tagging is chosen, i.e., the character is used as the basic input unit, and a BERT pre-trained language model is introduced to effectively integrate semantic information.
The BERT pre-trained language model further increases the generalization ability of the word-vector model and fully captures character-level, word-level, sentence-level, and even inter-sentence relationship features. Its structure is shown in FIG. 3. To fuse the context on both the left and right sides of a character, BERT uses a bidirectional Transformer as its encoder, where "bidirectional" means that when the model processes a character it can represent the specific semantics of that character in context according to the semantic relationships of the context.
The Transformer encoding unit, shown in FIG. 4, is the most important part of BERT; the Transformer models a piece of text entirely with the attention mechanism. The most important module of the encoding unit is the self-attention part, calculated as:
Attention(Q, K, V) = softmax(QK^T / √d_k) · V        (1)
where Q, K and V are the input word-vector matrices and d_k is the input vector dimension; QK^T computes the relationships between the input word vectors. After scaling by √d_k, softmax normalization yields the weight representation, and the current output is the weighted sum of all word vectors in the sentence, so that the representation of each word contains information about the other words in the sentence; it is context-dependent and more global than a traditional word-vector representation;
a multi-head (MultiHead) mode is adopted to expand the model's ability to focus on different positions, with the formula:
MultiHead(Q, K, V) = Concat(head_1, ..., head_k)W_0        (2)
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)        (3)
where W_0 is an additional weight matrix; head_i denotes each subspace of the multi-head self-attention model, which can be understood as splitting the model into multiple heads, i.e., multiple subspaces, with head_i being the computation in each subspace; W_i^Q, W_i^K and W_i^V are the matrix mapping parameters obtained by training for the Query, Keys and Value matrices, respectively; and i denotes the order of the characters input at different moments, with i = 1 for the first character, and so on;
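For concreteness, a minimal NumPy sketch of formulas (1) to (3) follows; the dimensions used (sentence length 6, model width 768, 12 heads of width 64) are illustrative assumptions rather than values fixed by this description.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # formula (1): softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V

def multi_head(X, Wq, Wk, Wv, W0, n_heads):
    # formulas (2)-(3): project per head, attend, concatenate, project with W0
    heads = []
    for i in range(n_heads):
        Qi, Ki, Vi = X @ Wq[i], X @ Wk[i], X @ Wv[i]
        heads.append(attention(Qi, Ki, Vi))
    return np.concatenate(heads, axis=-1) @ W0

# Illustrative shapes: a sentence of 6 characters, model width 768, 12 heads of width 64.
rng = np.random.default_rng(0)
n, d, h, dh = 6, 768, 12, 64
X = rng.standard_normal((n, d))
Wq = rng.standard_normal((h, d, dh)); Wk = rng.standard_normal((h, d, dh))
Wv = rng.standard_normal((h, d, dh)); W0 = rng.standard_normal((h * dh, d))
out = multi_head(X, Wq, Wk, Wv, W0, h)   # shape (6, 768)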
A residual network and layer normalization are added to alleviate the degradation problem:
LN(x_i) = α · (x_i − μ) / √(σ² + ε) + β        (4)
FFN = max(0, xW_1 + b_1)W_2 + b_2        (5)
where α and β are parameters to be learned, and μ and σ are the mean and variance of the input layer; W_1 and W_2 are weight matrices; b_1 and b_2 are bias vectors. The input of the BERT pre-trained language model is the labeled judicial text of the relation extraction data set, and its output is a distributed representation of each word in the text; the aim of the BERT pre-trained language model is to convert the judicial text into a low-dimensional, dense vector representation.
The BERT pre-training language model can make full use of information on the left and right sides of a word to obtain better distributed representation of the word.
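As a usage sketch only, character vectors can be obtained from a Chinese BERT checkpoint with the Hugging Face transformers library as shown below; the checkpoint name bert-base-chinese and the example sentence are assumptions, since this description does not prescribe a particular implementation.

# Sketch (not code from the filing): character-level vectors from a Chinese BERT
# checkpoint via the Hugging Face transformers library.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

sentence = "张三在某地持刀抢劫"          # hypothetical judicial-style sentence
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# last_hidden_state: (1, sequence length including [CLS]/[SEP], 768);
# Chinese text is tokenized one character per token for this sentence.
char_vectors = outputs.last_hidden_state[0, 1:-1]   # drop [CLS] and [SEP]
print(char_vectors.shape)                            # torch.Size([9, 768])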
3. BILSTM network
Both the encoding layer and the decoding layer adopt a bidirectional LSTM (BiLSTM) structure. The bidirectional LSTM processes each sentence in order (starting from the first character and recursing left to right) and in reverse order (starting from the last character and recursing right to left) to obtain two different sets of hidden-layer representations, which are then concatenated into the final hidden-layer representation.
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t
O_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = O_t ⊙ tanh(C_t)
where h_{t−1} is the contextual input (the previous hidden state); x_t is the current input; σ is the element-wise Sigmoid function; b_f, b_i, b_c and b_o are the bias vectors of the forget gate, input gate, cell state and output gate, respectively; C̃_t is the candidate cell memory; ⊙ is the element-wise product; W_f, W_i, W_c and W_o are the weight matrices of the forget gate, input gate, cell state and output gate, respectively; tanh is the hyperbolic tangent function; f_t, i_t, C_t and O_t denote the forget gate, input gate, cell state and output gate, respectively; C_t is the new cell state obtained by updating the previous cell state C_{t−1}; and h_t is the hidden-layer representation.
The forward hidden-layer representation h_t^f and the reverse hidden-layer representation h_t^b are concatenated into the encoded information of the single character t, denoted h_t' = [h_t^f ; h_t^b]; h_t' denotes the final output of the bidirectional LSTM layer, and for the decoding layer this output is the semantic vector obtained by analyzing the context information at time t.
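A minimal PyTorch sketch of the stacked bidirectional LSTM encoding and decoding layers described above is given below; the hidden sizes are illustrative assumptions, not values fixed by this description.

import torch
import torch.nn as nn

class BiLSTMEncoderDecoder(nn.Module):
    """Sketch of the two stacked bidirectional LSTM layers; sizes are illustrative."""
    def __init__(self, input_dim=768, hidden_dim=256):
        super().__init__()
        # bidirectional=True runs the forward and reverse passes, and the output
        # already concatenates the two hidden-layer representations.
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.decoder = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True,
                               bidirectional=True)

    def forward(self, char_vectors):          # (batch, seq_len, input_dim)
        enc, _ = self.encoder(char_vectors)   # h_t' : (batch, seq_len, 2*hidden)
        dec, _ = self.decoder(enc)            # h_t^d: (batch, seq_len, 2*hidden)
        return dec

model = BiLSTMEncoderDecoder()
semantic = model(torch.randn(1, 9, 768))      # e.g. BERT vectors for 9 characters
print(semantic.shape)                          # torch.Size([1, 9, 512])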
4. Entity identification and relationship extraction
In entity recognition, as shown in FIG. 2, the text sequence in the encoding process is recorded as S = (x_1, x_2, ..., x_n). For the t-th character, i.e., the character input at time t, the BERT pre-trained language model yields the vector representation of each character, E = (e_1, e_2, ..., e_n), where the word vector of the t-th character is e_t; the BILSTM encoding layer then extracts the corresponding feature information. Analyzing the text features in the same way, this encoded feature is taken as the input of the decoding-layer BILSTM at time t, the semantic information of the judicial text obtained by the forward and backward LSTM passes of the decoding layer is recorded as the forward and reverse hidden representations, and these are finally concatenated into the final semantic information h_t^d, i.e., the semantic vector obtained by the decoding-layer BILSTM network from the context information at time t. Finally, a softmax classifier classifies the label of each character over the four entity-category label sets, with label probability y_t:
y_t = softmax(U · h_t^d + b)
where h_t^d is the semantic vector extracted by the BILSTM network, U is a parameter matrix, and b is a bias term.
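The classification step y_t = softmax(U · h_t^d + b) can be sketched in PyTorch as follows; restricting the output to the 4 coarse entity categories is a simplification, since the full tag set of the labeling strategy would simply enlarge the output dimension.

import torch
import torch.nn as nn

# Sketch of the entity-classification step y_t = softmax(U * h_t^d + b).
NUM_LABELS = 4                                  # DOC, SUB, OBJ, FAC (simplified tag set)
classifier = nn.Linear(512, NUM_LABELS)         # U (weight matrix) and b (bias term)

semantic = torch.randn(1, 9, 512)               # h_t^d from the BILSTM decoding layer
label_probs = torch.softmax(classifier(semantic), dim=-1)   # y_t for each character
predicted = label_probs.argmax(dim=-1)          # entity label index per character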
In relationship extraction, the attention mechanism is used to extract the relationships between judicial-domain entities. When extracting relationships for the t-th character (i.e., the character input at time t), the semantic vector h_t^d obtained by BILSTM decoding is taken as the input of the attention mechanism, and the association between the character at time t and the preceding characters that are not labeled O is judged.
As shown in FIG. 2, when t = 6 it is only necessary to determine whether the character at the current moment, "hold", is related to the characters "Zhang" and "San" (the two characters of the name Zhang San); no judgment is needed for O-labeled characters such as "at", "certain" and "place". The specific calculation in relation extraction is:
s_{≤t} = tanh(W_1 · H_{≤t}^d + W_2 · h_t^d + b_{≤t})
p_t = softmax(s_{≤t} + b_t)
where H_{≤t}^d denotes the stack of the semantic features output by the decoding layer at time t and at all preceding moments, b_{≤t} and b_t are bias vectors, W_1 and W_2 are weight matrices, and p_t denotes the probability of association between characters. The obtained scores capture not only the probability of an association between the character at time t and each preceding character but also the type of the relationship, thereby realizing the extraction of relationship labels between judicial-domain entities.
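A PyTorch sketch of this relation-scoring step is given below. It follows one reading of the formulas rendered above: for the character at time t, each position up to t receives a distribution over relationship types, so that both the existence and the type of an association can be read off. The exact parameterization in the original Chinese filing may differ, and the relation-label count of 13 (12 relationships plus a "none" class) is an assumption.

import torch
import torch.nn as nn

class RelationAttention(nn.Module):
    """Sketch: score the association of the character at time t with each
    character at positions <= t; one plausible reading of the formulas above."""
    def __init__(self, dim=512, num_relations=13):    # 12 relations + "none" (assumed)
        super().__init__()
        self.W1 = nn.Linear(dim, dim, bias=True)       # applied to H_{<=t}^d
        self.W2 = nn.Linear(dim, dim, bias=False)      # applied to h_t^d
        self.scorer = nn.Linear(dim, num_relations)    # relation-type scores

    def forward(self, H, t):                           # H: (seq_len, dim)
        context = H[: t + 1]                           # H_{<=t}^d
        current = H[t].unsqueeze(0)                    # h_t^d
        hidden = torch.tanh(self.W1(context) + self.W2(current))
        scores = self.scorer(hidden)                   # (t+1, num_relations)
        return torch.softmax(scores, dim=-1)           # per-position relation distribution

H = torch.randn(9, 512)                                # decoded semantic vectors
probs = RelationAttention()(H, t=5)                    # relations of the 6th character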
5. Verification analysis
The selected judicial-domain corpus comes from two sources. The first is judicial-domain web content collected with a web crawler, including the case information disclosure network of the People's Procuratorate, China Judgements Online, guiding cases for trial work of the Supreme People's Court, and cases published in the Gazette of the Supreme People's Court; 294 articles were selected, totaling about 160,000 characters. The second is a judicial-domain dictionary, the compilation of China's laws currently in force, comprising 2,000,000 law articles whose contents include the Constitution, the Criminal Law and the National Flag Law. After appropriate data preprocessing and manual labeling, a corpus of property disputes was constructed (the data preprocessing stage removes special characters such as spaces and empty data from the text, discards some overly short texts of little training value, segments the text with the jieba word-segmentation tool, and uses the Chinese Academy of Sciences NLPIR Chinese word-segmentation tool together with a Chinese stop-word list for name recognition, text segmentation, part-of-speech tagging and preprocessing so as to delete stop words and the like), and 10,000 sentences were finally selected for the experiments. Five judicial relationship types were extracted and classified: the subject (task), time, tool, mode and result relationships, with 2,000 instances of each. Half of the data is used as the training set for model training, and the other half as the test set for evaluating the performance of the method. The data acquisition and preprocessing process is shown in FIG. 5.
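A minimal sketch of the text-cleaning and segmentation steps mentioned above follows; the stop-word set is left empty as a placeholder, since this description does not fix a particular stop-word list.

# Sketch of the preprocessing steps: strip whitespace, segment with jieba,
# and remove stop words. The stop-word set here is a placeholder.
import re
import jieba

def preprocess(text, stopwords):
    text = re.sub(r"\s+", "", text)            # drop spaces and blank runs
    tokens = jieba.lcut(text)                   # Chinese word segmentation
    return [tok for tok in tokens if tok not in stopwords]

stopwords = set()                               # e.g. loaded from a stop-word list
print(preprocess("张三 在某地 持刀抢劫", stopwords))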
(1) Evaluation index
The evaluation indexes for named entity recognition and relationship extraction are the precision P, the recall R and the F1 value. Precision and recall evaluate the entity-relationship extraction effect from the two different angles of exactness and completeness. For the entity-relationship extraction task, precision P and recall R influence each other and are complementary, so the F1 value is used to combine the information of both. The specific calculation is as follows, where T_p is the number of entities the model identifies correctly, F_p is the number of unrelated entities the model identifies, and F_n is the number of related entities the model fails to detect:
P = T_p / (T_p + F_p)
R = T_p / (T_p + F_n)
F1 = 2 × P × R / (P + R)
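These three indexes can be computed directly from the counts T_p, F_p and F_n, as in the short sketch below; the example counts are illustrative only, not results reported in this description.

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision P, recall R and the F1 value from the counts above."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Illustrative counts (not results from the experiments described here):
print(precision_recall_f1(tp=850, fp=120, fn=150))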
(2) Comparison experiments
In order to effectively verify the rationality of the method and demonstrate the necessity of each of its modules, the following comparison experiments were set up after obtaining the relevant data of the method in simulation experiments:
1) Pipeline CNN model: relations are extracted with a CNN; a convolutional deep neural network (CDNN) is used to extract lexical and sentence-level features, and all word tokens are used as input;
2) Pipeline LSTM-Attention model: the model adopts a bidirectional LSTM neural network with an added Attention mechanism, which avoids the complex feature engineering of traditional work and achieves relatively good results on this task;
3) Joint model: the entity recognition part adds a softmax output on top of an LSTM decoding layer, and the relation extraction part consists of a CNN layer and a max-pooling layer;
4) Multi-head joint model: differs from the base joint model by a shared BERT layer.
Performance evaluation experiments were carried out on the test set for each model, and the data were compared by combining the results of the three experiments; the comparison results are shown in Table 3 below.
TABLE 3 judicial entity relationship extraction results
As can be seen from the experimental results in Table 3, the precision, recall and F1 values generally improve, in three respects:
1. and comparing the combined model with the pipeline model, wherein the combined model is higher than the pipeline model, and the combined model is superior to the pipeline model.
2. Comparing the joint model with the multi-head joint model, the F1 value of the multi-head joint model is higher by 1.61, indicating that the multi-head design in the multi-head joint model has certain advantages.
3. The named entity recognition part of the proposed method is 0.94 higher in F1 value than the multi-head joint model, and the relation extraction part is 3.72 higher, showing that the proposed method performs better on both the entity recognition and the relation extraction tasks.
The method effectively solves the problem that a single word vector cannot capture the multiple senses of a word, and achieves a better effect on the data set; in the relation extraction stage, a new joint model is constructed by comparing and examining the advantages and disadvantages of the pipeline and joint approaches to relation extraction, so that entities and relationships can be extracted simultaneously, overcoming the error propagation of the pipeline model and its neglect of the connection between the two subtasks. The technical solution of the invention was partly supported and guided by the China National Key Research and Development Program (2018YFC0832200; 2018YFC0832201).
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A judicial domain named entity and relationship combined extraction method is characterized by comprising the following steps:
step 1: selecting judicial domain data, setting a labeling strategy, labeling the judicial domain data according to the labeling strategy, and constructing a relation extraction data set;
step 2: inputting the relation extraction data set into a BERT pre-training language model to obtain vector representation of each character in the relation extraction data set, and generating a word vector sequence;
step 3: inputting the word vector sequence into a BILSTM network to extract feature information and obtain semantic vectors;
step 4: classifying the semantic vectors with a softmax classifier to obtain the entity label of each character;
step 5: extracting the relationship labels between entities from the semantic vectors with an attention mechanism.
2. The judicial domain named entity and relationship joint extraction method according to claim 1, wherein the labeling strategy specifies that an entity label is composed of four parts: an entity boundary, an entity category, a relationship category, and an entity position; the entity boundary follows the BIO tagging principle; the entity categories include legal documents, legal subjects, legal objects and legal facts; the judicial relationships comprise construction, subject, reason, result, tool, mode, place, time, purpose, engagement, reception and parallel relationships; the entity position is denoted 1, 2 or M, wherein 1 indicates that the character belongs to the 1st entity in the relationship, 2 indicates the 2nd entity, and M indicates that the character participates in overlapping relationships and occupies different positions in them.
3. The judicial domain named entity and relationship joint extraction method according to claim 1, wherein the BERT pre-trained language model adopts a bidirectional Transformer as the encoder, represents the specific semantics of a character in context according to the semantic relationships of that context, and adds a residual network and layer normalization; specifically,
the attention unit is calculated as:
Attention(Q, K, V) = softmax(QK^T / √d_k) · V        (1)
where Q, K and V are the input word-vector matrices, namely the Query, Keys and Value matrices; d_k is the input vector dimension; QK^T, the dot product of the Q and K input word-vector matrices, gives scores expressing the relationships between the input word vectors; after scaling by √d_k, softmax normalization yields the weight representation, and the output is the weighted sum of all word vectors in the sentence;
a multi-head (MultiHead) mode is adopted, with the formula:
MultiHead(Q, K, V) = Concat(head_1, ..., head_k)W_0        (2)
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)        (3)
where W_0 is an additional weight matrix; head_i denotes each subspace of the multi-head self-attention model; W_i^Q, W_i^K and W_i^V are the matrix mapping parameters obtained by training for the Query, Keys and Value matrices, respectively, and i denotes the order of the characters input at different moments;
a residual network and layer normalization are added to alleviate the degradation problem:
LN(x_i) = α · (x_i − μ) / √(σ² + ε) + β        (4)
FFN = max(0, xW_1 + b_1)W_2 + b_2        (5)
where α and β are parameters to be learned; μ and σ denote the mean and variance of the input layer, respectively; W_1 and W_2 are weight matrices; and b_1 and b_2 are bias vectors.
4. The judicial domain named entity and relationship joint extraction method according to claim 2, wherein the coding layer and the decoding layer in the BILSTM network both adopt a bidirectional LSTM structure, each sentence is respectively calculated in sequence and in reverse order to obtain two different hidden layer representations, and then the final hidden layer representation is obtained by vector splicing;
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_c · [h_{t−1}, x_t] + b_c)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t
O_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = O_t ⊙ tanh(C_t)
where h_{t−1} is the contextual input (the previous hidden state); x_t is the current input; σ is the element-wise Sigmoid function; b_f, b_i, b_c and b_o are the bias vectors of the forget gate, input gate, cell state and output gate, respectively; C̃_t is the candidate cell memory; ⊙ is the element-wise product; W_f, W_i, W_c and W_o are the weight matrices of the forget gate, input gate, cell state and output gate, respectively; tanh is the hyperbolic tangent function; f_t, i_t, C_t and O_t denote the forget gate, input gate, cell state and output gate, respectively; C_t is the new cell state obtained by updating the previous cell state C_{t−1}; and h_t is the hidden-layer representation;
the forward hidden-layer representation h_t^f and the reverse hidden-layer representation h_t^b are concatenated into the encoded information of the single character t, denoted h_t' = [h_t^f ; h_t^b]; h_t' denotes the final output of the BILSTM layer, and for the decoding layer this output, h_t^d, is the semantic vector obtained by analyzing the context information at time t.
5. The method for jointly extracting named entities and relationships in the judicial domain according to claim 4, wherein a softmax classifier classifies the label corresponding to each character over the label sets of the 4 entity categories of legal documents, legal subjects, legal objects and legal facts, with label probability y_t:
y_t = softmax(U · h_t^d + b)
where h_t^d is the semantic vector extracted by the BILSTM network; U is a parameter matrix; and b is a bias term.
6. The method for jointly extracting named entities and relationships in the judicial domain according to claim 5, wherein in step 5, when relationships are extracted for the character input at time t, the corresponding semantic vector h_t^d is taken as the input of the attention mechanism, and the association between the character at time t and the preceding characters that are not labeled O is judged; the relation extraction is specifically calculated as:
s_{≤t} = tanh(W_3 · H_{≤t}^d + W_4 · h_t^d + b_{≤t})
p_t = softmax(s_{≤t} + b_t)
where H_{≤t}^d denotes the stack of the semantic vectors decoded and output by the BILSTM decoding layer at time t and at all preceding moments; b_{≤t} and b_t are bias vectors; W_3 and W_4 are weight matrices; and p_t denotes the probability of association between characters.
CN202110505601.8A 2021-05-10 2021-05-10 Judicial domain named entity and relationship combined extraction method Pending CN113221567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110505601.8A CN113221567A (en) 2021-05-10 2021-05-10 Judicial domain named entity and relationship combined extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110505601.8A CN113221567A (en) 2021-05-10 2021-05-10 Judicial domain named entity and relationship combined extraction method

Publications (1)

Publication Number Publication Date
CN113221567A true CN113221567A (en) 2021-08-06

Family

ID=77094209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110505601.8A Pending CN113221567A (en) 2021-05-10 2021-05-10 Judicial domain named entity and relationship combined extraction method

Country Status (1)

Country Link
CN (1) CN113221567A (en)


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779260A (en) * 2021-08-12 2021-12-10 华东师范大学 Domain map entity and relationship combined extraction method and system based on pre-training model
CN113821636A (en) * 2021-08-27 2021-12-21 上海快确信息科技有限公司 Financial text joint extraction and classification scheme based on knowledge graph
TWI807400B (en) * 2021-08-27 2023-07-01 台達電子工業股份有限公司 Apparatus and method for generating an entity-relation extraction model
CN113849597A (en) * 2021-08-31 2021-12-28 艾迪恩(山东)科技有限公司 Illegal advertising word detection method based on named entity recognition
CN113849597B (en) * 2021-08-31 2024-04-30 艾迪恩(山东)科技有限公司 Illegal advertisement word detection method based on named entity recognition
CN113609868A (en) * 2021-09-01 2021-11-05 首都医科大学宣武医院 Multi-task question-answer driven medical entity relationship extraction method
CN113761106A (en) * 2021-09-08 2021-12-07 上海快确信息科技有限公司 Self-attention-enhanced bond transaction intention recognition system
CN115329756A (en) * 2021-10-21 2022-11-11 盐城金堤科技有限公司 Execution subject extraction method and device, storage medium and electronic equipment
CN113987150A (en) * 2021-10-29 2022-01-28 深圳前海环融联易信息科技服务有限公司 Bert-based multi-layer attention mechanism relation extraction method
CN114444506B (en) * 2022-01-11 2023-05-02 四川大学 Relation triplet extraction method for fusing entity types
CN114444506A (en) * 2022-01-11 2022-05-06 四川大学 Method for extracting relation triple fusing entity types
CN114372138A (en) * 2022-01-11 2022-04-19 国网江苏省电力有限公司信息通信分公司 Electric power field relation extraction method based on shortest dependence path and BERT
CN114841122A (en) * 2022-01-25 2022-08-02 电子科技大学 Text extraction method combining entity identification and relationship extraction, storage medium and terminal
CN114548099A (en) * 2022-02-25 2022-05-27 桂林电子科技大学 Method for jointly extracting and detecting aspect words and aspect categories based on multitask framework
CN114548099B (en) * 2022-02-25 2024-03-26 桂林电子科技大学 Method for extracting and detecting aspect words and aspect categories jointly based on multitasking framework
CN115169326A (en) * 2022-04-15 2022-10-11 山西长河科技股份有限公司 Chinese relation extraction method, device, terminal and storage medium
CN115358239A (en) * 2022-08-17 2022-11-18 北京中科智加科技有限公司 Named entity and relationship recognition method and storage medium
CN115358239B (en) * 2022-08-17 2023-08-22 北京中科智加科技有限公司 Named entity and relationship recognition method and storage medium
CN116151243A (en) * 2023-04-23 2023-05-23 昆明理工大学 Entity relation extraction method based on type correlation characterization
CN116501830A (en) * 2023-06-29 2023-07-28 中南大学 Method and related equipment for jointly extracting overlapping relation of biomedical texts
CN116501830B (en) * 2023-06-29 2023-09-05 中南大学 Method and related equipment for jointly extracting overlapping relation of biomedical texts

Similar Documents

Publication Publication Date Title
CN113221567A (en) Judicial domain named entity and relationship combined extraction method
CN112163416B (en) Event joint extraction method for merging syntactic and entity relation graph convolution network
Dahouda et al. A deep-learned embedding technique for categorical features encoding
US20220147836A1 (en) Method and device for text-enhanced knowledge graph joint representation learning
CN110990564B (en) Negative news identification method based on emotion calculation and multi-head attention mechanism
CN110196906B (en) Deep learning text similarity detection method oriented to financial industry
CN110321563A (en) Text emotion analysis method based on mixing monitor model
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN110889786A (en) Legal action insured advocate security use judging service method based on LSTM technology
CN113962219A (en) Semantic matching method and system for knowledge retrieval and question answering of power transformer
CN113168499A (en) Method for searching patent document
CN113704546A (en) Video natural language text retrieval method based on space time sequence characteristics
CN113196277A (en) System for retrieving natural language documents
CN113011161A (en) Method for extracting human and pattern association relation based on deep learning and pattern matching
CN112559723A (en) FAQ search type question-answer construction method and system based on deep learning
CN115759092A (en) Network threat information named entity identification method based on ALBERT
CN113919366A (en) Semantic matching method and device for power transformer knowledge question answering
CN113704396A (en) Short text classification method, device, equipment and storage medium
Kovvuri et al. Pirc net: Using proposal indexing, relationships and context for phrase grounding
CN112989830B (en) Named entity identification method based on multiple features and machine learning
CN113239663A (en) Multi-meaning word Chinese entity relation identification method based on Hopkinson
CN115796182A (en) Multi-modal named entity recognition method based on entity-level cross-modal interaction
CN115455202A (en) Emergency event affair map construction method
CN114547237A (en) French recommendation method fusing French keywords
CN114595324A (en) Method, device, terminal and non-transitory storage medium for power grid service data domain division

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination