CN114462409A - Audit field named entity recognition method based on countermeasure training

Audit field named entity recognition method based on countermeasure training

Info

Publication number
CN114462409A
CN114462409A (application CN202210109168.0A)
Authority
CN
China
Prior art keywords
task
ner
cws
shared
bilstm
Prior art date
Legal status
Pending
Application number
CN202210109168.0A
Other languages
Chinese (zh)
Inventor
钱泰羽
陈一飞
乔红岩
Current Assignee
NANJING AUDIT UNIVERSITY
Original Assignee
NANJING AUDIT UNIVERSITY
Priority date
Filing date
Publication date
Application filed by NANJING AUDIT UNIVERSITY filed Critical NANJING AUDIT UNIVERSITY
Priority to CN202210109168.0A priority Critical patent/CN114462409A/en
Publication of CN114462409A publication Critical patent/CN114462409A/en
Pending legal-status Critical Current

Classifications

    • G06F40/295 Named entity recognition
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods


Abstract

With the promulgation of the new audit law, automatically identifying effective entity information from audit-field corpora helps improve the efficiency of implementing audit policies. Named Entity Recognition (NER) aims to identify entities in a corpus; deep learning methods are mature and effective on this task, but the corpus resources in the audit field are not yet well developed and entity boundary division is unclear. The invention provides an audit field named entity recognition method based on adversarial training. Chinese Word Segmentation (CWS) is used to identify word boundaries and shares much word boundary information with NER, so this commonality can be used to assist the NER task and help with boundary division. Word vectors are obtained with BERT, the information shared by the NER and CWS tasks is extracted through adversarial training while noise caused by CWS-task private information is effectively prevented, and the task-shared word boundary information is fused into the NER task, improving the accuracy of named entity recognition in the audit field.

Description

Audit field named entity recognition method based on countermeasure training
Technical Field
The invention relates to the technical field of named entity recognition, in particular to an audit field named entity recognition method based on countermeasure training.
Background
Named Entity Recognition (NER) is a fundamental task of Natural Language Processing (NLP) and an upstream task for relation extraction, question answering systems and other applications. Its goal is to mark predefined entity types, such as place names and organization names, in unstructured text. Traditional named entity recognition methods mostly start from improved models and feature engineering to reduce the dependence on rule-based methods and expert knowledge, but pay little attention to the problem of entity boundaries. With the promulgation of the new audit law, audit policies are divided ever more finely and audit policy texts grow day by day. At the same time, implementing audit policy becomes increasingly important in the audit process; at present this is done mainly by hand, which increases the workload of auditors. In addition, audit policies are mostly unstructured text, and extracting entities from such text helps improve the efficiency of policy implementation. In the audit field, however, the corpus resources are not yet well developed and entity boundary division is not detailed enough. Chinese Word Segmentation (CWS) is used to identify word boundaries; CWS has larger data sets than NER and divides boundaries more finely on general-domain data, and NER shares many boundary divisions with CWS, so this commonality can be used to assist the NER task and help with boundary division. Peng et al. proposed a joint model of the NER and CWS tasks in which a linear-chain CRF has access to both the NER feature extractor and the word segmentation LSTM module, and word segmentation and NER training share all parameters of the LSTM module. That model, however, focuses only on the information shared between the NER and CWS tasks and ignores the filtering of each task's private information, which introduces noise into both tasks.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides a named entity identification method in the audit field based on countermeasure training, which can effectively solve the problems in the technical background.
In order to achieve the purpose, the invention provides an audit field named entity recognition method based on countermeasure training, which comprises the following steps:
S1): Acquisition of the data sets: the invention mainly addresses named entity recognition in the audit field, so an audit-field data set is used as the main data set of the invention. Both CWS and NER divide entity/word boundaries; CWS has larger data sets and divides boundaries more finely on general-domain data, and this characteristic can be used to assist the NER task. The New Era People's Daily word segmentation corpus is used as the auxiliary data set because of its large volume and rich content.
S11): NER dataset
The audit-field data set was built by collecting 7323 corpus items related to poverty-alleviation policies from government websites with a web crawler, screening sentences of 10 to 100 characters, and preprocessing the raw data, including removing non-text parts, unifying the encoding and segmenting the text. The corpus is divided into a training set, a validation set and a test set in a 7:2:1 ratio, and four entity types (person name, place name, organization name and proper noun) are labeled manually in BIO mode (B marks the beginning of an entity, I the inside of an entity, and O a token that is not part of an entity).
S12): CWS dataset
The New Era People's Daily word segmentation corpus was constructed by the Humanities and Social Computing Research Center of Nanjing Agricultural University from all People's Daily articles published in nine months: the first half of 2015 (January to June) and January of 2016, 2017 and 2018. The corpus exceeds 23 million characters and is labeled manually in BMES mode. The invention uses the January 2018 portion, 43,647 items in total.
S2): constructing a model: the model framework provided by the invention longitudinally comprises three tasks, wherein the left side is named as an entity identification task and comprises an NER BERT Embedding module, an NER Private BilSTM module and an NER CRF module; the right side is a Chinese word segmentation task which comprises a CWS BERT Embedding module, a CWS Private BilSTM module and a CWS CRF module; the middle part is an antagonistic training task which comprises a Shared BilSTM module and an antagonistic training module; the three tasks comprise an embedding layer, a sharing-private feature extraction layer and a CRF layer or an antagonistic training layer in the transverse direction, and the structure is introduced according to the three tasks in the transverse direction.
S21): embedding layer
The corpus is input into the embedding layer. BERT encodes with Transformer and introduces a Self-attention mechanism to model the dependencies between words and capture the internal structure of a sentence; input sentences longer than n are truncated and sentences shorter than n are padded with 0. A [CLS] vector representing the input is added at the beginning of the sentence and a [SEP] vector separates sentence pairs, and training on the sentence yields more accurate semantic information. Segment embedding is then used to judge whether given sentences are consecutive, providing sentence-level features. Since the word order of text is crucial to sentence meaning, BERT encodes each character position independently and learns the order characteristics of the input sequence, obtaining the information of each position. Finally, the vectors obtained from Token embedding, Segment embedding and Position embedding are summed to obtain the output sequence of BERT.
S211): NER BERT Embedding module
Using the audit-domain data set for the NER task, a given sentence W = [w_1, w_2, ..., w_n] is input into the NER BERT Embedding module, which outputs the sequence of word vectors X = [x_1, x_2, ..., x_n], where w_i is a word (character) in the sentence, x_i is the word vector corresponding to w_i, and n is the sentence length.
S212): CWS BERT Embedding module
Using the New Era People's Daily word segmentation corpus for the CWS task, a given sentence W' = [w'_1, w'_2, ..., w'_m] is input into the CWS BERT Embedding module, which outputs the sequence of word vectors X' = [x'_1, x'_2, ..., x'_m], where w'_i is a word (character) in the sentence, x'_i is the word vector corresponding to w'_i, m is the sentence length, and n > m is specified.
In summary, each vector sequence in X' is padded to length n, and the padded X' is concatenated below X to obtain the combined sequence X_s, which serves as the input for extracting shared information in the adversarial training task.
S22): shared-private feature extraction layer
The Long Short-Term Memory network (LSTM) is a variant of the Recurrent Neural Network (RNN) that can effectively use long-distance information and alleviates the gradient vanishing and gradient explosion problems of RNNs through gate structures and memory cells. A unidirectional LSTM can only use information preceding the current input, yet in sequence labeling the information following the current input is also important. To fuse information from both sides of the sequence, feature extraction is performed with a bidirectional LSTM (Bi-directional Long Short-Term Memory, BiLSTM). Given an input sequence for feature extraction, the hidden state at time step i represents the output features, as shown in equations (1) to (3):
h_i^fw = LSTM(x_i, h_{i-1}^fw)    (1)
h_i^bw = LSTM(x_i, h_{i+1}^bw)    (2)
h_i = h_i^fw ⊕ h_i^bw    (3)
where h_i^fw and h_i^bw denote the forward and backward hidden states at time step i, respectively, and ⊕ denotes the concatenation operation.
S221): NER Private BiLSTM module
The sequence X = [x_1, x_2, ..., x_n] is input into the NER Private BiLSTM module for private feature extraction, giving the output features of the NER-task private BiLSTM, H_ner^p = [h_1^np, h_2^np, ..., h_n^np], where h_i^np denotes the NER-task private feature output at time step i. For any sentence in the audit-domain data set, the hidden state of the private BiLSTM is expressed as shown in equation (4):
h_i^np = BiLSTM(x_i, h_{i-1}^np; θ_np)    (4)
where θ_np denotes the parameters of the NER private BiLSTM, with the hidden-state dimension set as a hyperparameter.
S222): CWS Private BiLSTM module
The sequence X' = [x'_1, x'_2, ..., x'_m] is input into the CWS Private BiLSTM module for private feature extraction, giving the output features of the CWS-task private BiLSTM, H_cws^p = [h_1^cp, h_2^cp, ..., h_m^cp], where h_i^cp denotes the CWS-task private feature output at time step i. For any sentence in the New Era People's Daily word segmentation corpus, the hidden state of the private BiLSTM layer is expressed as shown in equation (5):
h_i^cp = BiLSTM(x'_i, h_{i-1}^cp; θ_cp)    (5)
where θ_cp denotes the parameters of the CWS private BiLSTM, with the hidden-state dimension set as a hyperparameter.
S223): shared BilSTM module
Will be sequenced
Figure BDA0003494285400000055
The Shared BilSTM module is input for Shared feature extraction, and the output feature of the Shared BilSTM can be obtained
Figure BDA0003494285400000056
Wherein,
Figure BDA0003494285400000057
and (4) representing the shared characteristics of the NER task and the CWS task output at the ith moment. For any sentence in the set, the hidden state of the shared BilSTM layer is represented as shown in equation (6):
Figure BDA0003494285400000058
wherein, thetasharedDimension setting for hidden state for sharing the BilSTM parameter.
In summary, the private features extracted by the NER Private BiLSTM module and the shared features extracted by the Shared BiLSTM module are concatenated to obtain the total feature H_ner of the NER task, which serves as input to the NER CRF module. The private features extracted by the CWS Private BiLSTM module and the shared features extracted by the Shared BiLSTM module are concatenated to obtain the total feature H_cws of the CWS task, which serves as input to the CWS CRF module. This is expressed by equations (7) and (8):
H_ner = H_ner^p ⊕ H_shared^ner    (7)
H_cws = H_cws^p ⊕ H_shared^cws    (8)
where H_shared^ner and H_shared^cws denote the shared BiLSTM outputs corresponding to the NER and CWS sentences, respectively.
S23): CRF layer
The BiLSTM alone only captures relations between words and does not consider dependencies between consecutive labels, so the invention uses a CRF layer to infer labels from the features produced by the BiLSTM layer. Because the label sets of the NER task and the CWS task differ, each task is assigned its own CRF layer to obtain its own sequence labels. However, the dimensionality of the BiLSTM output vector does not match the CRF tag space, so, in order to compute the loss function during CRF label inference, a fully connected layer is added on top of the vector H output by the BiLSTM. The CRF prediction process is expressed by equations (9) and (10):
o_i = A·h_i + b    (9)
s(X, y) = Σ_{i=1..n} ( K_{y_{i-1}, y_i} + o_{i, y_i} )    (10)
where A is the weight matrix, b is the bias term, X is the input sequence, y is a predicted tag sequence, K is the transition probability matrix, K_{y_{i-1}, y_i} is the probability score of transferring from label y_{i-1} to label y_i, o_{i, y_i} is the score of assigning label y_i to character x_i, and n is the sentence length. A negative log-likelihood function is used as the loss function; the probability of the true tag sequence is expressed as equation (11):
P(ȳ | X) = exp( s(X, ȳ) ) / Σ_{ỹ ∈ Y_X} exp( s(X, ỹ) )    (11)
where ȳ is the true tag sequence, Y_X is the set of all possible tag sequences, s(X, ȳ) is the score of the correct tag sequence, and the denominator sums the scores of all tag sequences.
S231): NER CRF module
For H_ner, training with equations (9) to (11) yields the loss function L_ner, expressed as shown in equation (12):
L_ner = −log P(ȳ_ner | X_ner)    (12)
S232): CWS CRF module
For H_cws, training with equations (9) to (11) yields the loss function L_cws, expressed as shown in equation (13):
L_cws = −log P(ȳ_cws | X_cws)    (13)
the training process is continually tuned to minimize the loss function.
S24): Adversarial training layer
Inspired by Generative Adversarial Networks (GAN), adversarial training is used to extract the information shared by the NER and CWS tasks while effectively preventing noise caused by CWS-task private information. A task discriminator identifies which task the features come from through a Max-pooling layer and a Softmax layer; when the model cannot identify which task the features come from, the shared feature extractor has extracted features shared by both tasks, which improves the performance of named entity recognition. The task discriminator is expressed by equations (14) and (15):
s = Maxpooling(H_shared)    (14)
D(s; δ_d) = Softmax(A_1·s + b_1)    (15)
where H_shared is the output of the shared feature extraction layer and δ_d denotes the parameters of the task discriminator, namely the weight A_1 and the bias term b_1.
To prevent private information of the Chinese word segmentation task from entering the shared information space, an adversarial loss function L_adv is introduced to train the shared feature extractor so that the task discriminator cannot reliably identify which task the features come from. The adversarial loss function is expressed as shown in equation (16):
L_adv = min_{δ_s} max_{δ_d} Σ_{i=1..I} Σ_{j=1..J} log D( E_s(x_j^i) )    (16)
where δ_s denotes the shared BiLSTM parameters θ_shared, δ_d denotes the task discriminator parameters, I is the total number of tasks, J is the number of training samples, E_s is the shared feature extractor, x_j^i is the j-th shared-feature sample of task i, and D(·) is the probability the discriminator assigns to the sample's true task.
S3): model training
From the NER task loss function L_ner, the CWS task loss function L_cws and the adversarial loss function L_adv above, the final loss function L of the model is expressed as shown in equation (17):
L = G·L_ner + (1 − G)·L_cws + γ·L_adv    (17)
where γ is the loss weight coefficient and G is a switching function that indicates whether the current input comes from the NER task or the CWS task.
In the process of training the model, a training example is extracted from a given task to update parameters, a final loss function is continuously optimized, and iteration is carried out according to the convergence rate of an NER task until the result is optimal.
Compared with the prior art, the invention has the following beneficial effects: in the audit field named entity recognition method based on adversarial training, word vectors are obtained with BERT, the information shared by the NER task and the CWS task is extracted through adversarial training while noise caused by CWS-task private information is effectively prevented and private information is better filtered out, and the task-shared word boundary information is fused into the NER task, improving the accuracy of named entity recognition in the audit field.
Drawings
FIG. 1 is a model framework diagram of an audit field named entity recognition method based on countermeasure training according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides the following technical solutions:
an audit field named entity recognition method based on countermeasure training comprises the following steps:
First, acquisition of the data sets
The invention mainly addresses named entity recognition in the audit field, so an audit-field data set is used as the main data set of the invention. Both CWS and NER divide entity/word boundaries; CWS has larger data sets and divides boundaries more finely on general-domain data, and this characteristic can be used to assist the NER task. The New Era People's Daily word segmentation corpus (http://corpus.njau.edu.cn/) is used as the auxiliary data set because of its large volume and rich content.
1) NER dataset
The audit-field data set was built by collecting 7323 corpus items related to poverty-alleviation policies from government websites with a web crawler, screening sentences of 10 to 100 characters, and preprocessing the raw data, including removing non-text parts, unifying the encoding, segmenting the text, and so on. The corpus is divided into a training set, a validation set and a test set in a 7:2:1 ratio, and four entity types (person name, place name, organization name and proper noun) are labeled manually in BIO mode (B marks the beginning of an entity, I the inside of an entity, and O a token that is not part of an entity).
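As an illustration, a hypothetical character-level BIO annotation in the style used for the audit corpus; the sentence and tags below are invented for demonstration and do not come from the actual data set:

# Hypothetical BIO-annotated sample (ORG = organization name); illustrative only.
sample = [
    ("审", "B-ORG"), ("计", "I-ORG"), ("署", "I-ORG"),   # an organization entity
    ("发", "O"), ("布", "O"),
    ("扶", "O"), ("贫", "O"), ("政", "O"), ("策", "O"),
]
tokens = [ch for ch, _ in sample]
labels = [tag for _, tag in sample]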
2) CWS dataset
The New Era People's Daily word segmentation corpus was constructed by the Humanities and Social Computing Research Center of Nanjing Agricultural University from all People's Daily articles published in nine months: the first half of 2015 (January to June) and January of 2016, 2017 and 2018. The corpus exceeds 23 million characters and is labeled manually in BMES mode. The invention uses the January 2018 portion, 43,647 items in total.
Second, construction of model
The model framework proposed by the invention is shown in fig. 1. It comprises three tasks arranged vertically: the left side is the named entity recognition task, comprising the NER BERT Embedding module, the NER Private BiLSTM module and the NER CRF module; the right side is the Chinese word segmentation task, comprising the CWS BERT Embedding module, the CWS Private BiLSTM module and the CWS CRF module; the middle is the adversarial training task, comprising the Shared BiLSTM module and the adversarial training module. Horizontally, the three tasks comprise an embedding layer, a shared-private feature extraction layer, and a CRF layer or adversarial training layer; the structure is introduced below according to these three horizontal layers.
1 embedding layer
The corpus is input into the embedding layer. BERT encodes with Transformer and introduces a Self-attention mechanism to model the dependencies between words and capture the internal structure of a sentence; input sentences longer than n are truncated and sentences shorter than n are padded with 0. A [CLS] vector representing the input is added at the beginning of the sentence and a [SEP] vector separates sentence pairs, and training on the sentence yields more accurate semantic information (Token). Segment embedding is then used to judge whether given sentences are consecutive, providing sentence-level features (Segment). Since the word order of text is crucial to sentence meaning, BERT encodes each character position independently and learns the order characteristics of the input sequence, obtaining the information of each position (Position). Finally, the vectors obtained from Token embedding, Segment embedding and Position embedding are summed to obtain the output sequence of BERT.
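For illustration, a minimal sketch of this embedding layer using the Hugging Face transformers library; the checkpoint name bert-base-chinese and the maximum length n = 128 are assumptions, since the patent does not specify the exact pretrained model:

import torch
from transformers import BertTokenizer, BertModel

# Assumption: a generic Chinese BERT checkpoint; the patent does not name the model it uses.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def embed(sentence: str, n: int = 128) -> torch.Tensor:
    """Contextual vectors for the [CLS]/[SEP]-wrapped sentence, truncated/padded to length n."""
    enc = tokenizer(
        sentence,
        max_length=n,          # sentences longer than n are truncated
        padding="max_length",  # sentences shorter than n are padded with 0
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = bert(**enc)      # token + segment + position embeddings are summed inside BERT
    return out.last_hidden_state.squeeze(0)   # shape: (n, 768)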
1) NER BERT Embedding module
Using the audit-domain data set for the NER task, a given sentence W = [w_1, w_2, ..., w_n] is input into the NER BERT Embedding module, which outputs the sequence of word vectors X = [x_1, x_2, ..., x_n], where w_i is a word (character) in the sentence, x_i is the word vector corresponding to w_i, and n is the sentence length.
2) CWS BERT Embedding module
Using the New Era People's Daily word segmentation corpus for the CWS task, a given sentence W' = [w'_1, w'_2, ..., w'_m] is input into the CWS BERT Embedding module, which outputs the sequence of word vectors X' = [x'_1, x'_2, ..., x'_m], where w'_i is a word (character) in the sentence, x'_i is the word vector corresponding to w'_i, m is the sentence length, and n > m is specified.
In summary, each vector sequence in X' is padded to length n, and the padded X' is concatenated below X to obtain the combined sequence X_s, which serves as the input for extracting shared information in the adversarial training task.
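A possible sketch of this padding-and-stacking step in PyTorch (tensor shapes and variable names are illustrative assumptions):

import torch
import torch.nn.functional as F

def build_shared_input(X: torch.Tensor, X_prime: torch.Tensor) -> torch.Tensor:
    """X: NER word vectors, shape (num_ner_sentences, n, 768).
    X_prime: CWS word vectors, shape (num_cws_sentences, m, 768) with m < n.
    Returns the combined sequence used by the shared (adversarial) branch."""
    n = X.size(1)
    m = X_prime.size(1)
    # Pad each CWS sentence from length m to length n with zero vectors.
    X_prime_padded = F.pad(X_prime, (0, 0, 0, n - m))   # pad the time dimension
    # "Connect the padded X' below X": stack the two corpora along the sentence axis.
    return torch.cat([X, X_prime_padded], dim=0)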
2 shared-private feature extraction layer
The Long Short-Term Memory network (LSTM) is a variant of the Recurrent Neural Network (RNN) that can effectively use long-distance information and alleviates the gradient vanishing and gradient explosion problems of RNNs through gate structures and memory cells. A unidirectional LSTM can only use information preceding the current input, yet in sequence labeling the information following the current input is also important. To fuse information from both sides of the sequence, the invention performs feature extraction with a bidirectional LSTM (Bi-directional Long Short-Term Memory, BiLSTM).
Given an input sequence to perform feature extraction, the hidden state at the ith time represents the output features as shown in equations (1) to (3):
h_i^fw = LSTM(x_i, h_{i-1}^fw)    (1)
h_i^bw = LSTM(x_i, h_{i+1}^bw)    (2)
h_i = h_i^fw ⊕ h_i^bw    (3)
where h_i^fw and h_i^bw denote the forward and backward hidden states at time step i, respectively, and ⊕ denotes the concatenation operation.
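A minimal PyTorch sketch of the bidirectional LSTM described by equations (1) to (3); the hidden size of 120 follows the experimental setup below, while the rest is an illustrative assumption:

import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Bidirectional LSTM; forward and backward hidden states are concatenated (eq. (3))."""
    def __init__(self, input_dim: int = 768, hidden_dim: int = 120):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim) -> h: (batch, seq_len, 2 * hidden_dim)
        h, _ = self.bilstm(x)
        return h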
The invention uses a shared-private feature extraction layer: the NER Private BiLSTM module extracts audit-field features for the NER task, the CWS Private BiLSTM module extracts features of the New Era People's Daily word segmentation corpus for the CWS task, and the Shared BiLSTM module learns the shared word boundary information used by the adversarial training task.
1) NER Private BiLSTM module
The sequence X = [x_1, x_2, ..., x_n] is input into the NER Private BiLSTM module for private feature extraction, giving the output features of the NER-task private BiLSTM, H_ner^p = [h_1^np, h_2^np, ..., h_n^np], where h_i^np denotes the NER-task private feature output at time step i. For any sentence in the audit-domain data set, the hidden state of the private BiLSTM is expressed as shown in equation (4):
h_i^np = BiLSTM(x_i, h_{i-1}^np; θ_np)    (4)
where θ_np denotes the parameters of the NER private BiLSTM, with the hidden-state dimension set as a hyperparameter.
2) CWS Private BiLSTM module
The sequence X' = [x'_1, x'_2, ..., x'_m] is input into the CWS Private BiLSTM module for private feature extraction, giving the output features of the CWS-task private BiLSTM, H_cws^p = [h_1^cp, h_2^cp, ..., h_m^cp], where h_i^cp denotes the CWS-task private feature output at time step i. For any sentence in the New Era People's Daily word segmentation corpus, the hidden state of the private BiLSTM layer is expressed as shown in equation (5):
h_i^cp = BiLSTM(x'_i, h_{i-1}^cp; θ_cp)    (5)
where θ_cp denotes the parameters of the CWS private BiLSTM, with the hidden-state dimension set as a hyperparameter.
3) Shared BiLSTM module
The combined sequence X_s is input into the Shared BiLSTM module for shared feature extraction, giving the output features of the shared BiLSTM, H_shared = [h_1^s, h_2^s, ..., h_n^s], where h_i^s denotes the shared feature of the NER and CWS tasks output at time step i. For any sentence in the combined set, the hidden state of the shared BiLSTM layer is expressed as shown in equation (6):
h_i^s = BiLSTM(x_i^s, h_{i-1}^s; θ_shared)    (6)
where θ_shared denotes the parameters of the shared BiLSTM, with the hidden-state dimension set as a hyperparameter.
In summary, the private features extracted by the NER Private BiLSTM module and the shared features extracted by the Shared BiLSTM module are concatenated to obtain the total feature H_ner of the NER task, which serves as input to the NER CRF module. The private features extracted by the CWS Private BiLSTM module and the shared features extracted by the Shared BiLSTM module are concatenated to obtain the total feature H_cws of the CWS task, which serves as input to the CWS CRF module. This is expressed by equations (7) and (8):
H_ner = H_ner^p ⊕ H_shared^ner    (7)
H_cws = H_cws^p ⊕ H_shared^cws    (8)
where H_shared^ner and H_shared^cws denote the shared BiLSTM outputs corresponding to the NER and CWS sentences, respectively.
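Continuing the sketches above, equations (7) and (8) amount to concatenating the task-private and shared BiLSTM outputs along the feature dimension (all names are illustrative and reuse BiLSTMEncoder, X and X_prime_padded from the previous sketches):

ner_private = BiLSTMEncoder()   # NER Private BiLSTM
cws_private = BiLSTMEncoder()   # CWS Private BiLSTM
shared      = BiLSTMEncoder()   # Shared BiLSTM

H_ner = torch.cat([ner_private(X), shared(X)], dim=-1)                              # eq. (7)
H_cws = torch.cat([cws_private(X_prime_padded), shared(X_prime_padded)], dim=-1)    # eq. (8)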
3 CRF layer
The BiLSTM alone only captures relations between words and does not consider dependencies between consecutive labels, so the invention uses a CRF layer to infer labels from the features produced by the BiLSTM layer. Because the label sets of the NER task and the CWS task differ, each task is assigned its own CRF layer to obtain its own sequence labels. However, the dimensionality of the BiLSTM output vector does not match the CRF tag space, so, in order to compute the loss function during CRF label inference, a fully connected layer is added on top of the vector H output by the BiLSTM. The CRF prediction process is expressed by equations (9) and (10):
o_i = A·h_i + b    (9)
s(X, y) = Σ_{i=1..n} ( K_{y_{i-1}, y_i} + o_{i, y_i} )    (10)
where A is the weight matrix, b is the bias term, X is the input sequence, y is a predicted tag sequence, K is the transition probability matrix, K_{y_{i-1}, y_i} is the probability score of transferring from label y_{i-1} to label y_i, o_{i, y_i} is the score of assigning label y_i to character x_i, and n is the sentence length. A negative log-likelihood function is used as the loss function; the probability of the true tag sequence is expressed as equation (11):
P(ȳ | X) = exp( s(X, ȳ) ) / Σ_{ỹ ∈ Y_X} exp( s(X, ỹ) )    (11)
where ȳ is the true tag sequence, Y_X is the set of all possible tag sequences, s(X, ȳ) is the score of the correct tag sequence, and the denominator sums the scores of all tag sequences.
1) NER CRF module
For H_ner, training with equations (9) to (11) yields the loss function L_ner, expressed as shown in equation (12):
L_ner = −log P(ȳ_ner | X_ner)    (12)
2) CWS CRF module
For H_cws, training with equations (9) to (11) yields the loss function L_cws, expressed as shown in equation (13):
L_cws = −log P(ȳ_cws | X_cws)    (13)
The training process is continually tuned to minimize the loss function.
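A minimal sketch of such a task-specific CRF head; the pytorch-crf package is an assumed implementation choice (the patent does not name one), and its crf() call returns the log-likelihood of the gold tag sequence, so its negation gives L_ner or L_cws:

import torch
import torch.nn as nn
from torchcrf import CRF   # pip install pytorch-crf (assumed third-party implementation)

class CRFHead(nn.Module):
    def __init__(self, feature_dim: int, num_tags: int):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_tags)     # o_i = A h_i + b, eq. (9)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, H: torch.Tensor, tags: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        emissions = self.fc(H)
        # crf() returns the log-likelihood of the true tag sequence (eq. (11));
        # negating it gives the task loss of eq. (12) or eq. (13).
        return -self.crf(emissions, tags, mask=mask)

    def decode(self, H: torch.Tensor, mask: torch.Tensor):
        return self.crf.decode(self.fc(H), mask=mask)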
4 adversarial training layer
Inspired by Generative Adversarial Networks (GAN), adversarial training is used to extract the information shared by the NER and CWS tasks while effectively preventing noise caused by CWS-task private information. A task discriminator identifies which task the features come from through a Max-pooling layer and a Softmax layer; when the model cannot identify which task the features come from, the shared feature extractor has extracted features shared by both tasks, which improves the performance of named entity recognition. The task discriminator is expressed by equations (14) and (15):
s = Maxpooling(H_shared)    (14)
D(s; δ_d) = Softmax(A_1·s + b_1)    (15)
where H_shared is the output of the shared feature extraction layer and δ_d denotes the parameters of the task discriminator, namely the weight A_1 and the bias term b_1.
To prevent private information of the Chinese word segmentation task from entering the shared information space, an adversarial loss function L_adv is introduced to train the shared feature extractor so that the task discriminator cannot reliably identify which task the features come from. The adversarial loss function is expressed as shown in equation (16):
L_adv = min_{δ_s} max_{δ_d} Σ_{i=1..I} Σ_{j=1..J} log D( E_s(x_j^i) )    (16)
where δ_s denotes the shared BiLSTM parameters θ_shared, δ_d denotes the task discriminator parameters, I is the total number of tasks, J is the number of training samples, E_s is the shared feature extractor, x_j^i is the j-th shared-feature sample of task i, and D(·) is the probability the discriminator assigns to the sample's true task.
Through training, the loss of the task discriminator is continually minimized, adversarially encouraging the shared feature extractor to learn the word boundary information shared by the tasks. When training finishes, the shared feature extractor and the task discriminator reach an equilibrium in which the discriminator cannot distinguish which task the features come from.
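A sketch of one common way to implement this min-max objective, using a gradient reversal layer so that minimizing the discriminator's cross-entropy also trains the shared extractor to confuse it; the patent does not prescribe this exact mechanism, so it is an illustrative assumption:

import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass, reverses the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class TaskDiscriminator(nn.Module):
    """Eqs. (14)-(15): max-pooling over time followed by a softmax task classifier."""
    def __init__(self, feature_dim: int, num_tasks: int = 2):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, num_tasks)   # A_1 s + b_1

    def forward(self, H_shared: torch.Tensor) -> torch.Tensor:
        s, _ = H_shared.max(dim=1)            # Maxpooling over the sequence dimension
        s = GradientReversal.apply(s)         # adversarial signal for the shared BiLSTM
        return self.classifier(s)             # logits; the softmax is applied in the loss

adv_criterion = nn.CrossEntropyLoss()         # -log D(E_s(x)) for the true task label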
Third, model training
From the NER task loss function L_ner, the CWS task loss function L_cws and the adversarial loss function L_adv above, the final loss function L of the model is expressed as shown in equation (17):
L = G·L_ner + (1 − G)·L_cws + γ·L_adv    (17)
where γ is the loss weight coefficient and G is a switching function that indicates whether the current input comes from the NER task or the CWS task.
In the process of training the model, a training example is extracted from a given task to update parameters, a final loss function is continuously optimized, and iteration is carried out according to the convergence rate of the NER task until the result is optimal.
The pseudo code of the invention is provided as figures in the original filing.
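As a rough illustration of the training procedure, a simplified loop implementing the joint loss of equation (17), alternating NER and CWS batches; all module, loader and field names are assumptions that stand in for the components sketched above:

import torch
import torch.nn as nn

def train_epoch(model, ner_crf, cws_crf, discriminator, batches, optimizer, gamma: float = 0.05):
    """One epoch of the joint objective L = G*L_ner + (1 - G)*L_cws + gamma*L_adv (eq. (17)).

    `batches` yields (task_id, batch) pairs, task_id 0 for NER and 1 for CWS;
    `model`, `ner_crf`, `cws_crf` and `discriminator` stand for the modules sketched above."""
    adv_criterion = nn.CrossEntropyLoss()
    for task_id, batch in batches:
        H_private, H_shared = model(batch, task_id)        # private + shared BiLSTM features
        H_total = torch.cat([H_private, H_shared], dim=-1)

        # Switching function G: an NER batch contributes L_ner, a CWS batch contributes L_cws.
        crf = ner_crf if task_id == 0 else cws_crf
        task_loss = crf.loss(H_total, batch["tags"], batch["mask"])

        # Adversarial loss: the discriminator tries to recover the task id, while the
        # gradient reversal layer pushes the shared features to hide it (eq. (16)).
        logits = discriminator(H_shared)
        target = torch.full((logits.size(0),), task_id, dtype=torch.long)
        adv_loss = adv_criterion(logits, target)

        loss = task_loss + gamma * adv_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()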
Fourth, experiments and results
1 Experimental setup
The hyperparameter values of the model are obtained through cross-validation. The word-vector dimensionality is 768, the LSTM hidden-state dimensionality is set to 120, the loss weight coefficient γ is set to 0.05, the initial learning rate is 0.001, Dropout is 0.5, the batch size is 64, and the number of iterations is 20; the experiments are optimized with the Adam algorithm.
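Collected as a configuration sketch (the dictionary form is only an illustration of the stated hyperparameters):

config = {
    "word_vector_dim": 768,     # BERT output dimensionality
    "lstm_hidden_dim": 120,
    "loss_weight_gamma": 0.05,
    "learning_rate": 1e-3,
    "dropout": 0.5,
    "batch_size": 64,
    "epochs": 20,
    "optimizer": "Adam",
}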
2 evaluation index
The experiments use precision (P), recall (R) and the F1 value to evaluate model performance; the calculation formulas are shown in equations (18) to (20):
P = TP / (TP + FP)    (18)
R = TP / (TP + FN)    (19)
F1 = 2·P·R / (P + R)    (20)
where TP is the number of positive samples predicted as positive, FP is the number of negative samples predicted as positive, and FN is the number of positive samples predicted as negative.
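Equations (18) to (20) transcribed directly:

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1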
3 results and conclusions of the experiment
TABLE 1 Comparison of model results (the table is provided as an image in the original publication)
Conclusion: the comparison of experimental results shows that the method proposed in this patent effectively improves the F1 value on audit-field corpora.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (1)

1. An audit field named entity recognition method based on countermeasure training, characterized in that the method comprises the following steps:
S1): acquisition of the data sets: an audit-field data set is used as the NER data set of the invention; the New Era People's Daily word segmentation corpus is used as the CWS data set; the CWS task is used to assist the NER task.
S11): NER dataset
The audit-field data set is built by collecting corpus items related to poverty-alleviation policies from government websites with a web crawler, screening sentences of 10 to 100 characters, and preprocessing the raw data, including removing non-text parts, unifying the encoding and segmenting the text; the corpus is divided into a training set, a validation set and a test set in a 7:2:1 ratio, and four entity types (person name, place name, organization name and proper noun) are labeled manually in BIO mode.
S12): CWS dataset
The New Era People's Daily word segmentation corpus is obtained from the website of the Humanities and Social Computing Research Center of Nanjing Agricultural University (http://corpus.njau.edu.cn).
S2): constructing a model: the model framework provided by the invention longitudinally comprises three tasks, wherein the left side is named as an entity identification task and comprises an NER BERT Embedding module, an NER Private BilSTM module and an NER CRF module; the right side is a Chinese word segmentation task which comprises a CWS BERT Embedding module, a CWS Private BilSTM module and a CWS CRF module; the middle part is an antagonistic training task which comprises a Shared BilSTM module and an antagonistic training module; the three tasks comprise an embedding layer, a sharing-private feature extraction layer and a CRF layer or an antagonistic training layer in a transverse direction, and the structure is introduced according to the three tasks in the transverse direction.
S21): embedding layer
The corpus is input into the embedding layer; BERT encodes with Transformer and introduces a Self-attention mechanism to model the dependencies between words and capture the internal structure of a sentence; input sentences longer than n are truncated and sentences shorter than n are padded with 0; a [CLS] vector representing the input is added at the beginning of the sentence and a [SEP] vector separates sentence pairs, and training on the sentence yields more accurate semantic information; Segment embedding is then used to judge whether given sentences are consecutive, providing sentence-level features; since the word order of text is crucial to sentence meaning, BERT encodes each character position independently and learns the order characteristics of the input sequence, obtaining the information of each position; finally, the vectors obtained from Token embedding, Segment embedding and Position embedding are summed to obtain the output sequence of BERT.
S211): NER BERT Embedding module
Using the audit-domain data set for the NER task, a given sentence W = [w_1, w_2, ..., w_n] is input into the NER BERT Embedding module, which outputs the sequence of word vectors X = [x_1, x_2, ..., x_n], where w_i is a word (character) in the sentence, x_i is the word vector corresponding to w_i, and n is the sentence length.
S212): CWS BERT Embedding module
Using the New Era People's Daily word segmentation corpus for the CWS task, a given sentence W' = [w'_1, w'_2, ..., w'_m] is input into the CWS BERT Embedding module, which outputs the sequence of word vectors X' = [x'_1, x'_2, ..., x'_m], where w'_i is a word (character) in the sentence, x'_i is the word vector corresponding to w'_i, m is the sentence length, and n > m;
in summary, each vector sequence in X' is padded to length n, and the padded X' is concatenated below X to obtain the combined sequence X_s, which serves as the input for extracting shared information in the adversarial training task.
S22): shared-private feature extraction layer
Performing feature extraction by adopting bidirectional LSTM; given an input sequence to perform feature extraction, the hidden state at the ith time represents the output features as shown in equations (1) to (3):
h_i^fw = LSTM(x_i, h_{i-1}^fw)    (1)
h_i^bw = LSTM(x_i, h_{i+1}^bw)    (2)
h_i = h_i^fw ⊕ h_i^bw    (3)
where h_i^fw and h_i^bw denote the forward and backward hidden states at time step i, respectively, and ⊕ denotes the concatenation operation.
S221): NER Private BiLSTM module
The sequence X = [x_1, x_2, ..., x_n] is input into the NER Private BiLSTM module for private feature extraction, giving the output features of the NER-task private BiLSTM, H_ner^p = [h_1^np, h_2^np, ..., h_n^np], where h_i^np denotes the NER-task private feature output at time step i; for any sentence in the audit-field data set, the hidden state of the private BiLSTM is expressed as shown in equation (4):
h_i^np = BiLSTM(x_i, h_{i-1}^np; θ_np)    (4)
where θ_np denotes the parameters of the NER private BiLSTM, with the hidden-state dimension set as a hyperparameter.
S222): CWS Private BiLSTM module
The sequence X' = [x'_1, x'_2, ..., x'_m] is input into the CWS Private BiLSTM module for private feature extraction, giving the output features of the CWS-task private BiLSTM, H_cws^p = [h_1^cp, h_2^cp, ..., h_m^cp], where h_i^cp denotes the CWS-task private feature output at time step i; for any sentence in the New Era People's Daily word segmentation corpus, the hidden state of the private BiLSTM layer is expressed as shown in equation (5):
h_i^cp = BiLSTM(x'_i, h_{i-1}^cp; θ_cp)    (5)
where θ_cp denotes the parameters of the CWS private BiLSTM, with the hidden-state dimension set as a hyperparameter.
S223): shared BilSTM module
Will be sequenced
Figure FDA0003494285390000041
The Shared BilSTM module is input for Shared feature extraction, and the output feature of the Shared BilSTM can be obtained
Figure FDA0003494285390000042
Wherein,
Figure FDA0003494285390000043
representing the shared characteristics of the NER task and the CWS task output at the ith moment; for any sentence in the set, the hidden state of the shared BilSTM layer is represented as shown in equation (6):
Figure FDA0003494285390000044
wherein, thetasharedDimension setting for hidden state for sharing the BilSTM parameter.
In summary, the private features extracted by the NER Private BiLSTM module and the shared features extracted by the Shared BiLSTM module are concatenated to obtain the total feature H_ner of the NER task, which serves as input to the NER CRF module; the private features extracted by the CWS Private BiLSTM module and the shared features extracted by the Shared BiLSTM module are concatenated to obtain the total feature H_cws of the CWS task, which serves as input to the CWS CRF module; this is expressed by equations (7) and (8):
H_ner = H_ner^p ⊕ H_shared^ner    (7)
H_cws = H_cws^p ⊕ H_shared^cws    (8)
where H_shared^ner and H_shared^cws denote the shared BiLSTM outputs corresponding to the NER and CWS sentences, respectively.
S23): CRF layer
Label inference is performed on the features produced by the BiLSTM layer using a CRF layer, and a fully connected layer is added on top of the vector H output by the BiLSTM; the CRF prediction process is expressed by equations (9) and (10):
o_i = A·h_i + b    (9)
s(X, y) = Σ_{i=1..n} ( K_{y_{i-1}, y_i} + o_{i, y_i} )    (10)
where A is the weight matrix, b is the bias term, X is the input sequence, y is a predicted tag sequence, K is the transition probability matrix, K_{y_{i-1}, y_i} is the probability score of transferring from label y_{i-1} to label y_i, o_{i, y_i} is the score of assigning label y_i to character x_i, and n is the sentence length; a negative log-likelihood function is used as the loss function, and the probability of the true tag sequence is expressed as equation (11):
P(ȳ | X) = exp( s(X, ȳ) ) / Σ_{ỹ ∈ Y_X} exp( s(X, ỹ) )    (11)
where ȳ is the true tag sequence, Y_X is the set of all possible tag sequences, s(X, ȳ) is the score of the correct tag sequence, and the denominator sums the scores of all tag sequences.
S231): NER CRF module
For H_ner, training with equations (9) to (11) yields the loss function L_ner, expressed as shown in equation (12):
L_ner = −log P(ȳ_ner | X_ner)    (12)
S232): CWS CRF module
For H_cws, training with equations (9) to (11) yields the loss function L_cws, expressed as shown in equation (13):
L_cws = −log P(ȳ_cws | X_cws)    (13)
the training process is continually tuned to minimize the loss function.
S24): Adversarial training layer:
The task discriminator identifies which task the features come from through a Max-pooling layer and a Softmax layer; when the model cannot identify which task the features come from, the shared feature extractor has extracted features shared by both tasks, which improves the performance of named entity recognition; the task discriminator is expressed by equations (14) and (15):
s = Maxpooling(H_shared)    (14)
D(s; δ_d) = Softmax(A_1·s + b_1)    (15)
where H_shared is the output of the shared feature extraction layer and δ_d denotes the parameters of the task discriminator, namely the weight A_1 and the bias term b_1;
in order to prevent the private information of the Chinese word segmentation task from entering a shared information space, a loss-resisting function L is introducedadvTraining the shared feature extractor to make the task discriminator unable to effectively identify which task the feature comes from, the fighting loss function can be expressed as shown in equation (16):
L_adv = min_{δ_s} max_{δ_d} Σ_{i=1..I} Σ_{j=1..J} log D( E_s(x_j^i) )    (16)
where δ_s denotes the shared BiLSTM parameters θ_shared, δ_d denotes the task discriminator parameters, I is the total number of tasks, J is the number of training samples, E_s is the shared feature extractor, x_j^i is the j-th shared-feature sample of task i, and D(·) is the probability the discriminator assigns to the sample's true task.
S3): model training
From the NER task loss function L_ner, the CWS task loss function L_cws and the adversarial loss function L_adv above, the final loss function L of the model is expressed as shown in equation (17):
L = G·L_ner + (1 − G)·L_cws + γ·L_adv    (17)
where γ is the loss weight coefficient and G is a switching function that indicates whether the current input comes from the NER task or the CWS task;
in the process of training the model, a training example is extracted from a given task to update parameters, a final loss function is continuously optimized, and iteration is carried out according to the convergence rate of the NER task until the result is optimal.
CN202210109168.0A 2022-01-28 2022-01-28 Audit field named entity recognition method based on countermeasure training Pending CN114462409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210109168.0A CN114462409A (en) 2022-01-28 2022-01-28 Audit field named entity recognition method based on countermeasure training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210109168.0A CN114462409A (en) 2022-01-28 2022-01-28 Audit field named entity recognition method based on countermeasure training

Publications (1)

Publication Number Publication Date
CN114462409A true CN114462409A (en) 2022-05-10

Family

ID=81410574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210109168.0A Pending CN114462409A (en) 2022-01-28 2022-01-28 Audit field named entity recognition method based on countermeasure training

Country Status (1)

Country Link
CN (1) CN114462409A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115470871A (en) * 2022-11-02 2022-12-13 江苏鸿程大数据技术与应用研究院有限公司 Policy matching method and system based on named entity recognition and relation extraction model
CN115470871B (en) * 2022-11-02 2023-02-17 江苏鸿程大数据技术与应用研究院有限公司 Policy matching method and system based on named entity recognition and relation extraction model
CN115630649A (en) * 2022-11-23 2023-01-20 南京邮电大学 Medical Chinese named entity recognition method based on generative model
CN116227483A (en) * 2023-02-10 2023-06-06 南京南瑞信息通信科技有限公司 Word boundary-based Chinese entity extraction method, device and storage medium
CN117807999A (en) * 2024-02-29 2024-04-02 武汉科技大学 Domain self-adaptive named entity recognition method based on countermeasure learning
CN117807999B (en) * 2024-02-29 2024-05-10 武汉科技大学 Domain self-adaptive named entity recognition method based on countermeasure learning

Similar Documents

Publication Publication Date Title
CN109871451B (en) Method and system for extracting relation of dynamic word vectors
CN114462409A (en) Audit field named entity recognition method based on countermeasure training
CN106844349B (en) Comment spam recognition methods based on coorinated training
CN111460092B (en) Multi-document-based automatic complex problem solving method
CN110222178A (en) Text sentiment classification method, device, electronic equipment and readable storage medium storing program for executing
CN106682089A (en) RNNs-based method for automatic safety checking of short message
CN110287323A (en) A kind of object-oriented sensibility classification method
CN114818717B (en) Chinese named entity recognition method and system integrating vocabulary and syntax information
CN110909529A (en) User emotion analysis and prejudgment system of company image promotion system
CN113392209A (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN111144119A (en) Entity identification method for improving knowledge migration
CN111666752A (en) Circuit teaching material entity relation extraction method based on keyword attention mechanism
CN114297399A (en) Knowledge graph generation method, knowledge graph generation system, storage medium and electronic equipment
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN115934951A (en) Network hot topic user emotion prediction method
CN115906816A (en) Text emotion analysis method of two-channel Attention model based on Bert
CN114722176A (en) Intelligent question answering method, device, medium and electronic equipment
Li et al. Phrase embedding learning based on external and internal context with compositionality constraint
CN113869054A (en) Deep learning-based electric power field project feature identification method
CN111563374B (en) Personnel social relationship extraction method based on judicial official documents
CN115906846A (en) Document-level named entity identification method based on double-graph hierarchical feature fusion
Ren et al. Named-entity recognition method of key population information based on improved BiLSTM-CRF model
CN114357166A (en) Text classification method based on deep learning
CN113535936A (en) Deep learning-based regulation and regulation retrieval method and system
Guo RETRACTED: An automatic scoring method for Chinese-English spoken translation based on attention LSTM [EAI Endorsed Scal Inf Syst (2022), Online First]

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination