CN111191038A - Neural network training method and device and named entity identification method and device - Google Patents


Info

Publication number: CN111191038A
Application number: CN201811357670.3A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN111191038B (granted publication)
Inventors: 赵汉光 (Zhao Hanguang), 王珵 (Wang Cheng), 戴文渊 (Dai Wenyuan), 陈雨强 (Chen Yuqiang)
Current and original assignee: 4Paradigm Beijing Technology Co Ltd
Application filed by 4Paradigm Beijing Technology Co Ltd; priority to CN201811357670.3A
Legal status: granted, currently active (the listed status is an assumption and is not a legal conclusion)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks


Abstract

A neural network training method and apparatus and a named entity recognition method and apparatus are provided. In the training method, the neural network for named entity recognition comprises a pre-trained text conversion layer, a hole convolution (i.e., dilated convolution) layer, a local attention mechanism layer and a classification layer. The training method comprises: acquiring a training text and its labeling information, wherein the labeling information comprises named entity labels; inputting the training text into the text conversion layer to obtain word-related information output by the text conversion layer; inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer; inputting the output of the hole convolution layer into the local attention mechanism layer to obtain the output of the local attention mechanism layer; inputting the output of the local attention mechanism layer into the classification layer to obtain named entity information output by the classification layer; and calculating the loss of the neural network based on the named entity information and the corresponding named entity labels, and training the neural network according to the loss.

Description

Neural network training method and device and named entity identification method and device
Technical Field
The present invention relates to named entity recognition, and more particularly, to a neural network training method and apparatus for named entity recognition, and a neural network-based named entity recognition method and apparatus.
Background
Named Entity Recognition (NER) is a technique for identifying and categorizing named entities that appear in text. For example, named entities may include three major classes (entity, time, and number) and seven minor classes (person name, organization name, place name, time, date, currency, and percentage). Named entity recognition is a fundamental task in natural language processing and a key technology in many applications (e.g., information retrieval, information extraction, and machine translation). Therefore, research on automatic identification of named entities has important theoretical significance and practical value.
As applications of named entity recognition continue to expand, the demands placed on named entity recognition technology are also increasing. However, in existing named entity recognition technology, because a Chinese named entity may contain many character units, the range of features processed for named entity recognition is small and the computational efficiency is low. Meanwhile, the feature information used for named entity recognition is often not salient enough, which results in low accuracy of named entity recognition.
Disclosure of Invention
The invention aims to provide a neural network training method and device for named entity recognition and a named entity recognition method and device based on the neural network.
One aspect of the present invention provides a training method for a neural network for named entity recognition, wherein the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer. The training method includes: acquiring a training text and labeling information of the training text, wherein the labeling information includes named entity labels; inputting the training text into the text conversion layer to obtain word-related information output by the text conversion layer; inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer; inputting the output of the hole convolution layer into the local attention mechanism layer to obtain the output of the local attention mechanism layer; inputting the output of the local attention mechanism layer into the classification layer to obtain named entity information output by the classification layer; and calculating the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and training the neural network according to the loss.
Optionally, the hole convolution layer includes a plurality of sequentially connected hole convolution layers whose expansion rates increase monotonically and are relatively prime, and the step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes: inputting the word-related information into the first of the plurality of hole convolution layers to respectively obtain the outputs of the plurality of hole convolution layers; and splicing the outputs of the plurality of hole convolution layers together as the output of the hole convolution layer.
Optionally, the number of the plurality of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers are 1, 2, 5 and 9 in sequence.
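As an illustrative check (not part of the patent text) of why these expansion rates work, the offsets reachable by stacking kernel-size-3 hole convolutions can be enumerated: each layer at rate d sees offsets {−d, 0, +d}, and a stack sums one offset from each layer. Rates 1, 2, 5 then cover a contiguous 17-position window, while the conventional powers-of-2 choice 1, 2, 4 covers only 15 positions with the same number of layers.

```python
from itertools import product

def coverage(rates, kernel=3):
    """Offsets reachable by a stack of kernel-size-`kernel` hole (dilated)
    convolutions: one layer at rate d sees offsets {-d, 0, +d}, and
    stacking layers sums one offset chosen from each layer."""
    half = kernel // 2
    per_layer = [[k * d for k in range(-half, half + 1)] for d in rates]
    return {sum(combo) for combo in product(*per_layer)}

print(sorted(coverage([1, 2, 5])) == list(range(-8, 9)))   # True: contiguous, width 17
print(len(coverage([1, 2, 4])))                            # 15: powers of 2 cover less
```

The same enumeration shows that rates 1, 2, 5, 9 cover a contiguous 35-position window, consistent with the claim that relatively prime rates cover a larger continuous region with few layers.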
Optionally, the hole convolution layer includes a plurality of groups of hole convolution layers, each group including a plurality of sequentially connected hole convolution layers whose expansion rates increase monotonically and are relatively prime. The step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes: inputting the word-related information into the first group of the plurality of groups of hole convolution layers to obtain the output of the first group; for each hole convolution layer except the last one, adding the input of that hole convolution layer to its output and taking the sum as the input of the next hole convolution layer; and splicing the outputs of the plurality of hole convolution layers in the last group together as the output of the hole convolution layer.
Optionally, the number of hole convolution layers included in each group is 3 and their expansion rates are 1, 2 and 5 in sequence, or the number of hole convolution layers included in each group is 4 and their expansion rates are 1, 2, 5 and 9 in sequence.
Optionally, the step of inputting the output of the hole convolution layer into the local attention mechanism layer to obtain the output of the local attention mechanism layer includes: inputting the output of the hole convolution layer into the local attention mechanism layer to calculate the correlations between the feature of each position in the output of the hole convolution layer and the features within a predetermined range of that position; and obtaining the final output feature of each position as the output of the local attention mechanism layer based on the correlations, the feature of each position and the features within the predetermined range.
Optionally, the correlation is calculated by the following equations:

h_{i,i′} = tanh(W_q·x_i + W_x·x_{i′} + b_q)

e_{i,i′} = σ(W_a·h_{i,i′} + b_a)

where i is the current position of interest, i′ is a position within the attention range d relative to position i (i.e., i − ⌊d/2⌋ ≤ i′ ≤ i + ⌊d/2⌋, where ⌊·⌋ is the floor function), x_i is the input feature of position i, x_{i′} is the input feature of position i′, W_q and W_x are trainable parameters for linearly transforming the current input feature x_i and the attended input feature x_{i′} respectively, b_q is a bias term, h_{i,i′} is a relative representation of the feature of position i′ with respect to position i, W_a and b_a are parameters for linearly transforming h_{i,i′}, e_{i,i′} is the degree of correlation of position i′ with respect to position i, and σ is the sigmoid activation function.
Optionally, the final output features of the respective positions are obtained by the following equations:

a_i = softmax(e_i)

v_i = Σ_{i′} a_{i,i′}·x_{i′}

where e_i is the vector of correlation degrees e_{i,i′} of all positions i′ relative to position i, a_i is the normalized attention vector of all positions i′ relative to position i, and v_i is the final output feature of position i.
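A minimal numpy sketch of this local attention computation follows. The feature dimension, window size, and the choice to truncate the window at sequence boundaries are illustrative assumptions, not specifications from the patent.

```python
import numpy as np

def local_attention(x, Wq, Wx, bq, Wa, ba, d=5):
    """Local attention over a window of d positions centred on each i:
    h_{i,i'} = tanh(Wq.x_i + Wx.x_{i'} + bq)
    e_{i,i'} = sigmoid(Wa.h_{i,i'} + ba)
    a_i      = softmax of e_i over the window
    v_i      = sum_{i'} a_{i,i'} x_{i'}"""
    n, _ = x.shape
    half = d // 2
    v = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # truncate at boundaries
        window = x[lo:hi]                                # features x_{i'}
        h = np.tanh(x[i] @ Wq + window @ Wx + bq)        # relative representations
        e = 1.0 / (1.0 + np.exp(-(h @ Wa + ba)))         # correlation degrees
        ex = np.exp(e)
        a = ex / ex.sum()                                # normalised attention weights
        v[i] = a @ window                                # weighted sum of window features
    return v

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 4))
Wq, Wx = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
bq, Wa, ba = np.zeros(4), rng.standard_normal(4), 0.0
out = local_attention(x, Wq, Wx, bq, Wa, ba)
print(out.shape)  # (7, 4)
```

Each output row is a convex combination of the features in that position's window, so the sequence length and feature dimension are preserved.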
Alternatively, relative position information p_{i′} may be spliced to the input feature x_{i′} to form a new input feature x′_{i′}: x′_{i′} = x_{i′} ‖ p_{i′}.
Optionally, the classification layer is a linear-chain conditional random field (CRF) layer.
Optionally, the text conversion layer is an embedding layer, and the step of inputting the training text into the text conversion layer to obtain the word-related information output by the text conversion layer includes: inputting the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer. Alternatively, the text conversion layer includes an embedding layer and a bidirectional language model, and the step includes: inputting the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer, inputting the word-related information output by the embedding layer into the bidirectional language model to obtain the word-related information output by the bidirectional language model, and splicing the two together as the word-related information output by the text conversion layer.
Optionally, the training method further comprises: coding the marking information of the training text; and decoding the named entity information output by the classification layer.
Optionally, the step of encoding the labeling information of the training text includes performing BIO encoding on the labeling information of the training text, and the step of decoding the named entity information output by the classification layer includes performing BIO decoding on the named entity information output by the classification layer, wherein B denotes the first word of a named entity, I denotes the remaining words of that named entity, and O denotes words that are not part of any named entity.
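As a small illustration of BIO coding (the tokens, entity spans, and type labels below are invented for the example, not taken from the patent):

```python
def bio_encode(tokens, entities):
    """entities: list of (start, end, type) spans over tokens, end exclusive.
    B- marks the first token of an entity, I- its remaining tokens,
    O marks tokens outside any entity."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype
        for j in range(start + 1, end):
            tags[j] = "I-" + etype
    return tags

def bio_decode(tags):
    """Recover (start, end, type) spans from BIO tags."""
    spans, start = [], None
    for j, tag in enumerate(tags + ["O"]):   # sentinel O closes a trailing entity
        if start is not None and not tag.startswith("I-"):
            spans.append((start, j, cur))
            start = None
        if tag.startswith("B-"):
            start, cur = j, tag[2:]
    return spans

tokens = ["Wang", "Ming", "visited", "Beijing", "City"]
tags = bio_encode(tokens, [(0, 2, "PER"), (3, 5, "LOC")])
print(tags)  # ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC']
```

Decoding the tags recovers the original spans, which is what allows the classification layer's tag sequence to be turned back into named entities.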
One aspect of the present invention provides a named entity recognition method based on a neural network, wherein the neural network is trained by any one of the training methods described above, the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer, and the named entity recognition method includes: acquiring a predictive text to be identified; and inputting the predicted text into the neural network to obtain the named entity information output by the neural network.
One aspect of the present invention provides a training apparatus for a neural network for named entity recognition, wherein the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer, and a classification layer. The training apparatus comprises: an acquisition unit configured to acquire a training text and labeling information of the training text, wherein the labeling information includes named entity labels; a named entity information generating unit configured to: input the training text into the text conversion layer to obtain word-related information output by the text conversion layer; input the word-related information into the hole convolution layer to obtain the output of the hole convolution layer; input the output of the hole convolution layer into the local attention mechanism layer to obtain the output of the local attention mechanism layer; and input the output of the local attention mechanism layer into the classification layer to obtain named entity information output by the classification layer; and a training unit configured to calculate the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and train the neural network according to the loss.
Optionally, the hole convolution layer includes a plurality of sequentially connected hole convolution layers whose expansion rates increase monotonically and are relatively prime, and the named entity information generating unit is configured to: input the word-related information into the first of the plurality of hole convolution layers to respectively obtain the outputs of the plurality of hole convolution layers; and splice the outputs of the plurality of hole convolution layers together as the output of the hole convolution layer.
Optionally, the number of the plurality of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers are 1, 2, 5 and 9 in sequence.
Optionally, the hole convolution layer includes a plurality of groups of hole convolution layers, each group including a plurality of sequentially connected hole convolution layers whose expansion rates increase monotonically and are relatively prime, and the named entity information generating unit is configured to: input the word-related information into the first group of the plurality of groups of hole convolution layers to obtain the output of the first group; for each hole convolution layer except the last one, add the input of that hole convolution layer to its output and take the sum as the input of the next hole convolution layer; and splice the outputs of the plurality of hole convolution layers in the last group together as the output of the hole convolution layer.
Optionally, the number of hole convolution layers included in each group is 3 and their expansion rates are 1, 2 and 5 in sequence, or the number of hole convolution layers included in each group is 4 and their expansion rates are 1, 2, 5 and 9 in sequence.
Optionally, the named entity information generating unit is configured to: inputting the output of the hole convolution layer to a local attention mechanism layer to calculate correlations between features of each position in the output of the hole convolution layer and features within a predetermined range thereof; and obtaining final output characteristics of each position as the output of the local attention mechanism layer based on the correlation, the characteristics of each position and the characteristics in the preset range.
Optionally, the named entity information generating unit is configured to calculate the correlation by the following equations:

h_{i,i′} = tanh(W_q·x_i + W_x·x_{i′} + b_q)

e_{i,i′} = σ(W_a·h_{i,i′} + b_a)

where i is the current position of interest, i′ is a position within the attention range d relative to position i (i.e., i − ⌊d/2⌋ ≤ i′ ≤ i + ⌊d/2⌋, where ⌊·⌋ is the floor function), x_i is the input feature of position i, x_{i′} is the input feature of position i′, W_q and W_x are trainable parameters for linearly transforming the current input feature and the attended input feature respectively, b_q is a bias term, h_{i,i′} is a relative representation of the feature of position i′ with respect to position i, W_a and b_a are parameters for linearly transforming h_{i,i′}, e_{i,i′} is the degree of correlation of position i′ with respect to position i, and σ is the sigmoid activation function.
Optionally, the named entity information generating unit is configured to obtain the final output features of the respective positions by the following equations:

a_i = softmax(e_i)

v_i = Σ_{i′} a_{i,i′}·x_{i′}

where e_i is the vector of correlation degrees of all positions i′ relative to position i, a_i is the normalized attention vector of all positions i′ relative to position i, and v_i is the final output feature of position i.
Optionally, the named entity information generating unit is configured to splice relative position information p_{i′} to the input feature x_{i′} to form a new input feature x′_{i′}: x′_{i′} = x_{i′} ‖ p_{i′}.
Optionally, the classification layer is a linear-chain conditional random field (CRF) layer.
Optionally, the text conversion layer is an embedding layer, and the named entity information generating unit is configured to input the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer. Alternatively, the text conversion layer includes an embedding layer and a bidirectional language model, and the named entity information generating unit is configured to: input the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer, input the word-related information output by the embedding layer into the bidirectional language model to obtain the word-related information output by the bidirectional language model, and splice the two together as the word-related information output by the text conversion layer.
Optionally, the training device further comprises: the encoding unit is configured to encode the labeling information of the training text; a decoding unit configured to decode the named entity information output by the classification layer.
Optionally, the encoding unit is configured to perform BIO encoding on the labeling information of the training text, and the decoding unit is configured to perform BIO decoding on the named entity information output by the classification layer, wherein B denotes the first word of a named entity, I denotes the remaining words of that named entity, and O denotes words that are not part of any named entity.
An aspect of the present invention provides a named entity recognition apparatus based on a neural network, wherein the neural network is trained by the training method as described above, the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer, and a classification layer, and the named entity recognition apparatus includes: an acquisition unit configured to acquire a predicted text to be recognized; and the named entity information generating unit is configured to input the predicted text into the neural network to obtain the named entity information output by the neural network.
An aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by one or more computing devices, causes the one or more computing devices to carry out any of the methods described above.
An aspect of the invention provides a system comprising one or more computing devices and one or more storage devices having a computer program recorded thereon, which when executed by the one or more computing devices, causes the one or more computing devices to carry out any of the methods as described above.
According to the above technical scheme for performing named entity recognition using the hole convolution layer and the local attention mechanism layer, on one hand, the hole convolution layer enlarges the processing range of features and thereby improves computational efficiency; on the other hand, the local attention mechanism layer alleviates the problem of insufficiently salient feature information and thereby improves the accuracy of named entity recognition.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The above and other objects and features of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow diagram illustrating a method of training a neural network for named entity recognition, according to an embodiment of the present invention;
FIG. 2 shows a schematic diagram of a neural network including hole convolution layers with expansion rates of 1, 2, and 5 in order, according to an embodiment of the present invention;
FIG. 3 shows a schematic diagram of a neural network comprising two sets of hole convolution layers with expansion rates of 1, 2 and 5 in order, according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating coverage of a hole convolution layer using a set of successively increasing and relatively prime numbers as the expansion ratio of a set of hole convolution layers in accordance with an embodiment of the present invention;
FIG. 5 shows a schematic diagram of a text conversion layer according to an embodiment of the invention;
FIG. 6 shows a schematic diagram of the text conversion layer of FIG. 5 during training;
FIG. 7 illustrates a schematic diagram of the text conversion layer of FIG. 6 after training is complete;
FIG. 8 illustrates a neural network-based named entity recognition method, according to an embodiment of the present invention;
FIG. 9 illustrates a training apparatus for a neural network for named entity recognition, in accordance with an embodiment of the present invention;
fig. 10 illustrates a neural network-based named entity recognition apparatus according to an embodiment of the present invention.
Detailed Description
The following description is provided with reference to the accompanying drawings to assist in a comprehensive understanding of exemplary embodiments of the invention as defined by the claims and their equivalents. The description includes various specific details to aid understanding, but these details are to be regarded as illustrative only. Thus, one of ordinary skill in the art will recognize that: various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present invention. Moreover, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
In the present invention, the neural network used for named entity recognition may include a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer, and a classification layer. This will be described in more detail below with reference to the accompanying drawings.
FIG. 1 is a flow diagram illustrating a method of training a neural network for named entity recognition, according to an embodiment of the present invention.
Referring to fig. 1, a training method of a neural network for named entity recognition according to an embodiment of the present invention includes steps S110 to S160.
In step S110, a training text is obtained, and labeling information of the training text is obtained, where the labeling information of the training text includes a named entity label.
Here, the named entity labels of the training text indicate whether the training text belongs to a named entity and, if so, to which class of named entity it belongs. As a non-limiting example, the named entity label of the training text "Beijing City" may indicate that it belongs to the place-name class of named entities, the named entity label of the training text "Wang Ming" may indicate that it belongs to the person-name class, and the named entity label of the training text "beautiful" may indicate that it does not belong to any named entity.
In step S120, the training text is input to the text conversion layer, and the word-related information output by the text conversion layer is obtained.
That is, the text conversion layer of the present invention is configured to convert training text into word-related information. Here, the word-related information may be regarded as information having a mapping relation with the training text. In this case, the text conversion layer of the present invention may have various structures for converting training text into word-related information.
In one embodiment, the text conversion layer is an embedded layer, in which case, the inputting of the training text into the text conversion layer to obtain the word related information output by the text conversion layer includes: and inputting the training text into the pre-trained embedding layer to obtain the word related information output by the embedding layer.
In another embodiment, the text conversion layer includes an embedding layer and a bidirectional language model. In this case, inputting the training text into the text conversion layer to obtain the word-related information output by the text conversion layer includes: inputting the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer, inputting the word-related information output by the embedding layer into the bidirectional language model to obtain the word-related information output by the bidirectional language model, and splicing the two together as the word-related information output by the text conversion layer. This embodiment will be described in more detail later in connection with fig. 5.
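The splicing described above can be sketched schematically as follows. The `embed` and `bi_lm` callables are toy stand-ins (a one-dimensional "embedding" and a bi-LM that doubles it), invented purely to show the wiring:

```python
def text_conversion(tokens, embed, bi_lm):
    """Text conversion layer variant with an embedding layer plus a
    bidirectional language model: the bi-LM consumes the embedding
    output, and both outputs are spliced (concatenated) per token."""
    e = [embed(t) for t in tokens]             # word-related info from the embedding layer
    b = bi_lm(e)                               # word-related info from the bi-LM
    return [ei + bi for ei, bi in zip(e, b)]   # per-token concatenation

# Toy stand-ins for illustration only.
embed = lambda t: [float(len(t))]
bi_lm = lambda es: [[2.0 * v[0]] for v in es]
out = text_conversion(["named", "entity"], embed, bi_lm)
print(out)  # [[5.0, 10.0], [6.0, 12.0]]
```

The output feature of each token is thus the embedding vector extended by the contextual vector, which is what the downstream hole convolution layer consumes.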
Although some embodiments of text conversion layers are shown above, the invention is not so limited and any other network layer that can implement the functionality of the text conversion layers of the invention is also possible.
In step S130, the word-related information is input to the hole convolution layer, and an output of the hole convolution layer is obtained.
Here, hole convolution (dilated convolution) is a convolution operation that increases the spacing between the input elements used in the calculation without changing the original convolution kernel size; the spacing is expressed by the expansion rate, and an expansion rate of 1 corresponds to an ordinary convolution operation. Taking a convolution kernel of size 3 as an example, assuming that the input is x and the weight of one kernel is W, the output of the kernel at position i with expansion rate d is W·[x_{i−d}; x_i; x_{i+d}]. In other words, hole convolution provides a larger field of view at the same computational cost, or equivalently, for the same field of view, hole convolution reduces the amount of calculation and improves operating efficiency.
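The kernel-size-3 case just described can be sketched as a toy one-dimensional convolution. Zero padding outside the sequence is an assumption for the example; the patent does not specify boundary handling:

```python
def hole_conv1d(x, w, d):
    """Kernel-size-3 hole (dilated) convolution at expansion rate d:
    the output at position i is w . [x_{i-d}; x_i; x_{i+d}], with
    zero padding outside the sequence."""
    n = len(x)
    get = lambda j: x[j] if 0 <= j < n else 0.0
    return [w[0] * get(i - d) + w[1] * get(i) + w[2] * get(i + d)
            for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(hole_conv1d(x, [1.0, 1.0, 1.0], 1))  # [3.0, 6.0, 9.0, 12.0, 9.0] (ordinary convolution)
print(hole_conv1d(x, [1.0, 1.0, 1.0], 2))  # [4.0, 6.0, 9.0, 6.0, 8.0]  (reaches 2 positions away)
```

With the same three weights, rate 2 draws on inputs up to two positions away from i, illustrating the enlarged field of view at unchanged kernel size.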
In one embodiment, the hole convolution layer includes a plurality of hole convolution layers connected in series with sequentially increasing expansion rates and being relatively prime. In this case, the step of inputting the word-related information to the hole convolution layer to obtain an output of the hole convolution layer includes: inputting the word-related information into a first cavity convolution layer of the plurality of cavity convolution layers to respectively obtain outputs of the plurality of cavity convolution layers; the outputs of the plurality of hole convolution layers are spliced together as the output of the hole convolution layer.
In a conventional hole convolution layer, a power of 2 (e.g., 1, 2, 4, 8, 16, etc.) is generally adopted as the expansion ratio of a group of hole convolution layers. However, in the present embodiment, by using a set of sequentially increasing and coprime numbers as the expansion ratio of a set of hole convolution layers, a larger range is covered using a smaller number of layers, thereby further reducing the amount of calculation compared to the existing hole convolution layers. In one example, the number of the hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers are 1, 2, 5 and 9 in sequence, which is set to ensure that a continuous characteristic region is covered, and will be described later with reference to fig. 4. However, the invention is not limited thereto, and other combinations of prime expansion ratios are possible. For ease of understanding, the neural network including the hole convolution layers having expansion rates of 1, 2, and 5 in this order will be described in more detail later with reference to fig. 2.
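The "continuous characteristic region" property claimed for coprime expansion rates can be checked by enumerating the offsets reachable from one position through a stack of kernel-size-3 hole convolutions. This small script is our own illustration of that property, not code from the patent:

```python
def coverage(dilations, kernel_size=3):
    """Offsets reachable from one output position after stacking
    hole convolutions with the given expansion rates."""
    reach = {0}
    half = kernel_size // 2
    for d in dilations:
        taps = [k * d for k in range(-half, half + 1)]
        reach = {r + t for r in reach for t in taps}
    return sorted(reach)

cov = coverage([1, 2, 5])
# Any offset between the extremes that is unreachable would be a "hole"
# in the covered context; for coprime rates 1, 2, 5 there are none.
gaps = [o for o in range(cov[0], cov[-1] + 1) if o not in set(cov)]
```

Running this confirms a gap-free region, so the three layers jointly cover a contiguous span of context.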
In another embodiment, the hole convolution layer includes a plurality of groups of hole convolution layers, each group including a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime, and the step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes: inputting the word-related information into the first group of hole convolution layers of the plurality of groups to obtain the output of the first group; for each group except the last group, adding the input of that group and the output of that group, and using the sum as the input of the next group; and splicing the outputs of the plurality of hole convolution layers in the last group together to serve as the output of the hole convolution layer.
In a conventional hole convolution layer, a power of 2 (e.g., 1, 2, 4, 8, 16, etc.) is generally adopted as the expansion ratio of a group of hole convolution layers. However, in the present embodiment, by using a plurality of sets of successively increasing and coprime numbers as the expansion ratios of the plurality of sets of hole convolution layers, a larger range is covered with a smaller number of layers, thereby further reducing the amount of calculation compared to the conventional hole convolution layers. Here, it is necessary to ensure that the number of kernels of the convolution of the last layer of holes in each group is the same as the dimension of the input feature, that is, the output dimension of the last layer in each group is the same as the input dimension of the first layer, so that the input and output features can be added to ensure the transmissibility of the gradient. Meanwhile, the convolution of a plurality of groups of holes further expands the range of context processing, and simultaneously, the identification capability is enhanced due to the nonlinear accumulation. In one example, the number of the plurality of hole convolution layers included in each group of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers included in each group of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2, 5 and 9 in sequence. However, the invention is not limited thereto, and other combinations of prime expansion ratios are possible. For ease of understanding, the sets of void convolution layers having expansion ratios of 1, 2, and 5 in this order will be described in more detail below with reference to fig. 3.
In step S140, the output of the hole convolution layer is input to the local attention mechanism layer, and the output of the local attention mechanism layer is obtained.
Here, the local attention mechanism layer may be used to calculate the correlation of different positions of the sequence features, and the features with strong correlation are used for calculation to ensure high accuracy of named entity identification. In one embodiment, the step of inputting the output of the void convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer includes: inputting the output of the hole convolution layer to a local attention mechanism layer to calculate correlations between features of each position in the output of the hole convolution layer and features within a predetermined range thereof; and obtaining final output characteristics of each position as the output of the local attention mechanism layer based on the correlation, the characteristics of each position and the characteristics in the preset range.
As an example, the correlation may be calculated by the following equation:
h_{i,i′} = tanh(W_q · x_i + W_x · x_{i′} + b_q)
e_{i,i′} = σ(W_a · h_{i,i′} + b_a)
where i is the current position of interest, i′ is a position within the attention range d relative to position i, i.e., i′ ∈ [i − ⌊d/2⌋, i + ⌊d/2⌋], ⌊·⌋ denotes the floor (rounding-down) function, x_i is the input feature at position i, x_{i′} is the input feature at position i′, W_q and W_x are trainable parameters that apply linear transformations to the input feature x_i and the attended input feature x_{i′} respectively, b_q is a bias term, h_{i,i′} is a relative representation of the feature at position i′ with respect to position i, W_a and b_a are parameters that apply a linear transformation to h_{i,i′}, e_{i,i′} is the degree of correlation of position i′ with respect to position i, and σ is the sigmoid activation function, i.e., σ(z) = 1/(1 + e^{−z}). In one example, the parameters in the above formulas may be learned by gradient back-propagation of the loss of the final layer of the neural network.
As an example, the final output characteristics of the respective positions are obtained by the following equations:
a_i = softmax(e_i)
v_i = Σ_{i′} a_{i,i′} · x_{i′}
where e_i is the attention vector of all positions i′ relative to position i, a_i is the normalized attention vector of all positions i′ relative to position i, and v_i is the final output feature at position i.
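The attention equations above can be sketched end to end in NumPy. The parameter shapes (W_q, W_x as H×F matrices, W_a as an H-vector) and the boundary clipping are our own assumptions for a runnable illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_attention(x, Wq, Wx, bq, Wa, ba, d):
    """x: (length, F). Each position i attends to positions within
    [i - d//2, i + d//2], clipped at the sequence boundaries."""
    L, _ = x.shape
    r = d // 2
    out = np.zeros_like(x)
    for i in range(L):
        window = list(range(max(0, i - r), min(L, i + r + 1)))
        # e_{i,i'} = sigmoid(Wa . tanh(Wq.x_i + Wx.x_{i'} + bq) + ba)
        e = np.array([sigmoid(Wa @ np.tanh(Wq @ x[i] + Wx @ x[j] + bq) + ba)
                      for j in window])
        a = np.exp(e) / np.exp(e).sum()                      # a_i = softmax(e_i)
        out[i] = sum(w * x[j] for w, j in zip(a, window))    # v_i
    return out

x = np.arange(8, dtype=float).reshape(4, 2)
Wq = 0.1 * np.eye(2)
Wx = 0.1 * np.eye(2)
bq = np.zeros(2)
Wa = np.ones(2)
ba = 0.0
v1 = local_attention(x, Wq, Wx, bq, Wa, ba, d=1)  # window of one: output equals input
v3 = local_attention(x, Wq, Wx, bq, Wa, ba, d=3)
```

With d = 1 the window contains only the position itself, so the softmax weight is 1 and the layer reduces to the identity, which is a convenient sanity check.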
Alternatively, relative position information p_{i′} may be spliced onto the input feature x_{i′} to form a new input feature x′_{i′}: x′_{i′} = x_{i′} || p_{i′}. Here, if x_{i′} has dimension (string length, original feature dimension) and p_{i′} has dimension (string length, 1), then the concatenated feature x′_{i′} has dimension (string length, original feature dimension + 1).
In step S150, the output of the local attention mechanism layer is input to the classification layer, and the named entity information output by the classification layer is obtained.
In one embodiment, the classification layer is a linear-chain conditional random field (CRF) layer. However, the present invention is not limited thereto, and the classification layer of the present invention may be another classification layer (for example, a classification layer composed of a fully-connected layer and a normalization layer).
In step S160, based on the named entity information and the corresponding named entity label output by the classification layer, the loss of the neural network is calculated, and the neural network is trained according to the loss of the neural network.
In addition, optionally, the training method in fig. 1 may further include: encoding the annotation information of the training text; and decoding the named entity information output by the classification layer. In one example, the step of encoding the annotation information of the training text includes: performing BIO encoding on the annotation information of the training text, and the step of decoding the named entity information output by the classification layer includes: performing BIO decoding on the named entity information output by the classification layer, where B denotes the first word of a named entity, I denotes the remaining words of that named entity, and O denotes words that do not belong to any named entity. As a non-limiting example, the text "Clinton president also signals mr. costmann, which is a welcome to the international humanistic work undertaken by VOAs." can be BIO-encoded as: "B-PER I-PER I-PER O O O O B-ORG I-ORG I-ORG O O O O O O O O O O O O O O O O O", where PER denotes a person name and ORG denotes an organization name. Optionally, BIO decoding may be implemented as follows: enumerate each position in turn; if the current label is of class B (such as B-PER), take it as the start position and continue enumerating downwards for as long as the labels are I labels of the same type (for example, the corresponding I-PER); the last such position is the end position, and the span from the start position to the end position is an entity of that type.
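The BIO decoding procedure described above can be written as a short function. This is a sketch of the enumeration procedure only; the tag strings follow the B-TYPE/I-TYPE/O convention from the text:

```python
def bio_decode(labels):
    """Decode a BIO tag sequence into (type, start, end) entity spans,
    following the enumeration procedure described in the text."""
    entities = []
    i = 0
    while i < len(labels):
        if labels[i].startswith("B-"):
            etype = labels[i][2:]
            j = i + 1
            # Extend the span while we keep seeing I- tags of the same type.
            while j < len(labels) and labels[j] == f"I-{etype}":
                j += 1
            entities.append((etype, i, j - 1))
            i = j
        else:
            i += 1
    return entities

tags = ["B-PER", "I-PER", "I-PER", "O", "B-ORG", "I-ORG", "O"]
spans = bio_decode(tags)
# → [("PER", 0, 2), ("ORG", 4, 5)]
```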
Optionally, the neural network for named entity recognition of the present invention may also be optimized. Optimization methods include, but are not limited to, stochastic gradient descent (SGD), the adaptive gradient method (AdaGrad), the adaptive learning rate method (AdaDelta), adaptive moment estimation (Adam), and the like.
FIG. 2 shows a schematic diagram of a neural network including hole convolution layers having expansion rates of 1, 2, and 5 in this order, according to an embodiment of the present invention.
Although fig. 2 shows three hole convolution layers having expansion rates of 1, 2, and 5 in this order, this is for illustrative purposes only, and the present invention is not limited thereto. As described above with reference to fig. 1, the number of hole convolution layers is not limited to 3, and their expansion rates are not limited to 1, 2, and 5.
With reference to fig. 1 and 2, the text conversion layer may convert the received text into word-related information and output it to the hole convolution layer with an expansion rate of 1. That layer performs a hole convolution operation on the received word-related information to obtain a first hole convolution operation result and outputs it to the hole convolution layer with an expansion rate of 2. The layer with an expansion rate of 2 performs a hole convolution operation on the first result to obtain a second hole convolution operation result and outputs it to the hole convolution layer with an expansion rate of 5. The layer with an expansion rate of 5 performs a hole convolution operation on the second result to obtain a third hole convolution operation result. The local attention mechanism layer may receive the concatenation of the first, second, and third hole convolution operation results, operate on it, and output the result to the classification layer. Here, the splicing may mean that the three operation results are concatenated by position, so that the feature dimension is tripled, allowing information of different granularity ranges to be utilized simultaneously.
As a non-limiting example, when the first hole convolution operation result is [ 100 ], the second hole convolution operation result is [ 010 ], and the third hole convolution operation result is [ 001 ], the concatenation result of the first hole convolution operation result, the second hole convolution operation result, and the third hole convolution operation result may be [ 100010001 ]. The classification layer outputs named entity information by calculating the operation result of the local attention mechanism layer.
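The splice in the example above is a concatenation along the feature axis. In this NumPy sketch the one-row arrays stand in for single-position feature maps; for a sequence of length L with F features per layer, the same call turns three (L, F) outputs into one (L, 3F) output:

```python
import numpy as np

r1 = np.array([[1, 0, 0]])   # first hole convolution operation result
r2 = np.array([[0, 1, 0]])   # second hole convolution operation result
r3 = np.array([[0, 0, 1]])   # third hole convolution operation result

# Concatenate by position: the feature dimension is tripled.
spliced = np.concatenate([r1, r2, r3], axis=-1)
# → [[1 0 0 0 1 0 0 0 1]]
```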
FIG. 3 shows a schematic diagram of a neural network including two sets of hole convolution layers with expansion rates of 1, 2, and 5 in order, according to an embodiment of the present invention.
Although fig. 3 shows that the neural network includes two groups of three hole convolution layers having expansion rates of 1, 2, and 5 in this order, this is for illustrative purposes only, and the present invention is not limited thereto. As described above with reference to fig. 1, the number of groups of hole convolution layers is not limited to two, the number of hole convolution layers in each group is not limited to 3, and the expansion rates are not limited to 1, 2, and 5. In the present invention, multiple groups of hole convolution layers can be connected using a ResNet-like structure.
With reference to fig. 1 and 3, the text conversion layer may convert the received text into word-related information and output it to the hole convolution layer with an expansion rate of 1 in the first group of hole convolution layers. That layer performs a hole convolution operation on the received word-related information to obtain a first hole convolution operation result and outputs it to the hole convolution layer with an expansion rate of 2 in the first group. The layer with an expansion rate of 2 performs a hole convolution operation on the first result to obtain a second hole convolution operation result and outputs it to the layer with an expansion rate of 5 in the first group. The layer with an expansion rate of 5 performs a hole convolution operation on the second result to obtain a third hole convolution operation result. The third result is then added to the word-related information output by the text conversion layer, and the sum may serve as the input of the hole convolution layer with an expansion rate of 1 in the second group of hole convolution layers. Here, it is necessary to ensure that the output dimension of the last hole convolution layer in each group equals the input dimension of the first layer, so that the input and output features can be added, guaranteeing the transmissibility of the gradient.
As a non-limiting example, when the input of the first hole convolution layer (expansion rate 1) of the first group is [ 1 0 0 1 ] and the output of the third hole convolution layer (expansion rate 5) of the first group is [ 0 0 0 1 ], their addition result is [ 1 0 0 2 ]. The classification layer outputs named entity information by operating on the result of the local attention mechanism layer.
In FIG. 3, the use of multiple sets of hole convolution layers further expands the scope of context processing, while enhancing recognition capabilities due to the non-linear accumulation.
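The group-to-group residual wiring of fig. 3 can be sketched as follows. Here `conv_group` is a shape-preserving placeholder standing in for one group of hole convolutions (rates 1, 2, 5 in the text); the actual convolutions are not implemented in this illustration:

```python
import numpy as np

def conv_group(x):
    """Stand-in for one group of hole convolutions. Per the text, the last
    layer of each group must restore the input dimension, so the output
    shape equals the input shape and the residual addition is well-defined."""
    return np.tanh(x)  # placeholder computation, same shape as input

def run_groups(x, n_groups):
    # For every group except the last, add the group's input to its output
    # (ResNet-style) and feed the sum to the next group.
    for _ in range(n_groups - 1):
        x = x + conv_group(x)
    return conv_group(x)  # the last group's output is used directly

features = np.ones((4, 3))           # (sequence length, feature dimension)
out = run_groups(features, n_groups=2)
```

The addition between groups is what preserves gradient flow; the stacked non-linearities are what the text credits with the enhanced recognition capability.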
FIG. 4 shows a schematic diagram of the coverage of a hole convolution layer using a set of successively increasing and coprime numbers as the expansion ratio of a set of hole convolution layers, according to an embodiment of the present invention.
Referring to fig. 4, one circle may indicate word related information of one word. Here, a row of circles may indicate one hole convolution layer. In the embodiment of FIG. 4, the set of hole convolution expansion rates is [1, 2, 5], and when the kernel size is 3, the contexts of 3, 7, 15 can be captured for each position in turn. It can be seen that each layer uses information from a bottom contiguous region, and that the top layer has an expansion ratio of 5 and uses all information from the region of length 15. It should be understood that fig. 4 omits a number of lines and partial circles for ease of illustration, and the present invention is not limited to the lines and circles shown in fig. 4.
By using a set of successively increasing and coprime numbers as the expansion rates of a set of hole convolution layers, the coverage of the features is increased, thereby improving computational efficiency.
FIG. 5 shows a schematic diagram of a text conversion layer according to an embodiment of the invention.
Referring to FIG. 5, a text conversion layer may include an embedding layer and a bi-directional language model. The embedding layer may receive text and convert it into first word-related information. The bi-directional language model may receive the first word-related information output by the embedding layer and output second word-related information. The first word-related information and the second word-related information may be concatenated together to represent the word features of the current position with context information (i.e., the final word-related information, the "output" in fig. 5). As a non-limiting example, the dimension of the second word-related information may be twice that of the first word-related information. For example, when the first word-related information is [ 0 0 1 ], the second word-related information may be [ 1 0 0 0 1 0 ], in which case the final word-related information may be [ 0 0 1 1 0 0 0 1 0 ]. Here, the present invention may pre-train the embedding layer and the bi-directional language model using crawled unlabeled corpora or text. A pre-trained embedding layer and bi-directional language model may be understood as an embedding layer and bi-directional language model whose parameters have been initialized. Further, note that in the present invention the bi-directional language model is an optional feature that may sacrifice some efficiency to provide context-dependent features for greater accuracy.
In one example, the embedding layer may use Skip-Gram to obtain the first word-related information. However, the present invention is not limited thereto, and Continuous Bag of Words (CBOW), Global Vectors (GloVe), the open-source library fastText, and the like may also be used to obtain the first word-related information. For example, the first word-related information may be a word vector.
FIG. 6 shows a schematic diagram of the text conversion layer of FIG. 5 during training.
Referring to fig. 6, the embedding layer 610 is an embedding layer that has been pre-trained; it converts the input text "named entity recognition" into first word-related information and inputs that information into a recurrent neural network (i.e., RNN network) in the forward direction and the reverse direction, respectively. In fig. 6, one circle represents the word-related information of one word, and EOS represents the end of a sentence or text. Although fig. 6 illustrates the forward network and the reverse network as RNN networks, the present invention is not limited thereto. The forward network and the reverse network may also include, but are not limited to, one or more layers of Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), bidirectional Long Short-Term Memory (Bi-LSTM), bidirectional Gated Recurrent Unit (Bi-GRU), and the like.
Here, the combination of the forward RNN network, the reverse RNN network, and the softmax network layer (a fully-connected layer using softmax as its activation function) may be regarded as a bi-directional language model. In the present invention, the network layers RNN1 through RNNn of the forward RNN network and the corresponding softmax layer of the bi-directional language model may be used to predict the probability distribution of the next word given a word, and the network layers RNN1 through RNNn of the reverse RNN network and the corresponding softmax layer may be used to predict the probability distribution of the previous word given a word. Specifically, in fig. 6, assuming the first five words of "named entity recognition" are known, the forward RNN network may be trained to output word-related information indicating that the next word has the highest corresponding probability; assuming the last five words are known, the reverse RNN network may be trained to output word-related information indicating that the preceding word has the highest probability. In one example, if, when the RNN is unrolled, a position's prediction target would fall beyond the text, that position does not participate in the calculation even with "EOS". Existing language prediction models are generally forward-only models, i.e., they consider only the preceding context. However, in the embodiment of fig. 6 of the present invention, the bi-directional language model considers not only the preceding context but also the following context, and is computed from the word-related information output by the embedding layer, so that the word-related information output by the text conversion layer is more accurate. In addition, in fig. 6, the text conversion layer (or the bi-directional language model) may be trained using a cross-entropy loss function; however, the present invention is not limited thereto, and other existing loss functions may also be used to train the text conversion layer.
FIG. 7 shows a schematic diagram of the text conversion layer of FIG. 6 after training is complete.
Compared with the text conversion layer during training in fig. 6, the text conversion layer in fig. 7 removes the final softmax fully-connected layers, retains the structures and weights of the remaining layers, and concatenates the hidden states of the final forward and reverse RNNs by time (or position) to serve as the bi-directional language model feature of the current position. The final output of the bi-directional language model may be referred to as the Bi-LM feature.
FIG. 8 illustrates a neural network-based named entity identification method, according to an embodiment of the invention.
Here, the neural network in fig. 8 may be a neural network trained by any one of the training methods described with reference to fig. 1.
Referring to fig. 8, in step S810, a predicted text to be recognized is acquired; in step S820, the predicted text is input to the neural network, and the named entity information output by the neural network is obtained.
FIG. 9 illustrates a training apparatus for a neural network for named entity recognition, according to an embodiment of the present invention.
Here, the training apparatus of fig. 9 may be an apparatus configured to perform any of the training methods described with reference to fig. 1.
Referring to fig. 9, a training apparatus 900 of a neural network for named entity recognition includes an acquisition unit 910, a named entity information generation unit 920, and a training unit 930. The acquisition unit 910, the named entity information generation unit 920, and the training unit 930 will be described in more detail below.
In the present invention, the obtaining unit 910 may be configured to obtain a training text and obtain labeling information of the training text, where the labeling information of the training text includes a named entity label.
Here, the named entity labels of the training text indicate whether the training text belongs to a named entity or to which named entity the training text belongs. As a non-limiting example, the named entity label of the training text "beijing city" may indicate that the training text "beijing city" belongs to a place name named entity, the named entity label of the training text "wangming" may indicate that the training text "wangming" belongs to a person name named entity, and the named entity label of the training text "beautiful" may indicate that the training text "beautiful" does not belong to a named entity.
In the present invention, the named entity information generating unit 920 may be configured to: and inputting the training text into a text conversion layer to obtain the word related information output by the text conversion layer.
That is, the named entity information generating unit 920 may be configured to convert the training text into word related information by using a text conversion layer. Here, the word-related information may be regarded as information having a mapping relation with the training text. In this case, the text conversion layer of the present invention may have various structures for converting training text into word-related information.
In one embodiment, the text conversion layer is an embedded layer, in which case the named entity information generating unit 920 may be configured to: and inputting the training text into the pre-trained embedding layer to obtain the word related information output by the embedding layer.
In another embodiment, the text conversion layer includes an embedding layer and a bi-directional language model. In this case, the named entity information generating unit 920 may be configured to: respectively inputting the training texts into a pre-trained embedding layer to obtain word related information output by the embedding layer, inputting the word related information output by the embedding layer into a bidirectional language model to obtain the word related information output by the bidirectional language model, and splicing the word related information output by the embedding layer and the word related information output by the bidirectional language model together to be used as the word related information output by a text conversion layer. Further, the description related to the text conversion layer with reference to fig. 4 is also applicable here.
Although some embodiments of text conversion layers are shown above, the invention is not so limited and any other network layer that can implement the functionality of the text conversion layers of the invention is also possible.
In the present invention, the named entity information generating unit 920 may be further configured to: inputting the word-related information into the void convolutional layer to obtain the output of the void convolutional layer.
Here, hole convolution (also known as dilated convolution) is a convolution operation that widens the spacing between the input positions used in the calculation without changing the original convolution kernel size; the spacing is expressed by the expansion rate, and an expansion rate of 1 corresponds to an ordinary convolution operation. Taking a convolution kernel of size 3 as an example, assuming that the input is x and the weight of one kernel is W, when the expansion rate is d the output of the kernel at position i is the sum of W · [x_{i−d}; x_i; x_{i+d}]. In other words, hole convolution provides a larger receptive field under the same calculation conditions, or, equivalently, for the same receptive field it reduces the amount of calculation and improves operational efficiency.
In one embodiment, the hole convolution layer includes a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime. In this case, the named entity information generating unit 920 may be configured to: input the word-related information into the first of the plurality of hole convolution layers to obtain the outputs of the plurality of hole convolution layers respectively; and splice the outputs of the plurality of hole convolution layers together as the output of the hole convolution layer.
In a conventional hole convolution layer, a power of 2 (e.g., 1, 2, 4, 8, 16, etc.) is generally adopted as the expansion ratio of a group of hole convolution layers. However, in the present embodiment, by using a set of sequentially increasing and coprime numbers as the expansion ratio of a set of hole convolution layers, a larger range is covered using a smaller number of layers, thereby further reducing the amount of calculation compared to the existing hole convolution layers. In one example, the number of the hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers are sequentially 1, 2, and 5, or the number of the plurality of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers are sequentially 1, 2, 5, and 9, which can ensure coverage of a continuous feature region, and the description made in conjunction with fig. 2 is also applicable thereto. However, the invention is not limited thereto, and other combinations of prime expansion ratios are possible.
In another embodiment, the hole convolution layer includes a plurality of groups of hole convolution layers, each group including a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime. In this case, the named entity information generating unit 920 may be configured to: input the word-related information into the first group of hole convolution layers of the plurality of groups to obtain the output of the first group; for each group except the last group, add the input of that group and the output of that group, and use the sum as the input of the next group; and splice the outputs of the plurality of hole convolution layers in the last group together to serve as the output of the hole convolution layer.
In a conventional hole convolution layer, a power of 2 (e.g., 1, 2, 4, 8, 16, etc.) is generally adopted as the expansion ratio of a group of hole convolution layers. However, in the present embodiment, by using a plurality of sets of successively increasing and coprime numbers as the expansion ratios of the plurality of sets of hole convolution layers, a larger range is covered with a smaller number of layers, thereby further reducing the amount of calculation compared to the conventional hole convolution layers. Here, it is necessary to ensure that the number of kernels of the convolution of the last layer of holes in each group is the same as the dimension of the input feature, that is, the output dimension of the last layer in each group is the same as the input dimension of the first layer, so that the input and output features can be added to ensure the transmissibility of the gradient. Meanwhile, the convolution of a plurality of groups of holes further expands the range of context processing, and simultaneously, the identification capability is enhanced due to the nonlinear accumulation. In one example, the number of the plurality of hole convolution layers included in each group of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers included in each group of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2, 5 and 9 in sequence. However, the invention is not limited thereto, and other combinations of prime expansion ratios are possible. Furthermore, the description made in connection with fig. 3 also applies here.
In the present invention, the named entity information generating unit 920 may be further configured to input the output of the hole convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer.
Here, the local attention mechanism layer may be used to calculate the correlation between different positions of the sequence features, so that strongly correlated features contribute to the computation, ensuring high accuracy of named entity recognition. In one embodiment, the named entity information generating unit 920 may be configured to: input the output of the hole convolution layer to the local attention mechanism layer to calculate the correlation between the feature at each position and the features within a predetermined range around it; and obtain, based on the correlations, the feature at each position, and the features within the predetermined range, the final output feature of each position as the output of the local attention mechanism layer.
As an example, the named entity information generating unit 920 may calculate the correlation by the following equations:

h_{i,i'} = \tanh(W_q \cdot x_i + W_x \cdot x_{i'} + b_q)

e_{i,i'} = \sigma(W_a \cdot h_{i,i'} + b_a)

where i is the current position of interest, i' is a position within the attention range d relative to position i, i.e., i' \in [i - \lfloor d/2 \rfloor, i + \lfloor d/2 \rfloor] with \lfloor \cdot \rfloor the floor (down-rounding) function, x_i is the input feature of position i, x_{i'} is the attention input feature at position i', W_q and W_x are trainable parameters that linearly transform the input feature x_i and the attention input feature x_{i'} respectively, b_q is a bias term, h_{i,i'} is a relative representation of the feature at position i' with respect to position i, W_a and b_a are parameters that linearly transform h_{i,i'}, e_{i,i'} is the degree of correlation of position i' with respect to position i, and \sigma is the sigmoid activation function, \sigma(z) = 1/(1 + e^{-z}). In one example, the parameters in the above equations may be obtained through training. For example, they may be learned by back-propagating the gradient of the loss at the final layer of the neural network.
As an example, the named entity information generating unit 920 may obtain the final output feature of each position by the following equations:

a_i = \mathrm{softmax}(e_i)

v_i = \sum_{i'} a_{i,i'} \, x_{i'}

where e_i is the attention vector of all positions i' relative to position i, a_i is the normalized attention vector of all positions i' relative to position i, and v_i is the final output feature of position i.
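The two equation groups above (the correlation scores, then the softmax-normalized weighted sum) can be sketched in NumPy. This is a sketch under assumptions: the clipping of the window at sequence edges and the hidden width of h are not specified in the text; parameter shapes simply follow the equations.

```python
import numpy as np

def local_attention(x, Wq, Wx, bq, Wa, ba, d=5):
    """Local attention over a (seq_len, dim) feature matrix x, following
    the equations above:
      h = tanh(Wq.x_i + Wx.x_i' + bq)   relative representation
      e = sigmoid(Wa.h + ba)            correlation of i' w.r.t. i
      a_i = softmax(e_i)                normalized over the window
      v_i = sum_i' a_{i,i'} x_i'        final output feature
    The window of size d is centered on i and clipped at the edges."""
    seq_len, dim = x.shape
    half = d // 2
    v = np.zeros_like(x)
    for i in range(seq_len):
        lo, hi = max(0, i - half), min(seq_len, i + half + 1)
        window = x[lo:hi]                          # features x_i' in range
        h = np.tanh(x[i] @ Wq + window @ Wx + bq)  # (win, hidden)
        e = 1.0 / (1.0 + np.exp(-(h @ Wa + ba)))   # sigmoid correlations
        a = np.exp(e - e.max())
        a /= a.sum()                               # softmax over the window
        v[i] = a @ window                          # weighted sum of x_i'
    return v
```

In a real model these weights would be trained by back-propagation; here they are just placeholders with the right shapes.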
Alternatively, the named entity information generating unit 920 may splice relative position information p_{i'} onto the input feature x_{i'} to form a new input feature x'_{i'} = x_{i'} \| p_{i'}. Here, if x_{i'} has dimensions (string length, original feature dimension) and p_{i'} has dimensions (string length, 1), then the spliced feature x'_{i'} has dimensions (string length, original feature dimension + 1).
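A minimal sketch of this splicing step in NumPy. The exact form of the relative position information p_{i'} is not given here, so a signed window offset is used purely as a placeholder assumption.

```python
import numpy as np

def concat_relative_position(window, offsets):
    """Append a relative-position column p_i' to each feature row x_i'
    in a local window, giving x'_i' = x_i' || p_i'.
    `offsets` is one scalar per row; a signed offset (i' - i) is used
    here only as an illustrative choice, not the patent's definition."""
    p = np.asarray(offsets, dtype=float).reshape(-1, 1)  # (win, 1)
    return np.concatenate([window, p], axis=1)           # (win, dim + 1)
```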
In the present invention, the named entity information generating unit 920 may be further configured to input the output of the local attention mechanism layer to the classification layer to obtain the named entity information output by the classification layer.
In one embodiment, the classification layer is a linear-chain conditional random field (CRF) layer. However, the present invention is not limited thereto, and the classification layer of the present invention may be another classification layer (for example, one composed of a fully-connected layer and a normalization layer).
Further, optionally, the training apparatus 900 of the neural network for named entity recognition may further include: an encoding unit configured to encode the labeling information of the training text; and a decoding unit configured to decode the named entity information output by the classification layer. In one example, the encoding unit is configured to perform BIO encoding on the labeling information of the training text, and the decoding unit is configured to perform BIO decoding on the named entity information output by the classification layer, where B denotes the first word of a named entity, I denotes the remaining words of that named entity, and O denotes a word that is not part of a named entity.
As a non-limiting example, for a text such as "President Clinton also expressed appreciation for the international humanitarian work done by the VOA", the coding unit may produce the BIO encoding "B-PER I-PER I-PER O O O O B-ORG I-ORG I-ORG O O O O O O O O O O O O O O O O O", where PER denotes a person name and ORG denotes an organization name.
Alternatively, the decoding unit may implement BIO decoding as follows: enumerate each position in turn; if the current label is a B-type label (e.g., B-PER), take it as a start position and continue enumerating until a label that is not the corresponding I-type label (e.g., I-PER) is reached, which marks the end position; the span from the start position to the end position is then an entity of that type.
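The decoding procedure above can be sketched directly in Python; the (start, end, type) span representation is an assumption chosen for illustration.

```python
def bio_decode(labels):
    """Decode a BIO label sequence into (start, end, type) entity spans,
    following the enumeration procedure described above: a B-X tag opens
    a span, which extends over consecutive I-X tags of the same type."""
    entities, i = [], 0
    while i < len(labels):
        if labels[i].startswith("B-"):
            etype, start = labels[i][2:], i
            i += 1
            while i < len(labels) and labels[i] == "I-" + etype:
                i += 1                      # extend over matching I- tags
            entities.append((start, i - 1, etype))
        else:
            i += 1                          # O tags (and stray I- tags) are skipped
    return entities
```

For example, `bio_decode(["B-PER", "I-PER", "I-PER", "O", "B-ORG", "I-ORG", "O"])` yields one PER span covering positions 0-2 and one ORG span covering positions 4-5.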
In addition, optionally, the training apparatus 900 of the neural network for named entity recognition may also optimize the neural network of the present invention. Optimization methods include, but are not limited to, stochastic gradient descent (SGD), the adaptive gradient method (AdaGrad), the adaptive learning rate method (AdaDelta), adaptive moment estimation (Adam), and the like.
Fig. 10 illustrates a neural network-based named entity recognition apparatus according to an embodiment of the present invention.
Referring to fig. 10, the named entity recognition apparatus 1000 includes an obtaining unit 1010 and a named entity information generating unit 1020, wherein the obtaining unit 1010 is configured to obtain a predicted text to be recognized, and the named entity information generating unit 1020 is configured to input the predicted text to a neural network, resulting in named entity information output by the neural network.
Here, the neural network of the named entity recognition apparatus 1000 may be a neural network trained by any one of the training methods described with reference to fig. 1.
The training method and the training apparatus of the neural network for named entity recognition and the method and the apparatus for named entity recognition based on the neural network according to the exemplary embodiments of the present invention have been described above with reference to fig. 1 to 10. However, it should be understood that: the devices, systems, units, etc. used in fig. 1-10 may each be configured as software, hardware, firmware, or any combination thereof that performs a particular function. For example, these systems, devices, units, etc. may correspond to dedicated integrated circuits, to pure software code, or to a combination of software and hardware. Further, one or more functions implemented by these systems, apparatuses, or units, etc. may also be uniformly executed by components in a physical entity device (e.g., processor, client, server, etc.).
Further, the above-described method may be implemented by a computer program recorded on a computer-readable storage medium. For example, according to an exemplary embodiment of the present invention, a computer-readable storage medium may be provided, having stored thereon a computer program which, when executed by one or more computing devices, causes the one or more computing devices to implement any of the methods disclosed in the present application.
For example, the computer program, when executed by one or more computing devices, causes the one or more computing devices to perform the steps of: acquiring a training text and labeling information of the training text, wherein the labeling information of the training text includes named entity labels; inputting the training text into the text conversion layer to obtain the word-related information output by the text conversion layer; inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer; inputting the output of the hole convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer; inputting the output of the local attention mechanism layer to the classification layer to obtain the named entity information output by the classification layer; and calculating the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and training the neural network according to the loss.
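As a hedged illustration of the listed steps, the sketch below wires up toy stand-ins for each layer in PyTorch: an embedding as the text conversion layer, a single dilated convolution as the hole convolution layer, and a per-token linear softmax classifier in place of the CRF classification layer; the local attention mechanism layer is omitted for brevity. All sizes and the simplified classifier are assumptions, not the patent's configuration.

```python
import torch
import torch.nn as nn

vocab_size, dim, num_tags, seq_len = 100, 16, 5, 12
embed = nn.Embedding(vocab_size, dim)                   # text conversion layer (stand-in)
conv = nn.Conv1d(dim, dim, 3, dilation=2, padding=2)    # hole convolution layer (stand-in)
classify = nn.Linear(dim, num_tags)                     # classification layer (stand-in for the CRF)
params = (list(embed.parameters()) + list(conv.parameters())
          + list(classify.parameters()))
opt = torch.optim.Adam(params)                          # one of the listed optimizers

tokens = torch.randint(0, vocab_size, (2, seq_len))     # toy training text
tags = torch.randint(0, num_tags, (2, seq_len))         # toy named-entity labels

h = embed(tokens).transpose(1, 2)                       # (batch, dim, seq)
h = torch.relu(conv(h)).transpose(1, 2)                 # (batch, seq, dim)
logits = classify(h)                                    # (batch, seq, num_tags)
loss = nn.functional.cross_entropy(logits.reshape(-1, num_tags),
                                   tags.reshape(-1))    # loss vs. entity labels
opt.zero_grad(); loss.backward(); opt.step()            # train on the loss
```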
For another example, the computer program, when executed by one or more computing devices, causes the one or more computing devices to perform the steps of: acquiring a predicted text to be recognized; and inputting the predicted text into the neural network to obtain the named entity information output by the neural network.
The computer program in the computer-readable storage medium may be executed in an environment deployed on computer devices such as clients, hosts, proxy apparatuses, and servers. It should be noted that the computer program may also be used to perform additional steps beyond those listed above, or to perform more specific processing when performing them; the contents of these additional steps and further processing have been mentioned in the description of the related methods and apparatuses with reference to figs. 1 to 10, and are not repeated here.
It should be noted that the neural network training method and apparatus for named entity recognition and the neural-network-based named entity recognition method and apparatus according to the exemplary embodiments of the present invention may rely entirely on the execution of a computer program to implement the corresponding functions, wherein each unit of the apparatus or system corresponds to a step in the functional architecture of the computer program, so that the entire apparatus or system may be invoked through a special software package (e.g., a lib library) to implement the corresponding functions.
On the other hand, when each unit or device mentioned in fig. 1 to 10 is implemented in software, firmware, middleware or microcode, a program code or a code segment for performing the corresponding operation may be stored in a computer-readable storage medium such as a storage medium, so that a computing device (e.g., a processor) may perform the corresponding operation by reading and executing the corresponding program code or code segment.
For example, a system according to embodiments of the invention comprises one or more computing devices and one or more storage devices, wherein the one or more storage devices have stored therein a computer program that, when executed by the one or more computing devices, causes the one or more computing devices to implement any of the methods disclosed herein. For example, causing the one or more computing devices to perform the steps of: acquiring a training text and acquiring marking information of the training text, wherein the marking information of the training text comprises named entity marking; inputting the training text into a text conversion layer to obtain word related information output by the text conversion layer; inputting the word-related information into the cavity convolution layer to obtain the output of the cavity convolution layer; inputting the output of the cavity convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer; inputting the output of the local attention mechanism layer to a classification layer to obtain named entity information output by the classification layer; and calculating the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and training the neural network according to the loss of the neural network. For another example, the one or more computing devices are caused to perform the steps of: acquiring a predictive text to be identified; and inputting the predicted text into the neural network to obtain the named entity information output by the neural network.
In particular, the computing devices described above may be deployed in servers as well as on node devices in a distributed network environment. Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, or touch input device). All components of the computing device may be connected to each other via a bus and/or a network.
The computing device here need not be a single device, but may be any collection of devices or circuits that can execute the above instructions (or instruction sets), either individually or in combination. The computing device may also be part of an integrated control computing device or computing device manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
The computing device for performing the training method or the named entity recognition method of the neural network according to the exemplary embodiments of the present invention may be a processor, and such a processor may include a Central Processing Unit (CPU), a Graphic Processing Unit (GPU), a programmable logic device, a dedicated processor, a microcontroller, or a microprocessor. By way of example, and not limitation, the processor may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like. The processor may execute instructions or code stored in one of the storage devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The storage device may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage device may comprise a stand-alone device, such as an external disk drive, storage array, or other storage device usable by any database computing device. The storage device and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the storage device.
It should be noted that the exemplary implementations of the present invention focus on solving the problems of current named entity recognition methods: a small feature processing range, low computational efficiency, and low recognition accuracy. Specifically, in the technical scheme of performing named entity recognition with a hole convolution layer and a local attention mechanism layer, the embodiments of the present invention on one hand enlarge the processing range of the features by using the hole convolution layer, thereby improving computational efficiency; on the other hand, they utilize the local attention mechanism layer to emphasize salient feature information, thereby increasing the accuracy of named entity recognition.
While exemplary embodiments of the present application have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present application is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present application. Therefore, the protection scope of the present application shall be subject to the scope of the claims.

Claims (10)

1. A training method for a neural network for named entity recognition, wherein the neural network comprises a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer, the training method comprising:
acquiring a training text and labeling information of the training text, wherein the labeling information of the training text includes named entity labels;
inputting the training text into a text conversion layer to obtain word related information output by the text conversion layer;
inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer;
inputting the output of the hole convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer;
inputting the output of the local attention mechanism layer to a classification layer to obtain named entity information output by the classification layer;
and calculating the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and training the neural network according to the loss of the neural network.
2. The training method according to claim 1, wherein the hole convolution layer comprises a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime,
the step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes:
inputting the word-related information into the first hole convolution layer of the plurality of hole convolution layers to obtain the outputs of the plurality of hole convolution layers respectively;
and splicing the outputs of the plurality of hole convolution layers together to serve as the output of the hole convolution layer.
3. The training method of claim 2, wherein the number of the plurality of hole convolution layers is 3 and the expansion rates of the plurality of hole convolution layers are 1, 2, and 5 in order, or the number of the plurality of hole convolution layers is 4 and the expansion rates of the plurality of hole convolution layers are 1, 2, 5, and 9 in order.
4. The training method of claim 1, wherein the hole convolution layer comprises a plurality of groups of hole convolution layers, each group comprising a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime,
the step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes:
inputting the word-related information into the first group of the plurality of groups of hole convolution layers to obtain the output of the first group of hole convolution layers;
for each group of hole convolution layers except the last group, adding the input of the group and the output of the group, and taking the sum as the input of the next group of hole convolution layers;
splicing together the outputs of the plurality of hole convolution layers in the last group of hole convolution layers as the output of the hole convolution layers.
5. The training method according to claim 4, wherein the number of the plurality of hole convolution layers included in each group of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2, and 5 in this order, or the number of the plurality of hole convolution layers included in each group of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2, 5, and 9 in this order.
6. A named entity recognition method based on a neural network, wherein the neural network is trained by the training method according to any one of claims 1 to 5, the neural network comprises a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer, and the named entity recognition method comprises the following steps:
acquiring a predicted text to be recognized;
and inputting the predicted text into the neural network to obtain the named entity information output by the neural network.
7. A training apparatus for a neural network for named entity recognition, wherein the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer, and a classification layer, the training apparatus comprising:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is configured to acquire a training text and acquire labeling information of the training text, and the labeling information of the training text comprises named entity labels;
a named entity information generating unit configured to:
inputting the training text into a text conversion layer to obtain word related information output by the text conversion layer;
inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer;
inputting the output of the hole convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer;
inputting the output of the local attention mechanism layer to a classification layer to obtain named entity information output by the classification layer;
and the training unit is configured to calculate the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and train the neural network according to the loss of the neural network.
8. A named entity recognition apparatus based on a neural network, wherein the neural network is trained by the training apparatus of claim 7, the neural network comprising a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer, the named entity recognition apparatus comprising:
an acquisition unit configured to acquire a predicted text to be recognized;
and the named entity information generating unit is configured to input the predicted text into the neural network to obtain the named entity information output by the neural network.
9. A computer-readable storage medium having stored thereon a computer program that, when executed by one or more computing devices, causes the one or more computing devices to implement the method of any of claims 1-6.
10. A system comprising one or more computing devices and one or more storage devices having a computer program recorded thereon, which, when executed by the one or more computing devices, causes the one or more computing devices to carry out the method of any of claims 1-6.
CN201811357670.3A 2018-11-15 2018-11-15 Neural network training method and device and named entity recognition method and device Active CN111191038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811357670.3A CN111191038B (en) 2018-11-15 2018-11-15 Neural network training method and device and named entity recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811357670.3A CN111191038B (en) 2018-11-15 2018-11-15 Neural network training method and device and named entity recognition method and device

Publications (2)

Publication Number Publication Date
CN111191038A true CN111191038A (en) 2020-05-22
CN111191038B CN111191038B (en) 2024-05-10

Family

ID=70707057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811357670.3A Active CN111191038B (en) 2018-11-15 2018-11-15 Neural network training method and device and named entity recognition method and device

Country Status (1)

Country Link
CN (1) CN111191038B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN112183494A (en) * 2020-11-05 2021-01-05 新华三大数据技术有限公司 Character recognition method and device based on neural network and storage medium
CN113192534A (en) * 2021-03-23 2021-07-30 汉海信息技术(上海)有限公司 Address search method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270911A1 (en) * 2016-03-17 2017-09-21 Kabushiki Kaisha Toshiba Training apparatus, training method, and computer program product
CN107885721A (en) * 2017-10-12 2018-04-06 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM
CN108334499A (en) * 2018-02-08 2018-07-27 海南云江科技有限公司 A kind of text label tagging equipment, method and computing device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270911A1 (en) * 2016-03-17 2017-09-21 Kabushiki Kaisha Toshiba Training apparatus, training method, and computer program product
CN107885721A (en) * 2017-10-12 2018-04-06 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM
CN108334499A (en) * 2018-02-08 2018-07-27 海南云江科技有限公司 A kind of text label tagging equipment, method and computing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴晨玥 (Wu Chenyue): "Retinal blood vessel image segmentation based on an improved convolutional neural network", Acta Optica Sinica, pages 1-13 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN111860351B (en) * 2020-07-23 2021-04-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN112183494A (en) * 2020-11-05 2021-01-05 新华三大数据技术有限公司 Character recognition method and device based on neural network and storage medium
CN113192534A (en) * 2021-03-23 2021-07-30 汉海信息技术(上海)有限公司 Address search method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111191038B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
JP6955580B2 (en) Document summary automatic extraction method, equipment, computer equipment and storage media
CN110597970B (en) Multi-granularity medical entity joint identification method and device
CN110232183B (en) Keyword extraction model training method, keyword extraction device and storage medium
CN107918782B (en) Method and system for generating natural language for describing image content
CN108536679B (en) Named entity recognition method, device, equipment and computer readable storage medium
CN108920460B (en) Training method of multi-task deep learning model for multi-type entity recognition
CN112487807B (en) Text relation extraction method based on expansion gate convolutional neural network
CN111783462A (en) Chinese named entity recognition model and method based on dual neural network fusion
CN112329465A (en) Named entity identification method and device and computer readable storage medium
CN109657226B (en) Multi-linkage attention reading understanding model, system and method
CN110866401A (en) Chinese electronic medical record named entity identification method and system based on attention mechanism
CN108604311B (en) Enhanced neural network with hierarchical external memory
CN111191038B (en) Neural network training method and device and named entity recognition method and device
WO2023134082A1 (en) Training method and apparatus for image caption statement generation module, and electronic device
JP7178513B2 (en) Chinese word segmentation method, device, storage medium and computer equipment based on deep learning
WO2019220113A1 (en) Device and method for natural language processing
RU2712101C2 (en) Prediction of probability of occurrence of line using sequence of vectors
CN111611805A (en) Auxiliary writing method, device, medium and equipment based on image
KR20230072454A (en) Apparatus, method and program for bidirectional generation between image and text
JP2020008836A (en) Method and apparatus for selecting vocabulary table, and computer-readable storage medium
US20220129671A1 (en) Document Information Extraction Without Additional Annotations
US20230042327A1 (en) Self-supervised learning with model augmentation
CN112740200A (en) System and method for end-to-end deep reinforcement learning based on coreference resolution
CN113177406B (en) Text processing method, text processing device, electronic equipment and computer readable medium
CN113420869B (en) Translation method based on omnidirectional attention and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant