CN111191038A - Neural network training method and device and named entity identification method and device - Google Patents


Info

Publication number: CN111191038A
Application number: CN201811357670.3A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN111191038B (granted publication)
Inventors: 赵汉光 (Zhao Hanguang), 王珵 (Wang Cheng), 戴文渊 (Dai Wenyuan), 陈雨强 (Chen Yuqiang)
Current and original assignee: 4Paradigm Beijing Technology Co Ltd
Application filed by 4Paradigm Beijing Technology Co Ltd; priority to CN201811357670.3A
Legal status: granted, currently active (the listed status is an assumption and is not a legal conclusion)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks


Abstract

A neural network training method and apparatus and a named entity recognition method and apparatus are provided. In the training method, the neural network for named entity recognition comprises a pre-trained text conversion layer, a hole convolution (i.e., dilated convolution) layer, a local attention mechanism layer and a classification layer. The training method comprises: acquiring a training text and its labeling information, wherein the labeling information comprises named entity labels; inputting the training text into the text conversion layer to obtain word-related information output by the text conversion layer; inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer; inputting the output of the hole convolution layer into the local attention mechanism layer to obtain the output of the local attention mechanism layer; inputting the output of the local attention mechanism layer into the classification layer to obtain named entity information output by the classification layer; and calculating the loss of the neural network based on the named entity information and the corresponding named entity labels, and training the neural network according to the loss.

Description

Neural network training method and device and named entity identification method and device
Technical Field
The present invention relates to named entity recognition, and more particularly, to a neural network training method and apparatus for named entity recognition, and a neural network-based named entity recognition method and apparatus.
Background
Named Entity Recognition (NER) is a technique for identifying and categorizing named entities that appear in text. For example, named entities may include three major classes (entity, time, and number) and seven minor classes (person name, organization name, place name, time, date, currency, and percentage). Named entity recognition is a fundamental task in natural language processing and a key technology in many applications (e.g., information retrieval, information extraction, and machine translation). Therefore, research on automatic identification of named entities has important theoretical significance and practical value.
As applications of named entity recognition continue to expand, the demands placed on named entity recognition technology are also increasing. However, in existing named entity recognition technology, because a Chinese named entity may contain many character units, the range of features processed for named entity recognition is small and the computational efficiency is low. Meanwhile, the feature information used for named entity recognition is often not salient enough, which results in low accuracy of named entity recognition.
Disclosure of Invention
The invention aims to provide a neural network training method and device for named entity recognition and a named entity recognition method and device based on the neural network.
One aspect of the present invention provides a training method for a neural network for named entity recognition, wherein the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer. The training method includes: acquiring a training text and labeling information of the training text, wherein the labeling information includes named entity labels; inputting the training text into the text conversion layer to obtain word-related information output by the text conversion layer; inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer; inputting the output of the hole convolution layer into the local attention mechanism layer to obtain the output of the local attention mechanism layer; inputting the output of the local attention mechanism layer into the classification layer to obtain named entity information output by the classification layer; and calculating the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and training the neural network according to the loss.
Optionally, the hole convolution layer includes a plurality of sequentially connected hole convolution layers whose expansion rates increase monotonically and are relatively prime, and the step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes: inputting the word-related information into the first of the plurality of hole convolution layers to respectively obtain the outputs of the plurality of hole convolution layers; and splicing the outputs of the plurality of hole convolution layers together as the output of the hole convolution layer.
Optionally, the number of the plurality of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers are 1, 2, 5 and 9 in sequence.
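As an illustrative check (not part of the patent text) of why these expansion rates work, the offsets reachable by stacking kernel-size-3 hole convolutions can be enumerated: each layer at rate d sees offsets {−d, 0, +d}, and a stack sums one offset from each layer. Rates 1, 2, 5 then cover a contiguous 17-position window, while the conventional powers-of-2 choice 1, 2, 4 covers only 15 positions with the same number of layers.

```python
from itertools import product

def coverage(rates, kernel=3):
    """Offsets reachable by a stack of kernel-size-`kernel` hole (dilated)
    convolutions: one layer at rate d sees offsets {-d, 0, +d}, and
    stacking layers sums one offset chosen from each layer."""
    half = kernel // 2
    per_layer = [[k * d for k in range(-half, half + 1)] for d in rates]
    return {sum(combo) for combo in product(*per_layer)}

print(sorted(coverage([1, 2, 5])) == list(range(-8, 9)))   # True: contiguous, width 17
print(len(coverage([1, 2, 4])))                            # 15: powers of 2 cover less
```

The same enumeration shows that rates 1, 2, 5, 9 cover a contiguous 35-position window, consistent with the claim that relatively prime rates cover a larger continuous region with few layers.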
Optionally, the hole convolution layer includes a plurality of groups of hole convolution layers, each group including a plurality of sequentially connected hole convolution layers whose expansion rates increase monotonically and are relatively prime. The step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes: inputting the word-related information into the first group of the plurality of groups of hole convolution layers to obtain the output of the first group; for each hole convolution layer except the last one, adding the input of that hole convolution layer to its output and taking the sum as the input of the next hole convolution layer; and splicing the outputs of the plurality of hole convolution layers in the last group together as the output of the hole convolution layer.
Optionally, the number of hole convolution layers included in each group is 3 and their expansion rates are 1, 2 and 5 in sequence, or the number of hole convolution layers included in each group is 4 and their expansion rates are 1, 2, 5 and 9 in sequence.
Optionally, the step of inputting the output of the hole convolution layer into the local attention mechanism layer to obtain the output of the local attention mechanism layer includes: inputting the output of the hole convolution layer into the local attention mechanism layer to calculate the correlations between the feature of each position in the output of the hole convolution layer and the features within a predetermined range of that position; and obtaining the final output feature of each position as the output of the local attention mechanism layer based on the correlations, the feature of each position and the features within the predetermined range.
Optionally, the correlation is calculated by the following equations:

h_{i,i′} = tanh(W_q·x_i + W_x·x_{i′} + b_q)

e_{i,i′} = σ(W_a·h_{i,i′} + b_a)

where i is the current position of interest, i′ is a position within the attention range d relative to position i (i.e., i − ⌊d/2⌋ ≤ i′ ≤ i + ⌊d/2⌋, where ⌊·⌋ is the floor function), x_i is the input feature of position i, x_{i′} is the input feature of position i′, W_q and W_x are trainable parameters for linearly transforming the current input feature x_i and the attended input feature x_{i′} respectively, b_q is a bias term, h_{i,i′} is a relative representation of the feature of position i′ with respect to position i, W_a and b_a are parameters for linearly transforming h_{i,i′}, e_{i,i′} is the degree of correlation of position i′ with respect to position i, and σ is the sigmoid activation function.
Optionally, the final output features of the respective positions are obtained by the following equations:

a_i = softmax(e_i)

v_i = Σ_{i′} a_{i,i′}·x_{i′}

where e_i is the vector of correlation degrees e_{i,i′} of all positions i′ relative to position i, a_i is the normalized attention vector of all positions i′ relative to position i, and v_i is the final output feature of position i.
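A minimal numpy sketch of this local attention computation follows. The feature dimension, window size, and the choice to truncate the window at sequence boundaries are illustrative assumptions, not specifications from the patent.

```python
import numpy as np

def local_attention(x, Wq, Wx, bq, Wa, ba, d=5):
    """Local attention over a window of d positions centred on each i:
    h_{i,i'} = tanh(Wq.x_i + Wx.x_{i'} + bq)
    e_{i,i'} = sigmoid(Wa.h_{i,i'} + ba)
    a_i      = softmax of e_i over the window
    v_i      = sum_{i'} a_{i,i'} x_{i'}"""
    n, _ = x.shape
    half = d // 2
    v = np.zeros_like(x)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)  # truncate at boundaries
        window = x[lo:hi]                                # features x_{i'}
        h = np.tanh(x[i] @ Wq + window @ Wx + bq)        # relative representations
        e = 1.0 / (1.0 + np.exp(-(h @ Wa + ba)))         # correlation degrees
        ex = np.exp(e)
        a = ex / ex.sum()                                # normalised attention weights
        v[i] = a @ window                                # weighted sum of window features
    return v

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 4))
Wq, Wx = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
bq, Wa, ba = np.zeros(4), rng.standard_normal(4), 0.0
out = local_attention(x, Wq, Wx, bq, Wa, ba)
print(out.shape)  # (7, 4)
```

Each output row is a convex combination of the features in that position's window, so the sequence length and feature dimension are preserved.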
Alternatively, relative position information p_{i′} may be spliced to the input feature x_{i′} to form a new input feature x′_{i′}: x′_{i′} = x_{i′} ‖ p_{i′}.
Optionally, the classification layer is a linear-chain conditional random field (CRF) layer.
Optionally, the text conversion layer is an embedding layer, and the step of inputting the training text into the text conversion layer to obtain the word-related information output by the text conversion layer includes: inputting the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer. Alternatively, the text conversion layer includes an embedding layer and a bidirectional language model, and the step includes: inputting the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer, inputting the word-related information output by the embedding layer into the bidirectional language model to obtain the word-related information output by the bidirectional language model, and splicing the two together as the word-related information output by the text conversion layer.
Optionally, the training method further comprises: coding the marking information of the training text; and decoding the named entity information output by the classification layer.
Optionally, the step of encoding the labeling information of the training text includes performing BIO encoding on the labeling information of the training text, and the step of decoding the named entity information output by the classification layer includes performing BIO decoding on the named entity information output by the classification layer, wherein B denotes the first word of a named entity, I denotes the remaining words of that named entity, and O denotes words that are not part of any named entity.
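As a small illustration of BIO coding (the tokens, entity spans, and type labels below are invented for the example, not taken from the patent):

```python
def bio_encode(tokens, entities):
    """entities: list of (start, end, type) spans over tokens, end exclusive.
    B- marks the first token of an entity, I- its remaining tokens,
    O marks tokens outside any entity."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype
        for j in range(start + 1, end):
            tags[j] = "I-" + etype
    return tags

def bio_decode(tags):
    """Recover (start, end, type) spans from BIO tags."""
    spans, start = [], None
    for j, tag in enumerate(tags + ["O"]):   # sentinel O closes a trailing entity
        if start is not None and not tag.startswith("I-"):
            spans.append((start, j, cur))
            start = None
        if tag.startswith("B-"):
            start, cur = j, tag[2:]
    return spans

tokens = ["Wang", "Ming", "visited", "Beijing", "City"]
tags = bio_encode(tokens, [(0, 2, "PER"), (3, 5, "LOC")])
print(tags)  # ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC']
```

Decoding the tags recovers the original spans, which is what allows the classification layer's tag sequence to be turned back into named entities.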
One aspect of the present invention provides a named entity recognition method based on a neural network, wherein the neural network is trained by any one of the training methods described above, the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer, and the named entity recognition method includes: acquiring a predictive text to be identified; and inputting the predicted text into the neural network to obtain the named entity information output by the neural network.
One aspect of the present invention provides a training apparatus for a neural network for named entity recognition, wherein the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer, and a classification layer. The training apparatus comprises: an acquisition unit configured to acquire a training text and labeling information of the training text, wherein the labeling information includes named entity labels; a named entity information generating unit configured to: input the training text into the text conversion layer to obtain word-related information output by the text conversion layer; input the word-related information into the hole convolution layer to obtain the output of the hole convolution layer; input the output of the hole convolution layer into the local attention mechanism layer to obtain the output of the local attention mechanism layer; and input the output of the local attention mechanism layer into the classification layer to obtain named entity information output by the classification layer; and a training unit configured to calculate the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and train the neural network according to the loss.
Optionally, the hole convolution layer includes a plurality of sequentially connected hole convolution layers whose expansion rates increase monotonically and are relatively prime, and the named entity information generating unit is configured to: input the word-related information into the first of the plurality of hole convolution layers to respectively obtain the outputs of the plurality of hole convolution layers; and splice the outputs of the plurality of hole convolution layers together as the output of the hole convolution layer.
Optionally, the number of the plurality of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers are 1, 2, 5 and 9 in sequence.
Optionally, the hole convolution layer includes a plurality of groups of hole convolution layers, each group including a plurality of sequentially connected hole convolution layers whose expansion rates increase monotonically and are relatively prime, and the named entity information generating unit is configured to: input the word-related information into the first group of the plurality of groups of hole convolution layers to obtain the output of the first group; for each hole convolution layer except the last one, add the input of that hole convolution layer to its output and take the sum as the input of the next hole convolution layer; and splice the outputs of the plurality of hole convolution layers in the last group together as the output of the hole convolution layer.
Optionally, the number of hole convolution layers included in each group is 3 and their expansion rates are 1, 2 and 5 in sequence, or the number of hole convolution layers included in each group is 4 and their expansion rates are 1, 2, 5 and 9 in sequence.
Optionally, the named entity information generating unit is configured to: inputting the output of the hole convolution layer to a local attention mechanism layer to calculate correlations between features of each position in the output of the hole convolution layer and features within a predetermined range thereof; and obtaining final output characteristics of each position as the output of the local attention mechanism layer based on the correlation, the characteristics of each position and the characteristics in the preset range.
Optionally, the named entity information generating unit is configured to calculate the correlation by the following equations:

h_{i,i′} = tanh(W_q·x_i + W_x·x_{i′} + b_q)

e_{i,i′} = σ(W_a·h_{i,i′} + b_a)

where i is the current position of interest, i′ is a position within the attention range d relative to position i (i.e., i − ⌊d/2⌋ ≤ i′ ≤ i + ⌊d/2⌋, where ⌊·⌋ is the floor function), x_i is the input feature of position i, x_{i′} is the input feature of position i′, W_q and W_x are trainable parameters for linearly transforming the current input feature and the attended input feature respectively, b_q is a bias term, h_{i,i′} is a relative representation of the feature of position i′ with respect to position i, W_a and b_a are parameters for linearly transforming h_{i,i′}, e_{i,i′} is the degree of correlation of position i′ with respect to position i, and σ is the sigmoid activation function.
Optionally, the named entity information generating unit is configured to obtain the final output features of the respective positions by the following equations:

a_i = softmax(e_i)

v_i = Σ_{i′} a_{i,i′}·x_{i′}

where e_i is the vector of correlation degrees of all positions i′ relative to position i, a_i is the normalized attention vector of all positions i′ relative to position i, and v_i is the final output feature of position i.
Optionally, the named entity information generating unit is configured to splice relative position information p_{i′} to the input feature x_{i′} to form a new input feature x′_{i′}: x′_{i′} = x_{i′} ‖ p_{i′}.
Optionally, the classification layer is a linear-chain conditional random field (CRF) layer.
Optionally, the text conversion layer is an embedding layer, and the named entity information generating unit is configured to input the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer. Alternatively, the text conversion layer includes an embedding layer and a bidirectional language model, and the named entity information generating unit is configured to: input the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer, input the word-related information output by the embedding layer into the bidirectional language model to obtain the word-related information output by the bidirectional language model, and splice the two together as the word-related information output by the text conversion layer.
Optionally, the training device further comprises: the encoding unit is configured to encode the labeling information of the training text; a decoding unit configured to decode the named entity information output by the classification layer.
Optionally, the encoding unit is configured to perform BIO encoding on the labeling information of the training text, and the decoding unit is configured to perform BIO decoding on the named entity information output by the classification layer, wherein B denotes the first word of a named entity, I denotes the remaining words of that named entity, and O denotes words that are not part of any named entity.
An aspect of the present invention provides a named entity recognition apparatus based on a neural network, wherein the neural network is trained by the training method as described above, the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer, and a classification layer, and the named entity recognition apparatus includes: an acquisition unit configured to acquire a predicted text to be recognized; and the named entity information generating unit is configured to input the predicted text into the neural network to obtain the named entity information output by the neural network.
An aspect of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by one or more computing devices, causes the one or more computing devices to carry out any of the methods described above.
An aspect of the invention provides a system comprising one or more computing devices and one or more storage devices having a computer program recorded thereon, which when executed by the one or more computing devices, causes the one or more computing devices to carry out any of the methods as described above.
According to the above technical scheme for performing named entity recognition using the hole convolution layer and the local attention mechanism layer, on one hand, the hole convolution layer enlarges the processing range of features and thereby improves computational efficiency; on the other hand, the local attention mechanism layer alleviates the problem of insufficiently salient feature information and thereby improves the accuracy of named entity recognition.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The above and other objects and features of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow diagram illustrating a method of training a neural network for named entity recognition, according to an embodiment of the present invention;
FIG. 2 shows a schematic diagram of a neural network including hole convolution layers with expansion rates of 1, 2, and 5 in order, according to an embodiment of the present invention;
FIG. 3 shows a schematic diagram of a neural network comprising two sets of hole convolution layers with expansion rates of 1, 2 and 5 in order, according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating coverage of a hole convolution layer using a set of successively increasing and relatively prime numbers as the expansion ratio of a set of hole convolution layers in accordance with an embodiment of the present invention;
FIG. 5 shows a schematic diagram of a text conversion layer according to an embodiment of the invention;
FIG. 6 shows a schematic diagram of the text conversion layer of FIG. 5 during training;
FIG. 7 illustrates a schematic diagram of the text conversion layer of FIG. 6 after training is complete;
FIG. 8 illustrates a neural network-based named entity recognition method, according to an embodiment of the present invention;
FIG. 9 illustrates a training apparatus for a neural network for named entity recognition, in accordance with an embodiment of the present invention;
fig. 10 illustrates a neural network-based named entity recognition apparatus according to an embodiment of the present invention.
Detailed Description
The following description is provided with reference to the accompanying drawings to assist in a comprehensive understanding of exemplary embodiments of the invention as defined by the claims and their equivalents. The description includes various specific details to aid understanding, but these details are to be regarded as illustrative only. Thus, one of ordinary skill in the art will recognize that: various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present invention. Moreover, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
In the present invention, the neural network used for named entity recognition may include a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer, and a classification layer. This will be described in more detail below with reference to the accompanying drawings.
FIG. 1 is a flow diagram illustrating a method of training a neural network for named entity recognition, according to an embodiment of the present invention.
Referring to fig. 1, a training method of a neural network for named entity recognition according to an embodiment of the present invention includes steps S110 to S160.
In step S110, a training text is obtained, and labeling information of the training text is obtained, where the labeling information of the training text includes a named entity label.
Here, the named entity labels of the training text indicate whether the training text belongs to a named entity and, if so, to which class of named entity it belongs. As a non-limiting example, the named entity label of the training text "Beijing City" may indicate that it belongs to the place-name class of named entities, the named entity label of the training text "Wang Ming" may indicate that it belongs to the person-name class, and the named entity label of the training text "beautiful" may indicate that it does not belong to any named entity.
In step S120, the training text is input to the text conversion layer, and the word-related information output by the text conversion layer is obtained.
That is, the text conversion layer of the present invention is configured to convert training text into word-related information. Here, the word-related information may be regarded as information having a mapping relation with the training text. In this case, the text conversion layer of the present invention may have various structures for converting training text into word-related information.
In one embodiment, the text conversion layer is an embedded layer, in which case, the inputting of the training text into the text conversion layer to obtain the word related information output by the text conversion layer includes: and inputting the training text into the pre-trained embedding layer to obtain the word related information output by the embedding layer.
In another embodiment, the text conversion layer includes an embedding layer and a bidirectional language model. In this case, inputting the training text into the text conversion layer to obtain the word-related information output by the text conversion layer includes: inputting the training text into the pre-trained embedding layer to obtain the word-related information output by the embedding layer, inputting the word-related information output by the embedding layer into the bidirectional language model to obtain the word-related information output by the bidirectional language model, and splicing the two together as the word-related information output by the text conversion layer. This embodiment will be described in more detail later in connection with fig. 5.
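The splicing described above can be sketched schematically as follows. The `embed` and `bi_lm` callables are toy stand-ins (a one-dimensional "embedding" and a bi-LM that doubles it), invented purely to show the wiring:

```python
def text_conversion(tokens, embed, bi_lm):
    """Text conversion layer variant with an embedding layer plus a
    bidirectional language model: the bi-LM consumes the embedding
    output, and both outputs are spliced (concatenated) per token."""
    e = [embed(t) for t in tokens]             # word-related info from the embedding layer
    b = bi_lm(e)                               # word-related info from the bi-LM
    return [ei + bi for ei, bi in zip(e, b)]   # per-token concatenation

# Toy stand-ins for illustration only.
embed = lambda t: [float(len(t))]
bi_lm = lambda es: [[2.0 * v[0]] for v in es]
out = text_conversion(["named", "entity"], embed, bi_lm)
print(out)  # [[5.0, 10.0], [6.0, 12.0]]
```

The output feature of each token is thus the embedding vector extended by the contextual vector, which is what the downstream hole convolution layer consumes.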
Although some embodiments of text conversion layers are shown above, the invention is not so limited and any other network layer that can implement the functionality of the text conversion layers of the invention is also possible.
In step S130, the word-related information is input to the hole convolution layer, and an output of the hole convolution layer is obtained.
Here, hole convolution (dilated convolution) is a convolution operation that increases the spacing between the input elements used in the calculation without changing the original convolution kernel size; the spacing is expressed by the expansion rate, and an expansion rate of 1 corresponds to an ordinary convolution operation. Taking a convolution kernel of size 3 as an example, assuming that the input is x and the weight of one kernel is W, the output of the kernel at position i with expansion rate d is W·[x_{i−d}; x_i; x_{i+d}]. In other words, hole convolution provides a larger field of view at the same computational cost, or equivalently, for the same field of view, hole convolution reduces the amount of calculation and improves operating efficiency.
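The kernel-size-3 case just described can be sketched as a toy one-dimensional convolution. Zero padding outside the sequence is an assumption for the example; the patent does not specify boundary handling:

```python
def hole_conv1d(x, w, d):
    """Kernel-size-3 hole (dilated) convolution at expansion rate d:
    the output at position i is w . [x_{i-d}; x_i; x_{i+d}], with
    zero padding outside the sequence."""
    n = len(x)
    get = lambda j: x[j] if 0 <= j < n else 0.0
    return [w[0] * get(i - d) + w[1] * get(i) + w[2] * get(i + d)
            for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(hole_conv1d(x, [1.0, 1.0, 1.0], 1))  # [3.0, 6.0, 9.0, 12.0, 9.0] (ordinary convolution)
print(hole_conv1d(x, [1.0, 1.0, 1.0], 2))  # [4.0, 6.0, 9.0, 6.0, 8.0]  (reaches 2 positions away)
```

With the same three weights, rate 2 draws on inputs up to two positions away from i, illustrating the enlarged field of view at unchanged kernel size.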
In one embodiment, the hole convolution layer includes a plurality of hole convolution layers connected in series with sequentially increasing expansion rates and being relatively prime. In this case, the step of inputting the word-related information to the hole convolution layer to obtain an output of the hole convolution layer includes: inputting the word-related information into a first cavity convolution layer of the plurality of cavity convolution layers to respectively obtain outputs of the plurality of cavity convolution layers; the outputs of the plurality of hole convolution layers are spliced together as the output of the hole convolution layer.
In a conventional hole convolution layer, a power of 2 (e.g., 1, 2, 4, 8, 16, etc.) is generally adopted as the expansion ratio of a group of hole convolution layers. However, in the present embodiment, by using a set of sequentially increasing and coprime numbers as the expansion ratio of a set of hole convolution layers, a larger range is covered using a smaller number of layers, thereby further reducing the amount of calculation compared to the existing hole convolution layers. In one example, the number of the hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers are 1, 2, 5 and 9 in sequence, which is set to ensure that a continuous characteristic region is covered, and will be described later with reference to fig. 4. However, the invention is not limited thereto, and other combinations of prime expansion ratios are possible. For ease of understanding, the neural network including the hole convolution layers having expansion rates of 1, 2, and 5 in this order will be described in more detail later with reference to fig. 2.
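The "continuous characteristic region" property claimed for coprime expansion rates can be checked by enumerating the offsets reachable from one position through a stack of kernel-size-3 hole convolutions. This small script is our own illustration of that property, not code from the patent:

```python
def coverage(dilations, kernel_size=3):
    """Offsets reachable from one output position after stacking
    hole convolutions with the given expansion rates."""
    reach = {0}
    half = kernel_size // 2
    for d in dilations:
        taps = [k * d for k in range(-half, half + 1)]
        reach = {r + t for r in reach for t in taps}
    return sorted(reach)

cov = coverage([1, 2, 5])
# Any offset between the extremes that is unreachable would be a "hole"
# in the covered context; for coprime rates 1, 2, 5 there are none.
gaps = [o for o in range(cov[0], cov[-1] + 1) if o not in set(cov)]
```

Running this confirms a gap-free region, so the three layers jointly cover a contiguous span of context.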
In another embodiment, the hole convolution layer includes a plurality of groups of hole convolution layers, each group including a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime, and the step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes: inputting the word-related information into the first group of hole convolution layers of the plurality of groups to obtain the output of the first group; for each group except the last group, adding the input of that group and the output of that group, and using the sum as the input of the next group; and splicing the outputs of the plurality of hole convolution layers in the last group together to serve as the output of the hole convolution layer.
In a conventional hole convolution layer, a power of 2 (e.g., 1, 2, 4, 8, 16, etc.) is generally adopted as the expansion ratio of a group of hole convolution layers. However, in the present embodiment, by using a plurality of sets of successively increasing and coprime numbers as the expansion ratios of the plurality of sets of hole convolution layers, a larger range is covered with a smaller number of layers, thereby further reducing the amount of calculation compared to the conventional hole convolution layers. Here, it is necessary to ensure that the number of kernels of the convolution of the last layer of holes in each group is the same as the dimension of the input feature, that is, the output dimension of the last layer in each group is the same as the input dimension of the first layer, so that the input and output features can be added to ensure the transmissibility of the gradient. Meanwhile, the convolution of a plurality of groups of holes further expands the range of context processing, and simultaneously, the identification capability is enhanced due to the nonlinear accumulation. In one example, the number of the plurality of hole convolution layers included in each group of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers included in each group of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2, 5 and 9 in sequence. However, the invention is not limited thereto, and other combinations of prime expansion ratios are possible. For ease of understanding, the sets of void convolution layers having expansion ratios of 1, 2, and 5 in this order will be described in more detail below with reference to fig. 3.
In step S140, the output of the hole convolution layer is input to the local attention mechanism layer, and the output of the local attention mechanism layer is obtained.
Here, the local attention mechanism layer may be used to calculate the correlation of different positions of the sequence features, and the features with strong correlation are used for calculation to ensure high accuracy of named entity identification. In one embodiment, the step of inputting the output of the void convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer includes: inputting the output of the hole convolution layer to a local attention mechanism layer to calculate correlations between features of each position in the output of the hole convolution layer and features within a predetermined range thereof; and obtaining final output characteristics of each position as the output of the local attention mechanism layer based on the correlation, the characteristics of each position and the characteristics in the preset range.
As an example, the correlation may be calculated by the following equation:
h_{i,i′} = tanh(W_q · x_i + W_x · x_{i′} + b_q)
e_{i,i′} = σ(W_a · h_{i,i′} + b_a)
where i is the current position of interest, i′ is a position within the attention range d relative to position i, i.e., i′ ∈ [i − ⌊d/2⌋, i + ⌊d/2⌋], ⌊·⌋ denotes the floor (rounding-down) function, x_i is the input feature at position i, x_{i′} is the input feature at position i′, W_q and W_x are trainable parameters that apply linear transformations to the input feature x_i and the attended input feature x_{i′} respectively, b_q is a bias term, h_{i,i′} is a relative representation of the feature at position i′ with respect to position i, W_a and b_a are parameters that apply a linear transformation to h_{i,i′}, e_{i,i′} is the degree of correlation of position i′ with respect to position i, and σ is the sigmoid activation function, i.e., σ(z) = 1/(1 + e^{−z}). In one example, the parameters in the above formulas may be learned by gradient back-propagation of the loss of the final layer of the neural network.
As an example, the final output characteristics of the respective positions are obtained by the following equations:
a_i = softmax(e_i)
v_i = Σ_{i′} a_{i,i′} · x_{i′}
where e_i is the attention vector of all positions i′ relative to position i, a_i is the normalized attention vector of all positions i′ relative to position i, and v_i is the final output feature at position i.
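The attention equations above can be sketched end to end in NumPy. The parameter shapes (W_q, W_x as H×F matrices, W_a as an H-vector) and the boundary clipping are our own assumptions for a runnable illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_attention(x, Wq, Wx, bq, Wa, ba, d):
    """x: (length, F). Each position i attends to positions within
    [i - d//2, i + d//2], clipped at the sequence boundaries."""
    L, _ = x.shape
    r = d // 2
    out = np.zeros_like(x)
    for i in range(L):
        window = list(range(max(0, i - r), min(L, i + r + 1)))
        # e_{i,i'} = sigmoid(Wa . tanh(Wq.x_i + Wx.x_{i'} + bq) + ba)
        e = np.array([sigmoid(Wa @ np.tanh(Wq @ x[i] + Wx @ x[j] + bq) + ba)
                      for j in window])
        a = np.exp(e) / np.exp(e).sum()                      # a_i = softmax(e_i)
        out[i] = sum(w * x[j] for w, j in zip(a, window))    # v_i
    return out

x = np.arange(8, dtype=float).reshape(4, 2)
Wq = 0.1 * np.eye(2)
Wx = 0.1 * np.eye(2)
bq = np.zeros(2)
Wa = np.ones(2)
ba = 0.0
v1 = local_attention(x, Wq, Wx, bq, Wa, ba, d=1)  # window of one: output equals input
v3 = local_attention(x, Wq, Wx, bq, Wa, ba, d=3)
```

With d = 1 the window contains only the position itself, so the softmax weight is 1 and the layer reduces to the identity, which is a convenient sanity check.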
Alternatively, relative position information p_{i′} may be spliced onto the input feature x_{i′} to form a new input feature x′_{i′}: x′_{i′} = x_{i′} || p_{i′}. Here, if x_{i′} has dimension (string length, original feature dimension) and p_{i′} has dimension (string length, 1), then the concatenated feature x′_{i′} has dimension (string length, original feature dimension + 1).
In step S150, the output of the local attention mechanism layer is input to the classification layer, and the named entity information output by the classification layer is obtained.
In one embodiment, the classification layer is a linear-chain conditional random field (CRF) layer. However, the present invention is not limited thereto, and the classification layer of the present invention may be another classification layer (for example, a classification layer composed of a fully-connected layer and a normalization layer).
In step S160, based on the named entity information and the corresponding named entity label output by the classification layer, the loss of the neural network is calculated, and the neural network is trained according to the loss of the neural network.
In addition, optionally, the training method in fig. 1 may further include: encoding the annotation information of the training text; and decoding the named entity information output by the classification layer. In one example, the step of encoding the annotation information of the training text includes: performing BIO encoding on the annotation information of the training text, and the step of decoding the named entity information output by the classification layer includes: performing BIO decoding on the named entity information output by the classification layer, where B denotes the first word of a named entity, I denotes the remaining words of that named entity, and O denotes words that do not belong to any named entity. As a non-limiting example, the text "Clinton president also signals mr. costmann, which is a welcome to the international humanistic work undertaken by VOAs." can be BIO-encoded as: "B-PER I-PER I-PER O O O O B-ORG I-ORG I-ORG O O O O O O O O O O O O O O O O O", where PER denotes a person name and ORG denotes an organization name. Optionally, BIO decoding may be implemented as follows: enumerate each position in turn; if the current label is of class B (such as B-PER), take it as the start position and continue enumerating downwards for as long as the labels are I labels of the same type (for example, the corresponding I-PER); the last such position is the end position, and the span from the start position to the end position is an entity of that type.
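The BIO decoding procedure described above can be written as a short function. This is a sketch of the enumeration procedure only; the tag strings follow the B-TYPE/I-TYPE/O convention from the text:

```python
def bio_decode(labels):
    """Decode a BIO tag sequence into (type, start, end) entity spans,
    following the enumeration procedure described in the text."""
    entities = []
    i = 0
    while i < len(labels):
        if labels[i].startswith("B-"):
            etype = labels[i][2:]
            j = i + 1
            # Extend the span while we keep seeing I- tags of the same type.
            while j < len(labels) and labels[j] == f"I-{etype}":
                j += 1
            entities.append((etype, i, j - 1))
            i = j
        else:
            i += 1
    return entities

tags = ["B-PER", "I-PER", "I-PER", "O", "B-ORG", "I-ORG", "O"]
spans = bio_decode(tags)
# → [("PER", 0, 2), ("ORG", 4, 5)]
```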
Optionally, the neural network for named entity recognition of the present invention may also be optimized. Optimization methods include, but are not limited to, stochastic gradient descent (SGD), the adaptive gradient method (AdaGrad), the adaptive learning rate method (AdaDelta), adaptive moment estimation (Adam), and the like.
FIG. 2 shows a schematic diagram of a neural network including hole convolution layers having expansion rates of 1, 2, and 5 in this order, according to an embodiment of the present invention.
Although fig. 2 shows three hole convolution layers having expansion rates of 1, 2, and 5 in this order, this is for illustrative purposes only, and the present invention is not limited thereto. As described above with reference to fig. 1, the number of hole convolution layers is not limited to 3, and their expansion rates are not limited to 1, 2, and 5.
With reference to fig. 1 and 2, the text conversion layer may convert the received text into word-related information and output it to the hole convolution layer with an expansion rate of 1. That layer performs a hole convolution operation on the received word-related information to obtain a first hole convolution operation result and outputs it to the hole convolution layer with an expansion rate of 2. The layer with an expansion rate of 2 performs a hole convolution operation on the first result to obtain a second hole convolution operation result and outputs it to the hole convolution layer with an expansion rate of 5. The layer with an expansion rate of 5 performs a hole convolution operation on the second result to obtain a third hole convolution operation result. The local attention mechanism layer may receive the concatenation of the first, second, and third hole convolution operation results, operate on it, and output the result to the classification layer. Here, the splicing may mean that the three operation results are concatenated by position, so that the feature dimension is tripled, allowing information of different granularity ranges to be utilized simultaneously.
As a non-limiting example, when the first hole convolution operation result is [ 100 ], the second hole convolution operation result is [ 010 ], and the third hole convolution operation result is [ 001 ], the concatenation result of the first hole convolution operation result, the second hole convolution operation result, and the third hole convolution operation result may be [ 100010001 ]. The classification layer outputs named entity information by calculating the operation result of the local attention mechanism layer.
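The splice in the example above is a concatenation along the feature axis. In this NumPy sketch the one-row arrays stand in for single-position feature maps; for a sequence of length L with F features per layer, the same call turns three (L, F) outputs into one (L, 3F) output:

```python
import numpy as np

r1 = np.array([[1, 0, 0]])   # first hole convolution operation result
r2 = np.array([[0, 1, 0]])   # second hole convolution operation result
r3 = np.array([[0, 0, 1]])   # third hole convolution operation result

# Concatenate by position: the feature dimension is tripled.
spliced = np.concatenate([r1, r2, r3], axis=-1)
# → [[1 0 0 0 1 0 0 0 1]]
```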
FIG. 3 shows a schematic diagram of a neural network including two sets of hole convolution layers with expansion rates of 1, 2, and 5 in order, according to an embodiment of the present invention.
Although fig. 3 shows that the neural network includes two groups of three hole convolution layers having expansion rates of 1, 2, and 5 in this order, this is for illustrative purposes only, and the present invention is not limited thereto. As described above with reference to fig. 1, the number of groups of hole convolution layers is not limited to two, the number of hole convolution layers in each group is not limited to 3, and the expansion rates are not limited to 1, 2, and 5. In the present invention, multiple groups of hole convolution layers can be connected using a ResNet-like structure.
With reference to fig. 1 and 3, the text conversion layer may convert the received text into word-related information and output it to the hole convolution layer with an expansion rate of 1 in the first group of hole convolution layers. That layer performs a hole convolution operation on the received word-related information to obtain a first hole convolution operation result and outputs it to the hole convolution layer with an expansion rate of 2 in the first group. The layer with an expansion rate of 2 performs a hole convolution operation on the first result to obtain a second hole convolution operation result and outputs it to the layer with an expansion rate of 5 in the first group. The layer with an expansion rate of 5 performs a hole convolution operation on the second result to obtain a third hole convolution operation result. The third result is then added to the word-related information output by the text conversion layer, and the sum may serve as the input of the hole convolution layer with an expansion rate of 1 in the second group of hole convolution layers. Here, it is necessary to ensure that the output dimension of the last hole convolution layer in each group equals the input dimension of the first layer, so that the input and output features can be added, guaranteeing the transmissibility of the gradient.
As a non-limiting example, when the input of the first hole convolution layer (expansion rate 1) of the first group is [ 1 0 0 1 ] and the output of the third hole convolution layer (expansion rate 5) of the first group is [ 0 0 0 1 ], their addition result is [ 1 0 0 2 ]. The classification layer outputs named entity information by operating on the result of the local attention mechanism layer.
In FIG. 3, the use of multiple sets of hole convolution layers further expands the scope of context processing, while enhancing recognition capabilities due to the non-linear accumulation.
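The group-to-group residual wiring of fig. 3 can be sketched as follows. Here `conv_group` is a shape-preserving placeholder standing in for one group of hole convolutions (rates 1, 2, 5 in the text); the actual convolutions are not implemented in this illustration:

```python
import numpy as np

def conv_group(x):
    """Stand-in for one group of hole convolutions. Per the text, the last
    layer of each group must restore the input dimension, so the output
    shape equals the input shape and the residual addition is well-defined."""
    return np.tanh(x)  # placeholder computation, same shape as input

def run_groups(x, n_groups):
    # For every group except the last, add the group's input to its output
    # (ResNet-style) and feed the sum to the next group.
    for _ in range(n_groups - 1):
        x = x + conv_group(x)
    return conv_group(x)  # the last group's output is used directly

features = np.ones((4, 3))           # (sequence length, feature dimension)
out = run_groups(features, n_groups=2)
```

The addition between groups is what preserves gradient flow; the stacked non-linearities are what the text credits with the enhanced recognition capability.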
FIG. 4 shows a schematic diagram of the coverage of a hole convolution layer using a set of successively increasing and coprime numbers as the expansion ratio of a set of hole convolution layers, according to an embodiment of the present invention.
Referring to fig. 4, one circle may indicate word related information of one word. Here, a row of circles may indicate one hole convolution layer. In the embodiment of FIG. 4, the set of hole convolution expansion rates is [1, 2, 5], and when the kernel size is 3, the contexts of 3, 7, 15 can be captured for each position in turn. It can be seen that each layer uses information from a bottom contiguous region, and that the top layer has an expansion ratio of 5 and uses all information from the region of length 15. It should be understood that fig. 4 omits a number of lines and partial circles for ease of illustration, and the present invention is not limited to the lines and circles shown in fig. 4.
By using a set of successively increasing and coprime numbers as the expansion rates of a set of hole convolution layers, the coverage of the features is increased, thereby improving computational efficiency.
FIG. 5 shows a schematic diagram of a text conversion layer according to an embodiment of the invention.
Referring to FIG. 5, a text conversion layer may include an embedding layer and a bi-directional language model. The embedding layer may receive text and convert it into first word-related information. The bi-directional language model may receive the first word-related information output by the embedding layer and output second word-related information. The first word-related information and the second word-related information may be concatenated together to represent the word features of the current position with context information (i.e., the final word-related information, the "output" in fig. 5). As a non-limiting example, the dimension of the second word-related information may be twice that of the first word-related information. For example, when the first word-related information is [ 0 0 1 ], the second word-related information may be [ 1 0 0 0 1 0 ], in which case the final word-related information may be [ 0 0 1 1 0 0 0 1 0 ]. Here, the present invention may pre-train the embedding layer and the bi-directional language model using crawled unlabeled corpora or text. A pre-trained embedding layer and bi-directional language model may be understood as an embedding layer and bi-directional language model whose parameters have been initialized. Further, note that in the present invention the bi-directional language model is an optional feature that may sacrifice some efficiency to provide context-dependent features for greater accuracy.
In one example, the embedding layer may use Skip-Gram to obtain the first word-related information. However, the present invention is not limited thereto, and Continuous Bag of Words (CBOW), Global Vectors (GloVe), the open-source library fastText, and the like may also be used to obtain the first word-related information. For example, the first word-related information may be a word vector.
FIG. 6 shows a schematic diagram of the text conversion layer of FIG. 5 during training.
Referring to fig. 6, the embedding layer 610 is an embedding layer that has been pre-trained; it converts the input text "named entity recognition" into first word-related information and inputs that information into a recurrent neural network (i.e., RNN network) in the forward direction and the reverse direction, respectively. In fig. 6, one circle represents the word-related information of one word, and EOS represents the end of a sentence or text. Although fig. 6 illustrates the forward network and the reverse network as RNN networks, the present invention is not limited thereto. The forward network and the reverse network may also include, but are not limited to, one or more layers of Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), bidirectional Long Short-Term Memory (Bi-LSTM), bidirectional Gated Recurrent Unit (Bi-GRU), and the like.
Here, the combination of the forward RNN network, the reverse RNN network, and the softmax network layer (a fully-connected layer using softmax as its activation function) may be regarded as a bi-directional language model. In the present invention, the network layers RNN1 through RNNn of the forward RNN network and the corresponding softmax layer of the bi-directional language model may be used to predict the probability distribution of the next word given a word, and the network layers RNN1 through RNNn of the reverse RNN network and the corresponding softmax layer may be used to predict the probability distribution of the previous word given a word. Specifically, in fig. 6, assuming the first five words of "named entity recognition" are known, the forward RNN network may be trained to output word-related information indicating that the next word has the highest corresponding probability; assuming the last five words are known, the reverse RNN network may be trained to output word-related information indicating that the preceding word has the highest probability. In one example, if, when the RNN is unrolled, a position's prediction target would fall beyond the text, that position does not participate in the calculation even with "EOS". Existing language prediction models are generally forward-only models, i.e., they consider only the preceding context. However, in the embodiment of fig. 6 of the present invention, the bi-directional language model considers not only the preceding context but also the following context, and is computed from the word-related information output by the embedding layer, so that the word-related information output by the text conversion layer is more accurate. In addition, in fig. 6, the text conversion layer (or the bi-directional language model) may be trained using a cross-entropy loss function; however, the present invention is not limited thereto, and other existing loss functions may also be used to train the text conversion layer.
FIG. 7 shows a schematic diagram of the text conversion layer of FIG. 6 after training is complete.
Compared with the text conversion layer during training in fig. 6, the text conversion layer in fig. 7 removes the final softmax fully-connected layers, retains the structures and weights of the remaining layers, and concatenates the hidden states of the final forward and reverse RNNs by time (or position) to serve as the bi-directional language model feature of the current position. The final output of the bi-directional language model may be referred to as the Bi-LM feature.
FIG. 8 illustrates a neural network-based named entity identification method, according to an embodiment of the invention.
Here, the neural network in fig. 8 may be a neural network trained by any one of the training methods described with reference to fig. 1.
Referring to fig. 8, in step S810, a predicted text to be recognized is acquired; in step S820, the predicted text is input to the neural network, and the named entity information output by the neural network is obtained.
FIG. 9 illustrates a training apparatus for a neural network for named entity recognition, according to an embodiment of the present invention.
Here, the training apparatus of fig. 9 may be an apparatus configured to perform any of the training methods described with reference to fig. 1.
Referring to fig. 9, a training apparatus 900 of a neural network for named entity recognition includes an acquisition unit 910, a named entity information generation unit 920, and a training unit 930. The acquisition unit 910, the named entity information generation unit 920, and the training unit 930 will be described in more detail below.
In the present invention, the obtaining unit 910 may be configured to obtain a training text and obtain labeling information of the training text, where the labeling information of the training text includes a named entity label.
Here, the named entity labels of the training text indicate whether the training text belongs to a named entity or to which named entity the training text belongs. As a non-limiting example, the named entity label of the training text "beijing city" may indicate that the training text "beijing city" belongs to a place name named entity, the named entity label of the training text "wangming" may indicate that the training text "wangming" belongs to a person name named entity, and the named entity label of the training text "beautiful" may indicate that the training text "beautiful" does not belong to a named entity.
In the present invention, the named entity information generating unit 920 may be configured to: and inputting the training text into a text conversion layer to obtain the word related information output by the text conversion layer.
That is, the named entity information generating unit 920 may be configured to convert the training text into word related information by using a text conversion layer. Here, the word-related information may be regarded as information having a mapping relation with the training text. In this case, the text conversion layer of the present invention may have various structures for converting training text into word-related information.
In one embodiment, the text conversion layer is an embedded layer, in which case the named entity information generating unit 920 may be configured to: and inputting the training text into the pre-trained embedding layer to obtain the word related information output by the embedding layer.
In another embodiment, the text conversion layer includes an embedding layer and a bi-directional language model. In this case, the named entity information generating unit 920 may be configured to: respectively inputting the training texts into a pre-trained embedding layer to obtain word related information output by the embedding layer, inputting the word related information output by the embedding layer into a bidirectional language model to obtain the word related information output by the bidirectional language model, and splicing the word related information output by the embedding layer and the word related information output by the bidirectional language model together to be used as the word related information output by a text conversion layer. Further, the description related to the text conversion layer with reference to fig. 4 is also applicable here.
Although some embodiments of text conversion layers are shown above, the invention is not so limited and any other network layer that can implement the functionality of the text conversion layers of the invention is also possible.
In the present invention, the named entity information generating unit 920 may be further configured to: inputting the word-related information into the void convolutional layer to obtain the output of the void convolutional layer.
Here, hole convolution (also known as dilated convolution) is a convolution operation that widens the spacing between the input positions used in the calculation without changing the original convolution kernel size; the spacing is expressed by the expansion rate, and an expansion rate of 1 corresponds to an ordinary convolution operation. Taking a convolution kernel of size 3 as an example, assuming that the input is x and the weight of one kernel is W, when the expansion rate is d the output of the kernel at position i is the sum of W · [x_{i−d}; x_i; x_{i+d}]. In other words, hole convolution provides a larger receptive field under the same calculation conditions, or, equivalently, for the same receptive field it reduces the amount of calculation and improves operational efficiency.
In one embodiment, the hole convolution layer includes a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime. In this case, the named entity information generating unit 920 may be configured to: input the word-related information into the first of the plurality of hole convolution layers to obtain the outputs of the plurality of hole convolution layers respectively; and splice the outputs of the plurality of hole convolution layers together as the output of the hole convolution layer.
In a conventional hole convolution layer, a power of 2 (e.g., 1, 2, 4, 8, 16, etc.) is generally adopted as the expansion ratio of a group of hole convolution layers. However, in the present embodiment, by using a set of sequentially increasing and coprime numbers as the expansion ratio of a set of hole convolution layers, a larger range is covered using a smaller number of layers, thereby further reducing the amount of calculation compared to the existing hole convolution layers. In one example, the number of the hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers are sequentially 1, 2, and 5, or the number of the plurality of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers are sequentially 1, 2, 5, and 9, which can ensure coverage of a continuous feature region, and the description made in conjunction with fig. 2 is also applicable thereto. However, the invention is not limited thereto, and other combinations of prime expansion ratios are possible.
In another embodiment, the hole convolution layer includes a plurality of groups of hole convolution layers, each group including a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime. In this case, the named entity information generating unit 920 may be configured to: input the word-related information into the first group of hole convolution layers of the plurality of groups to obtain the output of the first group; for each group except the last group, add the input of that group and the output of that group, and use the sum as the input of the next group; and splice the outputs of the plurality of hole convolution layers in the last group together to serve as the output of the hole convolution layer.
In a conventional hole convolution layer, a power of 2 (e.g., 1, 2, 4, 8, 16, etc.) is generally adopted as the expansion ratio of a group of hole convolution layers. However, in the present embodiment, by using a plurality of sets of successively increasing and coprime numbers as the expansion ratios of the plurality of sets of hole convolution layers, a larger range is covered with a smaller number of layers, thereby further reducing the amount of calculation compared to the conventional hole convolution layers. Here, it is necessary to ensure that the number of kernels of the convolution of the last layer of holes in each group is the same as the dimension of the input feature, that is, the output dimension of the last layer in each group is the same as the input dimension of the first layer, so that the input and output features can be added to ensure the transmissibility of the gradient. Meanwhile, the convolution of a plurality of groups of holes further expands the range of context processing, and simultaneously, the identification capability is enhanced due to the nonlinear accumulation. In one example, the number of the plurality of hole convolution layers included in each group of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2 and 5 in sequence, or the number of the plurality of hole convolution layers included in each group of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2, 5 and 9 in sequence. However, the invention is not limited thereto, and other combinations of prime expansion ratios are possible. Furthermore, the description made in connection with fig. 3 also applies here.
In the present invention, the named entity information generating unit 920 may be further configured to input the output of the hole convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer.
Here, the local attention mechanism layer may be used to calculate the correlation between different positions of the sequence features, so that strongly correlated features contribute to the computation, ensuring high accuracy of named entity recognition. In one embodiment, the named entity information generating unit 920 may be configured to: input the output of the hole convolution layer to the local attention mechanism layer to calculate the correlation between the feature at each position and the features within a predetermined range around it; and obtain, based on the correlations, the feature at each position, and the features within the predetermined range, the final output feature of each position as the output of the local attention mechanism layer.
As an example, the named entity information generating unit 920 may calculate the correlation by the following equations:

h_{i,i'} = \tanh(W_q \cdot x_i + W_x \cdot x_{i'} + b_q)

e_{i,i'} = \sigma(W_a \cdot h_{i,i'} + b_a)

where i is the current position of interest, i' is a position within the attention range d relative to position i, i.e., i' \in [i - \lfloor d/2 \rfloor, i + \lfloor d/2 \rfloor] with \lfloor \cdot \rfloor the floor (down-rounding) function, x_i is the input feature of position i, x_{i'} is the attention input feature at position i', W_q and W_x are trainable parameters that linearly transform the input feature x_i and the attention input feature x_{i'} respectively, b_q is a bias term, h_{i,i'} is a relative representation of the feature at position i' with respect to position i, W_a and b_a are parameters that linearly transform h_{i,i'}, e_{i,i'} is the degree of correlation of position i' with respect to position i, and \sigma is the sigmoid activation function, \sigma(z) = 1/(1 + e^{-z}). In one example, the parameters in the above equations may be obtained through training. For example, they may be learned by back-propagating the gradient of the loss at the final layer of the neural network.
As an example, the named entity information generating unit 920 may obtain the final output feature of each position by the following equations:

a_i = \mathrm{softmax}(e_i)

v_i = \sum_{i'} a_{i,i'} \, x_{i'}

where e_i is the attention vector of all positions i' relative to position i, a_i is the normalized attention vector of all positions i' relative to position i, and v_i is the final output feature of position i.
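The two equation groups above (the correlation scores, then the softmax-normalized weighted sum) can be sketched in NumPy. This is a sketch under assumptions: the clipping of the window at sequence edges and the hidden width of h are not specified in the text; parameter shapes simply follow the equations.

```python
import numpy as np

def local_attention(x, Wq, Wx, bq, Wa, ba, d=5):
    """Local attention over a (seq_len, dim) feature matrix x, following
    the equations above:
      h = tanh(Wq.x_i + Wx.x_i' + bq)   relative representation
      e = sigmoid(Wa.h + ba)            correlation of i' w.r.t. i
      a_i = softmax(e_i)                normalized over the window
      v_i = sum_i' a_{i,i'} x_i'        final output feature
    The window of size d is centered on i and clipped at the edges."""
    seq_len, dim = x.shape
    half = d // 2
    v = np.zeros_like(x)
    for i in range(seq_len):
        lo, hi = max(0, i - half), min(seq_len, i + half + 1)
        window = x[lo:hi]                          # features x_i' in range
        h = np.tanh(x[i] @ Wq + window @ Wx + bq)  # (win, hidden)
        e = 1.0 / (1.0 + np.exp(-(h @ Wa + ba)))   # sigmoid correlations
        a = np.exp(e - e.max())
        a /= a.sum()                               # softmax over the window
        v[i] = a @ window                          # weighted sum of x_i'
    return v
```

In a real model these weights would be trained by back-propagation; here they are just placeholders with the right shapes.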
Alternatively, the named entity information generating unit 920 may splice relative position information p_{i'} onto the input feature x_{i'} to form a new input feature x'_{i'} = x_{i'} \| p_{i'}. Here, if x_{i'} has dimensions (string length, original feature dimension) and p_{i'} has dimensions (string length, 1), then the spliced feature x'_{i'} has dimensions (string length, original feature dimension + 1).
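A minimal sketch of this splicing step in NumPy. The exact form of the relative position information p_{i'} is not given here, so a signed window offset is used purely as a placeholder assumption.

```python
import numpy as np

def concat_relative_position(window, offsets):
    """Append a relative-position column p_i' to each feature row x_i'
    in a local window, giving x'_i' = x_i' || p_i'.
    `offsets` is one scalar per row; a signed offset (i' - i) is used
    here only as an illustrative choice, not the patent's definition."""
    p = np.asarray(offsets, dtype=float).reshape(-1, 1)  # (win, 1)
    return np.concatenate([window, p], axis=1)           # (win, dim + 1)
```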
In the present invention, the named entity information generating unit 920 may be further configured to input the output of the local attention mechanism layer to the classification layer to obtain the named entity information output by the classification layer.
In one embodiment, the classification layer is a linear-chain conditional random field (CRF) layer. However, the present invention is not limited thereto, and the classification layer of the present invention may be another classification layer (for example, one composed of a fully-connected layer and a normalization layer).
Further, optionally, the training apparatus 900 of the neural network for named entity recognition may further include: an encoding unit configured to encode the labeling information of the training text; and a decoding unit configured to decode the named entity information output by the classification layer. In one example, the encoding unit is configured to perform BIO encoding on the labeling information of the training text, and the decoding unit is configured to perform BIO decoding on the named entity information output by the classification layer, where B denotes the first word of a named entity, I denotes the remaining words of that named entity, and O denotes a word that is not part of a named entity.
As a non-limiting example, for a text such as "President Clinton also expressed appreciation for the international humanitarian work done by the VOA", the coding unit may produce the BIO encoding "B-PER I-PER I-PER O O O O B-ORG I-ORG I-ORG O O O O O O O O O O O O O O O O O", where PER denotes a person name and ORG denotes an organization name.
Alternatively, the decoding unit may implement BIO decoding as follows: enumerate each position in turn; if the current label is a B-type label (e.g., B-PER), take it as a start position and continue enumerating until a label that is not the corresponding I-type label (e.g., I-PER) is reached, which marks the end position; the span from the start position to the end position is then an entity of that type.
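The decoding procedure above can be sketched directly in Python; the (start, end, type) span representation is an assumption chosen for illustration.

```python
def bio_decode(labels):
    """Decode a BIO label sequence into (start, end, type) entity spans,
    following the enumeration procedure described above: a B-X tag opens
    a span, which extends over consecutive I-X tags of the same type."""
    entities, i = [], 0
    while i < len(labels):
        if labels[i].startswith("B-"):
            etype, start = labels[i][2:], i
            i += 1
            while i < len(labels) and labels[i] == "I-" + etype:
                i += 1                      # extend over matching I- tags
            entities.append((start, i - 1, etype))
        else:
            i += 1                          # O tags (and stray I- tags) are skipped
    return entities
```

For example, `bio_decode(["B-PER", "I-PER", "I-PER", "O", "B-ORG", "I-ORG", "O"])` yields one PER span covering positions 0-2 and one ORG span covering positions 4-5.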
In addition, optionally, the training apparatus 900 of the neural network for named entity recognition may also optimize the neural network of the present invention. Optimization methods include, but are not limited to, stochastic gradient descent (SGD), the adaptive gradient method (AdaGrad), the adaptive learning rate method (AdaDelta), adaptive moment estimation (Adam), and the like.
Fig. 10 illustrates a neural network-based named entity recognition apparatus according to an embodiment of the present invention.
Referring to fig. 10, the named entity recognition apparatus 1000 includes an obtaining unit 1010 and a named entity information generating unit 1020, wherein the obtaining unit 1010 is configured to obtain a predicted text to be recognized, and the named entity information generating unit 1020 is configured to input the predicted text to a neural network, resulting in named entity information output by the neural network.
Here, the neural network of the named entity recognition apparatus 1000 may be a neural network trained by any one of the training methods described with reference to fig. 1.
The training method and the training apparatus of the neural network for named entity recognition and the method and the apparatus for named entity recognition based on the neural network according to the exemplary embodiments of the present invention have been described above with reference to fig. 1 to 10. However, it should be understood that: the devices, systems, units, etc. used in fig. 1-10 may each be configured as software, hardware, firmware, or any combination thereof that performs a particular function. For example, these systems, devices, units, etc. may correspond to dedicated integrated circuits, to pure software code, or to a combination of software and hardware. Further, one or more functions implemented by these systems, apparatuses, or units, etc. may also be uniformly executed by components in a physical entity device (e.g., processor, client, server, etc.).
Further, the above-described method may be implemented by a computer program recorded on a computer-readable storage medium. For example, according to an exemplary embodiment of the present invention, a computer-readable storage medium may be provided, having stored thereon a computer program which, when executed by one or more computing devices, causes the one or more computing devices to implement any of the methods disclosed in the present application.
For example, the computer program, when executed by one or more computing devices, causes the one or more computing devices to perform the steps of: acquiring a training text and labeling information of the training text, wherein the labeling information of the training text includes named entity labels; inputting the training text into the text conversion layer to obtain the word-related information output by the text conversion layer; inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer; inputting the output of the hole convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer; inputting the output of the local attention mechanism layer to the classification layer to obtain the named entity information output by the classification layer; and calculating the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and training the neural network according to the loss.
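As a hedged illustration of the listed steps, the sketch below wires up toy stand-ins for each layer in PyTorch: an embedding as the text conversion layer, a single dilated convolution as the hole convolution layer, and a per-token linear softmax classifier in place of the CRF classification layer; the local attention mechanism layer is omitted for brevity. All sizes and the simplified classifier are assumptions, not the patent's configuration.

```python
import torch
import torch.nn as nn

vocab_size, dim, num_tags, seq_len = 100, 16, 5, 12
embed = nn.Embedding(vocab_size, dim)                   # text conversion layer (stand-in)
conv = nn.Conv1d(dim, dim, 3, dilation=2, padding=2)    # hole convolution layer (stand-in)
classify = nn.Linear(dim, num_tags)                     # classification layer (stand-in for the CRF)
params = (list(embed.parameters()) + list(conv.parameters())
          + list(classify.parameters()))
opt = torch.optim.Adam(params)                          # one of the listed optimizers

tokens = torch.randint(0, vocab_size, (2, seq_len))     # toy training text
tags = torch.randint(0, num_tags, (2, seq_len))         # toy named-entity labels

h = embed(tokens).transpose(1, 2)                       # (batch, dim, seq)
h = torch.relu(conv(h)).transpose(1, 2)                 # (batch, seq, dim)
logits = classify(h)                                    # (batch, seq, num_tags)
loss = nn.functional.cross_entropy(logits.reshape(-1, num_tags),
                                   tags.reshape(-1))    # loss vs. entity labels
opt.zero_grad(); loss.backward(); opt.step()            # train on the loss
```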
For another example, the computer program, when executed by one or more computing devices, causes the one or more computing devices to perform the steps of: acquiring a predicted text to be recognized; and inputting the predicted text into the neural network to obtain the named entity information output by the neural network.
The computer program in the computer-readable storage medium may be executed in an environment deployed on computer devices such as clients, hosts, proxy apparatuses, and servers. It should be noted that the computer program may also be used to perform additional steps beyond those listed above, or to perform more specific processing when performing them; the contents of these additional steps and further processing have been mentioned in the description of the related methods and apparatuses with reference to figs. 1 to 10, and are not repeated here.
It should be noted that the neural network training method and apparatus for named entity recognition and the neural-network-based named entity recognition method and apparatus according to the exemplary embodiments of the present invention may rely entirely on the execution of a computer program to implement the corresponding functions, wherein each unit of the apparatus or system corresponds to a step in the functional architecture of the computer program, so that the entire apparatus or system may be invoked through a special software package (e.g., a lib library) to implement the corresponding functions.
On the other hand, when each unit or device mentioned in fig. 1 to 10 is implemented in software, firmware, middleware or microcode, a program code or a code segment for performing the corresponding operation may be stored in a computer-readable storage medium such as a storage medium, so that a computing device (e.g., a processor) may perform the corresponding operation by reading and executing the corresponding program code or code segment.
For example, a system according to embodiments of the invention comprises one or more computing devices and one or more storage devices, wherein the one or more storage devices have stored therein a computer program that, when executed by the one or more computing devices, causes the one or more computing devices to implement any of the methods disclosed herein. For example, causing the one or more computing devices to perform the steps of: acquiring a training text and acquiring marking information of the training text, wherein the marking information of the training text comprises named entity marking; inputting the training text into a text conversion layer to obtain word related information output by the text conversion layer; inputting the word-related information into the cavity convolution layer to obtain the output of the cavity convolution layer; inputting the output of the cavity convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer; inputting the output of the local attention mechanism layer to a classification layer to obtain named entity information output by the classification layer; and calculating the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and training the neural network according to the loss of the neural network. For another example, the one or more computing devices are caused to perform the steps of: acquiring a predictive text to be identified; and inputting the predicted text into the neural network to obtain the named entity information output by the neural network.
In particular, the computing devices described above may be deployed in servers as well as on node devices in a distributed network environment. Further, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, or touch input device). All components of the computing device may be connected to each other via a bus and/or a network.
The computing device here need not be a single device, but may be any collection of devices or circuits that can execute the above instructions (or instruction sets), either individually or in combination. The computing device may also be part of an integrated control computing device or computing device manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
The computing device for performing the training method or the named entity recognition method of the neural network according to the exemplary embodiments of the present invention may be a processor, and such a processor may include a Central Processing Unit (CPU), a Graphic Processing Unit (GPU), a programmable logic device, a dedicated processor, a microcontroller, or a microprocessor. By way of example, and not limitation, the processor may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like. The processor may execute instructions or code stored in one of the storage devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The storage device may be integral to the processor, e.g., having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage device may comprise a stand-alone device, such as an external disk drive, storage array, or other storage device usable by any database computing device. The storage device and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor can read files stored in the storage device.
It should be noted that the exemplary implementations of the present invention focus on solving the problems of current named entity recognition methods: a small feature processing range, low computational efficiency, and low recognition accuracy. Specifically, in the technical scheme of performing named entity recognition with a hole convolution layer and a local attention mechanism layer, the embodiments of the present invention on one hand enlarge the processing range of the features by using the hole convolution layer, thereby improving computational efficiency; on the other hand, they utilize the local attention mechanism layer to emphasize salient feature information, thereby increasing the accuracy of named entity recognition.
While exemplary embodiments of the present application have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present application is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the present application. Therefore, the protection scope of the present application shall be subject to the scope of the claims.

Claims (10)

1. A training method for a neural network for named entity recognition, wherein the neural network comprises a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer, the training method comprising:
acquiring a training text and labeling information of the training text, wherein the labeling information of the training text includes named entity labels;
inputting the training text into a text conversion layer to obtain word related information output by the text conversion layer;
inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer;
inputting the output of the hole convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer;
inputting the output of the local attention mechanism layer to a classification layer to obtain named entity information output by the classification layer;
and calculating the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and training the neural network according to the loss of the neural network.
2. The training method according to claim 1, wherein the hole convolution layer comprises a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime,
the step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes:
inputting the word-related information into the first hole convolution layer of the plurality of hole convolution layers to obtain the outputs of the plurality of hole convolution layers respectively;
and splicing the outputs of the plurality of hole convolution layers together to serve as the output of the hole convolution layer.
3. The training method of claim 2, wherein the number of the plurality of hole convolution layers is 3 and the expansion rates of the plurality of hole convolution layers are 1, 2, and 5 in order, or the number of the plurality of hole convolution layers is 4 and the expansion rates of the plurality of hole convolution layers are 1, 2, 5, and 9 in order.
4. The training method of claim 1, wherein the hole convolution layer comprises a plurality of groups of hole convolution layers, each group comprising a plurality of sequentially connected hole convolution layers whose expansion rates increase sequentially and are relatively prime,
the step of inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer includes:
inputting the word-related information into the first group of the plurality of groups of hole convolution layers to obtain the output of the first group of hole convolution layers;
for each group of hole convolution layers except the last group, adding the input of the group and the output of the group, and taking the sum as the input of the next group of hole convolution layers;
splicing together the outputs of the plurality of hole convolution layers in the last group of hole convolution layers as the output of the hole convolution layers.
5. The training method according to claim 4, wherein the number of the plurality of hole convolution layers included in each group of hole convolution layers is 3, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2, and 5 in this order, or the number of the plurality of hole convolution layers included in each group of hole convolution layers is 4, and the expansion rates of the plurality of hole convolution layers included in each group of hole convolution layers are 1, 2, 5, and 9 in this order.
6. A named entity recognition method based on a neural network, wherein the neural network is trained by the training method according to any one of claims 1 to 5, the neural network comprises a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer, and the named entity recognition method comprises the following steps:
acquiring a predicted text to be recognized;
and inputting the predicted text into the neural network to obtain the named entity information output by the neural network.
7. A training apparatus for a neural network for named entity recognition, wherein the neural network includes a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer, and a classification layer, the training apparatus comprising:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is configured to acquire a training text and acquire labeling information of the training text, and the labeling information of the training text comprises named entity labels;
a named entity information generating unit configured to:
inputting the training text into a text conversion layer to obtain word related information output by the text conversion layer;
inputting the word-related information into the hole convolution layer to obtain the output of the hole convolution layer;
inputting the output of the hole convolution layer to the local attention mechanism layer to obtain the output of the local attention mechanism layer;
inputting the output of the local attention mechanism layer to a classification layer to obtain named entity information output by the classification layer;
and the training unit is configured to calculate the loss of the neural network based on the named entity information output by the classification layer and the corresponding named entity labels, and train the neural network according to the loss of the neural network.
8. A named entity recognition apparatus based on a neural network, wherein the neural network is trained by the training apparatus of claim 7, the neural network comprising a pre-trained text conversion layer, a hole convolution layer, a local attention mechanism layer and a classification layer, the named entity recognition apparatus comprising:
an acquisition unit configured to acquire a predicted text to be recognized;
and the named entity information generating unit is configured to input the predicted text into the neural network to obtain the named entity information output by the neural network.
9. A computer-readable storage medium having stored thereon a computer program that, when executed by one or more computing devices, causes the one or more computing devices to implement the method of any of claims 1-6.
10. A system comprising one or more computing devices and one or more storage devices having a computer program recorded thereon, which, when executed by the one or more computing devices, causes the one or more computing devices to carry out the method of any of claims 1-6.
CN201811357670.3A 2018-11-15 2018-11-15 Neural network training method and device and named entity recognition method and device Active CN111191038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811357670.3A CN111191038B (en) 2018-11-15 2018-11-15 Neural network training method and device and named entity recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811357670.3A CN111191038B (en) 2018-11-15 2018-11-15 Neural network training method and device and named entity recognition method and device

Publications (2)

Publication Number Publication Date
CN111191038A true CN111191038A (en) 2020-05-22
CN111191038B CN111191038B (en) 2024-05-10

Family

ID=70707057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811357670.3A Active CN111191038B (en) 2018-11-15 2018-11-15 Neural network training method and device and named entity recognition method and device

Country Status (1)

Country Link
CN (1) CN111191038B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN112183494A (en) * 2020-11-05 2021-01-05 新华三大数据技术有限公司 Character recognition method and device based on neural network and storage medium
CN113192534A (en) * 2021-03-23 2021-07-30 汉海信息技术(上海)有限公司 Address search method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270911A1 (en) * 2016-03-17 2017-09-21 Kabushiki Kaisha Toshiba Training apparatus, training method, and computer program product
CN107885721A (en) * 2017-10-12 2018-04-06 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM
CN108334499A (en) * 2018-02-08 2018-07-27 海南云江科技有限公司 A kind of text label tagging equipment, method and computing device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270911A1 (en) * 2016-03-17 2017-09-21 Kabushiki Kaisha Toshiba Training apparatus, training method, and computer program product
CN107885721A (en) * 2017-10-12 2018-04-06 北京知道未来信息技术有限公司 A kind of name entity recognition method based on LSTM
CN108334499A (en) * 2018-02-08 2018-07-27 海南云江科技有限公司 A kind of text label tagging equipment, method and computing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴晨玥 (Wu Chenyue): "Retinal blood vessel image segmentation based on an improved convolutional neural network", Acta Optica Sinica, pages 1-13 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860351A (en) * 2020-07-23 2020-10-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN111860351B (en) * 2020-07-23 2021-04-30 中国石油大学(华东) Remote sensing image fishpond extraction method based on line-row self-attention full convolution neural network
CN112183494A (en) * 2020-11-05 2021-01-05 新华三大数据技术有限公司 Character recognition method and device based on neural network and storage medium
CN113192534A (en) * 2021-03-23 2021-07-30 汉海信息技术(上海)有限公司 Address search method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111191038B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
JP6955580B2 (en) Document summary automatic extraction method, equipment, computer equipment and storage media
CN110597970B (en) Multi-granularity medical entity joint identification method and device
CN110232183B (en) Keyword extraction model training method, keyword extraction device and storage medium
CN107918782B (en) Method and system for generating natural language for describing image content
CN108536679B (en) Named entity recognition method, device, equipment and computer readable storage medium
CN108920460B (en) Training method of multi-task deep learning model for multi-type entity recognition
CN112487807B (en) Text relation extraction method based on expansion gate convolutional neural network
CN111783462A (en) Chinese named entity recognition model and method based on dual neural network fusion
CN112329465A (en) Named entity identification method and device and computer readable storage medium
CN109657226B (en) Multi-linkage attention reading understanding model, system and method
CN110866401A (en) Chinese electronic medical record named entity identification method and system based on attention mechanism
CN108604311B (en) Enhanced neural network with hierarchical external memory
CN111191038B (en) Neural network training method and device and named entity recognition method and device
WO2023134082A1 (en) Training method and apparatus for image caption statement generation module, and electronic device
JP7178513B2 (en) Chinese word segmentation method, device, storage medium and computer equipment based on deep learning
WO2019220113A1 (en) Device and method for natural language processing
RU2712101C2 (en) Prediction of probability of occurrence of line using sequence of vectors
CN111611805A (en) Auxiliary writing method, device, medium and equipment based on image
KR20230072454A (en) Apparatus, method and program for bidirectional generation between image and text
JP2020008836A (en) Method and apparatus for selecting vocabulary table, and computer-readable storage medium
US20220129671A1 (en) Document Information Extraction Without Additional Annotations
US20230042327A1 (en) Self-supervised learning with model augmentation
CN112740200A (en) System and method for end-to-end deep reinforcement learning based on coreference resolution
CN113177406B (en) Text processing method, text processing device, electronic equipment and computer readable medium
CN113420869B (en) Translation method based on omnidirectional attention and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant