CN112069792A - Named entity identification method, device and equipment - Google Patents


Info

Publication number: CN112069792A
Application number: CN201910441916.3A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 刘潇婧, 赵华厦
Original and current assignee: Alibaba Group Holding Ltd
Legal status: Pending (the legal status is an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)

Abstract

An embodiment of the invention provides a named entity identification method, device and equipment, wherein the method comprises: acquiring a plurality of information blocks contained in a rich text; determining a semantic feature vector corresponding to each information block and relative visual feature vectors between different information blocks; determining a context feature vector corresponding to each information block according to the semantic feature vectors and the relative visual feature vectors; and identifying the named entities contained in the rich text according to the context feature vectors corresponding to the information blocks. By combining the visual feature information of the rich text, the named entities of interest contained therein can be identified more accurately.

Description

Named entity identification method, device and equipment
Technical Field
The invention relates to the technical field of internet, in particular to a named entity identification method, device and equipment.
Background
Named entity recognition is a fundamental problem in the field of natural language processing and belongs to the category of sequence tagging problems. Briefly, named entity recognition is the problem of identifying and categorizing entities of interest, such as person names, place names and organization names, contained in a text sequence.
At present, most named entity recognition schemes operate on plain text. In practice, however, many rich texts exist, such as value-added tax invoices, insurance policies and customs declaration forms, so the need to perform named entity recognition on rich text has become increasingly important and urgent. Rich text is a text format different from plain text: it contains rich format attributes such as font size, pictures and layout.
Disclosure of Invention
The embodiment of the invention provides a named entity identification method, a named entity identification device and named entity identification equipment, which are used for identifying named entities contained in rich texts.
In a first aspect, an embodiment of the present invention provides a method for identifying a named entity, where the method includes:
acquiring a plurality of information blocks contained in the rich text;
determining semantic feature vectors corresponding to the information blocks respectively and relative visual feature vectors between different information blocks;
determining context feature vectors corresponding to the information blocks according to the semantic feature vectors corresponding to the information blocks and the relative visual feature vectors between different information blocks;
and identifying the named entities contained in the rich text according to the context feature vectors corresponding to the information blocks respectively.
In a second aspect, an embodiment of the present invention provides a named entity identifying apparatus, where the apparatus includes:
the acquisition module is used for acquiring a plurality of information blocks contained in the rich text;
the determining module is used for determining semantic feature vectors corresponding to the information blocks and relative visual feature vectors among different information blocks; determining context feature vectors corresponding to the information blocks according to the semantic feature vectors corresponding to the information blocks and the relative visual feature vectors between different information blocks;
and the identification module is used for identifying the named entities contained in the rich text according to the context feature vectors corresponding to the information blocks respectively.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the memory stores executable code, and when the executable code is executed by the processor, the processor is enabled to implement at least the named entity identifying method in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the named entity recognition method of the first aspect.
In the embodiment of the invention, for a rich text requiring named entity recognition, a plurality of information blocks contained in the rich text are first extracted. Each information block contains a number of words, and an information block may or may not contain a named entity. Second, for each information block, its semantic information is determined and represented as a semantic feature vector; likewise, the visual information of each information block relative to the other information blocks is determined and represented as relative visual feature vectors. Third, the context feature vector corresponding to each information block is determined by combining its semantic feature vector with the relative visual feature vectors between it and the other information blocks, so that the context feature vector encodes both the contextual semantic information of the information block and the visual information between it and the other information blocks. Finally, the named entities contained in the information blocks, namely the named entities contained in the rich text, are identified by combining the context feature vector corresponding to each information block. In summary, combining the visual feature information of the rich text allows the named entities of interest to be identified more accurately. For example, to identify the tax-free amount in a value-added tax invoice, which contains several amounts such as the single-commodity amount, the tax amount and the total amount, the visual feature information of the text can be used to judge which number is the tax-free amount, thus achieving correct identification of the named entity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a named entity identification method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the result of dividing an information block;
FIG. 3 is a schematic diagram of a semantic feature vector of an information block;
FIG. 4 is a diagram illustrating a graph structure of rich text;
FIG. 5 is a diagram illustrating a process of obtaining context feature vectors of information blocks;
FIG. 6 is a schematic diagram of a rich text named entity recognition process;
fig. 7 is a schematic structural diagram of a named entity recognition apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device corresponding to the named entity recognition apparatus provided in the embodiment shown in fig. 7.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a good or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such good or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a commodity or system that includes the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
The named entity identification method provided by the embodiment of the invention can be executed by an electronic device, and the electronic device can be a terminal device such as a PC (personal computer), a notebook computer and the like, and can also be a server. The server may be a physical server including an independent host, or may also be a virtual server carried by a host cluster, or may also be a cloud server.
The named entity identification method is suitable for identifying the named entities of interest contained in rich text. Rich text contains content with various format attributes, such as characters, pictures and tables. In practical applications, the rich text may be any of various documents, such as value-added tax invoices and insurance policies, but is not limited thereto.
The following two examples are provided to schematically illustrate the practical requirements of named entity identification provided by embodiments of the present invention.
For example, named entity recognition targets are: the invoice date of invoicing is identified. Since various dates such as the invoicing date and the bill generation date may appear in the invoice, it is often impossible to judge which is the invoicing date only from the date format.
As another example, the target of named entity recognition is to identify the tax-free amount in a value-added tax invoice. Because a value-added tax invoice contains several amounts, such as the single-commodity amount, the tax-free amount and the total amount, all in numeric format, it cannot be judged from the number format alone which one is the tax-free amount.
Therefore, if the rich text contains a plurality of contents with the same format as the named entity to be identified, how to accurately identify the named entity of interest becomes a problem to be solved urgently, which is also a core purpose of the method for identifying the named entity provided by the embodiment of the present invention.
In summary, the key of the named entity identification method provided by the embodiment of the present invention is: combining the visual information of the rich text to assist accurate recognition of named entities. The visual information is embodied in the differences between the content of the named entity of interest and other, non-entity content in visual characteristics such as position and size. For example, suppose the recognition object is an identity card and the goal is to identify the name on it. Besides the name, the identity card also carries other information, such as the identity card number and the issuing authority. Taking the name and the issuing authority as an example, since the region containing the name and the region containing the issuing authority information differ significantly in size and position, this difference in visual information can be used to assist recognition of the name.
The following describes the implementation of the named entity recognition method with reference to the following embodiments.
Fig. 1 is a flowchart of a named entity recognition method according to an embodiment of the present invention, which can be executed by an electronic device such as the terminal device or server described above. As shown in fig. 1, the method comprises the following steps:
101. a plurality of information blocks contained in the rich text are obtained.
In practical applications, rich text is generally expressed in a picture format or a PDF file format.
For the rich text in the picture format, a plurality of information blocks contained in the rich text can be acquired through Optical Character Recognition (OCR) software.
For the rich text in the PDF format, a plurality of information blocks contained in the rich text can be acquired by means of an Apache pdfbox plug-in or the like.
In this context, an information block refers to each region of the rich text containing independent text content, that is, non-text content such as graphics and background in the rich text is filtered, and the remaining content is divided into a plurality of information blocks.
In short, a block of information can be understood as a sentence. Therefore, the information blocks can be divided according to the layout information and semantic relevance of the text content.
An information block may also be referred to as a text block; the term mainly emphasizes that the data content it contains is textual content such as words, numbers and special symbols of the respective languages.
To facilitate understanding of the meaning of the information blocks, fig. 2 illustrates the result of dividing a single table into information blocks, and fig. 2 illustrates a plurality of information blocks of different sizes and positions.
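As a concrete illustration, an information block of this kind might be represented as a small record holding its text content, center coordinates and size. This is a minimal sketch; the `InfoBlock` name, its fields and the example values are assumptions for illustration, not part of the embodiment:

```python
from dataclasses import dataclass

@dataclass
class InfoBlock:
    """One region of independent text content extracted from the rich text."""
    text: str      # text content of the block (words, numbers, special symbols)
    x: float       # x coordinate of the center point of the bounding rectangle
    y: float       # y coordinate of the center point of the bounding rectangle
    width: float   # width of the bounding rectangle
    height: float  # height of the bounding rectangle

# e.g. a block produced by an OCR step (values are illustrative)
block = InfoBlock(text="Tax-free amount: 100.00",
                  x=320.0, y=48.5, width=180.0, height=20.0)
```

Position and size are recorded alongside the text because, as described below, both the semantic and the relative visual feature vectors are computed from them.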
102. And determining semantic feature vectors corresponding to the information blocks respectively and relative visual feature vectors between different information blocks.
For each information block, semantic information represented by the text content contained in the information block is determined, the semantic information being represented by a semantic feature vector.
In an alternative embodiment, determining the semantic feature vector corresponding to each of the plurality of information blocks may be implemented as: and inputting words contained in each of the plurality of information blocks into the first neural network model so as to extract semantic feature vectors corresponding to each of the plurality of information blocks through the first neural network model.
Wherein the first neural network model comprises any one of: a Recurrent Neural Network (RNN), a Long Short-Term Memory Network (LSTM) or a bidirectional Long Short-Term Memory Network (Bi-LSTM).
The acquisition process of the semantic feature vector of an information block is illustrated in connection with fig. 3, where the first neural network model is assumed to be a Bi-LSTM. The arrows between adjacent LSTM cells in fig. 3 illustrate the hidden-layer transfer directions of the forward LSTM and the backward LSTM, respectively. Taking any information block i as an example, suppose that information block i includes three words x1, x2 and x3, which are converted into word vectors and input to the Bi-LSTM. Assuming that the hidden-layer states are sequentially updated to Hf1, Hf2 and Hf3 as the three words are encoded by the forward LSTM, and to Hb1, Hb2 and Hb3 as they are encoded by the backward LSTM, then the hidden-layer state Hf3 at the last step of the forward LSTM and the hidden-layer state Hb3 at the last step of the backward LSTM can be spliced together as [Hf3; Hb3] to form the semantic feature vector of information block i, where ';' denotes the splicing (concatenation) operation.
In another alternative embodiment, determining the semantic feature vector corresponding to each of the plurality of information blocks may be implemented as: coding words contained in each of the plurality of information blocks to obtain corresponding word vectors; and carrying out average calculation on a plurality of word vectors corresponding to each information block to obtain a semantic feature vector corresponding to each information block.
In practical application, a plurality of words can be obtained by performing word segmentation processing on a large number of corpus samples in advance, and then the plurality of words can be sorted according to the occurrence frequency of each word, for example, the words are sorted from more to less according to the occurrence frequency, assuming that there are N words in total, so that a word list composed of N words arranged in sequence can be generated. In addition, for each word, word vector transformation may be performed according to an existing word vector transformation algorithm, and each word is assumed to be represented as a row vector of M dimensions, so that a word vector matrix of N × M dimensions is finally obtained, wherein the k-th row word vector of the word vector matrix corresponds to the k-th word in the word list.
Based on this, for any information block i in the plurality of information blocks, each word contained in the information block i can be converted into a corresponding word vector according to the corresponding relationship between the word vector matrix and the word table.
For the information block i, a plurality of words are generally included, each word corresponds to one word vector, so that the information block i corresponds to a plurality of word vectors, the average calculation is performed on the word vectors, and the average calculation result can be used as the semantic feature vector of the information block i.
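The averaging alternative just described can be sketched in a few lines. This is a hedged illustration: the function name and the 4-dimensional toy word vectors are assumptions, not the embodiment's specification.

```python
import numpy as np

def semantic_feature_vector(word_vectors):
    """Average the word vectors of an information block's words
    to obtain the block's semantic feature vector."""
    return np.mean(np.stack(word_vectors), axis=0)

# a two-word information block with 4-dimensional word vectors
vecs = [np.array([1.0, 0.0, 2.0, 4.0]),
        np.array([3.0, 2.0, 0.0, 0.0])]
sem = semantic_feature_vector(vecs)  # -> [2.0, 1.0, 1.0, 2.0]
```

In this variant the semantic feature vector has the same dimension M as the word vectors, regardless of how many words the block contains.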
For each information block, in addition to determining its semantic feature vector, it is also necessary to determine the visual information of the information block relative to the other information blocks. This visual information is represented by relative visual feature vectors, where 'relative' means the visual information of one information block with respect to another information block.
Based on this, determining the relative visual feature vector between different information blocks can be implemented as: the distance and/or relative size ratio between different information blocks is determined such that the distance and/or relative size ratio is included in the relative visual feature vector.
Wherein the distance may include a horizontal distance and a vertical distance.
Wherein the relative size ratios may include: aspect ratio of the same information block, height ratio of different information blocks, aspect ratio of different information blocks.
Based on this, it can be understood that when a plurality of information blocks included in the rich text are acquired, the position coordinates and the size information of each information block can be determined at the same time. The position coordinates of each information block may be expressed in coordinates of a center point of the corresponding rectangular frame, and the size information is determined from coordinates of four vertices of the rectangular frame.
Taking any two information blocks, information block i and information block j, as an example: suppose the horizontal distance between the two information blocks is denoted Xij and the vertical distance Yij; suppose the width of information block i is denoted Wi and its height Hi; and suppose the width of information block j is denoted Wj and its height Hj. Then the width-to-height ratio of information block i is Wi/Hi, the height ratio of information block j to information block i is Hj/Hi, and the ratio of the width of information block j to the height of information block i is Wj/Hi. Thus, optionally, the visual feature vector of information block i relative to information block j can be expressed as: Rij = [Xij, Yij, Wi/Hi, Hj/Hi, Wj/Hi].
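The relative visual feature vector Rij can be computed directly from the center coordinates and sizes of the two blocks. A minimal sketch follows; the sign convention for the distances Xij and Yij (j minus i) is an assumption, since the embodiment does not fix one:

```python
def relative_visual_feature(xi, yi, wi, hi, xj, yj, wj, hj):
    """Rij = [Xij, Yij, Wi/Hi, Hj/Hi, Wj/Hi] for information blocks i and j."""
    return [xj - xi,   # horizontal distance Xij (sign convention assumed)
            yj - yi,   # vertical distance Yij
            wi / hi,   # width-to-height ratio of block i
            hj / hi,   # height ratio of block j to block i
            wj / hi]   # width of block j relative to the height of block i

# block i centered at (0, 0), 100 wide, 20 high;
# block j centered at (50, 30), 40 wide, 10 high
Rij = relative_visual_feature(0.0, 0.0, 100.0, 20.0,
                              50.0, 30.0, 40.0, 10.0)
# -> [50.0, 30.0, 5.0, 0.5, 2.0]
```

Normalizing the ratios by Hi makes the features independent of the absolute scale of the document image.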
103. And determining the context feature vector corresponding to each of the information blocks according to the semantic feature vector corresponding to each of the information blocks and the relative visual feature vector between different information blocks.
In practice, the rich text is divided into information blocks, semantic information of each information block and relative visual information between different information blocks are calculated, and this process can be understood as describing the rich text as a graph structure, namely a graph structure composed of nodes and edges.
The semantic feature vector corresponding to each information block can be regarded as a node in the graph structure, and the relative visual feature vector between different information blocks can be regarded as an edge in the graph structure.
For ease of understanding, the graph structure of a rich text is described in conjunction with fig. 4. It is assumed that the rich text includes six information blocks, information block 1 to information block 6, whose semantic feature vectors are represented in fig. 4 as t1 to t6. A connecting edge can be set between each information block and every other information block, so that for information block 1 the connecting edges between it and the other five information blocks are represented in sequence as R12 to R16. For clarity, fig. 4 only illustrates the full connectivity between information block 1 and the other information blocks.
Since the rich text can thus be represented as a graph structure, when step 103 is executed, the context feature vector corresponding to each information block can be calculated through a graph convolutional neural network. The context feature vector corresponding to each information block is determined according to the semantic feature vector of the information block, the relative visual feature vectors of the information block with respect to the other information blocks, and the semantic feature vectors of the other information blocks, so that the context feature vector of an information block includes both its contextual semantic information in the text dimension and its feature information in the visual dimension.
Taking any information block i as an example, the process of obtaining the context feature vector corresponding to the information block i may be implemented as follows: acquiring a plurality of groups of feature vectors corresponding to an information block i, wherein any group of feature vectors consists of a semantic feature vector corresponding to the information block i, a semantic feature vector corresponding to an information block j and a relative visual feature vector between the information block i and the information block j, and the information block j is any information block except the information block i in a plurality of information blocks; and inputting the obtained multiple groups of feature vectors into the graph convolution neural network so as to output the context feature vectors corresponding to the information block i through the graph convolution neural network.
The determination of the context feature vector of an information block is schematically illustrated in connection with fig. 4 and 5. As described above, in fig. 4, semantic feature vectors corresponding to six information blocks, i.e., the information block 1 to the information block 6, are represented as t1 to t6, respectively. For information block 1, it is assumed that the connecting edges between it and the other five information blocks are sequentially represented as: R12-R16. R12-R16 are the relative visual feature vectors of the information block 1 with respect to the other five information blocks.
Based on the assumptions in fig. 4, as shown in fig. 5, for information block 1 the following five groups of feature vectors, each consisting of three elements, can be obtained:
[t1, R12, t2], [t1, R13, t3], [t1, R14, t4], [t1, R15, t5], [t1, R16, t6]. These five groups of feature vectors are input into the graph convolutional neural network, which computes five hidden-layer state vectors, assumed to be represented as h12 to h16. The context feature vector corresponding to information block 1, assumed to be represented as C1, can then be obtained by calculating over these five hidden-layer state vectors; for example, a self-attention mechanism can be applied to the five hidden-layer state vectors to obtain C1.
The calculation of the context feature vectors corresponding to other information blocks contained in the rich text is consistent with the calculation process of the information block 1, and is not repeated.
In practical applications, the graph convolution neural network may be implemented as a multi-layer perceptron having one or more fully connected layers.
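Under the assumption that the graph convolutional neural network is a single fully connected layer, as the preceding paragraph permits, and that the aggregation over the hidden-layer state vectors is a simple dot-product self-attention, the computation of C1 for the six-block example of fig. 4 might be sketched as follows. The dimensions, the random weights and the exact attention form are illustrative assumptions, not the embodiment's specification:

```python
import numpy as np

rng = np.random.default_rng(0)

d_t, d_r, d_h = 8, 5, 6  # dims: semantic vec, relative visual vec, hidden state
W = rng.normal(size=(2 * d_t + d_r, d_h))  # one fully connected layer

def context_vector(t_i, neighbours):
    """neighbours: list of (R_ij, t_j) pairs for every other block j.
    Returns the context feature vector C_i of block i."""
    # one hidden-layer state h_ij per group [t_i, R_ij, t_j]
    H = np.stack([np.tanh(np.concatenate([t_i, R, t_j]) @ W)
                  for R, t_j in neighbours])
    # self-attention: score each h_ij, softmax, weighted sum (form assumed)
    scores = H @ H.mean(axis=0)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ H

t = [rng.normal(size=d_t) for _ in range(6)]        # t1..t6
R = {j: rng.normal(size=d_r) for j in range(1, 6)}  # R12..R16
C1 = context_vector(t[0], [(R[j], t[j]) for j in range(1, 6)])
```

Because the weighted sum pools over however many neighbours exist, the same function works for rich texts with any number of information blocks.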
104. And identifying the named entities contained in the rich text according to the context feature vectors corresponding to the information blocks respectively.
In the embodiment of the invention, the entity recognition model can be trained in advance to be used for recognizing the named entities contained in the rich text. The entity recognition model may also be referred to as a sequence tagging model because the named entity recognition problem is inherently a sequence tagging problem.
The core process of named entity identification is described in conjunction with fig. 6.
The entity recognition model comprises, connected in sequence: a second neural network model, a fully connected layer (FC layer), and a Conditional Random Field (CRF) layer. The second neural network model is any one of LSTM or Bi-LSTM; fig. 6 schematically shows a Bi-LSTM.
After the context feature vector corresponding to each information block is obtained, the input of the second neural network model can be determined according to the context feature vector corresponding to each information block.
Specifically, first, the words contained in each of the plurality of information blocks in the rich text are encoded to obtain corresponding word vectors, so that any information block i corresponds to a plurality of word vectors. Second, the word vectors corresponding to information block i are each spliced with the context feature vector corresponding to information block i. As shown in fig. 6, assuming that the words included in information block i are x1, x2, ..., xm, the context feature vector Ci corresponding to information block i is spliced with the word vector of each word to obtain m spliced feature vectors. The splicing can be expressed as Ci ⊕ e(xi), where ⊕ denotes the splicing (concatenation) operation and e(xi) denotes the word vector corresponding to word xi. The m spliced feature vectors are then input into the entity recognition model, which outputs the named entity recognition result corresponding to information block i, namely whether information block i contains a named entity and, if so, which named entity it contains.
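The splicing of the context feature vector Ci with each word vector e(x) is a plain concatenation. A minimal sketch with illustrative dimensions follows; the function name and the toy sizes are assumptions:

```python
import numpy as np

def recognition_inputs(C_i, word_vectors):
    """Concatenate the block's context vector C_i with each word vector e(x),
    yielding the m spliced vectors fed to the entity recognition model."""
    return [np.concatenate([C_i, e]) for e in word_vectors]

C_i = np.zeros(6)                       # context feature vector of block i
words = [np.ones(4), np.ones(4) * 2.0]  # word vectors e(x1), e(x2)
inputs = recognition_inputs(C_i, words)  # 2 spliced vectors of dim 6 + 4
```

Every word in the block thus receives the same context vector as a prefix, so the sequence model sees both word-level semantics and block-level visual context at each step.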
In summary, since the text of a named entity and the text of other, non-entity content may show certain visual differences, the named entities of interest contained in a rich text can be identified more accurately by combining its visual feature information. In addition, because the rich text is divided into information blocks and represented as a graph structure, and the visual information of each information block can be automatically extracted by the graph convolutional neural network, the named entity identification method has good universality and is applicable to a wide variety of rich texts.
The use of the graph convolutional neural network and the entity recognition model has been introduced above; their training process is briefly described below.
First, a large number of training samples, i.e., a large number of rich texts, may be collected. Then, the information blocks contained in the training samples may be obtained; to distinguish them from the information blocks discussed above, these are referred to herein as information block samples. They are obtained in the same manner as described above, and when the information block samples contained in a training sample are obtained, the position coordinates and size information of each sample can be determined at the same time. Since the purpose of training the graph convolutional neural network and the entity recognition model is to recognize named entities, each information block sample is labeled with its named entities.
The task of named entity recognition is to recognize one or more named entities contained in rich text, such as person names, times, numbers, place names, and organization names. In practice, named entities can be labeled using the BIO scheme. Specifically, a label set may first be defined according to the kinds of named entities to be recognized. For example, assuming the named entities to be recognized are person names and organization names, the label set LabelSet may be defined as:
LabelSet = {BA, IA, BP, IP, O}.
Taking named entities in Chinese as an example, a sentence requiring named entity recognition is regarded as a sequence of Chinese characters.
Here BA indicates that a character is the first character of an organization name, and IA that it is an interior character of an organization name; BP indicates the first character of a person name, IP an interior character of a person name, and O a character that does not belong to any named entity.
Given the character sequence and the label set, the trained entity recognition model classifies each character, i.e., assigns each character one label from the label set, thereby recognizing the named entities in the sentence.
Based on the above description, each obtained information block sample can be regarded as a sentence, and labeling the named entities of an information block sample means labeling the words it contains using the defined label set. It will be appreciated that if an information block sample contains no named entity, every word in that sample is labeled O.
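A minimal sketch of labeling one information block sample with the LabelSet above; the function name, the span representation (start, exclusive end, kind), and the example data are illustrative assumptions, not part of the patent:

```python
LABEL_SET = {"BA", "IA", "BP", "IP", "O"}

def label_block(chars, entities):
    """Assign one label per character given entity spans.

    chars:    the characters of one information-block sample
    entities: list of (start, end_exclusive, kind) spans, with kind "A"
              for organization names and "P" for person names
    Characters outside every span get "O".
    """
    labels = ["O"] * len(chars)
    for start, end, kind in entities:
        labels[start] = "B" + kind          # first character of the entity
        for i in range(start + 1, end):
            labels[i] = "I" + kind          # interior characters
    return labels

# A block whose characters 0-2 form a person name and 4-6 an organization name.
tags = label_block(list("ABCxDEF"), [(0, 3, "P"), (4, 7, "A")])
```

A block with no entity spans receives all-O labels, matching the remark above.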
The graph convolutional neural network and the entity recognition model are then trained on the labeled information block samples. During training, each information block sample is processed in the same way as information block i above, which is not repeated here. For each information block sample, the entity recognition model outputs a prediction of whether the sample contains a named entity and, if so, which one. A loss function is computed by comparing this prediction with the sample's labels, and the parameters of the entity recognition model and the graph convolutional neural network are adjusted by feedback based on this loss function until the models converge.
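The joint training loop described above can be sketched as follows. The real forward pass runs a graph convolutional network followed by the entity recognition model; here both are replaced by scalar stand-ins so only the loop structure (predict, compare with the annotation, feed the loss back to both models) is shown:

```python
def train_epoch(block_samples, gcn_params, ner_params, lr=0.1):
    """One training pass over labeled information-block samples.

    Scalar stand-ins replace the real models, but the structure matches
    the description: one loss adjusts the parameters of BOTH the graph
    convolutional network and the entity recognition model jointly.
    """
    total_loss = 0.0
    for x, y in block_samples:
        ctx = gcn_params * x                   # stand-in GCN forward pass
        pred = ner_params * ctx                # stand-in recognition model
        loss = (pred - y) ** 2                 # prediction vs. labeling result
        grad_pred = 2.0 * (pred - y)
        ner_grad = grad_pred * ctx             # gradient w.r.t. NER params
        gcn_grad = grad_pred * ner_params * x  # gradient w.r.t. GCN params
        ner_params -= lr * ner_grad            # feedback adjustment of both
        gcn_params -= lr * gcn_grad            # models from the same loss
        total_loss += loss
    return gcn_params, ner_params, total_loss
```

Repeating `train_epoch` until the returned loss stops shrinking corresponds to training "until the model converges."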
The named entity recognition apparatus of one or more embodiments of the present invention is described in detail below. Those skilled in the art will appreciate that these apparatuses can be constructed by configuring commercially available hardware components according to the steps taught in this solution.
Fig. 7 is a schematic structural diagram of a named entity recognition apparatus according to an embodiment of the present invention, as shown in fig. 7, the named entity recognition apparatus includes: the device comprises an acquisition module 11, a determination module 12 and an identification module 13.
An obtaining module 11, configured to obtain a plurality of information blocks contained in the rich text.
A determining module 12, configured to determine semantic feature vectors corresponding to the multiple information blocks respectively and relative visual feature vectors between different information blocks; and determining the context feature vector corresponding to each of the plurality of information blocks according to the semantic feature vector corresponding to each of the plurality of information blocks and the relative visual feature vector between different information blocks.
An identifying module 13, configured to identify the named entities contained in the rich text according to the context feature vectors corresponding to the plurality of information blocks.
Optionally, in determining the semantic feature vectors corresponding to the plurality of information blocks, the determining module 12 may be specifically configured to: input the words contained in each of the plurality of information blocks into a first neural network model, so as to extract, through the first neural network model, the semantic feature vector corresponding to each information block.
Wherein the first neural network model comprises any one of: a recurrent neural network model, a long-short term memory network model, or a bidirectional long-short term memory network model.
Optionally, in determining the semantic feature vectors corresponding to the plurality of information blocks, the determining module 12 may be specifically configured to: encode the words contained in each of the plurality of information blocks to obtain corresponding word vectors; and average the plurality of word vectors corresponding to each information block to obtain the semantic feature vector corresponding to that information block.
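The averaging variant reduces to mean-pooling the block's word vectors; a minimal sketch, with names and toy values assumed:

```python
import numpy as np

def semantic_feature(word_vectors):
    """Mean-pool the word vectors of one information block into a single
    semantic feature vector (the averaging variant described above)."""
    return np.mean(np.stack(word_vectors), axis=0)

# Two 2-dim word vectors for one block; their mean is the block's
# semantic feature vector.
vecs = [np.array([1.0, 3.0]), np.array([3.0, 5.0])]
```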
Optionally, in determining the relative visual feature vectors between different information blocks, the determining module 12 may be specifically configured to: determine a distance and/or a relative size ratio between different information blocks, the distance and/or the relative size ratio being included in the relative visual feature vector.
Wherein the distance includes a horizontal distance and a vertical distance.
Wherein the relative size ratios include: aspect ratio of the same information block, height ratio of different information blocks, aspect ratio of different information blocks.
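A sketch of assembling the relative visual feature vector from two blocks' bounding boxes. The field names are illustrative, and reading "aspect ratio of different information blocks" as a width ratio between the blocks is an assumption, since the translated text is ambiguous:

```python
def relative_visual_features(bi, bj):
    """Relative visual feature vector between two information blocks.

    bi, bj: dicts with x, y (top-left corner), w, h. Returns the
    quantities the text lists: horizontal and vertical distance, the
    aspect ratio of one block, and height/width ratios between blocks.
    """
    return [
        bj["x"] - bi["x"],          # horizontal distance
        bj["y"] - bi["y"],          # vertical distance
        bi["w"] / bi["h"],          # aspect ratio of the same block
        bi["h"] / bj["h"],          # height ratio between the two blocks
        bi["w"] / bj["w"],          # width ratio between the two blocks
    ]

a = {"x": 0, "y": 0, "w": 100, "h": 20}
b = {"x": 40, "y": 60, "w": 50, "h": 10}
```

These quantities use only the position coordinates and size information recorded when the blocks are extracted, which is why no trained model is needed for this step.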
Optionally, in determining the context feature vectors corresponding to the plurality of information blocks, the determining module 12 may be specifically configured to: for an information block i among the plurality of information blocks, obtain a plurality of groups of feature vectors corresponding to information block i, where any group of feature vectors consists of the semantic feature vector corresponding to information block i, the semantic feature vector corresponding to an information block j, and the relative visual feature vector between information block i and information block j, information block i being any one of the plurality of information blocks and information block j being any information block other than information block i; and input the plurality of groups of feature vectors into a graph convolutional neural network, so as to output the context feature vector corresponding to information block i through the graph convolutional neural network.
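One plausible single-layer form of the graph convolution over these triples is sketched below. The patent does not fix the network's exact architecture, so the aggregation (mean of neighbour messages), the single weight matrix W, and the ReLU are assumptions:

```python
import numpy as np

def context_vector(i, sem, rel, W):
    """Simplified one-layer graph convolution for information block i.

    sem: dict block -> semantic feature vector (dim d)
    rel: dict (i, j) -> relative visual feature vector (dim r)
    W:   assumed weight matrix of shape (d_out, 2*d + r)
    Each neighbour j contributes concat(s_i, s_j, r_ij); contributions
    are averaged and passed through a linear transform and a ReLU.
    """
    triples = [np.concatenate([sem[i], sem[j], rel[(i, j)]])
               for j in sem if j != i]
    agg = np.mean(triples, axis=0)       # aggregate neighbour messages
    return np.maximum(W @ agg, 0.0)      # linear transform + ReLU

# Three blocks with 2-dim semantic vectors and 1-dim relative features.
sem = {0: np.ones(2), 1: np.zeros(2), 2: np.full(2, 2.0)}
rel = {(0, 1): np.array([1.0]), (0, 2): np.array([3.0])}
W = np.full((3, 5), 0.1)                 # output dim 3, input dim 2+2+1
```

Because every other block j contributes a message to block i, the resulting context vector mixes the semantics of the whole page with block i's visual relations to it.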
Optionally, the identification module 13 may be specifically configured to: encode the words contained in each of the plurality of information blocks to obtain corresponding word vectors; for information block i, concatenate each of the plurality of word vectors corresponding to information block i with the context feature vector corresponding to information block i; and input the concatenated feature vectors into an entity recognition model, so as to output the named entity recognition result corresponding to information block i through the entity recognition model.
Wherein the entity recognition model comprises: the second neural network model, the full connection layer and the conditional random field model are connected in sequence; the second neural network model includes any one of: a long short term memory network model and a bidirectional long short term memory network model.
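The conditional random field on top of the model is typically decoded with the Viterbi algorithm; below is a sketch in which the emission scores stand in for the output of the second neural network model and fully connected layer, and all score values are illustrative rather than learned:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Viterbi decoding for the CRF layer topping the entity model.

    emissions:   (T, K) per-character label scores from the preceding
                 layers (stand-in values here).
    transitions: (K, K) score of moving from label a to label b.
    Returns the highest-scoring label sequence; the transition scores
    are what lets a CRF discourage invalid outputs such as an interior
    label that follows the wrong entity type.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = np.argmax(cand, axis=0)    # best predecessor per label
        score = np.max(cand, axis=0)
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):            # trace back the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```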
Optionally, the apparatus further comprises a training module configured to: obtain a training sample, the training sample being a rich text; obtain a plurality of information block samples included in the training sample; label the named entities contained in the plurality of information block samples; and train the graph convolutional neural network and the entity recognition model according to the labeled information block samples.
The named entity recognition apparatus shown in fig. 7 can execute the methods provided in the foregoing embodiments; for parts not described in detail in this embodiment, reference may be made to the related descriptions of the foregoing embodiments, which are not repeated here.
In one possible design, the named entity recognition apparatus shown in fig. 7 may be implemented as an electronic device. As shown in fig. 8, the electronic device may include: a processor 21 and a memory 22. Wherein the memory 22 has stored thereon executable code which, when executed by the processor 21, at least makes the processor 21 capable of implementing the named entity recognition method as provided in the previous embodiments.
The electronic device may further include a communication interface 23 for communicating with other devices or a communication network.
Additionally, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which, when executed by a processor of a wireless router, causes the processor to perform the named entity identification method provided in the foregoing embodiments.
The above-described apparatus embodiments are merely illustrative; the modules described as separate components may or may not be physically separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, and one of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented with the addition of a necessary general hardware platform, or by a combination of hardware and software. Based on this understanding, the technical solution above, in essence or in the part that contributes to the prior art, may be embodied in the form of a computer program product, which may be stored on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (13)

1. A named entity recognition method, comprising:
acquiring a plurality of information blocks contained in the rich text;
determining semantic feature vectors corresponding to the information blocks respectively and relative visual feature vectors between different information blocks;
determining context feature vectors corresponding to the information blocks according to the semantic feature vectors corresponding to the information blocks and the relative visual feature vectors between different information blocks;
and identifying the named entities contained in the rich text according to the context feature vectors corresponding to the information blocks respectively.
2. The method according to claim 1, wherein the step of determining the semantic feature vector corresponding to each of the plurality of information blocks comprises:
and inputting words contained in each of the plurality of information blocks into a first neural network model so as to extract semantic feature vectors corresponding to each of the plurality of information blocks through the first neural network model.
3. The method of claim 2, wherein the first neural network model comprises any one of: a recurrent neural network model, a long-short term memory network model, or a bidirectional long-short term memory network model.
4. The method according to claim 1, wherein the step of determining the semantic feature vector corresponding to each of the plurality of information blocks comprises:
encoding words contained in each of the plurality of information blocks to obtain corresponding word vectors;
and carrying out average calculation on a plurality of word vectors corresponding to each information block to obtain a semantic feature vector corresponding to each information block.
5. The method according to claim 1, wherein the step of determining the relative visual feature vector between the different information blocks comprises:
determining a distance and/or a relative size ratio between different information blocks, the distance and/or the relative size ratio being comprised in the relative visual feature vector.
6. The method of claim 5, wherein the distance comprises a horizontal distance and a vertical distance.
7. The method of claim 5, wherein the relative size ratios comprise: aspect ratio of the same information block, height ratio of different information blocks, aspect ratio of different information blocks.
8. The method according to claim 1, wherein the step of determining the context feature vector corresponding to each of the plurality of information blocks comprises:
for an information block i in the plurality of information blocks, acquiring a plurality of groups of feature vectors corresponding to the information block i, wherein any group of feature vectors consists of a semantic feature vector corresponding to the information block i, a semantic feature vector corresponding to an information block j and a relative visual feature vector between the information block i and the information block j, the information block i is any one of the information blocks, and the information block j is any one of the information blocks except the information block i;
and inputting the plurality of groups of feature vectors into a graph convolution neural network so as to output the context feature vector corresponding to the information block i through the graph convolution neural network.
9. The method according to claim 8, wherein the identifying the named entities contained in the rich text according to the context feature vector corresponding to each of the plurality of information blocks comprises:
encoding words contained in each of the plurality of information blocks to obtain corresponding word vectors;
for the information block i, splicing a plurality of word vectors corresponding to the information block i with context feature vectors corresponding to the information block i respectively;
and inputting the spliced feature vectors into an entity recognition model so as to output a named entity recognition result corresponding to the information block i through the entity recognition model.
10. The method of claim 9, wherein the entity recognition model comprises: the second neural network model, the full connection layer and the conditional random field model are connected in sequence;
the second neural network model includes any one of: a long short term memory network model and a bidirectional long short term memory network model.
11. The method of claim 9, further comprising:
acquiring a training sample, wherein the training sample is a rich text;
obtaining a plurality of information block samples included in the training sample;
labeling named entities contained in the plurality of information block samples;
and training the graph convolutional neural network and the entity recognition model according to the marked information block samples.
12. A named entity recognition apparatus, comprising:
the acquisition module is used for acquiring a plurality of information blocks contained in the rich text;
the determining module is used for determining semantic feature vectors corresponding to the information blocks and relative visual feature vectors among different information blocks; determining context feature vectors corresponding to the information blocks according to the semantic feature vectors corresponding to the information blocks and the relative visual feature vectors between different information blocks;
and the identification module is used for identifying the named entities contained in the rich text according to the context feature vectors corresponding to the information blocks respectively.
13. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the named entity recognition method of any of claims 1 to 11.
CN201910441916.3A 2019-05-24 2019-05-24 Named entity identification method, device and equipment Pending CN112069792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910441916.3A CN112069792A (en) 2019-05-24 2019-05-24 Named entity identification method, device and equipment

Publications (1)

Publication Number Publication Date
CN112069792A true CN112069792A (en) 2020-12-11

Family

ID=73658102



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination