
Named entity recognition method, device and equipment based on a matching picture (CN112329471A)

Info

Publication number: CN112329471A (application CN202110014000.7A; granted publication CN112329471B)
Authority: CN (China)
Prior art keywords: text, information, processed, matching, entity
Legal status: Granted
Application number: CN202110014000.7A
Other languages: Chinese (zh)
Other versions: CN112329471B (granted publication)
Inventors: 李直旭, 陈志刚, 陈大伟, 何莹
Current assignee: Iflytek Suzhou Technology Co Ltd
Original assignee: Iflytek Suzhou Technology Co Ltd
Application filed by Iflytek Suzhou Technology Co Ltd
Priority to CN202110014000.7A
Publication of CN112329471A
Application granted; publication of CN112329471B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches


Abstract

The invention discloses a named entity recognition method, device and equipment based on a matching picture. The conception of the invention is as follows. In certain scenarios the input text carries insufficient information and takes non-uniform forms, so that objects with indefinite meanings in the text are hard to recognize; the invention therefore introduces attribute information of the picture attached in such scenarios, together with deep information of the text, to assist named entity recognition. In particular, the invention converts image information to the text level, so that the image attributes, the deep text information and the basic text information are expressed in a unified form. When the basic text information, the deep information, the picture attribute information and the basic visual information are then processed jointly in a multi-modal manner, the shortage of input text information is compensated on the one hand, and the spatial heterogeneity between image and text is reduced on the other, so that text and image can interact fully and combine deeply. The efficiency and accuracy of named entity recognition in such scenarios can thereby be greatly improved.

Description

Named entity recognition method, device and equipment based on a matching picture
Technical Field
The invention relates to the field of knowledge graphs, and in particular to a named entity recognition method, device and equipment based on a matching picture.
Background
Named Entity Recognition (NER) is a key technology in information extraction. Its main task is to recognize entities with specific meanings in a text, chiefly including person names, place names, organization names and proper nouns. The implementation generally first determines the boundary of each entity in the input text and then determines the entity's type label.
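As an illustration (the BIO labeling scheme shown here is a common convention and is not prescribed by the patent), boundary determination and type labeling for the sentence "Kobe is one of the greatest players in NBA history" could yield:

    Kobe   is  one  of  the  greatest  players  in  NBA    history
    B-PER  O   O    O   O    O         O        O   B-ORG  O

where B-PER marks the beginning of a person entity, B-ORG the beginning of an organization entity, and O a token outside any entity.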
At present, for texts with well-formed sentence structure and sufficient context, satisfactory entity recognition results can be obtained with technologies such as BiLSTM+CRF or BERT+CRF. In some application fields, however, such as social media, the text to be processed is short, lacks context, and is colloquial, misspelled and abbreviated, so that conventional named entity recognition technology cannot achieve a good enough recognition effect.
Disclosure of Invention
In view of the foregoing, the present invention aims to provide a named entity recognition method, apparatus and device based on a matching picture, and accordingly a computer-readable storage medium and a computer program product, so as to solve the problem of poor named entity recognition in certain application environments.
The technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a named entity recognition method based on a matching picture, wherein the method includes:
according to a preset strategy, acquiring deep information of a text to be processed and attribute information of a matching picture attached to the text to be processed, wherein the attribute information is represented in a text form;
extracting text information of the text to be processed and visual information of the matching picture;
and carrying out named entity recognition processing by combining the text information, the deep information, the attribute information and the visual information to obtain an entity type sequence of the text to be processed.
In at least one possible implementation manner, the obtaining deep information of the text to be processed according to the predetermined policy includes:
and acquiring entity knowledge information of the text to be processed according to a pre-constructed multi-modal knowledge graph, wherein the multi-modal knowledge graph comprises a plurality of entities and pictures associated with the entities.
In at least one possible implementation manner, the acquiring, according to a pre-constructed multi-modal knowledge graph, entity knowledge information of a text to be processed includes:
matching a plurality of candidate entities of the text to be processed by utilizing the multi-modal knowledge graph;
screening out target entities from the candidate entities by using the matching picture and pictures associated with the candidate entities;
and acquiring a plurality of knowledge of the target entity from the multi-modal knowledge graph to be used as entity knowledge information of the text to be processed.
In at least one possible implementation manner, the matching, by using the multi-modal knowledge-graph, a plurality of candidate entities of the text to be processed includes:
pre-constructing an alias table corresponding to entities in the multi-modal knowledge graph;
matching the text to be processed against the entity names and the alias table in the multi-modal knowledge graph;
and constructing the entities in the multi-modal knowledge graph that meet a preset matching standard, together with their one-hop or multi-hop entities, into a candidate entity set.
In at least one possible implementation manner, the obtaining of attribute information of the matching picture attached to the text to be processed includes: obtaining type information of the matching picture, expressed in text form, based on an image classification strategy.
In at least one possible implementation manner, the performing named entity recognition processing by combining the text information, the deep information, the attribute information, and the visual information includes:
obtaining a text context representation of a character unit according to the text information of the character unit in the text to be processed, the deep information and the attribute information of the matching picture;
obtaining the visual context representation of the character unit according to the text context representation and the visual information;
fusing the text information of the character unit, the text context representation and the visual context representation to obtain a comprehensive representation of the character unit;
and according to the comprehensive representation, carrying out entity type marking on the character unit.
In at least one possible implementation manner, the performing named entity recognition processing by combining the text information, the deep information, the attribute information, and the visual information specifically includes:
combining the text to be processed, the deep information and the attribute information, performing attention calculation on the character units to obtain a first association degree between a target character unit and the other character units;
performing attention calculation again by using the visual information and the first association degree to obtain a second association degree that incorporates the image information;
dynamically combining the first association degree and the second association degree to obtain a multi-modal context representation of the target character unit;
fusing the multi-modal context with the text information of the target character unit to obtain a comprehensive representation of the target character unit;
and identifying the entity type of the target character unit by using the comprehensive representation.
In a second aspect, the present invention provides a named entity recognition apparatus based on a matching picture, including:
the text-level auxiliary information acquisition module is used for acquiring deep information of a text to be processed and attribute information of a matching picture attached to the text to be processed according to a preset strategy, wherein the attribute information is represented in text form;
the basic information extraction module is used for extracting text information of the text to be processed and visual information of the matching picture;
and the named entity identification module is used for carrying out named entity identification processing by combining the text information, the deep information, the attribute information and the visual information to obtain an entity type sequence of the text to be processed.
In at least one possible implementation manner, the text-level auxiliary information acquisition module includes:
an entity knowledge acquisition sub-module, used for acquiring entity knowledge information of the text to be processed according to a pre-constructed multi-modal knowledge graph, wherein the multi-modal knowledge graph comprises a plurality of entities and pictures associated with the entities.
In at least one possible implementation manner, the entity knowledge acquisition sub-module includes:
a candidate entity matching unit, used for matching a plurality of candidate entities of the text to be processed by utilizing the multi-modal knowledge graph;
a target entity screening unit, used for screening a target entity from the candidate entities by utilizing the matching picture and the pictures associated with the candidate entities;
and an entity knowledge acquisition unit, used for acquiring a plurality of knowledge of the target entity from the multi-modal knowledge graph as entity knowledge information of the text to be processed.
In at least one possible implementation manner, the candidate entity matching unit includes:
an alias table construction component, for pre-constructing an alias table corresponding to entities in the multi-modal knowledge graph;
an entity matching component, used for matching the text to be processed against the entity names and the alias table in the multi-modal knowledge graph;
and a candidate entity construction component, used for constructing the entities in the multi-modal knowledge graph that meet a preset matching standard, together with their one-hop or multi-hop entities, into a candidate entity set.
In at least one possible implementation manner, the text-level auxiliary information acquisition module includes: a matching picture attribute acquisition sub-module, used for obtaining type information of the matching picture, expressed in text form, based on an image classification strategy.
In at least one possible implementation manner, the named entity identifying module includes:
the text representation calculation unit is used for obtaining a text context representation of a character unit according to the text information of the character unit in the text to be processed, the deep information and the attribute information of the matching picture;
the visual representation calculation unit is used for obtaining the visual context representation of the character unit according to the text context representation and the visual information;
the multi-modal fusion unit is used for fusing the text information, the text context representation and the visual context representation of the character unit to obtain the comprehensive representation of the character unit;
and the entity type labeling unit is used for carrying out entity type labeling on the character unit according to the comprehensive representation.
In at least one possible implementation manner, the text representation calculating unit is specifically configured to:
and performing attention calculation on the character unit by combining the text to be processed, the deep information and the attribute information to obtain a first association degree between the target character unit and other character units.
In at least one possible implementation manner, the visual representation calculation unit is specifically configured to perform attention calculation again by using the visual information and the first association degree to obtain a second association degree that incorporates the image information.
In at least one possible implementation manner, the multi-modal fusion unit is specifically configured to:
dynamically combining the first association degree and the second association degree to obtain a multi-modal context representation of the target character unit;
and fusing the multi-modal context and the text information of the target character unit to obtain the comprehensive representation of the target character unit.
In at least one possible implementation manner, the entity type labeling unit is specifically configured to:
and identifying the entity type of the target character unit by using the comprehensive representation.
In a third aspect, the present invention provides a named entity recognition device based on a matching picture, including:
one or more processors, a memory (which may employ a non-volatile storage medium), and one or more computer programs stored in the memory, the one or more computer programs comprising instructions which, when executed by the device, cause the device to perform the method in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform at least the method as described in the first aspect or any of its possible implementations.
In a fifth aspect, the present invention also provides a computer program product for performing at least the method of the first aspect or any of its possible implementations, when the computer program product is executed by a computer.
In at least one possible implementation manner of the fifth aspect, the relevant program related to the product may be stored in whole or in part on a memory packaged with the processor, or may be stored in part or in whole on a storage medium not packaged with the processor.
The conception of the invention is as follows. In certain scenarios the input text carries insufficient information and takes non-uniform forms, so that objects with indefinite meanings in the text are hard to recognize; the invention therefore introduces attribute information of the picture attached in such scenarios, together with deep information of the text, to assist named entity recognition. In particular, the invention converts image information to the text level, so that the image attributes, the deep text information and the basic text information are expressed in a unified form. When the basic text information, the deep information, the picture attribute information and the basic visual information are then processed jointly in a multi-modal manner, the shortage of input text information is compensated on the one hand, and the spatial heterogeneity between image and text is reduced on the other, so that text and image can interact fully and combine deeply. The efficiency and accuracy of named entity recognition in such scenarios can thereby be greatly improved.
Furthermore, in other embodiments the invention proposes obtaining the deep information of the text by combining a multi-modal knowledge graph: on the basis of text matching, a target entity can be screened out with the help of the matching picture, so that knowledge information related to the input text is extracted from the multi-modal knowledge graph according to the target entity and used to assist named entity recognition.
Drawings
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the accompanying drawings, in which:
FIG. 1 is a sample illustration of a social media platform;
FIG. 2 is a flowchart of an embodiment of a named entity recognition method based on a matching picture according to the present invention;
FIG. 3 is a schematic diagram of an embodiment of a multimodal knowledge-graph provided by the present invention;
FIG. 4 is a flowchart illustrating a method for acquiring entity knowledge information according to a preferred embodiment of the present invention;
FIG. 5 is a flowchart illustrating a multi-modal named entity recognition processing method according to an embodiment of the present invention;
FIG. 6 is a diagram of an embodiment of a named entity recognition apparatus based on a matching picture according to the present invention;
FIG. 7 is a schematic diagram of an embodiment of a named entity recognition device based on a matching picture according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
Before describing the embodiments of the present invention, the logical context and derivation of the inventive process are explained. To address the aforementioned poor recognition of named entities caused by short texts and insufficient information in specific fields, the inventors analyzed such fields. Research and observation of a large number of examples, taking a social media platform as an example, shows that most samples from this application environment relate to specific topics, such as, but not limited to, sports, movies, music, celebrities and travel. Although the amount of text is limited, samples to be processed in such fields are usually accompanied by matching pictures that are highly topic-related, such as movie posters, album covers, tourist cities, photos of people and landmark buildings.
Based on this analysis, the invention primarily considers combining the visual information of the matching picture to realize multi-modal named entity recognition. Consider a sample of this kind: if the visual information of the matching picture is ignored and the entity type is determined from the text "Rocky is ready for snow search" alone, "Rocky" is easily misrecognized as the type "People", i.e., labeled as a person; with the matching picture provided by the example, it can be known that "Rocky" most probably refers to the name of the dog in the picture. For named entity recognition tasks in such specific fields, the attached picture information is therefore introduced to assist text entity recognition. However, the inventors found through practical analysis that real samples are complex and diverse, and a simple combination of picture and text cannot meet and cover the recognition requirements of a large number of samples; even though such a scheme improves on recognition that relies on characters alone, the actual entity recognition results remain poor. The reason is that a simple picture-text combination cannot dig out the deeper picture information, and the picture information can hardly interact deeply enough with the text information. Specifically, the inventors consider that named entity recognition that simply combines visual information mainly suffers from the following two shortcomings: 1) the understanding of the image stays on the shallow layer of the content objects presented by the matching picture, and the deeper hidden information of the picture cannot be known; for example, if the matching picture of a text is a movie poster, shallow visual processing only finds some people or objects in it without knowing that the picture actually represents a movie poster, so some entities in the corresponding text cannot be recognized, or are recognized with deviation; 2) text information and image information express semantics in their respective expression spaces, and the large spatial heterogeneity between these expression spaces means that a simple fusion, without further analysis and processing, can hardly produce deep information interaction between them, which in turn degrades the final entity recognition effect.
In view of the above, the inventors consider that, for the named entity recognition problem in specific application fields, if matching picture information is to participate, the foregoing two technical obstacles need to be effectively overcome. Accordingly, the present invention provides at least one embodiment of a named entity recognition method based on a matching picture; as shown in FIG. 2, some embodiments may specifically include:
step S1, according to a preset strategy, obtaining deep information of the text to be processed and attribute information of the matching picture attached to the text to be processed.
The deep information and the attribute information of the matching picture are the specific solutions designed for the first technical obstacle: instead of simply adopting the two basic modalities of text and image, the deep information of the text and the attribute information of the image are mined to form multi-modal information, which addresses the failure or error in recognizing named entities caused by the limited amount of input text and its scattered, non-uniform form in specific scenarios. In addition, to overcome the second obstacle, in this embodiment the attribute information of the matching picture is represented in text form; in other words, part of the features of the matching picture are presented as text-level information, which minimizes the semantic gap between text and image. That is, the introduced text-form image attribute modality serves as a bridge between the image semantic expression space and the text semantic expression space, so that information interaction among the multiple modalities can be completed more fully and the entity recognition task for the specific scenario can be accomplished. It should be noted that the specific scenarios contemplated by the present invention include, but are not limited to, social media platforms; any scenario with little text information and a text-related matching picture may be applicable.
Regarding the acquisition of the deep information of the text, different embodiments may adopt different strategies to extract the required deep information, such as, but not limited to, syntactic and grammatical dependency relations, TF-IDF scores, topic information in natural language processing, word collocation characteristics, and multi-dimensional deep semantic features.
Preferably, on this basis, the present invention proposes pre-constructing a multi-modal knowledge graph that contains not only a number of entities but also pictures related to those entities. Taking FIG. 3 as an example, the multi-modal knowledge graph may include a plurality of knowledge triples related to the entity "Kobe Bryant", such as (Kobe Bryant, isA, basketball player) and (Kobe Bryant, friend, Lebron James); in addition, it includes several pictures related to the entities "Kobe Bryant", "Lebron James" and "Los Angeles", where the pictures image1, image2, image3 ... related to each entity are indicated by dotted lines and dotted boxes beside Kobe Bryant, Lebron James and Los Angeles respectively. Those skilled in the art will appreciate, first, that the multi-modal knowledge graph of FIG. 3 is merely an example and may include knowledge triples and related pictures corresponding to other entities; second, that when the multi-modal knowledge graph is constructed, related pictures need not be set for certain relation triples. For example, the entity "basketball player" in the triple (Kobe Bryant, isA, basketball player) of FIG. 3 may preferably not be associated with pictures: isA expresses a conceptual relationship, and the corresponding entity is a higher-level conceptual entity with no directly related picture. Put differently, constructing related pictures for a concept-type entity such as "basketball player" would not only consume resources but would contribute little to the technical task of the present invention. It is therefore unnecessary to associate pictures with every entity in the knowledge graph; rather, several relevant pictures are set, as needed, for particular entities, such as the entities in the aforementioned non-conceptual relation triples (relationships such as residence and friend).
In addition, two points should be noted. First, unlike the traditional line of thought, the knowledge graph (or multi-modal knowledge graph) adopted here is not used for conventional reasoning and decoding; rather, more background knowledge is injected into the small amount of text content in the specific scenario. That is, the knowledge graph is used in reverse, to help the information-poor text effectively recognize the named entities it contains, especially text entities for which traditional entity recognition can hardly obtain accurate results. Second, the role of the knowledge graph in the present invention is to provide deep information (entity knowledge) of the text; the entities it contains may or may not coincide with the named entities finally recognized by the invention. In other words, the entities in the knowledge graph are not directly related to the final entities the invention needs to recognize; their role is only to provide supplementary knowledge.
Specifically, based on the foregoing concept, in some embodiments the invention further extracts the required graph knowledge by combining character matching with image comparison. Taking FIG. 4 as an example, the manner of acquiring entity knowledge information may include the following steps:
step S11, matching a plurality of candidate entities of the text to be processed by utilizing the multi-modal knowledge graph;
step S12, screening out target entities from the candidate entities by using the matching picture and pictures associated with the candidate entities;
and step S13, acquiring a plurality of knowledge of the target entity from the multi-modal knowledge graph as entity knowledge information of the text to be processed.
In practical operation, entity matching of the input text against the multi-modal knowledge graph can be performed with, but not limited to, a forward maximum matching algorithm. Preferably, an alias table corresponding to part or all of the entities in the multi-modal knowledge graph can be pre-constructed to ensure that nothing is missed during character matching. Further, in other preferred embodiments of the present invention, not only the entities matched by characters from the graph but also their one-hop or multi-hop entities may be taken as candidates, jointly constructing a more comprehensive candidate entity set for the text to be processed.
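A minimal sketch of this candidate-matching step in Python, assuming a simple in-memory alias table and a one-hop neighbor map (the table contents and helper names are illustrative assumptions, not data from the patent):

    # Forward maximum matching of the text against entity names/aliases,
    # then expansion with one-hop neighbors from the knowledge graph.
    ALIAS_TABLE = {                      # alias -> canonical entity (illustrative)
        "kobe": "Kobe Bryant",
        "kobe bryant": "Kobe Bryant",
        "nba": "NBA",
    }
    ONE_HOP = {                          # entity -> one-hop entities (illustrative)
        "Kobe Bryant": ["Lebron James", "Los Angeles"],
        "NBA": [],
    }

    def match_candidates(text, max_len=4):
        tokens = text.lower().split()
        candidates, i = set(), 0
        while i < len(tokens):
            # try the longest span first (forward maximum matching)
            for j in range(min(len(tokens), i + max_len), i, -1):
                span = " ".join(tokens[i:j])
                if span in ALIAS_TABLE:
                    entity = ALIAS_TABLE[span]
                    candidates.add(entity)
                    candidates.update(ONE_HOP.get(entity, []))
                    i = j - 1            # skip past the matched span
                    break
            i += 1
        return candidates

    print(match_candidates("Kobe is one of the greatest players in NBA history"))
    # e.g. {'Kobe Bryant', 'Lebron James', 'Los Angeles', 'NBA'} (set order varies)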
Then, because the scenarios of interest are accompanied by a matching picture, the similarity between the matching picture and the pictures associated with each candidate entity in the candidate entity set (including its one-hop or multi-hop entities) can be calculated, and according to an established threshold criterion, the entity whose associated picture has the highest similarity can be selected from the comprehensive candidate set as the target entity. Continuing the previous example: if the matching picture contains the number "24", and the similarity comparison shows that a picture associated with the candidate entity "Kobe Bryant" (or one of its one-hop or multi-hop entities) is the most similar, then "Kobe Bryant" is determined to be the target entity.
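The screening itself reduces to a nearest-picture search; a sketch, assuming the matching picture and every candidate's associated pictures have already been encoded as feature vectors by some image encoder (the encoder and the threshold value are assumptions):

    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

    def select_target_entity(match_vec, candidate_pictures, threshold=0.5):
        """candidate_pictures: dict mapping entity -> list of picture feature vectors."""
        best_entity, best_score = None, threshold
        for entity, vectors in candidate_pictures.items():
            for v in vectors:
                score = cosine(match_vec, v)
                if score > best_score:      # keep the entity with the most similar picture
                    best_entity, best_score = entity, score
        return best_entity                  # None if no picture passes the threshold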
After that, several pieces of knowledge about the target entity can be extracted from the multi-modal knowledge graph as the deep information of the text to be processed. It should be noted that, because an entity in the knowledge graph may participate in many relation triples, i.e., carry many pieces of relational knowledge, the amount of knowledge taken from the target entity is in theory unlimited, and all of it could be spliced with the text to be processed for subsequent recognition. However, incorporating more knowledge both increases the amount of computation and may introduce noise that makes the final processing effect less than ideal. Therefore, in some preferred embodiments of the present invention, only the part of the knowledge most relevant to named entities is extracted, for example, but not limited to, the entity concept knowledge provided by isA relation triples; continuing the previous example, (Kobe Bryant, isA, basketball player) can be extracted as deep information of the text to be processed.
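Selecting only the high-relevance knowledge can be as simple as a relation whitelist; a sketch (the whitelist and the serialization format are illustrative choices, not mandated by the patent):

    # Keep only triples about the target entity whose relation is judged useful
    # for entity typing (e.g. conceptual isA knowledge), serialized for embedding.
    KEPT_RELATIONS = {"isA"}

    def entity_knowledge(triples, target):
        kept = [(h, r, t) for h, r, t in triples if h == target and r in KEPT_RELATIONS]
        return " ; ".join(f"{h} {r} {t}" for h, r, t in kept)

    triples = [("Kobe Bryant", "isA", "basketball player"),
               ("Kobe Bryant", "friend", "Lebron James")]
    print(entity_knowledge(triples, "Kobe Bryant"))   # Kobe Bryant isA basketball player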
As for the case mentioned above, in which the name of an entity in the knowledge graph (or its alias) may or may not coincide with an entity in the input text: taking the input text "Kobe is one of the greatest players in NBA history" as an example, the target entity obtained after the foregoing processing may be "Kobe Bryant" or "Kobe"; in either case, the relational knowledge corresponding to the target entity (for example, the isA triple) is embedded into the text to be processed as entity knowledge information and serves as input, containing deep information, for the subsequent recognition stage. The aim of the invention is to add determinable deep information such as entity knowledge to the input text, not to copy the type of the graph entity "Kobe" (entity types in the graph are known) directly onto the "Kobe" in the text to be processed. This way of extracting knowledge information therefore not only assists the recognition of "Kobe" in "Kobe is one of the greatest players in NBA history" but also provides positive feedback for recognizing other named entities in the text (such as "NBA"). Of course, if the target entity selected from the graph exactly matches the characters of an entity in the text to be processed, the entity type output by the subsequent named entity recognition may be checked against the type of the target entity extracted here; the invention is not limited in this respect.
Next, regarding the acquisition in step S1 of the attribute information of the matching picture: in practical operation, semantic information, content, scene and the like of the matching picture, described in text form, may be obtained from it, using techniques such as feature extraction, natural-language conversion and image content analysis. For example, but not limited to, an image classification model such as Inception v3 or ResNet may be pre-trained on ImageNet; such a model yields the probabilities that the matching picture belongs to a large number of predetermined categories, and in this embodiment the categories with the top n probability values can be taken as the attribute information of the matching picture. For instance, the top-5 type attributes of a certain matching picture might be obtained as cliff, alp, jean, megalith and ski, and these five attributes can be expressed in a form consistent with the text information, as auxiliary embedded information for subsequent processing. That is, text-form attribute values of the matching picture can be obtained with an existing image classification network and, in the subsequent recognition stage, embedded in the same way as the input text, so that interaction happens in the same expression space and the semantic-gap problem between image and text is effectively alleviated.
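A sketch of this attribute extraction with an off-the-shelf ImageNet classifier (torchvision's pretrained ResNet-50 stands in here for the Inception v3 or ResNet model the paragraph mentions; the file name is a placeholder):

    import torch
    from PIL import Image
    from torchvision import models

    weights = models.ResNet50_Weights.IMAGENET1K_V2
    model = models.resnet50(weights=weights).eval()
    preprocess = weights.transforms()

    def picture_attributes(path, top_n=5):
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            probs = model(image).softmax(dim=1)[0]
        top = probs.topk(top_n)
        # map class indices to ImageNet category names, e.g. "cliff", "alp", "ski"
        return [weights.meta["categories"][int(i)] for i in top.indices]

    # attrs = picture_attributes("matching_picture.jpg")  # then embed like ordinary text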
Returning to FIG. 2: step S2, extracting the text information of the text to be processed and the visual information of the matching picture.
The text information and the visual information can be understood as the basic features of the input text and the matching picture: for example, but not limited to, dividing the text into character units and obtaining word-level and/or character-level vector expressions together with basic features such as part of speech and dependency relations, and extracting from the matching picture basic image features such as resolution, color, bit depth, saturation, brightness, texture and semantics. The specific contents and means of extraction can follow existing mature technology and are not described further here. It should be noted, however, that although image attribute information represented as text is extracted as described above, those attributes cannot fully replace the image information itself; the visual information of the picture retains high value for assisting named entity recognition, so this step proposes that the image modality of the matching picture still be retained.
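For the visual side, a sketch of extracting grid-shaped region features with VGG, which the worked example later in this description also assumes (the choice of layer is illustrative):

    import torch
    from torchvision import models

    vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

    def visual_features(image_tensor):
        """image_tensor: (1, 3, 224, 224), already normalized for VGG.
        Returns a (49, 512) grid of region features from the last conv block."""
        with torch.no_grad():
            fmap = vgg.features(image_tensor)    # (1, 512, 7, 7)
        return fmap.flatten(2).squeeze(0).T      # 7*7 = 49 regions, 512-dim each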
Step S3, conducting named entity recognition processing by combining the text information, the deep information, the attribute information and the visual information to obtain an entity type sequence of the text to be processed.
The aim of this step is to combine multiple modalities so as to enrich the possible participation, in the recognition of a named entity, of every object that influences it. There are many ways to combine the text information, the deep information, the attribute information and the visual information. For example, the basic text information and the visual information may be processed together while the deep text information and the picture attribute information are fused, and the two partial results then combined; or the text information directly related to the text to be processed may be fused with the deep information, the attribute information directly related to the matching picture fused with the visual information, and the two fusion results then combined; and so on. On this basis, in some preferred embodiments the invention first processes the three kinds of text-form information cooperatively and then fuses them with the visual information; specifically, the multi-modal named entity recognition processing shown in FIG. 5 may include the following steps:
step S31, obtaining the text context representation of a character unit according to the text information of the character unit in the text to be processed, the deep information, and the attribute information of the matching picture.
In this embodiment a single character unit is the processing object; a character unit here may be a word, a character, a symbol and the like. In actual operation, this step may use an existing named entity recognition mode with a model architecture such as BiLSTM+CRF or BERT+CRF, take the text to be processed, the aforementioned deep information and the attribute information as inputs, and perform attention calculation on the character units to obtain the first association degree between the target character unit (the single object whose entity type is to be determined) and the other character units.
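A sketch of this step as scaled dot-product self-attention over the concatenated embeddings of the text, the entity knowledge and the picture attributes (dimensions and concatenation order are assumptions; the patent does not fix a particular attention formula):

    import torch
    import torch.nn.functional as F

    def first_association(text_emb, knowledge_emb, attribute_emb):
        """Each input: (len_i, d). Returns an (L, L) matrix whose rows give the
        association degree between each character unit and all other units."""
        h = torch.cat([text_emb, knowledge_emb, attribute_emb], dim=0)  # (L, d)
        scores = h @ h.T / h.size(-1) ** 0.5     # scaled alignment scores
        return F.softmax(scores, dim=-1)         # first association degrees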
And step S32, obtaining the visual context representation of the character unit according to the text context representation and the visual information.
In actual operation, attention calculation can be performed again using the visual information of the matching picture and the first association degree obtained in the previous step, so as to obtain a second association degree that incorporates the image information, i.e., a context representation carrying visual information.
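Continuing the sketch above, step S32 can be written as cross-modal attention in which the text-side context queries the visual region features (the projection layer is an assumption needed to reconcile dimensions):

    import torch
    import torch.nn.functional as F

    def second_association(first_assoc, text_emb, visual_feats, proj):
        """first_assoc: (L, L) from the previous sketch; text_emb: (L, d);
        visual_feats: (R, dv), e.g. the VGG grid features from step S2;
        proj: a torch.nn.Linear(dv, d) mapping visual features into text space."""
        context = first_assoc @ text_emb                   # text context per unit, (L, d)
        v = proj(visual_feats)                             # (R, d)
        scores = context @ v.T / context.size(-1) ** 0.5   # (L, R) cross-modal scores
        attn = F.softmax(scores, dim=-1)                   # second association degrees
        return attn @ v                                    # visual context per unit, (L, d)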
And step S33, fusing the text information of the character unit, the text context representation and the visual context representation to obtain the comprehensive representation of the character unit.
In practical operation, the obtained first and second association degrees can be dynamically combined to obtain a multi-modal context representation of the target character unit, and the multi-modal context can then be fused with the text information of the target character unit to obtain its comprehensive representation.
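The dynamic combination can be sketched as a gated fusion; the sigmoid gate below is one common realization, not a form mandated by the patent:

    import torch

    class GatedFusion(torch.nn.Module):
        """Dynamically weighs the text context against the visual context."""
        def __init__(self, d):
            super().__init__()
            self.gate = torch.nn.Linear(2 * d, d)

        def forward(self, text_ctx, visual_ctx, text_emb):
            g = torch.sigmoid(self.gate(torch.cat([text_ctx, visual_ctx], dim=-1)))
            multimodal_ctx = g * text_ctx + (1 - g) * visual_ctx
            # fuse with the character unit's own text information
            return torch.cat([multimodal_ctx, text_emb], dim=-1)   # comprehensive repr.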
And step S34, according to the comprehensive representation, carrying out entity type marking on the character unit.
Finally, the entity type of the target character unit can be identified by using the comprehensive representation.
Since the above process is not the focus of the present invention, and the applied model framework or related algorithm can refer to the existing entity recognition technology, the foregoing process is briefly described by a specific example:
the core concept of the recognition processing is that for each character unit in the text to be processed, text information, deep information, image attributes and visual information can be integrated to form a multi-modal context information representation. Specifically, the vector embedded with the entity knowledge information and the image attribute vector may be subjected to self-attention operation together with the word-char vector corresponding to the character unit in the text to be processed. Then, for the target character unit, an alignment score of the target character unit and the context is obtained, and the alignment score can be regarded as a degree of association (first degree of association) between the target character unit and the context of the text to be processed; then, based on the image visual features extracted by using tools such as VGG and the like, calculating the cross-modal visual attention by combining the obtained alignment scores between the text character units to obtain a corresponding visual context; then, but not limited to, dynamically combining the alignment scores between the character units and the visual context by using a gating fusion mechanism, obtaining the probability distribution of a dynamic fusion result through a softmax layer in a model architecture, and combining the probability distribution with the embedded representation of each character unit to perform weighted summation operation to obtain a multi-modal context finally blended into text, images, image attributes and deep knowledge; then, word-char embedding of the target character unit can be preferably fused to obtain final comprehensive vector representation containing multi-modal information; finally, the comprehensive vector can be input to a CRF layer through two fully-connected layers in the model architecture to complete the labeling task of the entity type sequence.
In conclusion, the conception of the invention is as follows. In certain scenarios the input text carries insufficient information and takes non-uniform forms, so that objects with indefinite meanings in the text are hard to recognize; the invention therefore introduces attribute information of the picture attached in such scenarios, together with deep information of the text, to assist named entity recognition. In particular, the invention converts image information to the text level, so that the image attributes, the deep text information and the basic text information are expressed in a unified form. When the basic text information, the deep information, the picture attribute information and the basic visual information are then processed jointly in a multi-modal manner, the shortage of input text information is compensated on the one hand, and the spatial heterogeneity between image and text is reduced on the other, so that text and image can interact fully and combine deeply. The efficiency and accuracy of named entity recognition in such scenarios can thereby be greatly improved.
Corresponding to the above embodiments and preferred solutions, the present invention further provides an embodiment of a named entity recognition apparatus based on a matching picture; as shown in FIG. 6, it may specifically include the following components:
the text-level auxiliary information acquisition module 1, used for acquiring deep information of a text to be processed and attribute information of a matching picture attached to the text to be processed according to a preset strategy, wherein the attribute information is represented in text form;
the basic information extraction module 2 is used for extracting text information of the text to be processed and visual information of the matching picture;
and the named entity identification module 3 is used for carrying out named entity identification processing by combining the text information, the deep information, the attribute information and the visual information to obtain an entity type sequence of the text to be processed.
In at least one possible implementation manner, the text-level auxiliary information acquisition module includes:
an entity knowledge acquisition sub-module, used for acquiring entity knowledge information of the text to be processed according to a pre-constructed multi-modal knowledge graph, wherein the multi-modal knowledge graph comprises a plurality of entities and pictures associated with the entities.
In at least one possible implementation manner, the entity knowledge acquisition sub-module includes:
a candidate entity matching unit, used for matching a plurality of candidate entities of the text to be processed by utilizing the multi-modal knowledge graph;
a target entity screening unit, used for screening a target entity from the candidate entities by utilizing the matching picture and the pictures associated with the candidate entities;
and an entity knowledge acquisition unit, used for acquiring a plurality of knowledge of the target entity from the multi-modal knowledge graph as entity knowledge information of the text to be processed.
In at least one possible implementation manner, the candidate entity matching unit includes:
an alias table construction component, for pre-constructing an alias table corresponding to entities in the multi-modal knowledge graph;
an entity matching component, used for matching the text to be processed against the entity names and the alias table in the multi-modal knowledge graph;
and a candidate entity construction component, used for constructing the entities in the multi-modal knowledge graph that meet a preset matching standard, together with their one-hop or multi-hop entities, into a candidate entity set.
In at least one possible implementation manner, the text-level auxiliary information acquisition module includes: a matching picture attribute acquisition sub-module, used for obtaining type information of the matching picture, expressed in text form, based on an image classification strategy.
In at least one possible implementation manner, the named entity identifying module includes:
the text representation calculation unit, used for obtaining the text context representation of a character unit according to the text information of the character unit in the text to be processed, the deep information and the attribute information of the matching picture;
the visual representation calculation unit is used for obtaining the visual context representation of the character unit according to the text context representation and the visual information;
the multi-modal fusion unit, used for fusing the text information, the text context representation and the visual context representation of the character unit to obtain the comprehensive representation of the character unit;
and the entity type labeling unit is used for carrying out entity type labeling on the character unit according to the comprehensive representation.
In at least one possible implementation manner, the text representation calculating unit is specifically configured to:
and performing attention calculation on the character unit by combining the text to be processed, the deep information and the attribute information to obtain a first association degree between the target character unit and other character units.
In at least one possible implementation manner, the visual representation calculation unit is specifically configured to perform attention calculation again by using the visual information and the first association degree to obtain a second association degree that incorporates the image information.
In at least one possible implementation manner, the multi-modal fusion unit is specifically configured to:
dynamically combining the first association degree and the second association degree to obtain a multi-modal context representation of the target character unit;
and fusing the multi-modal context and the text information of the target character unit to obtain the comprehensive representation of the target character unit.
In at least one possible implementation manner, the entity type labeling unit is specifically configured to:
and identifying the entity type of the target character unit by using the comprehensive representation.
It should be understood that the division of components in the matching-picture-based named entity recognition apparatus shown in FIG. 6 is merely a logical division; an actual implementation may integrate them wholly or partially into one physical entity, or keep them physically separate. All of these components may be implemented as software invoked by a processing element, or entirely in hardware, or partly as software invoked by a processing element and partly as hardware. For example, a certain module may be a separately arranged processing element or may be integrated into a chip of the electronic device, and the other components are implemented similarly. In addition, all or part of the components can be integrated together or implemented independently. In implementation, each step of the above method, or each of the above components, can be accomplished by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above components may be one or more integrated circuits configured to implement the above methods, such as one or more Application-Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field-Programmable Gate Arrays (FPGAs). For another example, these components may be integrated together and implemented in the form of a System-on-a-Chip (SoC).
In view of the foregoing examples and their preferred embodiments, those skilled in the art will appreciate that, in practice, the technical idea underlying the present invention may be applied in a variety of embodiments; the present invention is schematically illustrated by the following carriers:
(1) A named entity recognition device based on a matching picture. The device may specifically include: one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory and comprise instructions which, when executed by the device, cause the device to perform the steps/functions of the foregoing embodiments or an equivalent implementation.
FIG. 7 is a schematic structural diagram of an embodiment of a named entity recognition device based on a matching picture provided by the present invention; the device may be a cloud server, a computer of a related platform, an intelligent terminal, or the like.
As shown in FIG. 7, the matching-picture-based named entity recognition device 900 includes a processor 910 and a memory 930. The processor 910 and the memory 930 can communicate with each other, transmitting control and/or data signals through an internal connection path; the memory 930 is used for storing a computer program, and the processor 910 is used for calling and running the computer program from the memory 930. The processor 910 and the memory 930 may be combined into a single processing device or, more commonly, be components separate from each other, with the processor 910 executing the program code stored in the memory 930 to implement the functions described above. In a specific implementation, the memory 930 may be integrated in the processor 910 or be separate from it.
In addition, to further improve the functionality of the matching-picture-based named entity recognition device 900, the device 900 may further comprise one or more of an input unit 960, a display unit 970, an audio circuit 980, a camera 990, a sensor 901 and the like, and the audio circuit may further comprise a speaker 982, a microphone 984 and the like. The display unit 970 may include a display screen.
Further, the apparatus 900 may also include a power supply 950 for providing power to various devices or circuits within the apparatus 900.
It should be understood that the operation and/or function of the various components of the apparatus 900 can be referred to in the foregoing description with respect to the method, system, etc., and the detailed description is omitted here as appropriate to avoid repetition.
It should be understood that the processor 910 in the matching-picture-based named entity recognition device 900 shown in FIG. 7 may be a system-on-a-chip (SoC), and the processor 910 may include a Central Processing Unit (CPU) as well as other types of processors, for example a Graphics Processing Unit (GPU), as described further below.
In summary, various portions of the processors or processing units within the processor 910 may cooperate to implement the foregoing method flows, and corresponding software programs for the various portions of the processors or processing units may be stored in the memory 930.
(2) A readable storage medium on which a computer program (or the above-described apparatus) is stored; when executed, the program causes the computer to perform the steps/functions of the foregoing embodiments or equivalent implementations.
In the several embodiments provided by the present invention, any function, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product, as described below.
(3) A computer program product (which may include the above-described apparatus) that, when run on a terminal device, causes the terminal device to perform the matching-picture-based named entity recognition method of the preceding embodiment or an equivalent implementation.
From the above description of the embodiments, it is clear to those skilled in the art that all or part of the steps of the above implementation methods can be accomplished by software plus a necessary general hardware platform. With this understanding, the above computer program product may include, but is not limited to, an APP; the aforementioned device/terminal may be a computer device, and the hardware structure of the computer device may further specifically include at least one processor, at least one communication interface, at least one memory and at least one communication bus, where the processor, the communication interface and the memory complete mutual communication through the communication bus. The processor may be a Central Processing Unit (CPU), a DSP, a microcontroller or a digital signal processor, and may further include a GPU, an embedded Neural-network Processing Unit (NPU) and an Image Signal Processor (ISP); it may further include an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. In addition, the processor may have the function of operating one or more software programs, and the software programs may be stored in a storage medium such as the memory. The aforementioned memory/storage medium may comprise non-volatile memories such as a non-removable magnetic disk, a USB flash drive, a removable hard disk or an optical disk, as well as Read-Only Memory (ROM), Random Access Memory (RAM) and the like.
In the embodiments of the present invention, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, "at least one of a, b, and c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be single or multiple.
Those skilled in the art will appreciate that the various modules, units, and method steps described in the embodiments disclosed in this specification can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.
In addition, the embodiments in this specification are described in a progressive manner, and the same or similar parts of the embodiments may be referenced against each other. In particular, for the embodiments of devices, apparatuses, and the like, since they are substantially similar to the method embodiments, the description of the method embodiments may be consulted for the relevant points. The above-described embodiments of devices, apparatuses, and the like are merely illustrative; modules and units described as separate components may or may not be physically separate, and may be located in one place or distributed across multiple places, for example on the nodes of a system network. Some or all of the modules and units can be selected according to actual needs to achieve the purpose of the embodiments, and this can be understood and carried out by those skilled in the art without inventive effort.
The structure, features, and effects of the present invention have been described in detail above with reference to the embodiments shown in the drawings. The above embodiments, however, are merely preferred embodiments of the present invention; the technical features of these embodiments and their preferred modes can be reasonably combined and configured into various equivalent schemes by those skilled in the art without departing from the design idea and technical effects of the present invention. Therefore, the invention is not limited to the embodiments shown in the drawings, and all modifications and equivalent embodiments made according to the idea of the invention fall within the scope of the invention, provided they do not go beyond the spirit of the description and the drawings.

Claims (9)

1. A named entity recognition method based on a matching graph, characterized by comprising the following steps:
according to a preset strategy, acquiring deep information of a text to be processed and attribute information of a matching picture attached to the text to be processed, wherein the attribute information is represented in text form; the acquiring of the deep information of the text to be processed comprises: acquiring entity knowledge information of the text to be processed according to a pre-constructed multi-modal knowledge graph, wherein the multi-modal knowledge graph comprises a plurality of entities and pictures associated with the entities;
extracting text information of the text to be processed and visual information of the matching picture;
and performing named entity recognition processing by combining the text information, the deep information, the attribute information, and the visual information to obtain an entity type sequence of the text to be processed.
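For illustration only (not part of the claims), the following minimal Python sketch shows how the four signals of claim 1 — basic text information, deep information, picture attribute information, and visual information — might be wired together end to end. All names and data (TOY_KG, the stub classifier, the trivial tagging rule) are hypothetical stand-ins, not the patented model.

```python
# Toy multi-modal knowledge graph: entity -> facts and associated pictures.
TOY_KG = {
    "apple": {"facts": ["fruit", "company"], "images": ["img_logo_07"]},
}

def get_deep_information(text):
    """Entity knowledge looked up from the multi-modal knowledge graph."""
    return {w: TOY_KG[w]["facts"] for w in text.lower().split() if w in TOY_KG}

def get_attribute_information(image_id):
    """Matching-picture attribute information, represented in text form."""
    return "logo" if "logo" in image_id else "photo"   # stub classifier

def recognize(text, image_id):
    deep = get_deep_information(text)            # deep information
    attr = get_attribute_information(image_id)   # attribute information (text)
    tokens = text.split()                        # basic text information
    visual = [0.0, 0.0, 0.0]                     # placeholder visual features
    # attr and visual would feed the real multi-modal model; the rule
    # below is only a stand-in so the sketch runs end to end.
    return [(t, "ENTITY" if t.lower() in deep else "O") for t in tokens]

print(recognize("Apple releases a phone", "img_logo_07"))
# -> [('Apple', 'ENTITY'), ('releases', 'O'), ('a', 'O'), ('phone', 'O')]
```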
2. The named entity recognition method of claim 1, wherein the acquiring of the entity knowledge information of the text to be processed according to the pre-constructed multi-modal knowledge graph comprises:
matching a plurality of candidate entities of the text to be processed by utilizing the multi-modal knowledge graph;
screening out target entities from the candidate entities by using the matching picture and pictures associated with the candidate entities;
and acquiring a plurality of knowledge of the target entity from the multi-modal knowledge graph to be used as entity knowledge information of the text to be processed.
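The screening step of claim 2 can be pictured as an image-similarity vote: the matching picture is compared against the pictures that the knowledge graph associates with each candidate. The sketch below assumes cosine similarity over precomputed picture feature vectors; the similarity measure and the toy feature values are illustrative assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def screen_target_entity(matching_picture_vec, candidates):
    """candidates: {entity name: [feature vectors of its KG pictures]}."""
    def best_score(vecs):
        return max(cosine(matching_picture_vec, v) for v in vecs)
    return max(candidates, key=lambda name: best_score(candidates[name]))

candidates = {
    "Apple Inc.":    [[0.9, 0.1, 0.0]],   # logo-like picture features
    "apple (fruit)": [[0.1, 0.8, 0.3]],   # fruit-like picture features
}
print(screen_target_entity([0.85, 0.2, 0.05], candidates))  # -> Apple Inc.
```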
3. The named entity recognition method of claim 2, wherein the matching of a plurality of candidate entities of the text to be processed by utilizing the multi-modal knowledge graph comprises:
pre-constructing a nickname table corresponding to entities in the multi-modal knowledge graph;
matching the text to be processed against the entity names in the multi-modal knowledge graph and against the nickname table;
and constructing, as a candidate entity set, the entities in the multi-modal knowledge graph that meet a preset matching standard, together with their one-hop or multi-hop entities.
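A toy rendering of claim 3's matching step follows: candidate entities are collected by matching the text against entity names and a nickname table, then expanded to their one-hop neighbors. The nickname table, adjacency data, and substring-based matching standard are all invented for illustration.

```python
NICKNAMES = {"big apple": "New York City"}          # nickname -> entity
NEIGHBORS = {"New York City": ["United States"]}    # one-hop entities

def candidate_entities(text, entity_names):
    text_l = text.lower()
    # Match against entity names and the pre-built nickname table.
    found = {e for e in entity_names if e.lower() in text_l}
    found |= {ent for nick, ent in NICKNAMES.items() if nick in text_l}
    # Expand each match by one hop (iterating would give multi-hop).
    for e in list(found):
        found.update(NEIGHBORS.get(e, []))
    return found

print(sorted(candidate_entities("I moved to the Big Apple",
                                ["New York City", "Paris"])))
# -> ['New York City', 'United States']
```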
4. The named entity recognition method of claim 1, wherein the obtaining of the attribute information of the matching picture attached to the text to be processed comprises: obtaining type information of the matching picture, expressed in text form, based on an image classification strategy.
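One plausible instantiation of such an image classification strategy is an off-the-shelf classifier whose predicted class name serves directly as the textual type information. The sketch below uses torchvision's pretrained ResNet-18 (assuming torchvision >= 0.13) purely as a stand-in; the claim does not prescribe any particular model.

```python
import torch
from PIL import Image
from torchvision import models, transforms

def matching_picture_type(path: str) -> str:
    """Return the matching picture's type as text, via a stock classifier."""
    weights = models.ResNet18_Weights.IMAGENET1K_V1
    model = models.resnet18(weights=weights).eval()
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        idx = model(x).argmax(dim=1).item()
    # The predicted class name is the type information in text form.
    return weights.meta["categories"][idx]
```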
5. The named entity recognition method of any one of claims 1 to 4, wherein the named entity recognition processing in combination with the text information, the deep information, the attribute information, and the visual information comprises:
obtaining a text context representation of a character unit according to the text information of the character unit in the text to be processed, the deep information, and the attribute information of the matching picture;
obtaining a visual context representation of the character unit according to the text context representation and the visual information;
fusing the text information of the character unit, the text context representation, and the visual context representation to obtain a comprehensive representation of the character unit;
and performing entity type labeling on the character unit according to the comprehensive representation.
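The per-character-unit flow of claim 5 can be sketched as three vector operations: build a text context from the character, deep, and attribute signals; refine it with visual features; and fuse everything into a comprehensive representation. The toy averaging and concatenation below stand in for the learned projections a real model would use.

```python
def text_context(char_vec, deep_vecs, attr_vec):
    # Pool the character, attribute, and deep-information vectors.
    pool = [char_vec, attr_vec] + deep_vecs
    return [sum(v[i] for v in pool) / len(pool) for i in range(len(char_vec))]

def visual_context(text_ctx, visual_vecs):
    # Refine the text context with visual features.
    pool = [text_ctx] + visual_vecs
    return [sum(v[i] for v in pool) / len(pool) for i in range(len(text_ctx))]

def fuse(char_vec, text_ctx, vis_ctx):
    # Concatenation as a stand-in for the fusion step.
    return char_vec + text_ctx + vis_ctx

c = [0.2, 0.4]                                        # character unit features
t_ctx = text_context(c, deep_vecs=[[0.1, 0.0]], attr_vec=[0.3, 0.3])
v_ctx = visual_context(t_ctx, visual_vecs=[[0.5, 0.5]])
comprehensive = fuse(c, t_ctx, v_ctx)                 # input to the tagger
print(comprehensive)
```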
6. The named entity recognition method of claim 5, wherein the performing of named entity recognition processing in combination with the text information, the deep information, the attribute information, and the visual information specifically comprises:
performing attention calculation on the character units by combining the text to be processed, the deep information, and the attribute information, to obtain a first degree of association between a target character unit and the other character units;
performing attention calculation again by using the visual information and the first degree of association, to obtain a second degree of association into which the image information is blended;
dynamically combining the first degree of association and the second degree of association to obtain a multi-modal context representation of the target character unit;
fusing the multi-modal context representation with the text information of the target character unit to obtain a comprehensive representation of the target character unit;
and identifying the entity type of the target character unit by using the comprehensive representation.
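Claim 6's two-pass attention can be illustrated numerically: a first scaled dot-product attention over the character units yields the first degree of association, a second pass folds in the visual information, and a gate dynamically combines the two. The scaled dot-product form, the gate value, and all vectors below are assumptions for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)  # one association weight per character unit

# First pass: the target character unit attends over the other character
# units, whose keys already mix text, deep, and attribute information.
query = [0.2, 0.4]
text_keys = [[0.1, 0.3], [0.5, 0.1], [0.2, 0.2]]
first = attention(query, text_keys)            # first degree of association

# Second pass: re-attend using the visual information together with the
# context summarized by the first-pass weights.
ctx1 = [sum(w * k[i] for w, k in zip(first, text_keys)) for i in range(2)]
visual_keys = [[0.4, 0.4], [0.0, 0.6], [0.3, 0.1]]
second = attention(ctx1, visual_keys)          # second degree of association

# Dynamic combination: a gate g (learned in a real model, fixed here).
g = 0.7
multimodal_context = [g * a + (1 - g) * b for a, b in zip(first, second)]
print(multimodal_context)
```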
7. A named entity recognition apparatus based on a matching graph, comprising:
the text-level auxiliary information acquisition module is used for acquiring, according to a preset strategy, deep information of a text to be processed and attribute information of a matching picture attached to the text to be processed, wherein the attribute information is represented in text form; the acquiring of the deep information of the text to be processed comprises: acquiring entity knowledge information of the text to be processed according to a pre-constructed multi-modal knowledge graph, wherein the multi-modal knowledge graph comprises a plurality of entities and pictures associated with the entities;
the basic information extraction module is used for extracting text information of the text to be processed and visual information of the matching picture;
and the named entity recognition module is used for performing named entity recognition processing by combining the text information, the deep information, the attribute information, and the visual information to obtain an entity type sequence of the text to be processed.
8. The named entity recognition apparatus of claim 7, wherein the named entity recognition module comprises:
the text representation calculation unit is used for obtaining the text context representation of the character unit according to the text information of the character unit in the text to be processed, the deep information, and the attribute information of the matching picture;
the visual representation calculation unit is used for obtaining the visual context representation of the character unit according to the text context representation and the visual information;
the multi-modal fusion unit is used for fusing the text information, the text context representation, and the visual context representation of the character unit to obtain the comprehensive representation of the character unit;
and the entity type labeling unit is used for performing entity type labeling on the character unit according to the comprehensive representation.
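Read together, claims 7 and 8 describe a three-module device whose named entity recognition module decomposes into the four units above. The following sketch wires hypothetical classes in that shape; all class names, method names, and toy data are invented, and each module body is a trivial placeholder.

```python
class TextLevelAuxInfoModule:
    def acquire(self, text, picture):
        deep = {"Apple": ["company"]}       # entity knowledge (toy)
        attr = "logo"                       # picture attribute, text form
        return deep, attr

class BasicInfoExtractionModule:
    def extract(self, text, picture):
        return text.split(), [0.1, 0.2]     # token features, visual features

class NamedEntityRecognitionModule:
    def recognize(self, tokens, deep, attr, visual):
        # Stand-in for the representation/fusion/labeling units of claim 8.
        return ["ENTITY" if t in deep else "O" for t in tokens]

def device(text, picture):
    aux = TextLevelAuxInfoModule()
    basic = BasicInfoExtractionModule()
    ner = NamedEntityRecognitionModule()
    deep, attr = aux.acquire(text, picture)
    tokens, visual = basic.extract(text, picture)
    return ner.recognize(tokens, deep, attr, visual)

print(device("Apple releases a phone", "img_logo_07"))
# -> ['ENTITY', 'O', 'O', 'O']
```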
9. A named entity recognition device based on a matching graph, comprising:
one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory and comprise instructions which, when executed by the device, cause the device to perform the matching-graph-based named entity recognition method of any one of claims 1 to 6.
CN202110014000.7A 2021-01-06 2021-01-06 Named entity identification method, device and equipment based on matching graph Active CN112329471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110014000.7A CN112329471B (en) 2021-01-06 2021-01-06 Named entity identification method, device and equipment based on matching graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110014000.7A CN112329471B (en) 2021-01-06 2021-01-06 Named entity identification method, device and equipment based on matching graph

Publications (2)

Publication Number Publication Date
CN112329471A (en) 2021-02-05
CN112329471B (en) 2021-04-20

Family

ID=74302264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110014000.7A Active CN112329471B (en) 2021-01-06 2021-01-06 Named entity identification method, device and equipment based on matching graph

Country Status (1)

Country Link
CN (1) CN112329471B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786836A (en) * 2014-12-22 2016-07-20 北京奇虎科技有限公司 Method and system for generating structured abstract of video webpage
US10726314B2 (en) * 2016-08-11 2020-07-28 International Business Machines Corporation Sentiment based social media comment overlay on image posts
US20180341863A1 (en) * 2017-05-27 2018-11-29 Ricoh Company, Ltd. Knowledge graph processing method and device
US20190108286A1 (en) * 2017-10-05 2019-04-11 Wayblazer, Inc. Concept networks and systems and methods for the creation, update and use of same to select images, including the selection of images corresponding to destinations in artificial intelligence systems
CN110674388A (en) * 2018-07-03 2020-01-10 百度在线网络技术(北京)有限公司 Mapping method and device for push item, storage medium and terminal equipment
CN111538845A (en) * 2020-04-03 2020-08-14 肾泰网健康科技(南京)有限公司 Method, model and system for constructing kidney disease specialized medical knowledge map
CN111816301A (en) * 2020-07-07 2020-10-23 平安科技(深圳)有限公司 Medical inquiry assisting method, device, electronic equipment and medium
CN111984771A (en) * 2020-07-17 2020-11-24 北京欧应信息技术有限公司 Automatic inquiry system based on intelligent conversation
CN112185520A (en) * 2020-09-27 2021-01-05 志诺维思(北京)基因科技有限公司 Text structured processing system and method for medical pathology report picture

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779934A (en) * 2021-08-13 2021-12-10 远光软件股份有限公司 Multi-modal information extraction method, device, equipment and computer-readable storage medium
CN113779934B (en) * 2021-08-13 2024-04-26 远光软件股份有限公司 Multi-mode information extraction method, device, equipment and computer readable storage medium
CN114386422A (en) * 2022-01-14 2022-04-22 淮安市创新创业科技服务中心 Intelligent aid decision-making method and device based on enterprise pollution public opinion extraction
CN114386422B (en) * 2022-01-14 2023-09-15 淮安市创新创业科技服务中心 Intelligent auxiliary decision-making method and device based on enterprise pollution public opinion extraction
CN114444593A (en) * 2022-01-25 2022-05-06 中国电子科技集团公司电子科学研究院 Multi-mode event detection method and device
US20230281331A1 (en) * 2022-03-03 2023-09-07 Fujitsu Limited Control method and information processing apparatus
CN116665228A (en) * 2023-07-31 2023-08-29 恒生电子股份有限公司 Image processing method and device
CN116665228B (en) * 2023-07-31 2023-10-13 恒生电子股份有限公司 Image processing method and device

Also Published As

Publication number Publication date
CN112329471B (en) 2021-04-20

Similar Documents

Publication Publication Date Title
CN112329471B (en) Named entity identification method, device and equipment based on matching graph
CN111079444B (en) Network rumor detection method based on multi-modal relationship
CN113283551B (en) Training method and training device of multi-mode pre-training model and electronic equipment
CN111488931B (en) Article quality evaluation method, article recommendation method and corresponding devices
CN114342353B (en) Method and system for video segmentation
CN111626362B (en) Image processing method, device, computer equipment and storage medium
CN112836487B (en) Automatic comment method and device, computer equipment and storage medium
CN113627447A (en) Label identification method, label identification device, computer equipment, storage medium and program product
CN111783903B (en) Text processing method, text model processing method and device and computer equipment
CN115223020B (en) Image processing method, apparatus, device, storage medium, and computer program product
CN113642536B (en) Data processing method, computer device and readable storage medium
CN113553418B (en) Visual dialogue generation method and device based on multi-modal learning
CN116955707A (en) Content tag determination method, device, equipment, medium and program product
CN114418032A (en) Five-modal commodity pre-training method and retrieval system based on self-coordination contrast learning
Han et al. Adversarial multi-grained embedding network for cross-modal text-video retrieval
CN112861474B (en) Information labeling method, device, equipment and computer readable storage medium
Liu et al. A multimodal approach for multiple-relation extraction in videos
US20230359651A1 (en) Cross-modal search method and related device
CN117786058A (en) Method for constructing multi-mode large model knowledge migration framework
CN110851629A (en) Image retrieval method
CN116186312A (en) Multi-mode data enhancement method for data sensitive information discovery model
CN117828137A (en) Data query method, device, storage medium and program product
CN112381162B (en) Information point identification method and device and electronic equipment
Wu et al. [Retracted] Research on Multimodal Image Fusion Target Detection Algorithm Based on Generative Adversarial Network
CN110969187B (en) Semantic analysis method for map migration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant