CN112528639A - Object recognition method and device, storage medium and electronic equipment

Object recognition method and device, storage medium and electronic equipment

Info

Publication number
CN112528639A
CN112528639A (application CN202011376835.9A)
Authority
CN
China
Prior art keywords
text
tree
combination
network
combined
Prior art date
Legal status
Granted
Application number
CN202011376835.9A
Other languages
Chinese (zh)
Other versions
CN112528639B (en)
Inventor
Zhang Long (张龙)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011376835.9A priority Critical patent/CN112528639B/en
Publication of CN112528639A publication Critical patent/CN112528639A/en
Application granted granted Critical
Publication of CN112528639B publication Critical patent/CN112528639B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The invention discloses an object recognition method and apparatus, a storage medium, and an electronic device. The method comprises: acquiring a combined knowledge graph corresponding to the point-of-interest (POI) object combination to be identified, converting the combined knowledge graph into a combined knowledge tree, and assigning a position code to each tree node in the combined knowledge tree; inputting the combined knowledge tree and the position codes into a text recognition network to obtain the object text features corresponding to the POI object combination; inputting the combined features obtained by splicing the object text features and the object spatial features into an object recognition network; and, in the case where the recognition result output by the object recognition network indicates that the similarity between the objects in the POI object combination is greater than or equal to a target threshold, recognizing the objects in the POI object combination as the same object. The method and apparatus can be used in map-association scenarios, and solve the technical problem of low accuracy when identifying and distinguishing similar POI entity objects.

Description

Object recognition method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computers, and in particular, to an object recognition method and apparatus, a storage medium, and an electronic device.
Background
In a geographic information system, a landmark is usually identified by a Point of Interest (POI) that represents a function, such as a government department, a commercial establishment of some industry (gas station, department store, supermarket, restaurant, hotel, convenience store, hospital, etc.), a tourist attraction, or a transportation facility (stations, parking lots, speed cameras, speed-limit signs). However, because expression habits differ, different names are often used for the same POI entity object, or the same name is used for different POI entity objects. As a result, many similar POI entity objects appear in real scenes but are difficult to identify and distinguish.
A method commonly used in the related art to identify similar POI entity objects performs interactive judgment with a Convolutional Neural Network (CNN). The name text of each POI entity object is first segmented into words, a word vector representation of each word is pre-trained, and the similarity between any two words is computed at word granularity to construct a matching matrix. The CNN then extracts features from this matrix, and at the output layer these features are spliced with discretized vector representations of other attributes, such as the address, to judge whether the two objects are similar.
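For illustration only, the following is a minimal sketch of the matching-matrix construction in that related-art scheme; the vocabulary, the random stand-in word vectors, and the function names are assumptions for demonstration, not part of the patent.

```python
import numpy as np

def matching_matrix(words_a, words_b, embeddings):
    """Word-by-word cosine-similarity matrix for two segmented POI names."""
    def unit(w):
        v = embeddings[w]
        return v / np.linalg.norm(v)
    return np.array([[float(unit(a) @ unit(b)) for b in words_b] for a in words_a])

# Toy pre-trained word vectors (random stand-ins for real embeddings).
rng = np.random.default_rng(0)
vocab = ["Hebei", "DRC", "government", "service", "center", "administrative"]
emb = {w: rng.normal(size=8) for w in vocab}

m = matching_matrix(["Hebei", "DRC", "government", "service", "center"],
                    ["administrative", "service", "center", "Hebei", "DRC"], emb)
print(m.shape)  # (5, 5) matrix that the CNN then convolves over
```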
However, the pairwise similarity computation at word granularity has a low degree of interaction, so the CNN cannot effectively capture the combination relationships between features during feature extraction, and the accuracy of identifying and distinguishing similar POI entity objects is therefore low.
No effective solution to the above problems has yet been proposed.
Disclosure of Invention
Embodiments of the invention provide an object recognition method and apparatus, a storage medium, and an electronic device, which at least solve the technical problem that the accuracy of identifying and distinguishing similar POI entity objects is low because the pairwise similarity computation at word granularity has a low degree of interaction.
According to an aspect of an embodiment of the present invention, there is provided an object recognition method including: acquiring a combined knowledge graph corresponding to a point-of-interest object combination to be identified, wherein the combined knowledge graph comprises text attribute tags of the object texts corresponding to each object in the point-of-interest object combination and text relationship tags between the object texts; converting the combined knowledge graph into a combined knowledge tree, and assigning a position code to each tree node in the combined knowledge tree; inputting the combined knowledge tree and the position codes into a text recognition network to acquire object text features corresponding to the point-of-interest object combination, wherein the text recognition network is used for recognizing the context relationships between words in the object texts; inputting the object spatial information corresponding to each object in the point-of-interest object combination into an attention-based feature extraction network to acquire object spatial features corresponding to the point-of-interest object combination, wherein the attention-based feature extraction network is used for cross-combining a plurality of spatial attribute sub-features extracted from the object spatial information; inputting the combined features obtained by splicing the object text features and the object spatial features into an object recognition network, wherein the object recognition network is a neural network obtained by machine training on a plurality of sample data; and in the case that the recognition result output by the object recognition network indicates that the similarity between the objects in the point-of-interest object combination is greater than or equal to a target threshold, recognizing the objects in the point-of-interest object combination as the same object.
According to another aspect of the embodiments of the present invention, there is also provided an object recognition apparatus, including: a first obtaining unit, configured to obtain a combined knowledge graph corresponding to a point-of-interest object combination to be identified, where the combined knowledge graph includes text attribute tags of object texts corresponding to respective objects in the point-of-interest object combination, and text relationship tags between the object texts; the configuration unit is used for converting the combined knowledge graph into a combined knowledge tree and distributing position codes for each tree node in the combined knowledge tree; a second obtaining unit, configured to input the combined knowledge tree and the position code into a text recognition network to obtain an object text feature corresponding to the interest point object combination, where the text recognition network is used to recognize a context relationship between words in the object text; a third obtaining unit, configured to input object space information corresponding to each object in the interest point object combination into a feature extraction network based on an attention mechanism, so as to obtain object space features corresponding to the interest point object combination, where the feature extraction network based on the attention mechanism is configured to perform cross-combination on a plurality of spatial attribute sub-features extracted from the object space information; the splicing unit is used for inputting the combined characteristics after the object text characteristics and the object space characteristics are spliced into an object recognition network, wherein the object recognition network is a neural network obtained by performing machine training by using a plurality of sample data; and a recognition unit configured to recognize each object in the point-of-interest object combination as the same object in a case where the recognition result output by the object recognition network indicates that the degree of similarity between each object in the point-of-interest object combination is greater than or equal to a target threshold.
According to a further aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned object recognition method when running.
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the object recognition method through the computer program.
In the embodiments of the invention, a combined knowledge graph corresponding to the point-of-interest object combination to be identified is acquired and converted into a combined knowledge tree, and a position code is assigned to each tree node in the combined knowledge tree; the combined knowledge tree and the position codes are input into a text recognition network to obtain the object text features corresponding to the point-of-interest object combination; the object spatial information corresponding to each object is input into an attention-based feature extraction network to obtain the object spatial features; the combined features obtained by splicing the object text features and the object spatial features are input into an object recognition network; and, when the recognition result output by the object recognition network indicates that the similarity between the objects is greater than or equal to the target threshold, the objects are recognized as the same object. Because the combined knowledge tree and the position codes are fed into the text recognition network, the combination relationships between features can be captured accurately and effectively during feature extraction. This achieves the technical effect of improving the accuracy of identifying and distinguishing similar POI entity objects, and solves the technical problem that this accuracy is low when similarity is computed pairwise at word granularity.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of an application environment of an alternative object recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application environment of an alternative object recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart diagram of an alternative object recognition method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of optional POI similarity determination performed with the MatchPyramid model in the related art;
FIG. 5 is a schematic diagram of the structure of a name knowledge graph of an alternative object recognition method according to an embodiment of the present invention;
FIG. 6 is an architectural diagram of an alternative object recognition method according to an embodiment of the invention;
FIG. 7 is a schematic diagram of a model architecture of an alternative object recognition method according to an embodiment of the present invention;
FIG. 8 is a schematic representation of a POI knowledge tree of an alternative object recognition method in accordance with embodiments of the present invention;
FIG. 9 is a schematic diagram of a POI optimization process of an alternative object recognition method according to an embodiment of the present invention;
FIG. 10 is a schematic illustration of soft position coding and visibility matrices for an alternative object recognition method according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of an alternative object recognition method using AFM network optimization, according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a process for optimizing using an AFM network according to another alternative object recognition method in accordance with embodiments of the present invention;
FIG. 13 is a schematic diagram of a process of knowledge distillation pruning for an alternative object recognition method according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of an alternative object recognition apparatus according to an embodiment of the present invention;
fig. 15 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, an object recognition method is provided. Optionally, the method may be applied, but is not limited, to the environment shown in fig. 1. The application environment comprises a terminal device 102, a network 104, and a server 106. The user 108 interacts with the terminal device 102, in which an object recognition application client runs. The terminal device 102 includes a human-computer interaction screen 1022, a processor 1024, and a memory 1026. The human-computer interaction screen 1022 presents whether the objects in a point-of-interest object combination are the same object; the processor 1024 acquires a combined knowledge graph corresponding to the point-of-interest object combination to be identified, where the combined knowledge graph includes text attribute tags of the object texts corresponding to each object in the combination and text relationship tags between the object texts; the memory 1026 stores the object text features corresponding to the point-of-interest object combination.
In addition, the server 106 includes a database 1062 and a processing engine 1064, where the database 1062 is used to store object text features corresponding to the point-of-interest object combinations. The processing engine 1064 is configured to identify the objects in the point-of-interest object combination as the same object in a case where the identification result output by the object identification network indicates that the degree of similarity between the objects in the point-of-interest object combination is greater than or equal to the target threshold.
The specific process comprises the following steps. Assume an object recognition application client is running in the terminal device 102 shown in fig. 1, and the user 108 operates the human-computer interaction screen 1022 to search for a point of interest. In step S102, a combined knowledge graph corresponding to the point-of-interest object combination to be identified is acquired, where the combined knowledge graph includes text attribute tags of the object texts corresponding to each object in the combination and text relationship tags between the object texts. In step S104, the combined knowledge graph is converted into a combined knowledge tree, and a position code is assigned to each tree node. In step S106, the combined knowledge tree and the position codes are input into the text recognition network. In step S108, the object spatial information corresponding to each object is input into an attention-based feature extraction network to acquire the object spatial features, where this network cross-combines a plurality of spatial attribute sub-features extracted from the object spatial information. In step S110, the combined features obtained by splicing the object text features and the object spatial features are input into the object recognition network, a neural network obtained by machine training on a plurality of sample data. In step S112, when the recognition result output by the object recognition network indicates that the similarity between the objects is greater than or equal to a target threshold, the objects are recognized as the same object. In step S114, the recognition result for each object in the point-of-interest object combination is returned to the terminal device 102.
As another alternative, the object recognition method of this application may be applied in the environment shown in fig. 2. As shown in fig. 2, a user 202 interacts with a terminal device 204, which includes a memory 206 and a processor 208. The terminal device 204 in this embodiment may, but is not limited to, perform the operations performed by the terminal device 102 described above and manage the recognition results for the objects in the point-of-interest object combination.
Optionally, the terminal device 102 and the terminal device 204 may be, but are not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, and the like, and the network 104 may include, but is not limited to, a wireless network or a wired network. The wireless network includes WIFI and other networks enabling wireless communication; the wired network may include, but is not limited to, wide area networks, metropolitan area networks, and local area networks. The server 106 may include, but is not limited to, any hardware device capable of performing computations.
In the related art, similarity judgment of a Point of Interest (POI) is performed with the MatchPyramid matching scheme. As shown in fig. 4, its essence is to use a Convolutional Neural Network (CNN) for interactive judgment: texts such as names are first segmented into words, a word vector representation of each word is pre-trained, the similarity between any two words is computed at word granularity to construct a matching matrix, the CNN then extracts features, and at the output layer these features are spliced with discretized vector representations of other attributes, such as the address, to judge similarity.
The existing MatchPyramid scheme judges similarity poorly because of the model's inherently limited learning capacity, which manifests in the following two aspects:
1) The ability to understand and combine attributes from different sources, such as POI names and addresses, is insufficient.
The name-interaction mode of the MatchPyramid model has two disadvantages. First, it depends strongly on pre-trained word vector representations obtained after word segmentation; if the segmentation is wrong or the word vectors are pre-trained unreasonably, the overall similarity judgment is easily wrong. Second, pairwise similarity computation at word granularity has a low degree of interaction and cannot effectively adapt to real-world name distributions. For example, for the similar entities "Hebei Province Development and Reform Commission Government Service Center" and "Administrative Service Center (Hebei Province Development and Reform Commission)", the matching matrix constructed by word-segmentation interaction cannot effectively represent the match between the two, so the similar entities cannot be identified. Conversely, "Leshan Sanjiang Biochemical Technology Co., Ltd." and "Leshan Sanjiang New Port Technology Co., Ltd." differ by only two characters, and the MatchPyramid model misjudges these different entities, which have the same structure and similar text, as similar. Meanwhile, in the original scheme the name-similarity features extracted by MatchPyramid are directly spliced with the discretized address-similarity features, so the combination relationships among features cannot be learned effectively, which degrades object recognition.
2) Knowledge of the vertical domain of the map is not well understood.
Compared with general Natural Language Processing (NLP) texts such as news, POI entity names contain fewer characters, are highly abstract and summarizing in meaning, and are unique to a vertical domain. The existing scheme starts only from the surface text and therefore easily misjudges similarity. For example, "Longding Garden Building 9" and "Liyuan Cuijing Nanli West District Building 9" refer to the same place in the real world, but this relation cannot be recognized from the text alone.
To solve the foregoing technical problem, an embodiment of the present invention provides an object recognition method. As an optional implementation, shown in fig. 3, the object recognition method includes the following steps (a minimal end-to-end sketch follows the list):
s302, acquiring a combined knowledge graph corresponding to an interest point object combination to be identified, wherein the combined knowledge graph comprises text attribute labels of object texts corresponding to each object in the interest point object combination and text relation labels among the object texts;
s304, converting the combined knowledge graph into a combined knowledge tree, and distributing position codes for each tree node in the combined knowledge tree;
s306, inputting the combined knowledge tree and the position code into a text recognition network to obtain object text characteristics corresponding to the interest point object combination, wherein the text recognition network is used for recognizing the context relationship among words in the object text;
s308, inputting the object space information corresponding to each object in the interest point object combination into a feature extraction network based on an attention mechanism to obtain object space features corresponding to the interest point object combination, wherein the feature extraction network based on the attention mechanism is used for performing cross combination on a plurality of space attribute sub-features extracted from the object space information;
s310, inputting combined features obtained by splicing the text features and the space features of the object into an object recognition network, wherein the object recognition network is a neural network obtained by performing machine training by using a plurality of sample data;
s312, in the case where the recognition result output by the object recognition network indicates that the degree of similarity between the respective objects in the point-of-interest object combination is greater than or equal to the target threshold, the respective objects in the point-of-interest object combination are recognized as the same object.
In step S302, in actual application, a combined knowledge graph corresponding to the point-of-interest object combination to be identified is acquired, where the combined knowledge graph includes text attribute tags of the object texts corresponding to each object in the combination and text relationship tags between the object texts.
as shown in fig. 5, it can be known from the map that the objects of interest to be identified are combined into "longding garden 9 building" and "nine buildings in south and west areas of green view of a pear garden", and what the expression of "longding garden 9 building" is the position sub-point 9 building in the special name of the area of the longding garden. The nine buildings in the south-Ri-West area of the green-scape of the pear garden are required to be correctly participated and sentence-broken first, then the name of the street in the pear garden is identified, the name of the south-Ri-West area of the green-scape is a special name of a community, and 9 buildings are 9 buildings. The text attribute label in the 'Longding garden 9 th building' can be a main sub-relation, the role is a cell special name, and the role of the 9 th building is a position sub-point and the like.
In step S304, in actual application, the combined knowledge graph is converted into a combined knowledge tree, and a position code is assigned to each tree node in the combined knowledge tree. As shown in fig. 8, each character corresponds to one soft position code: in the knowledge tree built from characters such as 龙 (Long), 鼎 (ding), and 苑 (yuan) together with branch nodes such as "role", the character 龙 receives soft position code 1 and the final character 楼 (building) receives 6.
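A minimal sketch of the soft position coding, assuming the layout of fig. 8: trunk characters are numbered sequentially from "CLS" (position 0), and a branch continues numbering from the trunk node it forks off, so positions may repeat between branches. The node labels and the exact fork index are illustrative assumptions.

```python
def soft_positions(trunk, branches):
    """trunk: list of node labels; branches: list of (fork_index, labels)."""
    pos = [(label, i) for i, label in enumerate(trunk)]
    for fork, labels in branches:                 # branch numbering resumes at fork+1
        pos += [(label, fork + 1 + k) for k, label in enumerate(labels)]
    return pos

trunk = ["CLS", "龙", "鼎", "苑", "9", "号", "楼"]     # 龙 -> 1, 楼 -> 6
branches = [(3, ["role", "专", "名"])]                 # forks off 苑 (trunk index 3)
for label, p in soft_positions(trunk, branches):
    print(label, p)
```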
In step S306, in actual application, the combined knowledge tree and the position codes are input into a text recognition network to obtain the object text features corresponding to the point-of-interest object combination, where the text recognition network recognizes the context relationships between words in the object texts. The text recognition network may be an ALBert network model, through which the object text features corresponding to the combination of "Longding Garden Building 9" and "Liyuan Cuijing Nanli West District Building 9" can be obtained.
In step S308, in actual application, the object spatial information corresponding to each object in the point-of-interest object combination is input into an attention-based feature extraction network to obtain the object spatial features corresponding to the combination. For example, the geographic locations of "Longding Garden Building 9" and "Liyuan Cuijing Nanli West District Building 9" can be input into the network to obtain spatial features such as the geographic locations of the two. The feature extraction network here may be an AFM (Attentional Factorization Machine) network model.
In step S310, in actual application, the combined features obtained by splicing the object text features and the object spatial features are input into an object recognition network, a neural network obtained by machine training on a plurality of sample data. For example, the object text features produced by the ALBert network model for "Longding Garden Building 9" and "Liyuan Cuijing Nanli West District Building 9" are spliced with the spatial features, such as geographic location, produced by the attention-based feature extraction network (AFM), and the result is input into the object recognition network.
In step S312, in actual application, suppose the target threshold is set to a similarity of 95%. If the recognition result output by the object recognition network indicates that the similarity between the objects in the point-of-interest object combination is greater than or equal to 95%, then "Longding Garden Building 9" and "Liyuan Cuijing Nanli West District Building 9" are regarded as the same object.
In the embodiments of the invention, a combined knowledge graph corresponding to the point-of-interest object combination to be identified is acquired and converted into a combined knowledge tree, and a position code is assigned to each tree node; the combined knowledge tree and the position codes are input into a text recognition network to obtain the object text features corresponding to the combination; the object spatial information corresponding to each object is input into an attention-based feature extraction network to obtain the object spatial features; the combined features obtained by splicing the object text features and the object spatial features are input into an object recognition network; and, when the recognition result indicates that the similarity between the objects is greater than or equal to the target threshold, the objects are recognized as the same object. Because the combined knowledge tree and the position codes are fed into the text recognition network, the combination relationships between features can be captured accurately and effectively during feature extraction, which improves the accuracy of identifying and distinguishing similar POI entity objects and solves the technical problem of that accuracy being low.
In one embodiment, step S304 includes the following. Each character in each object text is converted into a trunk tree node of the combined knowledge tree; as shown in fig. 8, all the characters in "Longding Garden Building 9" and "Liyuan Cuijing Nanli West District Building 9" become trunk tree nodes of the combined knowledge tree. Distinguishing placeholders are configured between different object texts; for example, "CLS", "syn", and "SEP" in fig. 8 are placeholders. Branch tree nodes associated with the trunk tree nodes are generated according to the text attribute tags and text relationship tags in the combined knowledge graph, where a branch tree node carries an attribute identification character identifying a text attribute tag or a relation identification character identifying a text relationship tag; the branch tree nodes are, for example, "role" and "syn", and attribute characters include, for example, "proper name" and "place name". Finally, a position code is assigned to each of the trunk and branch tree nodes in the combined knowledge tree; for example, the position code for the character 龙 is 1 and for 苑 is 3.
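A sketch of the tree construction in this embodiment, under assumed type and function names: trunk nodes carry the two object texts separated by placeholders, and attribute tags fork off trunk nodes as chains of branch nodes.

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    label: str
    kind: str                                  # "trunk" or "branch"
    children: list = field(default_factory=list)

def build_combined_tree(text_a, text_b, attribute_tags):
    """attribute_tags: list of (trunk_index, tag, value_chars)."""
    labels = ["CLS"] + list(text_a) + ["SEP"] + list(text_b)
    trunk = [TreeNode(c, "trunk") for c in labels]
    for idx, tag, value in attribute_tags:     # e.g. (3, "role", "专名")
        tag_node = TreeNode(tag, "branch")     # first branch node: the identifier
        node = tag_node
        for c in value:                        # chain the attribute characters
            child = TreeNode(c, "branch")
            node.children.append(child)
            node = child
        trunk[idx].children.append(tag_node)
    return trunk

tree = build_combined_tree("龙鼎苑9号楼", "梨园翠景南里西区9号楼",
                           [(3, "role", "专名")])
print(len(tree), tree[3].children[0].label)    # the 'role' branch hangs off 苑
```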
In one embodiment, generating the branch tree nodes associated with the trunk tree nodes according to the text attribute tags and text relationship tags in the combined knowledge graph comprises: using the text attribute tag or text relationship tag as a branch forked from a target trunk tree node among the trunk tree nodes to obtain the branch identification of the branch tree node (in fig. 8, "role" is such a branch identification); using the attribute identification character of the text attribute tag as a first branch tree node connected to the target trunk tree node, e.g., the node "role" connected to the target trunk node 苑 (yuan); and converting the attribute characters indicated by the text attribute tag, e.g., "proper name", into branch tree nodes connected to the first branch tree node.
Alternatively, the relation identification character of the text relationship tag is used as a second branch tree node connected to the target trunk tree node, and the relation characters indicated by the text relationship tag are converted into branch tree nodes connected to the second branch tree node. Here the relation identification character may be "syn", the second branch tree node the node corresponding to "syn", and the relation characters, for example, "Cuijing Nanli District".
In one embodiment, assigning a position code to each of the trunk and branch tree nodes in the combined knowledge tree comprises: determining a plurality of node character strings from the combined knowledge tree, including a trunk character string (the characters on the trunk tree nodes) and branch character strings (each containing the characters on part of the trunk tree nodes plus all branch tree nodes forming one branch); and performing position coding starting from the root node of each node string, where the root node lies on the trunk. That is, each string is encoded starting from its first character (the root node). For example, fig. 8 shows a combined knowledge tree composed of four character strings: the trunk string may be "CLS Longding Garden Building 9 SEP Liyuan Cuijing Nanli West District Building 9", and a branch string may be "CLS Longding Garden role proper-name". Encoding starts with the character "CLS" at position 0.
In one embodiment, after assigning a position code to each trunk and branch tree node in the combined knowledge tree, the method further comprises: configuring feature-visible permission among all trunk tree nodes; configuring feature-visible permission among all branch tree nodes on the same branch; and configuring feature-invisible permission between tree nodes on different branches, where tree nodes on different branches include trunk tree nodes versus branch tree nodes, and branch tree nodes on different branches. As shown in fig. 10, visible permission is configured within the same branch, e.g., between "CLS Longding Garden" and "role", while no permission is configured between nodes on different branches, e.g., between "role" and "Cuijing Nanli West District". Introducing knowledge inevitably brings extra knowledge noise: for example, reading the second soft-position sequence straight through yields "Longding → role → Cuijing Nanli West District", an erroneous sequence that is noise from the knowledge-introduction process. To solve this, the rules for introducing knowledge are further specified on this basis: the context of the same branch of the knowledge tree is mutually visible, so as to retain the original information and the prior knowledge; different branches are mutually invisible, avoiding knowledge noise. In a specific implementation, this is realized by constructing a visibility matrix that constrains the Transformer nodes of ALBert. In the visibility matrix of fig. 10, a gray cell means two features are visible to each other and a red-framed cell means they are invisible; for the invisible pair "role" and "Cuijing Nanli West District", the corresponding normalized softmax value computed by the Transformer tends to zero and does not directly affect the state, while visible features operate normally. The noise effect is thereby eliminated.
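A minimal sketch of the visibility-matrix rule described above (assumed encoding: branch id 0 for trunk tokens, k > 0 for tokens on branch k); invisible pairs get a large negative bias so their softmax weight tends to zero, matching the behavior described for fig. 10.

```python
import numpy as np

def visibility_matrix(branch_ids):
    """branch_ids[i] = 0 for trunk tokens, k > 0 for tokens on branch k."""
    ids = np.asarray(branch_ids)
    both_trunk = (ids[:, None] == 0) & (ids[None, :] == 0)
    same_branch = ids[:, None] == ids[None, :]
    return both_trunk | same_branch

def masked_softmax(logits, visible):
    z = np.where(visible, logits, -1e9)        # invisible pairs -> weight ~ 0
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

ids = [0, 0, 0, 1, 1, 2]                       # trunk x3, branch-1 x2, branch-2 x1
vis = visibility_matrix(ids)
print(vis.astype(int))
print(masked_softmax(np.zeros((6, 6)), vis).round(2))
```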
Based on the soft position coding and the visibility matrix, lexical, semantic, and prior knowledge for name understanding can be introduced to improve the accuracy of similarity recognition. For example, phonetic and shape-similar characters in the lexicon resolve similarity judgments involving miswritten characters (e.g., "Laodi Casserole Porridge" vs. "Laodi Chaoshan Casserole Porridge", or "Jinguan Mansion" vs. "Jinduzhongxia"); synonym matching in the lexicon allows "Longding Garden Building 9" and "Liyuan Cuijing Nanli West District Building 9" to be judged similar; and the main/sub hierarchy in the semantics allows the South Gate of the Humanities Institute of Beijing University of Posts and Telecommunications to be recognized as a different entity from the South Gate of Beijing University of Posts and Telecommunications.
In one embodiment, step S306, inputting the combined knowledge tree and the position codes into a text recognition network to acquire the object text features corresponding to the point-of-interest object combination, includes: converting the characters on each tree node of the combined knowledge tree into character vectors; converting the position code assigned to each tree node into a position coding vector; converting the characters on each tree node into identification vectors according to the object identification of the object to which each character belongs; performing weighted summation of the character vectors, position coding vectors, and identification vectors to obtain a candidate text feature vector corresponding to the point-of-interest object combination; and computing the object text features from the candidate text feature vector through the multilayer Transformer structure in the text recognition network. As shown in fig. 9, the character vectors are the vectors under "token embedding", and the position coding vectors are those under "soft position embedding". For the identification vectors, every character belonging to the first object text is given identification vector "A" and every character belonging to the second object text identification vector "B"; the character vectors, position coding vectors, and identification vectors are then summed with weights to obtain the candidate text feature vector.
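A minimal sketch of that input representation, with assumed table sizes; a plain elementwise sum is used here in place of the weighted summation, whose weights the patent does not specify.

```python
import numpy as np

def input_embeddings(token_ids, position_ids, segment_ids,
                     tok_table, pos_table, seg_table):
    """Sum of character, soft-position, and object-identification embeddings."""
    return tok_table[token_ids] + pos_table[position_ids] + seg_table[segment_ids]

rng = np.random.default_rng(0)
V, P, S, d = 100, 32, 2, 16                    # vocab, max position, segments, dim
tok = rng.normal(size=(V, d))
pos = rng.normal(size=(P, d))
seg = rng.normal(size=(S, d))

x = input_embeddings(np.array([5, 9, 2]),      # three characters
                     np.array([0, 1, 2]),      # their soft positions
                     np.array([0, 0, 1]),      # object A / A / B
                     tok, pos, seg)
print(x.shape)                                 # (3, 16) -> multilayer Transformer
```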
In an embodiment, step S306 further includes: acquiring an original text recognition network comprising an N-layer, M-dimensional Transformer structure; and pruning the original text recognition network to obtain the text recognition network, which comprises a P-layer, K-dimensional Transformer structure with P less than N and K less than M. The text recognition network here may be the ALBert network model. As shown in fig. 13, the knowledge-based ALBert model together with the AFM model solves the POI similarity judgment problem well, but some resource-limited scenarios demand higher recognition speed, so efficiency must be improved while keeping quality. To this end, a knowledge-distillation pruning scheme is adopted: the Knowledge ALBert model and the AFM model serve as the teacher network, and a smaller network serves as the student network for knowledge transfer. In the specific distillation scheme of fig. 13, the number of network layers is reduced from 4 to 2, the multi-head attention mechanism is reduced from 12 heads x26 to 12 heads x12, and the cross-layer parameters are changed from shared to unshared to improve representational capacity. Object recognition efficiency thereby improves by a factor of 3.5 with almost no loss of precision.
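A minimal sketch of the teacher-student distillation loss used in such pruning; the temperature, the mixing weight alpha, and the softmax formulation are standard assumptions, not values given in the patent.

```python
import numpy as np

def softmax(z, t=1.0):
    e = np.exp((z - z.max()) / t)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    soft_targets = softmax(teacher_logits, temperature)      # teacher's soft labels
    log_student = np.log(softmax(student_logits, temperature))
    kd = -(soft_targets * log_student).sum()                 # match the teacher
    hard = -np.log(softmax(student_logits)[label])           # match the ground truth
    return alpha * kd + (1 - alpha) * hard

teacher = np.array([2.5, -1.0])   # teacher (Knowledge ALBert + AFM) output logits
student = np.array([1.2, -0.3])   # smaller 2-layer student output logits
print(round(float(distillation_loss(student, teacher, label=0)), 4))
```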
In one embodiment, step S308 includes: acquiring a plurality of spatial attribute sub-features extracted from the object spatial information; acquiring the spatial attribute vectors corresponding to each spatial attribute sub-feature; and performing cross-combination computation on the spatial attribute vectors to obtain the object spatial features. As shown in fig. 11, the spatial attribute sub-features may be address similarity, point/polygon distance, category matching, road network conflict, and the like; these sub-features are then cross-combined to obtain the object spatial features.
In an embodiment, performing cross-combination computation on the plurality of spatial attribute vectors to obtain the object spatial features includes traversing the spatial attribute vectors and performing the following operations: acquiring the current spatial attribute vector; performing dot-product computation between the current spatial attribute vector and each of the other spatial attribute vectors to obtain the to-be-combined feature vectors corresponding to the current spatial attribute vector; and acquiring the next spatial attribute vector as the current one. The to-be-combined feature vectors corresponding to all the spatial attribute vectors are then integrated through the attention processing layer of the attention-based feature extraction network to obtain the object spatial features. The network parameters of the attention-based feature extraction network are obtained by training: sample data is acquired for training and the weight parameters are adjusted by back propagation. As shown in fig. 11, if the embedding-layer output for the address similarity feature is v0*x0, dot products with the other three spatial attribute vectors yield (v0⊙v1)x0x1, (v0⊙v2)x0x2, and (v0⊙v3)x0x3, and these three are then integrated through the attention layer to obtain the object spatial features.
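A minimal sketch of this AFM-style cross-combination: each spatial sub-feature value scales its embedding, all pairwise element-wise products are formed, and an attention layer pools them. The dimensions, initialization, and attention parameterization are assumptions.

```python
import numpy as np

def afm_spatial_features(values, emb, attn_w, attn_b, attn_h):
    fields = [v * e for v, e in zip(values, emb)]            # v_i * x_i per sub-feature
    pairs = np.stack([fields[i] * fields[j]                  # (v_i ⊙ v_j) x_i x_j
                      for i in range(len(fields))
                      for j in range(i + 1, len(fields))])
    scores = np.tanh(pairs @ attn_w + attn_b) @ attn_h       # attention logits
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                        # importance of each cross
    return (weights[:, None] * pairs).sum(axis=0)            # attended pooling

rng = np.random.default_rng(1)
d = 8
emb = rng.normal(size=(4, d))          # address, distance, category, road network
x = np.array([0.7, 0.3, 1.0, 0.0])     # sub-feature values for one POI pair
feat = afm_spatial_features(x, emb, rng.normal(size=(d, d)),
                            np.zeros(d), rng.normal(size=d))
print(feat.shape)                      # (8,) object spatial feature
```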
In one embodiment, acquiring the plurality of spatial attribute sub-features extracted from the object spatial information comprises at least one of: acquiring the address sub-feature of each object in the point-of-interest object combination; acquiring the category sub-feature of each object; acquiring the distance sub-feature between each object and a reference object; and acquiring the road network sub-feature of each object in the road environment. As shown in fig. 11, the spatial attribute sub-features may be spatial sub-features such as address similarity, point/polygon distance, category matching, and road network conflict, without limitation here.
The FM network realizes the cross-combination of different features, while the attention mechanism automatically learns the importance of the crossed features. For example, are two Bianlifeng convenience stores 300 meters apart the same object, and are two hospitals 300 meters apart the same object? As shown in fig. 12, the Bianlifeng stores are categorized as convenience stores: they are small points with low tolerance to distance error, so they are not the same. A general hospital has a certain uniqueness and a large outline area, with high tolerance to distance error, so the two hospitals are judged similar. High-order crossed features such as distance and category can be learned automatically through the AFM model network.
In one embodiment, step S310 includes: inputting the spliced combined features in the object recognition network into a normalized exponential (softmax) function configured for the object recognition network, so as to compute the similarity between the objects in the point-of-interest object combination, and then judging from this similarity whether the objects are the same object. For example, with the target threshold set to a similarity of 95%, if the recognition result output by the object recognition network indicates that the similarity between "Longding Garden Building 9" and "Liyuan Cuijing Nanli West District Building 9" is greater than or equal to 95%, the two are regarded as the same object.
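A minimal sketch of that output layer under assumed shapes and weights: a linear layer over the spliced features followed by the normalized exponential function, with the "same object" probability compared against the threshold.

```python
import numpy as np

def predict_same_object(combined, W, b, threshold=0.95):
    logits = combined @ W + b                       # 2 classes: different / same
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()                     # normalized exponential (softmax)
    return probs[1] >= threshold, float(probs[1])

rng = np.random.default_rng(2)
combined = rng.normal(size=24)                      # spliced text + spatial features
same, score = predict_same_object(combined, rng.normal(size=(24, 2)), np.zeros(2))
print(same, round(score, 3))
```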
In the embodiments of the invention, a combined knowledge graph corresponding to the point-of-interest object combination to be identified is acquired and converted into a combined knowledge tree, and a position code is assigned to each tree node; the combined knowledge tree and the position codes are input into a text recognition network to obtain the object text features corresponding to the combination; the object spatial information corresponding to each object is input into an attention-based feature extraction network to obtain the object spatial features; the combined features obtained by splicing the object text features and the object spatial features are input into an object recognition network; and, when the recognition result indicates that the similarity between the objects is greater than or equal to the target threshold, the objects are recognized as the same object. Because the combined knowledge tree and the position codes are fed into the text recognition network, the combination relationships between features can be captured accurately and effectively during feature extraction, achieving the technical effect of improving the accuracy of identifying and distinguishing similar POI entity objects and solving the technical problem of that accuracy being low.
Based on the above embodiments, in one application embodiment, the architecture of the object recognition method is shown in fig. 6. Its core is the algorithm strategy layer, which comprises entity analysis and similarity computation. When processing entities, an understanding analysis is performed first, including name understanding, address understanding, and spatial understanding. Name understanding covers word-segmentation roles, synonym analysis, hierarchy analysis, and semantic function; address understanding covers address role recognition, standardization, grade computation, and entity extraction; spatial understanding covers point/line relations, category, chain stores, source, and the like. The underlying features are then generated and prior knowledge is introduced, after which the Knowledge ALBert model and the AFM model perform the similarity judgment.
Based on the ALBert network model, prior knowledge of the map vertical domain is fused in through the knowledge graph to optimize the name-similarity scheme; the model architecture is shown in fig. 7. The model takes the industry-leading ALBert network as the baseline for extracting name-similarity features of POI entities. The ALBert network is pre-trained on massive corpus knowledge and thus incorporates a large amount of general knowledge of Chinese text. Meanwhile, relying on the strong learning capacity of the Transformer structure, the model generalizes well after fine-tuning on a small number of domain samples, and adapts effectively to POI name texts from different sources with different characteristics.
The object recognition method disclosed by the invention fuses a POI similarity scheme based on the knowledge graph and the ALBert network model. By introducing prior knowledge of the map vertical domain and optimizing feature-combination learning on top of an industry-leading deep learning network, it improves the generalization capability of the model and greatly improves the precision of object similarity computation, raising recognition precision from 92% to 98.5% on the same test set. Meanwhile, in actual deployment, the knowledge-distillation pruning operation improves efficiency by a factor of 3.5 with almost unchanged precision, so the method can be applied under resource constraints and meets POI similarity judgment requirements in different scenarios.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiment of the present invention, there is also provided an object recognition apparatus for implementing the above object recognition method. As shown in fig. 14, the apparatus includes:
a first obtaining unit 1402, configured to obtain a combined knowledge graph corresponding to a to-be-identified interest point object combination, where the combined knowledge graph includes text attribute tags of object texts corresponding to each object in the interest point object combination, and text relationship tags between the object texts;
a configuration unit 1404, configured to convert the combined knowledge graph into a combined knowledge tree and allocate a position code to each tree node in the combined knowledge tree;
a second obtaining unit 1406, configured to input the combined knowledge tree and the position code into a text recognition network to obtain an object text feature corresponding to the point-of-interest object combination, where the text recognition network is used to identify a context relationship between words in the object text;
a third obtaining unit 1408, configured to input object space information corresponding to each object in the interest point object combination into a feature extraction network based on an attention mechanism, so as to obtain object space features corresponding to the interest point object combination, where the feature extraction network based on the attention mechanism is configured to perform cross combination on a plurality of spatial attribute sub-features extracted from the object space information;
a splicing unit 1410, configured to input a combined feature obtained by splicing the object text feature and the object spatial feature into an object identification network, where the object identification network is a neural network obtained after performing machine training by using multiple sample data;
a recognition unit 1412 configured to recognize the respective objects in the point-of-interest object combination as the same object in case that the recognition result output by the object recognition network indicates that the degree of similarity between the respective objects in the point-of-interest object combination is greater than or equal to the target threshold.
In this embodiment, a combined knowledge graph corresponding to the point-of-interest object combination to be identified is obtained, where the combined knowledge graph includes text attribute tags of the object texts corresponding to each object in the combination and text relationship tags between the object texts.
as shown in fig. 5, it can be known from the map that the objects of interest to be identified are combined into "longding garden 9 building" and "nine buildings in south and west areas of green view of a pear garden", and what the expression of "longding garden 9 building" is the position sub-point 9 building in the special name of the area of the longding garden. The nine buildings in the south-Ri-West area of the green-scape of the pear garden are required to be correctly participated and sentence-broken first, then the name of the street in the pear garden is identified, the name of the south-Ri-West area of the green-scape is a special name of a community, and 9 buildings are 9 buildings. The text attribute label in the 'Longding garden 9 th building' can be a main sub-relation, the role is a cell special name, and the role of the 9 th building is a position sub-point and the like.
In this embodiment, the combined knowledge graph is converted into a combined knowledge tree, and a position code is allocated to each tree node in the combined knowledge tree. As shown in fig. 8, each character corresponds to a soft position code: in the knowledge tree constructed from the characters of "Longding Garden Building 9" (glossed character by character in fig. 8) together with label characters such as "role", the first character ("Long") receives soft position code 1 and the character "Building" receives soft position code 6.
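A minimal sketch of such soft position coding is given below, assuming a K-BERT-style scheme in which the trunk is a chain of characters and a branch continues numbering from the trunk node it attaches to. The node glosses ("Long", "Ding", "Hua", "Yuan", ...) and the "role" branch are illustrative transliterations, not taken from fig. 8.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class TreeNode:
        char: str
        children: List["TreeNode"] = field(default_factory=list)

    def assign_soft_positions(node: TreeNode, pos: int = 1,
                              codes: Dict[int, int] = None) -> Dict[int, int]:
        # Trunk characters count 1, 2, 3, ... along the chain; a branch
        # continues counting from the trunk node it attaches to, so
        # characters on different branches may share a soft position code.
        if codes is None:
            codes = {}
        codes[id(node)] = pos
        for child in node.children:
            assign_soft_positions(child, pos + 1, codes)
        return codes

    # Trunk modeled as a chain of the six glossed characters of the name.
    chars = ["Long", "Ding", "Hua", "Yuan", "9", "Building"]
    nodes = [TreeNode(c) for c in chars]
    for parent, child in zip(nodes, nodes[1:]):
        parent.children.append(child)
    # Attach a hypothetical label branch ("role" -> "proper name") to "Long".
    nodes[0].children.append(TreeNode("role", [TreeNode("proper name")]))

    codes = assign_soft_positions(nodes[0])
    assert codes[id(nodes[0])] == 1 and codes[id(nodes[-1])] == 6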
In this embodiment, in practical application, the combined knowledge tree and the position codes are input into a text recognition network to obtain the object text features corresponding to the point-of-interest object combination, where the text recognition network is used to recognize the context relationship between words in the object text. The text recognition network here may be the ALBert network model, through which the object text features corresponding to "Longding Garden Building 9" and "Pear Garden Green-View South-Li West District Building 9" are obtained.
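A minimal sketch of extracting a pairwise text feature with an ALBERT encoder is shown below. The Hugging Face transformers API and the checkpoint name are assumptions for illustration, and the sketch encodes a plain token sequence rather than the knowledge tree with soft position codes described above.

    import torch
    from transformers import AlbertModel, AlbertTokenizer

    # Checkpoint name is a placeholder; a Chinese ALBERT checkpoint would
    # be used for POI name texts in practice.
    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    encoder = AlbertModel.from_pretrained("albert-base-v2")

    def text_feature(name_a: str, name_b: str) -> torch.Tensor:
        # Encode the two POI names as one sequence pair so self-attention
        # can model the context relationship between their words.
        batch = tokenizer(name_a, name_b, return_tensors="pt")
        with torch.no_grad():
            out = encoder(**batch)
        return out.pooler_output  # (1, hidden_size) pooled text feature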
In this embodiment, in actual application, the object space information corresponding to each object in the point-of-interest object combination is input into the attention-based feature extraction network to obtain the object space features corresponding to the combination. For example, the geographic locations of "Longding Garden Building 9" and "Pear Garden Green-View South-Li West District Building 9" may be input into the attention-based feature extraction network to obtain spatial features such as the geographic locations corresponding to the two. The feature extraction network here may be an AFM network model.
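The cross combination performed by such an AFM-style network can be sketched as follows, assuming PyTorch and a learned linear attention scorer; the four field embeddings (e.g., address, category, distance, road network) are placeholders, not values from the disclosure.

    import torch

    def afm_spatial_feature(v: torch.Tensor,
                            attn: torch.nn.Linear) -> torch.Tensor:
        # v: (num_fields, k) embeddings of the spatial attribute
        # sub-features. Pairwise element-wise products form the crossed
        # sub-features; an attention layer then weights and pools them.
        n = v.shape[0]
        crossed = torch.stack([v[i] * v[j]          # element-wise cross term
                               for i in range(n) for j in range(i + 1, n)])
        weights = torch.softmax(attn(crossed), dim=0)
        return (weights * crossed).sum(dim=0)       # pooled spatial feature

    # Usage with four illustrative 16-dimensional attribute embeddings:
    v = torch.randn(4, 16)
    feature = afm_spatial_feature(v, torch.nn.Linear(16, 1))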
In this embodiment, the combined feature obtained by splicing the object text feature and the object space feature is input into an object recognition network, where the object recognition network is a neural network obtained by machine training with a plurality of sample data. For example, the object text features produced by the ALBert network model for "Longding Garden Building 9" and "Pear Garden Green-View South-Li West District Building 9", and the spatial features such as geographic location produced by the attention-based feature extraction network (the AFM model), may be spliced and input into the object recognition network.
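A minimal sketch of the splicing and recognition step follows, assuming PyTorch, assumed feature dimensions, and a two-class softmax head; the layer sizes are illustrative rather than taken from the disclosure.

    import torch
    import torch.nn as nn

    class ObjectRecognitionHead(nn.Module):
        def __init__(self, text_dim: int = 768, space_dim: int = 16):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(text_dim + space_dim, 128),
                nn.ReLU(),
                nn.Linear(128, 2),  # classes: [different object, same object]
            )

        def forward(self, text_feat: torch.Tensor,
                    space_feat: torch.Tensor) -> torch.Tensor:
            combined = torch.cat([text_feat, space_feat], dim=-1)  # spliced
            return torch.softmax(self.mlp(combined), dim=-1)

With the 95% target threshold of the running example, the two POIs would be recognized as the same object when the "same object" probability, probs[..., 1], is greater than or equal to 0.95.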
In the present embodiment, for example, when the target threshold is set to a similarity of 95%, "Longding Garden Building 9" and "Pear Garden Green-View South-Li West District Building 9" are recognized as the same object when the recognition result output by the object recognition network indicates that the similarity between the objects in the point-of-interest object combination is greater than or equal to 95%.
In the embodiment of the invention, a combined knowledge graph corresponding to the point-of-interest object combination to be identified is obtained; the combined knowledge graph is converted into a combined knowledge tree, and position codes are allocated to each tree node in the combined knowledge tree; the combined knowledge tree and the position codes are input into a text recognition network to obtain the object text features corresponding to the point-of-interest object combination; the object space information corresponding to each object in the combination is input into the attention-based feature extraction network to obtain the object space features; the combined feature obtained by splicing the object text features and the object space features is input into an object recognition network; and, in the case that the recognition result output by the object recognition network indicates that the similarity between the objects in the combination is greater than or equal to the target threshold, the objects in the combination are recognized as the same object. By inputting the combined knowledge tree and the position codes into the text recognition network, the combination relations between features, which are difficult to capture accurately when a CNN is used for feature extraction, are acquired accurately and effectively, thereby achieving the technical effect of improving the accuracy of recognizing and distinguishing similar POI entity objects and solving the technical problem of low accuracy in such recognition.
For other examples of this embodiment, reference may be made to the above embodiments, which are not described herein again.
According to yet another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the object recognition method. As shown in fig. 15, the electronic device includes a memory 1502 and a processor 1504; the memory 1502 stores a computer program, and the processor 1504 is configured to execute the steps of any one of the method embodiments described above through the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a combined knowledge graph corresponding to the interest point object combination to be identified, wherein the combined knowledge graph comprises text attribute labels of object texts corresponding to each object in the interest point object combination and text relationship labels among the object texts;
S2, converting the combined knowledge graph into a combined knowledge tree, and distributing position codes for each tree node in the combined knowledge tree;
S3, inputting the combined knowledge tree and the position code into a text recognition network to obtain object text characteristics corresponding to the interest point object combination, wherein the text recognition network is used for recognizing the context relationship among words in the object text;
S4, inputting the object space information corresponding to each object in the interest point object combination into a feature extraction network based on attention mechanism to obtain the object space features corresponding to the interest point object combination, wherein the feature extraction network based on attention mechanism is used for cross-combining a plurality of space attribute sub-features extracted from the object space information;
S5, inputting combined features obtained by splicing the object text features and the object space features into an object recognition network, wherein the object recognition network is a neural network obtained by performing machine training by using a plurality of sample data;
S6, in the case where the recognition result output by the object recognition network indicates that the degree of similarity between the respective objects in the point-of-interest object combination is greater than or equal to the target threshold, recognizing the respective objects in the point-of-interest object combination as the same object.
Alternatively, those skilled in the art will understand that the structure shown in fig. 15 is only illustrative, and the electronic device may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palm computer, or a Mobile Internet Device (MID). Fig. 15 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components than shown in fig. 15 (e.g., network interfaces), or have a configuration different from that shown in fig. 15.
The memory 1502 may be used for storing software programs and modules, such as the program instructions/modules corresponding to the object recognition method and apparatus in the embodiments of the present invention; the processor 1504 executes various functional applications and data processing, that is, implements the object recognition method, by running the software programs and modules stored in the memory 1502. The memory 1502 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1502 may further include memory located remotely from the processor 1504, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1502 may be specifically, but not limited to, used for storing information such as the combined knowledge graph corresponding to the point-of-interest object combination to be identified. As an example, as shown in fig. 15, the memory 1502 may include, but is not limited to, the first obtaining unit 1402, the configuration unit 1404, the second obtaining unit 1406, the third obtaining unit 1408, the splicing unit 1410, and the recognition unit 1412 of the object recognition apparatus. In addition, the memory may further include, but is not limited to, other module units of the object recognition apparatus, which are not described in detail in this example.
Optionally, the transmission device 1506 is used for receiving or transmitting data via a network. Examples of the network may include wired and wireless networks. In one example, the transmission device 1506 includes a network adapter (NIC) that can be connected to a router and other network devices via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 1506 is a radio frequency (RF) module, which is used for communicating with the internet wirelessly.
In addition, the electronic device further includes: a display 1508 for displaying similar POI entity objects; and a connection bus 1510 for connecting the respective module parts in the above-described electronic apparatus.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system formed by a plurality of nodes connected through network communication. The nodes can form a peer-to-peer (P2P) network, and any type of computing device, such as a server, a terminal, or other electronic device, can become a node of the blockchain system by joining the peer-to-peer network.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a combined knowledge graph corresponding to the interest point object combination to be identified, wherein the combined knowledge graph comprises text attribute labels of object texts corresponding to each object in the interest point object combination and text relationship labels among the object texts;
S2, converting the combined knowledge graph into a combined knowledge tree, and distributing position codes for each tree node in the combined knowledge tree;
S3, inputting the combined knowledge tree and the position code into a text recognition network to obtain object text characteristics corresponding to the interest point object combination, wherein the text recognition network is used for recognizing the context relationship among words in the object text;
S4, inputting the object space information corresponding to each object in the interest point object combination into a feature extraction network based on attention mechanism to obtain the object space features corresponding to the interest point object combination, wherein the feature extraction network based on attention mechanism is used for cross-combining a plurality of space attribute sub-features extracted from the object space information;
S5, inputting combined features obtained by splicing the object text features and the object space features into an object recognition network, wherein the object recognition network is a neural network obtained by performing machine training by using a plurality of sample data;
S6, in the case where the recognition result output by the object recognition network indicates that the degree of similarity between the respective objects in the point-of-interest object combination is greater than or equal to the target threshold, recognizing the respective objects in the point-of-interest object combination as the same object.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.

Claims (14)

1. An object recognition method, comprising:
acquiring a combined knowledge graph corresponding to an interest point object combination to be identified, wherein the combined knowledge graph comprises text attribute labels of object texts corresponding to each object in the interest point object combination and text relation labels among the object texts;
converting the combined knowledge graph into a combined knowledge tree, and distributing position codes for each tree node in the combined knowledge tree;
inputting the combined knowledge tree and the position code into a text recognition network to obtain object text characteristics corresponding to the interest point object combination, wherein the text recognition network is used for recognizing context relations among words in the object text;
inputting object space information corresponding to each object in the interest point object combination into an attention-based feature extraction network to obtain object space features corresponding to the interest point object combination, wherein the attention-based feature extraction network is used for performing cross combination on a plurality of space attribute sub-features extracted from the object space information;
inputting the combined features after splicing the object text features and the object space features into an object recognition network, wherein the object recognition network is a neural network obtained after machine training is carried out by utilizing a plurality of sample data;
and in the case that the identification result output by the object identification network indicates that the similarity between the objects in the interest point object combination is greater than or equal to a target threshold, identifying the objects in the interest point object combination as the same object.
2. The method of claim 1, wherein converting the combined knowledge graph into a combined knowledge tree and distributing position codes for each tree node in the combined knowledge tree comprises:
converting each character in each object text into a trunk tree node of the combined knowledge tree, wherein distinguishing placeholders are configured among different object texts;
generating branch tree nodes associated with the trunk tree nodes according to the text attribute tags and the text relationship tags in the combined knowledge graph, wherein the branch tree nodes comprise attribute identification characters used for identifying the text attribute tags or relationship identification characters used for identifying the text relationship tags;
assigning the position code to each of the trunk tree nodes and the branch tree nodes in the combined knowledge tree.
3. The method of claim 2, wherein generating branch tree nodes associated with the trunk tree nodes according to the text attribute tags and the text relationship tags in the combined knowledge graph comprises:
taking the text attribute label or the text relation label as a branch which is branched from a target trunk tree node in the trunk tree nodes to obtain a branch identifier of the branch tree node;
taking the attribute identification character of the text attribute label as a first branch tree node connected with the target trunk tree node, and converting the attribute character indicated by the text attribute label into the branch tree node connected with the first branch tree node; or, the relation identification character of the text relation label is used as a second branch tree node connected with the target trunk tree node, and the relation character indicated by the text relation label is converted into the branch tree node connected with the second branch tree node.
4. The method of claim 2, wherein assigning a position code to each of the trunk tree nodes and the branch tree nodes in the combined knowledge tree comprises:
determining a plurality of node character strings from the combined knowledge tree, wherein the node character strings comprise trunk character strings and branch character strings, the trunk character strings comprise characters on the trunk tree nodes, and each branch character string comprises characters on part of trunk tree nodes and all branch tree nodes for forming a branch;
and carrying out position coding from a root node of each node character string, wherein the root node is positioned in the trunk tree node.
5. The method of claim 2, further comprising, after said assigning the position code to each of the trunk tree nodes and the branch tree nodes in the combined knowledge tree:
configuring characteristic visible authority for each trunk tree node;
configuring feature visible authority among all branch tree nodes on the same branch;
configuring feature invisible permissions between tree nodes located on different trunks, wherein the tree nodes on the different trunks include: the trunk tree node, any one branch tree node and branch tree nodes on different branches.
6. The method of any one of claims 1 to 5, wherein inputting the combined knowledge tree and the position code into the text recognition network to obtain the object text features corresponding to the point-of-interest object combination comprises:
converting characters on each tree node of the combined knowledge tree into character vectors;
converting the position code allocated to each of the tree nodes into a position code vector; converting characters on each tree node of the combined knowledge tree into identification vectors according to the object identification of the object to which the characters belong;
carrying out weighted summation on the character vector, the position coding vector and the identification vector to obtain a candidate text feature vector corresponding to the interest point object combination;
and calculating the candidate text characteristic vector through a multilayer conversion structure in the text recognition network to obtain the object text characteristic.
7. The method of claim 1, wherein before inputting the combined knowledge tree and the position code into the text recognition network to obtain the object text features corresponding to the point-of-interest object combination, the method further comprises:
acquiring an original text recognition network, wherein the original text recognition network comprises an N-layer M-dimensional conversion structure;
and cutting the original text recognition network to obtain the text recognition network, wherein the text recognition network comprises a P-layer K-dimensional conversion structure, P is less than N, and K is less than M.
8. The method of claim 1, wherein inputting the object space information corresponding to each object in the point-of-interest object combination into a feature extraction network based on an attention mechanism to obtain the object space feature corresponding to the point-of-interest object combination comprises:
acquiring the plurality of spatial attribute sub-features extracted from the object space information;
and acquiring space attribute vectors corresponding to the plurality of space attribute sub-features respectively, and performing cross combination calculation on the plurality of space attribute vectors to obtain the object space features.
9. The method of claim 8, wherein the cross-combining the plurality of spatial attribute vectors to obtain the spatial feature of the object comprises:
traversing a plurality of the spatial attribute vectors, performing the following operations:
acquiring a current spatial attribute vector;
respectively performing point multiplication calculation on the current spatial attribute vector and other spatial attribute vectors except the current spatial attribute vector in the plurality of spatial attribute vectors to obtain a feature vector to be combined corresponding to the current spatial attribute vector;
acquiring a next space attribute vector as the current space attribute vector;
and performing integrated calculation on the feature vectors to be combined corresponding to the plurality of spatial attribute vectors through an attention processing layer in the feature extraction network based on the attention mechanism to obtain the object spatial features.
10. The method according to claim 8, wherein the obtaining the plurality of spatial attribute sub-features extracted from the object space information comprises at least one of:
acquiring the address sub-characteristics of each object in the interest point object combination;
acquiring the classification sub-features of each object in the interest point object combination;
acquiring a distance sub-feature between each object in the interest point object combination and a reference object;
and acquiring the road network sub-characteristics of each object in the interest point object combination in the road environment.
11. The method according to any one of claims 1 to 10, wherein inputting the combined features obtained by splicing the object text features and the object space features into the object recognition network comprises:
and inputting the spliced combined features into a normalized exponential function configured for the object recognition network, so as to calculate the similarity between the objects in the point-of-interest object combination.
12. An object recognition apparatus, comprising:
the device comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring a combined knowledge graph corresponding to an interest point object combination to be identified, and the combined knowledge graph comprises text attribute labels of object texts corresponding to each object in the interest point object combination and text relation labels among the object texts;
the configuration unit is used for converting the combined knowledge graph into a combined knowledge tree and distributing position codes for each tree node in the combined knowledge tree;
a second obtaining unit, configured to input the combined knowledge tree and the position code into a text recognition network to obtain an object text feature corresponding to the interest point object combination, where the text recognition network is used to identify a context relationship between words in the object text;
a third obtaining unit, configured to input object space information corresponding to each object in the point-of-interest object combination into an attention-based feature extraction network to obtain object space features corresponding to the point-of-interest object combination, where the attention-based feature extraction network is configured to perform cross-combination on a plurality of spatial attribute sub-features extracted from the object space information;
the splicing unit is used for inputting the combined features obtained by splicing the object text features and the object space features into an object recognition network, wherein the object recognition network is a neural network obtained by performing machine training by using a plurality of sample data;
a recognition unit configured to recognize each object in the point-of-interest object combination as the same object in a case where the recognition result output by the object recognition network indicates that the degree of similarity between each object in the point-of-interest object combination is greater than or equal to a target threshold.
13. A computer-readable storage medium, comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 11.
14. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 11 by means of the computer program.
CN202011376835.9A 2020-11-30 2020-11-30 Object recognition method and device, storage medium and electronic equipment Active CN112528639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011376835.9A CN112528639B (en) 2020-11-30 2020-11-30 Object recognition method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011376835.9A CN112528639B (en) 2020-11-30 2020-11-30 Object recognition method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112528639A (en)
CN112528639B CN112528639B (en) 2022-03-18

Family

ID=74995589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011376835.9A Active CN112528639B (en) 2020-11-30 2020-11-30 Object recognition method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112528639B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417845A (en) * 2022-03-30 2022-04-29 支付宝(杭州)信息技术有限公司 Identical entity identification method and system based on knowledge graph
CN114582004A (en) * 2022-04-28 2022-06-03 中国科学技术大学 Facial expression recognition method, system, equipment and storage medium
CN116341567A (en) * 2023-05-29 2023-06-27 山东省工业技术研究院 Interest point semantic labeling method and system based on space and semantic neighbor information
CN116860905A (en) * 2023-09-04 2023-10-10 青岛市勘察测绘研究院 Space unit coding generation method of city information model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120158791A1 (en) * 2010-12-21 2012-06-21 Microsoft Corporation Feature vector construction
CN110909170A (en) * 2019-10-12 2020-03-24 百度在线网络技术(北京)有限公司 Interest point knowledge graph construction method and device, electronic equipment and storage medium
CN111310074A (en) * 2020-02-13 2020-06-19 北京百度网讯科技有限公司 Interest point label optimization method and device, electronic equipment and computer readable medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120158791A1 (en) * 2010-12-21 2012-06-21 Microsoft Corporation Feature vector construction
CN110909170A (en) * 2019-10-12 2020-03-24 百度在线网络技术(北京)有限公司 Interest point knowledge graph construction method and device, electronic equipment and storage medium
CN111310074A (en) * 2020-02-13 2020-06-19 北京百度网讯科技有限公司 Interest point label optimization method and device, electronic equipment and computer readable medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JI ZHAO ET AL.: "POI Semantic Model with a Deep Convolutional Structure", arXiv:1903.07279v1 *
LI WEI ET AL.: "Research on Context-Based Personalized POI Recommendation Methods", Journal of Wuhan University (Information Science Edition) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417845A (en) * 2022-03-30 2022-04-29 支付宝(杭州)信息技术有限公司 Identical entity identification method and system based on knowledge graph
CN114582004A (en) * 2022-04-28 2022-06-03 中国科学技术大学 Facial expression recognition method, system, equipment and storage medium
CN116341567A (en) * 2023-05-29 2023-06-27 山东省工业技术研究院 Interest point semantic labeling method and system based on space and semantic neighbor information
CN116341567B (en) * 2023-05-29 2023-08-29 山东省工业技术研究院 Interest point semantic labeling method and system based on space and semantic neighbor information
CN116860905A (en) * 2023-09-04 2023-10-10 青岛市勘察测绘研究院 Space unit coding generation method of city information model
CN116860905B (en) * 2023-09-04 2023-12-08 青岛市勘察测绘研究院 Space unit coding generation method of city information model

Also Published As

Publication number Publication date
CN112528639B (en) 2022-03-18

Similar Documents

Publication Publication Date Title
CN112528639B (en) Object recognition method and device, storage medium and electronic equipment
Liu et al. Classifying urban land use by integrating remote sensing and social media data
CN110019616B (en) POI (Point of interest) situation acquisition method and equipment, storage medium and server thereof
CN112329467A (en) Address recognition method and device, electronic equipment and storage medium
CN111324679A (en) Method, device and system for processing address information
CN113592037B (en) Address matching method based on natural language inference
US20230049839A1 (en) Question Answering Method for Query Information, and Related Apparatus
Anichini et al. Developing the ArchAIDE application: a digital workflow for identifying, organising and sharing archaeological pottery using automated image recognition
CN113505204A (en) Recall model training method, search recall device and computer equipment
JP2023530795A (en) Geolocation zone encoding method, method for establishing encoding model, and apparatus
CN108733810A (en) A kind of address date matching process and device
Li et al. Event extraction for criminal legal text
CN111143534A (en) Method and device for extracting brand name based on artificial intelligence and storage medium
CN115129719A (en) Knowledge graph-based qualitative position space range construction method
CN110457706A (en) Interest point name preference pattern training method, application method, device and storage medium
CN112711645B (en) Method and device for expanding position point information, storage medium and electronic equipment
CN110232131A (en) Intention material searching method and device based on intention label
CN110019617B (en) Method and device for determining address identifier, storage medium and electronic device
CN115830469A (en) Multi-mode feature fusion based landslide and surrounding ground object identification method and system
CN115168609A (en) Text matching method and device, computer equipment and storage medium
CN110781283B (en) Chain brand word stock generation method and device and electronic equipment
CN111723164B (en) Address information processing method and device
CN114925680A (en) Logistics interest point information generation method, device, equipment and computer readable medium
CN114820960A (en) Method, device, equipment and medium for constructing map
CN114332599A (en) Image recognition method, image recognition device, computer equipment, storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40040683

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant