CN113360791B

CN113360791B - Interest point query method and device of electronic map, road side equipment and vehicle

Info

Publication number: CN113360791B
Application number: CN202110730642.7A
Authority: CN
Inventors: 王昆; 余威
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-06-29
Filing date: 2021-06-29
Publication date: 2023-07-18
Anticipated expiration: 2041-06-29
Also published as: CN113360791A

Abstract

The disclosure provides an interest point query method and device of an electronic map, road side equipment and a vehicle, and relates to the fields of deep learning and intelligent traffic within the technical field of artificial intelligence. The method includes: identifying a signboard image to be queried to obtain initial text content of the signboard image to be queried and attribute information of the initial text content; filtering the initial text content according to the attribute information of the initial text content, so as to filter out invalid text content and obtain valid text content; and, if the valid text content matches the text content of a preset interest point in the electronic map, determining the preset interest point as the interest point of the signboard image to be queried. Because less text content has to be matched against the text content of the interest points, matching efficiency and therefore query efficiency are improved; and because noise interference from invalid text content is avoided, the accuracy and reliability of the query are improved.

Description

Interest point query method and device of electronic map, road side equipment and vehicle

Technical Field

The disclosure relates to the technical field of deep learning and intelligent traffic in the technical field of artificial intelligence, in particular to a method and a device for inquiring interest points of an electronic map, road side equipment and a vehicle.

Background

In the electronic map, one point of interest (Point of Interest, POI) may be a house, a shop, a mailbox, a bus stop, etc.

In the prior art, a commonly adopted interest point query method is the longest common subsequence (Longest Common Subsequence, LCS) query method. For example, a character string corresponding to the signboard image to be queried (referred to as character string A) is determined, a character string corresponding to each interest point (referred to as character string B) is determined, the common character string between character string A and each character string B is determined, and the interest point whose character string B has the longest common character string with character string A is determined as the interest point corresponding to the signboard image to be queried.

However, because all the character strings of the signboard image are determined and all of them are compared, the comparison takes a long time, which causes the technical problem of low query efficiency.

Disclosure of Invention

The disclosure provides an interest point query method and device of an electronic map, road side equipment and a vehicle, which are used for improving query efficiency.

According to a first aspect of the present disclosure, there is provided a method for querying an interest point of an electronic map, including:

identifying a signboard image to be queried to obtain initial text content of the signboard image to be queried and attribute information of the initial text content;

filtering the initial text content according to the attribute information of the initial text content, so as to filter out invalid text content in the initial text content and obtain valid text content;

if the valid text content matches the text content of a preset interest point in the electronic map, determining the preset interest point as the interest point of the signboard image to be queried, wherein the electronic map is provided with a plurality of interest points, and each interest point in the plurality of interest points is provided with text content.

According to a second aspect of the present disclosure, there is provided a training method of a signboard text filter model, comprising:

obtaining a first sample set comprising a plurality of sample signage images;

determining rectangular boxes for framing the interest point names of each sample signboard image, and determining image information and text position information of each rectangular box;

inputting the image information, the text position information and the interest point names in each rectangular frame into a Board-transformer model framework, and training the Board-transformer model framework to generate a signboard text filtering model, wherein the signboard text filtering model is used for filtering invalid text content in a signboard image to be queried.

According to a third aspect of the present disclosure, there is provided an interest point query device of an electronic map, including:

the identification unit is used for identifying the signboard image to be queried to obtain the initial text content of the signboard image to be queried and the attribute information of the initial text content;

the filtering unit is used for filtering the initial text content according to the attribute information of the initial text content so as to filter invalid text content in the initial text content and obtain valid text content;

the first determining unit is used for determining the preset interest point as the interest point of the signboard image to be queried if the valid text content matches the text content of the preset interest point in the electronic map, wherein the electronic map is provided with a plurality of interest points, and each interest point in the plurality of interest points is provided with text content.

According to a fourth aspect of the present disclosure, there is provided a training device for a sign text filtering model, comprising:

an acquisition unit configured to acquire a first sample set including a plurality of sample signboard images;

a second determining unit configured to determine a rectangular frame for framing a point of interest name of each of the sample signboard images, and determine image information and text position information of each of the rectangular frames;

the training unit is used for inputting the image information, the text position information and the interest point names in each rectangular frame into a Board-transformer model framework, and training the Board-transformer model framework to generate a signboard text filtering model, wherein the signboard text filtering model is used for filtering invalid text content in a signboard image to be queried.

According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect; or to enable the at least one processor to perform the method of the second aspect.

According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of the first aspect; alternatively, the computer instructions are for causing the computer to perform the method of the second aspect.

According to a seventh aspect of the present disclosure, there is provided a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read, the at least one processor executing the computer program causing the electronic device to perform the method of the first aspect; alternatively, execution of the computer program by the at least one processor causes the electronic device to perform the method of the second aspect.

According to an eighth aspect of the present disclosure, there is provided a vehicle comprising: the apparatus of the second aspect.

According to a ninth aspect of the present disclosure, there is provided a roadside apparatus, comprising: the apparatus of the second aspect.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a diagram of a scenario in which the interest point query method of an electronic map of an embodiment of the present disclosure may be implemented;

FIG. 2 is a diagram of a scenario in which the interest point query method of an electronic map of an embodiment of the present disclosure may be implemented;

FIG. 3 is a schematic diagram of a method for searching points of interest of an electronic map in the related art;

FIG. 4 is a schematic diagram according to a first embodiment of the present disclosure;

FIG. 5 is a schematic diagram according to a second embodiment of the present disclosure;

FIG. 6 is a schematic diagram according to a third embodiment of the present disclosure;

FIG. 7 is a schematic diagram of the principles of training a text matching model according to the present disclosure;

FIG. 8 is a schematic diagram according to a fourth embodiment of the present disclosure;

FIG. 9 is a schematic diagram according to a fifth embodiment of the present disclosure;

FIG. 10 is a schematic diagram of a point of interest query method of an electronic map according to the present disclosure;

FIG. 11 is a schematic diagram according to a sixth embodiment of the present disclosure;

FIG. 12 is a schematic diagram of training of a sign text filtering model according to the present disclosure;

FIG. 13 is a schematic diagram according to a seventh embodiment of the disclosure;

FIG. 14 is a schematic diagram according to an eighth embodiment of the disclosure;

FIG. 15 is a schematic diagram according to a ninth embodiment of the present disclosure;

FIG. 16 is a schematic diagram according to a tenth embodiment of the present disclosure;

FIG. 17 is a block diagram of an electronic device for implementing the interest point query method of the electronic map of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Point of interest is a term in geographic information systems that generally refers to any geographic object that can be abstracted into a point, especially geographic entities closely related to people's lives, such as schools, banks, restaurants, gas stations, hospitals, supermarkets, and the like. The main purpose of interest points is to describe the addresses of things or events, which greatly enhances the ability to describe and query the positions of things or events and improves the accuracy and speed of geographic positioning.

With the wide application of artificial intelligence in intelligent transportation and smart cities, in order to facilitate the travel of users, interest points are marked in an electronic map, for example, interest points representing schools, banks and the like can be marked in the electronic map.

Point of interest queries (which may also be referred to as point of interest searches or map searches) are one of the underlying technologies of location information services and directly affect the user's service experience. Technically, a point of interest query is equivalent to a Web search that supports users in searching for points of interest related to geographic locations.

In one example, the interest point query may be applied to update processing of interest points marked on the electronic map, such as adding missing interest points on the electronic map, or adding newly added interest points, or deleting outdated interest points, and so on.

For example, for a newly constructed office building, points of interest of the newly constructed office building may be newly added to the electronic map based on position information of the newly constructed office building, or the like.

For example, in connection with the application scenario shown in FIG. 1, user 101 may transmit a sign image of a newly constructed office building to computer 102.

The computer 102 may determine whether the interest point corresponding to the signboard image is marked in the electronic map by using an interest point query method, and if it is determined that the interest point corresponding to the signboard image is not marked in the electronic map, the computer 102 may mark the interest point corresponding to the newly constructed office building in the electronic map according to the signboard image.

In another example, the point of interest query may be applied to path planning to implement services such as automatic driving.

For example, in connection with the application scenario shown in fig. 2, a vehicle 201 is traveling on a road 202.

A user (not shown in the figure) in the vehicle 201 may input a sign image of a destination to the vehicle 201.

The vehicle 201 adopts an interest point query method to determine the interest point of the signboard image in the electronic map, and determines a driving path according to the current position and the interest point, thereby realizing automatic driving.

As can be seen in fig. 3, in the related art, a commonly adopted method for querying an interest point includes:

First step: optical character recognition (Optical Character Recognition, OCR) is performed on the signboard image to be queried to obtain the text content of the signboard image to be queried. Specifically, as shown in fig. 3, the text content is "XX medicine, NO233XX garden store".

Second step: text matching (LCS) is performed between the text content and the interest point names in an interest point name library (e.g., the POI name library shown in fig. 3).

The interest point name library is a name library composed of the names of interest points that are already marked in the electronic map. For example, the interest point name library may include "XX Garden" and "XX Bank", as shown in fig. 3.

The specific text matching method may include the following steps: determining the character string of the text content (i.e., the character string of "XX medicine, NO233XX garden store"); determining the character string of each interest point name in the interest point name library (i.e., the character string of "XX Garden", the character string of "XX Bank", and the like); determining the common character strings between the character string of the text content and the character string of each interest point name; determining the longest common character string among these common character strings; and determining the interest point name corresponding to the longest common character string as the interest point name of the signboard image to be queried. The interest point marked with that name on the electronic map is thereby determined as the interest point of the signboard image to be queried (i.e., the query result shown in fig. 3).
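
The related-art matching step described above can be illustrated with a minimal Python sketch. It assumes that the "longest common character string" is measured as the longest common subsequence; the function names `lcs_length` and `match_by_lcs` are hypothetical and do not appear in the disclosure.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Characters match: extend the subsequence found so far.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise keep the better of skipping a character in a or in b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def match_by_lcs(sign_text: str, poi_names: list[str]) -> str:
    """Return the interest point name with the longest common subsequence."""
    return max(poi_names, key=lambda name: lcs_length(sign_text, name))
```

Because every character of the signboard text takes part in the comparison, both the running time and the risk of a spurious match grow with the amount of invalid text on the signboard.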

However, on the one hand, when the related method shown in fig. 3 is used to determine the interest point of the signboard image to be queried, all text content of the signboard image to be queried is matched against the interest point names, so invalid text content within the text content may cause the technical problem of low accuracy of the query result.

For example, the "NO233XX garden" shown in fig. 3 is invalid text content. Although an interest point name corresponding to "NO233XX garden" can be matched, the interest point on the electronic map obtained by the matching is "XX bank". That is, interference from the partially invalid text content causes a wrong interest point to be obtained.

On the other hand, when the related method shown in fig. 3 is used to determine the interest point of the signboard image to be queried, all text content of the signboard image to be queried is matched against the interest point names, so a large amount of text content has to be matched and the text matching takes a relatively long time, which causes the technical problem of low query efficiency.

To avoid at least one of the above technical problems, the inventors of the present disclosure arrived, through creative work, at the inventive concept of the present disclosure: after the text content of the signboard image to be queried is obtained, invalid text content in the text content is filtered out, so that matching against the text content of the interest points is based on the valid text content, thereby obtaining the interest point of the signboard image to be queried.

Based on the above inventive concept, the disclosure provides a method, a device, a road side device and a vehicle for inquiring interest points of an electronic map, which are applied to the technical fields of deep learning and intelligent traffic in the technical field of artificial intelligence, and can be particularly applied to an automatic driving scene so as to improve inquiring efficiency and accuracy.

Fig. 4 is a schematic diagram of a first embodiment of the present disclosure, and as shown in fig. 4, an interest point query method of an electronic map provided in this embodiment includes:

S401: identifying the signboard image to be queried to obtain the initial text content of the signboard image to be queried and the attribute information of the initial text content.

An executing body of the embodiment may be an interest point query device (hereinafter referred to as a query device) of an electronic map, and the query device may be a server (including a local server and a cloud server, where the server may be a cloud control platform, a vehicle-road collaborative management platform, a central subsystem, an edge computing platform, a cloud computing platform, etc.), may also be a road side device, may also be a vehicle (such as a vehicle-mounted terminal in a vehicle, etc.), may also be a terminal device, may also be a processor, may also be a chip, etc., and the embodiment is not limited.

In the system architecture of intelligent traffic vehicle-road cooperation, the road side equipment includes road side sensing equipment and road side computing equipment, wherein the road side sensing equipment (such as a road side camera) is connected to the road side computing equipment (such as a road side computing unit, RSCU), the road side computing equipment is connected to a server, and the server can communicate with an automatic driving or assisted driving vehicle in various ways; alternatively, the road side sensing equipment itself includes a computing function and is directly connected to the server. The above connections may be wired or wireless.

The attribute information of the initial text content refers to attributes of the initial text, such as semantic attributes, location attributes, image attributes, and the like. That is, attribute information of the initial text content may be used to describe the initial text content from different dimensions.

In some embodiments, the signboard image to be queried may be identified by an optical character recognition method, so as to obtain the initial text content and the attribute information of the initial text content.

S402: filtering the initial text content according to the attribute information of the initial text content, so as to filter out invalid text content in the initial text content and obtain valid text content.

Illustratively, in combination with the related art as shown in fig. 3, the initial text content includes "XX medicine, NO233XX garden", and in this embodiment, the querying device performs filtering processing on the invalid text "NO233XX garden" in the initial text content "XX medicine, NO233XX garden" based on attribute information of the initial text content, to obtain the valid text "XX medicine".

S403: if the valid text content matches the text content of a preset interest point in the electronic map, determining the preset interest point as the interest point of the signboard image to be queried.

The electronic map is provided with a plurality of interest points, and each interest point in the plurality of interest points is provided with text content.

In combination with the above example, after the invalid text "NO233XX garden" is filtered out, the valid text "XX medicine" is used for matching against the text content of the preset interest points. As shown in fig. 3, "XX medicine" is matched against "XX Garden" and "XX Bank" respectively, and the preset interest point whose text content matches "XX medicine" is obtained as the interest point of the signboard image to be queried.

Based on the above analysis, the embodiment of the disclosure provides an interest point query method of an electronic map, including: identifying a signboard image to be queried to obtain initial text content of the signboard image to be queried and attribute information of the initial text content; filtering the initial text content according to the attribute information of the initial text content, so as to filter out invalid text content and obtain valid text content; and, if the valid text content matches the text content of a preset interest point in the electronic map, determining the preset interest point as the interest point of the signboard image to be queried, wherein the electronic map is provided with a plurality of interest points, and each interest point in the plurality of interest points is provided with text content. In this embodiment, after the text content of the signboard image to be queried is obtained, invalid text content in the text content is filtered out, so that matching against the text content of the interest points is based on the valid text content. This reduces the amount of text content to be matched, which improves matching efficiency and therefore query efficiency. Filtering out the invalid text content also avoids its noise interference with the matching, which improves the accuracy and reliability of the query.
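
The flow of S401 to S403 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `query_point_of_interest`, the `is_valid` predicate (a stand-in for the attribute-based filtering), and the exact-equality matching rule are all hypothetical placeholders, not the filtering or matching actually used by the disclosure.

```python
from typing import Callable, Optional


def query_point_of_interest(
    initial_texts: list[str],
    is_valid: Callable[[str], bool],
    poi_texts: dict[str, str],
) -> Optional[str]:
    """Filter recognized signboard text, then match it against preset POI text.

    initial_texts: text items recognized from the signboard image (result of S401).
    is_valid:      hypothetical stand-in for the attribute-based filter of S402.
    poi_texts:     mapping of interest point identifier -> text content in the map.
    """
    # S402: filter out invalid text content and keep valid text content.
    valid_texts = [text for text in initial_texts if is_valid(text)]
    # S403: here a preset interest point matches when its text equals a valid item.
    for poi_id, poi_text in poi_texts.items():
        if poi_text in valid_texts:
            return poi_id
    return None


# Usage with made-up map data; the prefix rule is only a toy filter.
pois = {"poi_1": "XX Garden", "poi_2": "XX Bank", "poi_3": "XX medicine"}
result = query_point_of_interest(
    ["XX medicine", "NO233XX garden"],
    is_valid=lambda text: not text.startswith("NO"),
    poi_texts=pois,
)
```

Only the filtered items reach the matching loop, which is the source of both the efficiency gain and the reduced noise described above.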

Fig. 5 is a schematic diagram according to a second embodiment of the present disclosure, and as shown in fig. 5, an interest point query method of an electronic map provided in this embodiment includes:

S501: identifying the signboard image to be queried to obtain the initial text content of the signboard image to be queried and the attribute information of the initial text content.

For example, regarding the implementation principle of S501, reference may be made to the first embodiment, which is not described herein.

Wherein the attribute information of the initial text content includes: semantic attributes, location attributes, and image attributes.

The semantic attribute of the initial text content refers to information related to the semantic of the initial text content (such as the name of the interest point in the initial text content); the location attribute of the initial text content refers to information related to the location of the initial text content (such as the pixel location of the initial text content); the image attribute of the initial text content refers to information related to an image (e.g., image identification) of the initial text content.
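
One possible in-memory representation of these three attributes is sketched below in Python. The class name `TextAttributes`, its field names, and the bounding-box layout are assumptions made for illustration; the disclosure does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextAttributes:
    """Hypothetical container for the attributes of one recognized text item."""

    semantic: str                        # recognized text, e.g. a candidate interest point name
    position: tuple[int, int, int, int]  # assumed pixel box: (left, top, right, bottom)
    image_id: str                        # assumed identifier of the image region


def box_area(attrs: TextAttributes) -> int:
    """Pixel area of the text box, one example of a position-derived quantity."""
    left, top, right, bottom = attrs.position
    return (right - left) * (bottom - top)
```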

S502: encoding the semantic attribute, the position attribute and the image attribute of the initial text content respectively, to obtain a coding feature set including the corresponding coding features.

Illustratively, S502 may include the steps of:

First step: the semantic attribute of the initial text content is encoded by a first encoder of the pre-trained signboard text filtering model to obtain a first coding feature.

Second step: the position attribute of the initial text content is encoded by the first encoder of the signboard text filtering model to obtain a second coding feature.

Third step: the image attribute of the initial text content is encoded by the first encoder of the signboard text filtering model to obtain a third coding feature.

The signboard text filtering model is generated by training the Board-transformer model framework based on each sample signboard image in the first sample set.

In this embodiment, the semantic attribute, the position attribute and the image attribute of the initial text content are encoded separately using the signboard text filtering model generated by training the multi-modal Board-transformer model framework. The encoding processes can therefore be performed in parallel without affecting one another, which improves the accuracy and reliability of each coding feature in the resulting coding feature set.

S503: determining a first-position coding feature from the coding feature set, taking the first-position coding feature as a filtering reference coding feature, and filtering the coding features other than the first-position coding feature in the coding feature set, so as to filter out invalid text content in the initial text content and obtain valid text content.

It should be understood that "first" in the first-position coding feature of this embodiment is used only to distinguish it from the second-position coding feature described hereinafter, and is not to be construed as limiting the first-position coding feature.

For example, if the coding feature set includes m coding features and the first-position coding feature is coding feature a, coding feature a is used as the filtering reference coding feature, and the remaining (m-1) coding features (i.e., the m coding features other than coding feature a) are filtered to remove invalid coding features, thereby filtering out invalid text content in the initial text content.

In this embodiment, invalid text content in the initial text content is filtered based on the coding features corresponding to the semantic attribute, the position attribute and the image attribute of the initial text content. Invalid text content can thus be filtered out from multiple dimensions, which gives the valid text content higher accuracy and reliability.

In some embodiments, the filtering process may include: determining the degree of association between the first-position coding feature and each of the other coding features in the coding feature set, and filtering out the coding features whose degree of association is smaller than a preset association degree threshold.

The degree of association can be measured by calculating a similarity: the greater the similarity, the higher the degree of association; conversely, the smaller the similarity, the lower the degree of association.

In particular, when filtering invalid coding features by calculating the similarity, the following examples can be referred to:

The similarity between coding feature a and each of the (m-1) coding features is calculated. If the similarity is smaller than a preset similarity threshold, that coding feature is filtered out; otherwise, if the similarity is greater than or equal to the similarity threshold, that coding feature is retained. After all retained coding features are obtained, the retained coding features are fused with coding feature a to obtain a fused coding feature, and the text content corresponding to the fused coding feature is determined as the valid text content.

The similarity threshold may be set by the query device based on requirements, historical records, tests, and the like, which is not limited in this embodiment.

In this embodiment, filtering the coding features based on the degree of association removes interfering coding features, so that the corresponding interfering text content is filtered out as invalid text content, which improves the accuracy and reliability of filtering invalid text content.
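
The threshold-based filtering described above can be sketched in Python as follows. The disclosure does not name a similarity measure or a fusion operation, so cosine similarity and element-wise averaging are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def filter_by_reference(
    features: list[list[float]], threshold: float
) -> tuple[list[int], list[float]]:
    """Filter coding features against features[0], the filtering reference.

    Returns the indices of the retained features (index 0 included) and a fused
    feature, computed here as the element-wise mean of the retained features
    (the fusion operation is an assumption).
    """
    reference = features[0]
    kept = [0]
    for index, feature in enumerate(features[1:], start=1):
        # Retain a feature only if it is similar enough to the reference.
        if cosine_similarity(feature, reference) >= threshold:
            kept.append(index)
    fused = [
        sum(features[i][d] for i in kept) / len(kept)
        for d in range(len(reference))
    ]
    return kept, fused
```

With `features = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]` and a threshold of 0.5, the second feature is retained (similarity 1.0) and the third is filtered out (similarity 0.0).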

S504: if the valid text content matches the text content of a preset interest point in the electronic map, determining the preset interest point as the interest point of the signboard image to be queried.

For example, regarding the implementation principle of S504, reference may be made to the first embodiment, which is not described herein.

Fig. 6 is a schematic diagram according to a third embodiment of the present disclosure, and as shown in fig. 6, an interest point query method of an electronic map provided in this embodiment includes:

S601: identifying the signboard image to be queried to obtain the initial text content of the signboard image to be queried and the attribute information of the initial text content.

For example, regarding the implementation principle of S601, reference may be made to the first embodiment, which is not described herein.

S602: filtering the initial text content according to the attribute information of the initial text content, so as to filter out invalid text content in the initial text content and obtain valid text content.

For example, regarding the implementation principle of S602, reference may be made to the first embodiment or the second embodiment, which will not be described herein.

S603: inputting the valid text content and the text content of each interest point of the electronic map into a pre-trained text matching model, and matching the valid text content against the text content of each interest point of the electronic map according to the text matching model, to obtain the interest point of the signboard image to be queried.

The text matching model is generated by performing triplet loss training on a self-supervised model framework based on the text content of each interest point in the second sample set.

In this embodiment, the valid text content is matched against the text content of each interest point of the electronic map according to the text matching model generated by performing triplet loss training on the self-supervised model framework. That is, the matching result (i.e., the text content of the preset interest point in the electronic map that matches the valid text content) is determined from the three dimensions of the triplet (i.e., the origin text content, the active text content and the passive text content), so that the matching is comprehensive and semantic features are fully considered. This improves the accuracy and reliability of the matching, and in turn the accuracy and reliability of the determined interest point of the signboard image to be queried.

In some embodiments, the text matching model is generated by adjusting parameters of the self-supervision model framework based on first difference information between origin text content and active text content, and second difference information between the origin text content and passive text content. The origin text content is randomly selected from the text content of the interest points; the active text content is interest point text content of the same type as the origin text content; and the passive text content is interest point text content of a different type from the origin text content.

Illustratively, in some embodiments, the principle of training the text matching model may refer to fig. 7: the origin text content, the active text content and the passive text content are respectively input into a language representation model (such as BERT shown in fig. 7), which outputs the text feature u of the origin text content, the text feature v of the active text content and the text feature w of the passive text content. These features are then trained with a triplet loss function (such as the triplet loss shown in fig. 7), that is, the parameters of the self-supervision model framework are adjusted, so as to obtain the text matching model.

It should be understood that BERT is used above merely as an example of a language representation model and is not to be construed as limiting the language representation model. Similarly, the triplet loss is merely an example of a loss function and is not to be construed as limiting the loss function.
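
The role of the first and second difference information can be illustrated with a minimal sketch. It assumes Euclidean distance as the difference measure and a hinge-style margin, neither of which is specified in this disclosure; the function names, the margin value and the toy feature vectors are hypothetical, and the vectors merely stand in for the features u, v and w output by the language representation model.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(u, v, w, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the disclosure).

    u: feature of the origin text content
    v: feature of the active (same-type) text content
    w: feature of the passive (different-type) text content
    """
    first_difference = euclidean(u, v)   # origin vs. active
    second_difference = euclidean(u, w)  # origin vs. passive
    # The loss is zero once the passive text is farther than the active
    # text by at least the margin; otherwise the shortfall is penalized.
    return max(0.0, first_difference - second_difference + margin)


# Toy features standing in for language-model outputs (hypothetical values).
u = [1.0, 0.0]
v = [1.0, 0.0]
w = [0.0, 0.0]
loss_good = triplet_loss(u, v, w)  # active is closer than passive
loss_bad = triplet_loss(u, w, v)   # roles swapped: active is farther
```

In an actual training loop the loss would be back-propagated to adjust the parameters of the model framework; the sketch only shows how the two differences enter the loss.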

Fig. 8 is a schematic diagram according to a fourth embodiment of the present disclosure, and as shown in fig. 8, an interest point query method of an electronic map provided in this embodiment includes:

S801: identifying the signboard image to be queried to obtain the initial text content of the signboard image to be queried and the attribute information of the initial text content.

For example, regarding the implementation principle of S801, reference may be made to the first embodiment, and a detailed description is omitted here.

S802: filtering the initial text content according to the attribute information of the initial text content to remove invalid text content in the initial text content, so as to obtain valid text content.

For example, regarding the implementation principle of S802, reference may be made to the first embodiment or the second embodiment, which will not be described herein.

S803: inputting the valid text content and the text content of each interest point of the electronic map into a pre-trained text matching model, and performing alignment processing and matching processing on the valid text content and the text content of each interest point of the electronic map according to the text matching model to obtain the interest point of the signboard image to be queried.

The text matching model is generated by training the face recognition model framework ArcFace based on text content of each interest point in the third sample set.

In this embodiment, a text matching model generated based on a face recognition model framework is introduced, and the valid text content and the text content of each interest point of the electronic map are aligned and matched to obtain the interest point of the signboard image to be queried. The image features of the valid text content are thereby fully considered, which improves the accuracy and reliability of the determined interest point of the signboard image to be queried.

In some embodiments, S803 may include the steps of:

A first step: performing text detection on the valid text content to obtain a text image of the valid text content, and aligning this text image with the text image of the text content of each interest point of the electronic map.

For the principle of the alignment process, reference may be made to the principle of performing the alignment process on the face image in the related art, which is not described herein.

A second step: performing feature extraction on the aligned text image of the valid text content to obtain first image features, performing feature extraction on the text image of the text content of each interest point of the electronic map to obtain second image features, and matching the first image features with the second image features to obtain the interest point of the signboard image to be queried.

In this embodiment, the text images to be matched are first aligned, and the image features obtained after alignment are then matched. This makes the matching more targeted and improves its accuracy, and thereby the accuracy and reliability of the interest point obtained for the signboard image to be queried.

In some embodiments, the text matching model is generated by training ArcFace according to a third image feature and a fourth image feature, where the third image feature is obtained by feature extraction on the aligned text images of the text content of each interest point in the third sample set, and the fourth image feature is obtained by feature extraction on a preset standard text image. The alignment processing refers to: performing text detection on the text content of each interest point in the third sample set to obtain a text image of the text content of each interest point in the third sample set, and aligning the text image with the preset standard text image.

The preset standard text image may be obtained by performing a mean value processing based on the text image of each interest point in the third sample set.
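
As a minimal sketch of the mean value processing mentioned above, the following assumes that every text image is a grayscale image of the same size represented as a list of pixel rows, and that the mean is taken pixel by pixel. Both assumptions, as well as the function name `mean_text_image`, are illustrative and not taken from this disclosure.

```python
def mean_text_image(images):
    """Pixel-wise mean of equally sized grayscale images (lists of rows)."""
    if not images:
        raise ValueError("at least one text image is required")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all text images must share one size")
    count = len(images)
    # Average the pixel at each (row, column) position over all images.
    return [
        [sum(img[r][c] for img in images) / count for c in range(width)]
        for r in range(height)
    ]


# Two hypothetical 2x2 text images from the third sample set.
sample_images = [
    [[0, 255], [255, 0]],
    [[255, 255], [0, 0]],
]
standard_image = mean_text_image(sample_images)
```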

That is, a method of training generation of a text matching model may include: text detection is conducted on the text content of each interest point in the third sample set to obtain text images of the text content of each interest point in the third sample set, alignment processing is conducted on the text images and preset standard text images, feature extraction is conducted on the text images of the text content of each interest point in the aligned third sample set to obtain third image features, feature extraction is conducted on the preset standard text images to obtain fourth image features, and ArcFace is trained according to the third image features and the fourth image features to generate a text matching model.
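
For readers unfamiliar with ArcFace, the sketch below illustrates the additive angular margin for which that framework is known: the angle between a normalized feature and the normalized weight of its target class is enlarged by a margin before scaling. How the third and fourth image features enter the training is not detailed in this disclosure, so the function `arcface_logits`, its parameters and the toy inputs are assumptions, and the sketch omits the special handling used in practice when the enlarged angle exceeds pi.

```python
import math


def arcface_logits(feature, class_weights, target, scale=30.0, margin=0.5):
    """Scaled cosine logits with an additive angular margin on the target class."""

    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    f = normalize(feature)
    logits = []
    for idx, weight in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(f, normalize(weight)))
        # Clamp against rounding errors before taking the arc cosine.
        cos_theta = max(-1.0, min(1.0, cos_theta))
        if idx == target:
            # Enlarging the angle lowers the target logit, which forces
            # the training to produce more compact classes.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


# Hypothetical image feature and two class weight vectors.
logits = arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0)
```

The resulting logits would typically be passed to a softmax cross-entropy loss during training.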

Based on the third embodiment and the fourth embodiment, in the present embodiment, different methods may be used to implement matching between the effective text content and the text content of each interest point in the electronic map, so that the technical effects of flexibility and diversity of matching may be implemented.

Fig. 9 is a schematic diagram of a fifth embodiment of the present disclosure, and as shown in fig. 9, an interest point query method of an electronic map provided in this embodiment includes:

S901: identifying the signboard image to be queried to obtain the semantic attribute, the position attribute and the image attribute of the signboard image to be queried.

In connection with the schematic diagram shown in fig. 10, the sign image to be queried may be input to an optical character recognition model (i.e., an OCR module as shown in fig. 10), recognized by the optical character recognition model, and semantic attributes (i.e., OCR text as shown in fig. 10), location attributes (i.e., OCR locations as shown in fig. 10), and image attributes (i.e., OCR images as shown in fig. 10) are output.

S902: inputting the semantic attribute, the position attribute and the image attribute into a signboard text filtering model, and filtering according to the signboard text filtering model to obtain the valid text content of the signboard image to be queried.

In combination with the schematic diagram shown in fig. 10, the signboard text filtering model (i.e., the OCR text Board-transformer shown in fig. 10) filters out the invalid text content of the signboard image to be queried according to the OCR text, the OCR image and the OCR position to obtain the valid text content (i.e., the valid POI text shown in fig. 10).

S903: matching the valid text content with the text content of each interest point of the electronic map to obtain the interest point of the signboard image to be queried.

In conjunction with the schematic diagram shown in fig. 10, each interest point text content of the electronic map may be stored in a POI name library shown in fig. 10, and each interest point text content in the valid POI text and POI name library is input into a text matching model (i.e., POI-match shown in fig. 10) to output an interest point (not shown in fig. 10) corresponding to the signboard image to be queried.
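
The data flow of fig. 10 (OCR module, signboard text filtering model, text matching model) can be sketched as three pluggable stages. The three `toy_*` functions below are hypothetical stand-ins rather than the models of this disclosure: the toy filter simply keeps the text line with the largest box, and the toy matcher uses exact string equality.

```python
def query_interest_point(sign_image, ocr, text_filter, matcher, poi_names):
    """Chain the three stages: recognition, filtering, matching."""
    lines, boxes, crops = ocr(sign_image)           # OCR text, positions, images
    valid_text = text_filter(lines, boxes, crops)   # valid POI text
    return matcher(valid_text, poi_names)           # matched interest point


def toy_ocr(sign_image):
    # Stand-in for an OCR model: the "image" already carries its results.
    return sign_image["lines"], sign_image["boxes"], sign_image["crops"]


def toy_filter(lines, boxes, crops):
    # Assumption for illustration: the line with the largest box
    # (x, y, width, height) is treated as the valid text content.
    areas = [w * h for (_x, _y, w, h) in boxes]
    return lines[areas.index(max(areas))]


def toy_matcher(valid_text, poi_names):
    # Exact match stands in for the trained text matching model.
    return valid_text if valid_text in poi_names else None


sign = {
    "lines": ["Sunrise Bakery", "Tel 555-0100"],
    "boxes": [(0, 0, 200, 60), (0, 70, 120, 20)],
    "crops": [None, None],
}
result = query_interest_point(
    sign, toy_ocr, toy_filter, toy_matcher, ["Sunrise Bakery", "Harbor Books"]
)
```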

In some embodiments, S903 may include: according to the position attribute of the signboard image to be queried, selecting text content of the interest point with the position attribute within a preset range from text content of each interest point of the electronic map, and matching the effective text content with the text content of the interest point within the preset range to obtain the interest point of the signboard image to be queried.

The preset range may be set based on a requirement, a history, a test, and the like, which is not limited in this embodiment.

In this embodiment, matching only within the preset range reduces the number of matching operations, which improves matching efficiency, avoids noise interference, and improves the accuracy and reliability of the matching.
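
A minimal sketch of such range-based preselection follows. It assumes that the position attribute and the interest point positions are planar coordinates in the same unit and that the preset range is a maximum straight-line distance; the function name, the dictionary keys and the sample values are hypothetical.

```python
import math


def candidates_in_range(sign_position, pois, max_distance):
    """Keep only the interest points within max_distance of the sign."""
    sx, sy = sign_position
    return [
        poi for poi in pois
        if math.hypot(poi["x"] - sx, poi["y"] - sy) <= max_distance
    ]


pois = [
    {"name": "Sunrise Bakery", "x": 10.0, "y": 0.0},
    {"name": "Sunrise Pharmacy", "x": 500.0, "y": 0.0},
]
# Only the nearby candidates are passed on to the text matching step.
nearby = candidates_in_range((0.0, 0.0), pois, max_distance=100.0)
```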

The matching processing may be understood as a similarity calculation. A similarity threshold may be set, and the interest point whose text content yields the best matching result may be determined as the interest point of the signboard image to be queried.
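
The threshold-based selection can be sketched as follows. The character-level ratio from Python's standard `difflib` module is used here only as a stand-in for the similarity produced by the text matching model, and the threshold value of 0.8 is an arbitrary assumption.

```python
from difflib import SequenceMatcher


def best_match(valid_text, poi_names, threshold=0.8):
    """Return the most similar POI name, or None if below the threshold."""
    best_name, best_score = None, 0.0
    for name in poi_names:
        # Stand-in similarity in [0, 1]; a trained model would be used instead.
        score = SequenceMatcher(None, valid_text, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


match = best_match("Sunrise Bakery", ["Sunrise Bakery", "Moonlight Cafe"])
no_match = best_match("Sunrise Bakery", ["Moonlight Cafe"])
```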

In some embodiments, the method according to the embodiment is applied to a scene of labeling interest points in an electronic map, and it can be verified whether any interest point is already labeled in the electronic map by the method according to the embodiment.

For example, if, when the valid text content is matched against the text content of each interest point in the electronic map, no interest point text content matches the valid text content, the signboard image to be queried does not yet have a corresponding interest point labeled in the electronic map. In that case, the interest point corresponding to the signboard image to be queried may be labeled in the electronic map according to the position attribute of the signboard image to be queried, and the POI name corresponding to the signboard image to be queried (which may be determined based on the valid text content) may be added to the POI name library.
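
This labeling flow can be sketched as a query that falls back to registration. The POI name library is modeled here as a dictionary from name to position, and `exact_match` is a hypothetical stand-in for the text matching model; both are assumptions made for illustration.

```python
def query_or_label(valid_text, sign_position, poi_library, match):
    """Return (poi_name, newly_added) for a signboard's valid text content."""
    matched = match(valid_text, list(poi_library))
    if matched is not None:
        return matched, False  # already labeled in the map
    # No match: label a new interest point at the sign's position and
    # add its name (derived from the valid text) to the name library.
    poi_library[valid_text] = sign_position
    return valid_text, True


def exact_match(text, names):
    # Stand-in for the trained text matching model.
    return text if text in names else None


library = {"Sunrise Bakery": (10.0, 0.0)}
first = query_or_label("Harbor Books", (42.0, 7.0), library, exact_match)
second = query_or_label("Harbor Books", (42.0, 7.0), library, exact_match)
```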

Fig. 11 is a schematic diagram of a sixth embodiment of the disclosure, and as shown in fig. 11, a training method for a signboard text filter model provided in the present embodiment includes:

S1101: acquiring a first sample set, the first sample set including a plurality of sample signboard images.

S1102: a rectangular box for framing the point of interest name of each sample signboard image is determined, and image information and text position information of each rectangular box are determined.

S1103: inputting the image information and the text position information of each rectangular frame and the interest point name in each rectangular frame into a Board-transformer model framework, and training the Board-transformer model framework to generate a signboard text filtering model.

Wherein the sign text filtering model is used for filtering invalid text content in the sign image to be queried.

In some embodiments, S1103 may include the steps of:

A first step: encoding, according to the Board-transformer model framework, the input image information, text position information and interest point name of each rectangular frame respectively, to obtain coding features respectively corresponding to the image information, the text position information and the interest point name of each rectangular frame.

A second step: training the Board-transformer model framework according to the coding features respectively corresponding to the image information, the text position information and the interest point name of each rectangular frame, to generate the signboard text filtering model.

In some embodiments, the second step may comprise the sub-steps of:

A first substep: determining a second first-bit coding feature from among the coding features corresponding to the image information, the text position information and the interest point name of each rectangular frame.

A second substep: performing, according to the second first-bit coding feature, filtering and fusion processing on the coding features corresponding to the image information, the text position information and the interest point name of each rectangular frame to obtain filtered and fused coding features, and adjusting the parameters of the Board-transformer model framework according to the filtered and fused coding features to obtain the signboard text filtering model.

In some embodiments, the second substep may include: filtering out, based on the second first-bit coding feature, the invalid coding features among the coding features corresponding to the image information, the text position information and the interest point name of each rectangular frame; and, taking the second first-bit coding feature as a basic feature, fusing the valid coding features among those coding features to obtain the filtered and fused coding features.

In some embodiments, the Board-transformer model framework includes a plurality of encoders for respectively encoding the input image information of each rectangular frame, the text position information, and the interest point name in each rectangular frame, the encoders being different from each other.

In some embodiments, the image information for each rectangular box includes an image identification for each rectangular box, the text location information for each rectangular box includes a location, a length, and a width of a center point of each rectangular box, and the point of interest names in each rectangular box include pixels of the point of interest names in each rectangular box.
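
One way to hold these per-frame inputs is a small record type, sketched below. The class name `SignBox`, its field names, and the convention that length runs along the x axis and width along the y axis are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignBox:
    """One rectangular frame around an interest point name (hypothetical layout)."""

    image_id: str       # image identification of the frame
    center_x: float     # position of the center point
    center_y: float
    length: float       # extent along x (assumed convention)
    width: float        # extent along y (assumed convention)
    name_pixels: tuple  # rows of pixel values of the interest point name

    def corners(self):
        """Return (x_min, y_min, x_max, y_max) derived from center and size."""
        half_l, half_w = self.length / 2, self.width / 2
        return (
            self.center_x - half_l,
            self.center_y - half_w,
            self.center_x + half_l,
            self.center_y + half_w,
        )


box = SignBox("img-001", 50.0, 20.0, 40.0, 10.0, ((0, 255), (255, 0)))
```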

Illustratively, in conjunction with the training principle shown in fig. 12, the text encoder (i.e., the text symbol shown in fig. 12) encodes the interest point name, the position encoder (i.e., the position symbol shown in fig. 12) encodes the text position information, and the image encoder (i.e., the image symbol shown in fig. 12) encodes the image information. Each encoder inputs the coding feature it obtains to the Transformer, which determines the second first-bit coding feature and filters and fuses the other coding features based on the second first-bit coding feature. When a preset number of iterations is reached or a preset training requirement is satisfied, the signboard text filtering model is obtained.

Fig. 13 is a schematic diagram of a seventh embodiment of the disclosure, as shown in fig. 13, a point of interest query device 1300 of an electronic map provided in this embodiment includes:

the identifying unit 1301 is configured to identify the sign image to be queried, and obtain initial text content of the sign image to be queried and attribute information of the initial text content.

The filtering unit 1302 is configured to perform filtering processing on the initial text content according to the attribute information of the initial text content, so as to filter invalid text content in the initial text content, and obtain valid text content.

The first determining unit 1303 is configured to determine, if the valid text content matches the text content of a preset interest point in the electronic map, the preset interest point as the interest point of the to-be-queried signboard image, where the electronic map has a plurality of interest points, and each of the plurality of interest points has text content.

Fig. 14 is a schematic diagram of an eighth embodiment of the disclosure, as shown in fig. 14, a point of interest query device 1400 of an electronic map provided in this embodiment includes:

The recognition unit 1401 is configured to identify the signboard image to be queried to obtain the initial text content of the signboard image to be queried and the attribute information of the initial text content.

The filtering unit 1402 is configured to perform filtering processing on the initial text content according to the attribute information of the initial text content, so as to filter invalid text content in the initial text content, and obtain valid text content.

As can be seen in conjunction with fig. 14, in some embodiments, the filtering unit 1402 includes:

The first encoding subunit 14021 is configured to encode the semantic attribute, the position attribute and the image attribute of the initial text content respectively, to obtain a coding feature set that includes the coding features respectively corresponding to the semantic attribute, the position attribute and the image attribute.

The filtering subunit 14022 is configured to determine a first-bit coding feature from the coding feature set and, taking the first-bit coding feature as the filtering reference coding feature, filter the coding features in the coding feature set other than the first-bit coding feature, so as to filter out invalid text content in the initial text content and obtain valid text content.

In some embodiments, the filtering subunit 14022 is configured to determine the degree of association between the first-bit coding feature and each of the other coding features in the coding feature set, and to filter out the coding features whose degree of association is less than a preset association threshold.
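
The association-based filtering can be sketched as follows. Cosine similarity is used here as the degree of association and 0.5 as the threshold; both are assumptions, since this disclosure does not fix the measure or the value, and the feature vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_association(coding_features, threshold=0.5):
    """Keep the features sufficiently associated with the first one.

    coding_features[0] plays the role of the first-bit coding feature,
    i.e. the filtering reference; the remaining features are candidates.
    """
    reference, rest = coding_features[0], coding_features[1:]
    return [f for f in rest if cosine(reference, f) >= threshold]


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
kept = filter_by_association(features)
```

Here the second feature is nearly parallel to the reference and is kept, while the third is orthogonal to it and is filtered out.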

In some embodiments, the first encoding subunit 14021 is configured to perform encoding processing on the semantic attribute of the initial text content according to a first encoder of the pre-trained signboard text filtering model to obtain a first coding feature, perform encoding processing on the position attribute of the initial text content according to the first encoder of the signboard text filtering model to obtain a second coding feature, and perform encoding processing on the image attribute of the initial text content according to the first encoder of the signboard text filtering model to obtain a third coding feature; wherein the signboard text filtering model is generated by training the Board-transformer model framework based on each sample signboard image in the first sample set.

The first determining unit 1403 is configured to determine that the preset interest point is the interest point of the to-be-queried signboard image if the valid text content matches the text content of the preset interest point in the electronic map, where the electronic map has a plurality of interest points, and each of the plurality of interest points has text content.

As can be seen in connection with fig. 14, in some embodiments, the first determining unit 1403 comprises:

The input subunit 14031 is configured to input the valid text content and the text content of each interest point of the electronic map into a pre-trained text matching model.

The matching subunit 14032 is configured to match the valid text content with the text content of each interest point of the electronic map according to the text matching model to obtain the interest point of the signboard image to be queried, where the text matching model is generated by performing triplet loss training on the self-supervision model framework based on the text content of each interest point in the second sample set.

In some embodiments, the text matching model is generated by adjusting parameters of the self-supervision model framework based on first difference information between origin text content and positive text content, and second difference information between the origin text content and negative text content. The origin text content is randomly selected from the text content of the interest points; the positive text content (the active text content described above) is interest point text content of the same type as the origin text content; and the negative text content (the passive text content described above) is interest point text content of a different type from the origin text content.

In other embodiments, the input subunit 14031 is configured to input the valid text content and the text content of each interest point of the electronic map into a pre-trained text matching model.

The matching subunit 14032 is configured to perform alignment processing and matching processing on the valid text content and the text content of each interest point of the electronic map according to the text matching model to obtain the interest point of the signboard image to be queried, where the text matching model is generated by training the face recognition model framework ArcFace based on the text content of each interest point in the third sample set.

In some embodiments, the matching subunit 14032 includes:

The detection module is configured to perform text detection on the valid text content to obtain a text image of the valid text content, and to align this text image with the text image of the text content of each interest point of the electronic map.

The extraction module is configured to perform feature extraction on the aligned text image of the valid text content to obtain first image features, and to perform feature extraction on the text image of the text content of each interest point of the electronic map to obtain second image features.

The matching module is configured to match the first image features with the second image features to obtain the interest point of the signboard image to be queried.

In some embodiments, the text matching model is generated by training ArcFace according to a third image feature and a fourth image feature, where the third image feature is obtained by feature extraction on the aligned text images of the text content of each interest point in the third sample set, and the fourth image feature is obtained by feature extraction on a preset standard text image. The alignment processing refers to: performing text detection on the text content of each interest point in the third sample set to obtain a text image of the text content of each interest point in the third sample set, and aligning the text image with the preset standard text image.

Fig. 15 is a schematic diagram according to a ninth embodiment of the present disclosure. As shown in fig. 15, a training device 1500 of a signboard text filtering model provided in this embodiment includes:

an acquisition unit 1501 for acquiring a first sample set including a plurality of sample signboard images.

The second determination unit 1502 is configured to determine rectangular boxes for framing the point-of-interest names of each sample signboard image, and determine image information and text position information of each rectangular box.

The training unit 1503 is configured to input the image information of each rectangular box, the text position information, and the interest point name in each rectangular box into a Board-transformer model frame, train the Board-transformer model frame, and generate a signboard text filtering model, where the signboard text filtering model is used to filter invalid text content in the signboard image to be queried.

Fig. 16 is a schematic diagram according to a tenth embodiment of the present disclosure. As shown in fig. 16, a training device 1600 of a signboard text filtering model provided in this embodiment includes:

an acquisition unit 1601 for acquiring a first sample set including a plurality of sample signboard images.

A second determining unit 1602 for determining rectangular boxes for framing the point of interest names of the images of each sample signboard, and determining image information and text position information of each rectangular box.

The training unit 1603 is configured to input the image information of each rectangular frame, the text position information, and the interest point name in each rectangular frame into a Board-transformer model frame, train the Board-transformer model frame, and generate a signboard text filtering model, where the signboard text filtering model is used to filter invalid text content in the signboard image to be queried.

As can be seen in conjunction with fig. 16, in some embodiments, training unit 1603 comprises:

The second encoding subunit 16031 is configured to encode, according to the Board-transformer model framework, the input image information, text position information and interest point name of each rectangular frame respectively, to obtain coding features respectively corresponding to the image information, the text position information and the interest point name of each rectangular frame.

The training subunit 16032 is configured to train the Board-transformer model framework according to the image information, the text position information, and the coding features corresponding to the interest point names in each rectangular frame, so as to generate a signboard text filtering model.

In some embodiments, training subunit 16032 includes:

The determining module is configured to determine a second first-bit coding feature from among the coding features corresponding to the image information, the text position information and the interest point name of each rectangular frame.

The processing module is configured to perform, according to the second first-bit coding feature, filtering and fusion processing on the coding features corresponding to the image information, the text position information and the interest point name of each rectangular frame to obtain filtered and fused coding features.

The adjusting module is configured to adjust the parameters of the Board-transformer model framework according to the filtered and fused coding features to obtain the signboard text filtering model.

In some embodiments, the processing module comprises:

The filtering sub-module is configured to filter out, based on the second first-bit coding feature, the invalid coding features among the coding features corresponding to the image information, the text position information and the interest point name of each rectangular frame.

The processing sub-module is configured to fuse, taking the second first-bit coding feature as a basic feature, the valid coding features among the coding features corresponding to the image information, the text position information and the interest point name of each rectangular frame, to obtain the filtered and fused coding features.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device and a readable storage medium.

According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any one of the embodiments described above.

Fig. 17 illustrates a schematic block diagram of an example electronic device 1700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 17, the electronic device 1700 includes a computing unit 1701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1702 or a computer program loaded from a storage unit 1708 into a Random Access Memory (RAM) 1703. In the RAM 1703, various programs and data required for the operation of the device 1700 may also be stored. The computing unit 1701, the ROM 1702, and the RAM 1703 are connected to each other via a bus 1704. An input/output (I/O) interface 1705 is also connected to the bus 1704.

Various components in device 1700 are connected to I/O interface 1705, including: an input unit 1706 such as a keyboard, a mouse, etc.; an output unit 1707 such as various types of displays, speakers, and the like; a storage unit 1708 such as a magnetic disk, an optical disk, or the like; and a communication unit 1709 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1709 allows the device 1700 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunications networks.

The computing unit 1701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1701 performs the respective methods and processes described above, for example, the interest point query method of the electronic map. For example, in some embodiments, the interest point query method of the electronic map may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1700 via the ROM 1702 and/or the communication unit 1709. When the computer program is loaded into the RAM 1703 and executed by the computing unit 1701, one or more steps of the interest point query method of the electronic map described above may be performed. Alternatively, in other embodiments, the computing unit 1701 may be configured to perform the interest point query method of the electronic map in any other suitable manner (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.

According to another aspect of the disclosed embodiments, there is provided a vehicle including: the apparatus for querying points of interest of an electronic map as described in any of the above embodiments.

According to another aspect of the disclosed embodiments, the disclosed embodiments provide a roadside apparatus, including: the apparatus for querying points of interest of an electronic map as described in any of the above embodiments.
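By way of illustration only, the filtering performed by the query apparatus referred to above (comparing each coding feature with the first-bit coding feature, discarding those whose association degree falls below a preset threshold, and fusing the remainder) can be sketched as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: the association degree is assumed to be cosine similarity, the fusion is assumed to be an element-wise mean, and the names `filter_and_fuse`, `association_degree`, and `threshold` are hypothetical.

```python
import numpy as np

def association_degree(a, b):
    """Assumed association degree: cosine similarity of two coding features."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def filter_and_fuse(features, threshold=0.5):
    """Filter coding features against the first-bit feature, then fuse.

    features: array-like of shape (n, d); row 0 is taken as the
              first-bit coding feature (the filtering reference).
    Returns (indices of retained rows, fused feature of shape (d,)).
    """
    features = np.asarray(features, dtype=float)
    first = features[0]
    kept = [0]  # the reference feature itself is always retained
    for i in range(1, len(features)):
        # Drop features whose association degree is below the threshold.
        if association_degree(features[i], first) >= threshold:
            kept.append(i)
    fused = features[kept].mean(axis=0)  # assumed fusion: element-wise mean
    return kept, fused

# Toy example: the third feature is orthogonal to the first and is filtered.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
kept, fused = filter_and_fuse(feats, threshold=0.5)
print(kept)  # [0, 1]
```

In this sketch the text content corresponding to the retained rows would be treated as the valid text content; a trained model could replace both the similarity measure and the fusion step.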

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions provided by the present disclosure are achieved; no limitation is imposed herein.
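As a further illustration, the triplet loss used in this disclosure to train the text matching model can be sketched as follows, with the origin text content acting as the anchor. This is a minimal sketch under stated assumptions: the first and second difference information are assumed to be squared Euclidean distances between embedding vectors, and the `margin` parameter and the function name `triplet_loss` are hypothetical.

```python
import numpy as np

def triplet_loss(origin, positive, negative, margin=0.2):
    """Triplet loss over embeddings of origin, positive, and negative text.

    first_diff:  assumed squared distance between origin and positive.
    second_diff: assumed squared distance between origin and negative.
    The loss is zero once the negative is farther from the origin than
    the positive by at least `margin`.
    """
    origin = np.asarray(origin, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    first_diff = np.sum((origin - positive) ** 2)
    second_diff = np.sum((origin - negative) ** 2)
    return float(max(first_diff - second_diff + margin, 0.0))

# Toy embeddings: the positive lies near the origin text, the negative far away.
origin, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 0.0]
well_separated = triplet_loss(origin, positive, negative)
badly_separated = triplet_loss(origin, negative, positive)
```

During training, the parameters of the model framework would be adjusted to reduce this loss, pulling text content of the same type together and pushing text content of different types apart.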

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. An interest point query method of an electronic map, comprising:

identifying a signboard image to be queried to obtain initial text content of the signboard image to be queried and attribute information of the initial text content, wherein the attribute information of the initial text content includes: semantic attributes, position attributes, and image attributes;

respectively carrying out coding processing on semantic attributes, position attributes and image attributes of the initial text content to obtain coding feature sets comprising respective corresponding coding features;

determining a first-bit coding feature from the coding feature set;

determining, for each coding feature in the coding feature set other than the first-bit coding feature, an association degree between that coding feature and the first-bit coding feature, and filtering out the coding features whose association degree is smaller than a preset association degree threshold;

after all the retained coding features are obtained, performing fusion processing on the retained coding features and the first-bit coding feature to obtain a fused coding feature, and determining the text content corresponding to the fused coding feature as valid text content;

2. The method of claim 1, wherein respectively performing coding processing on the semantic attributes, position attributes, and image attributes of the initial text content to obtain a coding feature set comprising the respective corresponding coding features comprises:

performing coding processing on the semantic attributes of the initial text content according to a first encoder of a pre-trained signboard text filtering model to obtain a first coding feature;

performing coding processing on the position attributes of the initial text content according to a first encoder of the signboard text filtering model to obtain a second coding feature;

performing coding processing on the image attributes of the initial text content according to a first encoder of the signboard text filtering model to obtain a third coding feature;

3. The method according to claim 1 or 2, wherein if the valid text content matches the text content of a preset interest point in the electronic map, determining the preset interest point as the interest point of the signboard image to be queried comprises:

inputting the valid text content and the text content of each interest point of the electronic map into a pre-trained text matching model, and performing matching processing on the valid text content and the text content of each interest point of the electronic map according to the text matching model to obtain the interest point of the signboard image to be queried, wherein the text matching model is generated by performing triplet loss training on a self-supervised model framework based on the text content of each interest point in a second sample set.

4. The method of claim 3, wherein the text matching model is generated by adjusting parameters of the self-supervised model framework based on first difference information and second difference information, the first difference information being between origin text content and positive text content, and the second difference information being between the origin text content and negative text content, wherein the origin text content is randomly selected from the text content of the interest points, and the positive text content is interest point text content of the same type as the origin text content.

5. The method according to claim 1 or 2, wherein if the valid text content matches the text content of a preset interest point in the electronic map, determining the preset interest point as the interest point of the signboard image to be queried comprises:

inputting the valid text content and the text content of each interest point of the electronic map into a pre-trained text matching model, and performing alignment processing and matching processing on the valid text content and the text content of each interest point of the electronic map according to the text matching model to obtain the interest point of the signboard image to be queried, wherein the text matching model is generated by training the face recognition model framework ArcFace based on the text content of each interest point in a third sample set.

6. The method of claim 5, wherein performing alignment processing and matching processing on the valid text content and the text content of each interest point of the electronic map according to the text matching model to obtain the interest point of the signboard image to be queried comprises:

performing text detection on the valid text content to obtain a text image of the valid text content, and performing alignment processing on the text image and the text image of the text content of each interest point of the electronic map;

and performing feature extraction on the aligned text image of the valid text content to obtain first image features, performing feature extraction on the text image of the text content of each interest point of the electronic map to obtain second image features, and performing matching processing on the first image features and the second image features to obtain the interest point of the signboard image to be queried.

7. The method of claim 6, wherein the text matching model is generated by training the ArcFace framework according to a third image feature and a fourth image feature; the third image feature is obtained by performing feature extraction on the text images of the text content of each interest point in the third sample set after alignment processing; the fourth image feature is obtained by performing feature extraction on a preset standard text image; and the alignment processing comprises: performing text detection on the text content of each interest point in the third sample set to obtain a text image of that text content, and aligning the text image with the preset standard text image.

8. An interest point query device of an electronic map, comprising:

the first determining unit is used for determining the preset interest point as the interest point of the signboard image to be queried if the effective text content is matched with the text content of the preset interest point in the electronic map, wherein the electronic map is provided with a plurality of interest points, and each interest point in the plurality of interest points is provided with text content;

wherein the attribute information of the initial text content includes: semantic attributes, position attributes, and image attributes; the filter unit comprises:

the first coding subunit is used for respectively coding the semantic attribute, the position attribute and the image attribute of the initial text content to obtain a coding feature set comprising respective corresponding coding features;

the filtering subunit is used for determining a first-bit coding feature from the coding feature set, taking the first-bit coding feature as a filtering reference coding feature, and filtering coding features except the first-bit coding feature in the coding feature set so as to filter invalid text content in the initial text content and obtain the valid text content;

the filtering subunit is configured to determine, for each coding feature in the coding feature set other than the first-bit coding feature, an association degree between that coding feature and the first-bit coding feature, and to filter out the coding features whose association degree is smaller than a preset association degree threshold; and, after all the retained coding features are obtained, to perform fusion processing on the retained coding features and the first-bit coding feature to obtain a fused coding feature, and to determine the text content corresponding to the fused coding feature as the valid text content.

9. The device of claim 8, wherein the first coding subunit is configured to perform coding processing on semantic attributes of the initial text content according to a first encoder of a pre-trained signboard text filtering model to obtain a first coding feature, perform coding processing on position attributes of the initial text content according to the first encoder of the signboard text filtering model to obtain a second coding feature, and perform coding processing on image attributes of the initial text content according to the first encoder of the signboard text filtering model to obtain a third coding feature;

10. The apparatus according to claim 8 or 9, wherein the first determining unit comprises:

an input subunit, configured to input the valid text content and the text content of each interest point of the electronic map to a pre-trained text matching model;

and the matching subunit is used for performing matching processing on the valid text content and the text content of each interest point of the electronic map according to the text matching model to obtain the interest point of the signboard image to be queried, wherein the text matching model is generated by performing triplet loss training on a self-supervised model framework based on the text content of each interest point in the second sample set.

11. The apparatus of claim 10, wherein the text matching model is generated by adjusting parameters of the self-supervised model framework based on first difference information and second difference information, the first difference information being between origin text content and positive text content, and the second difference information being between the origin text content and negative text content, wherein the origin text content is randomly selected from the text content of the interest points, the positive text content is interest point text content of the same type as the origin text content, and the negative text content is interest point text content of a different type from the origin text content.

12. The apparatus according to claim 8 or 9, wherein the first determining unit comprises:

an input subunit, configured to input the valid text content and the text content of each interest point of the electronic map to a pre-trained text matching model;

and the matching subunit is used for performing alignment processing and matching processing on the valid text content and the text content of each interest point of the electronic map according to the text matching model to obtain the interest point of the signboard image to be queried, wherein the text matching model is generated by training the face recognition model framework ArcFace based on the text content of each interest point in the third sample set.

13. The apparatus of claim 12, wherein the matching subunit comprises:

the detection module is used for carrying out text detection on the effective text content to obtain a text image of the effective text content, and carrying out alignment processing on the text image and the text image of the text content of each interest point of the electronic map;

the extraction module is used for carrying out feature extraction on the text images of the aligned effective text contents to obtain first image features, and carrying out feature extraction on the text images of the text contents of each interest point of the electronic map to obtain second image features;

and the matching module is used for performing matching processing on the first image features and the second image features to obtain the interest point of the signboard image to be queried.

14. The apparatus of claim 13, wherein the text matching model is generated by training the ArcFace framework according to a third image feature and a fourth image feature; the third image feature is obtained by performing feature extraction on the text images of the text content of each interest point in the third sample set after alignment processing; the fourth image feature is obtained by performing feature extraction on a preset standard text image; and the alignment processing comprises: performing text detection on the text content of each interest point in the third sample set to obtain a text image of that text content, and aligning the text image with the preset standard text image.

15. A vehicle, comprising: the device of any one of claims 8 to 14.

16. A roadside apparatus comprising: the device of any one of claims 8 to 14.