
Image processing method and device, computer equipment and storage medium

Info

Publication number: CN115712749A
Application number: CN202110963747.7A
Authority: CN (China)
Prior art keywords: image, elements, feature, fusion, feature information
Other languages: Chinese (zh)
Inventors: 陈颖, 徐尚, 刘建林, 刘永
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110963747.7A
Publication of CN115712749A
Legal status: Pending (current)

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the application provides an image processing method and apparatus, a computer device, and a storage medium. The method includes: acquiring feature information of elements included in a first image and feature information of elements included in a second image, where the first image is an environment image of a location acquired by a data acquisition device and the second image is an image in an image library included in map data that is similar to the first image; matching the elements in the first image against the elements in the second image according to the feature information of both to obtain an element matching result; and updating the image library based on the element matching result and the first image, which can improve the accuracy of updating the image library.

Description

Image processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology, maps have become an important tool for both users and businesses. A user can easily look up even a small place and find the most convenient travel route; a business can more easily bring its information to users' attention. Because the real-world environment changes constantly, a currently acquired image may no longer match the images in the map well. The prior art is mainly based on similarity between images, and its general principle is: calculate the similarity between the current frame image and the images included in the map data, and perform image matching according to that similarity to determine whether to update the map data. However, matching methods based on image similarity are susceptible to illumination and viewing-angle changes and do not use relative geometric constraints, resulting in inaccurate updates to the images included in the map data.
Disclosure of Invention
The embodiment of the application provides an image processing method and device, a computer device and a storage medium, which can improve the accuracy of updating an image library.
In one aspect, an embodiment of the present application provides an image processing method, where the method includes:
acquiring feature information of elements included in a first image and feature information of elements included in a second image, wherein the first image includes an environment image of a position acquired by a data acquisition device, and the second image is an image similar to the first image in an image library included in map data;
matching the elements in the first image and the elements in the second image according to the feature information of the elements included in the first image and the feature information of the elements included in the second image to obtain an element matching result;
and updating the image library based on the element matching result and the first image.
In one aspect, an embodiment of the present application provides an image processing apparatus, including:
an acquisition unit configured to acquire feature information of elements included in a first image and feature information of elements included in a second image, where the first image is an environment image of a location acquired by a data acquisition device and the second image is an image in an image library included in map data that is similar to the first image;
the processing unit is used for matching the elements in the first image and the elements in the second image according to the feature information of the elements included in the first image and the feature information of the elements included in the second image to obtain an element matching result;
the processing unit is further configured to update the image library based on the element matching result and the first image.
In one aspect, the present application provides a computer device, which includes an input device, an output device, a processor, and a computer storage medium, where the processor and the computer storage medium are connected to each other, where the computer storage medium is used for storing a computer program, and the processor is configured to invoke the computer program to execute the above-mentioned image processing method.
In one aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program is used to implement the image processing method described above.
In one aspect, an embodiment of the present application provides a computer program product, which includes a computer program; the computer program is stored in a computer-readable storage medium, and when executed by a processor of a computer device, performs the image processing method described above.
In this embodiment of the application, the computer device may acquire feature information of elements included in the first image and feature information of elements included in the second image, and match the elements in the first image against the elements in the second image according to that feature information to obtain an element matching result; the image library is then updated based on the element matching result and the first image. Because the image library included in the map data is updated by way of element matching, problems that make such updates inaccurate over long time spans, such as seasonal, weather, and illumination changes, severe distortion, and low frame rates, can be overcome, improving the accuracy of updating the image library included in the map data.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1a is an architecture diagram of an image processing system according to an embodiment of the present application;
FIG. 1b is a schematic flow chart of an image processing scheme provided by an embodiment of the present application;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 3a is a schematic diagram of a plurality of two-dimensional coordinates corresponding to one three-dimensional coordinate in the same scene according to an embodiment of the present application;
FIG. 3b is a schematic diagram of a plurality of two-dimensional coordinates corresponding to a three-dimensional coordinate in the same scene provided by the embodiment of the present application;
fig. 3c is a schematic diagram of a two-dimensional coordinate in an image corresponding to a three-dimensional coordinate and a two-dimensional coordinate not corresponding to a three-dimensional coordinate provided in an embodiment of the present application;
FIG. 3d is a schematic diagram of matching elements in two images provided by an embodiment of the present application;
fig. 4 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the overall architecture of image processing and the structure of the element matching network according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to better realize image processing and update an image library included in map data, the embodiment of the application provides an image processing system and an image processing scheme. Referring to fig. 1a, fig. 1a is an image processing system provided in an embodiment of the present application, where the image processing system may include at least one terminal device 101 and at least one server 102. The terminal device 101 may be a device dedicated to image data acquisition, or any device on which a camera can be installed. The terminal device 101 may call the camera to acquire an environment image of the location where the terminal device 101 is located and send the acquired environment image to the server 102, so that the server 102 updates the map data according to the acquired environment image. Alternatively, the terminal device 101 may itself collect an environment image of its location, update the map data based on the collected image, and store the updated map data in the server 102.
The terminal device 101 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a vehicle-mounted computer, a smart home, a wearable electronic device, a VR/AR device, or the like; the terminal device 101 and the server 102 can be directly or indirectly connected through wired or wireless communication; the server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
Based on the image processing system provided above, the embodiment of the present application provides an image processing scheme, which may be as shown in fig. 1b. The image processing scheme may be executed by a computer device, which may be the terminal device 101 or the server 102, and its general principle is as follows: elements of an image acquired at the current position or under the current viewing angle are matched against elements of images in the image library included in the map data according to whether the elements belong to the same entity in the real environment; MLP (Multi-Layer Perceptron) operations and feature fusion are performed on the description features and geometric features corresponding to the detection results of the elements to obtain features used for element matching (hereinafter referred to as fusion features), and element matching is performed on the fusion features through an optimal transport algorithm (such as the Sinkhorn algorithm). Specifically, when a current frame image (i.e., a first image) is acquired by the data acquisition device, the first image being an environment image of the current position or current viewing angle, the computer device may acquire the first image, query a plurality of images similar to the first image from the image library included in the map data, and vote over the plurality of images to acquire a three-dimensional coordinate set (or 3D point cloud set) corresponding to each image, where the three-dimensional coordinate set may include one or more three-dimensional coordinates.
The plurality of images are then screened according to the 3D point cloud sets to obtain a second image similar to the first image. The computer device can then generate an image pair of the first image and the second image, perform element detection on the first image to obtain the detection results of the first image, and perform element detection on the second image to obtain the detection results of the second image, where one detection result actually corresponds to the feature information of one element. The feature information of the elements included in each image in the image pair is then input into an element matching network for feature fusion to obtain first fusion features of the elements included in the first image and second fusion features of the elements included in the second image. The feature information may include geometric features and description features of the elements (description features may also be referred to as visual features or descriptor features), and the description features are used to describe the category of the element, the outline of the element, and the like. The computer device then performs inner-product calculation on the first fusion features of the elements in the first image and the second fusion features of the elements in the second image to obtain element association scores between the elements included in the first image and the elements included in the second image, stores these scores in a matching matrix, and processes the matching matrix with the Sinkhorn algorithm (an optimal transport algorithm). The elements in the first image are then matched against the elements in the second image according to the computed matching matrix. After obtaining the element matching result, the computer device may update the image library based on the element matching result and the first image.
The image processing scheme can be applied to automatic map data updating. In that scenario, on the highly difficult task of full element matching over crowd-sourced map data, the accuracy of element association matching can reach 98% and the recall can reach more than 90%. The scheme solves the problem of inaccurate map data updates caused by factors such as seasonal, weather, and illumination changes, severe distortion, and low frame rates over long time spans; it improves the accuracy of updating the image library included in the map data and raises the degree of automation of the map data production process. Because the first image and the images in the database are matched through element matching without any hand-designed strategy, matching ambiguity can be effectively resolved and a large amount of manual work is saved.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an image processing method according to an embodiment of the present disclosure. The image processing method may be executed by a computer device, which may be the terminal device 101 or the server 102, and may include the following steps S201 to S203:
s201, acquiring feature information of elements included in the first image and feature information of elements included in the second image. The first image includes an image of an environment of a location captured by the data capture device, and the second image is an image of a library of images included in the map data that is similar to the first image.
The first image may include one or more elements. The term element generally refers to an item carrying semantic information, including but not limited to traffic lights, signs, electronic police cameras, zebra crossings, ground arrows, trademarks, and the like in the map data. The feature information may include description features and geometric features of the elements. The description features may be the categories and confidences of elements; element categories may include, but are not limited to, traffic lights, signs, electronic police cameras, and the like, as described above. The data acquisition device may be the terminal device 101 described above, or another device dedicated to data acquisition, which is not limited in this embodiment of the present application. The number of second images may be one or more. The candidate images in the image library may include one or more elements; likewise, the second image may include one or more elements. That the first image is similar to the second image may be understood as follows: the first image and the second image belong to the same environment (or the same scene); they may be acquired by the same data acquisition device at different viewing angles, or by different data acquisition devices at the same viewing angle. For example, the first image and the second image both belong to the scene corresponding to the XX street intersection.
In a specific implementation, the data acquisition device is provided with a camera assembly or an image acquisition sensor capable of acquiring images. When a user (who may be an ordinary user of the map data, or a map management user who wants to update the map data, or the like) wants to acquire an environment image of a certain position or viewing angle, a data acquisition device may be used to acquire an image of the environment at that position or viewing angle. The data acquisition device calls its camera assembly or image acquisition sensor to acquire a first image, where the first image is an environment image of the position or viewing angle acquired by the data acquisition device. The computer device may then obtain the first image and obtain a second image similar to the first image from the image library included in the map data, and then perform element detection on the first image to obtain the feature information of the elements included in the first image and perform element detection on the second image to obtain the feature information of the elements included in the second image. It should be understood that when the computer device is the data acquisition device, the computer device may directly acquire the first image; when the computer device is not the data acquisition device, the computer device needs to obtain the first image from the data acquisition device.
In one embodiment, the computer device performs element detection on the first image to obtain the feature information of the elements included in the first image, and performs element detection on the second image to obtain the feature information of the elements included in the second image, as follows: the computer device may use a target detection frame to perform element detection on the first image to obtain the feature information of the elements included in the first image, and use the target detection frame to perform element detection on the second image to obtain the feature information of the elements included in the second image. The target detection frame may be a two-dimensional detection frame (or 2D detection frame), or another detection frame for element detection. The feature information may include description features (also referred to as descriptor features) and geometric features; the description features are used to describe the element category, confidence, and the like of an element, and the geometric features may include the center-point coordinates, length, width, and the like of the target detection frame.
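For concreteness, one detection result as described here might be held in a structure like the following minimal Python sketch; the field names and the 256-dimensional descriptor are illustrative assumptions, not details fixed by the patent:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ElementDetection:
    """One detection result, i.e. the feature information of one element."""
    category: str           # element category, e.g. "traffic_light", "sign"
    confidence: float       # detection confidence s
    cx: float               # center-point x of the 2D detection box
    cy: float               # center-point y of the 2D detection box
    w: float                # width of the 2D detection box
    h: float                # height of the 2D detection box
    descriptor: np.ndarray  # description (descriptor) feature vector

# e.g. one element detected in the first image (values are made up)
det = ElementDetection("traffic_light", 0.93, 412.0, 118.5, 36.0, 88.0,
                       np.zeros(256, dtype=np.float32))
```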
In one embodiment, the first image is an environment image, acquired in real time, of the location where the data acquisition device is located; for example, when the data acquisition device is at position 1, the first image it acquires is an environment image at position 1, and when the data acquisition device is at position 2, the first image it acquires is an environment image at position 2. In another embodiment, the first image is an environment image of a target position acquired by the data acquisition device at a preset period. For example, if the preset period is 2 days and the target position is position 1, the first image may be an environment image at position 1 acquired by the data acquisition device on a two-day cycle; that is, the data acquisition device acquires a first image at position 1 every two days.
In one embodiment, the map data includes an image library including a plurality of candidate images; the specific implementation manner of the computer device obtaining the second image similar to the first image from the image library included in the map data may include the following two manners:
(1) A similarity between the first image and each of the plurality of candidate images is calculated, and a second image similar to the first image is selected according to the similarity.
In particular implementations, the computer device may calculate a similarity between each candidate image and the first image, and then determine a second image from the image library that is similar to the first image based on the similarity between each candidate image and the first image. For example, the image library includes 3 candidate images, 3 candidate images being candidate image 1, candidate image 2, and candidate image 3, respectively; the computer device may calculate the similarity between the candidate image 1 and the first image, the similarity between the candidate image 2 and the first image, and the similarity between the candidate image 3 and the first image.
As an embodiment, the specific implementation manner of determining the second image similar to the first image from the image library according to the similarity between each candidate image and the first image is as follows: the computer device may determine a candidate image having a similarity greater than or equal to a similarity threshold as a second image similar to the first image. Wherein, the similarity threshold value can be set according to requirements; for example, the degree of similarity between candidate image 1 and the first image is greater than the similarity threshold, the degree of similarity between candidate image 2 and the first image is less than the similarity threshold, and the computer device determines candidate image 1 having the degree of similarity greater than the similarity threshold as the second image similar to the first image.
As another embodiment, the computer device may sort the candidate images included in the image library by their similarity to the first image from large to small, and then take the top K candidate images from the sorted candidates as the second images similar to the first image. K is an integer greater than or equal to 1 and may be set as required; for example, K takes the value 1. Suppose the computer device sorts the 3 candidate images by the similarity of candidate image 1, candidate image 2, and candidate image 3 to the first image from large to small as: candidate image 1, candidate image 3, candidate image 2. The computer device then selects the top candidate image, candidate image 1, from the 3 candidate images as the second image similar to the first image. A short sketch of this top-K selection is given below.
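The sketch below assumes each image is summarized by a global descriptor vector and uses cosine similarity; the patent does not fix a particular similarity measure, so both choices are assumptions made for illustration:

```python
import numpy as np

def top_k_similar(first_vec: np.ndarray, cand_vecs: np.ndarray, k: int = 1) -> np.ndarray:
    """Return indices of the K candidate images most similar to the first image.

    first_vec: (d,) global descriptor of the first image;
    cand_vecs: (m, d) global descriptors of the m candidate images.
    """
    sims = cand_vecs @ first_vec / (
        np.linalg.norm(cand_vecs, axis=1) * np.linalg.norm(first_vec) + 1e-8)
    order = np.argsort(-sims)   # sort by similarity, from large to small
    return order[:k]            # the top-K candidates (the second images)
```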
(2) Calling a retrieval network to acquire one or more images similar to the first image from candidate images included in the image library; the computer device then determines a second image from the one or more images based on the three-dimensional coordinates corresponding to each image and the two-dimensional coordinates corresponding to each three-dimensional coordinate.
In a specific implementation, the computer device invokes a retrieval network to retrieve one or more images similar to the first image from the image library included in the map data, where the retrieval network is a neural network model for retrieving and filtering images. Each candidate image in the image library has one or more corresponding three-dimensional coordinates; it can be understood that the one or more three-dimensional coordinates corresponding to each candidate image actually indicate one or more elements included in that candidate image (i.e., if an image corresponds to 3 three-dimensional coordinates, it includes 3 elements). Each three-dimensional coordinate corresponds to one or more two-dimensional coordinates, as shown in fig. 3a. FIG. 3a visualizes the three-dimensional coordinate of a signboard corresponding to two-dimensional coordinates in three images of the same scene; it can be seen that the three-dimensional coordinate of the signboard may have three corresponding two-dimensional coordinates (31 in fig. 3a marks the two-dimensional coordinates), where the 3 images are imaging data of the scene at different positions or viewing angles. For example, the 3 images may be acquired by the same camera at different times, or by different cameras at the same time. Likewise, in fig. 3b, the three-dimensional coordinate of a traffic light may also correspond to 3 two-dimensional coordinates.
In one embodiment, the one or more three-dimensional coordinates corresponding to each candidate image may be acquired as follows: the computer device may acquire one or more three-dimensional coordinates (or 3D points) of each candidate image through a voting algorithm, and the one or more three-dimensional coordinates corresponding to each candidate image may form a three-dimensional coordinate set (or 3D point cloud set), represented as:

P_i = {p_1, p_2, …, p_N} ∈ R^(3×N)

where P_i denotes the three-dimensional coordinate set of the i-th candidate image, R^(3×N) indicates that the coordinates are three-dimensional, and N denotes the number of three-dimensional coordinates. Each three-dimensional coordinate may correspond to one or more two-dimensional coordinates, and each candidate image may include one or more elements.
It is understood that the one or more images similar to the first image obtained through the retrieval network may each have one or more three-dimensional coordinates, and each three-dimensional coordinate may also have one or more two-dimensional coordinates (or 2D points). The computer device may then determine the second image from the one or more images according to the three-dimensional coordinates corresponding to each image and the two-dimensional coordinates corresponding to each three-dimensional coordinate.
In one embodiment, there may be outliers among the images obtained by the retrieval network (i.e., images of an inconsistent environment, which may also be understood as inconsistent scenes; for example, the images correspond to two scenes, the first a crossroads and the second a junction). A co-visibility calculation may therefore be performed on the images similar to the first image obtained by the retrieval network to filter out images with inconsistent scenes, thereby reducing the amount of computation for subsequent element matching. In this case, the computer device determines the second image from the one or more images according to the three-dimensional coordinates corresponding to each image and the two-dimensional coordinates corresponding to each three-dimensional coordinate as follows. The computer device takes, from the one or more three-dimensional coordinates corresponding to each image, the target three-dimensional coordinates whose number of corresponding two-dimensional coordinates exceeds a number threshold; equivalently, it filters out those three-dimensional coordinates whose number of corresponding two-dimensional coordinates does not exceed the number threshold. The number threshold may be set as required, for example to 3, 5, and so on. The computer device can then obtain a first total number, the number of two-dimensional coordinates corresponding to each image, and a second total number, the number of two-dimensional coordinates corresponding to the target three-dimensional coordinates, and determine the reconstruction rate of each image from the first total number and the second total number. The computer device may then take the images whose reconstruction rate is greater than or equal to a target threshold from the one or more images as the second images; the target threshold may be set as required. The reconstruction rate of each image actually refers to the reconstruction rate of the three-dimensional coordinates in that image, and it indicates the probability that the two-dimensional image can be reconstructed into a three-dimensional one.
For example, let the target threshold be 10%. As shown in fig. 3c, the left part of fig. 3c shows the acquisition track used to collect the three-dimensional coordinates corresponding to a target image, together with the three-dimensional coordinates in the 3D point cloud sets, while the right part shows the two-dimensional coordinates in the target image: 4595 two-dimensional coordinates have no corresponding three-dimensional coordinate (the black dots in fig. 3c), and 573 two-dimensional coordinates have a corresponding three-dimensional coordinate (the white dots in fig. 3c). The three-dimensional coordinate reconstruction rate corresponding to the target image is therefore 573/(4595 + 573) = 11.09%. The computer device may then determine that the reconstruction rate corresponding to the target image is greater than 10% and take the target image as a second image similar to the first image.
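The reconstruction-rate test is simple enough to write out directly; the sketch below reproduces the worked example from fig. 3c:

```python
def reconstruction_rate(n_2d_with_3d: int, n_2d_without_3d: int) -> float:
    """Reconstruction rate of an image's three-dimensional coordinates:
    2D coordinates that have a corresponding 3D coordinate, divided by
    all 2D coordinates in the image."""
    return n_2d_with_3d / (n_2d_with_3d + n_2d_without_3d)

# Worked example from fig. 3c: 573 2D coordinates with a corresponding 3D
# coordinate, 4595 without -> 573 / (4595 + 573) = 11.09%.
rate = reconstruction_rate(573, 4595)
assert abs(rate - 0.1109) < 1e-3

TARGET_THRESHOLD = 0.10                           # 10%, set as required
keep_as_second_image = rate >= TARGET_THRESHOLD   # True, since 11.09% > 10%
```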
In one embodiment, the image library included in the map data may be obtained by acquiring environment images with target sensors and processing the acquired images. The target sensors may include one or more of the following: image sensors (i.e., cameras), GPS (Global Positioning System) sensors, and IMU (Inertial Measurement Unit) sensors. For example, an environment image acquired by a target sensor may contain noise; the noise is removed and the denoised image is then stored in the image library. The image library includes a plurality of candidate images, which may be images acquired by the same camera at different times, images acquired by different cameras at the same time, and the like.
And S202, matching the elements in the first image and the elements in the second image according to the feature information of the elements included in the first image and the feature information of the elements included in the second image to obtain an element matching result.
The element matching result may indicate whether there are matching elements between the elements included in the first image and the elements included in the second image, where two elements match when they belong to the same physical point, that is, to the same entity at the same spatial location in the real environment. In one embodiment, the element matching result may include matching elements; a matching pair consists of an element in the first image and an element in the second image, for example element A in the first image and element B in the second image, i.e., element A and element B are matching elements. Alternatively, the element matching result may include unmatched elements between the first image and the second image; for example, the unmatched elements include element A in the first image and element B in the second image, i.e., element A and element B are unmatched elements.
In one embodiment, the specific implementation manner of step S202 may be: the computer device can directly compare the feature information of the elements included in the first image with the feature information of the elements included in the second image in pairs, and determine whether matched elements exist between the elements included in the first image and the elements included in the second image according to the comparison result to obtain an element matching result. For example, the feature information of the element 1 included in the first image is compared with the feature information of the element 2 included in the second image, the feature information of the element 1 is the same as the feature information of the element 2, and the computer device determines that the elements 1 and 2 are matching elements. For another example, the feature information of the element 3 included in the first image is compared with the feature information of the element 4 included in the second image, the feature information of the element 3 is not identical to the feature information of the element 4, and the computer device determines that the elements 3 and 4 are unmatched elements, that is, the element matching result includes unmatched elements.
In an embodiment, since directly comparing the feature information of the elements included in the first image with that of the elements included in the second image pair by pair is tedious and can make the element matching result inaccurate, step S202 may instead be implemented as follows: the computer device may call an element matching network to perform feature fusion on the feature information of the elements included in the first image and the feature information of the elements included in the second image to obtain first fusion features of the elements included in the first image and second fusion features of the elements included in the second image, and then match the elements in the first image against the elements in the second image according to the first fusion features and the second fusion features to obtain the element matching result. The first fusion features and the second fusion features may be understood as features constructed for element matching.
And S203, updating the image library based on the element matching result and the first image.
As is understood from step S202, the element matching result indicates whether there is a matching element between the element included in the first image and the element included in the second image.
In one embodiment, if the element matching result indicates that a target element exists among the elements included in the first image (the target element being any element in the first image) and the second image does not include the target element, then the target element in the first image has found no matching element in the second image and is a newly added element, and the computer device updates the image library using the first image. In particular, the computer device may directly replace the second image in the image library (the image similar to the first image) with the first image, or the computer device may replace the elements included in the second image in the image library with the elements included in the first image. In another embodiment, if the element matching result indicates that there are matching elements, a matching pair being a first element included in the first image and a second element included in the second image (matching meaning that the first element and the second element correspond to the same entity), the computer device may establish an association between the first element and the second element. For example, fig. 3d includes image 1 and image 2, which belong to the same scene; fig. 3d lists 4 pairs of matching elements by way of example: element 101 in image 1 and element 101 in image 2; element 102 in image 1 and element 102 in image 2; element 103 in image 1 and element 103 in image 2; and element 104 in image 1 and element 104 in image 2. The computer device may establish an association for each of the 4 pairs of matching elements. After establishing the association between the first element in the first image and the second element in the second image, the computer device updates the image library according to the association and the first image. Specifically, according to the association between the first element and the second element, the computer device does not update the second element of the second image in the image library, and updates the second image in the image library using the other elements included in the first image. Thus, when the image library is updated, the elements do not need to be matched one by one according to a hand-crafted matching strategy: the matched elements are determined simply from the associations, and only the elements of the second image that have no match among the elements of the first image are updated according to the first image, improving the automatic processing capability of the map data.
In one embodiment, if the element matching result indicates that the elements included in the first image are completely matched with the elements included in the second image, for example, the first image includes 3 elements, the second image includes 3 elements, and the 3 elements included in the first image and the 3 elements included in the second image are completely matched pairwise, then the computer device does not need to update the image library.
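The update logic of these branches can be summarized in a short sketch; the `library` and `match_result` interfaces are illustrative assumptions made for this sketch, not part of the patent:

```python
def update_image_library(library, first_image, match_result):
    """Sketch of the update logic above (interfaces are assumed)."""
    if match_result.fully_matched:
        return                                  # elements match pairwise: no update
    # matched pairs correspond to the same entity: record the association
    for first_el, second_el in match_result.matched_pairs:
        library.associate(first_el, second_el)
    # elements of the first image with no match in the second image are new:
    # update the library (e.g. replace the second image or its elements)
    if match_result.unmatched_first_elements:
        library.update_with(first_image, match_result.unmatched_first_elements)
```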
In this embodiment of the application, the computer device may acquire feature information of elements included in the first image and feature information of elements included in the second image, and match the elements in the first image against the elements in the second image according to that feature information to obtain an element matching result; the image library is then updated based on the element matching result and the first image. Because the image library included in the map data is updated by way of element matching, problems that make such updates inaccurate over long time spans, such as seasonal, weather, and illumination changes, severe distortion, and low frame rates, can be overcome, improving the accuracy of updating the image library included in the map data.
Referring to fig. 4, fig. 4 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure. The image processing method may be executed by a computer device, which may be the terminal device 101 or the server 102; the image processing method may include the following steps S401 to S404:
s401, acquiring feature information of elements included in a first image and feature information of elements included in a second image, wherein the first image includes an environment image of a position acquired by a data acquisition device, and the second image is an image similar to the first image in an image library included in map data.
In a specific implementation, the computer device may use the target detection frame to perform element detection on the first image to obtain the detection results of the first image, where the detection results include the feature information of the elements in the first image, and use the target detection frame to perform element detection on the second image to obtain the detection results of the second image, where the detection results include the feature information of the elements in the second image. For example, FIG. 5 is a diagram of the overall architecture of image processing and the structure of the element matching network. In fig. 5, the first image is image I_A and the second image is image I_B. When the computer device uses the target detection frame to perform element detection on image I_A, as shown in fig. 5, image I_A yields 4 detection results, each corresponding to the feature information of one element of image I_A. Similarly, when the computer device uses the target detection frame to perform element detection on image I_B, image I_B yields 5 detection results, each corresponding to the feature information of one element of image I_B. The target detection frame may be a two-dimensional detection frame (or 2D detection frame), or another detection frame for element detection. The feature information may include description features (also referred to as descriptor features) and geometric features; the description features are used to describe the element category, confidence, and the like of an element. The geometric features can be obtained by calling an MLP to fuse a plurality of geometric sub-features, so that the geometric sub-features are lifted to a higher dimension. The geometric sub-features may include one or more of: the center-point coordinates, length, width, and the like of the target detection frame. The geometric feature may be expressed as:

f_p = MLP([x, y, w, h, s])

where f_p denotes the geometric feature; MLP( ) denotes the dimension-raising operation; x and y denote the center-point coordinates of the target detection frame; w and h denote the width and height of the target detection frame, respectively; and s denotes the confidence.
It should be noted that the geometric features of the elements included in any image can be obtained by fusing the geometric sub-features by using MLP.
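A minimal PyTorch sketch of this dimension-raising MLP follows; the hidden-layer width is an assumption, since the text only specifies f_p = MLP([x, y, w, h, s]):

```python
import torch
import torch.nn as nn

class GeometricEncoder(nn.Module):
    """Lift the geometric sub-features [x, y, w, h, s] of a detection box
    to a d-dimensional geometric feature f_p = MLP([x, y, w, h, s])."""
    def __init__(self, d: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(5, 64), nn.ReLU(),   # hidden width 64 is an assumption
            nn.Linear(64, d),
        )

    def forward(self, boxes: torch.Tensor) -> torch.Tensor:
        # boxes: (n, 5) rows of [x, y, w, h, s] -> (n, d) geometric features
        return self.mlp(boxes)
```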
In one embodiment, there may be a plurality of second images, and the computer device may generate a plurality of image pairs from the first image and the plurality of second images. For example, the plurality of image pairs may be represented as: {(I_q, I_db0), (I_q, I_db1), (I_q, I_db2), …}, where I_q denotes the first image and I_db0, I_db1, I_db2 denote the plurality of second images. Step S401 may then acquire the feature information of the first image and of the second image in each image pair; for how this is done, reference may be made to the implementation corresponding to the embodiment provided in fig. 2, which is not repeated here.
S402, calling an element matching network to perform feature fusion on the feature information of the elements included in the first image and the feature information of the elements included in the second image to obtain first fusion features of the elements included in the first image and second fusion features of the elements included in the second image.
The element matching network may be as shown in fig. 5 and includes a feature fusion layer and an optimal matching layer. The feature fusion layer constructs the features used for element matching; to ensure matching accuracy, the features of elements within each image need to be fused, and the features of elements across the two images need to be crossed and fused, so the feature fusion layer may include a self layer for feature fusion among elements within an image and a cross layer for feature fusion of elements across images. The optimal matching layer is used to match the elements included in the two images.
In one embodiment, step S402 is implemented as follows: the computer device calls the feature fusion layer of the element matching network to perform intra-image fusion on the feature information of the elements included in the first image and, respectively, on the feature information of the elements included in the second image, obtaining the intra-fusion features of the elements included in the first image and the intra-fusion features of the elements included in the second image. Specifically, the computer device invokes the self layer in the feature fusion layer to perform intra-image fusion on the feature information of the elements included in the first image to obtain their intra-fusion features, and invokes the self layer to perform intra-image fusion on the feature information of the elements included in the second image to obtain their intra-fusion features. For example, in fig. 5, the first image is image I_A, and the input to the feature fusion layer of the element matching network is the geometric features f_p^A and the description features f_d^A of the elements included in image I_A; it can be understood that the i-th element included in image I_A has geometric feature f_pi^A and description feature f_di^A. Suppose the first image includes 4 elements: element 1, element 2, element 3, and element 4. The computer device then calls the self layer in the feature fusion layer to perform self-fusion over the geometric and description features of element 1, element 2, element 3, and element 4, obtaining the intra-fusion feature of element 1; similarly, the computer device may calculate the intra-fusion features of element 2, element 3, and element 4. The same holds for image I_B: the input to the feature fusion layer of the element matching network is the geometric features f_p^B and the description features f_d^B of the elements included in image I_B, where the i-th element of image I_B has geometric feature f_pi^B and description feature f_di^B. The intra-fusion features of the elements included in image I_B can then be obtained in the same way as the intra-fusion features of the elements included in image I_A.
Then, the computer device calls the feature fusion layer of the element matching network to perform cross-image fusion on the feature information of the elements included in the first image and the feature information of the elements included in the second image, obtaining the cross-fusion features of the elements included in the first image and the cross-fusion features of the elements included in the second image. For example, suppose image I_B includes 3 elements: element 5, element 6, and element 7. For element 1 in image I_A, the computer device calls the cross layer in the feature fusion layer to cross the feature information of element 1 with that of element 5, with that of element 6, and with that of element 7, finally obtaining the cross-fusion feature of element 1. Similarly, the computer device may determine the cross-fusion features of the remaining elements included in image I_A, and the cross-fusion features of the elements included in image I_B.
After obtaining the intra-fusion features and the cross-fusion features of the elements included in the first image, the computer device determines the first fusion features of the elements included in the first image from those intra-fusion and cross-fusion features. In a specific implementation, the computer device iterates the intra-fusion and cross-fusion of the elements included in the first image N times to obtain the first fusion features, which may be used for element matching. The first fusion features of the elements included in the first image may be denoted f_A, i.e., f_A = [self(f_A, f_A), cross(f_A, f_B)] × N, where self(f_A, f_A) means that the elements included in image A are fused with each other within the image, and cross(f_A, f_B) means that the elements included in image A are fused across images with the elements included in image B. The dimension of the first fusion features f_A is f_A ∈ R^(n1×d), where n1 is the number of detection results corresponding to the first image, which is also the number of elements included in the first image, and d is the feature dimension (for example, d may be 256). The first fusion feature of the i-th element included in the first image may be expressed as f_i^A.

After obtaining the intra-fusion features and the cross-fusion features of the elements included in the second image, the second fusion features of the elements included in the second image are determined from those intra-fusion and cross-fusion features. In a specific implementation, the computer device iterates the intra-fusion and cross-fusion of the elements included in the second image N times to obtain the second fusion features, which are used for element matching. The second fusion features of the elements included in the second image may be denoted f_B, i.e., f_B = [self(f_B, f_B), cross(f_B, f_A)] × N, where self(f_B, f_B) means that the elements included in image B are fused with each other within the image, and cross(f_B, f_A) means that the elements included in image B are fused across images with the elements included in image A. The dimension of the second fusion features f_B is f_B ∈ R^(n2×d), where n2 is the number of detection results corresponding to the second image, which is also the number of elements included in the second image, and d is the feature dimension. The second fusion feature of the i-th element included in the second image may be expressed as f_i^B.
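One way to realize this self/cross fusion iterated N times is with attention layers, as in the following PyTorch sketch; the use of multi-head attention, the tensor shapes, and the layer sizes are assumptions, since the patent only names the self layer, the cross layer, and the N iterations:

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """One self + cross round of the feature fusion layer (attention assumed)."""
    def __init__(self, d: int = 256, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def fuse(self, f_self: torch.Tensor, f_other: torch.Tensor) -> torch.Tensor:
        # f_self, f_other: (1, n, d) element features of each image (batch of 1)
        s, _ = self.self_attn(f_self, f_self, f_self)   # self(f, f): within the image
        c, _ = self.cross_attn(s, f_other, f_other)     # cross(f, f_other): across images
        return c

def fuse_features(f_a: torch.Tensor, f_b: torch.Tensor, layers):
    """Iterate [self, cross] N times, as in f_A = [self(f_A, f_A), cross(f_A, f_B)] x N."""
    for layer in layers:                  # layers: N FusionLayer modules
        f_a, f_b = layer.fuse(f_a, f_b), layer.fuse(f_b, f_a)  # simultaneous update
    return f_a, f_b
```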
In one embodiment, before step S402 is performed, an initial neural network needs to be trained to obtain the element matching network. The computer device may obtain a training sample set that includes positive sample pairs, negative sample pairs, and labeling information for each sample pair, where the labeling information includes the feature information of the elements in each sample of the pair. The initial neural network is then called to process the positive sample pairs and the negative sample pairs to obtain predicted matching elements for each sample pair, and the network parameters of the initial neural network are adjusted using the labeling information of each sample pair and the corresponding predicted matching elements to obtain the element matching network. A positive sample pair contains samples that have elements corresponding to the same entity and may be an image pair acquired in the same scene (or the same environment); for example, a positive sample pair includes two images acquired at an intersection, image 1 and image 2, and there are elements in image 1 and image 2 that correspond to the same entity: if image 1 includes a signboard element, image 2 also includes a signboard element, and the two correspond to the same entity. A negative sample pair contains samples with no elements corresponding to the same entity; it may be an image pair from different scenes, or an image pair acquired in the same scene. An image pair acquired in the same scene may or may not contain elements corresponding to the same entity. Elements in the image pair that correspond to the same entity can therefore be used as a positive sample pair, whose labeling information is the feature information of those elements; elements that do not correspond to the same entity (among the elements included in image 1 and those included in image 2) can be used as a negative sample pair, whose labeling information is the feature information of the non-corresponding elements, for example the feature information of a signboard element in image 1 and of a traffic-light element in image 2. By acquiring image pairs in the same scene and using elements corresponding to the same entity as positive sample pairs and elements without a corresponding entity as negative sample pairs, the amount of training data that must be collected can be reduced.
The specific implementation in which the computer device invokes the initial neural network to process the positive sample pairs and the negative sample pairs to obtain the predicted matching element corresponding to each sample pair may be: the computer device first invokes the initial neural network to process the positive sample pairs, i.e., lets the initial neural network learn the positive sample pairs first; then the computer device invokes the initial neural network to process the negative sample pairs, i.e., lets the initial neural network learn the negative sample pairs afterwards; this is beneficial to obtaining a better element matching network. The numbers of positive sample pairs and negative sample pairs need to meet a preset condition so that the positive and negative samples are balanced; for example, the number of negative sample pairs does not exceed 20% of the number of positive sample pairs. Assuming 10 positive sample pairs and 2 negative sample pairs, the computer device first invokes the initial neural network to process the 10 positive sample pairs, and then invokes the initial neural network to process the 2 negative sample pairs.
In one embodiment, the specific implementation in which the computer device adjusts the network parameters of the initial neural network by using the labeling information of each sample pair and the corresponding predicted matching elements to obtain the element matching network is as follows: the computer device calculates the model loss of the initial neural network according to the labeling information of each sample pair and the corresponding predicted matching elements, and then adjusts the network parameters of the initial neural network in the direction of reducing the model loss to obtain the element matching network. The model loss can be calculated as:

Loss = − Σ_{(i,j)∈M} log P̄(i, j) − Σ_{i∈I} log P̄(i, n2+1) − Σ_{j∈J} log P̄(n1+1, j)

where P̄ is the predicted assignment matrix (including the extra row and column that handle the case of no matching element, described below); (i, j) ∈ M represents the labeled pairs of elements across the two samples of a sample pair that correspond to the same entity; i ∈ I represents the labeled elements of the first sample that have no counterpart in the second sample; and j ∈ J represents the labeled elements of the second sample that have no counterpart in the first sample.
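Assuming the loss takes the form reconstructed above — a negative log-likelihood over the extended assignment matrix, whose last row and column act as 'no match' bins — it could be computed as in the following sketch; the variable names are illustrative.

import numpy as np

def matching_loss(p_bar, matches, unmatched_a, unmatched_b):
    # p_bar:          (n1+1, n2+1) assignment matrix; last row/column are 'no match' bins.
    # matches:        labelled (i, j) element pairs corresponding to the same entity.
    # unmatched_a/_b: labelled element indices that have no counterpart.
    n1, n2 = p_bar.shape[0] - 1, p_bar.shape[1] - 1
    loss = -sum(np.log(p_bar[i, j]) for i, j in matches)
    loss -= sum(np.log(p_bar[i, n2]) for i in unmatched_a)   # 'no match' column
    loss -= sum(np.log(p_bar[n1, j]) for j in unmatched_b)   # 'no match' row
    return loss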
And S403, matching the elements in the first image with the elements in the second image according to the first fusion feature and the second fusion feature to obtain an element matching result.
In a specific implementation, the computer device may invoke an optimal distribution layer of the element matching network to perform feature cross multiplication (which may also be referred to as inner-product calculation) on the first fused features of the elements in the first image and the second fused features of the elements in the second image, so as to obtain a matching matrix between the elements in the first image and the elements in the second image. For example, let the first fusion feature of the i-th element of image I_A be f_A^i and the second fusion feature of the j-th element of image I_B be f_B^j. The computer device can invoke the optimal distribution layer of the element matching network to perform feature cross multiplication on f_A^i and f_B^j to obtain the score S_(i,j) = ⟨f_A^i, f_B^j⟩, and the scores S_(i,j) finally form the matching matrix between the elements in image I_A and the elements in image I_B. The dimension of the matching matrix may be determined by the number of elements of the first image and the number of elements of the second image. For example, the first fusion features f_A of the elements in image I_A have the dimensions f_A ∈ R^(n1×d), and the second fusion features f_B of the elements in image I_B have the dimensions f_B ∈ R^(n2×d); the computer device invokes the optimal distribution layer of the element matching network to perform feature cross multiplication on the first fusion features of the elements in image I_A and the second fusion features of the elements in image I_B, obtaining a matching matrix M between the elements of I_A and the elements of I_B with the dimensions M ∈ R^((n1+1)×(n2+1)). The embodiment of the present application considers that some elements included in image I_A may have no match among the elements included in image I_B, so the dimension of the matching matrix is extended; that is, the +1 in M ∈ R^((n1+1)×(n2+1)) handles the case where an element has no matching counterpart.
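A minimal sketch of building such a matching matrix: inner-product scores between the fused features, extended by one row and one column for unmatched elements. Treating the extension score as a constant is an assumption made here for simplicity; in practice it is typically a learnable parameter.

import numpy as np

def build_matching_matrix(f_a, f_b, no_match_score=1.0):
    # f_a: (n1, d) first fusion features; f_b: (n2, d) second fusion features.
    s = f_a @ f_b.T                                # S[i, j] = <f_A^i, f_B^j>
    n1, n2 = s.shape
    m = np.full((n1 + 1, n2 + 1), no_match_score)  # extended (n1+1) x (n2+1) matrix
    m[:n1, :n2] = s
    return m

m = build_matching_matrix(np.random.randn(5, 64), np.random.randn(7, 64))
print(m.shape)                                     # (6, 8)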
After the matching matrix is obtained, an optimal transport algorithm may be used to perform optimal transport calculation on the matching matrix, so as to obtain an optimal allocation matrix P_c between the elements in the first image and the elements in the second image; an element matching result between the elements in the first image and the elements in the second image is then determined according to the optimal allocation matrix. Each row in the optimal allocation matrix may represent an element in the first image and each column an element in the second image; alternatively, each row may represent an element in the second image and each column an element in the first image. In the optimal allocation matrix P_c, the scores in each row sum to 1, and the scores in each column also sum to 1. The optimal transport algorithm may be the Sinkhorn algorithm; the score at each position of the optimal allocation matrix may be used to indicate the correlation between an element in the first image and the corresponding element in the second image.
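The Sinkhorn step can be sketched as alternating row and column normalisation of the exponentiated matching matrix. This is a deliberately minimal version: for a non-square matrix the row sums and column sums cannot all be exactly 1 simultaneously, so practical implementations use weighted marginals (especially together with the 'no match' bins), but the alternating normalisation below is the core of the algorithm.

import numpy as np

def sinkhorn(m, num_iters=100):
    # Approximate optimal transport by alternating normalisation of exp(M).
    p = np.exp(m - m.max())                       # positive scores, shifted for stability
    for _ in range(num_iters):
        p = p / p.sum(axis=1, keepdims=True)      # normalise rows
        p = p / p.sum(axis=0, keepdims=True)      # normalise columns
    return p

p_c = sinkhorn(np.random.randn(6, 8))             # e.g. a (n1+1) x (n2+1) matching matrix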
In one embodiment, the specific implementation in which the computer device determines the element matching result of the elements in the first image and the elements in the second image according to the optimal allocation matrix is as follows: the computer device determines an element in the first image and the corresponding element in the second image that satisfy the element matching condition in the optimal allocation matrix as matching elements, and determines an element in the first image and the corresponding element in the second image that do not satisfy the element matching condition as non-matching elements. That is, the element matching result may include at least one of: matching elements and non-matching elements. The higher the score in the optimal allocation matrix, the greater the correlation between the two elements; since elements should not be associated in a one-to-many manner, the element matching condition may include one or more of the following: the score corresponding to the element in the first image and the element in the second image in the optimal allocation matrix is greater than a target threshold, for example 0.2; and the score corresponding to the element in the first image and the element in the second image is the maximum of both its row and its column in the optimal allocation matrix.
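Combining the two conditions above — score greater than the target threshold, and maximal in both its row and its column — the match extraction could be sketched as follows; dropping the last row and column first reflects the 'no match' extension, and the threshold value 0.2 mirrors the example above.

import numpy as np

def extract_matches(p_c, threshold=0.2):
    # Return (i, j, score) triples that satisfy the element matching condition.
    scores = p_c[:-1, :-1]                         # drop the 'no match' row/column
    matches = []
    for i in range(scores.shape[0]):
        j = int(scores[i].argmax())                # best column for row i
        mutual = int(scores[:, j].argmax()) == i   # also the best row for column j
        if mutual and scores[i, j] > threshold:
            matches.append((i, j, float(scores[i, j])))
    return matches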
And S404, updating the image library based on the element matching result and the first image.
In one embodiment, the element matching result includes matching elements, namely a first element in the first image and a second element in the second image that match each other, and the computer device establishes an association relationship for the matching elements. For example, in fig. 5, after step S403 the obtained element matching result contains 4 pairs of matching elements, and the computer device establishes an association relationship for each of the 4 pairs (the dotted lines in fig. 5 indicate the association relationships established between the pairs of matching elements); then, according to the association relationships of the matching elements, the computer device does not update the matching elements in the image library, and updates the image library according to the elements in the first image other than the matching elements.
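A hypothetical sketch of this update step: matched elements are left untouched (only their association is recorded), while the remaining elements of the first image are added to the library. The library and element structures here are invented for illustration.

def update_image_library(library, first_image_elements, matches):
    matched_first = {i for i, _j, _score in matches}    # indices of matched elements
    for idx, element in enumerate(first_image_elements):
        if idx in matched_first:
            continue                    # matching elements are not updated
        library.append(element)         # the other elements update the library
    return library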
In the embodiment of the present application, the computer device acquires the feature information of the first image and the feature information of the second image, and then invokes the element matching network to perform feature fusion on them to obtain the first fusion features of the elements included in the first image and the second fusion features of the elements included in the second image; the elements in the first image are then matched with the elements in the second image according to the first fusion features and the second fusion features to obtain an element matching result. The matching features obtained through the element matching network can better adapt to long-term change, viewing-angle change, lens distortion and similar conditions; since element matching is completed by the element matching network, no manually designed strategy is required to obtain a one-to-one matching, the ambiguity problem of element matching can be effectively solved, and the method is well suited to multi-view element matching over long time spans and from different viewing angles. The image library is then updated based on the element matching result and the first image, so that the accuracy of updating the image library can be improved.
Based on the image processing method provided above, fig. 6 is a schematic structural diagram of an image processing apparatus provided in an embodiment of the present application. As shown in fig. 6, the image processing apparatus may be applied to the computer device in the embodiment corresponding to fig. 2 or fig. 4 described above. Specifically, the image processing apparatus may be a computer program (including program code) running in a computer device; for example, the image processing apparatus is application software. The image processing apparatus can be used for executing the corresponding steps in the methods provided by the embodiments of the present application.
An obtaining unit 601, configured to obtain feature information of an element included in a first image and feature information of an element included in a second image, where the first image includes an environment image of a location acquired by a data acquisition device, and the second image is an image similar to the first image in an image library included in map data;
a processing unit 602, configured to match the element in the first image with the element in the second image according to feature information of the element included in the first image and feature information of the element included in the second image, so as to obtain an element matching result;
the processing unit 602 is further configured to update the image library based on the element matching result and the first image.
In an embodiment, the processing unit 602 is further configured to:
calling a retrieval network to acquire one or more images similar to the first image from candidate images included in the image library, wherein each candidate image in the image library correspondingly has one or more three-dimensional coordinates, and each three-dimensional coordinate corresponds to one or more two-dimensional coordinates;
and determining a second image from the one or more images according to the three-dimensional coordinates corresponding to each image and the two-dimensional coordinates corresponding to each three-dimensional coordinate.
In an embodiment, when determining the second image from the one or more images according to the three-dimensional coordinates corresponding to each image and the two-dimensional coordinates corresponding to each three-dimensional coordinate, the processing unit 602 may specifically be configured to (a sketch of the reconstruction-rate computation follows this list):
acquiring target three-dimensional coordinates of which the number of corresponding two-dimensional coordinates exceeds a number threshold from one or more three-dimensional coordinates corresponding to each image;
acquiring a first total number of the two-dimensional coordinates corresponding to each image and a second total number of the two-dimensional coordinates corresponding to the target three-dimensional coordinates;
determining a reconstruction rate of each image according to the first total number and the second total number;
and taking the image of which the corresponding reconstruction rate is greater than or equal to the target threshold value in the one or more images as a second image.
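As a sketch of the reconstruction-rate computation described in the list above: for one candidate image, count the two-dimensional observations of each of its three-dimensional points, keep the points whose count exceeds the number threshold, and divide the kept observations by the total. The concrete threshold values are assumptions; the embodiment only requires 'exceeds a number threshold' and 'greater than or equal to the target threshold'.

def reconstruction_rate(obs_per_3d_point, count_threshold=3):
    # obs_per_3d_point: number of 2-D coordinates for each 3-D point of one image.
    first_total = sum(obs_per_3d_point)                              # all 2-D coordinates
    second_total = sum(n for n in obs_per_3d_point if n > count_threshold)
    return second_total / first_total if first_total else 0.0

# An image whose four 3-D points are observed (5, 1, 4, 2) times:
rate = reconstruction_rate([5, 1, 4, 2])        # (5 + 4) / 12 = 0.75
keep_as_second_image = rate >= 0.5              # compare against the target threshold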
In an embodiment, when the processing unit 602 performs the update processing on the image library based on the element matching result and the first image, it may specifically be configured to:
and if the element matching result indicates that target elements exist in the elements included in the first image, updating the image library by using the first image, wherein the target elements are any elements in the first image, and the second image does not include the target elements.
In an embodiment, when the processing unit 602 matches the element in the first image and the element in the second image according to the feature information of the element included in the first image and the feature information of the element included in the second image to obtain an element matching result, the processing unit may specifically be configured to:
calling an element matching network to perform feature fusion on feature information of elements included in the first image and feature information of elements included in the second image to obtain first fusion features of the elements included in the first image and second fusion features of the elements included in the second image;
and matching the elements in the first image with the elements in the second image according to the first fusion feature and the second fusion feature to obtain an element matching result.
In an embodiment, when invoking an element matching network to perform feature fusion on feature information of an element included in the first image and feature information of an element included in the second image to obtain a first fusion feature of the element included in the first image and a second fusion feature of the element included in the second image, the processing unit 602 may specifically be configured to:
calling a feature fusion layer of the element matching network to perform intra-image fusion on feature information of elements included in the first image and feature information of elements included in the second image respectively to obtain internal fusion features of the elements included in the first image and internal fusion features of the elements included in the second image;
calling a feature fusion layer of the element matching network to perform inter-image fusion on feature information of elements included in the first image and feature information of elements included in the second image to obtain cross fusion features of the elements included in the first image and cross fusion features of the elements included in the second image;
determining a first fusion feature of the elements included in the first image according to the internal fusion feature and the cross fusion feature of the elements included in the first image;
and determining a second fused feature of the elements included in the second image according to the internal fused feature and the cross fused feature of the elements included in the second image.
In one embodiment, the obtaining unit 601 is further configured to obtain a training sample set, where the training sample set includes positive sample pairs, negative sample pairs, and labeling information of each sample pair, where the labeling information includes feature information of the elements in each sample of each sample pair, a positive sample pair includes samples in which elements corresponding to the same entity exist, and a negative sample pair includes samples in which no elements corresponding to the same entity exist;
the processing unit 602 is further configured to invoke an initial neural network to process the positive sample pair and the negative sample pair, so as to obtain a prediction matching element corresponding to each sample pair; and adjusting the network parameters of the initial neural network by using the labeling information of each sample pair and the corresponding prediction matching elements to obtain an element matching network.
It can be understood that the functions of each functional unit of the image processing apparatus of this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the description related to fig. 2 or fig. 4 in the foregoing method embodiment, which is not described herein again.
In this embodiment of the present application, the computer device may acquire the feature information of the elements included in the first image and the feature information of the elements included in the second image, and match the elements in the first image with the elements in the second image according to this feature information to obtain an element matching result; the image library is then updated based on the element matching result and the first image. Updating the image library included in the map data through element matching can overcome inaccurate updates caused by seasonal, weather and illumination changes, severe distortion, low frame rates and the like in images spanning long time periods, thereby improving the accuracy of updating the image library included in the map data.
Further, please refer to fig. 7, where fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device in the corresponding embodiment of fig. 2 or fig. 4 may be the computer device shown in fig. 7. As shown in fig. 7, the computer device may include: a processor 701, an input device 702, an output device 703, and a memory 704. The processor 701, the input device 702, the output device 703, and the memory 704 are connected by a bus 705. The memory 704 is used to store a computer program comprising program instructions, and the processor 701 is used to execute the program instructions stored by the memory 704.
In the embodiment of the present application, the processor 701 executes the executable program code in the memory 704 to perform the following operations:
acquiring feature information of elements included in a first image and feature information of elements included in a second image, wherein the first image includes an environment image of a position acquired by data acquisition equipment, and the second image is an image similar to the first image in an image library included in map data;
matching the elements in the first image and the elements in the second image according to the feature information of the elements in the first image and the feature information of the elements in the second image to obtain an element matching result;
and updating the image library based on the element matching result and the first image.
In one embodiment, the processor 701 is further configured to:
calling a retrieval network to obtain one or more images similar to the first image from candidate images included in the image library, wherein each candidate image in the image library correspondingly has one or more three-dimensional coordinates, and each three-dimensional coordinate corresponds to one or more two-dimensional coordinates;
and determining a second image from the one or more images according to the three-dimensional coordinates corresponding to each image and the two-dimensional coordinates corresponding to each three-dimensional coordinate.
In an embodiment, the processor 701, when determining the second image from the one or more images according to the three-dimensional coordinates corresponding to each image and the two-dimensional coordinates corresponding to each three-dimensional coordinate, may specifically be configured to:
acquiring target three-dimensional coordinates of which the number of corresponding two-dimensional coordinates exceeds a number threshold from one or more three-dimensional coordinates corresponding to each image;
acquiring a first total number of the two-dimensional coordinates corresponding to each image and a second total number of the two-dimensional coordinates corresponding to the target three-dimensional coordinates;
determining a reconstruction rate of each image according to the first total number and the second total number;
and taking the image of which the corresponding reconstruction rate is greater than or equal to a target threshold value in the one or more images as a second image.
In an embodiment, when the processor 701 performs the update processing on the image library based on the element matching result and the first image, it may specifically be configured to:
and if the element matching result indicates that target elements exist in the elements included in the first image, updating the image library by using the first image, wherein the target elements are any elements in the first image, and the second image does not include the target elements.
In an embodiment, when the processor 701 matches the element in the first image with the element in the second image according to the feature information of the element included in the first image and the feature information of the element included in the second image to obtain an element matching result, the processor may specifically be configured to:
calling an element matching network to perform feature fusion on feature information of elements included in the first image and feature information of elements included in the second image to obtain first fusion features of the elements included in the first image and second fusion features of the elements included in the second image;
and matching the elements in the first image with the elements in the second image according to the first fusion feature and the second fusion feature to obtain an element matching result.
In an embodiment, when invoking an element matching network to perform feature fusion on feature information of an element included in the first image and feature information of an element included in the second image to obtain a first fusion feature of the element included in the first image and a second fusion feature of the element included in the second image, the processor 701 may specifically be configured to:
calling a feature fusion layer of the element matching network to perform intra-image fusion on feature information of elements included in the first image and feature information of elements included in the second image respectively to obtain internal fusion features of the elements included in the first image and internal fusion features of the elements included in the second image;
calling a feature fusion layer of the element matching network to perform inter-image fusion on feature information of elements included in the first image and feature information of elements included in the second image to obtain cross fusion features of the elements included in the first image and cross fusion features of the elements included in the second image;
determining a first fusion feature of the elements included in the first image according to the internal fusion feature and the cross fusion feature of the elements included in the first image;
and determining a second fused feature of the elements included in the second image according to the internal fused feature and the cross fused feature of the elements included in the second image.
In one embodiment, the processor 701 is further configured to:
acquiring a training sample set, wherein the training sample set comprises a positive sample pair, a negative sample pair and marking information of each sample pair, the marking information comprises characteristic information of elements in each sample of each sample pair, the samples of the positive sample pair comprise elements corresponding to the same entity, and the samples of the negative sample pair comprise no elements corresponding to the same entity;
calling an initial neural network to process the positive sample pairs and the negative sample pairs to obtain a predicted matching element corresponding to each sample pair;
and adjusting the network parameters of the initial neural network by using the labeling information of each sample pair and the corresponding prediction matching elements to obtain an element matching network.
It should be understood that, in the embodiment of the present application, the processor 701 may be a Central Processing Unit (CPU), and the processor 701 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 704 may include both read-only memory and random-access memory, and provides instructions and data to the processor 701. A portion of the memory 704 may also include non-volatile random access memory.
The input device 702 may include a camera assembly or the like, and inputs a first image to the processor 701; the output device 703 may include a display or the like.
In a specific implementation, the processor 701, the input device 702, the output device 703, and the memory 704 described in this embodiment may perform the implementation described in all the embodiments, or may also perform the implementation described in the apparatus described above, and no further description is provided herein.
In an embodiment of the present application, a computer-readable storage medium is provided, and a computer program is stored in the computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a processor, the steps performed in all the embodiments described above may be performed.
Embodiments of the present application further provide a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium, and when the computer instructions are executed by a processor of a computer device, the computer instructions perform the methods in all the embodiments described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An image processing method, comprising:
acquiring feature information of elements included in a first image and feature information of elements included in a second image, wherein the first image includes an environment image of a position acquired by data acquisition equipment, and the second image is an image similar to the first image in an image library included in map data;
matching the elements in the first image and the elements in the second image according to the feature information of the elements included in the first image and the feature information of the elements included in the second image to obtain an element matching result;
and updating the image library based on the element matching result and the first image.
2. The method of claim 1, further comprising:
calling a retrieval network to acquire one or more images similar to the first image from candidate images included in the image library, wherein each candidate image in the image library correspondingly has one or more three-dimensional coordinates, and each three-dimensional coordinate corresponds to one or more two-dimensional coordinates;
and determining a second image from the one or more images according to the three-dimensional coordinates corresponding to each image and the two-dimensional coordinates corresponding to each three-dimensional coordinate.
3. The method of claim 2, wherein determining the second image from the one or more images based on the three-dimensional coordinates corresponding to each image and the two-dimensional coordinates corresponding to each three-dimensional coordinate comprises:
acquiring target three-dimensional coordinates of which the number of corresponding two-dimensional coordinates exceeds a number threshold from one or more three-dimensional coordinates corresponding to each image;
acquiring a first total number of the two-dimensional coordinates corresponding to each image and a second total number of the two-dimensional coordinates corresponding to the target three-dimensional coordinates;
determining a reconstruction rate of each image according to the first total number and the second total number;
and taking the image of which the corresponding reconstruction rate is greater than or equal to the target threshold value in the one or more images as a second image.
4. The method according to any one of claims 1 to 3, wherein the updating the image library based on the element matching result and the first image includes:
if the element matching result indicates that target elements exist in the elements included in the first image, updating the image library by using the first image, wherein the target elements are any elements in the first image, and the second image does not include the target elements.
5. The method according to claim 1, wherein the matching the element in the first image and the element in the second image according to the feature information of the element included in the first image and the feature information of the element included in the second image to obtain an element matching result comprises:
calling an element matching network to perform feature fusion on feature information of elements included in the first image and feature information of elements included in the second image to obtain first fusion features of the elements included in the first image and second fusion features of the elements included in the second image;
and matching the elements in the first image with the elements in the second image according to the first fusion characteristic and the second fusion characteristic to obtain an element matching result.
6. The method according to claim 5, wherein the invoking an element matching network to perform feature fusion on the feature information of the elements included in the first image and the feature information of the elements included in the second image to obtain a first fusion feature of the elements included in the first image and a second fusion feature of the elements included in the second image comprises:
calling a feature fusion layer of the element matching network to respectively perform intra-image fusion on feature information of elements included in the first image and feature information of elements included in the second image to obtain intra-fusion features of the elements included in the first image and intra-fusion features of the elements included in the second image;
calling a feature fusion layer of the element matching network to perform inter-image fusion on feature information of elements included in the first image and feature information of elements included in the second image to obtain cross fusion features of the elements included in the first image and cross fusion features of the elements included in the second image;
determining a first fusion feature of the elements included in the first image according to the internal fusion feature and the cross fusion feature of the elements included in the first image;
and determining a second fusion feature of the elements included in the second image according to the internal fusion feature and the cross fusion feature of the elements included in the second image.
7. The method of claim 5, further comprising:
acquiring a training sample set, wherein the training sample set comprises a positive sample pair, a negative sample pair and marking information of each sample pair, the marking information comprises characteristic information of elements in each sample of each sample pair, the samples of the positive sample pair comprise elements corresponding to the same entity, and the samples of the negative sample pair comprise no elements corresponding to the same entity;
calling an initial neural network to process the positive sample pairs and the negative sample pairs to obtain a predicted matching element corresponding to each sample pair;
and adjusting the network parameters of the initial neural network by using the labeling information of each sample pair and the corresponding prediction matching elements to obtain an element matching network.
8. An image processing apparatus characterized by comprising:
an acquisition unit, configured to acquire feature information of an element included in a first image and feature information of an element included in a second image, where the first image includes an environmental image of a location acquired by a data acquisition device, and the second image is an image similar to the first image in an image library included in map data;
the processing unit is used for matching the elements in the first image and the elements in the second image according to the feature information of the elements in the first image and the feature information of the elements in the second image to obtain an element matching result;
the processing unit is further configured to update the image library based on the element matching result and the first image.
9. A computer device, comprising:
a processor adapted to implement one or more computer programs; and
computer storage medium storing one or more computer programs adapted to be loaded by the processor and to perform the image processing method according to any of claims 1-8.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed by a processor, performs the image processing method of any one of claims 1 to 8.
CN202110963747.7A 2021-08-20 2021-08-20 Image processing method and device, computer equipment and storage medium Pending CN115712749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110963747.7A CN115712749A (en) 2021-08-20 2021-08-20 Image processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110963747.7A CN115712749A (en) 2021-08-20 2021-08-20 Image processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115712749A true CN115712749A (en) 2023-02-24

Family

ID=85230227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110963747.7A Pending CN115712749A (en) 2021-08-20 2021-08-20 Image processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115712749A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115772A (en) * 2023-10-20 2023-11-24 腾讯科技(深圳)有限公司 Image processing method, device, equipment, storage medium and program product
CN117115772B (en) * 2023-10-20 2024-01-30 腾讯科技(深圳)有限公司 Image processing method, device, equipment, storage medium and program product


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40088359

Country of ref document: HK