CN113902047B - Image element matching method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113902047B
Authority
CN
China
Prior art keywords: image, road, determining, road image, feature
Prior art date
Legal status
Active
Application number
CN202111509469.4A
Other languages
Chinese (zh)
Other versions
CN113902047A (en)
Inventor
谭川奇 (Tan Chuanqi)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111509469.4A
Publication of CN113902047A
Application granted
Publication of CN113902047B

Classifications

    • G PHYSICS · G06 COMPUTING; CALCULATING OR COUNTING · G06F ELECTRIC DIGITAL DATA PROCESSING · G06F18/00 Pattern recognition · G06F18/20 Analysing · G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS · G06 COMPUTING; CALCULATING OR COUNTING · G06F ELECTRIC DIGITAL DATA PROCESSING · G06F18/00 Pattern recognition · G06F18/20 Analysing · G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation · G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS · G06 COMPUTING; CALCULATING OR COUNTING · G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS · G06N3/00 Computing arrangements based on biological models · G06N3/02 Neural networks · G06N3/08 Learning methods

Abstract

The embodiment of the application discloses an image element matching method, device, equipment and storage medium, which are applicable to the fields of maps, navigation, automatic driving, Internet of Vehicles, intelligent transportation, artificial intelligence, cloud computing and the like. The method comprises the following steps: determining a depth map corresponding to a first road image; determining a second road image matched with the first road image, and determining a prediction region of a first image element of the first road image in the second road image based on the depth map; and determining the element type of the first image element, and if the prediction region comprises a second image element of the same element type, determining that the first image element and the second image element are the same image element. By adopting the embodiment of the application, the same image elements in different road images can be determined accurately and efficiently, and the applicability is high.

Description

Image element matching method, device, equipment and storage medium
Technical Field
The present application relates to the field of intelligent transportation, and in particular, to a method, an apparatus, a device, and a storage medium for matching image elements.
Background
In existing road application scenarios (such as road automation, high-precision mapping, and automatic driving), the same image element often needs to be identified and associated across different road images in order to support computer vision tasks such as target tracking and map generation. Conventional schemes generally rely on image feature point matching, Kalman filtering, and the like. On the one hand, the accuracy of such schemes is insufficient; on the other hand, because road images are updated over a wide range and at a high frequency, the amount of data to be collected is huge and considerable manpower and material resources are consumed, which makes these schemes difficult to apply widely.
Therefore, how to efficiently and accurately determine the same image elements in different road images becomes an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides an image element matching method, device, equipment and storage medium, which can accurately and efficiently determine the same image elements in different road images and have high applicability.
In one aspect, an embodiment of the present application provides an image element matching method, where the method includes:
determining a depth map corresponding to a first road image, wherein the image depth of each pixel point of the depth map represents the distance between a corresponding image element and shooting equipment, and the first road image is any road image shot by the shooting equipment along a first travel track;
determining a second road image matched with the first road image, and determining a prediction area of a first image element in the first road image in the second road image based on the depth map;
and determining an element type of the first image element, and if the prediction region includes a second image element having the same element type, determining that the first image element and the second image element are the same image element.
In another aspect, an embodiment of the present application provides an image element matching apparatus, including:
the system comprises a determining module, a calculating module and a processing module, wherein the determining module is used for determining a depth map corresponding to a first road image, the image depth of each pixel point of the depth map represents the distance between a corresponding image element and shooting equipment, and the first road image is any road image shot by the shooting equipment along a first travel track;
a prediction module, configured to determine a second road image matching the first road image, and determine, based on the depth map, a prediction region of a first image element in the first road image in the second road image;
and a determining module, configured to determine an element type of the first image element, and if the prediction region includes a second image element having the same element type, determine that the first image element and the second image element are the same image element.
In another aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other;
the memory is used for storing computer programs;
the processor is configured to execute the image element matching method provided by the embodiment of the application when the computer program is called.
In another aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement the image element matching method provided in the embodiment of the present application.
In another aspect, the present application provides a computer program product, which includes a computer program or computer instructions, and when the computer program or the computer instructions are executed by a processor, the computer program or the computer instructions implement the image element matching method provided by the present application.
In the embodiment of the application, because the depth map corresponding to the first road image represents the distance between the image element corresponding to each pixel point in the first road image and the shooting device, the predicted position of the first image element of the first road image in the second road image can be determined accurately and efficiently from the depth map, with the distance between the image element and the shooting device fully taken into account. By further checking the element type of the second image element in the prediction region, it can be ensured that, when the prediction region contains an image element, the second image element in the prediction region and the first image element in the first road image are indeed the same image element. This further improves the accuracy of image element matching, and the applicability is high.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a schematic flowchart of an image element matching method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a road image provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a road image and a depth map provided by an embodiment of the present application;
FIG. 4 is a schematic view of a scene for determining a shooting position according to an embodiment of the present disclosure;
FIG. 5 is a schematic view of a scene for determining an interference image element according to an embodiment of the present application;
fig. 6a is a schematic view of a scene for determining a first image feature point according to an embodiment of the present application;
fig. 6b is a schematic view of another scene for determining a first image feature point according to an embodiment of the present application;
FIG. 7 is a schematic view of a scene for determining feature pairs between road images according to an embodiment of the present application;
FIG. 8 is a schematic view of a scene for determining a first image element according to an embodiment of the present application;
FIG. 9a is a schematic structural diagram of a neural network provided in an embodiment of the present application;
FIG. 9b is a schematic diagram of another structure of a neural network provided in an embodiment of the present application;
FIG. 10 is a flowchart of a method for determining a prediction region according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a depth map provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a scenario for determining a target area according to an embodiment of the present application;
FIG. 13 is a schematic illustration of planar imaging provided by embodiments of the present application;
fig. 14 is a scene schematic diagram of a mapping relationship between a pixel point and an epipolar line according to an embodiment of the present application;
FIG. 15 is a scene diagram of a prediction area provided in an embodiment of the present application;
FIG. 16a is a schematic diagram of a scene of an associated image element according to an embodiment of the present application;
FIG. 16b is a schematic diagram of another scene of an associated image element provided by an embodiment of the present application;
fig. 17 is a schematic structural diagram of an image element matching apparatus provided in an embodiment of the present application;
fig. 18 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The image element matching method provided by the embodiment of the application is applicable to fields such as maps, navigation, automatic driving, intelligent vehicle control, Internet of Vehicles, intelligent transportation and cloud computing, for example to Intelligent Traffic Systems (ITS) and Intelligent Vehicle-Infrastructure Cooperative Systems (IVICS) in the transportation field.
An Intelligent Transportation System is a comprehensive transportation system that effectively and comprehensively applies advanced technologies (information technology, computer technology, data communication technology, sensor technology, electronic control technology, automatic control theory, operations research, artificial intelligence, and the like) to transportation, service control and vehicle manufacturing, and strengthens the relations among vehicles, roads and users, thereby ensuring safety, improving efficiency, improving the environment and saving energy. Based on the image element matching method provided by the embodiment of the application, the actual running track of a vehicle can be determined and adjusted in real time, providing a strong guarantee for transportation, service control and the like.
The intelligent vehicle-road cooperative system is a development direction of the Intelligent Transportation System (ITS). The vehicle-road cooperative system adopts advanced wireless communication, new-generation Internet and other technologies to implement dynamic, real-time vehicle-vehicle and vehicle-road information interaction in all directions, and carries out active vehicle safety control and cooperative road management on the basis of full-time dynamic traffic information acquisition and fusion, thereby fully realizing effective cooperation among people, vehicles and roads, ensuring traffic safety and improving traffic efficiency, so as to form a safe, efficient and environment-friendly road traffic system. The image element matching method provided by the embodiment of the application can provide technical support for traffic safety and vehicle-road cooperation.
The image element matching method provided by the embodiment of the application can also be applied to computer vision tasks such as road image matching and tracking, or to related fields that rely on image element matching as a primary or auxiliary means, such as road data production and high-precision map generation, thereby improving the efficiency and accuracy of image element matching.
The image element matching method provided by the embodiment of the application can be executed by a terminal or a server, wherein the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and a cloud server providing cloud computing service. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, a smart television, and the like.
Referring to fig. 1, fig. 1 is a schematic flowchart of an image element matching method provided in an embodiment of the present application. As shown in fig. 1, an image element matching method provided in an embodiment of the present application may include the following steps:
and step S11, determining a depth map corresponding to the first road image.
In some possible embodiments, the first road image is any one of road images captured by the capturing device along the first travel track, where the capturing device may be a mobile terminal, a vehicle data recorder, or other device with image capturing capability, which is not limited herein.
As an example, the first road image may be any one of road images captured by a vehicle recorder during the vehicle traveling along the first travel track.
As shown in fig. 2, fig. 2 is a schematic diagram of a road image provided in an embodiment of the present application. Fig. 2 shows a road image taken by the tachograph during the travel of the vehicle along the first travel track, and can therefore be determined as the first road image. And the road image comprises image elements such as lane lines, traffic signs, shooting watermarks and other vehicles.
As an example, the first road image may be any one of road images photographed by the mobile terminal while the pedestrian walks along the first travel trajectory.
The shooting equipment can shoot road images according to a preset time interval or a preset distance interval, and the preset time interval or the preset distance interval can be determined based on actual application scene requirements and is not limited herein.
Optionally, when the first road image is selected, the positioning information of the photographing apparatus when photographing each road image along the first travel track may be determined, and then the first positioning point and the travel direction of the photographing apparatus when photographing each road image along the first travel track may be determined based on the positioning information.
Further, for any road image taken by the photographing apparatus along the first travel track, the straight-line distance between the first positioning point at the time of shooting and the first travel track can be determined, together with the deviation of the travel direction at the time of shooting from the direction of the first travel track. If the straight-line distance between the first positioning point and the first travel track is greater than or equal to a certain threshold (for convenience of description, hereinafter referred to as a fifth threshold), and/or the deviation of the travel direction from the direction of the first travel track is greater than or equal to a certain threshold (for convenience of description, hereinafter referred to as a sixth threshold), the road image was probably not taken during normal travel; such road images can therefore be removed, and the first road image can be selected from the remaining road images.
Here, the straight-line distance between the first positioning point and the first travel track is the projection distance from the first positioning point to the first travel track. The deviation of the travel direction from the direction of the first travel track is the deviation between the travel direction of the photographing apparatus when the road image was taken and the travel direction at the corresponding travel point on the first travel track, where the travel point is the projection point of the first positioning point onto the first travel track at that moment.
Optionally, when the first road image is selected, the selection may be performed based on actual selection requirements, for example, determining a road image located at a preset position in the road image shot by the shooting device along the first travel track as the first road image, or determining a road image including preset image elements (such as preset buildings, preset traffic signs, and the like) as the first road image, or determining a road image shot at a preset time as the first road image, and the like, and the determination may be specifically based on actual application scene requirements, which is not limited herein.
In some possible embodiments, the image depth of each pixel point of the depth map corresponding to the first road image represents the distance between the corresponding image element and the shooting device. When determining the depth map corresponding to the first road image, the relative distances between the image elements corresponding to the pixel points in the first road image can be determined first.
Specifically, the first road image can be processed through the feature processing network to obtain the image features of the first road image, and then the relative distance between image elements corresponding to each pixel point in the first road image is determined based on the depth of field estimation network.
The depth-of-field estimation network is a neural network model and can be obtained by pre-training based on a training sample set, wherein the training sample set comprises a plurality of sample road images. During specific training, the image features of each sample road image may be input into the initial depth of field estimation network to obtain the predicted relative distance between the image elements corresponding to each pixel point in each sample road image, and then a training loss value (hereinafter referred to as a first training loss value for convenience of description) is determined based on the actual relative distance and the predicted relative distance between the image elements corresponding to each pixel point in each sample road image. And performing iterative training on the initial depth of field estimation network based on the training sample set and the first training loss value until the first training loss value meets the training ending condition, and determining the network at the training ending time as the final depth of field estimation network.
The training sample set can be constructed based on the existing road image and also can be constructed based on a KITTI data set. The KITTI data set comprises real road image data acquired in urban areas, villages, expressways and other scenes, and a training sample set for training the depth-of-field estimation network can be constructed based on the KITTI data set.
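By way of illustration only, a minimal training-loop sketch for such a depth-of-field estimation network is given below in Python (PyTorch). The module definitions, the dataset format (road image plus per-pixel relative-distance labels), and the L1 loss are assumptions made for the sketch, not details prescribed by the embodiments above.

    # Illustrative sketch (assumed names and architecture): training a depth-of-field
    # estimation head on image features, supervised by per-pixel relative distances.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader

    class FeatureNet(nn.Module):          # hypothetical feature processing network
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
        def forward(self, x):
            return self.backbone(x)

    class DepthHead(nn.Module):           # hypothetical depth-of-field estimation network
        def __init__(self):
            super().__init__()
            self.head = nn.Conv2d(64, 1, 1)
        def forward(self, feats):
            return self.head(feats)       # one relative-distance value per pixel

    def train(sample_loader: DataLoader, epochs: int = 10):
        feature_net, depth_head = FeatureNet(), DepthHead()
        params = list(feature_net.parameters()) + list(depth_head.parameters())
        optimizer = torch.optim.Adam(params, lr=1e-4)
        criterion = nn.L1Loss()           # first training loss: predicted vs. actual relative distance
        for _ in range(epochs):
            # each batch is assumed to be (sample road image, per-pixel relative distances)
            for image, relative_distance in sample_loader:
                pred = depth_head(feature_net(image))
                loss = criterion(pred, relative_distance)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return feature_net, depth_head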
Further, based on the relative distance between the image elements in the first road image and the shooting position of the shooting device when shooting the first road image, the distance between each image element in the first road image and the shooting device can be determined, so that based on the distance between each image element in the first road image and the shooting device, a depth map corresponding to the first road image can be obtained through a pinhole imaging principle.
When the distance between the image element corresponding to each pixel point in the first road image and the shooting device is determined, fixed image elements in the first road image, such as image elements with unchanged positions of buildings, signs and the like, can be determined, and then the positions of the fixed image elements are determined based on road data of a road network where the first travel track is located. Meanwhile, the shooting position of the shooting device when shooting the first road image can be determined through the positioning information of the shooting device when shooting the first road image, and further the distance between the fixed image element and the shooting device can be determined based on the position of the fixed image element and the shooting position of the shooting device when shooting the first road image.
After the distances between the fixed image elements and the shooting device are obtained, the distances between the image elements in the first road image and the shooting device can be determined based on the relative distances between the image elements in the first road image, so that the depth map corresponding to the first road image can be further obtained through the pinhole imaging principle.
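The pinhole-imaging step can be illustrated with the following sketch, which projects 3D points with known distances (expressed in the camera coordinate frame) onto the image plane and records their depths. The intrinsic parameters fx, fy, cx and cy are assumed to be known from calibration; the code is only a simplified illustration of the principle, not the procedure of the embodiments.

    import numpy as np

    def build_depth_map(points_cam, height, width, fx, fy, cx, cy):
        """Pinhole projection of 3D points (camera coordinates, metres) into a depth map.

        points_cam: iterable of (X, Y, Z); Z is the depth stored at the projected pixel.
        fx, fy, cx, cy: intrinsic parameters, assumed known from calibration.
        """
        depth = np.zeros((height, width), dtype=np.float32)
        for X, Y, Z in points_cam:            # e.g. fixed image elements with known distance
            if Z <= 0:                        # behind the camera, cannot be imaged
                continue
            u = int(round(fx * X / Z + cx))
            v = int(round(fy * Y / Z + cy))
            if 0 <= u < width and 0 <= v < height:
                if depth[v, u] == 0 or Z < depth[v, u]:   # keep the nearest surface
                    depth[v, u] = Z
        return depth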
Alternatively, a road image adjacent to the first road image in the first travel track (hereinafter referred to as a third road image for convenience of description) may be determined, and the relative distances between image elements in the third road image may be determined. Meanwhile, the difference between the shooting positions at which the shooting device captured the first road image and the third road image may be determined, together with the difference in the relative distances corresponding to the same fixed image element in the two images. A correspondence between the distance of an image element, the shooting position and the relative distance can then be established from these differences, and the distance between each image element in the first road image and the shooting device can be determined based on this correspondence.
Alternatively, the distance between each image element in the road image and the shooting device may also be directly determined based on other devices, such as a vehicle-mounted laser radar, without limitation.
Referring to fig. 3, fig. 3 is a schematic diagram of a road image and a depth map provided in an embodiment of the present application. Fig. 3 shows a first road image taken by an automobile data recorder, and the depth map corresponding to this first road image can be determined as described above. The image depth of each pixel point in the depth map represents the distance between the image element corresponding to that pixel point and the automobile data recorder; that is, for the first road image in fig. 3, the farther an image element is from the automobile data recorder, the greater the image depth of its corresponding pixel point in the depth map. In the depth map, the larger the image depth of a pixel point, the larger its gray value and the darker its color; the smaller the image depth, the smaller the gray value and the lighter the color.
For example, suppose the first road image includes two traffic signs, namely sign 1 for indicating direction and sign 2 for indicating driving speed, and the distance between sign 1 and the automobile data recorder is less than the distance between sign 2 and the automobile data recorder. After the depth map of the first road image is determined, if the image depth of a pixel point within sign 1 is 5 and the image depth of a pixel point within sign 2 is 8, it is not difficult to determine that the distance between sign 2 and the automobile data recorder is greater than the distance between sign 1 and the automobile data recorder.
That is, the distance between the image element corresponding to each pixel point in the depth map and the automobile data recorder can be determined from the image depth of that pixel point, based on the conversion relation between pixel depth values and distances.
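For instance, if the depth map stores quantised depth values, a simple linear conversion such as the following recovers the metric distance; the scale and offset below are assumed calibration constants, not values given by the embodiments.

    # Assumed linear conversion between stored depth values and metric distance.
    DEPTH_SCALE = 0.1      # metres per depth level (assumed calibration constant)
    DEPTH_OFFSET = 0.0     # metres (assumed)

    def pixel_depth_to_distance(depth_value: float) -> float:
        return DEPTH_SCALE * depth_value + DEPTH_OFFSET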
In some possible embodiments, the shooting position of the shooting device when shooting the first road image may be determined from the positioning information of the shooting device at that moment. For example, a first positioning point may be determined based on the longitude and latitude in the positioning information, and this first positioning point may be taken as the shooting position of the shooting device when the first road image was shot.
Because the first positioning point determined from the positioning information often deviates from the actual position at which the shooting device shot the first road image, the shooting position can instead be determined based on the first travel track corresponding to the first road image and the first positioning point.
Specifically, after the first positioning point when the first road image is captured by the capturing device is determined, the projection position of the first positioning point corresponding to the first travel track is determined, and the projection position is determined as the capturing position when the first road image is captured by the capturing device.
Referring to fig. 4, fig. 4 is a schematic view of a scene for determining a shooting position according to an embodiment of the present application. As shown in fig. 4, point A is the first positioning point, determined based on the positioning information, of the photographing apparatus when the first road image was photographed; the projection position B of the first positioning point onto the first travel trajectory can then be determined as the shooting position of the photographing apparatus when the first road image was photographed. That is, the position on the first travel track closest to the first positioning point is determined as the shooting position of the shooting device when the first road image was shot.
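A minimal sketch of this projection step is given below, assuming the first travel track is available as a polyline of 2D points in a local metric frame; it returns the point on the track closest to the first positioning point, which can then serve as the shooting position.

    import numpy as np

    def project_onto_track(anchor, track):
        """Project a positioning point onto a travel track (polyline).

        anchor: (x, y) first positioning point in a local metric frame (assumed).
        track:  list of (x, y) vertices of the travel track.
        Returns the closest point on the track and the projection distance.
        """
        anchor = np.asarray(anchor, dtype=float)
        best_point, best_dist = None, float("inf")
        for a, b in zip(track[:-1], track[1:]):
            a, b = np.asarray(a, float), np.asarray(b, float)
            seg = b - a
            # parameter of the orthogonal projection, clamped to the segment
            t = 0.0 if not seg.any() else np.clip(np.dot(anchor - a, seg) / np.dot(seg, seg), 0.0, 1.0)
            candidate = a + t * seg
            dist = np.linalg.norm(anchor - candidate)
            if dist < best_dist:
                best_point, best_dist = candidate, dist
        return best_point, best_dist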
In some possible embodiments, the shooting position of the shooting device when shooting any one road image can also be determined based on the differential GPS. Firstly, a pseudo-range correction quantity or a position correction quantity is obtained by using a differential GPS reference platform with known accurate three-dimensional coordinates, and then the correction quantity is used for correcting the positioning information of the shooting equipment to obtain the shooting position of the shooting equipment in the shooting road image.
Step S12, determining a second road image matching the first road image, and determining a predicted region of the first image element in the first road image in the second road image based on the depth map.
In some possible embodiments, the second road image matching the first road image is a road image of road images captured by the capturing device along the second travel track, and the road image may include the same fixed image elements as those in the first road image, such as the same traffic signs or the same buildings.
The second travel track and the first travel track corresponding to the first road image may be the same travel track, that is, the first road image and the second road image may be different road images of the same travel track. Alternatively, the first travel track and the second travel track may be different travel tracks, such as travel tracks generated when the vehicle carrying the photographing apparatus travels the same road or different roads at different times; that is, the first road image and the second road image may be road images of different travel tracks.
Specifically, when determining the second road image matching the first road image, the image feature points of the first road image and the image feature points of the road images captured by the capturing device along the second travel track may be determined, and the second road image matching the first road image may be determined from the road images corresponding to the second travel track further according to the image feature points of the first road image and the image feature points of the road images corresponding to the second travel track.
When determining the image feature points of the road image, the image feature points of the road image may be determined through a neural network, for example, through a network such as D2NET, SuperPoint, or the like. It should be noted that, for the first road image and each road image corresponding to the second travel track, the same neural network is used to determine the feature points.
The image feature points in any road image are pixel points whose gray value changes significantly or whose curvature changes greatly (such as image edge points), and each image feature point may contain abundant image information. The image feature points of a road image can therefore identify, to a certain extent, the image elements in the image and reflect the essential content of the road image.
For any one of the road images corresponding to the second travel track, feature point matching can be performed on the image feature points of the road image and the image feature points of the first road image, that is, for each image feature point of the road image, an image feature point matched with the image feature point is determined from the first road image. The number of feature pairs corresponding to the road image and the first road image may be further determined, and if the number of feature pairs is greater than a certain threshold (for convenience of description, hereinafter referred to as a first threshold), it may be determined that most image elements in the road image and the first road image are the same, and thus it may be determined that the road image is a second road image matched with the first road image.
Each feature pair corresponding to the road image and the first road image comprises an image feature point in the road image and the image feature point in the first road image matched with it. The feature point matching between road images can be realized based on the FLANN algorithm, the SuperGlue algorithm, and other manners, which are not limited herein. The first threshold may be determined based on actual application scenario requirements; for example, the first threshold may be 40, which is not limited herein.
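Purely as an illustration of this counting step, the following sketch matches SIFT descriptors with OpenCV's FLANN matcher, applies Lowe's ratio test, and compares the number of surviving feature pairs with the first threshold. The choice of SIFT, the ratio 0.7 and the threshold value 40 are assumptions made for the example (40 is used only because it is cited above as an example value).

    import cv2

    FIRST_THRESHOLD = 40  # example value mentioned above

    def is_matching_road_image(img_a, img_b) -> bool:
        """Return True if img_b has enough feature pairs with img_a (grayscale images)."""
        sift = cv2.SIFT_create()
        kp_a, des_a = sift.detectAndCompute(img_a, None)
        kp_b, des_b = sift.detectAndCompute(img_b, None)
        if des_a is None or des_b is None:
            return False
        flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
        knn = flann.knnMatch(des_a, des_b, k=2)
        good = []
        for pair in knn:
            # Lowe's ratio test keeps only distinctive matches
            if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
                good.append(pair[0])
        return len(good) > FIRST_THRESHOLD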
Optionally, since the road image often includes non-fixed image elements, such as pedestrians, vehicles coming and going, and even image watermarks, when feature point matching is performed on image feature points of any one road image corresponding to the second travel track and the first road image, the feature points corresponding to the non-fixed image elements may affect a matching result, so that the determined second road image and the determined first road image may not include the same fixed image elements.
Based on this, after the image feature points of the first road image and of each road image corresponding to the second travel track are determined, the interference image elements in the first road image and in each road image corresponding to the second travel track may be further determined. The interference image elements are the non-fixed image elements in the road image, such as vehicle rearview mirrors, pedestrians, etc., and may be determined based on actual application scene requirements, which are not limited herein.
Referring to fig. 5, fig. 5 is a schematic view of a scene for determining an interference image element according to an embodiment of the present application. If the interference image elements are vehicles, pedestrians, watermarks, animals, and the like, after the interference image elements are identified for the road image shown in fig. 2, it is determined that the vehicles in fig. 2 (the vehicle in which the drive recorder is located and the preceding vehicle) and the watermarks are the interference image elements.
The interference image elements in a road image can be determined based on an interference recognition network, which is likewise a neural network model. The road image can be processed through the feature processing network to obtain the image features of the road image, and the interference image elements in the road image can then be determined based on the interference recognition network.
The interference recognition network can be obtained by training on a plurality of sample road images. During training, the image features of each sample road image can be input into the initial interference recognition network to obtain the predicted interference image elements of each sample road image. Each image element in each sample road image is marked with a sample label, and each sample label indicates whether the corresponding image element is an interference image element or another, non-interference image element. A training loss value (for convenience of description, referred to as a second training loss value hereinafter) is then determined based on the interference image elements predicted by the initial interference recognition network and the sample labels, and the initial interference recognition network is iteratively trained based on the second training loss value and the sample road images until the second training loss value meets the training end condition; the network at the end of training is determined as the final interference recognition network.
The training sample set can be constructed based on the existing road image, and can also be constructed based on the KITTI data set, which is not limited herein.
Further, image feature points, which are located outside the area where the interference image element is located, in the first road image are determined as first image feature points, and image feature points, which are located outside the area where the interference image element is located, in any one of the road images corresponding to the second travel track are determined as second image feature points.
When determining the first image feature points or the second image feature points, the interference image elements in the road image can be determined first, and the image feature points outside the areas where the interference image elements are located can then be determined and used as the first image feature points or the second image feature points.
Taking the first road image as an example, referring to fig. 6a, fig. 6a is a schematic view of a scene for determining the first image feature point according to the embodiment of the present application. For the first road image shown in fig. 2, the interference image elements therein may be determined and removed, so as to obtain the first road image with the interference image elements removed. And further determining the image characteristic points in the first road image after the interference image elements are removed, and obtaining the first image characteristic points in the first road image.
When the first image feature point or the second image feature point is determined, all image feature points and interference image elements in the road image can be determined, and then the image feature points in the area where the interference image elements are located are removed, so that the first image feature point or the second image feature point is obtained.
Taking the first road image as an example, referring to fig. 6b, fig. 6b is another scene schematic diagram for determining the first image feature point provided in the embodiment of the present application. For the first road image shown in fig. 2, all image feature points of the first road image may be determined in advance, and the interference image elements (vehicle and watermark) in the first road image may be determined. Based on the image feature points, the image feature points of the first road image, which are located outside the area where the vehicle and the watermark are located, can be determined as the first image feature points of the first road image.
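A sketch of this second approach (detect all image feature points first, then discard those inside the areas where interference image elements are located) might look as follows, assuming the interference image elements are available as axis-aligned bounding boxes:

    def filter_feature_points(keypoints, interference_boxes):
        """Keep only feature points lying outside every interference-element box.

        keypoints: iterable of objects with a .pt attribute (x, y), e.g. cv2.KeyPoint.
        interference_boxes: list of (x_min, y_min, x_max, y_max) boxes (assumed format).
        """
        def inside(pt, box):
            x, y = pt
            x_min, y_min, x_max, y_max = box
            return x_min <= x <= x_max and y_min <= y <= y_max

        return [kp for kp in keypoints
                if not any(inside(kp.pt, box) for box in interference_boxes)]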
Further, for each road image corresponding to the second travel track, a feature pair corresponding to the road image and the first road image may be determined, where the feature pair includes a first image feature point and a second image feature point matching the first image feature point. And if the number of the determined feature pairs is larger than a first threshold value, determining that the road image is a second road image matched with the first road image.
As shown in fig. 7, fig. 7 is a schematic view of a scene for determining a feature pair between road images according to an embodiment of the present application. After removing the interference image elements and the image feature points of the area where the interference image elements are located from the first road image shown in fig. 2, a first road image including the first image feature points in fig. 7 is obtained. The second road image in fig. 7 is a road image corresponding to the vehicle in the same travel track at different times, and after removing the image feature points of the area where the interference image elements are located, the second road image including the second image feature points in fig. 7 is obtained. It can be easily found from fig. 7 that there is no second image feature point in the second road image that matches the first image feature point corresponding to the direction indicator in the first road image, and the first image feature points corresponding to the road edge and the speed limit indicator in the first road image are respectively matched with the second image feature points corresponding to the road edge and the speed limit indicator in the second road image, so that the feature pairs corresponding to the first road image and the second road image are not difficult to obtain.
In some possible embodiments, when determining each road image captured by the capturing device along the second travel track, the initial images captured by the capturing device along the second travel track may be determined, and the road image captured by the capturing device along the second travel track may be further screened from each initial image based on the positioning information of the capturing device when capturing each initial image.
Specifically, the positioning information of each initial image captured by the capturing device along the second travel track can be determined, and then the second positioning point and the travel direction of each initial image captured by the capturing device along the second travel track can be determined based on the positioning information. Further determining an initial image meeting preset conditions as a road image shot by the shooting device along the second travel track, wherein the preset conditions comprise at least one of the following conditions:
the distance between the second positioning point of the shooting device and the second travelling track is smaller than a fifth threshold value;
the deviation of the traveling direction of the photographing apparatus from the direction of the second travel track is less than the sixth threshold.
That is, for any initial image captured by the capturing device along the second travel track, if the straight-line distance between the second positioning point and the second travel track when the capturing device captures the initial image is less than the fifth threshold, and/or the deviation of the travel direction of the capturing device when capturing the initial image compared with the direction of the second travel track is less than the sixth threshold, the initial image may be determined as a road image captured by the capturing device along the second travel track.
If the straight-line distance between the second positioning point and the second travel track when the shooting device shot the initial image is not less than the fifth threshold, and/or the deviation of the travel direction of the shooting device when shooting the initial image from the direction of the second travel track is not less than the sixth threshold, the initial image was probably not shot during normal travel, for example a road image shot while the automobile data recorder was being adjusted, or a self-timer image taken by a pedestrian during travel.
The fifth threshold and the sixth threshold may be specifically determined based on requirements of an actual application scenario, and are not limited herein.
Here, the straight-line distance between the second positioning point and the second travel track is the projection distance from the second positioning point to the second travel track. The deviation of the travel direction when the shooting device shot any initial image from the direction of the second travel track is the deviation between that travel direction and the travel direction at the corresponding travel point on the second travel track, where the travel point is the projection point of the second positioning point onto the second travel track at the moment the initial image was shot.
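This screening can be sketched as follows, reusing the project_onto_track helper from the earlier sketch. The threshold values and the use of the nearest track vertex's heading as the travel point's heading are assumptions made for illustration.

    import math

    FIFTH_THRESHOLD = 5.0    # metres, assumed example value
    SIXTH_THRESHOLD = 30.0   # degrees, assumed example value

    def keep_initial_image(anchor, heading_deg, track, track_headings_deg):
        """Decide whether an initial image counts as a road image of the second travel track.

        anchor: (x, y) second positioning point when the image was taken.
        heading_deg: travel direction of the shooting device at that moment.
        track: polyline of the second travel track; track_headings_deg: heading at each vertex.
        """
        point, dist = project_onto_track(anchor, track)          # projection distance
        # heading of the nearest track vertex stands in for the travel point's heading
        nearest = min(range(len(track)),
                      key=lambda i: math.dist(track[i], tuple(point)))
        deviation = abs((heading_deg - track_headings_deg[nearest] + 180) % 360 - 180)
        return dist < FIFTH_THRESHOLD and deviation < SIXTH_THRESHOLD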
Alternatively, when determining each road image captured by the capturing device along the second travel track, the selection may be performed based on actual selection requirements from each initial image captured by the capturing device along the second travel track, such as determining at least one initial image located at a preset position in the initial images captured by the capturing device along the second travel track as the road image captured by the capturing device, or determining at least one initial image including preset image elements (such as preset buildings, preset traffic signs, and the like) as the road image captured by the capturing device, or determining the initial image captured within a preset time period as the road image captured by the capturing device, and the like, and the determination may be specifically based on actual application scene requirements, and is not limited herein.
In some possible embodiments, after determining the depth map corresponding to the first road image, a prediction region of the first image element in the first road image in the second road image may be determined. The first image element is any image element in the first road image, and may be a preset image element, or an image element determined based on a user selection operation, and may specifically be determined based on a requirement of an actual application scene, which is not limited herein.
The first image element in the first road image can further be determined based on a target detection network, which is also a neural network model. The road image can be processed through the feature processing network to obtain the image features of the road image, and the first image element in the road image can then be determined based on the target detection network. Referring to fig. 8, fig. 8 is a schematic view of a scene for determining a first image element according to an embodiment of the present application. If the first image element in the embodiment of the present application is a sign, the target detection network may determine the signs in the first road image shown in fig. 2 and use them as the first image elements of the first road image.
The target detection network can be obtained based on training of a plurality of sample road images. During specific training, the image characteristics of each sample road image can be input into the initial target detection network, so as to obtain predicted image elements of each sample road image. Each sample road image is marked with a sample label, and each sample label represents an actual image element in the corresponding road image. And then determining a training loss value (hereinafter referred to as a third training loss value for convenience of description) based on predicted image elements and actual image elements obtained by the initial target detection network, and performing iterative training on the initial target detection network based on the third training loss value and each sample road image until the third training loss value meets training end conditions, and determining the network at the end of training as a final target detection network.
And detecting image elements in the first road image based on the trained target detection network, and determining the first image elements from the image elements. If the image elements in each sample road image are the first image elements, the first image elements in the first road image can be directly detected based on the target detection network obtained through training.
The training sample set can be constructed based on the existing road image, and can also be constructed based on the KITTI data set, which is not limited herein.
Optionally, the target detection network and the interference identification network may be the same type detection network, the type of each pixel point in the road image may be determined by the type detection network, and then different types of image elements in the road image may be determined based on the type of each pixel point, and then the first image element and the interference image element in the road image may be determined therefrom.
Optionally, the input of the target detection network, the interference identification network, and the depth of field estimation network may be image features output by the same feature processing network, that is, the target detection network, the interference identification network, and the depth of field estimation network are respectively next-stage networks of the feature processing network. Taking the first road image as an example, referring to fig. 9a, fig. 9a is a schematic structural diagram of a neural network provided in the embodiment of the present application. As shown in fig. 9a, the image feature of the first road image can be obtained after the first road image is processed by the feature processing network, the first image element in the first road image can be obtained after the image feature is processed by the target detection network, the interference image element in the first road image can be obtained after the image feature is processed by the interference identification network, and the depth map corresponding to the first road image can be obtained after the image feature is processed by the depth estimation network.
Optionally, referring to fig. 9b, fig. 9b is another schematic structural diagram of the neural network provided in the embodiment of the present application. As shown in fig. 9b, also taking the first road image as an example, the target detection network, the interference identification network, and the depth estimation network may be independent neural network models, that is, all of them may independently perform feature processing on the first road image to obtain image features of the first road image, and further perform processing on the obtained image features to obtain first image elements, interference image elements, and a depth map of the first road image. Moreover, when the target detection network, the interference recognition network, and the depth of field estimation network are respectively independent neural network models, feature processing networks when the target detection network, the interference recognition network, and the depth of field estimation network perform feature processing on the first road image may be the same or different, and are not limited herein.
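The shared-feature arrangement of fig. 9a can be sketched as a single backbone feeding three task heads. The layer choices, channel counts and output shapes below are assumptions made for illustration and do not correspond to a specific network described above.

    import torch.nn as nn

    class RoadImageMultiHead(nn.Module):
        """One feature processing network feeding three task heads (cf. fig. 9a)."""
        def __init__(self, feat_channels: int = 64, num_element_types: int = 10):
            super().__init__()
            self.features = nn.Sequential(                 # shared feature processing network
                nn.Conv2d(3, feat_channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
            )
            self.detect_head = nn.Conv2d(feat_channels, num_element_types, 1)  # target detection
            self.interference_head = nn.Conv2d(feat_channels, 1, 1)            # interference recognition
            self.depth_head = nn.Conv2d(feat_channels, 1, 1)                   # depth-of-field estimation

        def forward(self, image):
            feats = self.features(image)
            return {
                "elements": self.detect_head(feats),
                "interference": self.interference_head(feats),
                "depth": self.depth_head(feats),
            }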
In some possible embodiments, after determining the first image element in the first road image, a specific manner of determining the prediction region of the first image element in the second road image may be as shown in fig. 10. Fig. 10 is a flowchart of a method for determining a prediction area according to an embodiment of the present application, where the method for determining a prediction area shown in fig. 10 may specifically include the following steps:
step S101, determining a target area in the depth map based on the first image element.
In some possible embodiments, the target region includes third image feature points in the first image feature points corresponding to the first image element, the number of the third image feature points is greater than the second threshold, and the difference between the image depths corresponding to any two third image feature points is smaller than the third threshold.
In order to avoid that the difference between the image depth of some pixel points (i.e., image feature points) in the depth map determined based on the depth-of-field estimation network and the actual image depth is large, the image depth corresponding to each first image feature point in the region where the first image element is located may be determined first, and the first image feature points with the same or similar image depths (the difference value is smaller than the third threshold value) are used as the third image feature points.
Each first image feature point in the area where the first image element is located has a matching second image feature point in the second road image.
Further, the region corresponding to the previously determined third image feature point is determined as the initial region, and if the number of the third image feature points in the initial region is small, the information related to the first image element is small, so that an error may exist in the prediction region of the first image element in the second road image determined based on the initial region. Based on the above, the number of the third image feature points in the initial region can be determined, if the number is smaller than the second threshold, the initial region is expanded until the number of the third image feature points is larger than the second threshold and the difference between the image depths corresponding to the third image feature points is smaller than the third threshold, the expansion is stopped, and the initial region when the expansion is stopped is determined as the target region in the depth map.
When the initial area is expanded, image feature points whose image depth differs from that of the existing third image feature points by no less than the third threshold, or that have no matching second image feature point in the second road image, are excluded and not included in the initial area. That is, a newly added third image feature point in the initial region must satisfy the following conditions:
the difference between its image depth and the image depth of each existing third image feature point in the initial region is smaller than the third threshold;
a matching second image feature point exists in the second road image;
it is an image feature point other than those corresponding to interference image elements.
The shape and the expansion manner of the initial region, and the second threshold and the third threshold may be determined based on the actual application scene requirements, and are not limited herein.
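Under the simplifying assumptions that the candidate region is an axis-aligned box grown by a fixed step and that the pairwise depth condition is approximated by comparison with the mean depth of the seed points, the expansion loop might be sketched as follows:

    def expand_target_region(seed_points, candidates, second_threshold, third_threshold,
                             step=10, max_iter=50):
        """Grow a box around the seed third image feature points until enough qualify.

        seed_points: feature points of the first image element's area already counted
                     as third image feature points.
        candidates:  remaining feature points of the first road image (not in seed_points);
                     each point is a dict {"xy": (x, y), "depth": float,
                     "has_match": bool, "is_interference": bool}  (assumed bookkeeping).
        """
        ref_depth = sum(p["depth"] for p in seed_points) / len(seed_points)
        xs = [p["xy"][0] for p in seed_points]
        ys = [p["xy"][1] for p in seed_points]
        x_min, y_min, x_max, y_max = min(xs), min(ys), max(xs), max(ys)

        region = list(seed_points)
        for _ in range(max_iter):
            if len(region) > second_threshold:
                break
            x_min, y_min, x_max, y_max = x_min - step, y_min - step, x_max + step, y_max + step
            region = list(seed_points)
            for p in candidates:
                x, y = p["xy"]
                if (x_min <= x <= x_max and y_min <= y <= y_max
                        and p["has_match"] and not p["is_interference"]
                        # approximate the pairwise depth condition by comparing
                        # with the mean depth of the seed points
                        and abs(p["depth"] - ref_depth) < third_threshold):
                    region.append(p)
        return (x_min, y_min, x_max, y_max), region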
Referring to fig. 11, fig. 11 is a schematic diagram of a depth map provided by an embodiment of the present application. Fig. 11 shows the depth map corresponding to the first road image, where the image depth of a pixel point intuitively reflects the distance between the corresponding image element and the shooting device: the greater the image depth of a pixel point, the farther the corresponding image element is from the shooting device, and the image depths of pixel points in different areas differ. For example, in fig. 11, a pixel point with an image depth of 12.8 corresponds to an image element closer to the photographing device than a pixel point with an image depth of 17.3.
Meanwhile, the image depths of pixel points in the same region often differ only slightly; for example, there is no large difference between the image depths corresponding to the first image feature points in the region where the first image element is located. As shown in fig. 11, the area where the first image element is located includes 3 first image feature points, whose image depths in the depth map are 12.8, 12.7, and 13, respectively; it can be seen that the distances between the image elements corresponding to these 3 first image feature points and the shooting device are close.
Based on this, for the initial region, most or all of the third image feature points in the initial region are the first image feature points corresponding to the first image elements, so that the difference between the image depths corresponding to any two third image feature points is smaller than a third threshold, that is, the difference between the image depths corresponding to any two third image feature points is within a certain range, and when the third threshold is smaller, the difference indicates that the difference between the image depths corresponding to any two third image feature points is smaller.
Referring to fig. 12, fig. 12 is a schematic view of a scene for determining a target area according to an embodiment of the present application. As shown in fig. 12, assume the second threshold is 9 and the initial region contains only 7 third image feature points; the initial region then needs to be expanded to include more third image feature points. When the number of third image feature points in the initial region exceeds 9, and the differences between the image depths of the third image feature points in the region are all smaller than the third threshold, expansion stops and the initial region at that moment is determined as the target region in the depth map.
Based on this implementation, the target area can include most of the first image feature points corresponding to the first image element, and the influence of depth errors of some pixel points in the depth map caused by noise and other factors can be reduced.
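As an illustration only (not part of the claimed embodiment), the region-expansion logic described above can be sketched as follows in Python. The function name, the rectangular region representation and the fixed expansion step are assumptions made for this example; the embodiment does not limit the shape or expansion manner of the region.

```python
def expand_initial_region(feature_points, init_box, second_threshold,
                          third_threshold, step=5, max_iter=100):
    """Grow a rectangular initial region until it holds more third image feature
    points than the second threshold, keeping only points whose image depths
    stay within the third threshold of one another.

    feature_points: list of (x, y, depth) tuples for candidate feature points that
                    already have a matching second image feature point in the
                    second road image and do not belong to interference elements.
    init_box:       (x_min, y_min, x_max, y_max) of the initial region.
    """
    box = list(init_box)
    kept = []
    for _ in range(max_iter):
        kept = []
        for (x, y, d) in feature_points:
            if not (box[0] <= x <= box[2] and box[1] <= y <= box[3]):
                continue
            # Exclude points whose image depth deviates from the points already
            # kept by at least the third threshold.
            if all(abs(d - kd) < third_threshold for (_, _, kd) in kept):
                kept.append((x, y, d))
        if len(kept) > second_threshold:
            return tuple(box), kept          # target region found
        # Not enough consistent feature points yet: enlarge the rectangle.
        box = [box[0] - step, box[1] - step, box[2] + step, box[3] + step]
    return tuple(box), kept
```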
Step S102, based on the target area, a prediction area of the first image element in the second road image is determined.
In some possible embodiments, after the target area in the depth map is determined, the second image feature points in the second road image that match the third image feature points in the target area may be determined. The area formed by these matched second image feature points is then determined as the prediction area, in the second road image, of the first image element in the first road image. That is, the prediction area in the second road image may contain the same image element as the first image element.
Optionally, target feature pairs corresponding to the target region may be determined, where each target feature pair includes a third image feature point in the target region and a second image feature point in the second road image that matches the third image feature point.
Further, a transfer matrix (i.e., a homography matrix) is determined based on the position information of each third image feature point and the corresponding second image feature point in each target feature pair. The transfer matrix represents the position transformation relationship between each third image feature point in the target feature pair and the corresponding second image feature point, that is, for any third image feature point in the target feature pair, the coordinates of the corresponding second image feature point in the second road image can be determined based on the coordinates of the third image feature point in the first road image and the transfer matrix.
The transfer matrix describes a linear transformation on three-dimensional homogeneous vectors and can be represented by a 3 × 3 non-singular matrix H. Assume that a target feature pair includes a third image feature point p1 = (x1, y1, 1) and a second image feature point p2 = (x2, y2, 1), and that the transfer matrix is H; then, up to a non-zero scale factor:

$$\begin{pmatrix} x_2 \\ y_2 \\ 1 \end{pmatrix} = H \begin{pmatrix} x_1 \\ y_1 \\ 1 \end{pmatrix}$$
The transfer matrix may be determined based on the coordinates of the third image feature points and the second image feature points of a plurality of target feature pairs (for example, 4 pairs of matched image feature points). The specific calculation may be performed based on cloud computing or a related function, such as the findHomography() function in the OpenCV software package, which is not limited herein.
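By way of a hedged illustration of the calculation mentioned above, the transfer matrix can be estimated with OpenCV's findHomography() from the matched target feature pairs; the coordinate values below are hypothetical placeholders.

```python
import cv2
import numpy as np

# Hypothetical matched target feature pairs: third image feature points in the
# first road image and the matching second image feature points in the second
# road image (at least 4 pairs are required to solve the 3 x 3 matrix).
pts_first = np.float32([[102, 215], [140, 210], [118, 260], [155, 255]])
pts_second = np.float32([[310, 220], [348, 214], [327, 266], [362, 259]])

# Solve for the transfer (homography) matrix H mapping pts_first to pts_second.
# With more than 4 pairs, cv2.RANSAC may be passed to tolerate a few mismatches.
H, _ = cv2.findHomography(pts_first, pts_second)
print(H)  # 3 x 3 transfer matrix
```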
When the points in a scene all lie on one plane, the positional relationship between two different imaging planes of these points can be represented by a transfer matrix. As shown in fig. 13, fig. 13 is a schematic view of planar imaging provided by an embodiment of the present application. Under different mappings, the same plane may form image 1 in plane 1 and image 2 in plane 2. The positional relationship of the same point in image 1 and image 2 can be determined based on the matching points in plane 1 and plane 2; that is, the position in image 2 of any point in image 1 can be determined based on the transfer matrix.
The target area contains most or all of the first image feature points of the first image element, and the image depths of the pixel points (or image feature points) of the first image element are identical or close. Therefore, the pixel points of the first image element can be regarded as lying in the same plane, so the predicted position of each pixel point of the first image element in the second road image can be determined based on the position of that pixel point and the transfer matrix, and the prediction region of the first image element in the second road image can then be determined based on these predicted positions.
Alternatively, target pixel points that represent the contour of the first image element can be selected from the pixel points of the first image element; the predicted position of each target pixel point in the second road image is then determined based on its position and the transfer matrix, and the prediction region of the first image element in the second road image is determined based on these predicted positions.
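As an illustrative sketch (assuming the transfer matrix H has already been obtained, for example as in the previous example), the contour pixel points of the first image element can be mapped into the second road image and their bounding box taken as the prediction region; the function name and the bounding-box choice are assumptions made for this example.

```python
import cv2
import numpy as np

def predict_region(contour_pixels, H):
    """Map contour pixel points of the first image element into the second road
    image with the transfer matrix H and take their bounding box as the
    prediction region.

    contour_pixels: N x 2 array of (x, y) pixel coordinates in the first road image.
    """
    pts = np.asarray(contour_pixels, dtype=np.float32).reshape(-1, 1, 2)
    projected = cv2.perspectiveTransform(pts, H).reshape(-1, 2)
    x_min, y_min = projected.min(axis=0)
    x_max, y_max = projected.max(axis=0)
    return (float(x_min), float(y_min), float(x_max), float(y_max))
```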
It should be noted that, if the first road image includes a plurality of first image elements, each first image element in the first road image corresponds to its own transfer matrix. That is, for each first image element, it is necessary to determine the target region of that first image element in the depth map and determine the corresponding transfer matrix from that target region, so as to determine the prediction region of that first image element in the second road image based on this transfer matrix.
In some feasible embodiments, since the image depths corresponding to the third image feature points in the target region may not be completely the same, the determined transfer matrix may not completely represent the position transformation relationship between each third image feature point and the corresponding second image feature point, so that a certain error exists between the predicted region and the real region of the first image element in the second road image.
Based on this, a basic matrix may be determined based on the feature pairs between the first road image and the second road image, the predicted positions corresponding to the pixel points of the first image element may be adjusted according to the basic matrix, and the prediction region of the first image element in the second road image may be determined based on the adjusted predicted positions. That is, if the first image element exists in the second road image, its position lies within the prediction region.
The basic matrix (i.e., the fundamental matrix) represents the mapping relationship between each first image feature point in each feature pair and its epipolar line in the second road image, and the predicted position in the second road image of each first image feature point in each feature pair lies on the epipolar line of that first image feature point in the second road image.
As shown in fig. 14, fig. 14 is a scene schematic diagram of the mapping relationship between a pixel point and an epipolar line provided in an embodiment of the present application. In fig. 14, plane P2 is the imaging plane obtained when the shooting device at position O1 photographs plane P1, and plane P3 is the imaging plane obtained when the shooting device at position O2 photographs plane P1; that is, O1 and O2 are the camera centers of the shooting device at the two shooting positions. The line O1O2 connecting the camera centers is called the baseline. e1 and e2 are the intersection points of the baseline with planes P2 and P3: e1 is the pixel point of O2 in plane P2 when the device at O1 photographs plane P1, and e2 is the pixel point of O1 in plane P3 when the device at O2 photographs plane P1.
Here, x1 is the pixel point of point A1 in plane P2 when the shooting device at position O1 photographs plane P1, and x2 is the pixel point of A1 in plane P3 when the shooting device at position O2 photographs plane P1. Based on the matrix H and x1, the position x2 of A1 in plane P3 can be determined. An epipolar line is the intersection of an epipolar plane with an imaging plane, an epipolar plane is a plane passing through the baseline, and e1, e2, O1, O2, x1, x2 and A1 all lie in the same plane.
From the above, for any pixel point in plane P2 (or P3) obtained by imaging plane P1 from the two viewing angles, there is a corresponding epipolar line in the other plane. For example, the pixel point x1 in plane P2 corresponds to the epipolar line l1 in plane P3, that is, F·x1 = l1, where F is the basic matrix and represents the mapping relationship between the pixel point x1 and the epipolar line l1 in plane P3.
For a point A2 in three-dimensional space that lies on the same line as O1 and A1, when the shooting device at position O1 photographs plane P1, A1 and A2 project to the same pixel point in plane P2 although their image depths differ. As can be seen from fig. 14, when the shooting device at position O2 photographs plane P1, A1 and A2 project to the pixel points x2 and x3 in plane P3, respectively, and both still lie on the same epipolar line.
Therefore, after the basic matrix is determined, it can be determined, based on the basic matrix, whether the predicted position in the second road image of each pixel point of the first image element lies on its corresponding epipolar line. If a predicted position lies on its epipolar line, it is not adjusted; if it lies off the epipolar line, it is adjusted onto the epipolar line. In this way the predicted positions of the pixel points of the first image element are adjusted, and the prediction region of the first image element in the second road image determined from the adjusted predicted positions has higher accuracy.
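A minimal sketch of this adjustment, assuming the basic matrix F and the pixel coordinates are available as NumPy arrays, is given below; the tolerance value and the helper name are illustrative assumptions. A predicted position that already lies (approximately) on the epipolar line l = F·x1 is kept, otherwise it is orthogonally projected onto that line.

```python
import numpy as np

def adjust_to_epipolar_line(pt_first, pt_pred, F, tol=1.0):
    """Adjust a predicted position in the second road image onto the epipolar
    line determined by the basic matrix F and the pixel point in the first image.

    pt_first: (x, y) pixel point of the first image element in the first road image.
    pt_pred:  (x, y) predicted position of that pixel point in the second road image.
    """
    x1 = np.array([pt_first[0], pt_first[1], 1.0])
    a, b, c = F @ x1                      # epipolar line a*x + b*y + c = 0
    x, y = pt_pred
    dist = abs(a * x + b * y + c) / np.hypot(a, b)
    if dist <= tol:                       # already on (or close enough to) the line
        return pt_pred
    # Orthogonal projection of the predicted position onto the epipolar line.
    t = (a * x + b * y + c) / (a * a + b * b)
    return (x - a * t, y - b * t)
```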
In fig. 14, the basic matrix can be written as

$$F = K^{-\top}\,[t]_{\times}\,R\,K^{-1}$$

where R and t respectively represent the rotation and translation of the shooting device at position O2 relative to position O1, [t]× denotes the skew-symmetric matrix formed from t, the superscript ⊤ denotes a transpose, and K denotes the internal parameters of the shooting device, which are determined by the focal length of the shooting device, the height and width of a pixel at imaging, and the like.
Based on this, after the predicted position of each pixel point of the first image element in the second road image is determined, the translation information and rotation information of the shooting device when capturing the second road image, relative to when capturing the first road image, can be determined. The basic matrix is then determined based on the translation information, the rotation information, the focal length of the shooting device, and the plurality of feature pairs.
The number of feature pairs used in determining the basic matrix is greater than a fourth threshold, where the fourth threshold is an integer greater than or equal to 7. That is, after the translation information, rotation information and focal length of the shooting device when capturing the second road image, relative to when capturing the first road image, have been determined, at least 7 pairs of matched image feature points in the first road image and the second road image are still needed to determine the basic matrix.
The feature pairs used in determining the basis matrix may be image feature points arbitrarily matched in the first road image and the second road image, or may also be feature pairs included in a target region in the depth map, and may specifically be determined based on an actual application scene, which is not limited herein.
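For illustration only, the two ways of obtaining the basic matrix suggested above can be sketched as follows: composing it from the internal parameters K and the relative rotation R and translation t, or estimating it directly from matched feature pairs with OpenCV's findFundamentalMat(); the helper names other than the OpenCV call are assumptions made for this example.

```python
import cv2
import numpy as np

def fundamental_from_pose(K, R, t):
    """Basic matrix F = K^-T [t]x R K^-1, where K holds the internal parameters of
    the shooting device and R, t are its rotation and translation between shots."""
    t = np.asarray(t, dtype=float).ravel()
    t_cross = np.array([[0.0, -t[2], t[1]],
                        [t[2], 0.0, -t[0]],
                        [-t[1], t[0], 0.0]])   # skew-symmetric matrix of t
    K_inv = np.linalg.inv(K)
    return K_inv.T @ t_cross @ R @ K_inv

def fundamental_from_matches(pts_first, pts_second):
    """Estimate F from matched feature pairs (N x 2 float32 arrays, N >= 8 here;
    OpenCV also offers a 7-point variant, in line with the fourth threshold of >= 7)."""
    F, inlier_mask = cv2.findFundamentalMat(pts_first, pts_second, cv2.FM_8POINT)
    return F
```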
Step S13, determining the element type of the first image element, and if the prediction region includes a second image element of the same element type, determining that the first image element and the second image element are the same image element.
In some possible embodiments, the element type of the first image element may be determined after determining the predicted region of the first image element in the second road image. If a second image element of the same element type is included in the prediction region in the second road image, it may be determined that the first image element and the second image element are the same image element in different road images.
The dividing manner of the element types may be determined based on the actual application scene requirements, for example, the element types of the image elements in the road image may be divided into traffic signs, buildings, street lamps, and the like, and the element types of the image elements in the road image may also be divided into road facilities, roadside buildings, billboards, and the like, which is not limited herein.
For example, as shown in fig. 15, fig. 15 is a scene schematic diagram of a prediction area provided in an embodiment of the present application. The first road image is the road image shown in fig. 2, and fig. 15 is a second road image taken by the same driving recorder at a different time during the same trip. If the first image element is the speed limit sign in fig. 2 and the prediction area of the second road image also includes a speed limit sign, it can be determined that the speed limit sign in the first road image and the speed limit sign (the second image element) in the second road image are the same speed limit sign.
Optionally, in order to further reduce errors introduced by the image depths of the image feature points and by the calculation process, and to improve fault tolerance, the determined prediction region may be expanded to obtain an expanded prediction region. The expanded prediction region contains the previously determined prediction region; the specific expansion manner is not limited herein.
Further, it may be determined whether the expanded prediction region includes a second image element of the same element type as the first image element; if so, the second image element in the second road image is determined to be the same image element as the first image element.
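A simple sketch of this final decision, under the assumption that image elements in the second road image have already been detected with their element types and bounding boxes (the dictionary layout and the margin value are assumptions made for this example), is:

```python
def elements_match(first_element_type, detections_second, pred_box, margin=20):
    """Decide whether the second road image contains the same image element.

    detections_second: list of dicts such as {"type": "speed_limit_sign",
                       "box": (x_min, y_min, x_max, y_max)} detected in the
                       second road image.
    pred_box:          prediction region of the first image element.
    margin:            expansion applied to the prediction region for fault tolerance.
    """
    x_min, y_min, x_max, y_max = pred_box
    expanded = (x_min - margin, y_min - margin, x_max + margin, y_max + margin)
    for det in detections_second:
        if det["type"] != first_element_type:
            continue
        cx = (det["box"][0] + det["box"][2]) / 2
        cy = (det["box"][1] + det["box"][3]) / 2
        if expanded[0] <= cx <= expanded[2] and expanded[1] <= cy <= expanded[3]:
            return True    # second image element of the same type inside the region
    return False
```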
Based on the method provided by the embodiment of the present application, the same image elements in different road images of the same travel track can be associated. Fig. 16a shows two road images captured by a driving recorder 5 seconds apart while a vehicle is travelling; the same image element in the two road images can be determined and associated based on the method provided by the embodiment of the present application.
Optionally, based on the method provided by the embodiment of the present application, the same image element in road images of different travel tracks may also be associated. Fig. 16b shows two road images taken when travel tracks on different dates pass through the same position; the same image element in the two road images can be determined and associated based on the method provided by the embodiment of the present application.
The neural network training, the calculation of the transfer matrix, the calculation of the basis matrix and other data processing processes related to the embodiment of the application can be performed based on computer technology, cloud computing and other modes. The cloud Computing is a product of development and fusion of traditional computers and Network Technologies, such as Grid Computing (Grid Computing), Distributed Computing (Distributed Computing), Parallel Computing (Parallel Computing), Utility Computing (Utility Computing), Network Storage (Network Storage Technologies), Virtualization (Virtualization), Load balancing (Load Balance), and the like, and the efficiency of data processing and Computing in the embodiment of the application can be improved based on the cloud Computing.
In the embodiment of the application, the sample road image used for training the neural network, the image element matching result, and each road image corresponding to each travel track may be stored in a Database (Database), a cloud storage (cloud storage), or a block chain (Blockchain), and may be specifically determined based on the actual application scene requirements, which is not limited herein.
The database can be regarded as an electronic file cabinet, that is, a place for storing electronic files, and can be used for storing the road images in the present application. The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A blockchain is essentially a decentralized database, a chain of data blocks associated using cryptography. In the present application, each data block in the blockchain may store the above information and the road images. Cloud storage is a new concept extended and developed from the cloud computing concept, and refers to gathering a large number of storage devices (also called storage nodes) of various types in a network, through application software or application interfaces, to work cooperatively via cluster application, grid technology, distributed storage file systems and the like, so as to store the road images together.
In the embodiment of the present application, because the depth map corresponding to the first road image can represent the distance between each pixel point in the first road image and the shooting device, the predicted position in the second road image of the first image element in the first road image can be determined accurately and efficiently through the transfer matrix, with the distance between the image element and the shooting device fully taken into account via the depth map. Furthermore, by judging the element type of the second image element in the prediction region, it can be further ensured, when the prediction region contains an image element, that the second image element in the prediction region and the first image element in the first road image are the same image element, which further improves the accuracy of image element matching; the applicability is high.
Referring to fig. 17, fig. 17 is a schematic structural diagram of an image element matching apparatus provided in an embodiment of the present application. The image element matching device provided by the embodiment of the application comprises:
a determining module 171, configured to determine a depth map corresponding to a first road image, where an image depth of each pixel of the depth map indicates a distance between a corresponding image element and a capturing device, and the first road image is any road image captured by the capturing device along a first travel track;
a prediction module 172, configured to determine a second road image matching the first road image, and determine a prediction region of a first image element in the first road image in the second road image based on the depth map;
the judging module 173 is configured to determine an element type of the first image element, and if the prediction region includes a second image element having the same element type, determine that the first image element and the second image element are the same image element.
In some possible embodiments, the determining module 171 is configured to:
determining the relative distance between image elements corresponding to each pixel point in the first road image;
and determining a depth map corresponding to the first road image based on each of the relative distances and an imaging position of the imaging device at the time of imaging the first road image.
In some possible embodiments, the prediction module 172 is configured to:
determining image characteristic points of the first road image;
determining image characteristic points of each road image shot by the shooting equipment along a second travel track;
a second road image matching the first road image is specified from among the road images corresponding to the second travel track based on the image feature points of the first road image and the image feature points of the road images corresponding to the second travel track.
In some possible embodiments, the prediction module 172 is configured to:
determining interference image elements in the first road image;
for each of the road images corresponding to the second travel locus:
determining an interference image element in the road image;
determining the first road image and feature pairs corresponding to the road image, wherein each feature pair comprises a first image feature point and a second image feature point matched with the first image feature point, any first image feature point is an image feature point outside an area where an interference image element in the first road image is located, and any second image feature point is an image feature point outside the area where the interference image element in the road image is located;
and if the number of the feature pairs is larger than a first threshold value, determining that the road image is a second road image matched with the first road image.
In some possible embodiments, the prediction module 172 is configured to:
determining a target area in the depth map based on the first image element, wherein the target area comprises third image feature points in first image feature points corresponding to the first image element, the number of the third image feature points is greater than a second threshold, and the difference value of image depths corresponding to any two third image feature points is smaller than a third threshold;
based on the target area, a prediction area of the first image element in the second road image is determined.
In some possible embodiments, the prediction module 172 is configured to:
determining each target feature pair corresponding to the target area, wherein each target feature pair comprises one third image feature point and a second image feature point matched with the third image feature point;
determining a transfer matrix based on each target feature pair, wherein the transfer matrix represents the position conversion relationship between each third image feature point and the corresponding second image feature point in each target feature pair;
determining the predicted position of each pixel point of the first image element in the second road image based on the transfer matrix;
based on each of the predicted positions, a predicted region of the first image element in the second road image is determined.
In some possible embodiments, the prediction module 172 is configured to:
determining a basis matrix based on each feature pair, wherein the basis matrix represents the mapping relationship between each first image feature point in each feature pair and the epipolar line corresponding to the second road image, and the predicted position of each first image feature point in each feature pair in the second road image is positioned above the epipolar line corresponding to the first image feature point in the second road image;
adjusting, based on the basis matrix, the predicted positions corresponding to the pixel points of the first image element, and determining, based on each adjusted predicted position, a prediction region of the first image element in the second road image.
In some possible embodiments, the prediction module 172 is configured to:
determining translation information and rotation information when the second road image is captured by the capturing device compared with the first road image;
and determining a basis matrix based on the translation information, the rotation information, the focal length of the shooting device and each of the feature pairs, wherein the number of the feature pairs in determining the basis matrix is greater than a fourth threshold value.
In some possible embodiments, the determining module 171 is further configured to:
determining a first positioning point when the photographing apparatus photographs the first road image;
and determining the projection position of the first positioning point corresponding to the first travel track, and determining the projection position as the shooting position of the shooting equipment when shooting the first road image.
In some possible embodiments, the prediction module 172 is configured to:
determining each initial image shot by the shooting equipment along the second travel track;
determining positioning information of the shooting device when shooting each initial image, and determining a second positioning point and a traveling direction of the shooting device when shooting each initial image based on the positioning information;
determining the initial image meeting preset conditions as a road image shot by the shooting device along a second travel track, wherein the preset conditions comprise at least one of the following conditions:
the distance between the second positioning point of the shooting equipment and the second travel track is smaller than a fifth threshold value;
the direction deviation of the moving direction of the shooting device compared with the second moving track is smaller than a sixth threshold value.
In a specific implementation, the image element matching apparatus may execute, through each built-in functional module thereof, the implementation manners provided in each step in fig. 1 and/or fig. 10, which may be referred to specifically for the implementation manners provided in each step, and are not described herein again.
Referring to fig. 18, fig. 18 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in fig. 18, the electronic device 1800 in this embodiment may include: a processor 1801, a network interface 1804 and a memory 1805; the electronic device 1800 further includes: a user interface 1803 and at least one communication bus 1802. The communication bus 1802 is used to enable connection and communication between these components. The user interface 1803 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1803 may also include a standard wired interface and a standard wireless interface. The network interface 1804 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1805 may be a high-speed RAM memory or a non-volatile memory (NVM), such as at least one disk memory. The memory 1805 may optionally also be at least one storage device located remotely from the processor 1801. As shown in fig. 18, the memory 1805, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the electronic device 1800 shown in fig. 18, the network interface 1804 may provide network communication functions, the user interface 1803 mainly serves as an interface for the user to provide input, and the processor 1801 may be configured to invoke the device control application stored in the memory 1805 to implement:
determining a depth map corresponding to a first road image, wherein the image depth of each pixel point of the depth map represents the distance between a corresponding image element and shooting equipment, and the first road image is any road image shot by the shooting equipment along a first travel track;
determining a second road image matched with the first road image, and determining a prediction area of a first image element in the first road image in the second road image based on the depth map;
and determining an element type of the first image element, and if the prediction region includes a second image element having the same element type, determining that the first image element and the second image element are the same image element.
In some possible implementations, the processor 1801 is configured to:
determining the relative distance between image elements corresponding to each pixel point in the first road image;
and determining a depth map corresponding to the first road image based on each of the relative distances and an imaging position of the imaging device at the time of imaging the first road image.
In some possible implementations, the processor 1801 is configured to:
determining image characteristic points of the first road image;
determining image characteristic points of each road image shot by the shooting equipment along a second travel track;
a second road image matching the first road image is specified from among the road images corresponding to the second travel track based on the image feature points of the first road image and the image feature points of the road images corresponding to the second travel track.
In some possible implementations, the processor 1801 is configured to:
determining interference image elements in the first road image;
for each of the road images corresponding to the second travel locus:
determining an interference image element in the road image;
determining the first road image and feature pairs corresponding to the road image, wherein each feature pair comprises a first image feature point and a second image feature point matched with the first image feature point, any first image feature point is an image feature point outside an area where an interference image element in the first road image is located, and any second image feature point is an image feature point outside the area where the interference image element in the road image is located;
and if the number of the feature pairs is larger than a first threshold value, determining that the road image is a second road image matched with the first road image.
In some possible implementations, the processor 1801 is configured to:
determining a target area in the depth map based on the first image element, wherein the target area comprises third image feature points in first image feature points corresponding to the first image element, the number of the third image feature points is greater than a second threshold, and the difference value of image depths corresponding to any two third image feature points is smaller than a third threshold;
based on the target area, a prediction area of the first image element in the second road image is determined.
In some possible implementations, the processor 1801 is configured to:
determining each target feature pair corresponding to the target area, wherein each target feature pair comprises one third image feature point and a second image feature point matched with the third image feature point;
determining a transfer matrix based on each target feature pair, wherein the transfer matrix represents the position conversion relationship between each third image feature point and the corresponding second image feature point in each target feature pair;
determining the predicted position of each pixel point of the first image element in the second road image based on the transfer matrix;
based on each of the predicted positions, a predicted region of the first image element in the second road image is determined.
In some possible implementations, the processor 1801 is configured to:
determining a basis matrix based on each feature pair, wherein the basis matrix represents the mapping relationship between each first image feature point in each feature pair and the epipolar line corresponding to the second road image, and the predicted position of each first image feature point in each feature pair in the second road image is positioned above the epipolar line corresponding to the first image feature point in the second road image;
adjusting, based on the basis matrix, the predicted positions corresponding to the pixel points of the first image element, and determining, based on each adjusted predicted position, a prediction region of the first image element in the second road image.
In some possible implementations, the processor 1801 is configured to:
determining translation information and rotation information when the second road image is captured by the capturing device compared with the first road image;
and determining a basis matrix based on the translation information, the rotation information, the focal length of the shooting device and each of the feature pairs, wherein the number of the feature pairs in determining the basis matrix is greater than a fourth threshold value.
In some possible implementations, the processor 1801 is further configured to:
determining a first positioning point when the photographing apparatus photographs the first road image;
and determining the projection position of the first positioning point corresponding to the first travel track, and determining the projection position as the shooting position of the shooting equipment when shooting the first road image.
In some possible implementations, the processor 1801 is further configured to:
determining each initial image shot by the shooting equipment along the second travel track;
determining positioning information of the shooting device when shooting each initial image, and determining a second positioning point and a traveling direction of the shooting device when shooting each initial image based on the positioning information;
determining the initial image meeting preset conditions as a road image shot by the shooting device along a second travel track, wherein the preset conditions comprise at least one of the following conditions:
the distance between the second positioning point of the shooting equipment and the second travel track is smaller than a fifth threshold value;
the direction deviation of the moving direction of the shooting device compared with the second moving track is smaller than a sixth threshold value.
It should be understood that, in some possible implementations, the processor 1801 may be a Central Processing Unit (CPU), and the processor may also be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general purpose processor may be a microprocessor, or the processor may be any conventional processor. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In a specific implementation, the electronic device 1800 may execute, through each built-in functional module thereof, the implementation manners provided in each step in fig. 1 and/or fig. 10, which may be referred to specifically for the implementation manners provided in each step, and are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and is executed by a processor to implement the method provided in each step in fig. 1 and/or fig. 10, which may specifically refer to implementation manners provided in each step, and details of which are not described herein again.
The computer-readable storage medium may be the image element matching apparatus provided in any of the foregoing embodiments or an internal storage unit of an electronic device, such as a hard disk or a memory of the electronic device. The computer readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Memory Card (SMC), a Secure Digital (SD) card, a flash card (flash card), and the like, which are provided on the electronic device. The computer readable storage medium may further include a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), and the like. Further, the computer readable storage medium may also include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the electronic device. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
The embodiment of the present application provides a computer program product, which includes a computer program or computer instructions; when the computer program or the computer instructions are executed by a processor, the method provided in each step in fig. 1 and/or fig. 10 of the embodiments of the present application is performed.
The terms "first", "second", and the like in the claims and in the description and drawings of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or electronic device that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or electronic device. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments. The term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not intended to limit the scope of the present application, which is defined by the appended claims.

Claims (11)

1. A method of image element matching, the method comprising:
determining a depth map corresponding to a first road image, wherein the image depth of each pixel point of the depth map represents the distance between a corresponding image element and shooting equipment, and the first road image is any road image shot by the shooting equipment along a first travel track;
determining a second road image matched with the first road image, and determining a plurality of target feature pairs, wherein each target feature pair comprises a third image feature point corresponding to a first image element in the first road image and a second image feature point matched with the third image feature point in the second road image, the number of the third image feature points corresponding to the first image element is greater than a second threshold, and the difference value of the image depths corresponding to any two third image feature points is less than a third threshold;
determining a transfer matrix based on each target feature pair, wherein the transfer matrix represents the position transformation relation between each third image feature point and the corresponding second image feature point;
determining a predicted position of each pixel point of the first image element in the second road image based on the transition matrix, and determining a predicted area of the first image element in the second road image based on each predicted position;
determining an element type of the first image element, and if the prediction region includes a second image element of the same element type, determining that the first image element and the second image element are the same image element.
2. The method of claim 1, wherein determining the depth map corresponding to the first road image comprises:
determining the relative distance between image elements corresponding to each pixel point in the first road image;
and determining a depth map corresponding to the first road image based on each relative distance and the shooting position of the shooting device when the first road image is shot.
3. The method of claim 1, wherein determining the second road image that matches the first road image comprises:
determining image feature points of the first road image;
determining image feature points of each road image shot by the shooting equipment along a second travel track;
determining a second road image matching the first road image from among the road images corresponding to the second travel locus based on the image feature points of the first road image and the image feature points of the road images corresponding to the second travel locus.
4. The method according to claim 3, wherein the determining, from the road images corresponding to the second travel track, a second road image that matches the first road image based on the image feature points of the first road image and the image feature points of the road images corresponding to the second travel track, comprises:
determining an interfering image element in the first road image;
for each of the road images corresponding to the second travel track:
determining an interference image element in the road image;
determining the first road image and feature pairs corresponding to the road image, wherein each feature pair comprises a first image feature point and a second image feature point matched with the first image feature point, any first image feature point is an image feature point outside an area where an interference image element in the first road image is located, and any second image feature point is an image feature point outside the area where the interference image element in the road image is located;
and if the number of the feature pairs is larger than a first threshold value, determining that the road image is a second road image matched with the first road image.
5. The method of claim 4, wherein determining the predicted region of the first image element in the second road image based on each of the predicted positions comprises:
determining a basic matrix based on each feature pair, wherein the basic matrix represents the mapping relation between each first image feature point in each feature pair and an epipolar line corresponding to the second road image, and the predicted position of each first image feature point in each feature pair in the second road image is positioned above the epipolar line corresponding to the first image feature point in the second road image;
and adjusting the prediction position corresponding to each pixel point of the first image element based on the basic matrix, and determining the prediction area of the first image element in the second road image based on each adjusted prediction position.
6. The method of claim 5, wherein determining a basis matrix based on each of the feature pairs comprises:
determining translation information and rotation information when the second road image is shot by the shooting device compared with the first road image;
determining a basis matrix based on the translation information, the rotation information, the focal length of the photographing apparatus, and each of the feature pairs, wherein the number of feature pairs in determining the basis matrix is greater than a fourth threshold.
7. The method of claim 2, further comprising:
determining a first positioning point when the shooting equipment shoots the first road image;
determining a projection position of the first positioning point corresponding to the first travel track, and determining the projection position as a shooting position of the shooting device when the first road image is shot.
8. The method of claim 3, further comprising:
determining each initial image shot by the shooting device along the second travel track;
determining positioning information of the shooting device when shooting each initial image, and determining a second positioning point and a traveling direction of the shooting device when shooting each initial image based on the positioning information;
determining the initial image meeting preset conditions as a road image shot by the shooting device along a second travel track, wherein the preset conditions comprise at least one of the following conditions:
the distance between a second positioning point of the shooting device and the second travelling track is smaller than a fifth threshold value;
the direction deviation of the travel direction of the photographing apparatus compared to the second travel trajectory is less than a sixth threshold.
9. An image element matching apparatus, characterized in that the apparatus comprises:
the system comprises a determining module, a calculating module and a processing module, wherein the determining module is used for determining a depth map corresponding to a first road image, the image depth of each pixel point of the depth map represents the distance between a corresponding image element and shooting equipment, and the first road image is any road image shot by the shooting equipment along a first travel track;
the prediction module is used for determining a second road image matched with the first road image and determining a plurality of target feature pairs, wherein each target feature pair comprises a third image feature point corresponding to a first image element in the first road image and a second image feature point matched with the third image feature point in the second road image, the number of the third image feature points corresponding to the first image element is greater than a second threshold, and the difference value of image depths corresponding to any two third image feature points is less than a third threshold;
the prediction module is used for determining a transfer matrix based on each target feature pair, wherein the transfer matrix represents the position transformation relation between each third image feature point and the corresponding second image feature point;
the prediction module is configured to determine, based on the transition matrix, prediction positions of pixels of the first image element in the second road image, and determine, based on the prediction positions, a prediction region of the first image element in the second road image;
and the judging module is used for determining the element type of the first image element, and if the prediction region comprises a second image element with the same element type, determining that the first image element and the second image element are the same image element.
10. An electronic device comprising a processor and a memory, the processor and the memory being interconnected;
the memory is used for storing a computer program;
the processor is configured to perform the method of any of claims 1 to 8 when the computer program is invoked.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method of any one of claims 1 to 10.
CN202111509469.4A 2021-12-10 2021-12-10 Image element matching method, device, equipment and storage medium Active CN113902047B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111509469.4A CN113902047B (en) 2021-12-10 2021-12-10 Image element matching method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111509469.4A CN113902047B (en) 2021-12-10 2021-12-10 Image element matching method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113902047A CN113902047A (en) 2022-01-07
CN113902047B true CN113902047B (en) 2022-03-04

Family

ID=79026151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111509469.4A Active CN113902047B (en) 2021-12-10 2021-12-10 Image element matching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113902047B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665157B (en) * 2023-08-01 2023-11-03 腾讯科技(深圳)有限公司 Road image processing method, device, computer equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013026205A1 (en) * 2011-08-25 2013-02-28 Harman International (Shanghai) Management Co., Ltd. System and method for detecting and recognizing rectangular traffic signs
CN103324977B (en) * 2012-03-21 2016-03-30 日电(中国)有限公司 A kind of destination number detection method and equipment
JP6783949B2 (en) * 2016-12-16 2020-11-11 日立オートモティブシステムズ株式会社 Road detection using traffic sign information
CN110160502B (en) * 2018-10-12 2022-04-01 腾讯科技(深圳)有限公司 Map element extraction method, device and server
CN113011212B (en) * 2019-12-19 2024-04-05 北京四维图新科技股份有限公司 Image recognition method and device and vehicle
CN111652940A (en) * 2020-04-30 2020-09-11 平安国际智慧城市科技股份有限公司 Target abnormity identification method and device, electronic equipment and storage medium
US11290705B2 (en) * 2020-05-11 2022-03-29 Mapbox, Inc. Rendering augmented reality with occlusion
CN112184792B (en) * 2020-08-28 2023-05-26 辽宁石油化工大学 Road gradient calculation method and device based on vision
CN112417967B (en) * 2020-10-22 2021-12-14 腾讯科技(深圳)有限公司 Obstacle detection method, obstacle detection device, computer device, and storage medium
CN112396831B (en) * 2020-10-23 2021-09-28 腾讯科技(深圳)有限公司 Three-dimensional information generation method and device for traffic identification
CN112907557A (en) * 2021-03-15 2021-06-04 腾讯科技(深圳)有限公司 Road detection method, road detection device, computing equipment and storage medium

Also Published As

Publication number Publication date
CN113902047A (en) 2022-01-07

Similar Documents

Publication Publication Date Title
US11094112B2 (en) Intelligent capturing of a dynamic physical environment
Fernandez Llorca et al. Vision‐based vehicle speed estimation: A survey
US10867189B2 (en) Systems and methods for lane-marker detection
CN110785719A (en) Method and system for instant object tagging via cross temporal verification in autonomous vehicles
CN110869559A (en) Method and system for integrated global and distributed learning in autonomous vehicles
CN109520500B (en) Accurate positioning and street view library acquisition method based on terminal shooting image matching
CN110753953A (en) Method and system for object-centric stereo vision in autonomous vehicles via cross-modality verification
CN111999752A (en) Method, apparatus and computer storage medium for determining road information data
WO2020043081A1 (en) Positioning technique
CN111860227A (en) Method, apparatus, and computer storage medium for training trajectory planning model
CN111783502A (en) Visual information fusion processing method and device based on vehicle-road cooperation and storage medium
WO2011160672A1 (en) Method for obtaining drivable road area
CN112997211A (en) Data distribution system, sensor device, and server
CN111381585B (en) Method and device for constructing occupied grid map and related equipment
Choi et al. Methods to detect road features for video-based in-vehicle navigation systems
CN113902047B (en) Image element matching method, device, equipment and storage medium
CN111353453A (en) Obstacle detection method and apparatus for vehicle
CN112765302B (en) Method and device for processing position information and computer readable medium
CN111651547B (en) Method and device for acquiring high-precision map data and readable storage medium
CN112257668A (en) Main and auxiliary road judging method and device, electronic equipment and storage medium
CN117197227A (en) Method, device, equipment and medium for calculating yaw angle of target vehicle
CN110827340B (en) Map updating method, device and storage medium
Huang et al. Deep learning–based autonomous road condition assessment leveraging inexpensive rgb and depth sensors and heterogeneous data fusion: Pothole detection and quantification
CN112507857A (en) Lane line updating method, device, equipment and storage medium
CN116007637B (en) Positioning device, method, in-vehicle apparatus, vehicle, and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant