CN117671619A - Traffic scene image matching method and system based on deep learning - Google Patents


Info

Publication number: CN117671619A
Application number: CN202311793364.5A
Authority: CN (China)
Prior art keywords: image, matched, deep learning, matching, subset
Legal status: Pending
Original language: Chinese (zh)
Inventors: 张永, 李凡平, 石柱国
Current assignee: ISSA Technology Co Ltd
Original assignee: ISSA Technology Co Ltd
Application filed by ISSA Technology Co Ltd

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides a traffic scene image matching method and system based on deep learning. The method comprises the following steps: acquiring an original image of a traffic scene and preprocessing it to obtain a preprocessed reference image and an image to be matched; inputting each into a deep learning model and outputting a first reference feature and a first feature to be matched; calculating the similarity of the two features, and if the similarity is greater than a set threshold, the images are successfully matched; otherwise, dividing the reference image and the image to be matched to obtain a reference image subset and an image subset to be matched; inputting the sub-images in each image subset into the deep learning model and outputting a second reference feature subset and a second feature subset to be matched; and calculating the similarity of the sub-features in the second reference feature subset and the second feature subset to be matched: if at least three of the four groups of sub-features have similarity greater than the set threshold, the images are successfully matched; otherwise, the match fails. By considering the foreground ratio in both the model training and matching stages, the accuracy of image matching is improved.

Description

Traffic scene image matching method and system based on deep learning
Technical Field
The invention relates to the technical field of image processing, in particular to a traffic scene image matching method and system based on deep learning.
Background
In the field of image processing, image matching refers to finding images identical or similar to a query image. It is widely applied in fields such as target tracking, face recognition, autonomous driving and quality inspection, and brings great convenience to users' life and work. However, the inventors found that current image matching methods do not consider the influence of different image foreground ratios on matching accuracy.
The patent application No. 202111058412.7, entitled "Open-set image scene matching method based on deep learning", uses similarity to avoid the drawback that a deep learning method cannot recognize untrained scenes. However, when its matching model is trained, training datasets covering different image scenes are fed directly into the deep learning model, without considering the influence that different foreground ratios of images within the same scene have on the final matching result.
The patent application No. 201711129595.0, entitled "SURF-based image matching method", extends the detection range of feature points by introducing corner detection. In its matching stage, corner detection is performed with a traditional method on the whole reference image and the whole image to be matched of a traffic scene; computing the similarity only over the whole image fails to capture detailed information, and the features of extracted key points such as corners are similar, so matching precision cannot be guaranteed.
Image matching in traffic scenes is mainly used to identify people, non-motor vehicles and motor vehicles, and demands high matching accuracy. Therefore, how to improve the accuracy of traffic scene image matching is an urgent problem to be solved.
Disclosure of Invention
To solve the above problems, the invention provides a traffic scene image matching method and system based on deep learning. In the model training stage, the method fully considers the foreground features of images with a large foreground ratio, so that the foreground ratio serves as prior knowledge and the matching performance and robustness of the model are improved; in the image matching stage, when the confidence of the whole-image similarity result is low, the original images are split to eliminate the influence of the foreground on matching as much as possible, improving image matching precision.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
in a first aspect, the present invention provides a traffic scene image matching method based on deep learning, including:
acquiring an original reference image and an original image to be matched of a traffic scene, and preprocessing them to obtain a preprocessed reference image and a preprocessed image to be matched;
respectively inputting the preprocessed reference image and the preprocessed image to be matched into a trained deep learning model, and outputting a first reference feature and a first feature to be matched; calculating the similarity of the first reference feature and the first feature to be matched, and if the similarity is greater than a set threshold, the images are successfully matched;
otherwise, dividing the reference image and the image to be matched to obtain a reference image subset and an image subset to be matched, wherein each image subset comprises four sub-images;
respectively inputting the sub-images in each image subset into the trained deep learning model, and outputting a second reference feature subset and a second feature subset to be matched, wherein each feature subset comprises four sub-features; and calculating the similarity of the corresponding sub-features in the second reference feature subset and the second feature subset to be matched, wherein if at least three of the four groups of sub-features have similarity greater than the set threshold, the images are successfully matched, and otherwise the match fails.
Preferably, the preprocessing comprises: applying random horizontal flipping, random vertical flipping and random rotation as data augmentation, and unifying the sizes of the original reference image and the original image to be matched.
Preferably, the training process of the trained deep learning model is as follows:
acquiring a traffic scene sample image set, wherein the sample image set comprises example images, positive sample images and negative sample images;
preprocessing the positive sample images and negative sample images to obtain the foreground area ratio information of each image;
and inputting the sample image set and the foreground area ratio information of each image into the deep learning model, calculating the triplet loss, and obtaining the trained deep learning model when the triplet loss is minimal.
Preferably, the specific process of preprocessing the positive sample images and negative sample images to obtain the foreground area ratio information of each image is as follows:
using a target detection network to obtain the coordinates of the foreground in each positive sample image and negative sample image, wherein the foreground includes people, non-motor vehicles and motor vehicles;
and calculating and summing the areas of the foreground regions in each image based on the acquired coordinates, the ratio of the total foreground area to the whole image area being the foreground area ratio information.
Preferably, the triplet loss is:
L = max(d(a,p) * e^(-x) - d(a,n) + margin, 0)
where d represents the distance between two images, a represents an example image, p represents a positive sample image, i.e., an image of the same scene as a, n represents a negative sample image, i.e., an image of a different scene from a, margin is a constant greater than 0, and x is the foreground area ratio information.
Preferably, the trained deep learning model is a Siamese (twin) neural network.
Preferably, the similarity threshold is 0.9.
In a second aspect, the present invention provides a traffic scene image matching system based on deep learning, including:
and a pretreatment module: acquiring an original reference image and an original image to be matched of a traffic scene, and preprocessing to obtain a preprocessed reference image and an preprocessed image to be matched;
and a primary calculation module: respectively inputting the preprocessed reference image and the image to be matched into a trained deep learning model, and outputting a first reference feature and a first feature to be matched; calculating the similarity of the first reference feature and the first feature to be matched, and if the similarity is larger than a set threshold, successfully matching the images;
and a segmentation module: otherwise, dividing the reference image and the image to be matched to obtain a reference image subset and an image subset to be matched; the image subsets each comprise four subgraphs;
and a secondary calculation module: respectively inputting subgraphs in each image subset into a trained deep learning model, and outputting a second reference feature subset and a second feature subset to be matched; the feature subsets each comprise four sub-features; and calculating the similarity of the sub-features in the second reference feature subset and the second feature subset to be matched, if the similarity of the sub-features in the three groups is larger than a set threshold, successfully matching the images, otherwise, unsuccessfully matching the images.
In a third aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the above deep learning based traffic scene image matching method.
In a fourth aspect, the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the above deep learning based traffic scene image matching method.
Compared with the prior art, the beneficial effects of the present disclosure are:
(1) When training the matching model, in order to enable the model to fully capture foreground information and improve its performance, the invention accounts for the contribution of pictures with a large foreground ratio to the loss function by adding foreground ratio information to the triplet loss. The foreground ratio of each picture is passed to the network as prior knowledge, and the contributions of pictures with different foreground ratios to the loss are dynamically adjusted, so that the trained deep learning model outputs more reliable results. This effectively solves the problem that cosine similarity computed on the final extracted features is inaccurate when images differ in foreground ratio, and improves the robustness of the model.
(2) In the image matching stage, when the confidence of the whole-image similarity result is low, the method performs secondary matching; splitting the original pictures helps capture detailed information. Each sub-image of the reference image subset corresponds one-to-one in position to a sub-image of the image subset to be matched; sub-image features are extracted separately and similarity is computed based on this positional correspondence. This improves the accuracy of the similarity computation, eliminates as much as possible the influence of different foreground ratios on matching, and improves image matching accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate exemplary embodiments of the disclosure and, together with the description, serve to explain the disclosure without limiting it.
Fig. 1 is a schematic flow chart of a traffic scene image matching method based on deep learning according to an embodiment of the disclosure;
fig. 2 is a flow chart of matching traffic scene images according to an embodiment of the present disclosure.
Detailed Description
The invention will be further described with reference to the drawings and examples.
In the model training stage, because the sample set contains different sample images, the foreground of each image occupies a different proportion of the whole image. When images with a small foreground ratio are used for training, the model cannot fully learn foreground features, so the trained model cannot accurately determine whether images match. Likewise, in the image matching stage, when pictures with different foreground ratios are fed into the model, computing similarity only on the whole picture ignores the influence of the foreground ratio on the similarity, making the matching result inaccurate. To solve these problems, the present disclosure proposes the following solutions:
example 1
As shown in fig. 1, this embodiment discloses a traffic scene image matching method based on deep learning; fig. 2 shows the image matching flow chart. The method comprises the following steps:
s1: and acquiring an original reference image and an original image to be matched of the traffic scene, and preprocessing to obtain a preprocessed reference image and an preprocessed image to be matched.
In a specific embodiment, the preprocessing includes: applying random horizontal flipping, random vertical flipping and random rotation as data augmentation, and unifying the sizes of the original reference image and the original image to be matched.
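The following is a minimal sketch of this preprocessing step, assuming PyTorch/torchvision; the target size (224x224) and the rotation range are illustrative assumptions, as the embodiment does not fix them:

```python
# Sketch of the preprocessing/augmentation pipeline described above.
# Assumes torchvision; the 224x224 size and 15-degree range are illustrative.
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((224, 224)),           # unify the sizes of both input images
    T.RandomHorizontalFlip(p=0.5),  # random horizontal flip
    T.RandomVerticalFlip(p=0.5),    # random vertical flip
    T.RandomRotation(degrees=15),   # random rotation (range assumed)
    T.ToTensor(),
])
```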
S2: respectively inputting the preprocessed reference image and the preprocessed image to be matched into a trained deep learning model, and outputting a first reference feature and a first feature to be matched; calculating the similarity of the first reference feature and the first feature to be matched, and if the similarity is greater than a set threshold, the images are successfully matched.
Specifically, the deep learning model is a feature extraction network. The backbone of the feature extraction network is a Siamese (twin) network built from RepVGG. The Siamese network consists of two identical sub-networks; each sub-network receives one input and learns to map it to a feature vector in a feature space. The two sub-networks have identical structure and share parameters.
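A minimal sketch of such a parameter-sharing feature extractor is shown below, assuming PyTorch and the timm implementation of RepVGG; the backbone variant (repvgg_b0) and the embedding size are illustrative assumptions:

```python
# Sketch of a Siamese feature extractor with a RepVGG backbone.
# Assumes the timm library; backbone name and embed_dim are illustrative.
import timm
import torch
import torch.nn as nn

class SiameseExtractor(nn.Module):
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # one shared backbone is equivalent to two branches with tied weights
        self.backbone = timm.create_model("repvgg_b0", pretrained=False,
                                          num_classes=0)  # pooled features
        self.head = nn.Linear(self.backbone.num_features, embed_dim)

    def forward_one(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        # both inputs pass through the same weights (parameter sharing)
        return self.forward_one(a), self.forward_one(b)
```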
Before training the deep learning model, the traffic scene sample image set is first preprocessed. The traffic scene sample image set comprises example images, positive sample images and negative sample images.
A target detection network is used to detect the foreground coordinates in each positive sample image and negative sample image; the foreground area of each image is computed from the obtained coordinates and summed, and the total foreground area divided by the whole image area is stored as the foreground area ratio information. It should be understood that this embodiment does not specifically limit the type of target detection network; those skilled in the art may select one according to actual needs. Since image matching in traffic scenes is mainly used to identify people, non-motor vehicles and motor vehicles, the foreground in the matched images consists of people, non-motor vehicles and motor vehicles.
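A minimal sketch of the foreground ratio computation follows, assuming axis-aligned boxes in (x1, y1, x2, y2) format from an unspecified detector; the embodiment does not say how overlapping boxes are handled, so the sum is simply clamped here:

```python
# Sketch of foreground area ratio computation from detection boxes.
# Box format and overlap handling are assumptions.
def foreground_ratio(boxes, img_w: int, img_h: int) -> float:
    """boxes: (x1, y1, x2, y2) for people, non-motor and motor vehicles."""
    fg_area = sum(max(0, x2 - x1) * max(0, y2 - y1) for x1, y1, x2, y2 in boxes)
    return min(fg_area / (img_w * img_h), 1.0)  # clamp to [0, 1]
```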
The sample image set and the foreground area ratio information of each image are input into the deep learning model, gradients are backpropagated using the improved triplet loss function, and the trained deep learning model is obtained when the triplet loss is minimal. The triplet loss function expressions are:
L = max(d(a,p) - d(a,n) + margin, 0)  (1)
L = max(d(a,p) * e^(-x) - d(a,n) + margin, 0)  (2)
equation (1) is a conventional triplet loss function, where d represents the distance between two images, a represents an example graph, p represents a positive sample image, i.e., an image of the same scene as a, n represents a negative sample image, i.e., an image of the different scene than a, and margin represents a constant greater than 0.
Equation (2) is the improved triplet loss function proposed by this embodiment, which takes the influence of the foreground area into account during model training. Here x is the foreground area ratio information, i.e., the ratio of the foreground to the whole picture computed during preprocessing, with a value between 0 and 1. When the foreground area ratio of p is large, the value of d(a,p) * e^(-x) is smaller, indicating a higher similarity between the two images; this effectively avoids failed matches caused by large foreground changes in traffic scenes. The foreground area ratio is computed for all images, but only the ratio of the positive sample is used when computing the loss. This is because the negative sample of one example image is the positive sample of another example image; that is, every image in the dataset is eventually used as a positive sample.
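A minimal sketch of equation (2) in PyTorch follows; the choice of Euclidean distance for d and the margin value are assumptions, since the embodiment does not fix them:

```python
# Sketch of the foreground-weighted triplet loss of equation (2).
# Euclidean distance and margin=0.3 are illustrative assumptions.
import torch

def weighted_triplet_loss(anchor, positive, negative, x, margin: float = 0.3):
    """x: foreground area ratio of the positive sample, in [0, 1]."""
    x = torch.as_tensor(x, dtype=anchor.dtype)
    d_ap = torch.norm(anchor - positive, dim=-1)  # d(a, p)
    d_an = torch.norm(anchor - negative, dim=-1)  # d(a, n)
    # a large foreground ratio shrinks d(a,p) * e^(-x), down-weighting the
    # positive distance for foreground-heavy scenes
    return torch.clamp(d_ap * torch.exp(-x) - d_an + margin, min=0.0).mean()
```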
In this embodiment, the similarity is cosine similarity and the threshold is 0.9. If the similarity is greater than 0.9, the two pictures are output as successfully matched; otherwise, the method proceeds to step S3.
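A minimal sketch of this first-stage decision, assuming feature vectors from the model sketched above (the 0.9 threshold comes from this embodiment):

```python
# Sketch of the first-stage whole-image matching decision.
import torch.nn.functional as F

def first_stage_match(feat_ref, feat_query, threshold: float = 0.9) -> bool:
    sim = F.cosine_similarity(feat_ref, feat_query, dim=-1)
    return bool(sim > threshold)  # True: matched; False: proceed to S3
```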
S3: otherwise, dividing the reference image and the image to be matched to obtain a reference image subset and an image subset to be matched; each image subset contains four sub-images.
In a specific embodiment, the reference image and the image to be matched are each divided equally into four sub-images in the same way, forming the reference image subset and the image subset to be matched. Each sub-image of the reference image subset corresponds one-to-one in position to a sub-image of the image subset to be matched.
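A minimal sketch of the uniform four-way split, assuming (C, H, W) tensors; a fixed output order preserves the positional one-to-one correspondence:

```python
# Sketch of splitting an image into four equal sub-images in a fixed order.
def split_into_quadrants(img):
    _, h, w = img.shape
    return [
        img[:, : h // 2, : w // 2],  # top-left
        img[:, : h // 2, w // 2:],   # top-right
        img[:, h // 2:, : w // 2],   # bottom-left
        img[:, h // 2:, w // 2:],    # bottom-right
    ]
```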
S4: respectively inputting the sub-images in each image subset into the trained deep learning model, and outputting a second reference feature subset and a second feature subset to be matched; each feature subset comprises four sub-features. The similarity of the corresponding sub-features in the second reference feature subset and the second feature subset to be matched is calculated: if at least three of the four groups of sub-features have similarity greater than the set threshold, the images are successfully matched; otherwise, the match fails.
In a specific embodiment, the reference image subset and the image subset to be matched obtained in S3 are respectively input into the deep learning model trained in S2 for secondary feature extraction, and a second reference feature subset and a second feature subset to be matched are output.
Because each sub-image of the reference image subset corresponds one-to-one in position to a sub-image of the image subset to be matched, each sub-feature of the second reference feature subset also corresponds one-to-one in position to a sub-feature of the second feature subset to be matched. The cosine similarity of the sub-features at corresponding positions in the second reference feature subset and the second feature subset to be matched is calculated; if at least three of the four groups of sub-features have similarity greater than the set threshold of 0.9, the two scene images are output as successfully matched, otherwise as unsuccessfully matched.
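A minimal sketch of this second-stage decision, assuming the helpers sketched above; the images are declared matched when at least three of the four position-aligned sub-feature pairs exceed the threshold:

```python
# Sketch of the second-stage sub-feature matching decision.
import torch.nn.functional as F

def second_stage_match(ref_feats, query_feats, threshold: float = 0.9) -> bool:
    hits = sum(
        int(F.cosine_similarity(fr, fq, dim=-1) > threshold)
        for fr, fq in zip(ref_feats, query_feats)  # aligned by position
    )
    return hits >= 3
```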
In the matching-model training stage, in order to improve the model's ability to recognize foreground features, the invention incorporates the foreground area ratio information into the triplet loss calculation, increasing the contribution of pictures with a large foreground ratio to the loss, so that the trained deep learning model outputs more reliable results; this effectively reduces the influence of the foreground on matching and improves the robustness of the image matching model. In the image matching stage, when the confidence of the whole-image similarity result is low, secondary matching is performed; splitting the original picture eliminates as much as possible the influence of different foreground ratios on matching, improving image matching accuracy.
Embodiment 2
This embodiment provides a traffic scene image matching system based on deep learning, comprising:
and a pretreatment module: acquiring an original reference image and an original image to be matched of a traffic scene, and preprocessing to obtain a preprocessed reference image and an preprocessed image to be matched;
and a primary calculation module: respectively inputting the preprocessed reference image and the image to be matched into a trained deep learning model, and outputting a first reference feature and a first feature to be matched; calculating the similarity of the first reference feature and the first feature to be matched, and if the similarity is larger than a set threshold, successfully matching the images;
and a segmentation module: otherwise, dividing the reference image and the image to be matched to obtain a reference image subset and an image subset to be matched; the image subsets each comprise four subgraphs;
and a secondary calculation module: respectively inputting subgraphs in each image subset into a trained deep learning model, and outputting a second reference feature subset and a second feature subset to be matched; the feature subsets each comprise four sub-features; and calculating the similarity of the sub-features in the second reference feature subset and the second feature subset to be matched, if the similarity of the sub-features in the three groups is larger than a set threshold, successfully matching the images, otherwise, unsuccessfully matching the images.
Embodiment 3
This embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the deep learning based traffic scene image matching method of Embodiment 1.
In the matching-model training stage, in order to improve the model's ability to recognize foreground features, the invention incorporates the foreground area ratio information into the triplet loss calculation, increasing the contribution of pictures with a large foreground ratio to the loss, so that the trained deep learning model outputs more reliable results; this effectively reduces the influence of the foreground on matching and improves the robustness of the image matching model. In the image matching stage, when the confidence of the whole-image similarity result is low, secondary matching is performed; splitting the original picture eliminates as much as possible the influence of different foreground ratios on matching, improving image matching accuracy.
Embodiment 4
This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the deep learning based traffic scene image matching method of Embodiment 1.
In the matching-model training stage, in order to improve the model's ability to recognize foreground features, the invention incorporates the foreground area ratio information into the triplet loss calculation, increasing the contribution of pictures with a large foreground ratio to the loss, so that the trained deep learning model outputs more reliable results; this effectively reduces the influence of the foreground on matching and improves the robustness of the image matching model. In the image matching stage, when the confidence of the whole-image similarity result is low, secondary matching is performed; splitting the original picture eliminates as much as possible the influence of different foreground ratios on matching, improving image matching accuracy.
The steps or modules in Embodiments 2 to 4 correspond to Embodiment 1; for details, refer to the related description of Embodiment 1. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions, and should also be understood to include any medium capable of storing, encoding or carrying a set of instructions for execution by a processor that cause the processor to perform any one of the methods of the present invention.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the invention; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A traffic scene image matching method based on deep learning, characterized by comprising the following steps:
acquiring an original reference image and an original image to be matched of a traffic scene, and preprocessing them to obtain a preprocessed reference image and a preprocessed image to be matched;
respectively inputting the preprocessed reference image and the preprocessed image to be matched into a trained deep learning model, and outputting a first reference feature and a first feature to be matched; calculating the similarity of the first reference feature and the first feature to be matched, and if the similarity is greater than a set threshold, the images are successfully matched;
otherwise, dividing the reference image and the image to be matched to obtain a reference image subset and an image subset to be matched, wherein each image subset comprises four sub-images;
respectively inputting the sub-images in each image subset into the trained deep learning model, and outputting a second reference feature subset and a second feature subset to be matched, wherein each feature subset comprises four sub-features; and calculating the similarity of the corresponding sub-features in the second reference feature subset and the second feature subset to be matched, wherein if at least three of the four groups of sub-features have similarity greater than the set threshold, the images are successfully matched, and otherwise the match fails.
2. The deep learning based traffic scene image matching method according to claim 1, wherein the preprocessing includes:
applying random horizontal flipping, random vertical flipping and random rotation as data augmentation, and unifying the sizes of the original reference image and the original image to be matched.
3. The deep learning-based traffic scene image matching method according to claim 1, wherein the training process of the trained deep learning model is as follows:
acquiring a traffic scene sample image set, wherein the sample image set comprises example images, positive sample images and negative sample images;
preprocessing the positive sample images and negative sample images to obtain the foreground area ratio information of each image;
and inputting the sample image set and the foreground area ratio information of each image into the deep learning model, calculating the triplet loss, and obtaining the trained deep learning model when the triplet loss is minimal.
4. The deep learning based traffic scene image matching method according to claim 3, wherein the specific process of preprocessing the positive sample images and negative sample images to obtain the foreground area ratio information of each image is as follows:
using a target detection network to obtain the coordinates of the foreground in each positive sample image and negative sample image, wherein the foreground includes people, non-motor vehicles and motor vehicles;
and calculating and summing the areas of the foreground regions in each image based on the acquired coordinates, the ratio of the total foreground area to the whole image area being the foreground area ratio information.
5. The method for matching traffic scene images based on deep learning according to claim 3, wherein the triplet loss is:
L = max(d(a,p) * e^(-x) - d(a,n) + margin, 0)
where d represents the distance between two images, a represents an example image, p represents a positive sample image, i.e., an image of the same scene as a, n represents a negative sample image, i.e., an image of a different scene from a, margin is a constant greater than 0, and x is the foreground area ratio information.
6. The deep learning based traffic scene image matching method according to claim 1, wherein the trained deep learning model is a Siamese (twin) neural network.
7. The deep learning based traffic scene image matching method according to claim 1, wherein the similarity threshold is 0.9.
8. A deep learning-based traffic scene image matching system, comprising:
and a pretreatment module: acquiring an original reference image and an original image to be matched of a traffic scene, and preprocessing to obtain a preprocessed reference image and an preprocessed image to be matched;
and a primary calculation module: respectively inputting the preprocessed reference image and the image to be matched into a trained deep learning model, and outputting a first reference feature and a first feature to be matched; calculating the similarity of the first reference feature and the first feature to be matched, and if the similarity is larger than a set threshold, successfully matching the images;
and a segmentation module: otherwise, dividing the reference image and the image to be matched to obtain a reference image subset and an image subset to be matched; the image subsets each comprise four subgraphs;
and a secondary calculation module: respectively inputting subgraphs in each image subset into a trained deep learning model, and outputting a second reference feature subset and a second feature subset to be matched; the feature subsets each comprise four sub-features; and calculating the similarity of the sub-features in the second reference feature subset and the second feature subset to be matched, if the similarity of the sub-features in the three groups is larger than a set threshold, successfully matching the images, otherwise, unsuccessfully matching the images.
9. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the deep learning based traffic scene image matching method according to any one of claims 1-7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the deep learning based traffic scene image matching method according to any one of claims 1-7.
Application and publication data

Application number: CN202311793364.5A (priority date and filing date: 2023-12-22)
Publication number: CN117671619A (publication date: 2024-03-08; status: Pending)
Family ID: 90075123
Country: CN


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination