CN110119768B - Visual information fusion system and method for vehicle positioning - Google Patents
- Publication number: CN110119768B
- Application number: CN201910332583.0A
- Authority
- CN
- China
- Prior art keywords
- image
- module
- vehicle
- image information
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G01C21/26 — Navigation; navigational instruments specially adapted for navigation in a road network
- G01S11/12 — Systems for determining distance or velocity not using reflection or reradiation, using electromagnetic waves other than radio waves
- G06F18/251 — Pattern recognition; fusion techniques of input or preprocessed data
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
- Y02T10/40 — Climate change mitigation technologies related to transportation; engine management systems
Abstract
The invention discloses a visual information fusion system and method for vehicle positioning. The system comprises a vehicle-mounted visual sensor module, an on-line visual information processing module, and an off-line training module, wherein the output end of the vehicle-mounted visual sensor module is signal-connected to the input ends of the on-line visual information processing module and the off-line training module, and the off-line training module is used to train the proposed deep neural network and to copy the network parameters obtained in the training stage to the on-line visual information processing module. The invention can effectively use the image information acquired by the plurality of vision sensors in the vehicle-mounted vision sensor module; through complementation among the image information, it avoids the problem that the positioning system cannot work normally when a single sensor fails, and improves the accuracy of vehicle positioning and the reliability of the algorithm.
Description
Technical Field
The invention relates to a visual information fusion system and a visual information fusion method, in particular to a visual information fusion system and a visual information fusion method for vehicle positioning, and belongs to the field of vehicle navigation or unmanned vehicle positioning.
Background
Vehicle positioning is an important technology in the fields of vehicle navigation and unmanned driving, and most intelligent vehicles are currently equipped with visual sensor modules for tasks such as environment sensing, visual navigation, and target identification. A conventional vehicle-mounted vision sensor module generally comprises one or more vision sensors that acquire images of the vehicle's surroundings; these images contain rich environmental information and are an important source for sensing the environment of the vehicle carrier. In recent years, thanks to the rapid development of computer vision, machine learning, and related technologies, more and more researchers have begun to use the image information collected by vision sensors to help intelligent vehicles achieve positioning.
In general, prior-art approaches to vehicle positioning by means of visual information can be broadly divided into two categories: direct methods and indirect methods. A direct method reconstructs the motion information of the vehicle from the geometric relations between images and then calculates the vehicle position, as in visual odometry. An indirect method assumes that the environment information is known and uses the visual image information, through matching or scene recognition against images in a database, to retrieve the entry consistent with the current vehicle position, thereby recovering the vehicle position or eliminating the positioning error and improving positioning accuracy.
The technical scheme of the invention belongs to the indirect category. Although there has been some development and progress in the prior art, problems remain in practical application. In particular, existing vehicle positioning methods generally rely on information from a single visual sensor, which has the disadvantage that a visual sensor often loses visual information in certain special situations, such as occlusion or violent shaking, causing the positioning algorithm to fail.
In summary, how to provide, on the basis of the prior art, a new system and method for vehicle positioning that improve the accuracy and reliability of positioning is a problem to be urgently solved by those skilled in the art.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present invention is to provide a visual information fusion system and method for vehicle positioning, which are as follows.
A visual information fusion system for vehicle positioning, comprising:
the vehicle-mounted visual sensor module is used for acquiring image information in the external environment;
the on-line visual information processing module is used for processing the image information output by the vehicle-mounted visual sensor module and forming a deep neural network model;
the off-line training module is used for training the deep neural network model and copying network parameters obtained in the training stage to the on-line visual information processing module;
the output end of the vehicle-mounted vision sensor module is respectively in signal connection with the input end of the online vision information processing module and the input end of the offline training module.
Preferably, the vehicle-mounted vision sensor module comprises a plurality of vision sensors for acquiring image information in the external environment.
Preferably, the on-line visual information processing module takes the deep neural network model formed and trained by the off-line training module as its basic framework and comprises an image characterization sub-module and an image feature fusion sub-module;
the image characterization sub-module is composed of multiple paths of convolutional neural networks; its input is the image information acquired by the vehicle-mounted vision sensor module, and its output is a high-dimensional image feature vector;
the image feature fusion sub-module is composed of multiple paths of neural networks, each path comprising three fully-connected layers, wherein the scale of the first fully-connected layer is 512 × 512, the scale of the second is 512 × 128, and the scale of the third is 128 × 1; the paths are linked by a Softmax layer to form the deep neural network model. The input of the image feature fusion sub-module is the image feature vectors constructed by the image characterization sub-module, and its output is a weight value for each image.
Preferably, the offline training module comprises a training set generating sub-module and an end-to-end training sub-module;
the training set generation sub-module is used for generating a Triplet training data set;
the end-to-end training sub-module is used for training the deep neural network model by using the Triplet training data set generated by the training set generation sub-module.
A visual information fusion method for vehicle positioning, comprising the steps of:
S1, acquiring image information and completing preprocessing of the image information;
S2, inputting the image information obtained in S1 into the image characterization sub-module: the image information acquired by the different visual sensors in the vehicle-mounted visual sensor module is input into corresponding convolutional neural networks to obtain convolutional-layer feature map outputs, from which image feature vectors are constructed;
S3, inputting the image feature vectors obtained in S2 into the image feature fusion sub-module and calculating the weight value of each image;
S4, calculating the visual similarity between the image information of the current position of the vehicle and the pre-stored image information in the database, according to the image feature vectors obtained in S2 and the weight values obtained in S3;
S5, comparing the visual similarity obtained in S4 with a preset threshold: if the visual similarity is higher than the preset threshold, judging that the current position of the vehicle is the same as the corresponding pre-stored position in the database, thereby realizing vehicle positioning; if the visual similarity is lower than the preset threshold, continuing to search the next group of images.
Preferably, the image information in S1 is acquired by a vehicle-mounted vision sensor module, where the vehicle-mounted vision sensor module is a system including a plurality of vision sensors, and the plurality of vision sensors simultaneously acquire image information of surrounding environments of the vehicle;
the number of vision sensors in the vehicle-mounted vision sensor module is denoted C; time alignment is carried out using the time stamps of the image information, so that each group of images is collected at the same time and the same position, i.e., one group of image information represents one position;
each group of image information contains C images, denoted $\{I_1, I_2, \ldots, I_C\}$.
Preferably, the processing flow of the image characterization sub-module in S2 is as follows: each image in a group of image information is input into its corresponding convolutional neural network, and each path of convolutional neural network produces a set of convolutional-layer feature maps of size W × H × K, where W × H is the size of each feature map and K is the number of feature maps;
max pooling is then applied to each feature map to obtain a K-dimensional image feature vector, giving the image feature vectors of a group of image information, denoted $\{f_1, f_2, \ldots, f_C\}$.
Preferably, the weight value of each image in S3 is calculated as follows: the c-th path of the fusion network maps the feature vector $f_c$ to a scalar score $s_c$ through its three fully-connected layers, and the Softmax layer then yields

$$w_c = \frac{\exp(s_c)}{\sum_{j=1}^{C} \exp(s_j)}, \qquad c = 1, 2, \ldots, C,$$

where c is the index of the visual sensor; $f_c$ is the image feature vector of the c-th image in the group of image information; $v_c$ and the remaining parameters of the c-th fully-connected path are network parameters of the deep neural network model; and $w_c$, a function of the feature vector $f_c$, is the weight of the c-th path image features and satisfies $\sum_{c=1}^{C} w_c = 1$.
Preferably, in S4 the visual similarity between the image information of the current position of the vehicle and the pre-stored image information in the database is calculated as

$$S_{ai} = 1 - D_{ai},$$

where a is the index of the a-th group of image information; $i = 1, 2, 3, \ldots, N$, with N the number of groups of image information in the database; and $D_{ai}$ is the distance between the two groups of image information, i.e., the visual-space distance between the two corresponding positions.
Preferably, $D_{ai}$ is calculated as

$$D_{ai} = \sum_{c=1}^{C} w_c \, d_{ai}^{c},$$

where $f_c^a$ is the feature vector of the c-th image in the a-th group of image information and $w_c$ is the weight value corresponding to the c-th path image in the a-th group of image information;

$d_{ai}^{c}$ is the Euclidean distance between the corresponding c-th image feature vectors at the a-th and i-th positions, calculated as

$$d_{ai}^{c} = \left\| f_c^a - f_c^i \right\|_2,$$

where $f_c^i$ is the feature vector of the c-th image in the i-th group of image information in the database and $\|\cdot\|_2$ denotes the 2-norm of a vector.
Compared with the prior art, the invention has the advantages that:
the visual information fusion system and the visual information fusion method for vehicle positioning can effectively utilize image information acquired by a plurality of visual sensors in the vehicle-mounted visual sensor module, solve the problem that a positioning system cannot work normally under the condition that a single sensor fails in the prior art by complementation among the image information, and remarkably improve the accuracy of vehicle positioning and the reliability of an algorithm.
Meanwhile, the invention constructs high-quality image features based on the convolutional neural network, and reasonably fuses the image information from a plurality of vision sensors through the designed image feature fusion neural network, and the network has sample self-adaption capability, and can automatically adjust the weight values of different images through training, thereby further improving the environmental adaptability of the invention and the robustness performance of the method.
In addition, the invention provides reference for other technical schemes in the same field, can be used for expanding and extending based on the reference, and is applied to other technical schemes related to vehicle positioning technology or visual information fusion technology, and has high use and popularization values.
The following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings, so that the technical scheme of the present invention can be understood and mastered more easily.
Drawings
FIG. 1 is a schematic diagram of the system architecture of the present invention;
FIG. 2 is a schematic diagram of the structure of an image characterization sub-module according to the present invention;
FIG. 3 is a schematic diagram of an image feature fusion sub-module according to the present invention;
FIG. 4 is a schematic flow chart of the method of the present invention.
Detailed Description
Aiming at the defects of the prior art, the invention provides a visual information fusion system and a visual information fusion method for vehicle positioning, which are specifically as follows.
As shown in FIG. 1, a visual information fusion system for vehicle positioning comprises:
the vehicle-mounted visual sensor module is used for acquiring image information in the external environment;
the on-line visual information processing module is used for processing the image information output by the vehicle-mounted visual sensor module and forming a deep neural network model;
the off-line training module is used for training the deep neural network model and copying network parameters obtained in the training stage to the on-line visual information processing module;
the output end of the vehicle-mounted vision sensor module is respectively in signal connection with the input end of the online vision information processing module and the input end of the offline training module.
The vehicle-mounted vision sensor module comprises a plurality of vision sensors for acquiring image information in the external environment. In this embodiment, the module contains three vision sensors, i.e., C = 3, mounted at the front, the left side, and the right side of the vehicle, respectively.
The on-line visual information processing module takes the deep neural network model formed and trained by the off-line training module as its basic framework and comprises an image characterization sub-module and an image feature fusion sub-module.
As shown in FIG. 2, the image characterization sub-module is composed of multiple paths of convolutional neural networks; in this embodiment it consists of three such paths, each adopting pretrained VGG-16 as its convolutional neural network model.
The input of the image characterization submodule is the image information acquired by the vehicle-mounted vision sensor module, and the output of the image characterization submodule is a high-dimensional image feature vector.
The image feature vectors are constructed roughly as follows. Each image in a group of image information is input into its corresponding convolutional neural network, and each path yields the output of the final convolutional layer of VGG-16, i.e., a tensor of size W × H × K, where W × H is the size of each feature map and K is the number of feature maps; the last convolutional layer of VGG-16 has 512 feature maps, i.e., K = 512. Max pooling is then applied to each feature map, giving a 512-dimensional image feature vector.
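For illustration only, the following is a minimal PyTorch sketch of this feature-construction step; the patent prescribes no framework, and the module and variable names here (including the use of torchvision's pretrained VGG-16) are assumptions:

```python
import torch
import torchvision

# Pretrained VGG-16 up to (and including) its last conv/ReLU stage; the final
# max-pool of .features is dropped so the output is the last conv-layer tensor,
# a W' x H' grid of K = 512 feature maps.
backbone = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:-1].eval()

def image_feature_vector(image: torch.Tensor) -> torch.Tensor:
    """Map one preprocessed image (1 x 3 x H x W) to a 512-dim feature vector
    by max pooling each of the K = 512 feature maps over its spatial extent."""
    with torch.no_grad():
        fmaps = backbone(image)               # shape: 1 x 512 x H' x W'
    return fmaps.amax(dim=(2, 3)).squeeze(0)  # shape: (512,)

# One feature vector per camera image in a group {I_1, ..., I_C}, here C = 3:
group = [torch.randn(1, 3, 224, 224) for _ in range(3)]   # stand-in images
f1, f2, f3 = (image_feature_vector(img) for img in group)
```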
As shown in FIG. 3, the image feature fusion sub-module is formed by multiple paths of neural networks; in this embodiment it consists of three paths, each comprising three fully-connected layers. The scale of the first fully-connected layer is 512 × 512, the scale of the second is 512 × 128, and the scale of the third is 128 × 1; the paths are linked by a Softmax layer to form the deep neural network model. The input of the image feature fusion sub-module is the image feature vectors constructed by the image characterization sub-module, and its output is a weight value for each image.
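A minimal sketch of such a fusion network follows; the layer sizes are those given above, while the class name, the ReLU activations between layers, and the input handling are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ImageFeatureFusion(nn.Module):
    """C parallel three-layer fully-connected paths (512 -> 512 -> 128 -> 1),
    linked by a Softmax over the C scalar scores to produce image weights."""
    def __init__(self, num_sensors: int = 3):
        super().__init__()
        self.paths = nn.ModuleList(
            nn.Sequential(
                nn.Linear(512, 512), nn.ReLU(),  # first fully-connected layer, 512 x 512
                nn.Linear(512, 128), nn.ReLU(),  # second fully-connected layer, 512 x 128
                nn.Linear(128, 1),               # third fully-connected layer, 128 x 1
            )
            for _ in range(num_sensors)
        )

    def forward(self, features: list[torch.Tensor]) -> torch.Tensor:
        # One scalar score per path; Softmax makes the C weights sum to 1.
        scores = torch.stack([path(f) for path, f in zip(self.paths, features)])
        return torch.softmax(scores.squeeze(-1), dim=0)  # (C,) weights w_1..w_C

fusion = ImageFeatureFusion(num_sensors=3)
weights = fusion([torch.randn(512) for _ in range(3)])   # three weights summing to 1
```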
The off-line training module comprises a training set generation sub-module and an end-to-end training sub-module: the training set generation sub-module is used to generate a Triplet training data set, and the end-to-end training sub-module is used to train the deep neural network model with the Triplet training data set so generated.
It should be noted that, in the off-line training module of the invention, the deep neural network is trained with batch gradient descent in an end-to-end manner. The specific training steps are as follows:
Step 1, generate Triplet samples on the training set. Each Triplet sample contains three groups of image information: the image information to be retrieved, positively correlated image information, and negatively correlated image information. The positively correlated image information comes from the same position as the image information to be retrieved, while the negatively correlated image information comes from a different position.
Based on this Triplet training data set, a training method based on the Triplet loss function is adopted. The loss function L is computed as

$$L = \max(D_{qp} - D_{qn} + m, 0),$$

where $D_{qp}$ is the visual-space distance between the image information to be retrieved and the positively correlated image information, $D_{qn}$ is the visual-space distance between the image information to be retrieved and the negatively correlated image information, and m is a preset margin; both distances are computed with the distance formula given below.
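In code, this loss is a simple hinge on the two visual-space distances; the default margin value below is an invented placeholder:

```python
import torch

def triplet_loss(d_qp: torch.Tensor, d_qn: torch.Tensor, margin: float = 0.1) -> torch.Tensor:
    """L = max(D_qp - D_qn + m, 0): drive the query closer to the positive
    group than to the negative group by at least the margin m."""
    return torch.clamp(d_qp - d_qn + margin, min=0.0)
```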
Step 2, initialize each path of the feature extraction network (the image characterization sub-module) with the pretrained VGG-16 convolutional neural network model; then fix the parameters of each convolutional path and train the image feature fusion network.
Step 3, fine-tune the parameters of the feature extraction network and the feature fusion network simultaneously.
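A sketch of this two-stage schedule, reusing the `backbone` and `fusion` objects from the sketches above; the optimizer choice and learning rates are assumptions, since the text specifies only batch gradient descent and end-to-end training:

```python
import torch

# Stage (Step 2): freeze the VGG-16 feature-extraction paths and train only
# the fusion network on Triplet batches.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(fusion.parameters(), lr=1e-3)
# ... run Triplet-loss batches against `optimizer` here ...

# Stage (Step 3): unfreeze everything and fine-tune both networks jointly,
# typically at a lower learning rate.
for p in backbone.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(fusion.parameters()), lr=1e-4
)
```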
As shown in FIG. 4, corresponding to the above visual information fusion system for vehicle positioning, the present invention further provides a visual information fusion method for vehicle positioning, comprising the following steps:
S1, acquiring image information and completing preprocessing of the image information.
S2, inputting the image information obtained in S1 into the image characterization sub-module: the image information acquired by the different visual sensors in the vehicle-mounted visual sensor module is input into corresponding convolutional neural networks to obtain convolutional-layer feature map outputs, from which image feature vectors are constructed.
S3, inputting the image feature vectors obtained in S2 into the image feature fusion sub-module and calculating the weight value of each image.
S4, calculating the visual similarity between the image information of the current position of the vehicle and the pre-stored image information in the database, according to the image feature vectors obtained in S2 and the weight values obtained in S3.
S5, comparing the visual similarity obtained in S4 with a preset threshold to perform position retrieval. If the visual similarity is higher than the preset threshold, it is judged that the current position of the vehicle is the same as the corresponding pre-stored position in the database, realizing vehicle positioning; if the visual similarity is lower than the preset threshold, the current position is not found at this database entry, and the search continues with the next group of images.
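Steps S4 and S5 amount to a linear scan over the N stored groups. A sketch follows, where `similarity` stands for the computation S_ai = 1 - D_ai detailed below and the default threshold is an invented placeholder:

```python
def locate(query_group, database, similarity, threshold=0.85):
    """Scan stored (position, image_group) pairs; return the position whose
    group is most similar to the query, or None if nothing clears the
    preset threshold (step S5)."""
    best_position, best_similarity = None, threshold
    for position, stored_group in database:           # i = 1, ..., N stored groups
        s_ai = similarity(query_group, stored_group)  # step S4
        if s_ai > best_similarity:
            best_position, best_similarity = position, s_ai
    return best_position
```

Returning the best match above the threshold is one reasonable reading; taken literally, the text accepts the first stored group whose similarity clears the threshold.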
Specifically, the image information in S1 is acquired by a vehicle-mounted vision sensor module, where the vehicle-mounted vision sensor module is a system including a plurality of vision sensors, and a plurality of vision sensors acquire image information of the surrounding environment of the vehicle at the same time.
The number of vision sensors in the vehicle-mounted vision sensor module is denoted C; in this embodiment there are three vision sensors, i.e., C = 3. To ensure time synchronization of the images obtained by the multiple sensors, time alignment is performed using the time stamps of the image information, so that each group of images is acquired at the same time and the same location, i.e., one group of image information represents one position.
Each group of image information contains C images, denoted $\{I_1, I_2, \ldots, I_C\}$.
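A sketch of this timestamp-based grouping; the tolerance value, data layout, and the assumption that the per-sensor streams are index-aligned are illustrative simplifications:

```python
def group_by_timestamp(streams, tol=0.02):
    """streams: one time-ordered list of (timestamp, image) pairs per sensor.
    Yields groups [I_1, ..., I_C] whose C timestamps agree within `tol`
    seconds, so that each emitted group represents a single position."""
    for frames in zip(*streams):                 # one candidate frame per sensor
        times = [t for t, _ in frames]
        if max(times) - min(times) <= tol:       # keep only aligned groups
            yield [image for _, image in frames]
```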
The processing flow of the image characterization sub-module in S2 is as follows:
Each image in a group of image information is input into its corresponding convolutional neural network, and each path of convolutional neural network produces a set of convolutional-layer feature maps of size W × H × K, where W × H is the size of each feature map and K is the number of feature maps.
Max pooling is then applied to each feature map to obtain a K-dimensional image feature vector, giving the image feature vectors of a group of image information, denoted $\{f_1, f_2, \ldots, f_C\}$.
The weight value of each image in S3 is calculated as follows: the c-th path of the fusion network maps the feature vector $f_c$ to a scalar score $s_c$ through its three fully-connected layers, and the Softmax layer then yields

$$w_c = \frac{\exp(s_c)}{\sum_{j=1}^{C} \exp(s_j)}, \qquad c = 1, 2, \ldots, C,$$

where c is the index of the visual sensor; $f_c$ is the image feature vector of the c-th image in the group of image information; $v_c$ and the remaining parameters of the c-th fully-connected path are network parameters of the deep neural network model; and $w_c$, a function of $f_c$, is the weight of the c-th path image features and satisfies $\sum_{c=1}^{C} w_c = 1$.
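As a numeric illustration of this weighting (the score values are invented for the example): if the three paths produce scores $s = (2.0,\ 1.0,\ 0.1)$, say with the front view clear and the right view partly occluded, the Softmax gives

$$w = \frac{\big(e^{2.0},\ e^{1.0},\ e^{0.1}\big)}{e^{2.0} + e^{1.0} + e^{0.1}} \approx \frac{(7.39,\ 2.72,\ 1.11)}{11.21} \approx (0.66,\ 0.24,\ 0.10),$$

so the fused distance in S4 is dominated by the most informative camera, while degraded views are down-weighted rather than discarded.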
In S4, the visual similarity between the image information of the current position of the vehicle and the pre-stored image information in the database is calculated as

$$S_{ai} = 1 - D_{ai},$$

where a is the index of the a-th group of image information; $i = 1, 2, 3, \ldots, N$, with N the number of groups of image information in the database; and $D_{ai}$ is the distance between the two groups of image information, i.e., the visual-space distance between the two corresponding positions.
$D_{ai}$ is calculated as

$$D_{ai} = \sum_{c=1}^{C} w_c \, d_{ai}^{c},$$

where $f_c^a$ is the feature vector of the c-th image in the a-th group of image information and $w_c$ is the weight value corresponding to the c-th path image in the a-th group of image information;

$d_{ai}^{c}$ is the Euclidean distance between the corresponding c-th image feature vectors at the a-th and i-th positions, calculated as

$$d_{ai}^{c} = \left\| f_c^a - f_c^i \right\|_2,$$

where $f_c^i$ is the feature vector of the c-th image in the i-th group of image information in the database and $\|\cdot\|_2$ denotes the 2-norm of a vector.
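Putting S4 together, a sketch of the distance and similarity computation under the formulas above (function and variable names are assumptions):

```python
import torch

def visual_similarity(query_feats, db_feats, weights):
    """S_ai = 1 - D_ai, where D_ai = sum_c w_c * ||f_c^a - f_c^i||_2 as in the
    formulas above; query_feats and db_feats are length-C lists of feature
    vectors, and weights is the (C,) output of the fusion network."""
    d_ai = sum(w * torch.linalg.vector_norm(fa - fi)    # w_c * Euclidean distance
               for w, fa, fi in zip(weights, query_feats, db_feats))
    return 1.0 - d_ai
```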
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity, and the specification should be taken as a whole, with the technical solutions of the various embodiments combinable as appropriate to form other implementations that will be apparent to those skilled in the art.
Claims (5)
1. A visual information fusion system for vehicle positioning, comprising:
the vehicle-mounted visual sensor module comprises a plurality of visual sensors for acquiring image information in the external environment;
the off-line training module is used for training the deep neural network model and copying network parameters obtained in the training stage to the on-line visual information processing module;
the on-line visual information processing module comprises an image characterization sub-module and an image feature fusion sub-module and takes as its basic framework the deep neural network model trained by the off-line training module on the image information output by the vehicle-mounted visual sensor module, wherein the image characterization sub-module is composed of multiple paths of convolutional neural networks, the input of the image characterization sub-module is the image information acquired by the vehicle-mounted visual sensor module, and the output of the image characterization sub-module is a high-dimensional image feature vector;
the image feature fusion sub-module is composed of multiple paths of neural networks, each path comprising three fully-connected layers, wherein the scale of the first fully-connected layer is 512 × 512, the scale of the second fully-connected layer is 512 × 128, and the scale of the third fully-connected layer is 128 × 1; the multiple paths of neural networks are linked by means of a Softmax layer to form the deep neural network model; the input of the image feature fusion sub-module is the image feature vectors constructed by the image characterization sub-module, and the output of the image feature fusion sub-module is a weight value for different images;
the output end of the vehicle-mounted vision sensor module is respectively in signal connection with the input end of the online vision information processing module and the input end of the offline training module.
2. The visual information fusion system for vehicle localization of claim 1, wherein: the off-line training module comprises a training set generation sub-module and an end-to-end training sub-module;
the training set generation sub-module is used for generating a Triplet training data set;
the end-to-end training sub-module is used for training the deep neural network model by using the Triplet training data set generated by the training set generation sub-module.
3. A visual information fusion method for vehicle positioning, characterized by comprising the steps of:
S1, acquiring image information by a vehicle-mounted vision sensor module and completing preprocessing of the image information, wherein the vehicle-mounted vision sensor module is a system comprising a plurality of vision sensors that simultaneously acquire image information of the surrounding environment of the vehicle, and the number of vision sensors in the vehicle-mounted vision sensor module is denoted C; time alignment is carried out using the time stamps of the image information, so that each group of images is collected at the same time and the same position, i.e., one group of image information represents one position; each group of image information contains C images, denoted $\{I_1, I_2, \ldots, I_C\}$;
S2, inputting the image information obtained in S1 into an image characterization sub-module, the processing flow being that the image information acquired by the different visual sensors in the vehicle-mounted visual sensor module is input into corresponding convolutional neural networks, each path of convolutional neural network producing a set of convolutional-layer feature maps of size W × H × K, where W × H is the size of each feature map and K is the number of feature maps; max pooling is then applied to each feature map to obtain a K-dimensional image feature vector, giving the image feature vectors of a group of image information, denoted $\{f_1, f_2, \ldots, f_C\}$, i.e., image feature vectors constructed from the convolutional-layer feature map outputs;
S3, inputting the image feature vectors obtained in S2 into an image feature fusion sub-module and calculating the weight value of each image: the c-th path of the fusion network maps the feature vector $f_c$ to a scalar score $s_c$ through its three fully-connected layers, whose parameters (including $v_c$) are network parameters of the deep neural network model, and the Softmax layer yields

$$w_c = \frac{\exp(s_c)}{\sum_{j=1}^{C} \exp(s_j)}, \qquad c = 1, 2, \ldots, C,$$

where c is the index of the visual sensor and $w_c$, a function of $f_c$, is the weight of the c-th path image features, satisfying $\sum_{c=1}^{C} w_c = 1$;
S4, calculating the visual similarity between the image information of the current position of the vehicle and the pre-stored image information in the database, according to the image feature vectors obtained in S2 and the weight values obtained in S3;
S5, comparing the visual similarity obtained in S4 with a preset threshold: if the visual similarity is higher than the preset threshold, judging that the current position of the vehicle is the same as the corresponding pre-stored position in the database, thereby realizing vehicle positioning; if the visual similarity is lower than the preset threshold, continuing to search the next group of images.
4. The visual information fusion method for vehicle localization of claim 3, wherein in S4 the visual similarity between the image information of the current position of the vehicle and the pre-stored image information in the database is calculated as $S_{ai} = 1 - D_{ai}$, where a is the index of the a-th group of image information; $i = 1, 2, 3, \ldots, N$, with N the number of groups of image information in the database; and $D_{ai}$ is the distance between the two groups of image information, i.e., the visual-space distance between the two corresponding positions.
5. The visual information fusion method for vehicle localization of claim 4, wherein $D_{ai}$ is calculated as

$$D_{ai} = \sum_{c=1}^{C} w_c \, d_{ai}^{c},$$

where $f_c^a$ is the feature vector of the c-th path image in the a-th group of image information and $w_c$ is the weight value corresponding to the c-th path image in the a-th group of image information; $d_{ai}^{c}$ is the Euclidean distance between the corresponding c-th image feature vectors at the a-th and i-th positions, calculated as

$$d_{ai}^{c} = \left\| f_c^a - f_c^i \right\|_2,$$

where $f_c^i$ is the feature vector of the c-th image in the i-th group of image information in the database and $\|\cdot\|_2$ denotes the 2-norm of a vector.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910332583.0A (CN110119768B) | 2019-04-24 | 2019-04-24 | Visual information fusion system and method for vehicle positioning |
Publications (2)

Publication Number | Publication Date |
---|---|
CN110119768A | 2019-08-13 |
CN110119768B | 2023-10-31 |
Family
ID=67521291
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110514662B (en) * | 2019-09-10 | 2022-06-28 | 上海深视信息科技有限公司 | Visual detection system with multi-light-source integration |
CN110660103B (en) * | 2019-09-17 | 2020-12-25 | 北京三快在线科技有限公司 | Unmanned vehicle positioning method and device |
CN110889378B (en) * | 2019-11-28 | 2023-06-09 | 湖南率为控制科技有限公司 | Multi-view fusion traffic sign detection and identification method and system thereof |
CN111240187B (en) * | 2020-01-16 | 2023-01-13 | 南京理工大学 | Vehicle track tracking control algorithm based on vehicle error model |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10395144B2 (en) * | 2017-07-24 | 2019-08-27 | GM Global Technology Operations LLC | Deeply integrated fusion architecture for automated driving systems |
CN108921013B (en) * | 2018-05-16 | 2020-08-18 | 浙江零跑科技有限公司 | Visual scene recognition system and method based on deep neural network |
CN109242003B (en) * | 2018-08-13 | 2021-01-01 | 浙江零跑科技有限公司 | Vehicle-mounted vision system self-motion determination method based on deep convolutional neural network |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |