WO2020083103A1 - Vehicle positioning method based on deep neural network image recognition - Google Patents
- Publication number
- WO2020083103A1 (PCT/CN2019/111840)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- neural network
- deep neural
- road sign
- vehicle
- coordinate system
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/582—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/28—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
- G01C21/30—Map- or contour-matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/245—Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/09—Recognition of logos
Definitions
- the invention relates to the technical field of image recognition and positioning, in particular to a vehicle positioning method based on deep neural network image recognition, and a training method of the deep neural network.
- vehicle positioning technology mainly uses GPS technology and high-precision map matching positioning.
- GPS technology has the following problems in use: in ordinary GPS mode, the positioning error reaches the metre level, which cannot meet the accuracy required for a vehicle stopping at a station; GPS RTK mode requires both satellite information and ground reference positioning information, so reference positioning communication equipment must be installed along the road, at high equipment and operating cost; and when the vehicle enters a road section with poor satellite reception, such as dense forest or a tunnel, the GPS signal is easily lost, and positioning information is lost with it.
- Map data needs to be established and stored on the vehicle in advance.
- point cloud data or image data of the vehicle's current environment is obtained through an external lidar or camera device and matched against the pre-stored map data.
- the cost of map making and software and hardware cost of matching calculation are relatively high.
- a low-cost, high-precision vehicle positioning method is therefore needed to provide reliable data support for vehicle positioning, inbound route planning, and speed control.
- the invention provides a vehicle positioning method based on deep neural network image recognition, and a training method of the deep neural network.
- the invention can increase the training accuracy of the deep neural network by increasing the number of training samples and optimizing the parameters of the neural network, thereby improving the positioning accuracy of the vehicle, and the required equipment cost and use cost are low.
- the first aspect of the present invention provides a deep neural network training method for road sign recognition, including the following steps:
- Road sign graphic setting step: set a road sign graphic on the road surface in the station's inbound direction; the distance between the marking point of the road sign graphic and the station edge in the inbound direction is L;
- Shooting device setting step: install a shooting device on the vehicle; the optical axis of the shooting device coincides with the longitudinal centerline of the vehicle body, and the distance between the lens optical center of the shooting device and the ground is H;
- Training sample production step: calculate the position coordinates, in the image coordinate system, of the marking point of the road sign graphic in each image sample, make a label set, and pair each image sample with the corresponding label set to form training samples;
- Deep neural network construction step: on the basis of a target recognition and classification deep neural network, modify the network's final classification output layer into an output layer of 2 nodes that outputs the position coordinates of the marking point of the road sign graphic;
- Deep neural network training step: input the training samples to the deep neural network for training.
- the shooting time is selected at noon on a sunny day and at night on a sunny day.
- the shooting time is selected at noon on rainy days and night on rainy days.
- the shooting time is selected at noon on a foggy day and at night on a foggy day.
- the shooting device photographs one image sample of the road sign graphic every 5° within the range of 5° to 180° between its optical axis and the road surface.
- the lens parameters of the shooting device are selected so that, when the road sign graphic appears fully in the camera frame, it occupies more than 20% of the frame area.
- the shooting device is installed on the roof at the front of the vehicle and points in the vehicle's forward direction.
- the road sign graphic adopts a triangle, a rectangle, an arc, or other geometric element combinations that are easy to recognize.
- the road sign graphic is a bar code or a two-dimensional code.
- the identification point of the road identification graphic is its geometric center.
- the deep neural network adopts the ResNet50 network, replacing the network's final classification output layer with two fully connected layers of 1024 nodes each, followed by an output layer of 2 nodes.
- the deep neural network adopts the ResNet50 network, replacing the network's final classification output layer with two fully connected layers of 2048 nodes each, followed by an output layer of 2 nodes.
- the floating-point data output by the two nodes lie in the closed interval [0, 1]; pixel coordinates are obtained by multiplying the outputs by the corresponding image width and height.
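As a minimal sketch of this conversion, assuming the two network outputs are (x_norm, y_norm) in [0, 1] and the function name is illustrative:

```python
def normalized_to_pixel(x_norm, y_norm, width, height):
    """Map the network's two [0, 1] outputs to pixel coordinates (u, v)."""
    if not (0.0 <= x_norm <= 1.0 and 0.0 <= y_norm <= 1.0):
        raise ValueError("network outputs must lie in [0, 1]")
    return x_norm * width, y_norm * height

# A 1920x1080 frame: normalized (0.5, 0.25) lands at pixel (960, 270).
u, v = normalized_to_pixel(0.5, 0.25, 1920, 1080)  # -> (960.0, 270.0)
```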
- a second aspect of the present invention provides a vehicle positioning method using the above deep neural network training method, including the following steps:
- Road sign graphic recognition step: use the trained deep neural network to recognize the road sign graphic photographed while the vehicle actually enters the station, and obtain the position coordinates (u, v) of its marking point P in the image coordinate system;
- Vehicle positioning step: from the obtained distance between the road sign marking point P and the shooting device, determine the distance between the shooting device and the station edge in the inbound direction, and then, combining the installation position of the shooting device on the vehicle, determine the distance between the vehicle and the station edge in the inbound direction.
- the marking point P of the road sign graphic is on the optical axis of the lens of the shooting device;
- the origin of the camera coordinate system is set at the imaging pinhole of the shooting device, and the horizontal distance between the lens optical center of the shooting device and the marking point P is Z_C;
- the positive Z axis of the camera coordinate system is chosen as the vehicle's forward direction, the positive Y axis as the vehicle's downward direction, and the positive X axis as the vehicle's rightward direction;
- the world coordinate system coincides with the camera coordinate system;
- the origin of the image coordinate system is on the Z axis of the camera coordinate system, and the X and Y axes of the image coordinate system are parallel to the X and Y axes of the camera coordinate system, respectively;
- the image sample collection process is carried out in multiple periods under different lighting and weather conditions, which reduces the influence of environmental factors on the training results and improves the environmental adaptability of the deep neural network.
- the above method can provide the distance data between the vehicle and the station, provide data support for vehicle positioning, inbound route planning, and speed control, and has the advantages of simple operation, low cost, and high reliability.
- FIG. 1 is a flowchart of a deep neural network training method for road sign recognition
- Figure 2 is a schematic diagram of shooting in the image sample collection step
- FIG. 3 is a flowchart of a vehicle positioning method based on a deep neural network after training is completed
- Figure 4 is a side view of the vehicle during the vehicle entering the station
- Figure 5 is a plan view of the vehicle during the vehicle entering the station
- Fig. 6 is a schematic diagram of calculating the edge distance between the vehicle and the station in the direction of the station.
- FIG. 1 is a flowchart of the deep neural network training method for road sign recognition provided by the present invention, comprising a road sign graphic setting step 101, a shooting device setting step 102, an image sample collection step 103, a training sample production step 104, a deep neural network construction step 105, and a deep neural network training step 106.
- Road sign graphic setting step 101: set a road sign graphic on the road surface in the station's inbound direction; the distance between the marking point of the road sign graphic and the station edge in the inbound direction is L.
- the road marking graphics may be, but not limited to, triangles, rectangles, arcs, or other easily identifiable combinations of geometric elements, or text graphics, or bar codes or two-dimensional codes incorporating station-related information.
- the identification point of the road identification graphic may be a geometric center point, vertex or other geometric feature point of the road identification graphic.
- Shooting device setting step 102: install a shooting device on the vehicle.
- The lens of the shooting device points in the vehicle's forward direction; it is installed on the roof at the front of the vehicle, or at another position from which the road sign can be photographed.
- The optical axis of the shooting device coincides with the longitudinal symmetry centerline of the vehicle body, and the distance H between the lens optical center of the shooting device and the ground is recorded.
- The lens parameters of the shooting device are selected so that, when the road sign graphic appears fully in the camera frame, it occupies more than 20% of the frame area; the larger this area, the more precisely the marking point of the road sign graphic can be located.
- Image sample collection step 103: under different lighting or weather conditions, such as sunny noon and sunny night, rainy noon and rainy night, or foggy noon and foggy night, photograph the road sign graphic using the above shooting device.
- The shooting angles are shown in Figure 2, where the letter A represents the road sign graphic. The angle between the optical axis of the shooting device and the road surface is varied both along the vehicle's direction of travel and perpendicular to it, so that the shooting device takes one image sample of the road sign graphic every 5° within the range of 5° to 180° between its optical axis and the road surface.
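The sampling schedule described above can be sketched as follows; the list of weather/time conditions follows the preferred embodiments, and the function names are illustrative:

```python
def shooting_angles(start_deg=5, stop_deg=180, step_deg=5):
    """Optical-axis/road-surface angles at which one image sample is taken."""
    return list(range(start_deg, stop_deg + 1, step_deg))

def shooting_plan(conditions=("sunny noon", "sunny night",
                              "rainy noon", "rainy night",
                              "foggy noon", "foggy night")):
    """(condition, sweep direction, angle) triples for the collection step."""
    plan = []
    for cond in conditions:
        for direction in ("along travel", "perpendicular to travel"):
            for angle in shooting_angles():
                plan.append((cond, direction, angle))
    return plan

# 36 angles per sweep, 2 sweep directions, 6 conditions -> 432 image samples.
```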
- Training sample production step 104: calculate the position coordinates, in the image coordinate system, of the marking point of the road sign graphic in each image sample, make a label set, and pair each image sample with the corresponding label set to form a training sample that can then be input to the deep neural network for training.
- Deep neural network construction step 105: use a target recognition and classification deep neural network, but modify the network's final classification output layer into an output layer of two nodes; the values output by these two nodes are the coordinates of the marking point of the road sign graphic in the image frame. More specifically, the ResNet50 network can be used with its final classification output layer removed and, according to the required recognition effect, replaced by two fully connected layers of 1024 or 2048 nodes, followed by an output layer of 2 nodes. The floating-point data output by these two nodes lie in the closed interval [0, 1]; pixel coordinates are obtained by multiplying the outputs by the corresponding image width and height.
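A minimal numeric sketch of the replacement head (here in NumPy, standing in for whatever framework is actually used): a 2048-dimensional ResNet50 backbone feature passes through two 1024-node fully connected layers to a 2-node output squashed into [0, 1]. The ReLU hidden activations and the random weights are assumptions; the text only fixes the layer sizes.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Replacement head: 2048-d ResNet50 feature -> FC(1024) -> FC(1024) -> FC(2).
W1, b1 = 0.01 * rng.standard_normal((2048, 1024)), np.zeros(1024)
W2, b2 = 0.01 * rng.standard_normal((1024, 1024)), np.zeros(1024)
W3, b3 = 0.01 * rng.standard_normal((1024, 2)), np.zeros(2)

def head(feature):
    """Map a backbone feature vector to normalized (x, y) in [0, 1]."""
    h = relu(feature @ W1 + b1)
    h = relu(h @ W2 + b2)
    return sigmoid(h @ W3 + b3)  # sigmoid keeps both outputs inside [0, 1]

coords = head(rng.standard_normal(2048))  # two values in [0, 1]
```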
- Deep neural network training step 106: input the aforementioned training samples to the optimized deep neural network for training. After training is completed, the deep neural network can be used to recognize the road sign graphic and obtain the position coordinates of its geometric center.
- FIG. 3 is a flowchart of a vehicle positioning method using the above deep neural network training method provided by the present invention, comprising a road sign graphic recognition step 201, a road sign graphic positioning step 202, and a vehicle positioning step 203.
- Road sign graphic recognition step 201: using the trained deep neural network, recognize the road sign graphic photographed while the vehicle actually enters the station and obtain the position coordinates (u, v) of the marking point P in the image coordinate system.
- Road sign graphic positioning step 202: through the transformation between the image coordinate system and the world coordinate system, calculate the coordinates (X_w, Y_w, Z_w) of the road sign marking point P in the world coordinate system, thereby obtaining the distance between the marking point P and the shooting device.
- the transformation between the image coordinate system and the world coordinate system can be described using the pinhole imaging model.
- Z_C represents the horizontal distance between the road sign marking point P and the optical center of the camera lens
- d_x, d_y, u_0, v_0, f are internal parameters of the camera lens, specifically:
- d_x and d_y represent the physical length of a unit pixel in the X and Y directions of the image coordinate system; u_0 and v_0 represent the offsets of the origin of the image coordinate system from the origin of the camera coordinate system in the X and Y directions, respectively; f represents the imaging focal length of the lens.
- R represents the rotation relationship between the world coordinate system and the camera coordinate system
- formula (2) is used to calculate:
- α, β, and γ respectively represent the angles of rotation about the X axis, Y axis, and Z axis required to bring the world coordinate system into coincidence with the camera coordinate system.
- T represents the translation relationship between the world coordinate system and the camera coordinate system
- formula (3) is used to calculate:
- t_x, t_y, and t_z represent the translations between the world coordinate system and the camera coordinate system along the X axis, Y axis, and Z axis, respectively.
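Formula (2) itself is not reproduced in this text, but R is conventionally built from the three axis rotations α, β, γ. The sketch below assumes the composition order R = Rz · Ry · Rx, which must be checked against the patent's formula (2) before use:

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Build R from rotations about the X, Y, Z axes (angles in radians).

    The composition order R = Rz @ Ry @ Rx is an assumption; formula (2)
    fixes the three angles but the order must match it in practice.
    """
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    Rz = np.array([[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx
```

With all three angles zero (world and camera frames already coincident, as in the preferred embodiment), R reduces to the identity matrix.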
- the above parameters d_x, d_y, u_0, v_0, f, α, β, γ, t_x, t_y, t_z can be calibrated using, but not limited to, the conditions described below.
- the shooting device is installed at the front roof of the vehicle and points in the forward direction.
- the axis of the lens optical center of the shooting device coincides with the geometric symmetry centerline of the longitudinal axis of the vehicle.
- the marking point P of the road sign on the road surface in front of the vehicle is on the optical axis of the lens of the shooting device, at horizontal distance Z_C from the lens optical center.
- Set the origin of the camera coordinate system at the imaging pinhole of the shooting device.
- the world coordinate system coincides with the camera coordinate system, and the forward direction of the vehicle is selected as the positive direction of the Z axis, the downward direction of the vehicle is the positive direction of the Y axis, and the right direction of the vehicle is the positive direction of the X axis.
- the origin of the image coordinate system is on the Z axis of the camera coordinate system, and the X axis and Y axis of the image coordinate system are parallel to the X axis and Y axis of the camera coordinate system, respectively.
- Vehicle positioning step 203: as shown in FIG. 6, after obtaining the horizontal distance Z_C between the marking point P of the road sign graphic and the lens optical center of the shooting device, and combining it with the distance L between the marking point P and the station edge in the inbound direction, the horizontal distance L_CZ between the lens optical center and the station edge in the inbound direction is calculated as L_CZ = Z_C + L.
- Combining the installation position of the shooting device on the vehicle, the distance between the vehicle and the station edge in the inbound direction is then determined, realizing vehicle positioning.
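The two distance relations of the positioning step can be sketched as a pair of helper functions; the camera mounting offset is a hypothetical parameter standing in for the "installation position on the vehicle" the text mentions, and the numeric values are made up for the example:

```python
def camera_to_station_edge(z_c, l_mark):
    """L_CZ = Z_C + L: horizontal distance from the lens optical center
    to the station edge in the inbound direction (FIG. 6)."""
    return z_c + l_mark

def vehicle_to_station_edge(l_cz, mount_offset):
    """Distance from the vehicle front to the station edge. mount_offset
    (lens optical center sitting behind the vehicle front) is an assumed
    parameter, not specified in the text."""
    return l_cz - mount_offset

l_cz = camera_to_station_edge(z_c=12.5, l_mark=3.0)     # -> 15.5
dist = vehicle_to_station_edge(l_cz, mount_offset=1.2)  # ≈ 14.3
```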
Abstract
A vehicle positioning method based on deep neural network image recognition, and a training method for a deep neural network. The training method comprises: road marking graphic configuration (101), photographing device configuration (102), image sample collection (103), training sample making (104), deep neural network building (105), and deep neural network training (106). The image sample collection process is carried out in different periods under different illumination and weather conditions, so that the environmental adaptability of the deep neural network is improved. In addition, by taking sample images every certain angle in a traveling direction of a vehicle and the direction perpendicular to the traveling direction of the vehicle, a large amount of training sample data is obtained, the training precision for a deep neural network is improved, and thus the precision of vehicle positioning is improved.
Description
The invention relates to the technical field of image recognition and positioning, in particular to a vehicle positioning method based on deep neural network image recognition, and to a training method for the deep neural network.
At present, before a public transport vehicle enters a station, the driver judges the distance between the vehicle and the station by eye alone, so accurate inbound route planning and speed control cannot be achieved. To achieve them before the vehicle reaches the station, the distance between the vehicle and the station must be determined precisely. At present, vehicle positioning mainly uses GPS technology and high-precision map matching.
GPS technology has the following problems in use: in ordinary GPS mode, the positioning error reaches the metre level, which cannot meet the accuracy required for a vehicle stopping at a station; GPS RTK mode requires both satellite information and ground reference positioning information, so reference positioning communication equipment must be installed along the road, at high equipment and operating cost; and when the vehicle enters a road section with poor satellite reception, such as dense forest or a tunnel, the GPS signal is easily lost, and positioning information is lost with it.
High-precision map matching generally uses point cloud matching or stereo vision matching. Map data must be built and stored on the vehicle in advance; while the vehicle is running, point cloud or image data of its current environment is obtained through an external lidar or camera device and matched against the pre-stored map data. For this method, the cost of map making and the software and hardware cost of the matching computation are relatively high.
Therefore, a low-cost, high-precision vehicle positioning method is needed to provide reliable data support for vehicle positioning, inbound route planning, and speed control.
Summary of the invention
The invention provides a vehicle positioning method based on deep neural network image recognition, and a training method for the deep neural network. By increasing the number of training samples and optimizing the network parameters, the invention improves the training accuracy of the deep neural network and thereby the positioning accuracy of the vehicle, while the required equipment and operating costs are low.
The first aspect of the present invention provides a deep neural network training method for road sign recognition, comprising the following steps:
(1) Road sign graphic setting step: set a road sign graphic on the road surface in the station's inbound direction; the distance between the marking point of the road sign graphic and the station edge in the inbound direction is L;
(2) Shooting device setting step: install a shooting device on the vehicle; the optical axis of the shooting device coincides with the longitudinal symmetry centerline of the vehicle body, and the distance between the lens optical center of the shooting device and the ground is H;
(3) Image sample collection step: under different lighting or weather conditions, photograph the road sign graphic with the shooting device, varying the angle between its optical axis and the road surface both along the vehicle's direction of travel and perpendicular to it, so that the shooting device takes one image sample of the road sign graphic at each of a series of regularly spaced angles within a given range;
(4) Training sample production step: calculate the position coordinates, in the image coordinate system, of the marking point of the road sign graphic in each image sample, make a label set, and pair each image sample with the corresponding label set to form training samples;
(5) Deep neural network construction step: on the basis of a target recognition and classification deep neural network, modify the network's final classification output layer into an output layer of 2 nodes that outputs the position coordinates of the marking point of the road sign graphic;
(6) Deep neural network training step: input the training samples to the deep neural network for training.
Preferably, in the image sample collection step, the shooting times are chosen at noon and at night on sunny days.
Preferably, in the image sample collection step, the shooting times are chosen at noon and at night on rainy days.
Preferably, in the image sample collection step, the shooting times are chosen at noon and at night on foggy days.
Preferably, in the image sample collection step, the shooting device takes one image sample of the road sign graphic every 5° within the range of 5° to 180° between its optical axis and the road surface.
Preferably, the lens parameters of the shooting device are selected so that, when the road sign graphic appears fully in the camera frame, it occupies more than 20% of the frame area.
Preferably, the shooting device is installed on the roof at the front of the vehicle and points in the vehicle's forward direction.
Preferably, the road sign graphic adopts a triangle, a rectangle, an arc, or another easily recognized combination of geometric elements.
Preferably, the road sign graphic adopts a bar code or a two-dimensional code.
Preferably, the marking point of the road sign graphic is its geometric center.
Preferably, the deep neural network adopts the ResNet50 network, replacing the network's final classification output layer with two fully connected layers of 1024 nodes each, followed by an output layer of 2 nodes.
Preferably, the deep neural network adopts the ResNet50 network, replacing the network's final classification output layer with two fully connected layers of 2048 nodes each, followed by an output layer of 2 nodes.
Preferably, the floating-point data output by the 2 nodes lie in the closed interval [0, 1], and pixel coordinates are obtained by multiplying the outputs by the corresponding image width and height.
The second aspect of the present invention provides a vehicle positioning method using the above deep neural network training method, comprising the following steps:
(1) Road sign graphic recognition step: using the trained deep neural network, recognize the road sign graphic photographed while the vehicle actually enters the station, and obtain the position coordinates (u, v) of its marking point P in the image coordinate system;
(2) Road sign graphic positioning step: through the transformation between the image coordinate system and the world coordinate system, calculate the coordinates (X_w, Y_w, Z_w) of the road sign marking point P in the world coordinate system, thereby obtaining the distance between the marking point P and the shooting device;
(3) Vehicle positioning step: from the obtained distance between the road sign marking point P and the shooting device, determine the distance between the shooting device and the station edge in the inbound direction, and then, combining the installation position of the shooting device on the vehicle, determine the distance between the vehicle and the station edge in the inbound direction.
优选地,所述道路标识图形的标识点P在所述拍摄装置镜头光心轴线上;Preferably, the marking point P of the road marking graphic is on the optical axis of the lens of the shooting device;
所述拍摄装置坐标系原点设定在所述拍摄装置成像小孔位置,所述拍摄装置镜头光心与所述道路标识图形标识点P的水平距离为Z
C;
The origin of the coordinate system of the shooting device is set at the position of the imaging aperture of the shooting device, and the horizontal distance between the optical center of the lens of the shooting device and the marking point P of the road marking graphic is Z C ;
the positive Z axis of the camera coordinate system is chosen as the vehicle's forward direction, the positive Y axis as the vehicle's downward direction, and the positive X axis as the vehicle's rightward direction;
the world coordinate system coincides with the camera coordinate system;
the origin of the image coordinate system lies on the Z axis of the camera coordinate system, and the X and Y axes of the image coordinate system are parallel to the X and Y axes of the camera coordinate system, respectively;
according to the formula

Z_C = f·H / (d_y·v)

the horizontal distance Z_C between the camera and the sign point P of the road sign pattern is obtained;
then, according to the formula

L_CZ = Z_C + L

the horizontal distance L_CZ between the camera and the station edge on the approach side is obtained.
The advantages of the present invention are:
(1) Image samples are collected over multiple periods under different lighting and weather conditions, which reduces the influence of environmental factors on the training result and improves the environmental adaptability of the deep neural network.
(2) A sample image is taken at fixed angular intervals both in the vehicle's forward direction and in the direction perpendicular to it, yielding a large volume of training data; this improves the training accuracy of the deep neural network and thus the accuracy of subsequent vehicle positioning.
(3) The trained deep neural network recognizes the road sign pattern placed ahead of the station, and the transformation between the image coordinate system and the world coordinate system yields the spatial position of the on-board camera and hence the vehicle's position. This method provides distance data between the vehicle and the station, supporting vehicle positioning, approach route planning, and speed control, and offers simple operation, low cost, and high reliability.
The above content of the present invention, together with the following specific embodiments, will be better understood when read in conjunction with the accompanying drawings. It should be noted that the drawings serve only as examples of the claimed invention. In the drawings, like reference numerals denote like or similar elements.
FIG. 1 is a flowchart of the deep neural network training method for road sign recognition;
FIG. 2 is a schematic diagram of the shooting geometry in the image sample collection step;
FIG. 3 is a flowchart of a vehicle positioning method based on the trained deep neural network;
FIG. 4 is a side view of the vehicle while entering the station;
FIG. 5 is a top view of the vehicle while entering the station;
FIG. 6 is a schematic diagram of the calculation of the distance between the vehicle and the station edge on the approach side.
The present invention is described in further detail below with reference to the drawings and embodiments.
FIG. 1 is a flowchart of a deep neural network training method for road sign recognition provided by the present invention, comprising a road sign pattern setting step 101, a camera setting step 102, an image sample collection step 103, a training sample production step 104, a deep neural network construction step 105, and a deep neural network training step 106.
Road sign pattern setting step 101: a road sign pattern is set on the road surface on the approach side of the station, with its sign point at a distance L from the station edge on the approach side. The road sign pattern may be, but is not limited to, a triangle, rectangle, arc, or another easily recognizable combination of geometric elements, a text graphic, or a barcode or QR code encoding station-related information. The sign point of the road sign pattern may be its geometric center, a vertex, or another geometric feature point.
Camera setting step 102: a camera is mounted on the vehicle with its lens pointing in the vehicle's forward direction, on the roof at the front of the vehicle or at another position from which the road sign pattern can be photographed, such that the camera's optical axis coincides with the longitudinal center line of symmetry of the vehicle body; the distance H from the optical center of the camera lens to the ground is recorded. The lens parameters are chosen such that, when the road sign pattern appears entirely in the frame, it occupies more than 20% of the frame area; the larger the occupied area, the more precisely the sign point can be located in the frame.
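The 20% frame-coverage criterion above can be sketched as a simple check (the function name and the pixel values are illustrative assumptions, not from the patent):

```python
def sign_fills_frame(sign_area_px, frame_w, frame_h, min_fraction=0.2):
    """Check whether the road sign pattern occupies at least
    `min_fraction` of the camera frame, per the 20% guideline."""
    return sign_area_px / (frame_w * frame_h) >= min_fraction

# Illustrative numbers: a 1920x1080 frame with the sign covering 500,000 px.
print(sign_fills_frame(500_000, 1920, 1080))  # True (about 24% of the frame)
```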
Image sample collection step 103: under different lighting or weather conditions, for example sunny noon and sunny night, rainy noon and rainy night, foggy noon and foggy night, the road sign pattern is photographed with the above camera. The shooting geometry is shown in FIG. 2, where the letter A denotes the road sign pattern. The angle between the camera's optical axis and the road surface is varied, both in the vehicle's forward direction and in the direction perpendicular to it, so that the camera takes one image sample of the road sign pattern every 5° over the range of 5° to 180° between its optical axis and the road surface.
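The sampling schedule described above (5° steps from 5° to 180°, two sweep directions, across the listed weather/time conditions) can be sketched as:

```python
conditions = ["sunny-noon", "sunny-night", "rainy-noon",
              "rainy-night", "foggy-noon", "foggy-night"]
sweep_directions = ["forward", "perpendicular"]
angles_deg = list(range(5, 181, 5))  # 5°, 10°, ..., 180°

# One image per (condition, sweep direction, angle) combination.
shots = [(c, d, a) for c in conditions
                   for d in sweep_directions
                   for a in angles_deg]
print(len(angles_deg), len(shots))  # 36 angles, 432 images in total
```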
Training sample production step 104: the position coordinates, in the image coordinate system, of the sign point of the road sign pattern in each image sample are computed and made into a label set; each image sample is then paired with its corresponding label set to form a training sample for subsequent input to the deep neural network.
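A minimal sketch of this pairing, with each image matched to a label holding its sign point's coordinates (the file names and coordinates are hypothetical; normalization to [0, 1] anticipates the 2-node output range described in the next step):

```python
# Hypothetical annotations: image file -> sign point (u, v) in pixels.
annotations = {
    "sample_0001.jpg": (960, 540),
    "sample_0002.jpg": (812, 655),
}

def make_training_samples(annotations, img_w, img_h):
    """Pair each image with its label set; coordinates are normalized
    to [0, 1] to match the network's output range."""
    return [(name, (u / img_w, v / img_h))
            for name, (u, v) in annotations.items()]

samples = make_training_samples(annotations, img_w=1920, img_h=1080)
print(samples[0])  # ('sample_0001.jpg', (0.5, 0.5))
```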
Deep neural network construction step 105: a target recognition and classification deep neural network is used, but its final classification output layer is modified into an output layer of two nodes; the values output by these two nodes are the coordinates of the sign point of the road sign pattern in the image frame. More specifically, a ResNet50 network can be used with its final classification output layer removed and, depending on the required recognition quality, replaced by two fully connected layers of 1024 or 2048 nodes each, followed by an output layer with 2 output nodes. The floating-point data output by these two nodes lie in the closed interval [0, 1]; multiplying them by the corresponding image width and height yields the pixel coordinates.
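The final decoding step described above — two node outputs in [0, 1] scaled by the image width and height — can be sketched as follows (the function name is ours; the ResNet50 backbone itself is omitted):

```python
def decode_sign_point(net_out, img_w, img_h):
    """Map the 2-node network output, each value in the closed
    interval [0, 1], to pixel coordinates (u, v)."""
    fx, fy = net_out
    if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
        raise ValueError("network outputs must lie in [0, 1]")
    return fx * img_w, fy * img_h

print(decode_sign_point((0.5, 0.25), img_w=1920, img_h=1080))  # (960.0, 270.0)
```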
Deep neural network training step 106: the aforementioned training samples are input to the modified deep neural network for training. Once training is complete, the network can be used to recognize the road sign pattern and obtain the position coordinates of its sign point.
FIG. 3 is a flowchart of a vehicle positioning method provided by the present invention using the above deep neural network training method, comprising a road sign pattern recognition step 201, a road sign pattern positioning step 202, and a vehicle positioning step 203.
Road sign pattern recognition step 201: using the trained deep neural network, the road sign pattern captured while the vehicle actually approaches the station is recognized, and the position coordinates (u, v) of its sign point P in the image coordinate system are obtained.
Road sign pattern positioning step 202: using the transformation between the image coordinate system and the world coordinate system, the coordinates (X_w, Y_w, Z_w) of the sign point P of the road sign pattern in the world coordinate system are computed, and from them the distance between the sign point P and the camera. This transformation can be described by the pinhole imaging model. A point P_w with world coordinates (X_w, Y_w, Z_w) is imaged through the lens to a point P_i with coordinates (u, v) in the two-dimensional image coordinate system; the coordinates of P_w and P_i are related by formula (1):

Z_C·[u, v, 1]^T = [[f/d_x, 0, u_0], [0, f/d_y, v_0], [0, 0, 1]]·[R | T]·[X_w, Y_w, Z_w, 1]^T    (1)
In formula (1), Z_C denotes the horizontal distance between the sign point P of the road sign pattern and the optical center of the camera lens, and d_x, d_y, u_0, v_0, and f are intrinsic parameters of the camera lens, specifically: d_x and d_y denote the physical length of one pixel along the X and Y directions of the image coordinate system, respectively; u_0 and v_0 denote the offsets, in the X and Y directions, of the image coordinate system origin from the camera coordinate system origin; and f denotes the imaging focal length of the lens.
In formula (1), R denotes the rotation between the world coordinate system and the camera coordinate system, computed by formula (2):
where α, β, and γ denote the angles through which the world coordinate system must be rotated about the X, Y, and Z axes, respectively, to bring it into coincidence with the camera coordinate system.
In formula (1), T denotes the translation between the world coordinate system and the camera coordinate system, computed by formula (3):

T = [t_x  t_y  t_z]^T    (3)
where t_x, t_y, and t_z denote the translations between the world coordinate system and the camera coordinate system along the X, Y, and Z axes, respectively.
In a specific implementation, the above parameters d_x, d_y, u_0, v_0, f, α, β, γ, t_x, t_y, and t_z can be calibrated under, but not limited to, the conditions described below.
As shown in FIG. 4 and FIG. 5, the camera is mounted on the roof at the front of the vehicle and points in the forward direction. The optical axis of the camera lens coincides with the longitudinal center line of symmetry of the vehicle body, and the sign point P of the road sign pattern on the road surface ahead of the vehicle lies on that optical axis, at a horizontal distance Z_C from the lens optical center. The origin of the camera coordinate system is set at the camera's imaging pinhole. To simplify the calculation, the world coordinate system is assumed to coincide with the camera coordinate system, with the vehicle's forward direction chosen as the positive Z axis, downward as the positive Y axis, and the vehicle's rightward direction as the positive X axis. The origin of the image coordinate system lies on the Z axis of the camera coordinate system, and the X and Y axes of the image coordinate system are parallel to the X and Y axes of the camera coordinate system, respectively.
From these conditions, the coordinates of point P are (X_w, Y_w, Z_w) in the world coordinate system and (X_c, Y_c, Z_c) in the camera coordinate system, with X_w = X_c = 0, Y_w = H, and Z_w = Z_c. After pinhole imaging, point P has coordinates (u, v) in the image coordinate system, with u = 0, u_0 = 0, and v_0 = 0. The translation parameters between the camera coordinate system and the world coordinate system are t_x = t_y = t_z = 0. Formula (1) can therefore be simplified to:

Z_c·[0, v, 1]^T = [[f/d_x, 0, 0], [0, f/d_y, 0], [0, 0, 1]]·[0, H, Z_c]^T    (4)

which is

Z_C = f·H / (d_y·v)    (5)
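Under the simplifying assumptions above, the pinhole model reduces to Z_C = f·H/(d_y·v); a sketch with illustrative values (f = 8 mm, d_y = 0.005 mm per pixel, H = 2 m are our assumptions, not values from the patent):

```python
def distance_to_sign(v_px, f=8.0, dy=0.005, H=2.0):
    """Z_C = f*H / (d_y*v): horizontal distance from the lens optical
    center to the sign point, from its image row coordinate v (pixels)."""
    return f * H / (dy * v_px)

print(distance_to_sign(400.0))  # 8.0 (metres)
```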
Vehicle positioning step 203: as shown in FIG. 6, once the horizontal distance Z_C between the sign point P of the road sign pattern and the optical center of the camera lens has been obtained, combining it with the distance L between the sign point P and the station edge on the approach side gives the horizontal distance L_CZ between the lens optical center and that edge:

L_CZ = Z_C + L    (6)
Combined with the camera's mounting position on the vehicle, the distance between the vehicle and the station edge on the approach side is then determined, thereby positioning the vehicle.
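Step 203 can be put together as follows; the quantity `front_overhang` (camera-to-front-end distance) stands in for the camera's mounting position and is an assumed name, as are the numeric values:

```python
def vehicle_distance_to_edge(z_c, L, front_overhang):
    """Distance from the vehicle's front end to the station edge on the
    approach side: L_CZ = Z_C + L (eq. (6)), minus the horizontal offset
    between the camera and the vehicle's front end."""
    l_cz = z_c + L  # camera optical center to station edge
    return l_cz - front_overhang

# Illustrative: Z_C = 8 m, sign-to-edge L = 5 m, camera 1.5 m behind the front end.
print(vehicle_distance_to_edge(z_c=8.0, L=5.0, front_overhang=1.5))  # 11.5
```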
The terms and expressions used herein are for description only, and the present invention is not limited to them. Their use does not exclude any equivalents of the features shown and described (or parts thereof), and it should be recognized that various possible modifications also fall within the scope of the claims. Other modifications, variations, and substitutions may also exist. Accordingly, the claims should be deemed to cover all such equivalents.
Likewise, it should be noted that, although the present invention has been described with reference to specific embodiments, those of ordinary skill in the art will recognize that the above embodiments merely illustrate the invention, and that various equivalent changes or substitutions may be made without departing from its spirit; therefore, changes and variations of the above embodiments that remain within the substantive spirit of the invention all fall within the scope of the claims of this application.
Claims (15)
- A deep neural network training method for road sign recognition, characterized in that the method comprises:
a road sign pattern setting step: setting a road sign pattern on the road surface on the approach side of a station, the sign point of the road sign pattern being at a distance L from the station edge on the approach side;
a camera setting step: mounting a camera on a vehicle, the optical axis of the camera coinciding with the longitudinal center line of symmetry of the vehicle body, and the distance from the optical center of the camera lens to the ground being H;
an image sample collection step: under different lighting or weather conditions, photographing the road sign pattern with the camera, varying the angle between the camera's optical axis and the road surface in the vehicle's forward direction and in the direction perpendicular to it, so that the camera takes an image sample of the road sign pattern at fixed angular intervals over a certain range of angles between its optical axis and the road surface;
a training sample production step: computing the position coordinates, in the image coordinate system, of the sign point of the road sign pattern in each image sample, producing a label set, and pairing each image sample with its corresponding label set to form training samples;
a deep neural network construction step: on the basis of a target recognition and classification deep neural network, modifying the network's final classification output layer into an output layer of 2 nodes, so as to output the position coordinates of the sign point of the road sign pattern;
a deep neural network training step: inputting the training samples into the deep neural network for training.
- The deep neural network training method for road sign recognition according to claim 1, characterized in that, in the image sample collection step, the shooting times are chosen at noon and at night on sunny days.
- The deep neural network training method for road sign recognition according to claim 1, characterized in that, in the image sample collection step, the shooting times are chosen at noon and at night on rainy days.
- The deep neural network training method for road sign recognition according to claim 1, characterized in that, in the image sample collection step, the shooting times are chosen at noon and at night on foggy days.
- The deep neural network training method for road sign recognition according to any one of claims 1 to 4, characterized in that, in the image sample collection step, the camera takes an image sample of the road sign pattern every 5° over the range of 5° to 180° between its optical axis and the road surface.
- The deep neural network training method for road sign recognition according to claim 1, characterized in that the lens parameters of the camera are chosen such that, when the road sign pattern appears entirely in the frame, it can occupy more than 20% of the frame area.
- The deep neural network training method for road sign recognition according to claim 6, characterized in that the camera is mounted on the roof at the front of the vehicle and points in the vehicle's forward direction.
- The deep neural network training method for road sign recognition according to claim 1, characterized in that the road sign pattern is a triangle, a rectangle, an arc, or another easily recognizable combination of geometric elements.
- The deep neural network training method for road sign recognition according to claim 1, characterized in that the road sign pattern is a barcode or a QR code.
- The deep neural network training method for road sign recognition according to claim 1, characterized in that the sign point of the road sign pattern is its geometric center.
- The deep neural network training method for road sign recognition according to claim 1, characterized in that the deep neural network is a ResNet50 network whose final classification output layer is replaced by two fully connected layers of 1024 nodes each, followed by an output layer with 2 output nodes.
- The deep neural network training method for road sign recognition according to claim 1, characterized in that the deep neural network is a ResNet50 network whose final classification output layer is replaced by two fully connected layers of 2048 nodes each, followed by an output layer with 2 output nodes.
- The deep neural network training method for road sign recognition according to claim 11 or 12, characterized in that the floating-point data output by the 2 nodes lie in the closed interval [0, 1], and the pixel coordinates are obtained by multiplying the output floating-point data by the corresponding image width and height.
- A vehicle positioning method using the deep neural network training method of claim 1, characterized in that the method comprises:
a road sign pattern recognition step: using the trained deep neural network to recognize the road sign pattern captured while the vehicle actually approaches the station and to obtain the position coordinates (u, v) of its sign point P in the image coordinate system;
a road sign pattern positioning step: using the transformation between the image coordinate system and the world coordinate system to compute the coordinates (X_w, Y_w, Z_w) of the sign point P in the world coordinate system, thereby obtaining the distance between the sign point P and the camera;
a vehicle positioning step: determining, from the obtained distance between the sign point P and the camera, the distance between the camera and the station edge on the approach side, and, combined with the camera's mounting position on the vehicle, determining the distance between the vehicle and the station edge on the approach side.
- The vehicle positioning method according to claim 14, characterized in that the sign point P of the road sign pattern lies on the optical axis of the camera lens;
the origin of the camera coordinate system is set at the camera's imaging pinhole, and the horizontal distance between the optical center of the camera lens and the sign point P is Z_C;
the positive Z axis of the camera coordinate system is chosen as the vehicle's forward direction, the positive Y axis as the vehicle's downward direction, and the positive X axis as the vehicle's rightward direction;
the world coordinate system coincides with the camera coordinate system;
the origin of the image coordinate system lies on the Z axis of the camera coordinate system, and the X and Y axes of the image coordinate system are parallel to the X and Y axes of the camera coordinate system, respectively;
according to the formula
Z_C = f·H / (d_y·v)
the horizontal distance Z_C between the camera and the sign point P is obtained;
then according to the formula
L_CZ = Z_C + L
the horizontal distance L_CZ between the camera and the station edge on the approach side is obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG11202103814PA (en) | 2018-10-24 | 2019-10-18 | Vehicle positioning method based on deep neural network image recognition |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811245274.1A CN109446973B (en) | 2018-10-24 | 2018-10-24 | Vehicle positioning method based on deep neural network image recognition |
CN201811245274.1 | 2018-10-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020083103A1 (en) | 2020-04-30 |
Family
ID=65547888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/111840 WO2020083103A1 (en) | 2018-10-24 | 2019-10-18 | Vehicle positioning method based on deep neural network image recognition |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN109446973B (en) |
SG (1) | SG11202103814PA (en) |
WO (1) | WO2020083103A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111914691A (en) * | 2020-07-15 | 2020-11-10 | 北京埃福瑞科技有限公司 | Rail transit vehicle positioning method and system |
CN113378735A (en) * | 2021-06-18 | 2021-09-10 | 北京东土科技股份有限公司 | Road marking line identification method and device, electronic equipment and storage medium |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109446973B (en) * | 2018-10-24 | 2021-01-22 | 中车株洲电力机车研究所有限公司 | Vehicle positioning method based on deep neural network image recognition |
CN110726414B (en) * | 2019-10-25 | 2021-07-27 | 百度在线网络技术(北京)有限公司 | Method and apparatus for outputting information |
CN111161227B (en) * | 2019-12-20 | 2022-09-06 | 成都数之联科技股份有限公司 | Target positioning method and system based on deep neural network |
CN113496594A (en) * | 2020-04-03 | 2021-10-12 | 郑州宇通客车股份有限公司 | Bus arrival control method, device and system |
CN112699823A (en) * | 2021-01-05 | 2021-04-23 | 浙江得图网络有限公司 | Fixed-point returning method for sharing electric vehicle |
CN112950922B (en) * | 2021-01-26 | 2022-06-10 | 浙江得图网络有限公司 | Fixed-point returning method for sharing electric vehicle |
WO2023019509A1 (en) * | 2021-08-19 | 2023-02-23 | 浙江吉利控股集团有限公司 | Environment matching-based vehicle localization method and apparatus, vehicle, and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN202350794U (en) * | 2011-11-29 | 2012-07-25 | 高德软件有限公司 | Navigation data acquisition device |
CN103925927A (en) * | 2014-04-18 | 2014-07-16 | 中国科学院软件研究所 | Traffic sign positioning method based on vehicle-mounted video |
CN108009518A (en) * | 2017-12-19 | 2018-05-08 | 大连理工大学 | A kind of stratification traffic mark recognition methods based on quick two points of convolutional neural networks |
US20180211120A1 (en) * | 2017-01-25 | 2018-07-26 | Ford Global Technologies, Llc | Training An Automatic Traffic Light Detection Model Using Simulated Images |
CN109446973A (en) * | 2018-10-24 | 2019-03-08 | 中车株洲电力机车研究所有限公司 | A kind of vehicle positioning method based on deep neural network image recognition |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9940553B2 (en) * | 2013-02-22 | 2018-04-10 | Microsoft Technology Licensing, Llc | Camera/object pose from predicted coordinates |
CN105718860B (en) * | 2016-01-15 | 2019-09-10 | 武汉光庭科技有限公司 | Localization method and system based on driving safety map and binocular Traffic Sign Recognition |
US9773196B2 (en) * | 2016-01-25 | 2017-09-26 | Adobe Systems Incorporated | Utilizing deep learning for automatic digital image segmentation and stylization |
CN106326858A (en) * | 2016-08-23 | 2017-01-11 | 北京航空航天大学 | Road traffic sign automatic identification and management system based on deep learning |
CN106403926B (en) * | 2016-08-30 | 2020-09-11 | 上海擎朗智能科技有限公司 | Positioning method and system |
CN106845547B (en) * | 2017-01-23 | 2018-08-14 | 重庆邮电大学 | A kind of intelligent automobile positioning and road markings identifying system and method based on camera |
CN107563419B (en) * | 2017-08-22 | 2020-09-04 | 交控科技股份有限公司 | Train positioning method combining image matching and two-dimensional code |
CN107703936A (en) * | 2017-09-22 | 2018-02-16 | 南京轻力舟智能科技有限公司 | Automatic Guided Vehicle system and dolly localization method based on convolutional neural networks |
- 2018
  - 2018-10-24: CN CN201811245274.1A (patent CN109446973B, active)
- 2019
  - 2019-10-18: SG SG11202103814PA (status unknown)
  - 2019-10-18: WO PCT/CN2019/111840 (WO2020083103A1, active, application filing)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN202350794U (en) * | 2011-11-29 | 2012-07-25 | 高德软件有限公司 | Navigation data acquisition device |
CN103925927A (en) * | 2014-04-18 | 2014-07-16 | 中国科学院软件研究所 | Traffic sign positioning method based on vehicle-mounted video |
US20180211120A1 (en) * | 2017-01-25 | 2018-07-26 | Ford Global Technologies, Llc | Training An Automatic Traffic Light Detection Model Using Simulated Images |
CN108009518A (en) * | 2017-12-19 | 2018-05-08 | 大连理工大学 | Hierarchical traffic sign recognition method based on fast binary convolutional neural networks |
CN109446973A (en) * | 2018-10-24 | 2019-03-08 | 中车株洲电力机车研究所有限公司 | Vehicle positioning method based on deep neural network image recognition |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111914691A (en) * | 2020-07-15 | 2020-11-10 | 北京埃福瑞科技有限公司 | Rail transit vehicle positioning method and system |
CN111914691B (en) * | 2020-07-15 | 2024-03-19 | 北京埃福瑞科技有限公司 | Rail transit vehicle positioning method and system |
CN113378735A (en) * | 2021-06-18 | 2021-09-10 | 北京东土科技股份有限公司 | Road marking line identification method and device, electronic equipment and storage medium |
CN113378735B (en) * | 2021-06-18 | 2023-04-07 | 北京东土科技股份有限公司 | Road marking line identification method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
SG11202103814PA (en) | 2021-05-28 |
CN109446973A (en) | 2019-03-08 |
CN109446973B (en) | 2021-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020083103A1 (en) | Vehicle positioning method based on deep neural network image recognition | |
CN106651953B (en) | Vehicle position and orientation estimation method based on traffic signs | |
CN108802785B (en) | Vehicle self-positioning method based on high-precision vector map and monocular vision sensor | |
CN106441319B (en) | Generation system and method for lane-level navigation maps for autonomous vehicles | |
CN108256413B (en) | Passable area detection method and device, storage medium and electronic equipment | |
CN106525057A (en) | Generation system for high-precision road map | |
US11625851B2 (en) | Geographic object detection apparatus and geographic object detection method | |
CN109815300B (en) | Vehicle positioning method | |
JP2021508815A (en) | Systems and methods for correcting high-definition maps based on the detection of obstructing objects | |
CN110146910A (en) | Positioning method and device based on fusion of GPS and lidar data | |
CN109212545A (en) | Multi-source target tracking and measurement system and method based on active vision | |
CN109583409A (en) | Cognitive-map-oriented intelligent vehicle positioning method and system | |
CN109767637A (en) | Method and apparatus for recognizing and processing countdown signal lights | |
CN112740225B (en) | Method and device for determining road surface elements | |
WO2022041706A1 (en) | Positioning method, positioning system, and vehicle | |
CN109515439A (en) | Automatic Pilot control method, device, system and storage medium | |
CN113673386B (en) | Method for labeling traffic lights in a prior map | |
CN112446915B (en) | Map construction method and device based on image groups | |
CN112444251B (en) | Vehicle driving position determining method and device, storage medium and computer equipment | |
CN112424568B (en) | System and method for constructing high-definition map | |
CN110135387B (en) | Rapid image recognition method based on sensor fusion | |
CN115127547B (en) | Tunnel detection vehicle positioning method based on strapdown inertial navigation system and image positioning | |
CN112446234B (en) | Position determining method and device based on data association | |
US20240353842A1 (en) | Position determination via encoded indicators in a physical environment | |
CN116630559A (en) | Construction method of lightweight road semantic map |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19876299; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 19876299; Country of ref document: EP; Kind code of ref document: A1 |