CN1387167A - Method for creating 3D dual-vision model with structural light - Google Patents
Method for creating 3D dual-vision model with structural light Download PDFInfo
- Publication number
- CN1387167A CN01118302 CN01118302A
- Authority
- CN
- China
- Prior art keywords
- network
- training
- sigma
- partiald
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
Abstract
A structured-light method for creating a three-dimensional dual-vision model, used for industrial inspection and visual guidance, is disclosed. The model is an RBF (radial basis function) neural network of three layers: the input layer has two nodes, the output layer has three nodes, and the activation function of the hidden-layer nodes is the Gaussian kernel function. Calibration-point data collected in a global coordinate system are used to train the dual-vision neural network. The advantages of the method are high precision, fast training, and no blind zone.
Description
The present invention relates to a structured-light three-dimensional dual-vision modeling method for industrial inspection and visual guidance.
Vision inspection technology, with its large measuring range, wide field of view, high measuring speed, easily extracted light-stripe images, and fairly high accuracy, has found increasingly wide use in industrial settings. Structured-light three-dimensional vision inspection is widely applied to measuring the completeness and surface flatness of workpieces; to the automatic inspection of microelectronic components (IC chips, PC boards, BGA packages); to the inspection of soft or fragile parts; to measuring the three-dimensional shape of moulds; and to robot vision guidance. Such a system is flexible, the measurement is non-contact with fast dynamic response, it can meet the short production-cycle ("takt") demands of volume manufacturing, and the whole measuring process is highly automated.
Establishing a sound vision-inspection model is a key step in structured-light three-dimensional vision inspection. At present there are two main modeling approaches: the conventional method, and the method based on a BP (back-propagation) neural network.
(1) The conventional modeling method rests on the pinhole-imaging theory of the camera, but that theory is only an approximation: an ideal pinhole would admit no light onto the camera's photosensitive surface at all. Strictly speaking, therefore, the general vision-inspection model is inexact, and the approximation degrades further away from the optical axis.
Moreover, a vision-inspection system is precise and complex. Besides the error of the mathematical model, the parameters affecting system accuracy include many system parameters and the camera's intrinsic and extrinsic parameters, such as optical-system alignment error, non-uniformity of the CCD's photosensitive elements, and video-signal conversion error. Some of these can be described by a mathematical model, while others are hard to model at all. A structured-light three-dimensional vision model built by the conventional method therefore always differs somewhat from the real system, and ignoring these small errors and disturbances lowers the measuring accuracy of the system. Most current structured-light three-dimensional vision inspection systems achieve an accuracy of about 0.5-1 mm.
(2) The vision-model approach based on a BP neural network has been reported both at home and abroad, for example by Ming Cheng et al. (Optical Engineering, Vol. 34, No. 12, 1995, pp. 3572-3576) and by Deng Wenyi et al. (Journal of Huazhong University of Science and Technology, Vol. 27, No. 1, 1999, pp. 78-80). These reports, however, all use the common BP network, and only for a single-camera inspection model. The BP network has shortcomings it cannot overcome: structurally it usually needs several hidden layers, which complicates the network and slows training; during training it suffers from local optima, converges slowly, and is inefficient; and its accuracy is often modest, the cited papers reporting only 0.31-0.34 mm.
The object of the invention is a dual-vision inspection model with few hidden layers, high precision, and fast convergence.
The technical solution of the invention is a radial basis function (RBF) neural network of three layers: the input layer has two nodes, the output layer has three nodes, and the activation function of the hidden nodes is the Gaussian kernel function

u_j = exp( -||x - C_j||^2 / (2 σ_j^2) ),  j = 1, 2, ..., N_h.

The network output is the linear combination of the hidden-node outputs,

y_i = Σ_j w_ij u_j,  i = 1, 2, 3.
Once the network structure is determined, the calibration-point data collected in the global coordinate system are used to train the dual-vision neural network. The training steps are:
(1) Start training and initialize the network structure parameters: the centre values of the Gaussian kernels, the variances σ, and the hidden-to-output weight matrix W.

(2) Set the network training parameters: the learning rate μ(k), the momentum term α(k), and the desired minimum overall training error of the network.

(3) Take the next of the N training samples from memory, in order, and feed its input part (x_1i, x_2i) to the input layer of the network.

(4) Compute the actual outputs (y_o1i, y_o2i, y_o3i) of the three output-layer nodes, form the residuals against the desired outputs (y_e1i, y_e2i, y_e3i), and add the sum of squared residuals of this sample to the accumulator SUM:

SUM = SUM + Σ_{m=1}^{3} (y_emi - y_omi)^2.

(5) Adjust each network structure parameter by gradient descent with momentum. The adjustment algorithm is as follows: define the criterion function

J(E) = (1/2) ε^2(E, k),

compute the partial derivatives of J(E) with respect to each network structure parameter, and update the parameters by the following formulas. The hidden-to-output weight matrix W is adjusted by

W(k+1) = W(k) - μ(k) ∂J/∂W + α(k) [W(k) - W(k-1)];

the hidden-layer centre matrix C by

C(k+1) = C(k) - μ(k) ∂J/∂C + α(k) [C(k) - C(k-1)];

and the hidden-layer variance matrix σ by

σ(k+1) = σ(k) - μ(k) ∂J/∂σ + α(k) [σ(k) - σ(k-1)].

(6) Check whether all N training samples have been presented once. If not, return to (3); if so, compute the overall training error of the network as the root-mean-square error

E_RMS = sqrt(SUM / N).

(7) Check whether E_RMS is below the desired value. If not, return to (3), adjusting the learning rate μ(k) and momentum term α(k) as observation suggests. If so, use the current network model and the test sample set to compute the test error of each output-layer node, for each output coordinate, by

E_x = sqrt( Σ_{i=1}^{N_1} (Y_xi - Ŷ_xi)^2 / N_1 ).

(8) Judge the test error. If it does not meet the requirement, the desired minimum overall training error can be reduced, returning to (3) to continue training or to (1) to restart training. If the requirement is met, save the network structure parameters and finish training.
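Steps (1)-(7) can be sketched in code. This is an illustration only, not the patent's implementation: the gradient expressions follow from the Gaussian-kernel network defined above, and all names (`train_rbf`, `rbf_forward`) and initialization choices are our assumptions.

```python
import numpy as np

def rbf_forward(x, C, sigma, W):
    """u_j = exp(-||x - C_j||^2 / (2 sigma_j^2)); y_i = sum_j w_ij u_j."""
    d2 = np.sum((x[:, None] - C) ** 2, axis=0)   # squared distance to each centre
    u = np.exp(-d2 / (2.0 * sigma ** 2))
    return W @ u, u, d2

def train_rbf(X, Y, n_hidden=8, mu=0.1, alpha=0.01, e_min=1e-3,
              max_epochs=300, seed=0):
    """Sketch of training steps (1)-(7); ranges below are assumed, not given."""
    rng = np.random.default_rng(seed)
    C = rng.uniform(-1.0, 1.0, (2, n_hidden))    # (1) initialize centres,
    sigma = rng.uniform(0.5, 1.5, n_hidden)      #     variances and
    W = rng.uniform(-0.1, 0.1, (3, n_hidden))    #     output weights
    prev = {"W": W.copy(), "C": C.copy(), "s": sigma.copy()}
    e_rms = np.inf
    for _ in range(max_epochs):
        SUM = 0.0
        for x, y_e in zip(X, Y):                 # (3) next training sample
            y_o, u, d2 = rbf_forward(x, C, sigma, W)
            eps = y_e - y_o                      # (4) residual; accumulate SUM
            SUM += float(eps @ eps)
            # (5) gradients of J = 0.5 * ||eps||^2 w.r.t. W, C and sigma
            gW = -np.outer(eps, u)
            wu = (W.T @ eps) * u
            gC = -(x[:, None] - C) / sigma ** 2 * wu
            gs = -d2 / sigma ** 3 * wu
            for name, P, g in (("W", W, gW), ("C", C, gC), ("s", sigma, gs)):
                new = P - mu * g + alpha * (P - prev[name])  # momentum update
                prev[name] = P.copy()
                P[...] = new
        e_rms = np.sqrt(SUM / len(X))            # (6) overall training error
        if e_rms < e_min:                        # (7) stop when small enough
            break
    return C, sigma, W, e_rms
```

In practice one would also guard against the variances crossing zero and schedule μ(k) and α(k) as the text suggests; both are omitted here for brevity.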
The present invention is the first to build a structured-light three-dimensional dual-vision model on an RBF neural network. It not only overcomes the deficiencies of the conventional modeling method, it also solves the blind-zone problem of a single-camera inspection system. Compared with the BP network it converges markedly faster, avoids the local-minimum problem, has a globally optimal approximation property, and raises the modeling accuracy. Because the network is trained by gradient descent with momentum, the modeling converges quickly and stably, the training precision is high, and the training method is easy to carry out.
Fig. 1 is the network structure of the invention;
Fig. 2 is the training flow chart of the invention;
Fig. 3 is a schematic diagram of the dual-vision calibration-point generating device of the invention.
The network has three layers. The input layer has two nodes, representing the two-dimensional image coordinates (x_1, x_2); the output layer has three nodes, representing the three-dimensional object coordinates (y_1, y_2, y_3). The activation function of the hidden nodes is the Gaussian kernel function, as shown below:

u_j = exp( -||x - C_j||^2 / (2 σ_j^2) ),  j = 1, 2, ..., N_h,

where u_j is the output of the j-th hidden node, x = (x_1, x_2)^T is the input sample, i.e. the two-dimensional image coordinates, C_j is the centre of the Gaussian function, σ_j is its variance, and N_h is the number of hidden nodes.

The network output is the linear combination of the hidden-node outputs,

y_i = Σ_{j=1}^{N_h} w_ij u_j,  i = 1, 2, 3,

where W = [w_ij] is the weight matrix between the hidden layer and the output layer, and U = (u_1, ..., u_{N_h})^T is the output vector of the hidden layer.
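A minimal sketch of this forward pass (our illustration; the patent defines only the mathematics, and all dimensions and values below are example assumptions):

```python
import numpy as np

def gaussian_hidden(x, C, sigma):
    """u_j = exp(-||x - C_j||^2 / (2 sigma_j^2)) for each hidden node j."""
    d2 = np.sum((x[:, None] - C) ** 2, axis=0)   # squared distance to each centre
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Example dimensions: N_h = 4 hidden nodes, 2 inputs, 3 outputs.
C = np.array([[0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])             # centres C_j, shape (2, N_h)
sigma = np.full(4, 0.7)                          # variances sigma_j
W = np.full((3, 4), 0.25)                        # hidden-to-output weights w_ij

x = np.array([0.0, 0.0])                         # image coordinates (x1, x2)
u = gaussian_hidden(x, C, sigma)                 # hidden-layer output vector U
y = W @ u                                        # y_i = sum_j w_ij u_j
```

Because the input here coincides with the first centre, the first hidden node outputs exactly 1, and the three output coordinates are each a weighted sum of the four kernel responses.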
Once the structure of the dual-vision RBF network model is determined, the next step is to train the dual-vision RBF neural network with the calibration-point data collected in the global coordinate system, so as to establish the dual-vision model.
The training procedure of the RBF-based dual-vision model is shown in Fig. 2; the detailed steps are as follows:

(1) Start training and initialize the network structure parameters: the centre values of the Gaussian kernels, the variances σ, and the hidden-to-output weight matrix W.

(2) Set the network training parameters: the learning rate μ(k), the momentum term α(k), and the desired minimum overall training error of the network.

(3) Take the next of the N training samples from memory, in order, and feed its input part (x_1i, x_2i) to the input layer of the network.

(4) Compute the actual outputs (y_o1i, y_o2i, y_o3i) of the three output-layer nodes, form the residuals against the desired outputs (y_e1i, y_e2i, y_e3i), and add the sum of squared residuals of this sample to the accumulator SUM:

SUM = SUM + Σ_{m=1}^{3} (y_emi - y_omi)^2.
(5) Adjust each network structure parameter by gradient descent with momentum. The adjustment algorithm is as follows.

Define the criterion function

J(E) = (1/2) ε^2(E, k),  ε(E, k) = Y(k) - Ŷ(E, k),

where Y(k) is the desired output, Ŷ(E, k) is the actual output of the network, E is the vector of all network parameters (hidden-layer centres, hidden-layer variances, and output weights), and ε(E, k) is the residual of Ŷ(E, k) with respect to Y(k).

Compute the partial derivatives of J(E) with respect to each network structure parameter and update the parameters by the following formulas. The hidden-to-output weight matrix W is adjusted by

W(k+1) = W(k) - μ(k) ∂J/∂W + α(k) [W(k) - W(k-1)];

the hidden-layer centre matrix C by

C(k+1) = C(k) - μ(k) ∂J/∂C + α(k) [C(k) - C(k-1)];

and the hidden-layer variance matrix σ by

σ(k+1) = σ(k) - μ(k) ∂J/∂σ + α(k) [σ(k) - σ(k-1)].
Here k is the adjustment count, μ(k) is the learning rate, and α(k) is the forgetting factor at time k, also called the momentum or damping term. The (k+1)-th value of a parameter is obtained from its k-th value: the k-th value, plus μ(k) times the negative gradient of the criterion function with respect to that parameter, plus α(k) times the difference between the k-th and (k-1)-th parameter values. μ(k) controls the training speed: the larger μ(k), the larger the adjustment step, and conversely. α(k) acts like a damping force: when the training error of the network is falling rapidly it steadies the convergence, and when the error is rising rapidly it slows the divergence more and more. The convergence process of the network therefore avoids excessive oscillation, which helps the network converge smoothly.
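As a toy illustration of these two roles (ours, not from the patent), apply the same update rule to the scalar criterion J(p) = p^2 / 2, whose gradient is simply p:

```python
def momentum_step(p, p_prev, grad, mu, alpha):
    # p(k+1) = p(k) - mu(k) * dJ/dp + alpha(k) * (p(k) - p(k-1))
    return p - mu * grad + alpha * (p - p_prev)

# Minimize J(p) = p^2 / 2 (gradient dJ/dp = p) starting from p = 1.0.
p_prev, p = 1.0, 1.0
history = [p]
for _ in range(50):
    p, p_prev = momentum_step(p, p_prev, p, mu=0.1, alpha=0.05), p
    history.append(p)
# history decays steadily toward 0; mu sets the step size, while alpha
# carries over part of the previous step, smoothing the trajectory.
```

With the chosen μ and α the iterates shrink geometrically; a larger α would make the trajectory smoother but slower to react, which matches the damping behaviour described above.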
(6) Check whether all N training samples have been presented once. If not, return to (3); if so, compute the overall training error of the network as the root-mean-square error

E_RMS = sqrt(SUM / N).
(7) Check whether E_RMS is below the desired value. If not, return to (3), adjusting the learning rate μ(k) and momentum term α(k) as observation suggests. If so, use the current network model and the test sample set to compute the test error of each output-layer node. The test error is computed, for each output coordinate, by

E_x = sqrt( Σ_{i=1}^{N_1} (Y_xi - Ŷ_xi)^2 / N_1 ),

where N_1 is the total number of test samples, Y_xi is the desired x coordinate of the i-th test sample, and Ŷ_xi is the x coordinate actually output by the neural network; the errors of the other coordinates follow by analogy.

(8) The test error is the final precision of the network model. If it does not meet the requirement, the desired minimum overall training error can be reduced, returning to (3) to continue training or to (1) to restart training. If the requirement is met, save the network structure parameters and finish training.
The calibration-point data used for network training, comprising the two-dimensional image coordinates (x_1, x_2) and the three-dimensional object coordinates (y_1, y_2, y_3), are produced by a high-precision three-dimensional dual-vision calibration-point generating device, shown in Fig. 3. In the figure, 1 and 2 are laser projectors; 3 and 4 are CCD cameras, which acquire the scene images; 5 is a two-way photoelectric aiming device; 6 is a three-dimensional translation stage; 7 is an image-acquisition card; and 8 is a computer, for control and data processing. (The technical content of this device is disclosed in patent application No. 01115655.4 and is not described in detail here.) For the left-side vision inspection system, the device is moved in steps of 4 mm in the x and z directions to collect calibration points over a 60 mm x 60 mm area of the light plane, 256 points in all, of which 64 serve as test samples and 192 as training samples. For the right-side vision inspection system, the device is moved in the same way, again collecting 256 calibration points over a 60 mm x 60 mm area, of which 64 are test samples and 192 training samples.
For this RBF-based structured-light dual-vision model, the training and test samples obtained above were used to train the left and right vision inspection systems fully, following the steps above, and the optimal structured-light three-dimensional dual-vision RBF neural network models finally obtained are as follows:

(1) Structure parameters of the best RBF neural network model for the right-side vision inspection system (network test precision 0.080 mm).

A three-layer RBF network is used, with 2 input nodes, 16 hidden nodes, and 3 output nodes. The adjustable parameter arrays are: 1) the hidden-layer variance matrix σ_{16×1} = [σ_0, σ_1, ..., σ_15]^T; 2) the hidden-layer centre matrix C_{2×16}; 3) the hidden-to-output weight matrix W_{3×16}.

The network training parameters, the initial values of the structure parameters, and the final trained values are as follows. Learning rate μ: initial value 0.1, final value 0.003. Momentum term α: initial value 0.01, final value 0. The hidden-to-output weights W are initialized in the interval [-0.1, 0.1], the hidden-layer variances σ in [-1, 1], and the hidden-layer centres in [-1, 1]. After training the network to 13500 iterations, the model structure parameters obtained are:

1) hidden-layer variance matrix σ_{16×1} = [2.525, 0.665, -0.914, 2.369, -1.679, -0.160, 1.987, -0.057, -0.067, 1.443, -1.125, -1.395, 1.415, -0.592, 0.629, -0.327]^T; 2) hidden-layer centre matrix C_{2×16}; 3) hidden-to-output weight matrix W_{3×16}.
(2) Structure parameters of the best RBF neural network model for the left-side vision inspection system (network test precision 0.081 mm).

A three-layer RBF network is used, with 2 input nodes, 20 hidden nodes, and 3 output nodes. The adjustable parameter arrays are: 1) the hidden-layer variance matrix σ_{20×1} = [σ_0, σ_1, ..., σ_19]^T; 2) the hidden-layer centre matrix C_{2×20}; 3) the hidden-to-output weight matrix W_{3×20}.

The network training parameters, the initial values of the structure parameters, and the final trained values are as follows. Learning rate μ: initial value 0.1, final value 0.003. Momentum term α: initial value 0.01, final value 0. The hidden-to-output weights W are initialized in the interval [-0.1, 0.1], the hidden-layer variances σ in [-1, 1], and the hidden-layer centres in [-1, 1]. After training the network to 15800 iterations, the model structure parameters obtained are:

1) hidden-layer variance matrix σ_{20×1} = [-0.493, -1.376, -2.769, -3.075, -0.370, -3.025, -0.069, -0.013, 2.464, ...
Claims (2)
1. A method for creating a structured-light three-dimensional dual-vision model, characterized in that a radial basis function (RBF) neural network of three layers is used: the input layer has two nodes, the output layer has three nodes, and the activation function of the hidden nodes is the Gaussian kernel function

u_j = exp( -||x - C_j||^2 / (2 σ_j^2) ),  j = 1, 2, ..., N_h;

the network output is the linear combination of the hidden-node outputs,

y_i = Σ_j w_ij u_j,  i = 1, 2, 3;

and, once the network structure is determined, the calibration-point data collected in the global coordinate system are used to train the dual-vision neural network.
2. The method for creating a structured-light three-dimensional dual-vision model according to claim 1, characterized in that the step of training the dual-vision neural network with the calibration-point data is as follows:

(1) start training and initialize the network structure parameters: the centre values of the Gaussian kernels, the variances σ, and the hidden-to-output weight matrix W;

(2) set the network training parameters: the learning rate μ(k), the momentum term α(k), and the desired minimum overall training error of the network;

(3) take the next of the N training samples from memory, in order, and feed its input part (x_1i, x_2i) to the input layer of the network;

(4) compute the actual outputs (y_o1i, y_o2i, y_o3i) of the three output-layer nodes, form the residuals against the desired outputs (y_e1i, y_e2i, y_e3i), and add the sum of squared residuals of this sample to the accumulator SUM:

SUM = SUM + Σ_{m=1}^{3} (y_emi - y_omi)^2;

(5) adjust each network structure parameter by gradient descent with momentum: compute the partial derivatives of the criterion function J(E) with respect to each network structure parameter and update the parameters by

W(k+1) = W(k) - μ(k) ∂J/∂W + α(k) [W(k) - W(k-1)],
C(k+1) = C(k) - μ(k) ∂J/∂C + α(k) [C(k) - C(k-1)],
σ(k+1) = σ(k) - μ(k) ∂J/∂σ + α(k) [σ(k) - σ(k-1)];

(6) check whether all N training samples have been presented once; if not, return to (3); if so, compute the overall training error of the network as the root-mean-square error

E_RMS = sqrt(SUM / N);

(7) check whether E_RMS is below the desired value; if not, return to (3), adjusting the learning rate μ(k) and momentum term α(k) as observation suggests; if so, use the current network model and the test sample set to compute the test error of each output-layer node by

E_x = sqrt( Σ_{i=1}^{N_1} (Y_xi - Ŷ_xi)^2 / N_1 );

(8) judge the test error: if it does not meet the requirement, the desired minimum overall training error can be reduced, returning to (3) to continue training or to (1) to restart training; if the requirement is met, save the network structure parameters and finish training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB011183020A CN1168045C (en) | 2001-05-22 | 2001-05-22 | Method for creating 3D dual-vision model with structural light |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1387167A true CN1387167A (en) | 2002-12-25 |
CN1168045C CN1168045C (en) | 2004-09-22 |
Family
ID=4663088
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB011183020A Expired - Fee Related CN1168045C (en) | 2001-05-22 | 2001-05-22 | Method for creating 3D dual-vision model with structural light |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1168045C (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN100367310C (en) * | 2004-04-08 | 2008-02-06 | 复旦大学 | Wild size variable hierarchical network model of retina ganglion cell sensing and its algorithm
CN101187649B (en) * | 2007-12-12 | 2010-04-07 | 哈尔滨工业大学 | Heterogeneous material diffusion welding interface defect automatic identification method
CN105319655A (en) * | 2014-06-30 | 2016-02-10 | 北京世维通科技发展有限公司 | Automatic coupling method and system for optical integrated chip and optical fiber assembly
CN105319655B (en) * | 2014-06-30 | 2017-02-01 | 北京世维通科技发展有限公司 | Automatic coupling method and system for optical integrated chip and optical fiber assembly
CN111383281A (en) * | 2018-12-29 | 2020-07-07 | 天津大学青岛海洋技术研究院 | Video camera calibration method based on RBF neural network
Also Published As
Publication number | Publication date |
---|---|
CN1168045C (en) | 2004-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3715780B1 (en) | Method for the establishment and the spatial calibration of a 3d measurement model based on a 1d displacement sensor | |
CN109323650B (en) | Unified method for measuring coordinate system by visual image sensor and light spot distance measuring sensor in measuring system | |
CN110031829B (en) | Target accurate distance measurement method based on monocular vision | |
CN101907448B (en) | Depth measurement method based on binocular three-dimensional vision | |
CN1975324A (en) | Double-sensor laser visual measuring system calibrating method | |
Giovannetti et al. | Uncertainty assessment of coupled Digital Image Correlation and Particle Image Velocimetry for fluid-structure interaction wind tunnel experiments | |
Hsu et al. | A triplane video-based experimental system for studying axisymmetrically inflated biomembranes | |
CN111220120B (en) | Moving platform binocular ranging self-calibration method and device | |
CN113446957B (en) | Three-dimensional contour measuring method and device based on neural network calibration and speckle tracking | |
CN111823221A (en) | Robot polishing method based on multiple sensors | |
CN107329233A (en) | A kind of droplet type PCR instrument Atomatic focusing method based on neutral net | |
CN1387167A (en) | Method for creating 3D dual-vision model with structural light | |
CN110940358A (en) | Laser radar and inertial navigation combined calibration device and calibration method | |
CN112562006B (en) | Large-view-field camera calibration method based on reinforcement learning | |
CN112525106B (en) | Three-phase machine cooperative laser-based 3D detection method and device | |
CN113702384A (en) | Surface defect detection device, detection method and calibration method for rotary component | |
CN111709998B (en) | ELM space registration model method for TOF camera depth data measurement error correction | |
CN111553954A (en) | Direct method monocular SLAM-based online luminosity calibration method | |
CN1916566A (en) | System for testing life buoy for military use on controlling pose of human body in water | |
Tanyeri | Image processing based tactile tactical sensor development and sensitivity determination to extract the 3D surface topography of objects | |
CN110966937A (en) | Large member three-dimensional configuration splicing method based on laser vision sensing | |
CN101672631A (en) | Surface form deviation measurement method of flat optical element | |
CN114998444A (en) | Robot high-precision pose measurement system based on two-channel network | |
Rajaei et al. | Vision-based large-field measurements of bridge deformations | |
CN106556350A (en) | A kind of measuring method and microscope of microslide curved surface height value |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C19 | Lapse of patent right due to non-payment of the annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |