CN110334701B - Data acquisition method based on deep learning and multi-vision in digital twin environment - Google Patents

Data acquisition method based on deep learning and multi-vision in digital twin environment

Info

Publication number
CN110334701B
Authority
CN
China
Prior art keywords
deep learning
data
radius
circle
mark point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910623996.4A
Other languages
Chinese (zh)
Other versions
CN110334701A (en)
Inventor
李浩
刘根
王昊琪
文笑雨
乔东平
罗国富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN201910623996.4A priority Critical patent/CN110334701B/en
Publication of CN110334701A publication Critical patent/CN110334701A/en
Application granted granted Critical
Publication of CN110334701B publication Critical patent/CN110334701B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a data acquisition method based on deep learning and multi-vision in a digital twin environment, which comprises the following steps: S1, setting spherical marker points that are clearly distinguishable from the environment background; S2, obtaining the position coordinates of the marker point's sphere center and its radius in the video image; S3, constructing and training a deep learning model; and S4, attaching the marker point to the target object to be positioned and locating the marker point in space using the model of step S3, thereby positioning the target object. The method can acquire position and posture data of various target objects in a digital twin environment and has strong general applicability. With the aid of the marker points, the complexity of visual image analysis and processing is reduced, the identification and positioning process is simpler, and the method is more efficient and more reliable. Because deep learning is used to locate the marker points, the positioning error caused by camera image distortion is minimized, and the method is suitable for various camera numbers and layouts.

Description

Data acquisition method based on deep learning and multi-vision in digital twin environment
Technical Field
The invention belongs to the technical field of digital acquisition and particularly relates to a data acquisition method in a digital twin environment based on deep learning and multi-view vision.
Background
Digital twin technology must simulate physical equipment with high fidelity, and it must acquire the various state data of the physical equipment in real time so that the real-time state of the simulation model stays consistent with that of the physical equipment. Digital twin technology relies on sensing and control technology and their comprehensive integration. The mechanical, electrical, thermodynamic and motion state information of the physical equipment all need to be acquired by means of sensing technology.
In the construction of a digital twin system, a 3D model of the physical equipment is first built in equal proportion, and then higher-level mechanical, electrical and energy models of the physical equipment are built. To keep the 3D model in real-time correspondence with the state of the physical equipment, the position, posture and motion information of each part of the physical equipment must be collected in real time. There are various technical means for collecting this position, posture and motion information; using sensors is a commonly employed method of data acquisition.
Different sensor configuration schemes must be designed for different data acquisition tasks. For example, an angle sensor is needed to collect angle data of equipment, while laser and inertial sensors are needed to collect its operating motion information. The disadvantages are that the original equipment must be modified and the sensor system must be configured specifically for each piece of equipment, so the approach is not very universal.
Vision is one of the most important channels for obtaining information. The appearance, posture, running state and motion information of equipment can all be obtained by visual methods. A key issue in obtaining this information visually is locating and tracking the target. Whether a robot grasps a target object, an unmanned aerial vehicle determines its own pose and position or that of a target, or raw materials are moved on a production line, the problems of target identification and spatial positioning must be solved.
Existing methods for identifying and positioning objects mainly include machine vision, ultrasonic positioning, and electromagnetic-wave identification and positioning. Ultrasonic positioning is affected by uncertain factors such as temperature, humidity and air pressure and therefore cannot position accurately. Identifying an object by electromagnetic waves is represented by radio-frequency tag technology, which can identify an object by its tag but cannot locate it. Positioning a target with electromagnetic waves can be realized by imitating the radar principle, but such a system is too complex and bulky, and since the wavelengths currently usable for radar are all above the centimeter level, electromagnetic-wave positioning precision cannot reach below 1 centimeter.
Visual methods can achieve higher positioning accuracy. One way to identify and locate a target visually is to extract the target's features directly. The disadvantages of this approach are a complex image processing procedure and long computation time, and a different algorithm must be designed for each different target.
The other way is to rely on artificial markers. A deep learning method is used to train recognition of the marker so that the algorithm can identify it against different backgrounds, but the performance of such an algorithm drops sharply when the background environment changes, and the deep learning training workload is large.
Binocular stereo vision based on the parallax principle is a method of acquiring three-dimensional geometric information of an object from multiple images. Two digital images of the target object are obtained from different angles by two cameras (or by one camera shooting from two different positions), and the three-dimensional geometric information of the target is recovered by computing the parallax between the two images. This method first requires measuring camera parameters such as the focal length, imaging size and camera center point position, as well as the distance and angle between the two cameras; because of lens imperfections, the image of an object is distorted to some degree, and the calculation must be corrected to handle this distortion.
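For reference, the underlying parallax relation (a textbook result, not a limitation of this disclosure) is that for two parallel cameras with focal length f and baseline distance B, a point observed with horizontal disparity d = x_left - x_right lies at depth Z = f·B/d; since f, B and d all enter this expression directly, calibration errors and lens distortion propagate straight into the recovered depth, which is why the correction mentioned above is needed.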
Disclosure of Invention
In view of the prior art, the technical problems to be solved by the invention are as follows: sensor-based acquisition schemes are complex to install and lack universality; algorithms that directly identify and position a target object with machine vision are complex; deep learning algorithms that indirectly identify and position a target via marker points become unstable when the background changes; and positioning a target with binocular or multi-view vision requires complicated calibration and suffers positioning errors caused by image distortion. The invention therefore provides a data acquisition method based on deep learning and multi-view vision in a digital twin environment.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a data acquisition method based on deep learning and multi-vision in a digital twin environment comprises the following steps:
and S1, setting spherical mark points with larger distinction degree with the environment background.
The spherical mark points have specific colors with larger distinction degree with the environment background.
And S2, obtaining the position coordinates and the radius of the spherical center of the mark point in the video image.
S2.1, arranging at least two cameras in the environment, wherein the cameras are distributed at different positions of the environment, and visual bodies of the cameras are crossed.
S2.2, obtaining the video image f of each camerai(x, y), wherein i is the ith camera, x is the horizontal coordinate of the video image pixel, and y is the vertical coordinate of the video image pixel;
s2.3, obtaining a video image f by using an edge detection methodi(x, y) edge image Fi(x,y);
The image edge has a large difference with surrounding pixels, and the maximum value of the first derivative is obtained by calculating the derivative number of the image data in the x direction and the y direction, namely the image edge is obtained, namely the zero point of the second derivative.
S2.3.1, using L aplace operator to the video image fi(x, y) derivation to obtain a first derivative f of the video imagei′(x,y):
fi′(x,y)=-4fi(x,y)+fi(x-1,y)+fi(x+1,y)+fi(x,y-1)+fi(x,y-1);
S2.3.2, extracting the coefficients of f_i'(x, y) to obtain the Laplace operator template:
0  1  0
1 -4  1
0  1  0
S2.3.3, convolving the Laplace operator template with the video image f_i(x, y) to obtain the pixel values of each edge image F_i(x, y);
The Laplace operator template is overlaid on the image f_i(x, y); each covered pixel of f_i(x, y) is multiplied by the template value at the corresponding position, and the products are summed; the resulting value is assigned to the pixel under the template center and is the pixel value of the edge image F_i(x, y);
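As an illustrative sketch only (OpenCV and NumPy are assumed here; the patent does not specify an implementation), step S2.3 can be realized by convolving each camera frame with the Laplace template described above:

import cv2
import numpy as np

def edge_image(frame_gray):
    """Convolve a grayscale frame f_i(x, y) with the 4-neighbour Laplace
    template to obtain the edge image F_i(x, y), as described in step S2.3."""
    laplace_template = np.array([[0,  1, 0],
                                 [1, -4, 1],
                                 [0,  1, 0]], dtype=np.float32)
    # Slide the template over the image, multiply the covered pixels by the
    # template values and sum them (step S2.3.3); the sum becomes the pixel
    # value under the template centre.
    response = cv2.filter2D(frame_gray.astype(np.float32), -1, laplace_template)
    return cv2.convertScaleAbs(response)  # scale back to 8-bit for later steps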
S2.4, finding all circles in the edge image F_i(x, y) with a Hough circle detection algorithm;
S2.4.1, given the general equation of a circle:
(x - a)² + (y - b)² = r²
where (a, b) are the coordinates of the circle center and r is the radius of the circle;
S2.4.2, each pixel point of the edge image F_i(x, y) in the x-y pixel coordinate system is mapped to a circle in the a-b coordinate system; the equation of the mapped circle in the a-b coordinate system is (a - x)² + (b - y)² = r², where (x, y) acts as the circle center and the radius r is set to a preset value; the point at which circles in the a-b coordinate system intersect is a possible circle center position;
S2.4.3, adjusting the value of the radius r and repeating step S2.4.2 until the circle center positions for all radii are found, thereby obtaining all circles in the edge image F_i(x, y);
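The Hough circle search of step S2.4 can be sketched as follows; the use of OpenCV's built-in gradient Hough transform and the threshold values shown are assumptions, not part of the patent:

import cv2

def find_circles(gray_image):
    """Hough circle search (step S2.4) over one 8-bit grayscale camera image.
    Returns a list of (a, b, r): candidate circle centres and radii."""
    circles = cv2.HoughCircles(gray_image, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=100, param2=30, minRadius=5, maxRadius=120)
    if circles is None:
        return []
    return [(float(a), float(b), float(r)) for a, b, r in circles[0]]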
S2.5, performing histogram statistics on each obtained circular area of the video image f_i(x, y) to find the circle closest in color to the marker point, and obtaining the circle center and radius data of that circle;
S2.5.1, converting the video image f_i(x, y) into a grayscale image;
S2.5.2, dividing the grayscale value range into three intervals;
In the invention the intervals are: [0, 85), [85, 170), [170, 255];
S2.5.3, scanning each circular area and counting the frequency with which its pixel values fall into the three intervals;
S2.5.4, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image;
The pixel-value frequency distribution of each circular area is compared with the color frequency distribution of the marker point; circles with large differences are eliminated, and the circle center and radius data of the circle with high frequency-distribution similarity are recorded; these are the circle center and radius of the marker point;
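A minimal sketch of the three-interval histogram test of step S2.5, assuming the reference frequency distribution marker_freq of the marker color has been measured beforehand (function and variable names are illustrative, not from the patent):

import numpy as np

GRAY_BINS = [0, 85, 170, 256]   # the three intervals [0,85), [85,170), [170,255]

def interval_frequencies(gray_roi):
    """Frequency with which the pixels of one circular area fall into the
    three grayscale intervals (steps S2.5.2 and S2.5.3)."""
    counts, _ = np.histogram(gray_roi, bins=GRAY_BINS)
    return counts / max(gray_roi.size, 1)

def pick_marker_circle(gray_image, circles, marker_freq, max_diff=0.5):
    """Keep the circle whose distribution is closest to the marker colour
    distribution marker_freq, rejecting circles that differ too much (S2.5.4)."""
    best = None
    for a, b, r in circles:
        x0, x1 = int(max(a - r, 0)), int(a + r)
        y0, y1 = int(max(b - r, 0)), int(b + r)
        roi = gray_image[y0:y1, x0:x1]
        if roi.size == 0:
            continue
        diff = float(np.abs(interval_frequencies(roi) - marker_freq).sum())
        if diff < max_diff and (best is None or diff < best[0]):
            best = (diff, (a, b, r))
    return None if best is None else best[1]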
S3, constructing and training a deep learning model;
Based on a deep learning algorithm framework, the two-dimensional coordinates of the marker point's sphere center and the radius of the marker point in the image acquired by each camera are used as input data, and the three-dimensional spatial coordinates of the marker point are used as output data; the deep learning program is trained until it can accurately locate the marker point.
The method specifically comprises the following steps: S3.1, obtaining sample data;
The sample data comprise input data and output data; the input data comprise the two-dimensional coordinates of the marker point and the radius of the marker point, and the output data are the spatial coordinates of the marker point;
S3.1.1, holding the marker point with a robot arm and traversing each position of the viewing volume;
S3.1.2, obtaining the circle center coordinates and radius of the marker point according to step S2 and using them as input data; the circle center coordinates of the marker point are its two-dimensional coordinates;
S3.1.3, pairing the circle center coordinates and radius of the marker point with the spatial coordinates of the marker point given by the robot arm; the spatial coordinates of the marker point are the output data;
S3.2, constructing the deep learning model;
S3.2.1, designing a neural network structure with one input layer, one output layer and two hidden layers; the number of input layer nodes equals the number of cameras multiplied by the number of input parameters; the input parameters number 3, namely the circle center coordinates (x, y) and the radius r.
The number of output layer nodes is the number of output parameters, which is 3, namely the spatial coordinates (X, Y, Z) of the marker point;
The number of hidden layer nodes is set to a fixed value, which is 50 in the invention.
S3.2.2, optimizing the deep learning model;
A dropout mechanism is introduced into the deep learning model, and during training a portion of the hidden layer nodes are deleted with a certain probability P;
S3.2.2.1, obtaining the activation function of the neural network structure:
Figure BDA0002126461040000061
S3.2.2.2, each node adds an offset value to the weighted sum of its input data:
Figure BDA0002126461040000062
S3.2.2.3, combining steps S3.2.2.1 and S3.2.2.2 gives the node output:
Figure BDA0002126461040000063
where j denotes the j-th layer of the neural network structure and m_j is a mask parameter following a Bernoulli probability distribution; the value of m_j depends on the probability P.
S3.2.2.4, deleting hidden layer nodes.
When the mask parameter m_j is 0, the node output is 0 and the current node is deleted;
S3.2.2.5, obtaining the optimized deep learning model, whose final output is:
the spatial three-dimensional coordinates (X, Y, Z);
where X = G1(W, B, M), Y = G2(W, B, M), Z = G3(W, B, M); W is the weight vector, B is the bias vector, and M is the mask vector.
S3.3, training the optimized deep learning model.
S3.3.1, dividing the sample data obtained in step S3.1 into training data and test data;
The training data are 80% of the sample data and are recorded as:
Figure BDA0002126461040000064
The test data are 20% of the sample data and are recorded as:
Figure BDA0002126461040000065
Figure BDA0002126461040000071
S3.3.2, given the training error calculation formula:
Figure BDA0002126461040000072
Figure BDA0002126461040000073
Figure BDA0002126461040000074
S3.3.3, given the gradient calculation formula:
Figure BDA0002126461040000075
Figure BDA0002126461040000076
Figure BDA0002126461040000077
Figure BDA0002126461040000078
S3.3.4, substituting the training data into the optimized deep learning model, training and iterating to obtain W and B.
S3.3.5, the test data is substituted into the deep learning model for verification.
S4, attaching the marker point to the target object to be positioned, and locating the marker point in space with the optimized deep learning model constructed in step S3, thereby positioning the target object.
The beneficial effects of the invention are as follows: the invention applies machine vision to digital twin technology and acquires real-time data such as the position, motion and posture of equipment by visual means. The machine vision method is improved by introducing artificial markers, which simplifies the identification and positioning of the target object. Conventional image processing algorithms, applied in combination, achieve a high recognition rate of the marker points, and the data they produce are relatively simple, so training of the deep learning framework is faster and more effective. The deep learning framework is used to solve the marker-point positioning problem; because all points of the space are traversed during training, positioning errors caused by image distortion are overcome. Moreover, because a training-based learning mode is adopted, the positions and number of cameras can be arranged arbitrarily, and training succeeds as long as the viewing volumes of the cameras overlap.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the system of the present invention.
FIG. 2 is a schematic diagram of a deep learning neural network of the present invention.
Fig. 3 is a schematic diagram of the multi-view visual positioning structure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, shall fall within the scope of the present invention.
A data acquisition method based on deep learning and multi-vision in a digital twin environment is shown in figure 1, and comprises the following steps:
S1, setting a spherical marker point that is clearly distinguishable from the environment background, as shown in FIG. 3.
The spherical marker point has a specific color with a high degree of distinction from the environment background.
S2, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image.
S2.1, arranging at least two cameras in the environment; as shown in FIG. 3, the number of cameras is 4, the cameras are distributed at different positions of the environment, and their viewing volumes intersect. The rectangle in front of each camera in FIG. 3 is that camera's image range, and the marker point is imaged as a point within this range.
S2.2, obtaining the video image f_i(x, y) of each camera, where i denotes the i-th camera, x is the horizontal pixel coordinate of the video image, and y is the vertical pixel coordinate.
S2.3, obtaining the edge image F_i(x, y) of the video image f_i(x, y) by an edge detection method.
An image edge differs strongly from the surrounding pixels; by differentiating the image data in the x and y directions, the maxima of the first derivative (equivalently, the zero crossings of the second derivative) give the image edges.
S2.3.1, applying the Laplace operator to the video image f_i(x, y) to obtain the derivative image f_i'(x, y):
f_i'(x, y) = -4·f_i(x, y) + f_i(x-1, y) + f_i(x+1, y) + f_i(x, y-1) + f_i(x, y+1);
S2.3.2, extracting the coefficients of f_i'(x, y) to obtain the Laplace operator template:
0  1  0
1 -4  1
0  1  0
S2.3.3, convolving the Laplace operator template with the video image f_i(x, y) to obtain the pixel values of each edge image F_i(x, y).
The Laplace operator template is overlaid on the image f_i(x, y); each covered pixel of f_i(x, y) is multiplied by the template value at the corresponding position, and the products are summed; the resulting value is assigned to the pixel under the template center and is the pixel value of the edge image F_i(x, y);
S2.4, finding all circles in the edge image F_i(x, y) with a Hough circle detection algorithm;
S2.4.1, given the general equation of a circle:
(x - a)² + (y - b)² = r²
where (a, b) are the coordinates of the circle center and r is the radius of the circle;
S2.4.2, each pixel point of the edge image F_i(x, y) in the x-y pixel coordinate system is mapped to a circle in the a-b coordinate system; the equation of the mapped circle in the a-b coordinate system is (a - x)² + (b - y)² = r², where (x, y) acts as the circle center and the radius r is set to a preset value; the point at which circles in the a-b coordinate system intersect is a possible circle center position;
S2.4.3, adjusting the value of the radius r and repeating step S2.4.2 until the circle center positions for all radii are found, thereby obtaining all circles in the edge image F_i(x, y);
S2.5, performing histogram statistics on each obtained circular area of the video image f_i(x, y) to find the circle closest in color to the marker point, and obtaining the circle center and radius data of that circle;
S2.5.1, converting the video image f_i(x, y) into a grayscale image;
S2.5.2, dividing the grayscale value range into three intervals;
In the invention the intervals are: [0, 85), [85, 170), [170, 255];
S2.5.3, scanning each circular area and counting the frequency with which its pixel values fall into the three intervals;
S2.5.4, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image;
The pixel-value frequency distribution of each circular area is compared with the color frequency distribution of the marker point; circles with large differences are eliminated, and the circle center and radius data of the circle with high frequency-distribution similarity are recorded; these are the circle center and radius of the marker point;
S3, constructing and training a deep learning model;
Based on a deep learning algorithm framework, the two-dimensional coordinates of the marker point's sphere center and the radius of the marker point in the image acquired by each camera are used as input data, and the three-dimensional spatial coordinates of the marker point are used as output data; the deep learning program is trained until it can accurately locate the marker point.
The method specifically comprises the following steps: S3.1, obtaining sample data;
The sample data comprise input data and output data; the input data comprise the two-dimensional coordinates of the marker point and the radius of the marker point, and the output data are the spatial coordinates of the marker point;
S3.1.1, holding the marker point with a robot arm and traversing each position of the viewing volume, as shown in FIG. 3;
S3.1.2, obtaining the circle center coordinates and radius of the marker point according to step S2 and using them as input data; the circle center coordinates of the marker point are its two-dimensional coordinates;
S3.1.3, pairing the circle center coordinates and radius of the marker point, as recognized by the cameras, with the spatial coordinates of the marker point given by the robot arm; the spatial coordinates of the marker point are the output data.
When the robot arm passes through a spatial position point, the circle center and radius data obtained by recognizing the marker point in the images shot by the cameras are used as the input of the neural network, and the coordinates of the spatial point reached by the robot arm are used as the output of the neural network.
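As an illustrative sketch (function and variable names are assumptions, not from the patent), one training sample is assembled by concatenating the (x, y, r) triple recognized in each camera image and labeling it with the spatial point reached by the arm:

import numpy as np

def make_sample(per_camera_detections, arm_position):
    """Build one training sample: per_camera_detections is a list with one
    (x, y, r) triple per camera, always in the same camera order, and
    arm_position is the (X, Y, Z) point reported by the robot arm."""
    x_in = np.array([v for (x, y, r) in per_camera_detections for v in (x, y, r)],
                    dtype=np.float32)                 # 4 cameras x 3 parameters = 12 inputs
    y_out = np.array(arm_position, dtype=np.float32)  # 3 outputs: (X, Y, Z)
    return x_in, y_out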
S3.2, constructing a deep learning model;
S3.2.1, designing a neural network structure with one input layer, one output layer and two hidden layers, as shown in FIG. 2; the number of input layer nodes equals the number of cameras multiplied by the number of input parameters; the input parameters number 3, namely the circle center coordinates (x, y) and the radius r, and there are 4 groups of input data, each group containing circle center coordinates (x1, y1) and a radius r1 as shown in FIG. 2. The number of output layer nodes is the number of output parameters, which is 3, namely the spatial coordinates (X, Y, Z) of the marker point;
the number of hidden layer nodes is set to be a fixed value, and is 50 in the invention.
S3.2.2, optimizing the deep learning model;
a dropout mechanism is introduced into a deep learning model, a part of hidden layer nodes are deleted with a certain probability P in training, and as shown in FIG. 2, a node marked with × is a node deleted with a certain probability in an algorithm.
S3.2.2.1, obtaining the activation function of the neural network structure:
Figure BDA0002126461040000111
S3.2.2.2, each node adds an offset value to the weighted sum of its input data:
Figure BDA0002126461040000112
S3.2.2.3, combining steps S3.2.2.1 and S3.2.2.2 gives the node output:
Figure BDA0002126461040000121
where j denotes the j-th layer of the neural network structure and m_j is a mask parameter following a Bernoulli probability distribution; the value of m_j depends on the probability P.
S3.2.2.4, deleting hidden layer nodes.
When the mask parameter m_j is 0, the node output is 0 and the current node is deleted;
S3.2.2.5, obtaining the optimized deep learning model, whose final output is:
the spatial three-dimensional coordinates (X, Y, Z);
where X = G1(W, B, M), Y = G2(W, B, M), Z = G3(W, B, M); W is the weight vector, B is the bias vector, and M is the mask vector.
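A minimal sketch of such a network is given below, assuming PyTorch; the ReLU activation and the dropout probability value shown are assumptions, since the patent's activation function and the value of P are not reproduced in this text:

import torch.nn as nn

class MarkerLocator(nn.Module):
    """Sketch of the S3.2 network: 12 inputs (4 cameras x (x, y, r)), two
    hidden layers of 50 nodes with dropout probability P, 3 outputs (X, Y, Z)."""
    def __init__(self, n_cameras=4, hidden=50, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_cameras * 3, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden),        nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 3),             # spatial coordinates (X, Y, Z)
        )

    def forward(self, x):
        return self.net(x)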
S3.3, training the optimized deep learning model.
S3.3.1, dividing the sample data obtained in step S3.1 into training data and test data;
The training data are 80% of the sample data and are recorded as:
Figure BDA0002126461040000122
The test data are 20% of the sample data and are recorded as:
Figure BDA0002126461040000123
Figure BDA0002126461040000124
S3.3.2, given the training error calculation formula:
Figure BDA0002126461040000125
Figure BDA0002126461040000126
Figure BDA0002126461040000127
S3.3.3, given the gradient calculation formula:
Figure BDA0002126461040000128
Figure BDA0002126461040000129
Figure BDA0002126461040000131
Figure BDA0002126461040000132
S3.3.4, substituting the training data into the optimized deep learning model, training and iterating to obtain W and B.
S3.3.5, the test data is substituted into the deep learning model for verification.
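A minimal training sketch under the same assumptions (PyTorch, mean-squared-error loss, Adam optimizer), reflecting the 80/20 split and the final verification step; the patent's exact error and gradient formulas are given only as images, so the loss and optimizer here are stand-ins:

import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def train_locator(model, inputs, targets, epochs=200, lr=1e-3):
    """Sketch of S3.3: split the samples 80/20, iterate on the training data
    to obtain W and B, then verify on the test data (S3.3.5)."""
    data = TensorDataset(torch.as_tensor(inputs, dtype=torch.float32),
                         torch.as_tensor(targets, dtype=torch.float32))
    n_train = int(0.8 * len(data))
    train_set, test_set = random_split(data, [n_train, len(data) - n_train])
    loader = DataLoader(train_set, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    model.eval()                                    # S3.3.5: verification
    with torch.no_grad():
        xs = torch.stack([test_set[i][0] for i in range(len(test_set))])
        ys = torch.stack([test_set[i][1] for i in range(len(test_set))])
        return loss_fn(model(xs), ys).item()

After training, step S4 amounts to feeding the per-frame (x, y, r) detections of the marker attached to the target object through the trained model to obtain its spatial coordinates (X, Y, Z).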
S4, attaching the marker point to the target object to be positioned, and locating the marker point in space with the optimized deep learning model constructed in step S3, thereby positioning the target object.
As shown in FIG. 3, after the training process is completed, the robot arm moves the marker point, the cameras capture the two-dimensional coordinates and radius information of the marker point, and the trained neural network then yields the three-dimensional spatial coordinates of the marker point, thereby indirectly obtaining the spatial position of the robot arm.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (8)

1. A data acquisition method based on deep learning and multi-vision in a digital twin environment, characterized by comprising the following steps: S1, setting spherical marker points that are distinguishable from the environment background;
the spherical marker points have a specific color with a high degree of distinction from the environment background;
S2, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image;
S3, constructing and training a deep learning model;
S4, attaching the marker point to the target object to be positioned, and locating the marker point in space with the optimized deep learning model constructed in step S3, thereby positioning the target object;
in step S3.2, the specific steps are:
S3.2.1, designing a neural network structure with one input layer, one output layer and two hidden layers; the number of input layer nodes equals the number of cameras multiplied by the number of input parameters, and the number of output layer nodes is the number of output parameters; the number of hidden layer nodes is set to a fixed value;
S3.2.2, optimizing the deep learning model;
a dropout mechanism is introduced into the deep learning model, and during training a portion of the hidden layer nodes are deleted with a certain probability P;
S3.2.2.1, obtaining the activation function of the neural network structure:
Figure FDA0002532822890000011
S3.2.2.2, each node adds an offset value to the weighted sum of its input data:
Figure FDA0002532822890000012
S3.2.2.3, combining steps S3.2.2.1 and S3.2.2.2 gives the node output:
Figure FDA0002532822890000013
where j denotes the j-th layer of the neural network structure and m_j is a mask parameter following a Bernoulli probability distribution; the value of m_j varies according to the probability P;
S3.2.2.4, deleting hidden layer nodes;
when the mask parameter m_j is 0, the node output is 0 and the current node is deleted;
S3.2.2.5, obtaining the optimized deep learning model, whose final output is:
the spatial three-dimensional coordinates (X, Y, Z);
where X = G1(W, B, M), Y = G2(W, B, M), Z = G3(W, B, M); W is the weight vector, B is the bias vector, and M is the mask vector.
2. The data acquisition method based on deep learning and multi-vision in the digital twin environment according to claim 1, wherein in step S2, the specific steps are as follows:
S2.1, arranging at least two cameras in the environment, the cameras being distributed at different positions of the environment so that their viewing volumes intersect;
S2.2, obtaining the video image f_k(x, y) of each camera, where k denotes the k-th camera, x is the horizontal pixel coordinate of the video image, and y is the vertical pixel coordinate;
S2.3, obtaining the edge image F_k(x, y) of the video image f_k(x, y);
S2.4, finding all circles in the edge image F_k(x, y) with a Hough circle detection algorithm;
S2.5, performing histogram statistics on each obtained circular area of the video image f_k(x, y) to find the circle closest in color to the marker point, and obtaining the circle center and radius data of that circle.
3. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 2, wherein in step S2.3, the specific steps are as follows:
S2.3.1, applying the Laplace operator to the video image f_k(x, y) to obtain the derivative image f_k'(x, y):
f_k'(x, y) = -4·f_k(x, y) + f_k(x-1, y) + f_k(x+1, y) + f_k(x, y-1) + f_k(x, y+1);
S2.3.2, extracting the coefficients of f_k'(x, y) to obtain the Laplace operator template;
Figure FDA0002532822890000021
S2.3.3, convolving the Laplace operator template with the video image f_k(x, y) to obtain the pixel values of each edge image F_k(x, y);
the Laplace operator template is overlaid on the image f_k(x, y); each covered pixel of f_k(x, y) is multiplied by the template value at the corresponding position, and the products are summed; the resulting value is assigned to the pixel under the template center and is the pixel value of the edge image F_k(x, y).
4. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 2, wherein in step S2.4, the specific steps are as follows:
S2.4.1, given the general equation of a circle:
(x - a)² + (y - b)² = r²
where (a, b) are the coordinates of the circle center and r is the radius of the circle;
S2.4.2, each pixel point of the edge image F_k(x, y) in the x-y pixel coordinate system is mapped to a circle in the a-b coordinate system; the equation of the mapped circle in the a-b coordinate system is (a - x)² + (b - y)² = r², where (x, y) acts as the circle center and the radius r is set to a preset value;
S2.4.3, adjusting the value of the radius r and repeating step S2.4.2 until the circle center positions for all radii are found, thereby obtaining all circles in the edge image F_k(x, y).
5. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 2, wherein in step S2.5, the specific steps are as follows:
S2.5.1, converting the video image f_k(x, y) into a grayscale image;
S2.5.2, dividing the grayscale value range into three intervals;
S2.5.3, scanning each circular area and counting the frequency with which its pixel values fall into the three intervals;
S2.5.4, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image;
the pixel-value frequency distribution of each circular area is compared with the color frequency distribution of the marker point; circles with large differences are eliminated, and the circle center and radius data of the circle with high frequency-distribution similarity are recorded; these are the circle center and radius of the marker point.
6. The data acquisition method based on deep learning and multi-vision in the digital twin environment according to claim 1, wherein in step S3, the specific steps are as follows:
S3.1, obtaining sample data;
the sample data comprise input data and output data; the input data comprise the two-dimensional coordinates of the marker point and the radius of the marker point, and the output data are the spatial coordinates of the marker point;
S3.2, constructing the deep learning model;
S3.3, training the optimized deep learning model.
7. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 6, wherein in step S3.1, the specific steps are as follows:
S3.1.1, holding the marker point with a robot arm and traversing each position of the viewing volume;
S3.1.2, obtaining the circle center coordinates and radius of the marker point according to step S2 and using them as input data; the circle center coordinates of the marker point are its two-dimensional coordinates;
S3.1.3, pairing the circle center coordinates and radius of the marker point with the spatial coordinates of the marker point given by the robot arm; the spatial coordinates of the marker point are the output data.
8. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 6, wherein in step S3.3, the specific steps are as follows:
S3.3.1, dividing the sample data obtained in step S3.1 into training data and test data;
the training data are 80% of the sample data and are recorded as:
Figure FDA0002532822890000041
and (X_train, Y_train, Z_train);
the test data are 20% of the sample data and are recorded as:
Figure FDA0002532822890000042
and (X_test, Y_test, Z_test);
S3.3.2, given the training error calculation formula:
Figure FDA0002532822890000043
Figure FDA0002532822890000044
Figure FDA0002532822890000045
S3.3.3, given the gradient calculation formula:
Figure FDA0002532822890000046
Figure FDA0002532822890000047
Figure FDA0002532822890000048
Figure FDA0002532822890000049
S3.3.4, substituting the training data into the optimized deep learning model, training and iterating to obtain W and B;
S3.3.5, substituting the test data into the deep learning model for verification.
CN201910623996.4A 2019-07-11 2019-07-11 Data acquisition method based on deep learning and multi-vision in digital twin environment Active CN110334701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910623996.4A CN110334701B (en) 2019-07-11 2019-07-11 Data acquisition method based on deep learning and multi-vision in digital twin environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910623996.4A CN110334701B (en) 2019-07-11 2019-07-11 Data acquisition method based on deep learning and multi-vision in digital twin environment

Publications (2)

Publication Number Publication Date
CN110334701A CN110334701A (en) 2019-10-15
CN110334701B true CN110334701B (en) 2020-07-31

Family

ID=68146261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910623996.4A Active CN110334701B (en) 2019-07-11 2019-07-11 Data acquisition method based on deep learning and multi-vision in digital twin environment

Country Status (1)

Country Link
CN (1) CN110334701B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563446B (en) * 2020-04-30 2021-09-03 郑州轻工业大学 Human-machine interaction safety early warning and control method based on digital twin
CN112418245B (en) * 2020-11-04 2024-04-26 武汉大学 Electromagnetic emission point positioning method based on urban environment physical model
CN113419857B (en) * 2021-06-24 2023-03-24 广东工业大学 Federal learning method and system based on edge digital twin association
CN114332741B (en) * 2022-03-08 2022-05-10 盈嘉互联(北京)科技有限公司 Video detection method and system for building digital twins
CN115184563B (en) * 2022-09-08 2022-12-02 北京中环高科环境治理有限公司 Chemical workshop field data acquisition method based on digital twinning
CN115631401A (en) * 2022-12-22 2023-01-20 广东省科学院智能制造研究所 Robot autonomous grabbing skill learning system and method based on visual perception
CN115849202B (en) * 2023-02-23 2023-05-16 河南核工旭东电气有限公司 Intelligent crane operation target identification method based on digital twin technology
CN116524030B (en) * 2023-07-03 2023-09-01 新乡学院 Reconstruction method and system for digital twin crane under swinging condition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101226638A (en) * 2007-01-18 2008-07-23 中国科学院自动化研究所 Method and apparatus for standardization of multiple camera system
CN104463191A (en) * 2014-10-30 2015-03-25 华南理工大学 Robot visual processing method based on attention mechanism
CN104867158A (en) * 2015-06-03 2015-08-26 武汉理工大学 Monocular vision-based indoor water surface ship precise positioning system and method
WO2018140365A1 (en) * 2017-01-24 2018-08-02 Siemens Aktiengesellschaft System and method for cognitive engineering technology for automation and control of systems
CN109448061A (en) * 2018-10-09 2019-03-08 西北工业大学 A kind of underwater binocular visual positioning method without camera calibration
CN109933035A (en) * 2019-04-24 2019-06-25 中国科学院重庆绿色智能技术研究院 A kind of production line control system, method and the production system twin based on number

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11069056B2 (en) * 2017-11-22 2021-07-20 General Electric Company Multi-modal computer-aided diagnosis systems and methods for prostate cancer
CN108805222A (en) * 2018-05-08 2018-11-13 南京邮电大学 A kind of deep learning digital handwriting body recognition methods based on ARM platforms
CN109341689A (en) * 2018-09-12 2019-02-15 北京工业大学 Vision navigation method of mobile robot based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101226638A (en) * 2007-01-18 2008-07-23 中国科学院自动化研究所 Method and apparatus for standardization of multiple camera system
CN104463191A (en) * 2014-10-30 2015-03-25 华南理工大学 Robot visual processing method based on attention mechanism
CN104867158A (en) * 2015-06-03 2015-08-26 武汉理工大学 Monocular vision-based indoor water surface ship precise positioning system and method
WO2018140365A1 (en) * 2017-01-24 2018-08-02 Siemens Aktiengesellschaft System and method for cognitive engineering technology for automation and control of systems
CN109448061A (en) * 2018-10-09 2019-03-08 西北工业大学 A kind of underwater binocular visual positioning method without camera calibration
CN109933035A (en) * 2019-04-24 2019-06-25 中国科学院重庆绿色智能技术研究院 A kind of production line control system, method and the production system twin based on number

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A semi-active human digital twin model for detecting severity of carotid stenoses from head vibration—A coupled computational mechanics and computer vision method; Neeraj Kavan Chakshu et al.; Int J Numer Meth Biomed Engng; 2019-01-11; full text *
Multidisciplinary collaborative design modeling technology for complex mechanical products based on digital twin; Li Linli et al.; Computer Integrated Manufacturing Systems; 2019-06-30; Vol. 25, No. 6; full text *

Also Published As

Publication number Publication date
CN110334701A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN110334701B (en) Data acquisition method based on deep learning and multi-vision in digital twin environment
CN110415342B (en) Three-dimensional point cloud reconstruction device and method based on multi-fusion sensor
CN109308693B (en) Single-binocular vision system for target detection and pose measurement constructed by one PTZ camera
CN109360240B (en) Small unmanned aerial vehicle positioning method based on binocular vision
CN106529538A (en) Method and device for positioning aircraft
CN106651942A (en) Three-dimensional rotation and motion detecting and rotation axis positioning method based on feature points
CN112801074B (en) Depth map estimation method based on traffic camera
CN113850865A (en) Human body posture positioning method and system based on binocular vision and storage medium
CN111998862B (en) BNN-based dense binocular SLAM method
CN112818925B (en) Urban building and crown identification method
CN104881029B (en) Mobile Robotics Navigation method based on a point RANSAC and FAST algorithms
CN112509125A (en) Three-dimensional reconstruction method based on artificial markers and stereoscopic vision
CN111563878A (en) Space target positioning method
CN109214254B (en) Method and device for determining displacement of robot
CN114022560A (en) Calibration method and related device and equipment
CN111583342B (en) Target rapid positioning method and device based on binocular vision
CN113393439A (en) Forging defect detection method based on deep learning
CN113345084B (en) Three-dimensional modeling system and three-dimensional modeling method
CN110349209A (en) Vibrating spear localization method based on binocular vision
CN109636856A (en) Object 6 DOF degree posture information union measuring method based on HOG Fusion Features operator
CN111899289B (en) Infrared image and visible light image registration method based on image characteristic information
KR101673144B1 (en) Stereoscopic image registration method based on a partial linear method
CN114608522B (en) Obstacle recognition and distance measurement method based on vision
CN112991372B (en) 2D-3D camera external parameter calibration method based on polygon matching
Liu et al. Dense three-dimensional color reconstruction with data fusion and image-guided depth completion for large-scale outdoor scenes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant