CN110334701B - Data acquisition method based on deep learning and multi-vision in digital twin environment - Google Patents

Data acquisition method based on deep learning and multi-vision in digital twin environment

Info

Publication number
CN110334701B
Authority
CN
China
Prior art keywords
deep learning
data
radius
circle
mark point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910623996.4A
Other languages
Chinese (zh)
Other versions
CN110334701A (en)
Inventor
李浩
刘根
王昊琪
文笑雨
乔东平
罗国富
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN201910623996.4A priority Critical patent/CN110334701B/en
Publication of CN110334701A publication Critical patent/CN110334701A/en
Application granted granted Critical
Publication of CN110334701B publication Critical patent/CN110334701B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a data acquisition method based on deep learning and multi-vision in a digital twin environment, which comprises the following steps: S1, setting spherical marker points that are clearly distinguishable from the environment background; S2, obtaining the position coordinates of the marker point's sphere center and its radius in the video image; S3, constructing and training a deep learning model; and S4, attaching the marker point to the target object to be positioned and locating the marker point in space using the model of step S3, thereby positioning the target object. The method can acquire position and posture data of various target objects in a digital twin environment and has strong general applicability. With the aid of the marker points, the complexity of visual image analysis and processing is reduced, the identification and positioning process is simpler, and the method is more efficient and more reliable. Because deep learning is used to locate the marker points, the positioning error caused by camera image distortion is minimized, and the method is suitable for various camera numbers and layouts.

Description

Data acquisition method based on deep learning and multi-vision in digital twin environment
Technical Field
The invention belongs to the technical field of digital acquisition and particularly relates to a data acquisition method in a digital twin environment based on deep learning and multi-view vision.
Background
Digital twin technology must simulate physical equipment with high fidelity, and it must acquire the various state data of the physical equipment in real time so that the real-time state of the simulation model stays consistent with that of the physical equipment. Digital twin technology relies on sensing and control technology and their comprehensive integration. The mechanical, electrical, thermodynamic and motion state information of the physical equipment all need to be acquired by means of sensing technology.
In the construction of a digital twin system, a 3D model of the physical equipment is first built in equal proportion, and then higher-level mechanical, electrical and energy models of the physical equipment are built. To keep the 3D model in real-time correspondence with the state of the physical equipment, the position, posture and motion information of each part of the physical equipment must be collected in real time. There are various technical means for collecting this position, posture and motion information; using sensors is a commonly employed method of data acquisition.
Different sensor configuration schemes must be designed for different data acquisition tasks. For example, an angle sensor is needed to collect angle data of equipment, while laser and inertial sensors are needed to collect its operating motion information. The disadvantages are that the original equipment must be modified and the sensor system must be configured specifically for each piece of equipment, so the approach is not very universal.
Vision is one of the most important channels for obtaining information. The appearance, posture, running state and motion information of equipment can all be obtained by visual methods. A key issue in obtaining this information visually is locating and tracking the target. Whether a robot grasps a target object, an unmanned aerial vehicle determines its own pose and position or that of a target, or raw materials are moved on a production line, the problems of target identification and spatial positioning must be solved.
Existing methods for identifying and positioning objects mainly include machine vision, ultrasonic positioning, and electromagnetic-wave identification and positioning. Ultrasonic positioning is affected by uncertain factors such as temperature, humidity and air pressure and therefore cannot position accurately. Identifying an object by electromagnetic waves is represented by radio-frequency tag technology, which can identify an object by its tag but cannot locate it. Positioning a target with electromagnetic waves can be realized by imitating the radar principle, but such a system is too complex and bulky, and since the wavelengths currently usable for radar are all above the centimeter level, electromagnetic-wave positioning precision cannot reach below 1 centimeter.
Visual methods can achieve higher positioning accuracy. One way to identify and locate a target visually is to extract the target's features directly. The disadvantages of this approach are a complex image processing procedure and long computation time, and a different algorithm must be designed for each different target.
The other way is to rely on artificial markers. A deep learning method is used to train recognition of the marker so that the algorithm can identify it against different backgrounds, but the performance of such an algorithm drops sharply when the background environment changes, and the deep learning training workload is large.
Binocular stereo vision based on the parallax principle is a method of acquiring three-dimensional geometric information of an object from multiple images. Two digital images of the target object are obtained from different angles by two cameras (or by one camera shooting from two different positions), and the three-dimensional geometric information of the target is recovered by computing the parallax between the two images. This method first requires measuring camera parameters such as the focal length, imaging size and camera center point position, as well as the distance and angle between the two cameras; because of lens imperfections, the image of an object is distorted to some degree, and the calculation must be corrected to handle this distortion.
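For reference, the underlying parallax relation (a textbook result, not a limitation of this disclosure) is that for two parallel cameras with focal length f and baseline distance B, a point observed with horizontal disparity d = x_left - x_right lies at depth Z = f·B/d; since f, B and d all enter this expression directly, calibration errors and lens distortion propagate straight into the recovered depth, which is why the correction mentioned above is needed.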
Disclosure of Invention
In view of the prior art, the technical problems to be solved by the invention are as follows: sensor-based acquisition schemes are complex to install and lack universality; algorithms that directly identify and position a target object with machine vision are complex; deep learning algorithms that indirectly identify and position a target via marker points become unstable when the background changes; and positioning a target with binocular or multi-view vision requires complicated calibration and suffers positioning errors caused by image distortion. The invention therefore provides a data acquisition method based on deep learning and multi-view vision in a digital twin environment.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a data acquisition method based on deep learning and multi-vision in a digital twin environment comprises the following steps:
and S1, setting spherical mark points with larger distinction degree with the environment background.
The spherical mark points have specific colors with larger distinction degree with the environment background.
And S2, obtaining the position coordinates and the radius of the spherical center of the mark point in the video image.
S2.1, arranging at least two cameras in the environment, wherein the cameras are distributed at different positions of the environment, and visual bodies of the cameras are crossed.
S2.2, obtaining the video image f of each camerai(x, y), wherein i is the ith camera, x is the horizontal coordinate of the video image pixel, and y is the vertical coordinate of the video image pixel;
s2.3, obtaining a video image f by using an edge detection methodi(x, y) edge image Fi(x,y);
The image edge has a large difference with surrounding pixels, and the maximum value of the first derivative is obtained by calculating the derivative number of the image data in the x direction and the y direction, namely the image edge is obtained, namely the zero point of the second derivative.
S2.3.1, using L aplace operator to the video image fi(x, y) derivation to obtain a first derivative f of the video imagei′(x,y):
fi′(x,y)=-4fi(x,y)+fi(x-1,y)+fi(x+1,y)+fi(x,y-1)+fi(x,y-1);
S2.3.2, extracting the coefficients of f_i'(x, y) to obtain the Laplace operator template:
0  1  0
1 -4  1
0  1  0
S2.3.3, convolving the Laplace operator template with the video image f_i(x, y) to obtain the pixel values of each edge image F_i(x, y);
The Laplace operator template is overlaid on the image f_i(x, y); each covered pixel of f_i(x, y) is multiplied by the template value at the corresponding position, and the products are summed; the resulting value is assigned to the pixel under the template center and is the pixel value of the edge image F_i(x, y);
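As an illustrative sketch only (OpenCV and NumPy are assumed here; the patent does not specify an implementation), step S2.3 can be realized by convolving each camera frame with the Laplace template described above:

import cv2
import numpy as np

def edge_image(frame_gray):
    """Convolve a grayscale frame f_i(x, y) with the 4-neighbour Laplace
    template to obtain the edge image F_i(x, y), as described in step S2.3."""
    laplace_template = np.array([[0,  1, 0],
                                 [1, -4, 1],
                                 [0,  1, 0]], dtype=np.float32)
    # Slide the template over the image, multiply the covered pixels by the
    # template values and sum them (step S2.3.3); the sum becomes the pixel
    # value under the template centre.
    response = cv2.filter2D(frame_gray.astype(np.float32), -1, laplace_template)
    return cv2.convertScaleAbs(response)  # scale back to 8-bit for later steps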
S2.4, finding all circles in the edge image F_i(x, y) with a Hough circle detection algorithm;
S2.4.1, given the general equation of a circle:
(x - a)² + (y - b)² = r²
where (a, b) are the coordinates of the circle center and r is the radius of the circle;
S2.4.2, each pixel point of the edge image F_i(x, y) in the x-y pixel coordinate system is mapped to a circle in the a-b coordinate system; the equation of the mapped circle in the a-b coordinate system is (a - x)² + (b - y)² = r², where (x, y) acts as the circle center and the radius r is set to a preset value; the point at which circles in the a-b coordinate system intersect is a possible circle center position;
S2.4.3, adjusting the value of the radius r and repeating step S2.4.2 until the circle center positions for all radii are found, thereby obtaining all circles in the edge image F_i(x, y);
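The Hough circle search of step S2.4 can be sketched as follows; the use of OpenCV's built-in gradient Hough transform and the threshold values shown are assumptions, not part of the patent:

import cv2

def find_circles(gray_image):
    """Hough circle search (step S2.4) over one 8-bit grayscale camera image.
    Returns a list of (a, b, r): candidate circle centres and radii."""
    circles = cv2.HoughCircles(gray_image, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=100, param2=30, minRadius=5, maxRadius=120)
    if circles is None:
        return []
    return [(float(a), float(b), float(r)) for a, b, r in circles[0]]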
S2.5, performing histogram statistics on each obtained circular area of the video image f_i(x, y) to find the circle closest in color to the marker point, and obtaining the circle center and radius data of that circle;
S2.5.1, converting the video image f_i(x, y) into a grayscale image;
S2.5.2, dividing the grayscale value range into three intervals;
In the invention the intervals are: [0, 85), [85, 170), [170, 255];
S2.5.3, scanning each circular area and counting the frequency with which its pixel values fall into the three intervals;
S2.5.4, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image;
The pixel-value frequency distribution of each circular area is compared with the color frequency distribution of the marker point; circles with large differences are eliminated, and the circle center and radius data of the circle with high frequency-distribution similarity are recorded; these are the circle center and radius of the marker point;
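A minimal sketch of the three-interval histogram test of step S2.5, assuming the reference frequency distribution marker_freq of the marker color has been measured beforehand (function and variable names are illustrative, not from the patent):

import numpy as np

GRAY_BINS = [0, 85, 170, 256]   # the three intervals [0,85), [85,170), [170,255]

def interval_frequencies(gray_roi):
    """Frequency with which the pixels of one circular area fall into the
    three grayscale intervals (steps S2.5.2 and S2.5.3)."""
    counts, _ = np.histogram(gray_roi, bins=GRAY_BINS)
    return counts / max(gray_roi.size, 1)

def pick_marker_circle(gray_image, circles, marker_freq, max_diff=0.5):
    """Keep the circle whose distribution is closest to the marker colour
    distribution marker_freq, rejecting circles that differ too much (S2.5.4)."""
    best = None
    for a, b, r in circles:
        x0, x1 = int(max(a - r, 0)), int(a + r)
        y0, y1 = int(max(b - r, 0)), int(b + r)
        roi = gray_image[y0:y1, x0:x1]
        if roi.size == 0:
            continue
        diff = float(np.abs(interval_frequencies(roi) - marker_freq).sum())
        if diff < max_diff and (best is None or diff < best[0]):
            best = (diff, (a, b, r))
    return None if best is None else best[1]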
S3, constructing and training a deep learning model;
Based on a deep learning algorithm framework, the two-dimensional coordinates of the marker point's sphere center and the radius of the marker point in the image acquired by each camera are used as input data, and the three-dimensional spatial coordinates of the marker point are used as output data; the deep learning program is trained until it can accurately locate the marker point.
The method specifically comprises the following steps: S3.1, obtaining sample data;
The sample data comprise input data and output data; the input data comprise the two-dimensional coordinates of the marker point and the radius of the marker point, and the output data are the spatial coordinates of the marker point;
S3.1.1, holding the marker point with a robot arm and traversing each position of the viewing volume;
S3.1.2, obtaining the circle center coordinates and radius of the marker point according to step S2 and using them as input data; the circle center coordinates of the marker point are its two-dimensional coordinates;
S3.1.3, pairing the circle center coordinates and radius of the marker point with the spatial coordinates of the marker point given by the robot arm; the spatial coordinates of the marker point are the output data;
S3.2, constructing the deep learning model;
S3.2.1, designing a neural network structure with one input layer, one output layer and two hidden layers; the number of input layer nodes equals the number of cameras multiplied by the number of input parameters; the input parameters number 3, namely the circle center coordinates (x, y) and the radius r.
The number of output layer nodes is the number of output parameters, which is 3, namely the spatial coordinates (X, Y, Z) of the marker point;
The number of hidden layer nodes is set to a fixed value, which is 50 in the invention.
S3.2.2, optimizing the deep learning model;
A dropout mechanism is introduced into the deep learning model, and during training a portion of the hidden layer nodes are deleted with a certain probability P;
S3.2.2.1, obtaining the activation function of the neural network structure:
Figure BDA0002126461040000061
S3.2.2.2, each node adds an offset value to the weighted sum of its input data:
Figure BDA0002126461040000062
S3.2.2.3, combining steps S3.2.2.1 and S3.2.2.2 gives the node output:
Figure BDA0002126461040000063
where j denotes the j-th layer of the neural network structure and m_j is a mask parameter following a Bernoulli probability distribution; the value of m_j depends on the probability P.
S3.2.2.4, deleting hidden layer nodes.
When the mask parameter m_j is 0, the node output is 0 and the current node is deleted;
S3.2.2.5, obtaining the optimized deep learning model, whose final output is:
the spatial three-dimensional coordinates (X, Y, Z);
where X = G1(W, B, M), Y = G2(W, B, M), Z = G3(W, B, M); W is the weight vector, B is the bias vector, and M is the mask vector.
S3.3, training the optimized deep learning model.
S3.3.1, dividing the sample data obtained in step S3.1 into training data and test data;
The training data are 80% of the sample data and are recorded as:
Figure BDA0002126461040000064
The test data are 20% of the sample data and are recorded as:
Figure BDA0002126461040000065
Figure BDA0002126461040000071
S3.3.2, given the training error calculation formula:
Figure BDA0002126461040000072
Figure BDA0002126461040000073
Figure BDA0002126461040000074
S3.3.3, given the gradient calculation formula:
Figure BDA0002126461040000075
Figure BDA0002126461040000076
Figure BDA0002126461040000077
Figure BDA0002126461040000078
S3.3.4, substituting the training data into the optimized deep learning model, training and iterating to obtain W and B.
S3.3.5, the test data is substituted into the deep learning model for verification.
S4, attaching the marker point to the target object to be positioned, and locating the marker point in space with the optimized deep learning model constructed in step S3, thereby positioning the target object.
The beneficial effects of the invention are as follows: the invention applies machine vision to digital twin technology and acquires real-time data such as the position, motion and posture of equipment by visual means. The machine vision method is improved by introducing artificial markers, which simplifies the identification and positioning of the target object. Conventional image processing algorithms, applied in combination, achieve a high recognition rate of the marker points, and the data they produce are relatively simple, so training of the deep learning framework is faster and more effective. The deep learning framework is used to solve the marker-point positioning problem; because all points of the space are traversed during training, positioning errors caused by image distortion are overcome. Moreover, because a training-based learning mode is adopted, the positions and number of cameras can be arranged arbitrarily, and training succeeds as long as the viewing volumes of the cameras overlap.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the system of the present invention.
FIG. 2 is a schematic diagram of a deep learning neural network of the present invention.
Fig. 3 is a schematic diagram of the multi-view visual positioning structure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, shall fall within the scope of the present invention.
A data acquisition method based on deep learning and multi-vision in a digital twin environment is shown in figure 1, and comprises the following steps:
S1, setting a spherical marker point that is clearly distinguishable from the environment background, as shown in FIG. 3.
The spherical marker point has a specific color with a high degree of distinction from the environment background.
S2, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image.
S2.1, arranging at least two cameras in the environment; as shown in FIG. 3, the number of cameras is 4, the cameras are distributed at different positions of the environment, and their viewing volumes intersect. The rectangle in front of each camera in FIG. 3 is that camera's image range, and the marker point is imaged as a point within this range.
S2.2, obtaining the video image f_i(x, y) of each camera, where i denotes the i-th camera, x is the horizontal pixel coordinate of the video image, and y is the vertical pixel coordinate.
S2.3, obtaining the edge image F_i(x, y) of the video image f_i(x, y) by an edge detection method.
An image edge differs strongly from the surrounding pixels; by differentiating the image data in the x and y directions, the maxima of the first derivative (equivalently, the zero crossings of the second derivative) give the image edges.
S2.3.1, applying the Laplace operator to the video image f_i(x, y) to obtain the derivative image f_i'(x, y):
f_i'(x, y) = -4·f_i(x, y) + f_i(x-1, y) + f_i(x+1, y) + f_i(x, y-1) + f_i(x, y+1);
S2.3.2, extracting the coefficients of f_i'(x, y) to obtain the Laplace operator template:
0  1  0
1 -4  1
0  1  0
S2.3.3, convolving the Laplace operator template with the video image f_i(x, y) to obtain the pixel values of each edge image F_i(x, y).
The Laplace operator template is overlaid on the image f_i(x, y); each covered pixel of f_i(x, y) is multiplied by the template value at the corresponding position, and the products are summed; the resulting value is assigned to the pixel under the template center and is the pixel value of the edge image F_i(x, y);
S2.4, finding all circles in the edge image F_i(x, y) with a Hough circle detection algorithm;
S2.4.1, given the general equation of a circle:
(x - a)² + (y - b)² = r²
where (a, b) are the coordinates of the circle center and r is the radius of the circle;
S2.4.2, each pixel point of the edge image F_i(x, y) in the x-y pixel coordinate system is mapped to a circle in the a-b coordinate system; the equation of the mapped circle in the a-b coordinate system is (a - x)² + (b - y)² = r², where (x, y) acts as the circle center and the radius r is set to a preset value; the point at which circles in the a-b coordinate system intersect is a possible circle center position;
S2.4.3, adjusting the value of the radius r and repeating step S2.4.2 until the circle center positions for all radii are found, thereby obtaining all circles in the edge image F_i(x, y);
S2.5, performing histogram statistics on each obtained circular area of the video image f_i(x, y) to find the circle closest in color to the marker point, and obtaining the circle center and radius data of that circle;
S2.5.1, converting the video image f_i(x, y) into a grayscale image;
S2.5.2, dividing the grayscale value range into three intervals;
In the invention the intervals are: [0, 85), [85, 170), [170, 255];
S2.5.3, scanning each circular area and counting the frequency with which its pixel values fall into the three intervals;
S2.5.4, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image;
The pixel-value frequency distribution of each circular area is compared with the color frequency distribution of the marker point; circles with large differences are eliminated, and the circle center and radius data of the circle with high frequency-distribution similarity are recorded; these are the circle center and radius of the marker point;
S3, constructing and training a deep learning model;
Based on a deep learning algorithm framework, the two-dimensional coordinates of the marker point's sphere center and the radius of the marker point in the image acquired by each camera are used as input data, and the three-dimensional spatial coordinates of the marker point are used as output data; the deep learning program is trained until it can accurately locate the marker point.
The method specifically comprises the following steps: S3.1, obtaining sample data;
The sample data comprise input data and output data; the input data comprise the two-dimensional coordinates of the marker point and the radius of the marker point, and the output data are the spatial coordinates of the marker point;
S3.1.1, holding the marker point with a robot arm and traversing each position of the viewing volume, as shown in FIG. 3;
S3.1.2, obtaining the circle center coordinates and radius of the marker point according to step S2 and using them as input data; the circle center coordinates of the marker point are its two-dimensional coordinates;
S3.1.3, pairing the circle center coordinates and radius of the marker point, as recognized by the cameras, with the spatial coordinates of the marker point given by the robot arm; the spatial coordinates of the marker point are the output data.
When the robot arm passes through a spatial position point, the circle center and radius data obtained by recognizing the marker point in the images shot by the cameras are used as the input of the neural network, and the coordinates of the spatial point reached by the robot arm are used as the output of the neural network.
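As an illustrative sketch (function and variable names are assumptions, not from the patent), one training sample is assembled by concatenating the (x, y, r) triple recognized in each camera image and labeling it with the spatial point reached by the arm:

import numpy as np

def make_sample(per_camera_detections, arm_position):
    """Build one training sample: per_camera_detections is a list with one
    (x, y, r) triple per camera, always in the same camera order, and
    arm_position is the (X, Y, Z) point reported by the robot arm."""
    x_in = np.array([v for (x, y, r) in per_camera_detections for v in (x, y, r)],
                    dtype=np.float32)                 # 4 cameras x 3 parameters = 12 inputs
    y_out = np.array(arm_position, dtype=np.float32)  # 3 outputs: (X, Y, Z)
    return x_in, y_out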
S3.2, constructing a deep learning model;
S3.2.1, designing a neural network structure with one input layer, one output layer and two hidden layers, as shown in FIG. 2; the number of input layer nodes equals the number of cameras multiplied by the number of input parameters; the input parameters number 3, namely the circle center coordinates (x, y) and the radius r, and there are 4 groups of input data, each group containing circle center coordinates (x1, y1) and a radius r1 as shown in FIG. 2. The number of output layer nodes is the number of output parameters, which is 3, namely the spatial coordinates (X, Y, Z) of the marker point;
the number of hidden layer nodes is set to be a fixed value, and is 50 in the invention.
S3.2.2, optimizing the deep learning model;
a dropout mechanism is introduced into a deep learning model, a part of hidden layer nodes are deleted with a certain probability P in training, and as shown in FIG. 2, a node marked with × is a node deleted with a certain probability in an algorithm.
S3.2.2.1, obtaining the activation function of the neural network structure:
Figure BDA0002126461040000111
S3.2.2.2, each node adds an offset value to the weighted sum of its input data:
Figure BDA0002126461040000112
S3.2.2.3, combining steps S3.2.2.1 and S3.2.2.2 gives the node output:
Figure BDA0002126461040000121
where j denotes the j-th layer of the neural network structure and m_j is a mask parameter following a Bernoulli probability distribution; the value of m_j depends on the probability P.
S3.2.2.4, deleting hidden layer nodes.
When the mask parameter m_j is 0, the node output is 0 and the current node is deleted;
S3.2.2.5, obtaining the optimized deep learning model, whose final output is:
the spatial three-dimensional coordinates (X, Y, Z);
where X = G1(W, B, M), Y = G2(W, B, M), Z = G3(W, B, M); W is the weight vector, B is the bias vector, and M is the mask vector.
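A minimal sketch of such a network is given below, assuming PyTorch; the ReLU activation and the dropout probability value shown are assumptions, since the patent's activation function and the value of P are not reproduced in this text:

import torch.nn as nn

class MarkerLocator(nn.Module):
    """Sketch of the S3.2 network: 12 inputs (4 cameras x (x, y, r)), two
    hidden layers of 50 nodes with dropout probability P, 3 outputs (X, Y, Z)."""
    def __init__(self, n_cameras=4, hidden=50, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_cameras * 3, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden),        nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 3),             # spatial coordinates (X, Y, Z)
        )

    def forward(self, x):
        return self.net(x)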
S3.3, training the optimized deep learning model.
S3.3.1, dividing the sample data obtained in step S3.1 into training data and test data;
The training data are 80% of the sample data and are recorded as:
Figure BDA0002126461040000122
The test data are 20% of the sample data and are recorded as:
Figure BDA0002126461040000123
Figure BDA0002126461040000124
S3.3.2, given the training error calculation formula:
Figure BDA0002126461040000125
Figure BDA0002126461040000126
Figure BDA0002126461040000127
S3.3.3, given the gradient calculation formula:
Figure BDA0002126461040000128
Figure BDA0002126461040000129
Figure BDA0002126461040000131
Figure BDA0002126461040000132
S3.3.4, substituting the training data into the optimized deep learning model, training and iterating to obtain W and B.
S3.3.5, the test data is substituted into the deep learning model for verification.
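A minimal training sketch under the same assumptions (PyTorch, mean-squared-error loss, Adam optimizer), reflecting the 80/20 split and the final verification step; the patent's exact error and gradient formulas are given only as images, so the loss and optimizer here are stand-ins:

import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def train_locator(model, inputs, targets, epochs=200, lr=1e-3):
    """Sketch of S3.3: split the samples 80/20, iterate on the training data
    to obtain W and B, then verify on the test data (S3.3.5)."""
    data = TensorDataset(torch.as_tensor(inputs, dtype=torch.float32),
                         torch.as_tensor(targets, dtype=torch.float32))
    n_train = int(0.8 * len(data))
    train_set, test_set = random_split(data, [n_train, len(data) - n_train])
    loader = DataLoader(train_set, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    model.eval()                                    # S3.3.5: verification
    with torch.no_grad():
        xs = torch.stack([test_set[i][0] for i in range(len(test_set))])
        ys = torch.stack([test_set[i][1] for i in range(len(test_set))])
        return loss_fn(model(xs), ys).item()

After training, step S4 amounts to feeding the per-frame (x, y, r) detections of the marker attached to the target object through the trained model to obtain its spatial coordinates (X, Y, Z).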
S4, attaching the marker point to the target object to be positioned, and locating the marker point in space with the optimized deep learning model constructed in step S3, thereby positioning the target object.
As shown in FIG. 3, after the training process is completed, the robot arm moves the marker point, the cameras capture the two-dimensional coordinates and radius information of the marker point, and the trained neural network then yields the three-dimensional spatial coordinates of the marker point, thereby indirectly obtaining the spatial position of the robot arm.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (8)

1. A data acquisition method based on deep learning and multi-vision in a digital twin environment, characterized by comprising the following steps: S1, setting spherical marker points that are distinguishable from the environment background;
the spherical marker points have a specific color with a high degree of distinction from the environment background;
S2, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image;
S3, constructing and training a deep learning model;
S4, attaching the marker point to the target object to be positioned, and locating the marker point in space with the optimized deep learning model constructed in step S3, thereby positioning the target object;
in step S3.2, the specific steps are:
S3.2.1, designing a neural network structure with one input layer, one output layer and two hidden layers; the number of input layer nodes equals the number of cameras multiplied by the number of input parameters, and the number of output layer nodes is the number of output parameters; the number of hidden layer nodes is set to a fixed value;
S3.2.2, optimizing the deep learning model;
a dropout mechanism is introduced into the deep learning model, and during training a portion of the hidden layer nodes are deleted with a certain probability P;
S3.2.2.1, obtaining the activation function of the neural network structure:
Figure FDA0002532822890000011
S3.2.2.2, each node adds an offset value to the weighted sum of its input data:
Figure FDA0002532822890000012
S3.2.2.3, combining steps S3.2.2.1 and S3.2.2.2 gives the node output:
Figure FDA0002532822890000013
where j denotes the j-th layer of the neural network structure and m_j is a mask parameter following a Bernoulli probability distribution; the value of m_j varies according to the probability P;
S3.2.2.4, deleting hidden layer nodes;
when the mask parameter m_j is 0, the node output is 0 and the current node is deleted;
S3.2.2.5, obtaining the optimized deep learning model, whose final output is:
the spatial three-dimensional coordinates (X, Y, Z);
where X = G1(W, B, M), Y = G2(W, B, M), Z = G3(W, B, M); W is the weight vector, B is the bias vector, and M is the mask vector.
2. The data acquisition method based on deep learning and multi-vision in the digital twin environment according to claim 1, wherein in step S2, the specific steps are as follows:
S2.1, arranging at least two cameras in the environment, the cameras being distributed at different positions of the environment so that their viewing volumes intersect;
S2.2, obtaining the video image f_k(x, y) of each camera, where k denotes the k-th camera, x is the horizontal pixel coordinate of the video image, and y is the vertical pixel coordinate;
S2.3, obtaining the edge image F_k(x, y) of the video image f_k(x, y);
S2.4, finding all circles in the edge image F_k(x, y) with a Hough circle detection algorithm;
S2.5, performing histogram statistics on each obtained circular area of the video image f_k(x, y) to find the circle closest in color to the marker point, and obtaining the circle center and radius data of that circle.
3. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 2, wherein in step S2.3, the specific steps are as follows:
S2.3.1, applying the Laplace operator to the video image f_k(x, y) to obtain the derivative image f_k'(x, y):
f_k'(x, y) = -4·f_k(x, y) + f_k(x-1, y) + f_k(x+1, y) + f_k(x, y-1) + f_k(x, y+1);
S2.3.2, extracting the coefficients of f_k'(x, y) to obtain the Laplace operator template;
Figure FDA0002532822890000021
S2.3.3, convolving the Laplace operator template with the video image f_k(x, y) to obtain the pixel values of each edge image F_k(x, y);
the Laplace operator template is overlaid on the image f_k(x, y); each covered pixel of f_k(x, y) is multiplied by the template value at the corresponding position, and the products are summed; the resulting value is assigned to the pixel under the template center and is the pixel value of the edge image F_k(x, y).
4. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 2, wherein in step S2.4, the specific steps are as follows:
S2.4.1, given the general equation of a circle:
(x - a)² + (y - b)² = r²
where (a, b) are the coordinates of the circle center and r is the radius of the circle;
S2.4.2, each pixel point of the edge image F_k(x, y) in the x-y pixel coordinate system is mapped to a circle in the a-b coordinate system; the equation of the mapped circle in the a-b coordinate system is (a - x)² + (b - y)² = r², where (x, y) acts as the circle center and the radius r is set to a preset value;
S2.4.3, adjusting the value of the radius r and repeating step S2.4.2 until the circle center positions for all radii are found, thereby obtaining all circles in the edge image F_k(x, y).
5. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 2, wherein in step S2.5, the specific steps are as follows:
S2.5.1, converting the video image f_k(x, y) into a grayscale image;
S2.5.2, dividing the grayscale value range into three intervals;
S2.5.3, scanning each circular area and counting the frequency with which its pixel values fall into the three intervals;
S2.5.4, obtaining the position coordinates of the sphere center of the marker point and its radius in the video image;
the pixel-value frequency distribution of each circular area is compared with the color frequency distribution of the marker point; circles with large differences are eliminated, and the circle center and radius data of the circle with high frequency-distribution similarity are recorded; these are the circle center and radius of the marker point.
6. The data acquisition method based on deep learning and multi-vision in the digital twin environment according to claim 1, wherein in step S3, the specific steps are as follows:
S3.1, obtaining sample data;
the sample data comprise input data and output data; the input data comprise the two-dimensional coordinates of the marker point and the radius of the marker point, and the output data are the spatial coordinates of the marker point;
S3.2, constructing the deep learning model;
S3.3, training the optimized deep learning model.
7. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 6, wherein in step S3.1, the specific steps are as follows:
S3.1.1, holding the marker point with a robot arm and traversing each position of the viewing volume;
S3.1.2, obtaining the circle center coordinates and radius of the marker point according to step S2 and using them as input data; the circle center coordinates of the marker point are its two-dimensional coordinates;
S3.1.3, pairing the circle center coordinates and radius of the marker point with the spatial coordinates of the marker point given by the robot arm; the spatial coordinates of the marker point are the output data.
8. The data acquisition method based on deep learning and multi-vision in the digital twin environment as claimed in claim 6, wherein in step S3.3, the specific steps are as follows:
S3.3.1, dividing the sample data obtained in step S3.1 into training data and test data;
the training data are 80% of the sample data and are recorded as:
Figure FDA0002532822890000041
and (X_train, Y_train, Z_train);
the test data are 20% of the sample data and are recorded as:
Figure FDA0002532822890000042
and (X_test, Y_test, Z_test);
S3.3.2, given the training error calculation formula:
Figure FDA0002532822890000043
Figure FDA0002532822890000044
Figure FDA0002532822890000045
S3.3.3, given the gradient calculation formula:
Figure FDA0002532822890000046
Figure FDA0002532822890000047
Figure FDA0002532822890000048
Figure FDA0002532822890000049
S3.3.4, substituting the training data into the optimized deep learning model, training and iterating to obtain W and B;
S3.3.5, substituting the test data into the deep learning model for verification.
CN201910623996.4A 2019-07-11 2019-07-11 Data acquisition method based on deep learning and multi-vision in digital twin environment Active CN110334701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910623996.4A CN110334701B (en) 2019-07-11 2019-07-11 Data acquisition method based on deep learning and multi-vision in digital twin environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910623996.4A CN110334701B (en) 2019-07-11 2019-07-11 Data acquisition method based on deep learning and multi-vision in digital twin environment

Publications (2)

Publication Number Publication Date
CN110334701A CN110334701A (en) 2019-10-15
CN110334701B true CN110334701B (en) 2020-07-31

Family

ID=68146261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910623996.4A Active CN110334701B (en) 2019-07-11 2019-07-11 Data acquisition method based on deep learning and multi-vision in digital twin environment

Country Status (1)

Country Link
CN (1) CN110334701B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563446B (en) * 2020-04-30 2021-09-03 郑州轻工业大学 Human-machine interaction safety early warning and control method based on digital twin
CN112418245B (en) * 2020-11-04 2024-04-26 武汉大学 Electromagnetic emission point positioning method based on urban environment physical model
CN113419857B (en) * 2021-06-24 2023-03-24 广东工业大学 Federal learning method and system based on edge digital twin association
CN114332741B (en) * 2022-03-08 2022-05-10 盈嘉互联(北京)科技有限公司 Video detection method and system for building digital twins
CN115184563B (en) * 2022-09-08 2022-12-02 北京中环高科环境治理有限公司 Chemical workshop field data acquisition method based on digital twinning
CN115631401A (en) * 2022-12-22 2023-01-20 广东省科学院智能制造研究所 Robot autonomous grabbing skill learning system and method based on visual perception
CN115849202B (en) * 2023-02-23 2023-05-16 河南核工旭东电气有限公司 Intelligent crane operation target identification method based on digital twin technology
CN116524030B (en) * 2023-07-03 2023-09-01 新乡学院 Reconstruction method and system for digital twin crane under swinging condition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101226638A (en) * 2007-01-18 2008-07-23 中国科学院自动化研究所 Method and apparatus for standardization of multiple camera system
CN104463191A (en) * 2014-10-30 2015-03-25 华南理工大学 Robot visual processing method based on attention mechanism
CN104867158A (en) * 2015-06-03 2015-08-26 武汉理工大学 Monocular vision-based indoor water surface ship precise positioning system and method
WO2018140365A1 (en) * 2017-01-24 2018-08-02 Siemens Aktiengesellschaft System and method for cognitive engineering technology for automation and control of systems
CN109448061A (en) * 2018-10-09 2019-03-08 西北工业大学 A kind of underwater binocular visual positioning method without camera calibration
CN109933035A (en) * 2019-04-24 2019-06-25 中国科学院重庆绿色智能技术研究院 A kind of production line control system, method and the production system twin based on number

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11069056B2 (en) * 2017-11-22 2021-07-20 General Electric Company Multi-modal computer-aided diagnosis systems and methods for prostate cancer
CN108805222A (en) * 2018-05-08 2018-11-13 南京邮电大学 A kind of deep learning digital handwriting body recognition methods based on ARM platforms
CN109341689A (en) * 2018-09-12 2019-02-15 北京工业大学 Vision navigation method of mobile robot based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101226638A (en) * 2007-01-18 2008-07-23 中国科学院自动化研究所 Method and apparatus for standardization of multiple camera system
CN104463191A (en) * 2014-10-30 2015-03-25 华南理工大学 Robot visual processing method based on attention mechanism
CN104867158A (en) * 2015-06-03 2015-08-26 武汉理工大学 Monocular vision-based indoor water surface ship precise positioning system and method
WO2018140365A1 (en) * 2017-01-24 2018-08-02 Siemens Aktiengesellschaft System and method for cognitive engineering technology for automation and control of systems
CN109448061A (en) * 2018-10-09 2019-03-08 西北工业大学 A kind of underwater binocular visual positioning method without camera calibration
CN109933035A (en) * 2019-04-24 2019-06-25 中国科学院重庆绿色智能技术研究院 A kind of production line control system, method and the production system twin based on number

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A semi-active human digital twin model for detecting severity of carotid stenoses from head vibration—A coupled computational mechanics and computer vision method; Neeraj Kavan Chakshu et al.; Int J Numer Meth Biomed Engng; 2019-01-11; full text *
Multidisciplinary collaborative design modeling technology for complex mechanical products based on digital twin; Li Linli et al.; Computer Integrated Manufacturing Systems; 2019-06-30; Vol. 25, No. 6; full text *

Also Published As

Publication number Publication date
CN110334701A (en) 2019-10-15

Similar Documents

Publication Publication Date Title
CN110334701B (en) Data acquisition method based on deep learning and multi-vision in digital twin environment
CN110415342B (en) Three-dimensional point cloud reconstruction device and method based on multi-fusion sensor
CN109308693B (en) Single-binocular vision system for target detection and pose measurement constructed by one PTZ camera
CN109360240B (en) Small unmanned aerial vehicle positioning method based on binocular vision
CN106529538A (en) Method and device for positioning aircraft
CN106651942A (en) Three-dimensional rotation and motion detecting and rotation axis positioning method based on feature points
CN112801074B (en) Depth map estimation method based on traffic camera
CN113850865A (en) Human body posture positioning method and system based on binocular vision and storage medium
CN111998862B (en) BNN-based dense binocular SLAM method
CN112818925B (en) Urban building and crown identification method
CN104881029B (en) Mobile Robotics Navigation method based on a point RANSAC and FAST algorithms
CN112509125A (en) Three-dimensional reconstruction method based on artificial markers and stereoscopic vision
CN111563878A (en) Space target positioning method
CN109214254B (en) Method and device for determining displacement of robot
CN114022560A (en) Calibration method and related device and equipment
CN111583342B (en) Target rapid positioning method and device based on binocular vision
CN113393439A (en) Forging defect detection method based on deep learning
CN113345084B (en) Three-dimensional modeling system and three-dimensional modeling method
CN110349209A (en) Vibrating spear localization method based on binocular vision
CN109636856A (en) Object 6 DOF degree posture information union measuring method based on HOG Fusion Features operator
CN111899289B (en) Infrared image and visible light image registration method based on image characteristic information
KR101673144B1 (en) Stereoscopic image registration method based on a partial linear method
CN114608522B (en) Obstacle recognition and distance measurement method based on vision
CN112991372B (en) 2D-3D camera external parameter calibration method based on polygon matching
Liu et al. Dense three-dimensional color reconstruction with data fusion and image-guided depth completion for large-scale outdoor scenes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant