CN113468969A - Aliasing electronic component space expression method based on improved monocular depth estimation - Google Patents

Aliasing electronic component space expression method based on improved monocular depth estimation

Info

Publication number
CN113468969A
Authority
CN
China
Prior art keywords
module
electronic component
rgb
aliasing
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110618580.0A
Other languages
Chinese (zh)
Other versions
CN113468969B (en)
Inventor
顾寄南
雷文桐
张可
高伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202110618580.0A priority Critical patent/CN113468969B/en
Publication of CN113468969A publication Critical patent/CN113468969A/en
Application granted granted Critical
Publication of CN113468969B publication Critical patent/CN113468969B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/241: Pattern recognition > Analysing > Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/253: Pattern recognition > Analysing > Fusion techniques of extracted features
    • G06N3/045: Computing arrangements based on biological models > Neural networks > Architecture, e.g. interconnection topology > Combinations of networks
    • G06N3/08: Computing arrangements based on biological models > Neural networks > Learning methods
    • G06T7/10: Image analysis > Segmentation; Edge detection
    • G06T2207/10024: Indexing scheme for image analysis or image enhancement > Image acquisition modality > Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an aliasing electronic component space expression method based on improved monocular depth estimation, which relates to the field of machine vision and comprises an image acquisition module, a target detection network module, a semantic segmentation network module, and an HSV (hue, saturation, value) and RGB (red, green, blue) module. The image acquisition module acquires RGB images of different types of aliasing (mutually overlapping) electronic components in a bin; the target detection network module processes the RGB image acquired by the image acquisition module to obtain a depth image A; the semantic segmentation network module segments the depth image A processed by the target detection network module to obtain rough depth information; and the HSV and RGB modules refine the rough depth information segmented by the semantic segmentation network module to obtain the detailed depth information of each electronic component. The invention effectively solves the problem of autonomous recognition in complex working scenes where electronic components alias with one another.

Description

Aliasing electronic component space expression method based on improved monocular depth estimation
Technical Field
The invention relates to the field of machine vision, in particular to an aliasing electronic component space expression method based on improved monocular depth estimation.
Background
The automatic identification of electronic components is the basis of visual control for intelligent assembly robots, and understanding of complex scenes is the fundamental support for such automatic identification. Whether electronic components can be identified accurately and autonomously directly determines the accuracy and efficiency of the intelligent assembly robot. In actual production, assisting the mechanical arm with machine vision to assemble electronic components alleviates the problems of low production efficiency, high labor input and heavy worker burden, and fundamentally supports the transformation from traditional line production to intelligent production.
Existing machine-vision-based electronic component identification methods mainly address scattered, uniformly distributed components; the problem of autonomous identification in complex working scenes where electronic components alias (overlap) with one another and with the background remains unsolved.
Disclosure of Invention
To address the above deficiencies in the prior art, the invention provides an aliasing electronic component space expression method based on improved monocular depth estimation, which effectively solves the problem of autonomous recognition in complex working scenes where electronic components alias with one another.
The present invention achieves the above-described object by the following technical means.
An aliasing electronic component space expression method based on improved monocular depth estimation comprises an image acquisition module, a target detection network module, a semantic segmentation network module, an HSV (hue, saturation, value) module and an RGB (red, green and blue) module;
the image acquisition module is used for acquiring RGB images of aliasing electronic components of different types in the bin;
the target detection network module is used for processing the RGB image acquired by the image acquisition module to obtain a depth image A;
the semantic segmentation network module is used for segmenting the depth image A processed by the target detection network module to obtain rough depth information;
and the HSV and RGB modules refine the rough depth information segmented by the semantic segmentation network module to obtain the detailed depth information of each electronic component.
Furthermore, the target detection network module comprises an input image module, a data enhancement module, a feature extraction network module, a feature fusion module, a down-sampling module, a fully connected layer module, a classifier and a prediction output module; specifically, the acquired RGB image passes sequentially through data enhancement, feature extraction, feature fusion, down-sampling, the fully connected layer, the classifier and the prediction output.
Further, the data enhancement module randomly scales the RGB image twice to obtain two images a and b, and randomly crops it twice to obtain two images c and d;
the feature extraction network module comprises a lightweight network and a deep convolutional network; features of images a and c are extracted with the lightweight network and features of images b and d with the deep convolutional network;
the feature fusion module performs three stages of hierarchical feature fusion: the shallow and deep features of images a and b are fused into a feature map x, the shallow and deep features of images c and d are fused into a feature map y, and the shallow and deep features of x and y are fused into a feature map z; the feature map z passes through the down-sampling module, the fully connected layer module and the classifier, and is then output by the prediction output module.
Further, the prediction output module predicts and outputs the depth image A, the electronic component position information, and the category and probability distribution of the electronic components.
Further, the depth image A is an RGB color image.
Further, the HSV and RGB modules comprise an HSV color model, an HSV cone model, an RGB three-dimensional coordinate model and an RGB value classifier;
firstly, the depth image A is segmented by the semantic segmentation network module to obtain rough depth information, which is input into the HSV color model to output the values of the three attributes H, S and V; secondly, the HSV cone model visualizes the H, S and V values on a color cone, which is converted into the RGB three-dimensional coordinate model to obtain the R, G and B values of the depth map; finally, the three ranges of R, G and B values are refined with the RGB value classifier.
Furthermore, the HSV color model determines color from the three attributes H, S and V, namely hue, saturation and value (brightness). Hue H is measured as an angle in the range 0°-360°, counted counterclockwise from red: red is 0°, green is 120° and blue is 240°. Saturation S represents how close the color is to a spectral color, usually ranging from 0% to 100%; the larger the value, the more saturated the color. Value V represents the brightness of the color; for a light source color it is related to the luminance of the illuminant, and for an object color it is related to the transmittance or reflectance of the object.
Furthermore, the method further comprises a manipulator control module, which performs positioning, grasping and assembly according to the electronic component position information, the category and probability distribution of the electronic components, and the refined depth information.
Compared with the prior art, the technical scheme of the invention has at least the following benefits:
1. The invention combines a lightweight network with a deep convolutional network, which both preserves comprehensive image features and detail information and improves the speed of model prediction, enabling real-time target detection on mobile and embedded devices.
2. The invention performs three stages of hierarchical feature fusion on the extracted image features, fusing low-level detail features with high-level semantic features, which greatly improves the detection performance of the network.
3. Compared with general target detection algorithms, the method adds the output of a depth image and separates the aliasing electronic components along the depth direction, realizing the spatial expression of aliasing electronic components and solving the problem that aliasing electronic components are difficult for a computer to understand.
Drawings
FIG. 1 is a schematic flow chart of an aliasing electronic component spatial representation based on improved monocular depth estimation according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of the target detection module of FIG. 1 according to the present invention;
FIG. 3 is a schematic diagram illustrating a specific operation of the target detection module of FIG. 2 according to the present invention;
FIG. 4 is a schematic flow diagram of the HSV and RGB modules of FIG. 1 according to the present invention.
Reference numerals:
1-an image acquisition module; 2-target detection network module; 3-semantic segmentation network module; 4-HSV, RGB module; 5-a manipulator control module; 6-input image module; 7-a data enhancement module; 8-feature extraction network module; 9-a feature fusion module; 10-a down-sampling module; 11-full connectivity layer module; 12-a classifier; 13-a prediction output module; 14-HSV color model; 15-HSV conical model; 16-RGB three-dimensional coordinate model; 17-RGB value classifier.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
A camera acquires images of different types of aliasing electronic components in the bin to obtain RGB images of the components. Each picture is randomly scaled twice and randomly cropped twice. Feature extraction is performed on the processed images with a lightweight network and a deep convolutional network within the target detection stage, and the extracted shallow and deep features are fused. After down-sampling, the fully connected layer and the classifier, the depth image, the electronic component position information, and the class prediction and probability score of each component are obtained. The semantic segmentation module segments the depth image with color as the criterion, providing pixel-level image understanding; through the setting of parameters and hyperparameters and the training of the network, the color of the depth map and the distance between an electronic component and the lens are each constrained to a range. The segmented depth image is then combined with the HSV and RGB methods to refine the range and obtain the depth D of each electronic component. The position information (x, y, w, h) of each component is combined with its depth D, so that the complete three-dimensional position information of each component is expressed by five parameters (x, y, w, h, D). Finally, the camera coordinate system is converted into the manipulator coordinate system, providing the manipulator end effector with accurate position and depth information of the aliasing electronic components, which facilitates high-precision positioning, grasping and assembly by the manipulator.
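The flow above can be summarized, purely as an illustration, by the following Python sketch. Every callable passed in (detect, segment, refine_depth, cam_to_arm) is a hypothetical placeholder standing in for the corresponding module of this document, not code disclosed by the patent.

```python
def spatial_expression_pipeline(rgb_image, detect, segment, refine_depth, cam_to_arm):
    """Return one 5-parameter expression (x, y, w, h, D) per detected component.

    detect       : RGB image -> (depth_map, boxes, classes, scores)
    segment      : depth_map -> coarse upper/middle/bottom layer map
    refine_depth : (depth_map, layers, box) -> refined depth D of one component
    cam_to_arm   : (x, y) in the image frame -> (x, y) in the manipulator frame
    """
    depth_map, boxes, classes, scores = detect(rgb_image)   # target detection network
    layers = segment(depth_map)                             # semantic segmentation (coarse depth)
    results = []
    for (x, y, w, h), cls, score in zip(boxes, classes, scores):
        d = refine_depth(depth_map, layers, (x, y, w, h))   # HSV/RGB refinement
        ax, ay = cam_to_arm(x, y)                           # camera -> manipulator frame
        results.append({"class": cls, "score": score, "pose": (ax, ay, w, h, d)})
    return results
```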
The image acquisition module comprises a monocular color high-resolution CCD camera, an electronic component placement platform, a telescopic bracket and a light source. The CCD camera has 380,000 image pixels, a color resolution of 480 lines and a black-and-white resolution of 600 lines. The camera is mounted on a 15 cm telescopic bracket above the experimental platform, with the lens 10 cm from the platform surface.
The image acquisition objects of the invention are electronic components, specifically aliasing electronic components of different types. The components include resistors, capacitors and inductors, with shapes such as cylinders, cuboids, tubes and coils, and their number is controlled at 15-25. The components are placed in a bin on the experimental platform; the bin is 10 cm long, 10 cm wide and 5 cm high.
The invention performs two random resizes and two random crops on the collected electronic component images to augment the data, which improves model accuracy and enhances model stability.
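A minimal sketch of this augmentation step is given below, assuming torchvision is used; the scale range and crop size are illustrative assumptions, since the patent does not specify them.

```python
import random
from PIL import Image
from torchvision import transforms
from torchvision.transforms import functional as TF

def random_resize(img: Image.Image, scale_range=(0.8, 1.2)) -> Image.Image:
    """Randomly rescale the whole image; the scale range is an assumed value."""
    s = random.uniform(*scale_range)
    w, h = img.size
    return TF.resize(img, [int(h * s), int(w * s)])

random_crop = transforms.RandomCrop(size=600, pad_if_needed=True)  # crop size assumed

def augment(img: Image.Image):
    """Two random resizes give views a and b; two random crops give views c and d."""
    a, b = random_resize(img), random_resize(img)
    c, d = random_crop(img), random_crop(img)
    return a, b, c, d
```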
The feature extraction used by the invention combines two kinds of networks, a deep convolutional network and a lightweight network. The deep convolutional network better extracts image features and detail information, including the color, shape, size, edge features and corner features of the electronic components. The lightweight network reduces network parameters without losing network performance, alleviates the model storage problem, improves the speed of model prediction, and enables real-time target detection on mobile and embedded devices.
The extracted features undergo three stages of hierarchical feature fusion. Low-level features have higher resolution and contain more position and detail information, while high-level features have lower resolution but richer semantic information. Fusing low-level detail features with high-level semantic features through the three-stage hierarchical fusion improves the detection performance of the network.
The target detection algorithm has three outputs: a depth image, the electronic component position information, and the category with its corresponding probability score. Compared with general target detection algorithms, it adds the output of the depth image and separates the aliasing electronic components along the depth direction, which effectively solves the problem that aliasing electronic components are difficult for a computer to understand.
With color as the criterion, the invention performs semantic segmentation on the obtained depth map and roughly divides the electronic components into upper-layer, middle-layer and bottom-layer components; the segmented depth image is then combined with the HSV and RGB models and refined again to obtain the depth of each electronic component, with the precision controlled to 0.1 mm.
The invention expresses the complete three-dimensional position information of each electronic component with five parameters (x, y, w, h, D), realizing the spatial expression of the aliasing electronic components and providing the manipulator end effector with accurate position and depth information, which facilitates subsequent high-precision positioning, grasping and assembly by the intelligent assembly robot.
Specifically, in the aliasing electronic component space expression method based on improved monocular depth estimation, an industrial CCD camera acquires images of different types of aliasing electronic components in the bin to obtain RGB images of the components; random scaling and random cropping are used for image augmentation; a lightweight network and a deep convolutional network in the target detection network extract features from the images; feature fusion in the target detection network fuses the shallow and deep features; down-sampling, the fully connected layer and the classifier yield the depth image A, the electronic component position information B, the class prediction C of the electronic component and the probability score P; the semantic segmentation network module segments the depth map to obtain rough depth information; the HSV and RGB modules refine the semantically segmented rough depth information to obtain the detailed depth information D of each electronic component; and the position information (x, y, w, h) of each component is combined with its detailed depth D so that the complete three-dimensional position information of each component is expressed by five parameters (x, y, w, h, D), facilitating high-precision positioning, grasping and assembly by the manipulator.
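As a simple illustration of this 5-parameter expression, a component pose could be held in a structure like the one below; the field names and the example values are assumptions made for this sketch only.

```python
from dataclasses import dataclass

@dataclass
class ComponentPose:
    """5-parameter spatial expression (x, y, w, h, D) of one electronic component.

    x, y -- center of the minimum bounding rectangle in image coordinates
    w, h -- width and height of that rectangle
    d    -- refined depth (distance from the camera), here in millimeters
    """
    x: float
    y: float
    w: float
    h: float
    d: float

# e.g. a capacitor centered at (312, 188) with a 46 x 23 px box, 78.4 mm from the lens
example = ComponentPose(x=312, y=188, w=46, h=23, d=78.4)
print(example)
```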
For feature extraction by the lightweight network and the deep convolutional network: the deep convolutional network better extracts image features and detail information, including the color, shape, size, edge features and corner features of the electronic components; the lightweight network reduces network parameters without losing network performance, alleviates the model storage problem, improves prediction speed, and enables real-time target detection on mobile and embedded devices.
Feature fusion: the extracted image features undergo three stages of hierarchical feature fusion. Low-level features have higher resolution and contain more position and detail information, while high-level features have lower resolution but richer semantic information; fusing low-level detail features with high-level semantic features improves the detection performance of the network.
The output after down-sampling, the fully connected layer and the classifier adds the depth image A compared with general target detection algorithms, separating the aliasing electronic components along the depth direction and solving the problem that aliasing electronic components are difficult for a computer to understand.
The semantic segmentation module performs image semantic segmentation on the obtained depth map with color as the criterion and roughly divides the electronic components into upper-layer, middle-layer and bottom-layer components;
the HSV and RGB modules input the rough depth information obtained from depth image A through the semantic segmentation network into the HSV color model and output the values of the three attributes H, S and V; the HSV cone model visualizes these H, S and V values on a color cone; the HSV cone model is converted into the RGB three-dimensional coordinate model to obtain the R, G and B values of the depth map; and the RGB classifier refines the three ranges from the semantic segmentation network with the distance precision controlled to 0.1 mm, thereby obtaining the detailed depth information D (i.e. the distance from the camera) of each electronic component.
For manipulator positioning, grasping and assembly, the complete three-dimensional position information of each electronic component is expressed by the five parameters (x, y, w, h, D); the camera coordinate system is converted into the manipulator coordinate system, providing the manipulator end effector with accurate position and depth information of the aliasing electronic components and facilitating high-precision positioning, grasping and assembly.
With reference to FIG. 1, an aliasing electronic component space expression method based on improved monocular depth estimation comprises an image acquisition module 1, a target detection network module 2, a semantic segmentation network module 3, an HSV and RGB module 4, and a manipulator control module 5.
The image acquisition module 1 acquires RGB images of different types of aliasing electronic components in the bin. The target detection network 2 applies data enhancement, feature extraction, feature fusion, down-sampling, the fully connected layer and the classifier to the acquired RGB image to obtain the depth image, the electronic component position information, and the class prediction and probability score of each component. The semantic segmentation network module 3 segments the depth image with color as the criterion and, through parameter setting and network training, roughly constrains the color of the depth image and the distance between an electronic component and the lens to a range. The HSV and RGB module 4 refines that range to obtain the depth D of each electronic component and combines the position information (x, y, w, h) of each component with its depth D, so that the complete three-dimensional position information of each component is expressed by five parameters (x, y, w, h, D). The manipulator control module 5 converts the camera coordinate system into the manipulator coordinate system and provides the manipulator end effector with accurate position and depth information of the aliasing electronic components, realizing high-precision positioning, grasping and assembly by the manipulator.
In specific implementation, the monocular camera is a color high-resolution CCD camera with 380,000 image pixels, a color resolution of 480 lines and a black-and-white resolution of 600 lines. The camera shoots from a fixed position: it is mounted on a 15 cm telescopic bracket above the experimental platform with the lens 10 cm from the platform surface, and is locked once positioned so that the monocular camera cannot move or slide during the experiment.
In specific implementation, the image acquisition objects of the invention are different types of aliasing electronic components, including resistors, capacitors and inductors with cylindrical, cuboid, tubular and coil shapes. The components are placed in a bin on the experimental platform; the bin is 10 cm long, 10 cm wide and 5 cm high and is fixed to the platform by a chute so that it cannot move or shake during the experiment. The number of components is controlled at 15-25 and can be adjusted according to component size, ensuring that the components alias and occlude one another and do not rise above the horizontal plane at the top of the bin.
In specific implementation, the semantic segmentation module adopts PSPNet. PSPNet extracts abstract features through the residual network ResNet; a pyramid pooling module with 4 pyramid levels aggregates context information to obtain information at 4 different scales; a convolution block (conv/BN/ReLU) reduces the number of channels of the 4 levels of feature maps to 512; and bilinear interpolation upsampling restores each feature map to the spatial size of the pyramid pooling module's input, i.e. the output of each level is restored to a 60 x 60 feature map with 512 channels.
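A minimal PyTorch sketch of such a pyramid pooling step is shown below. The pooling bin sizes (1, 2, 3, 6) and the 2048-channel ResNet input are the usual PSPNet defaults and are assumptions here rather than values stated in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """4-level pyramid pooling: pool, reduce channels to 512, upsample, concatenate.
    Bin sizes (1, 2, 3, 6) and the 2048-channel input are assumed PSPNet defaults."""
    def __init__(self, in_channels: int = 2048, out_channels: int = 512, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                              # pool to a b x b grid
                nn.Conv2d(in_channels, out_channels, 1, bias=False),  # reduce channels to 512
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            ) for b in bins
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        # bilinearly upsample every level back to the input size, then concatenate
        pyramid = [F.interpolate(stage(x), size=(h, w), mode="bilinear", align_corners=False)
                   for stage in self.stages]
        return torch.cat([x] + pyramid, dim=1)

# e.g. a ResNet feature map of shape (1, 2048, 60, 60) -> (1, 2048 + 4 * 512, 60, 60)
features = torch.randn(1, 2048, 60, 60)
model = PyramidPooling().eval()        # eval mode so BatchNorm works with batch size 1
with torch.no_grad():
    print(model(features).shape)       # torch.Size([1, 4096, 60, 60])
```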
In specific implementation, semantic segmentation divides the colors of the depth map into 6 ranges, from shallow depth to deep: "blue" (0,0,119)-(0,0,255), "cyan" (0,0,255)-(0,119,119), "green" (0,119,119)-(119,255,0), "yellow" (255,199,0)-(199,255,0), "orange" (255,0,0)-(255,119,0), "red" (255,119,0)-(119,0,0). The segmentation result is divided into three parts: relatively close to the camera (shown blue to cyan), at a distance of 5-7 cm; a medium distance from the camera (shown green to yellow), at a distance of 7-9 cm; and relatively far from the camera (shown orange to red), at a distance of 9-10 cm. These correspond to three ranges of electronic components: upper-layer components (no occlusion), middle-layer components (partially occluded, less than half occluded or occluded by only one layer), and bottom-layer components (partially occluded, more than half occluded or occluded by multiple layers).
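For illustration only, the coarse three-layer split could be implemented as a nearest-reference-color lookup such as the sketch below; the anchor RGB values and the nearest-color rule are simplifying assumptions, not the exact range thresholds listed above.

```python
import numpy as np

# Nominal RGB anchors for the six color bands named above; the exact anchors and the
# nearest-color rule are illustrative assumptions, not the patent's range thresholds.
REFERENCE = {
    "blue":   (0, 0, 255),
    "cyan":   (0, 255, 255),
    "green":  (0, 255, 0),
    "yellow": (255, 255, 0),
    "orange": (255, 165, 0),
    "red":    (255, 0, 0),
}
# blue/cyan -> upper layer (5-7 cm), green/yellow -> middle (7-9 cm), orange/red -> bottom (9-10 cm)
LAYER = {"blue": "upper", "cyan": "upper", "green": "middle",
         "yellow": "middle", "orange": "bottom", "red": "bottom"}

def classify_layer(rgb_pixel) -> str:
    """Coarsely assign a depth-map pixel to the upper/middle/bottom layer by nearest reference color."""
    pixel = np.asarray(rgb_pixel, dtype=float)
    nearest = min(REFERENCE, key=lambda name: np.linalg.norm(pixel - np.asarray(REFERENCE[name])))
    return LAYER[nearest]

print(classify_layer((10, 20, 240)))   # -> "upper" (close to blue, i.e. nearest the camera)
```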
In specific implementation, both target detection and semantic segmentation are trained with transfer learning: all parameters are trained after loading pretrained weights, and the shallow parameters learned by a well-trained network are transferred to the new network, so that the new network also has the ability to recognize low-level general features.
In specific implementation, the transformation between the camera coordinate system and the manipulator coordinate system is as follows. Assume OXY is the manipulator coordinate system and O′X′Y′ is the camera coordinate system, and let theta be the angle between the two coordinate systems; the coordinate transformation is:
x = x′*r*cos(theta) - y′*r*sin(theta) + x0    (1)
y = x′*r*sin(theta) - y′*r*cos(theta) + y0    (2)
where r is the millimeter-to-pixel ratio (mm/pixel), i.e. the number of millimeters corresponding to one pixel, theta is the angle between the two coordinate systems, and (x0, y0) is the offset of the image coordinate origin from the manipulator coordinate origin.
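A direct transcription of equations (1) and (2) is sketched below, keeping the signs exactly as printed above; the example call and its numeric values are illustrative only.

```python
import math

def camera_to_manipulator(x_cam: float, y_cam: float, r: float,
                          theta_deg: float, x0: float, y0: float):
    """Map a camera-frame point (x', y') to the manipulator frame using equations
    (1) and (2) above, keeping the signs exactly as printed there.

    r         -- millimeter-to-pixel ratio (mm per pixel)
    theta_deg -- angle between the two coordinate systems, in degrees
    (x0, y0)  -- offset of the image origin from the manipulator origin
    """
    theta = math.radians(theta_deg)
    x = x_cam * r * math.cos(theta) - y_cam * r * math.sin(theta) + x0
    y = x_cam * r * math.sin(theta) - y_cam * r * math.cos(theta) + y0
    return x, y

# illustrative values only: a component center at pixel (312, 188), 0.1 mm/pixel,
# frames rotated by 30 degrees, origins offset by (50, 120) mm
print(camera_to_manipulator(312, 188, r=0.1, theta_deg=30.0, x0=50.0, y0=120.0))
```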
As shown in FIGS. 2 and 3, the target detection network module includes an input image module 6, a data enhancement module 7, a feature extraction network module 8, a feature fusion module 9, a down-sampling module 10, a fully connected layer module 11, a classifier 12, and a prediction output module 13.
The image input by the input image module 6 is an RGB image of different types of aliasing electronic components in the bin, collected by the image acquisition module 1. The data enhancement module 7 applies two random resizes to the preprocessed picture to obtain two images a and b, and two random crops to obtain two images c and d. The feature extraction network 8 comprises a lightweight network (MobileNetV3) and a deep convolutional network (DenseNet); features of images a and c are extracted with MobileNetV3 and features of images b and d with DenseNet, and the image sizes are unified to 900 x 900 after feature extraction. The feature fusion module 9 performs three stages of hierarchical feature fusion: the shallow and deep features of images a and b are fused into feature map x, the shallow and deep features of images c and d are fused into feature map y, and the shallow and deep features of x and y are fused into feature map z. Feature map z passes through the down-sampling module 10, the fully connected layer module 11 and the classifier 12 (softmax) to produce the prediction output 13, which includes the depth image A, the electronic component position information B, the electronic component category C and the probability score P.
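A minimal PyTorch sketch of this backbone arrangement is given below, using the torchvision MobileNetV3-Small and DenseNet-121 feature extractors. Fusing by resizing one map to the other's spatial size and concatenating along channels is an assumption of this sketch, as the patent does not state the exact fusion operator, and the 224 x 224 example input is likewise only illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class FusionBackbone(nn.Module):
    """MobileNetV3 features for images a/c, DenseNet features for images b/d,
    followed by three hierarchical fusions (a+b -> x, c+d -> y, x+y -> z).
    Fusing by resize + channel concatenation is an assumption of this sketch."""
    def __init__(self):
        super().__init__()
        self.light = models.mobilenet_v3_small(weights=None).features  # lightweight branch
        self.deep = models.densenet121(weights=None).features          # deep branch

    @staticmethod
    def fuse(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        # bring both maps to a common spatial size, then concatenate along channels
        f2 = F.interpolate(f2, size=f1.shape[2:], mode="bilinear", align_corners=False)
        return torch.cat([f1, f2], dim=1)

    def forward(self, a, b, c, d):
        x = self.fuse(self.light(a), self.deep(b))   # first fusion  -> feature map x
        y = self.fuse(self.light(c), self.deep(d))   # second fusion -> feature map y
        return self.fuse(x, y)                       # third fusion  -> feature map z

backbone = FusionBackbone().eval()
a, b, c, d = (torch.randn(1, 3, 224, 224) for _ in range(4))
with torch.no_grad():
    z = backbone(a, b, c, d)
print(z.shape)   # torch.Size([1, 3200, 7, 7]) for 224 x 224 inputs
```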
In a specific implementation, the depth image A is an RGB color image. In a depth map, the part closer to the camera is shown in blue to cyan (near to far), the part at a medium distance in cyan to green to yellow (near to far), and the part farther from the camera in yellow to orange to red (near to far).
In specific implementation, the electronic component position information B is expressed by coordinate values: the upper-left corner of the image is taken as the origin, denoted (0, 0); a minimum bounding rectangle is drawn around each electronic component, with center point (x, y), width w and height h. The position information of each component is therefore expressed by the four parameters (x, y, w, h).
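A small OpenCV sketch of extracting this (x, y, w, h) description is shown below; it assumes a binary mask per component is available (a hypothetical input) and uses an axis-aligned bounding rectangle.

```python
import cv2
import numpy as np

def component_box(mask: np.ndarray):
    """Return the (x, y, w, h) description used above: the center of the bounding
    rectangle of a component plus its width and height, from a binary component mask."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    left, top, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    x, y = left + w / 2.0, top + h / 2.0   # center point (x, y)
    return x, y, w, h

# toy example: a 20 x 10 pixel rectangle of ones inside a 100 x 100 mask
mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:50, 30:50] = 1
print(component_box(mask))   # -> (40.0, 45.0, 20, 10)
```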
In specific implementation, the category C and the probability score P of each electronic component appear at the upper-right corner of its minimum bounding rectangle in the image, expressed respectively by Chinese characters and decimal numbers. The number of categories is C+1 (including one background category); in the present invention there are four: "resistor", "capacitor", "inductor" and "background". The probability score is the probability that the object framed by the minimum bounding rectangle belongs to that category; its value P lies between 0 and 1, with the precision controlled to 0.01.
Referring to FIG. 4, the HSV and RGB module includes an HSV color model 14, an HSV cone model 15, an RGB three-dimensional coordinate model 16 and an RGB value classifier 17. The rough depth information obtained from depth image A through the semantic segmentation network module 3 is input into the HSV color model 14, which outputs the values of the three attributes H, S and V; the HSV cone model 15 visualizes these values on a color cone; the HSV cone model 15 is converted into the RGB three-dimensional coordinate model 16 to obtain the R, G and B values of the depth map; and the RGB classifier 17 refines the three ranges of R, G and B values with the distance precision controlled to 0.1 mm, thereby obtaining the depth D of each electronic component (i.e. its distance from the camera).
In particular, the HSV color model 14 determines color from H, S and V, which are hue, saturation and value (brightness), respectively. Hue H is measured as an angle in the range 0°-360°, counted counterclockwise from red: red is 0°, green is 120° and blue is 240°; their complementary colors are yellow at 60°, cyan at 180° and magenta at 300°. Saturation S represents how close the color is to a spectral color, usually ranging from 0% to 100%; the larger the value, the more saturated the color. Value V represents the brightness of the color; for a light source color it is related to the luminance of the illuminant, and for an object color it is related to the transmittance or reflectance of the object, typically ranging from 0% (black) to 100% (white).
In particular, the HSV cone model visualizes the values of the three attributes H, S and V on an inverted color cone. At the apex (i.e. the origin) of the cone, V = 0 while H and S are undefined, representing black; at the center of the top surface of the cone, S = 0, V = 1 and H is undefined, representing white. The V axis of the HSV model corresponds to the main diagonal of the RGB color space.
In specific implementation, the X, Y and Z axes of the RGB three-dimensional coordinate model correspond to the R, G and B channels respectively, each ranging from 0 to 255. When 0° ≤ H < 360°, 0 ≤ S ≤ 1 and 0 ≤ V ≤ 1, the conversion from HSV to RGB is:
C = V × S    (3)
X = C × (1 - |(H/60°) mod 2 - 1|)    (4)
m = V - C    (5)
(R′, G′, B′) = (C, X, 0) for 0° ≤ H < 60°; (X, C, 0) for 60° ≤ H < 120°; (0, C, X) for 120° ≤ H < 180°; (0, X, C) for 180° ≤ H < 240°; (X, 0, C) for 240° ≤ H < 300°; (C, 0, X) for 300° ≤ H < 360°    (6)
(R, G, B) = ((R′ + m) × 255, (G′ + m) × 255, (B′ + m) × 255)    (7)
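The conversion can be written directly from equations (3)-(7); since equation (6) appears only as an image in the original, the sector table used below is the standard HSV-to-RGB form and is reconstructed here on that assumption.

```python
def hsv_to_rgb(h: float, s: float, v: float):
    """HSV -> RGB following equations (3)-(7) above.
    h in [0, 360), s and v in [0, 1]; returns 8-bit (R, G, B)."""
    c = v * s                                  # (3) chroma
    x = c * (1 - abs((h / 60.0) % 2 - 1))      # (4)
    m = v - c                                  # (5)
    # (6): choose (R', G', B') according to the 60-degree sector of H
    sectors = [(c, x, 0), (x, c, 0), (0, c, x), (0, x, c), (x, 0, c), (c, 0, x)]
    r1, g1, b1 = sectors[int(h // 60) % 6]
    # (7): shift by m and scale to the 0-255 range
    return tuple(round((u + m) * 255) for u in (r1, g1, b1))

print(hsv_to_rgb(240, 1.0, 1.0))   # pure blue  -> (0, 0, 255)
print(hsv_to_rgb(0, 1.0, 1.0))     # pure red   -> (255, 0, 0)
print(hsv_to_rgb(120, 1.0, 1.0))   # pure green -> (0, 255, 0)
```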
In specific implementation, the RGB classifier 17 refines the six color span ranges "blue" (0,0,119)-(0,0,255), "cyan" (0,0,255)-(0,119,119), "green" (0,119,119)-(119,255,0), "yellow" (255,199,0)-(199,255,0), "orange" (255,0,0)-(255,119,0) and "red" (255,119,0)-(119,0,0) down to individual channel values (R, G, B), and controls the precision of the camera-to-component distance to 0.1 mm, thereby obtaining the detailed depth information D (i.e. the distance from the camera) of each electronic component.
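The patent does not spell out the exact mapping from refined channel values to a distance, so the following is only a sketch under an explicit assumption: hue is taken to vary linearly from blue (nearest, 50 mm) to red (farthest, 100 mm), matching the 5-10 cm working range described above, and the result is rounded to 0.1 mm.

```python
import colorsys

def refine_depth(r: int, g: int, b: int) -> float:
    """Sketch: map a refined depth-map color to a camera distance in millimeters.

    Assumption (not stated in the patent): hue varies linearly from blue
    (240 degrees, nearest, 50 mm) to red (0 degrees, farthest, 100 mm),
    matching the 5-10 cm working range; the result is rounded to 0.1 mm.
    """
    h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue_deg = min(h * 360.0, 240.0)                    # clamp to the blue..red span
    return round(50.0 + (240.0 - hue_deg) / 240.0 * 50.0, 1)

print(refine_depth(0, 0, 255))    # blue -> 50.0 mm (closest to the camera)
print(refine_depth(0, 255, 255))  # cyan -> 62.5 mm
print(refine_depth(255, 0, 0))    # red  -> 100.0 mm (farthest)
```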
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims (8)

1. An aliasing electronic component space expression method based on improved monocular depth estimation is characterized by comprising an image acquisition module, a target detection network module, a semantic segmentation network module, an HSV (hue, saturation and value) module and an RGB (red, green and blue) module;
the image acquisition module is used for acquiring RGB images of aliasing electronic components of different types in the bin;
the target detection network module is used for processing the RGB image acquired by the image acquisition module to obtain a depth image A;
the semantic segmentation network module is used for segmenting the depth image A processed by the target detection network module to obtain rough depth information;
and the HSV and RGB modules refine the rough depth information segmented by the semantic segmentation network module to obtain the detailed depth information of each electronic component.
2. The aliasing electronic component space expression method based on improved monocular depth estimation according to claim 1, wherein the target detection network module comprises an input image module, a data enhancement module, a feature extraction network module, a feature fusion module, a down-sampling module, a fully connected layer module, a classifier and a prediction output module; specifically, the acquired RGB image passes sequentially through data enhancement, feature extraction, feature fusion, down-sampling, the fully connected layer, the classifier and the prediction output.
3. The aliasing electronic component spatial expression method based on improved monocular depth estimation according to claim 2,
the data enhancement module randomly scales the RGB image twice to obtain two images a and b, and randomly crops it twice to obtain two images c and d;
the feature extraction network module comprises a lightweight network and a deep convolutional network; features of images a and c are extracted with the lightweight network and features of images b and d with the deep convolutional network;
the feature fusion module performs three stages of hierarchical feature fusion: the shallow and deep features of images a and b are fused into a feature map x, the shallow and deep features of images c and d are fused into a feature map y, and the shallow and deep features of x and y are fused into a feature map z; the feature map z passes through the down-sampling module, the fully connected layer module and the classifier, and is then output by the prediction output module.
4. The aliasing electronic component space expression method based on improved monocular depth estimation according to claim 3, wherein the output of the prediction output module includes the depth image A, the electronic component position information, and the category and probability distribution of the electronic components.
5. The aliasing electronic component space expression method based on improved monocular depth estimation according to claim 1, wherein the depth image A is an RGB color image.
6. The aliasing electronic component space expression method based on improved monocular depth estimation according to claim 1, wherein the HSV and RGB modules comprise an HSV color model, an HSV cone model, an RGB three-dimensional coordinate model and an RGB value classifier;
firstly, the depth image A is segmented by the semantic segmentation network module to obtain rough depth information, which is input into the HSV color model to output the values of the three attributes H, S and V; secondly, the HSV cone model visualizes the H, S and V values on a color cone, which is converted into the RGB three-dimensional coordinate model to obtain the R, G and B values of the depth map; and the three ranges of R, G and B values are refined with the RGB value classifier, thereby refining the rough depth information obtained by the semantic segmentation module and obtaining the detailed depth information of the electronic components.
7. The aliasing electronic component space expression method based on improved monocular depth estimation according to claim 6, wherein the HSV color model determines color from the three attributes H, S and V, namely hue, saturation and value (brightness); hue H is measured as an angle in the range 0°-360°, counted counterclockwise from red: red is 0°, green is 120° and blue is 240°; saturation S represents how close the color is to a spectral color, usually ranging from 0% to 100%, with larger values indicating a more saturated color; and value V represents the brightness of the color, which for a light source color is related to the luminance of the illuminant and for an object color is related to the transmittance or reflectance of the object.
8. The aliasing electronic component space expression method based on improved monocular depth estimation according to claim 6, further comprising a manipulator control module, wherein the manipulator control module performs positioning, grasping and assembly according to the electronic component position information, the category and probability distribution of the electronic components, and the detailed depth information of the electronic components.
CN202110618580.0A 2021-06-03 2021-06-03 Aliased electronic component space expression method based on improved monocular depth estimation Active CN113468969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110618580.0A CN113468969B (en) 2021-06-03 2021-06-03 Aliased electronic component space expression method based on improved monocular depth estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110618580.0A CN113468969B (en) 2021-06-03 2021-06-03 Aliased electronic component space expression method based on improved monocular depth estimation

Publications (2)

Publication Number Publication Date
CN113468969A true CN113468969A (en) 2021-10-01
CN113468969B CN113468969B (en) 2024-05-14

Family

ID=77872099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110618580.0A Active CN113468969B (en) 2021-06-03 2021-06-03 Aliased electronic component space expression method based on improved monocular depth estimation

Country Status (1)

Country Link
CN (1) CN113468969B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711413A (en) * 2018-12-30 2019-05-03 陕西师范大学 Image, semantic dividing method based on deep learning
CN111340864A (en) * 2020-02-26 2020-06-26 浙江大华技术股份有限公司 Monocular estimation-based three-dimensional scene fusion method and device
KR102127153B1 (en) * 2020-04-09 2020-06-26 한밭대학교 산학협력단 Depth estimation method and system using cycle GAN and segmentation

Also Published As

Publication number Publication date
CN113468969B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
CN110728200A (en) Real-time pedestrian detection method and system based on deep learning
CN106156778B (en) The method of known object in the visual field of NI Vision Builder for Automated Inspection for identification
WO2023050589A1 (en) Intelligent cargo box loading method and system based on rgbd camera
CN114724120B (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
WO2022033076A1 (en) Target detection method and apparatus, device, storage medium, and program product
CN108229440A (en) One kind is based on Multi-sensor Fusion indoor human body gesture recognition method
Qiu-yu et al. Hand gesture segmentation method based on YCbCr color space and K-means clustering
CN103646249A (en) Greenhouse intelligent mobile robot vision navigation path identification method
WO2007002382A2 (en) Terrain map summary elements
TWI745204B (en) High-efficiency LiDAR object detection method based on deep learning
Ouyang et al. A cgans-based scene reconstruction model using lidar point cloud
Lacroix et al. Feature extraction using the constrained gradient
CN114120067A (en) Object identification method, device, equipment and medium
Rodriguez-Telles et al. A fast floor segmentation algorithm for visual-based robot navigation
CN116994135A (en) Ship target detection method based on vision and radar fusion
CN115641322A (en) Robot grabbing method and system based on 6D pose estimation
CN113681552B (en) Five-dimensional grabbing method for robot hybrid object based on cascade neural network
CN114639115A (en) 3D pedestrian detection method based on fusion of human body key points and laser radar
CN114332796A (en) Multi-sensor fusion voxel characteristic map generation method and system
CN113379684A (en) Container corner line positioning and automatic container landing method based on video
CN113468969B (en) Aliased electronic component space expression method based on improved monocular depth estimation
CN117011380A (en) 6D pose estimation method of target object
CN111784768A (en) Unmanned aerial vehicle attitude estimation method and system based on three-color four-lamp mark recognition
CN116188763A (en) Method for measuring carton identification positioning and placement angle based on YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant