CN116228696A

CN116228696A - Glass detection method based on deep learning and ghost phenomenon

Info

Publication number: CN116228696A
Application number: CN202310128767.1A
Authority: CN
Inventors: 晏涛; 高嘉晖; 李贺龙
Original assignee: Jiangnan University
Current assignee: Jiangnan University
Priority date: 2023-02-17
Filing date: 2023-02-17
Publication date: 2023-06-06

Abstract

The application relates to the technical field of computer vision, and in particular to a glass detection method based on deep learning and the ghost phenomenon. The method comprises the following steps: glass detection is performed on a single original input image, namely a single RGB image; ghost features are extracted from the original input image with a deep-learning backbone network to obtain a ghost region prediction map; the ghost region prediction map is concatenated with the original input image along the channel dimension, glass features are extracted by a backbone network under the guidance of the ghost cue, and the glass features are then decoded by a convolutional neural network to obtain a segmentation of the glass region; finally, a glass region prediction map is output. Compared with the prior art, detecting glass from a single image gives the method a wider range of application. The backbone network extracts ghost features and glass region features more accurately and efficiently, and the ghost phenomenon is used to locate the glass region more accurately, yielding a high-quality glass region prediction map with good robustness.

Description

Glass detection method based on deep learning and ghost phenomenon

Technical Field

The application relates to the technical field of computer vision, in particular to a glass detection method based on deep learning and ghost phenomena.

Background

Glass detection has recently attracted a great deal of attention. Glass surfaces, including windows, doors and walls, are ubiquitous in the indoor and outdoor settings of daily life. Because they are transparent, they usually have no distinctive appearance of their own, and what they show depends largely on the scene behind them. Lacking a consistent visual appearance and distinctive features, glass surfaces are easily overlooked by computer vision-based systems such as robots and drones: instead of the glass surface, such systems often perceive the scene seen through the glass, which can disrupt their normal operation. Accurate detection of glass surfaces is therefore critical to many computer vision-based systems.

Meanwhile, with the development of computer technology and the wide application of computer vision, deep learning, owing to its strong learning and feature representation abilities, has developed rapidly in computer vision and has quickly replaced the traditional approach of hand-crafting features from prior knowledge. In recent years, Transformer-based deep learning methods have surpassed convolutional neural networks in many fields.

Existing deep learning-based methods exploit contextual contrast information but do not mine the physical properties of glass as a cue. Using reflection for glass detection is also of limited value, because reflection is not a physical property unique to glass regions: reflections from other smooth surfaces such as walls, floors and displays reduce the accuracy of glass region detection.

Disclosure of Invention

The technical problem addressed by the application is the low accuracy of glass region detection in the prior art. The application provides a glass detection method based on deep learning and the ghost phenomenon, which extracts global features more effectively with a deep learning method and detects glass regions more accurately by exploiting the ghost phenomenon, yielding high-quality, robust detection results.

In order to achieve the above purpose, the following technical scheme is adopted in the application:

In one aspect, a glass detection method based on deep learning and the ghost phenomenon includes the following steps:

performing glass detection on a single original input image, the image being a single RGB image;

extracting ghost features from the original input image with a deep-learning backbone network, and computing a ghost region prediction map;

concatenating the ghost region prediction map with the original input image along the channel dimension, extracting glass features with a backbone network under the guidance of the ghost cue, and then decoding the glass features with a convolutional neural network to obtain a segmentation result of the glass region;

and outputting a glass region prediction map based on the segmentation result of the glass region.
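The data flow of these steps can be sketched as follows. This is a minimal, framework-free sketch under stated assumptions: images are nested Python lists shaped channels x height x width, the ghost region prediction map is assumed to be a single channel, and `ghost_net` / `glass_net` are hypothetical placeholders for the two trained networks (Swin Transformer backbones with CNN decoders in the text). It only illustrates how the ghost map is concatenated with the three RGB channels before glass segmentation.

```python
from typing import Callable, List

# Hypothetical tensor stand-ins: a channel is an H x W grid, an image is C x H x W.
Channel = List[List[float]]
Image = List[Channel]


def concat_channels(a: Image, b: Image) -> Image:
    """Channel-wise concatenation of two C x H x W images with equal H and W."""
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("spatial sizes must match")
    return a + b  # new list: channels of a followed by channels of b


def detect_glass(rgb: Image,
                 ghost_net: Callable[[Image], Image],
                 glass_net: Callable[[Image], Image]) -> Image:
    """Two-stage inference: ghost map first, then glass segmentation guided by it."""
    if len(rgb) != 3:
        raise ValueError("expected a single 3-channel RGB image")
    ghost_map = ghost_net(rgb)                # 1 x H x W ghost region map (assumed)
    guided = concat_channels(rgb, ghost_map)  # 4 x H x W: RGB + ghost cue
    return glass_net(guided)                  # glass region prediction map


# Toy stand-ins for the two trained networks (hypothetical, for illustration only).
def toy_ghost_net(img: Image) -> Image:
    h, w = len(img[0]), len(img[0][0])
    return [[[0.5] * w for _ in range(h)]]


def toy_glass_net(img: Image) -> Image:
    # Returns a map filled with the number of input channels it received.
    h, w = len(img[0]), len(img[0][0])
    return [[[float(len(img))] * w for _ in range(h)]]


rgb = [[[0.0] * 4 for _ in range(2)] for _ in range(3)]  # 3 x 2 x 4 image
glass = detect_glass(rgb, toy_ghost_net, toy_glass_net)
print(len(glass), len(glass[0]), len(glass[0][0]), glass[0][0][0])  # 1 2 4 4.0
```

The toy glass network reports 4.0 because it receives the three RGB channels plus the one ghost channel; in a PyTorch implementation the same step would be a concatenation along the channel dimension.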

The step of obtaining the ghost region prediction map comprises:

acquiring multi-scale features with the backbone network;

inputting the acquired multi-scale features into a dual reflection estimation module, which obtains primary reflection features and secondary reflection features through primary reflection detection and secondary reflection detection, and produces an offset estimation map used to detect the ghost region;

fusing the primary reflection features, the secondary reflection features and the offset estimation map to obtain ghost features, and decoding them with a convolutional neural network-based decoder to obtain a high-quality ghost region prediction map.

The dual reflection estimation module performs the following steps:

inputting the multi-scale features acquired by the backbone network into the dual reflection estimation module;

based on the multi-scale features, obtaining primary reflection features and a primary reflection region prediction map through primary reflection detection, and secondary reflection features and a secondary reflection region prediction map through secondary reflection detection;

constraining the primary reflection features and the secondary reflection features through a deformable convolution;

inputting the primary reflection region prediction map and the secondary reflection region prediction map into an encoder-decoder structure to obtain an offset estimation map;

merging the primary reflection features, the secondary reflection features and the offset estimation map into ghost features, and inputting the ghost features into a decoder to obtain the ghost region prediction map.

The feature constraint is the difference between the deformably convolved primary reflection feature and the secondary reflection feature, penalized with the following loss function:

$$L_{fc} = \sum_{i=1}^{s} \mathrm{MSE}\left(\mathcal{D}(F_p^{i}),\, F_s^{i}\right)$$

where $\mathcal{D}$ is the deformable convolution operation, $F_p^{i}$ is the primary reflection feature at scale i, $F_s^{i}$ is the secondary reflection feature at scale i, and s is the number of scales.
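The feature constraint can be illustrated with a small sketch that sums a mean squared error over scales. It assumes each feature map is flattened to a list of floats, and it replaces the learned deformable convolution with a hypothetical fixed one-sample shift (`shift_right`), purely to show that the loss vanishes once the primary features are aligned onto the secondary ones.

```python
from typing import Callable, List, Sequence

Feature = List[float]  # one flattened feature map at a single scale


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized feature maps."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_constraint_loss(primary: Sequence[Feature],
                            secondary: Sequence[Feature],
                            align: Callable[[Feature], Feature]) -> float:
    """L_fc = sum over scales i of MSE(align(F_p^i), F_s^i)."""
    if len(primary) != len(secondary):
        raise ValueError("need one primary and one secondary feature per scale")
    return sum(mse(align(fp), fs) for fp, fs in zip(primary, secondary))


def shift_right(f: Feature) -> Feature:
    """Hypothetical stand-in for the learned deformable convolution: shift by one."""
    return [0.0] + f[:-1]


primary = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0]]
secondary = [[0.0, 1.0, 2.0, 3.0], [0.0, 5.0]]
print(feature_constraint_loss(primary, secondary, shift_right))  # 0.0
print(feature_constraint_loss(primary, secondary, lambda f: f))  # 14.0
```

Without alignment (the identity function) the two scales contribute 1.0 and 13.0; with the correct shift the constraint is satisfied exactly.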

The backbone network is a Swin Transformer.

By extracting global features with a Swin Transformer while fully taking the physical characteristics of ghosting into account, and by designing a dual reflection estimation module that predicts an offset map, the method improves the accuracy of the detected ghost region. The feature constraint further benefits the extraction of ghost features.

In another aspect, a glass detection system based on deep learning and the ghost phenomenon, adapted to perform the above glass detection method, comprises:

an acquisition module for acquiring a single original input image;

a ghost detection module for extracting ghost features from the original input image with a deep-learning backbone network and computing a ghost region prediction map;

a glass segmentation module for concatenating the ghost region prediction map with the original input image along the channel dimension, extracting glass features with a backbone network under the guidance of the ghost cue, and then decoding the glass features with a convolutional neural network to obtain a segmentation result of the glass region;

and an output module for outputting the glass region prediction map.

The glass segmentation module has a U-shaped structure comprising an encoding part and a decoding part.

Unlike the traditional U-Net and most convolutional neural network-based methods, this U-shaped glass segmentation module combines a Swin Transformer with convolutions: the global features extracted by the Transformer help locate potential glass regions, and a convolutional neural network performs progressive decoding with feature fusion, finally yielding a high-quality glass region prediction map.

The technical solution provided by the application brings at least the following beneficial effects:

compared with the prior art, the single-image glass detection method has a wider range of application. Building the network on a backbone network allows ghost features and glass region features to be extracted more accurately and efficiently. Meanwhile, exploiting the distinctive visual cue of ghosting (double images) locates the glass region more accurately and yields a high-quality glass region prediction map, giving the method good robustness.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 shows a schematic flow chart of a glass detection method based on deep learning and ghost phenomena;

FIG. 2 illustrates a flow diagram of a ghost detection module provided in an exemplary embodiment of the present application;

FIG. 3 illustrates a dual reflection estimation module process schematic provided in one exemplary embodiment of the present application;

FIG. 4 illustrates a flow diagram of a dual reflection estimation module provided in an exemplary embodiment of the present application;

FIG. 5 illustrates a block diagram of a deep learning and ghost phenomenon based glass detection system according to an exemplary embodiment of the present application;

FIG. 6 illustrates a block diagram of a dual layer reflection estimation module in a deep learning and ghost phenomenon based glass detection system according to an exemplary embodiment of the present application;

FIG. 7 illustrates a schematic diagram of a glass detection network connection provided in an exemplary embodiment of the present application;

FIG. 8 shows experimental results: the first column, Input, shows input images of real scenes containing glass regions; the second column, ourGhos, shows the 2D ghost region mask obtained by an exemplary embodiment of the present application; the third column, Ours, shows the 2D glass region mask predicted by an exemplary embodiment of the present application; and the fourth column, GT, shows the ground-truth glass region mask.

Detailed Description

The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

The present application is further described below with reference to the drawings and examples.

First, the terms involved in the embodiments of the present application will be briefly described:

The ghost effect is an inherent property of glass surfaces and always occurs on them. It arises because a glass pane has two surfaces (one on each side), each of which reflects light, producing two attenuated and slightly offset reflections; ghost information can therefore effectively guide the detection of glass regions. Since in daily life ghosting occurs only in glass regions, it provides effective guidance and support for glass detection.
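The two-surface explanation can be made concrete with a toy 1D model, written here as a sketch under simplifying assumptions that are not from the text: the reflected scene is a 1D signal, the back-surface reflection is displaced by an integer number of samples, and the attenuation factors `a1` and `a2` are hypothetical constants. The transmitted background is ignored.

```python
from typing import List


def ghost_reflection(layer: List[float], offset: int,
                     a1: float = 0.6, a2: float = 0.3) -> List[float]:
    """Superimpose two attenuated copies of a 1D reflected signal.

    a1 weights the reflection from the front surface; a2 weights the reflection
    from the back surface, which is displaced by `offset` samples.
    """
    n = len(layer)
    out = []
    for x in range(n):
        front = a1 * layer[x]
        src = x - offset
        back = a2 * layer[src] if 0 <= src < n else 0.0
        out.append(front + back)
    return out


edge = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]    # a single thin bright structure
print(ghost_reflection(edge, offset=2))  # [0.0, 0.0, 0.6, 0.0, 0.3, 0.0]
```

The single structure appears twice, weaker and shifted the second time, which is the duplicated-edge pattern the method looks for. A single reflecting surface such as a wall corresponds to `a2 = 0` in this toy model and produces no duplicate.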

Swin Transformer is a Transformer-based deep learning architecture: a hierarchical vision Transformer that computes self-attention within shifted local windows.

Fig. 1 shows a schematic flow chart of a glass detection method based on deep learning and ghost phenomena according to an exemplary embodiment of the present application, where the method includes the following steps:

Step 101: glass detection is performed on a single original input image, the image being a single RGB image.

Step 102: ghost features are extracted from the original input image with a deep-learning backbone network, and a ghost region prediction map is computed.

Step 103: the ghost region prediction map is concatenated with the original input image along the channel dimension, glass features are extracted with a backbone network under the guidance of the ghost cue, and the glass features are then decoded with a convolutional neural network to obtain a segmentation result of the glass region.

Step 104: a glass region prediction map is output based on the segmentation result of the glass region. The network is trained repeatedly until a high-quality glass region prediction map is obtained.

The specific flow is as follows:

Glass detection is performed on a single original input image. The ghost detection module first extracts ghost features with a backbone network (a Swin Transformer) and inputs them into the dual reflection estimation module, which detects the primary and secondary reflections with two branches; a deformable convolution block aligns the primary reflection features with the secondary reflection features so that both reflections are estimated accurately. The estimated primary and secondary reflection prediction maps are up-sampled to the original resolution and input to an encoder-decoder structure that estimates a displacement map, i.e. the offset estimation map. The displacement map is then down-sampled, concatenated and fused with the primary and secondary reflection feature maps, and input to a decoder that decodes the ghost region prediction map. Next, the glass segmentation module segments the glass region under the guidance of the ghost region: the ghost region map is concatenated with the input image along the channel dimension and input to another backbone network to extract glass features, and a convolutional neural network decodes these features into a segmentation of the glass region. Training is repeated until a high-quality glass region prediction map is obtained.

In summary, compared with the prior art, the glass detection method based on deep learning and the ghost phenomenon improves on the network design and, by also analysing the physical properties of glass, exploits a cue that is specific to glass. Since in daily life ghosting occurs only in glass regions, it provides effective guidance and support for glass detection.

Global features are extracted with a Swin Transformer-based method while the physical characteristics of ghosting are fully taken into account, and the dual reflection estimation module designed to predict an offset map improves the accuracy of the detected ghost region. The feature constraint further benefits the extraction of ghost features.

The glass segmentation module adopts a U-shaped encoder-decoder structure. Unlike the traditional U-Net and most convolutional neural network-based methods, it combines a Swin Transformer with convolutions: the global features extracted by the Transformer help locate potential glass regions, and a convolutional neural network performs progressive decoding with feature fusion, finally yielding a high-quality glass region prediction map.

The application performs glass detection on a single image and therefore has a wider range of application. Unlike single-image methods, multimodal methods process several images simultaneously and require additional input devices such as infrared, polarization or depth cameras; compared with using a single camera, this restricts the scenes in which they can be used and makes them less widely applicable.

FIG. 2 shows a schematic flow chart of the ghost detection module according to an exemplary embodiment of the present application; the process includes the following steps:

Step 201: acquiring multi-scale features through the backbone network;

Step 202: inputting the acquired multi-scale features into the dual reflection estimation module, which estimates a displacement map to detect the ghost region; the module uses two branches to detect the primary and secondary reflections and obtain the primary reflection features and secondary reflection features;

Step 203: fusing the primary reflection features, the secondary reflection features and the displacement map to obtain ghost features, and obtaining a high-quality ghost region prediction map through a convolutional neural network-based decoder.

FIGS. 3 and 4 illustrate the dual reflection estimation module provided in an exemplary embodiment of the present application; the process includes the following steps:

Step 301: inputting the multi-scale features acquired by the backbone network into the dual reflection estimation module;

Step 302: based on the multi-scale features, obtaining primary reflection features and a primary reflection region prediction map through primary reflection detection, and secondary reflection features and a secondary reflection region prediction map through secondary reflection detection;

Step 303: constraining the primary and secondary reflection features with a deformable convolution, where the deformable convolution block aligns the primary reflection features with the secondary reflection features so that both reflections are estimated accurately;

Step 304: up-sampling the primary and secondary reflection region prediction maps to the original resolution and inputting them into an encoder-decoder structure to obtain the offset estimation map, i.e. the estimated displacement map;

Step 305: inputting the ghost features, obtained by merging the primary reflection features, the secondary reflection features and the offset estimation map, into a decoder, which decodes the ghost region prediction map.
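Steps 304 and 305 involve two resolution changes: reflection predictions go up to the input resolution, and the resulting offset map comes back down to feature resolution before channel concatenation. The text does not name the resampling methods, so this sketch assumes nearest-neighbour up-sampling and average pooling with an integer factor; feature maps are plain nested lists, and `fuse` is a hypothetical helper name.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest(g: Grid, factor: int) -> Grid:
    """Nearest-neighbour up-sampling of an H x W grid by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in g for _ in range(factor)]


def downsample_avg(g: Grid, factor: int) -> Grid:
    """Average pooling with a factor x factor window and stride equal to factor."""
    h, w = len(g) // factor, len(g[0]) // factor
    return [[sum(g[i * factor + di][j * factor + dj]
                 for di in range(factor) for dj in range(factor)) / factor ** 2
             for j in range(w)] for i in range(h)]


def fuse(primary_feats: List[Grid], secondary_feats: List[Grid],
         offset_map: Grid, factor: int) -> List[Grid]:
    """Concatenate reflection feature channels with the down-sampled offset map."""
    return primary_feats + secondary_feats + [downsample_avg(offset_map, factor)]


pred = [[0.0, 1.0]]                 # 1 x 2 reflection prediction at feature scale
print(upsample_nearest(pred, 2))    # [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
offset = [[0.0, 0.0, 2.0, 2.0], [0.0, 0.0, 2.0, 2.0]]  # full-resolution offsets
fused = fuse([pred], [pred], offset, 2)
print(len(fused), fused[-1])        # 3 [[0.0, 2.0]]
```

Down-sampling the offset map to the feature resolution is what makes the channel concatenation with the reflection features possible, since concatenation requires equal spatial sizes.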

FIG. 5 shows a block diagram of a deep learning and ghost phenomenon based glass detection system according to an exemplary embodiment of the present application, the system comprising: an acquisition module 410, a ghost detection module 420, a glass segmentation module 430, and an output module 440.

The acquisition module 410 is configured to acquire a single original input image.

The ghost detection module 420 is configured to extract ghost features from the original input image with a deep-learning backbone network and compute a ghost region prediction map.

The glass segmentation module 430 is configured to concatenate the ghost region prediction map with the original input image along the channel dimension, extract glass features with a backbone network under the guidance of the ghost cue, and then decode the glass features with a convolutional neural network to obtain a segmentation result of the glass region;

and the output module 440 is configured to output the glass region prediction map.

Fig. 6 shows a block diagram of a dual-layer reflection estimation module in a glass detection system based on deep learning and ghost phenomena according to an exemplary embodiment of the present application.

The ghost detection module obtains multi-scale features with a backbone network, extracts the ghost features in the image from them with a specially designed dual reflection estimation module, and finally obtains a ghost region prediction map through a decoder.

The ghost detection module 420 includes a backbone network 510, a dual reflection estimation module 520 and a decoder 530. Preferably, the backbone network 510 is a Swin Transformer network.

The backbone network 510 is used to detect potential glass regions; preferably, it is an existing Swin Transformer. First, the application exploits the Swin Transformer's strength in extracting low-level features and learning long-range dependencies, since ghost effects are typically observed as duplicated edges in the input image. Second, the Swin Transformer models region dependencies hierarchically from local to global, which helps to handle appearance changes (e.g. in intensity and shape) of ghost effects.

Compared with traditional convolutional neural network-based methods, the Transformer is better at extracting long-range dependencies and obtains a larger receptive field at shallower layers of the network, which also helps extract features useful for glass detection.

Given the multi-scale features obtained from the backbone network, the dual reflection estimation module 520 detects ghost effects by estimating offset maps at multiple scales. The dual reflection estimation is used to detect whether reflections are shifted relative to each other: the application detects two reflections (a primary and a secondary one) and then estimates the offset between the reflection layers. This design brings two practical advantages. First, because it relies on the ghost phenomenon, the model can handle any type of glass surface, regardless of the number of glass regions in the input image. Second, the module does not need to estimate the offset precisely, so ground-truth offsets are not required for real scenes; offset supervision is applied only on synthesized scenes.
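To illustrate what "the offset between the reflection layers" means, the sketch below recovers a single global shift between a primary and a secondary reflection layer by brute-force correlation search. This is a classical, non-learned stand-in chosen for clarity, not the module described above, which learns a dense offset map with an encoder-decoder; the 1D signals and the `max_shift` search range are assumptions.

```python
from typing import Sequence


def estimate_offset(primary: Sequence[float], secondary: Sequence[float],
                    max_shift: int) -> int:
    """Return the shift d in [0, max_shift] that best maps primary onto secondary.

    Each candidate shift is scored by the correlation sum of
    primary[x - d] * secondary[x]; a uniform positive attenuation only rescales
    all scores, so it does not change which shift wins. Ties keep the smaller d.
    """
    if len(primary) != len(secondary):
        raise ValueError("layers must have the same length")
    best_d, best_score = 0, float("-inf")
    for d in range(max_shift + 1):
        score = sum(primary[x - d] * secondary[x] for x in range(d, len(secondary)))
        if score > best_score:
            best_d, best_score = d, score
    return best_d


primary = [0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0]
secondary = [0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0]
print(estimate_offset(primary, secondary, max_shift=4))  # 2
```

A learned, per-pixel estimate is preferable in practice because real ghost offsets vary across the image and the two layers are never observed separately; the search above only shows the quantity being estimated.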

The application uses two branches to detect the primary and secondary reflections, and deformable convolution blocks to align the displaced primary reflection features with the secondary reflection features so that both reflections are estimated accurately. The estimated primary and secondary reflection prediction maps are first up-sampled to the original resolution and then input to the encoder-decoder structure to estimate a displacement map, i.e. the offset estimation map. A non-zero value in the displacement map indicates possible ghosting in that region, so the displacement map provides a strong cue for distinguishing the ghost effect from a single reflection. Finally, the estimated displacement map is down-sampled and concatenated with the primary and secondary reflection feature maps, and the fused features are input into the decoder.
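The alignment step relies on deformable convolution, in which each kernel tap samples the input at its regular position plus an offset, using interpolation for fractional positions. The following is a minimal 1D sketch of that general technique (stride 1, same output length, no bias), assuming linear interpolation and zero padding; the offsets are supplied by hand here, whereas in the network they would be predicted. The text does not say which implementation is used; in PyTorch code, `torchvision.ops.DeformConv2d` is a common choice.

```python
import math
from typing import List, Sequence


def sample_linear(x: Sequence[float], pos: float) -> float:
    """Linearly interpolate x at a fractional position (zero padding outside)."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i: int) -> float:
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deform_conv1d(x: Sequence[float], weights: Sequence[float],
                  offsets: Sequence[Sequence[float]]) -> List[float]:
    """1D deformable convolution: stride 1, same output length, no bias.

    offsets[p][k] is the (normally learned) shift added to kernel tap k at
    output position p, so each tap samples x at p + (k - half) + offsets[p][k].
    """
    half = len(weights) // 2
    out = []
    for p in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w * sample_linear(x, p + (k - half) + offsets[p][k])
        out.append(acc)
    return out


primary = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
# A constant offset of -2 on a 1-tap kernel moves the primary peak from index 2
# to index 4, i.e. it compensates a ghost displacement of 2 samples.
print(deform_conv1d(primary, [1.0], [[-2.0]] * len(primary)))
# [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
print(deform_conv1d([0.0, 2.0, 4.0], [1.0], [[0.5]] * 3))  # [1.0, 3.0, 2.0]
```

The second call shows why fractional offsets are allowed: sampling halfway between samples blends neighbours, which keeps the operation differentiable with respect to the offsets during training.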

The decoder is a convolutional neural network: the features output by the dual reflection estimation module are input into it to obtain the final ghost region prediction.

The convolution network is used for decoding and fusing the multi-scale characteristics extracted by the backbone network, and the fusion of the multi-scale characteristics is beneficial to detecting glass areas with different sizes.

The loss functions used to supervise the network of the present application are as follows.

The predictions of the primary reflection 2D mask, the secondary reflection 2D mask and the ghost region 2D mask are supervised with the binary cross-entropy (BCE) loss:

Equation one:

$$L_{bce} = -\sum_{i=1}^{s} \frac{1}{N_i} \sum_{x} \left[ G_i(x) \log M_i(x) + \left(1 - G_i(x)\right) \log\left(1 - M_i(x)\right) \right]$$

where i is the scale index of the predicted mask map, s is the total number of scales, $N_i$ is the number of pixels x at scale i, $M_i$ represents the predicted 2D mask and $G_i$ represents the ground truth.
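Equation one can be sketched directly in plain Python. This assumes each mask is flattened to a list of probabilities, that the ground truth has already been resized to each scale's resolution, and that predictions are clamped with a small hypothetical `eps` to avoid log(0); deep learning frameworks provide numerically stable built-ins for the same loss.

```python
import math
from typing import Sequence


def bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over the pixels of one mask."""
    total = 0.0
    for m, g in zip(pred, target):
        m = min(max(m, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(g * math.log(m) + (1.0 - g) * math.log(1.0 - m))
    return total / len(pred)


def multiscale_bce(preds: Sequence[Sequence[float]],
                   targets: Sequence[Sequence[float]]) -> float:
    """Sum of per-scale BCE terms, i = 1..s (targets resized to each scale)."""
    return sum(bce(m, g) for m, g in zip(preds, targets))


print(round(bce([0.5, 0.5], [1.0, 0.0]), 4))                     # 0.6931
print(round(multiscale_bce([[0.5], [0.5]], [[1.0], [0.0]]), 4))  # 1.3863
```

A prediction of 0.5 costs log 2 per pixel regardless of the label, so two scales of uninformative predictions sum to 2 log 2.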

The feature constraint uses a mean squared error loss, computed as:

Formula two:

$$L_{fc} = \sum_{i=1}^{s} \mathrm{MSE}\left(\mathcal{D}(F_p^{i}),\, F_s^{i}\right)$$

where $\mathcal{D}$ is the deformable convolution operation, $F_p^{i}$ is the primary reflection feature at scale i, and $F_s^{i}$ is the secondary reflection feature at scale i.

FIG. 7 is a schematic diagram of the glass detection network according to an exemplary embodiment of the present application. To better detect glass of different sizes, the glass segmentation module adopts a U-shaped network with an encoder-decoder structure.

The encoder uses a Swin Transformer to obtain feature representations at four different scales; the input first undergoes positional encoding, and the numbers of Swin Transformer blocks stacked in the four stages of the network, from shallow to deep, are 2, 16 and 2 respectively. The decoder concatenates the feature maps obtained in the encoding stage with those obtained in the decoding stage along the channel dimension, combining deep and shallow features to refine the prediction, and predicts and segments the glass region from the resulting feature maps.
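The four encoder scales for the 384×384 inputs used in the experiments can be worked out from the standard Swin Transformer layout: a 4×4 patch embedding, after which each later stage halves the resolution and doubles the channel count. The embedding width is not given in the text, so `embed_dim=128` below is a hypothetical choice; only the channel numbers depend on it.

```python
from typing import List, Tuple


def swin_stage_shapes(img_size: int, patch: int = 4, embed_dim: int = 128,
                      num_stages: int = 4) -> List[Tuple[int, int, int]]:
    """(channels, height, width) of each Swin Transformer stage output.

    Follows the standard Swin layout: a patch x patch patch embedding, then each
    later stage halves the spatial resolution and doubles the channel count.
    """
    shapes = []
    res, dim = img_size // patch, embed_dim
    for _ in range(num_stages):
        shapes.append((dim, res, res))
        res //= 2
        dim *= 2
    return shapes


for stage, shape in enumerate(swin_stage_shapes(384), start=1):
    print(stage, shape)
# 1 (128, 96, 96)
# 2 (256, 48, 48)
# 3 (512, 24, 24)
# 4 (1024, 12, 12)
```

Because each stage is exactly half the size of the previous one, a decoder step that up-samples by 2 lands on the resolution of the next shallower encoder stage, which is what allows the channel-wise skip concatenation described above.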

Loss function:

Formula three:

$$L_{glass} = L_{bce} + L_{ssim} + L_{iou}$$

The loss function of the glass segmentation module has three parts, namely a BCE loss, an SSIM loss and an IoU loss; the final loss is their sum.
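Formula three can be sketched as follows. The text only states that the three terms are summed, so several details here are assumptions: masks are flattened lists, SSIM is computed over the whole map as a single window (standard SSIM uses local windows) with the usual constants for a data range of 1, the SSIM loss is taken as 1 - SSIM, and the IoU term is the common soft IoU loss.

```python
import math
from typing import Sequence


def bce(p: Sequence[float], g: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over pixels."""
    total = 0.0
    for m, t in zip(p, g):
        m = min(max(m, eps), 1.0 - eps)
        total += -(t * math.log(m) + (1.0 - t) * math.log(1.0 - m))
    return total / len(p)


def ssim_global(p: Sequence[float], g: Sequence[float],
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    """SSIM computed over the whole map as a single window (simplification)."""
    n = len(p)
    mp, mg = sum(p) / n, sum(g) / n
    vp = sum((x - mp) ** 2 for x in p) / n
    vg = sum((y - mg) ** 2 for y in g) / n
    cov = sum((x - mp) * (y - mg) for x, y in zip(p, g)) / n
    return ((2 * mp * mg + c1) * (2 * cov + c2)) / ((mp ** 2 + mg ** 2 + c1) * (vp + vg + c2))


def iou_loss(p: Sequence[float], g: Sequence[float], eps: float = 1e-7) -> float:
    """Soft IoU loss: 1 - sum(p*g) / sum(p + g - p*g)."""
    inter = sum(x * y for x, y in zip(p, g))
    union = sum(x + y - x * y for x, y in zip(p, g))
    return 1.0 - (inter + eps) / (union + eps)


def glass_loss(p: Sequence[float], g: Sequence[float]) -> float:
    """L_glass = L_bce + L_ssim + L_iou, with L_ssim = 1 - SSIM."""
    return bce(p, g) + (1.0 - ssim_global(p, g)) + iou_loss(p, g)


gt = [1.0, 0.0, 1.0, 0.0]
print(glass_loss(gt, gt) < 1e-5)                   # True
print(glass_loss([0.0, 1.0, 0.0, 1.0], gt) > 1.0)  # True
```

The three terms are complementary: BCE acts per pixel, SSIM compares local structure, and IoU scores the overlap of the whole predicted region.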

Fig. 8 shows graphs showing experimental results, and the present application is illustrated by the following specific experiments:

The spatial resolution of the input image is 384×384. Training was performed on a server equipped with an Intel i9-10900X CPU (10 cores, 3.7 GHz, 19.25 MB cache), 16 GB of memory and an NVIDIA RTX 3090 GPU with 24 GB of video memory. The training environment was Python 3.6.13 and PyTorch 1.7.1; the number of iterations was set to 200 and the batch size to 2. The network is trained with the Adam optimizer. The learning rates of the ghost detection module and the glass segmentation module were both set to 0.00001.
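The reported training settings can be collected in a small configuration object. The values are those stated above; the class and field names are hypothetical, and "iterations" is kept as worded in the text.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training settings as reported in the experiments (names are hypothetical)."""
    input_size: int = 384     # input resolution: 384 x 384
    iterations: int = 200     # "number of iterations" as stated in the text
    batch_size: int = 2
    optimizer: str = "Adam"
    lr_ghost: float = 1e-5    # ghost detection module
    lr_glass: float = 1e-5    # glass segmentation module


cfg = TrainConfig()
print(asdict(cfg))
```

In a PyTorch training script, each module's parameters could then be passed to `torch.optim.Adam(module.parameters(), lr=cfg.lr_ghost)` or `lr=cfg.lr_glass` respectively.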

During training, a backbone network first extracts ghost features. These features are input into the dual reflection estimation module, which detects the primary and secondary reflections with two branches. A deformable convolution then aligns the primary reflection features with the secondary reflection features so that both reflections are estimated accurately. The estimated primary and secondary reflection prediction maps are up-sampled to the original resolution and input to the encoder-decoder structure to obtain the offset estimation map. Finally, the estimated offset map is down-sampled, concatenated and fused with the primary and secondary reflection feature maps, and the result is input into a decoder that decodes the ghost region.

The glass segmentation module segments the glass region under the guidance of the ghost region. Specifically, the obtained ghost region map is concatenated with the input image along the channel dimension and input into another backbone network to extract glass features, which a convolutional neural network then decodes into a segmentation of the glass region. Training is repeated until a high-quality glass region prediction map is obtained.

FIG. 8 shows a qualitative evaluation of the method on real scenes. As FIG. 8 shows, the ghost region is accurately detected in real scenes and the glass region is accurately predicted. The first and second rows show a large glass pane under different lighting conditions; the method accurately captures the ghost phenomenon and detects the glass region. The third and fourth rows show complex outdoor scenes with multiple glass panes: in the third row, part of the glass lies at the edge of the image because of scene limitations, and in the fourth row, part of the glass is occluded. The fifth row shows glass detection in an outdoor scene, where the method again detects the glass accurately. Even in these difficult cases, the method accurately predicts most of the glass regions, demonstrating good practicality and generality.

In summary, compared with the prior art, the glass detection method based on deep learning and the ghost phenomenon provided by the embodiments of the application performs glass detection on a single image and is therefore widely applicable. Building the network on a powerful Transformer allows ghost features and glass region features to be extracted more accurately and efficiently. Meanwhile, exploiting the distinctive visual cue of ghosting (double images) locates the glass region more accurately and yields a high-quality glass region prediction map, giving the method good robustness.

The foregoing description of the preferred embodiments is merely exemplary and is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the invention are intended to fall within its scope.

Claims

1. A glass detection method based on deep learning and the ghost phenomenon, comprising the following steps:

2. The method for detecting glass based on deep learning and ghost phenomenon according to claim 1, wherein the obtaining ghost region prediction map comprises the steps of:

acquiring multi-scale features with a backbone network;

3. The method for detecting glass based on deep learning and ghost phenomenon according to claim 2, wherein the dual reflection estimation module performs the steps of:

inputting the primary reflection region prediction map and the secondary reflection region prediction map into an encoder-decoder structure, and acquiring an offset estimation map through the encoder;

4. The method for detecting glass based on deep learning and ghost phenomenon according to claim 3, wherein

the feature constraint takes the difference between the primary reflection feature after deformable convolution and the secondary reflection feature, with the loss function:

$$L_{fc} = \sum_{i=1}^{s} \mathrm{MSE}\left(\mathcal{D}(F_p^{i}),\, F_s^{i}\right)$$

where $\mathcal{D}$ is the deformable convolution operation, $F_p^{i}$ is the primary reflection feature at scale i, and $F_s^{i}$ is the secondary reflection feature at scale i.

5. The method for glass detection based on deep learning and ghost phenomenon according to any one of claims 1 to 4, wherein the backbone network is Swin Transformer.

6. A glass detection system based on deep learning and ghost phenomena, suitable for use in the detection method according to claim 1, characterized in that it comprises:

an acquisition module for acquiring a single original input image;

and an output module for outputting the glass region prediction map.

7. The glass detection system based on deep learning and ghost phenomena according to claim 6, wherein the glass segmentation module has a U-shaped structure including an encoding part and a decoding part.