CN114842032A - Image processing method and device - Google Patents


Info

Publication number
CN114842032A
Authority
CN
China
Prior art keywords
image
layer
processing method
image processing
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210559683.9A
Other languages
Chinese (zh)
Inventor
王孝星
张庆
田晓伟
纪海晶
李岳阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens X Ray Vacuum Technology Ltd
Original Assignee
Siemens X Ray Vacuum Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens X Ray Vacuum Technology Ltd filed Critical Siemens X Ray Vacuum Technology Ltd
Priority to CN202210559683.9A priority Critical patent/CN114842032A/en
Publication of CN114842032A publication Critical patent/CN114842032A/en
Priority to DE102023113166.4A priority patent/DE102023113166A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4084Scaling of whole images or parts thereof, e.g. expanding or contracting in the transform domain, e.g. fast Fourier transform [FFT] domain scaling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30136Metal

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image processing method for segmenting an input image, comprising: acquiring the input image; processing the input image using an image segmentation model; and outputting the processed output image. The image segmentation model is a first convolutional neural network in which a global feature acquisition module is embedded, the global feature acquisition module being embedded between a convolutional layer and a pooling layer of the first convolutional neural network. The disclosure thus provides an image segmentation method that can segment an image over a larger field of view and is more effective when the target object is large.

Description

Image processing method and device
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a processing method and apparatus for segmenting a specific object in an image.
Background
For a large object in an image, a single convolutional layer can only sense a local field of view; to sense a larger field of view, multiple convolutional layers must be stacked, which is inefficient.
Disclosure of Invention
In view of the above, the present disclosure provides an image processing method and apparatus.
According to an exemplary embodiment of the present disclosure, an image processing method for segmenting an input image includes: acquiring the input image; processing the input image using an image segmentation model; and outputting the processed output image. The image segmentation model is a first convolutional neural network in which a global feature acquisition module is embedded, the global feature acquisition module being embedded between a convolutional layer and a pooling layer of the first convolutional neural network.
According to an exemplary embodiment of the present disclosure, the first convolutional neural network is a U-shaped convolutional neural network composed of a down-sampling layer and an up-sampling layer.
According to an exemplary embodiment of the present disclosure, the global feature acquisition module maps features from a coordinate space to an interaction space, performs inference with a graph convolution network to acquire global features, and finally maps the features back to the coordinate space.
According to an exemplary embodiment of the present disclosure, the global feature acquisition module is a GloRe unit.
According to an exemplary embodiment of the present disclosure, the U-shaped convolutional neural network is a UNet model, and the down-sampling layer of the image segmentation model is composed of five convolutional layer groups and one GloRe unit, wherein the GloRe unit is inserted between the first convolutional layer group and the second convolutional layer group.
According to an exemplary embodiment of the present disclosure, before acquiring the input image, the method further includes: acquiring a first image; scaling the first image to obtain a second image; and partitioning the second image into blocks using a sliding window to obtain a plurality of input images. After the output image is obtained, the method further includes: stitching together the output images obtained from the respective input images to obtain a segmented image.
According to an exemplary embodiment of the present disclosure, an image processing apparatus, characterized by comprising: at least one processor; a computer storage medium storing a computer program which, when executed by the at least one processor, implements the method in embodiments of the disclosure.
According to an exemplary embodiment of the present disclosure, a computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements a method in an embodiment of the present disclosure.
According to an exemplary embodiment of the disclosure, a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method in an embodiment of the disclosure.
According to the image processing method provided by the disclosure, a larger object in an image can be effectively segmented.
Drawings
The foregoing and other features and advantages of the invention will become more apparent to those skilled in the art to which the invention relates upon consideration of the following detailed description of a preferred embodiment of the invention with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of an exemplary metal surface flaw detection method of the present disclosure based on semantic segmentation with embedded graph convolution;
FIG. 2 shows some of the data samples after preprocessing in an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the graph convolution GloRe method in an exemplary embodiment of the present disclosure;
FIG. 4 is a diagram of a UNet network model with embedded graph convolution in an exemplary embodiment of the present disclosure;
FIG. 5 shows some of the flaw prediction results in an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure is further described in detail by referring to the following examples. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
In one exemplary embodiment, an image processing method of the present disclosure for segmenting an input image includes: acquiring the input image; processing the input image using an image segmentation model; and outputting the processed output image. The image segmentation model is a first convolutional neural network in which a global feature acquisition module is embedded, the global feature acquisition module being embedded between a convolutional layer and a pooling layer of the first convolutional neural network. Adding the global feature acquisition module addresses the small field of view of each convolutional layer of the convolutional neural network, so that large objects can be segmented efficiently without stacking many convolutional layers to obtain a large receptive field. What matters here is that the global feature acquisition module is inserted between the convolutional layer and the pooling layer, so that global information is acquired before pooling. The global feature acquisition module can take various forms, as long as it can capture the relationships among the parts of the image, which helps obtain global information.
In one exemplary embodiment, the first convolutional neural network is a U-shaped convolutional neural network composed of a down-sampling layer and an up-sampling layer. The U-shaped convolutional neural network comprises an encoding (down-sampling) process and a decoding (up-sampling) process, after which the final feature map is obtained. During encoding, the image is progressively convolved and pooled to obtain a feature map; the feature map is then decoded in reverse to form a black-and-white image of the same size as the original image, which distinguishes different objects and serves as the segmentation image. Illustratively, the UNet model may be used. Models such as DeepLab v3 and DeepLab v3+ can also replace the UNet model; the UNet model, however, is simple, efficient, fast at inference, and easy to construct, so it is simpler to deploy and faster in detection.
In one exemplary embodiment, the global feature acquisition module maps features from the coordinate space to the interaction space, performs inference with a graph convolution network to acquire global features, and finally maps the features back to the coordinate space. Illustratively, a GloRe unit is used; its basic idea is to project features of interest from the coordinate space into the interaction space by graph convolution, perform relationship inference there, and then return to the original coordinate space, so that relationship inference can be performed at an early stage of the network model. Other plug-in modules that can obtain global features are also possible, as long as they can capture the relationships between the parts in this step. In one exemplary embodiment, the U-shaped convolutional neural network is a UNet model; UNet is particularly effective for small data sets and can be trained with fewer samples. The down-sampling layer of the image segmentation model consists of five convolutional layer groups and one GloRe unit, wherein the GloRe unit is inserted between the first convolutional layer group and the second convolutional layer group. Inserting the unit here makes it possible to obtain global features at an earlier stage, although insertion between other layers is also possible, depending on the specific form of the network and the characteristics of the objects to be identified.
In one exemplary embodiment, each convolutional layer group includes two convolutional layers and one pooling layer, and the number of channels doubles with each down-sampling. These numbers are exemplary; other numbers of convolutional layers are possible and can be chosen according to the actual situation.
In an exemplary embodiment, before acquiring the input image, the method further comprises: acquiring a first image; scaling the first image to obtain a second image; and partitioning the second image into blocks using a sliding window to obtain a plurality of input images. After the output image is obtained, the method further comprises: stitching together the output images obtained from the respective input images to obtain a segmented image. The image is first preprocessed by scaling it, which facilitates subsequent operations. For a larger image, this embodiment first divides the image into blocks, processes each block separately, and then stitches the blocks together to obtain the final segmentation image.
In an exemplary embodiment, referring in particular to fig. 1, the image processing method is applied to the technical field of metal surface image processing, namely flaw detection on a metal surface based on semantic segmentation with embedded graph convolution. During the production of metal materials (such as aluminum profiles, steel, rails, and metal parts), flaws can arise randomly on the surface of the material due to various uncertain factors; these flaws include scratches, protruding powder, dirty spots, and oil residues. Such flaws not only affect the appearance of the metal material but also affect its normal use, and some flaws lower the grade of the product and directly affect the economic benefit of the enterprise, so flaw detection plays a crucial role in the production of metal materials. At present, most metal material manufacturers still rely on manual inspection, which is inefficient and unstable and is prone to missed detections and false detections. With the wide application of deep learning, intelligent flaw detection methods based on deep neural network models offer robust detection performance and are not influenced by subjective factors, so they can maintain high and stable detection accuracy over long periods.
In practical application, current metal surface flaw detection methods based on deep learning have the following problems: (1) Training the detection model typically requires a very large number of flaw image samples. In actual production, image data sets for industrial flaw detection are often small and hard to obtain, large differences in the number of flaw samples between categories lead to class imbalance in the training data, and labeling a large number of flaw images is costly. (2) Detecting some large flaws usually requires a large receptive field during model training. A single convolutional layer can only obtain a local receptive field, so multiple convolutional layers must be stacked to obtain a large receptive field that captures the relationships between distant regions, which is very inefficient.
Convolutional networks are good at capturing local relationships through convolutional computation, but are often inefficient when long-range global relationships are to be captured, requiring multiple convolutional layers to be stacked. The disclosed exemplary embodiments use a metal surface flaw detection method based on embedded graph convolution semantic segmentation, whose basic idea is to project features of interest from coordinate space to interaction space using a graph convolution GloRe unit, perform relationship inference, and then return to the original coordinate space, so that relationship inference can be performed at an early stage of a network model.
In the exemplary flow of fig. 1, taking an aluminum profile data set as an example, the method includes the following steps:
S1: In step 1, the experimental data set comprises 87 aluminum profile pictures, of which 78 original pictures are randomly selected and preprocessed as follows: each image is scaled to reduce its size; the scaled image is labeled to obtain a label image, that is, defective pixels are distinguished from background pixels; and the scaled image and its corresponding label image are partitioned into overlapping blocks using a sliding window. After preprocessing, 2730 sub-images and corresponding sub-label images are obtained. Some of the preprocessed data samples are shown in fig. 2, where column a shows the sub-images and column b shows the label images corresponding to the sub-images. In fig. 2, the first row shows "dirty spot" images, the second row "scratch" images, the third row "bond" images, the fourth row "protruding powder" images, and the fifth row "oil residue" images. Part of the preprocessed sub-images are randomly selected as the training set and the rest serve as the validation set, at a ratio of 9:1. Note in particular that in the third row of "bond" images, reflections are distinguished from defects: reflections are not labeled as defects, so that the model can accurately learn the characteristics of the defects.
S2: In step 2, a UNet network model with an embedded GloRe unit is constructed and trained using the training set and validation set from step 1 to obtain the trained, improved UNet network model.
S3: In step 3, the remaining 9 of the 87 aluminum profile pictures are preprocessed as follows: each image is scaled to reduce its size; the scaled image is labeled to obtain a label image; and the scaled image is partitioned into non-overlapping blocks using a sliding window. For each test image, the preprocessed sub-images are input into the trained model as test samples to obtain sub-prediction result images, which are finally stitched together to obtain the final flaw prediction image.
Next, specific processes in the respective steps are described in further detail.
The scaling of the 78 original aluminum profile images in step 1 proceeds as follows: each original image of size 2560 × 1920 is scaled by a factor of 0.5, that is, to 1280 × 960.
The overlapping blocking operation with a sliding window in step 1 proceeds as follows: a window of size 640 × 480 is slid over the 1280 × 960 image and its corresponding label image, from top to bottom and from left to right, to obtain overlapping image blocks. The stride is 100 in both the width and height directions, and any image block whose width is less than 640 or whose height is less than 480 is discarded. After the 78 images are preprocessed in this way, 2730 sub-images and corresponding label images are obtained.
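The sub-image count stated above can be checked with a short sketch. The function name `sliding_window_origins` is hypothetical, and the iteration order is an assumption; only the window size, stride, and image size come from the text. Generating only windows that fit entirely inside the image is equivalent to cutting every block and discarding the undersized ones.

```python
def sliding_window_origins(img_w, img_h, win_w, win_h, stride_x, stride_y):
    """Return the top-left (x, y) corner of every window that fits fully inside the image."""
    origins = []
    for x in range(0, img_w - win_w + 1, stride_x):
        for y in range(0, img_h - win_h + 1, stride_y):
            origins.append((x, y))
    return origins


# Values taken from the text: 1280 x 960 image, 640 x 480 window, stride 100.
origins = sliding_window_origins(1280, 960, 640, 480, 100, 100)
per_image = len(origins)      # 7 horizontal positions x 5 vertical positions
total = 78 * per_image        # 78 training images
print(per_image, total)
```

With these values there are 7 window positions along the width (0 to 600) and 5 along the height (0 to 400), giving 35 blocks per image and 78 × 35 = 2730 sub-images, which agrees with the figure given in the text.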
The GloRe unit in step 2 maps between the coordinate space and the interaction space through weighted global pooling, and performs relational reasoning in the interaction space through graph convolution. Referring to fig. 3, the process comprises three steps:
(1) The first step: from the coordinate space to the interaction space.
Given a set of input features $X \in \mathbb{R}^{L \times C}$, where $C$ is the feature dimension and $L = W \times H$ ($W$ is the width and $H$ is the height), each original feature is mapped into the interaction space by a learned projection function $f$ to obtain the new features $V$, defined as:
$$V = f(X) \in \mathbb{R}^{N \times C} \tag{1}$$
where $N$ is the number of nodes in the interaction space.
To reason directly over a set of regions, the projection function is expressed as a linear combination of the original features, so that each new feature can aggregate information from multiple regions. Specifically, each new feature is generated by:
$$v_i = b_i X = \sum_{j} b_{ij} x_j \tag{2}$$
where the learnable projection weights are $B = [b_1, \ldots, b_N] \in \mathbb{R}^{N \times L}$, $x_j \in \mathbb{R}^{1 \times C}$, $v_i \in \mathbb{R}^{1 \times C}$, $i \in [1, N]$, and $j \in [1, L]$.
To reduce the output dimension and enhance the capacity of the projection function, the input features are implemented as $\phi(X; W_\phi)$ and $B$ is implemented as $B = \theta(X; W_\theta)$, with $\phi(\cdot)$ and $\theta(\cdot)$ each modeled by a 1 × 1 convolutional layer, where $W_\phi$ and $W_\theta$ are the learnable weight parameters of the respective layers, namely:
$$V = \theta(X; W_\theta)\,\phi(X; W_\phi) \in \mathbb{R}^{N \times C} \tag{3}$$
(2) The second step: relationship inference by graph convolution.
Projecting the features from the coordinate space into the interaction space yields a graph in which each node carries a feature description. These features are treated as the nodes of a fully connected graph, and graph convolution is used to reason over this graph. The graph convolution is expressed as:
$$Z = G V W_g = \big((I - A_g) V\big) W_g \tag{4}$$
where $G$ and $A_g$ denote the $N \times N$ adjacency matrix used to propagate information between nodes, $W_g$ denotes the state update function, $I$ denotes the identity matrix, and $V$ denotes the new features obtained by mapping the original features into the interaction space. In the implementation, graph convolution reasoning is carried out by two one-dimensional convolutions, one along the channel direction ($C$) and one along the node direction.
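Equation (4) can be illustrated with plain matrix products. The following is a minimal NumPy sketch, not the implementation of the disclosure: the sizes `N` and `C`, the random initial values, and the function name `graph_conv` are all assumptions made for illustration, and the two one-dimensional convolutions are written as the equivalent matrix multiplications.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 4, 8                                   # hypothetical node count and feature dimension
V = rng.standard_normal((N, C))               # node features in the interaction space
A_g = 0.1 * rng.standard_normal((N, N))       # learnable adjacency matrix (random stand-in)
W_g = 0.1 * rng.standard_normal((C, C))       # learnable state-update weights (random stand-in)


def graph_conv(V, A_g, W_g):
    """Compute Z = ((I - A_g) V) W_g as in equation (4)."""
    identity = np.eye(V.shape[0])
    mixed = (identity - A_g) @ V              # propagate information along the node direction
    return mixed @ W_g                        # update each node state along the channel direction


Z = graph_conv(V, A_g, W_g)
print(Z.shape)
```

Setting `A_g` to zero and `W_g` to the identity leaves `V` unchanged, which is a quick sanity check that the two factors act on the node axis and the channel axis respectively.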
(3) The third step: from the interaction space to the coordinate space.
After the relational inference, the output features are projected back into the original space so that the updated features can be used by subsequent convolutional layers to make better decisions. Given a node feature matrix $Z \in \mathbb{R}^{N \times C}$, the node features are transformed into $Y = g(Z) \in \mathbb{R}^{L \times C}$ by a learned back-projection function $g$. As in the first step, $g(Z)$ is expressed as a linear projection:
$$y_i = d_i Z = \sum_{j} d_{ij} z_j \tag{5}$$
where $D = [d_1, \ldots, d_L] \in \mathbb{R}^{L \times N}$, $z_j \in \mathbb{R}^{1 \times C}$, $y_i \in \mathbb{R}^{1 \times C}$, $i \in [1, L]$, and $j \in [1, N]$. $D$ is implemented as $D = B^{\top}$, reusing the projection generated in the first step to reduce the computational cost without any negative impact on the final accuracy.
The implementation of the mapping from the interaction space to the coordinate space reuses, as weights, the output of the 1 × 1 convolution that models $\theta(\cdot)$ in the first step, back-projects the information from the graph convolution layer to the original coordinate space, and then expands the dimension through another 1 × 1 convolutional layer so that the output dimension matches the input dimension, forming a residual path.
The specific structure of the GloRe unit is shown in fig. 3. It consists of five convolutions: two for dimension reduction and expansion on the input features X and the output features Y (top left and bottom left), one for generating the bi-projection B between the coordinate space and the latent interaction space (top right), and two for global reasoning on the graph in the interaction space (the two in the middle left).
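The three steps can be put together in a short NumPy sketch under stated assumptions. Each 1 × 1 convolution is written as a matrix product over the flattened feature map; the reduced dimension `C_red`, the parameter names, and the random weights are hypothetical and are not taken from the disclosure.

```python
import numpy as np


def glore_unit(X, W_phi, W_theta, A_g, W_g, W_ext):
    """Sketch of one GloRe forward pass on flattened features X of shape (L, C)."""
    phi = X @ W_phi                      # (L, C_red): dimension reduction, phi(X; W_phi)
    B = (X @ W_theta).T                  # (N, L): projection weights, theta(X; W_theta)
    V = B @ phi                          # (N, C_red): coordinate -> interaction space, eq. (3)
    identity = np.eye(B.shape[0])
    Z = ((identity - A_g) @ V) @ W_g     # (N, C_red): graph convolution, eq. (4)
    Y = B.T @ Z                          # (L, C_red): back-projection with D = B^T, eq. (5)
    return X + Y @ W_ext                 # expand to C channels and add the residual path


rng = np.random.default_rng(0)
L, C, C_red, N = 12, 8, 4, 3             # hypothetical sizes; L = W x H positions
X = rng.standard_normal((L, C))
params = dict(
    W_phi=0.1 * rng.standard_normal((C, C_red)),
    W_theta=0.1 * rng.standard_normal((C, N)),
    A_g=0.1 * rng.standard_normal((N, N)),
    W_g=0.1 * rng.standard_normal((C_red, C_red)),
    W_ext=0.1 * rng.standard_normal((C_red, C)),
)
out = glore_unit(X, **params)
print(out.shape)
```

Because of the residual path, the output has the same shape as the input, so a unit of this kind can be placed between existing layers without changing the shapes they expect. If `W_ext` is all zeros the unit reduces to the identity mapping.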
The UNet model with the embedded GloRe unit in step 2 is a U-shaped convolutional neural network formed by a down-sampling layer and an up-sampling layer. The down-sampling layer consists of five convolutional layer groups and one GloRe unit; the GloRe unit is inserted between the first and second convolutional layer groups, and the model structure is shown in fig. 4. Each convolutional layer group comprises two convolutional layers and one pooling layer, and the number of channels doubles after each down-sampling: each layer of the first convolutional layer group has 64 convolution kernels, and each layer of the remaining groups has 128, 256, 512, and 1024 kernels in sequence. The pooling layers use max pooling with a stride of 2. The up-sampling layer consists of five convolutional layer groups, each comprising one up-sampling operation layer and two convolution operation layers. The up-sampling operation layers of the five groups have 1024, 512, 256, 128, and 64 convolution kernels in sequence, each of size 2 × 2. Each convolution layer of the first group has 1024 kernels, and those of the remaining groups have 512, 256, 128, and 64 kernels in sequence, all of size 3 × 3.
In fig. 4, the left network can be regarded as a feature extraction network and the right network as a feature fusion network. Arrow 1 indicates "conv 3 × 3, ReLU", i.e., convolution with a 3 × 3 kernel followed by ReLU, with 64 channels; arrow 2 indicates concat splicing, i.e., concatenating, along the channel dimension, the up-sampled feature map with the corresponding feature map from the left network. For example, along the uppermost arrow 2, the 64-channel feature map obtained through the GloRe unit is concatenated with the 64-channel feature map obtained by up-sampling in the right network to form a 128-channel feature map. Arrow 3 indicates "max pool 2 × 2", i.e., a pooling layer using max pooling of size 2 × 2; arrow 4 indicates "up-conv 2 × 2", i.e., deconvolution (up-sampling) with a 2 × 2 kernel; arrow 5 indicates "conv 1 × 1", i.e., convolution with a 1 × 1 kernel. The concatenation provides more accurate context information and thus a better segmentation result.
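To make the encoder dimensions concrete, the following sketch traces the feature-map size through the five down-sampling groups for a 640 × 480 input block. It assumes that the convolutions preserve width and height (padding is not specified in the text) and that each group ends with 2 × 2 max pooling of stride 2; the function name `encoder_shapes` is hypothetical.

```python
def encoder_shapes(width, height, channels=(64, 128, 256, 512, 1024)):
    """Return (channels, width, height) after each down-sampling group."""
    shapes = []
    for c in channels:
        # Two convolutions, assumed to keep width and height unchanged.
        # Max pooling with stride 2 then halves both spatial dimensions.
        width, height = width // 2, height // 2
        shapes.append((c, width, height))
    return shapes


shapes = encoder_shapes(640, 480)
for c, w, h in shapes:
    print(f"{c:5d} channels, {w} x {h}")
```

Under these assumptions the deepest feature map has 1024 channels at 20 × 15, and five 2 × 2 up-sampling steps restore the 640 × 480 block size.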
In step 2, the model is trained using the training set and the validation set. Specifically, during training a cross-entropy loss function computes the cross-entropy loss between the predicted value and the actual value of each pixel, and the result is averaged over all pixels. An RMSProp optimizer is used with batch_size 16, learning rate 0.001, parameter ρ = 0.9, smoothing term ε = 1e-7, and 300 iterations. During training, the validation set is used to monitor the current training result, and the model with the lowest average loss on the validation set is taken as the final trained model.
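The loss and the optimizer update can be sketched in NumPy as follows. This is an illustration under assumptions, not the training code of the disclosure: the softmax formulation, the function names, and the placement of ε outside the square root are choices made here (RMSProp implementations differ on that last point); only the hyperparameter values come from the text.

```python
import numpy as np


def pixel_cross_entropy(logits, labels):
    """Mean cross-entropy over all pixels. logits: (H, W, K); labels: (H, W) class indices."""
    shifted = logits - logits.max(axis=-1, keepdims=True)            # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    rows = np.arange(labels.shape[0])[:, None]
    cols = np.arange(labels.shape[1])[None, :]
    picked = log_probs[rows, cols, labels]                            # log-probability of the true class
    return -picked.mean()


def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSProp update; returns the new weights and the new running average."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache


# Uniform predictions over two classes give a loss of ln(2) per pixel.
loss = pixel_cross_entropy(np.zeros((4, 6, 2)), np.zeros((4, 6), dtype=int))
w_new, cache_new = rmsprop_step(np.array([1.0]), np.array([2.0]), np.array([0.0]))
print(loss, w_new, cache_new)
```

Dividing by the running root-mean-square of the gradient makes the step size roughly independent of the gradient scale, which is the usual motivation for RMSProp.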
The scaling of the 9 images to be predicted in step 3 proceeds as follows: each image of size 2560 × 1920 is scaled by a factor of 0.5, that is, to 1280 × 960.
The blocking operation with a sliding window in step 3 proceeds as follows: a window of size 640 × 480 is slid over the 1280 × 960 image to be predicted, from top to bottom and from left to right, with a stride of 640 in the width direction and 480 in the height direction, which yields 4 non-overlapping sub-image blocks.
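The non-overlapping blocking, together with the inverse placement used when per-block results are stitched back together, can be sketched as follows. The function names `split_tiles` and `stitch_tiles` are hypothetical, and a synthetic array stands in for an image or a predicted mask.

```python
import numpy as np


def split_tiles(img, tile_h=480, tile_w=640):
    """Cut an image into non-overlapping tiles; returns a list of ((y, x), tile)."""
    h, w = img.shape[:2]
    return [((y, x), img[y:y + tile_h, x:x + tile_w])
            for y in range(0, h - tile_h + 1, tile_h)
            for x in range(0, w - tile_w + 1, tile_w)]


def stitch_tiles(tiles, out_h=960, out_w=1280):
    """Place each tile back at its original offset to rebuild the full-size image."""
    first = tiles[0][1]
    out = np.zeros((out_h, out_w) + first.shape[2:], dtype=first.dtype)
    for (y, x), tile in tiles:
        out[y:y + tile.shape[0], x:x + tile.shape[1]] = tile
    return out


image = np.arange(960 * 1280, dtype=np.int32).reshape(960, 1280)   # synthetic 1280 x 960 image
tiles = split_tiles(image)
restored = stitch_tiles(tiles)
print(len(tiles), np.array_equal(restored, image))
```

Because the stride equals the window size, the four tiles cover the image exactly once, so stitching needs no blending or averaging of overlapping regions.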
The 4 sub image blocks obtained from each predicted image are respectively input into the trained model to obtain 4 predictor result graphs, and the 4 predictor result graphs are spliced to obtain a final defect prediction result image of 1280 × 960. Fig. 5 shows a partial prediction result image, where in fig. 5, a column is an original image, b column is a Label image, i.e., an artificially labeled image, and c column is a prediction result image. As can be seen from fig. 5, the method achieves a better effect, all images in the test set are respectively preprocessed and predicted, and the average cross-over ratio (mlou) of the prediction results reaches 87.2%. Wherein for the first row of images, the model accurately distinguishes between glistenings and defects.
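For reference, mean intersection over union can be computed as in the sketch below. How the 87.2% figure was aggregated (per image, per class, or over the whole test set) is not stated in the text, so the per-class averaging and the skipping of absent classes here are assumptions, and the tiny masks are invented purely for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Average the intersection-over-union of each class that appears in pred or target."""
    ious = []
    for k in range(num_classes):
        p, t = (pred == k), (target == k)
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                      # class absent from both masks; skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


# Toy 2 x 2 masks: 0 = background, 1 = defect.
target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou = mean_iou(pred, target, num_classes=2)
print(miou)
```

In this toy case the background IoU is 1/2 and the defect IoU is 2/3, so the mean is 7/12.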
To address the shortcomings of current deep-learning-based metal surface flaw detection methods, the present disclosure provides a metal surface flaw detection method based on semantic segmentation with embedded graph convolution. Specifically, the UNet-based semantic segmentation model adopts an encoder-decoder topology with skip connections, and can be trained into a model with good flaw detection performance even when only a small number of images is available. Further, a graph neural network module is embedded in the UNet structure: a GloRe unit projects the features from the coordinate space to a latent interaction space, performs graph convolution reasoning there, and then projects the updated features back to the coordinate space, so that relationships between distant regions of arbitrary shape can be captured.
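The three stages of such a unit (projection, graph convolution, reverse projection) can be sketched with plain matrices. This is an illustrative reduction, not the network of the disclosure: the weight names are hypothetical, ordinary matrix products stand in for 1 × 1 convolutions, and the graph step uses the form Z = (I - A) V W that is commonly given for GloRe units.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def glore_unit(x, w_phi, w_theta, adjacency, w_graph, w_out):
    """x: L x C features (L = H*W positions); returns updated L x C features.

    w_phi (C x C'), w_theta (C x N), adjacency (N x N), w_graph (C' x C')
    and w_out (C' x C) are hypothetical learnable weights.
    """
    reduced = matmul(x, w_phi)             # L x C': channel reduction
    proj = transpose(matmul(x, w_theta))   # N x L: projection weights
    nodes = matmul(proj, reduced)          # N x C': coordinate -> interaction space
    n = len(nodes)
    i_minus_a = [[(1.0 if i == j else 0.0) - adjacency[i][j] for j in range(n)]
                 for i in range(n)]
    # Graph convolution over the fully connected node graph.
    updated = matmul(matmul(i_minus_a, nodes), w_graph)  # N x C'
    back = matmul(transpose(proj), updated)  # L x C': interaction -> coordinate space
    delta = matmul(back, w_out)              # L x C: restore channel count
    # Residual connection: the unit refines the input features.
    return [[xi + di for xi, di in zip(xr, dr)] for xr, dr in zip(x, delta)]
```

Every node aggregates information from all L positions, so after the reverse projection each position can be influenced by any other position, regardless of distance.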
The embodiments of the present disclosure provide a metal surface flaw detection method that combines a semantic segmentation model with graph neural network technology. It is an intelligent flaw detection method that can replace manual inspection in practical applications and overcome some shortcomings of current deep-learning-based flaw detection methods. The detection method of the embodiments of the present disclosure has at least the following beneficial effects:
(1) In practical applications, industrial defect detection data sets are often small, and the method can complete the defect detection task well with only a small data set. The semantic segmentation network UNet can achieve relatively accurate flaw segmentation with a small training set. In practice, industrial metal surface defect detection must determine whether the current product has defects and how large they are, which requires a pixel-level classification of the metal surface. UNet is an end-to-end detection network: at prediction time it classifies every pixel in the image, marking all defective pixels and thereby producing the defect detection result.
(2) Metal surface flaws vary greatly in size, and some occupy a large area. Detecting large flaws requires a large receptive field, but a single convolutional layer covers only a small one. The conventional way to enlarge the receptive field is to stack multiple convolutional layers, which is inefficient because relationships between distant regions of arbitrary shape on the feature map can only be captured by layers near the top, where the receptive field is large enough. The present disclosure embeds a graph-convolution GloRe unit in the UNet network to project a set of features from the coordinate space into a latent interaction space. In the interaction space, each set of disjoint regions can be represented by a single feature; a new fully connected graph is constructed in which each node stores one such feature as its state, so relational reasoning reduces to interaction between the nodes of the fully connected graph, and graph convolution is applied to model and reason about the contextual relationships between node pairs and update the node states. The updated features are then transformed back into the original coordinate space by reverse projection. In this way, relationships between arbitrary regions of the whole space can be captured, and false detections and missed detections are effectively reduced.
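How slowly the receptive field grows when layers are stacked can be seen from the standard recurrence r_out = r_in + (k - 1) * j_in, j_out = j_in * s, where k is the kernel size, s the stride, and j the spacing of neighbouring outputs measured in input pixels. The sketch below applies it to an encoder of five groups with two convolutions and one pooling layer each. The 3 × 3 stride-1 convolutions and 2 × 2 stride-2 pooling are assumptions, since the text does not state kernel sizes.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) after each (kernel, stride) layer."""
    rf, jump = 1, 1
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) jumps
        jump *= stride             # striding spreads later layers further apart
        sizes.append(rf)
    return sizes


def encoder_layers(groups=5, conv_kernel=3, pool_kernel=2):
    """Two stride-1 convolutions followed by one pooling layer per group."""
    layers = []
    for _ in range(groups):
        layers += [(conv_kernel, 1), (conv_kernel, 1), (pool_kernel, pool_kernel)]
    return layers


if __name__ == "__main__":
    for depth, size in enumerate(receptive_field(encoder_layers()), start=1):
        print(f"layer {depth:2d}: receptive field {size} x {size}")
```

Under these assumptions the first group sees only a 5 × 5 neighbourhood after its two convolutions, and even the end of the fifth group covers 156 × 156 pixels of a 640 × 480 block. A unit that aggregates over all positions at once is not bound by this limit.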
According to another aspect of embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements an image processing method according to any one of the above-described embodiments of the present disclosure.
According to another aspect of embodiments of the present disclosure, a computer program product is proposed, comprising a computer program, wherein the computer program, when executed by a processor, implements an image processing method according to any of the above-mentioned embodiments of the present disclosure.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a computer-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples, but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Importantly, as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.

Claims (10)

1. An image processing method for segmenting an input image, comprising:
acquiring the input image;
processing the input image using an image segmentation model;
outputting the processed output image;
wherein the image segmentation model is a first convolutional neural network in which a global feature acquisition module is embedded, the global feature acquisition module being embedded between a convolutional layer and a pooling layer of the first convolutional neural network.
2. The image processing method according to claim 1, wherein the first convolutional neural network is a U-shaped convolutional neural network composed of a downsampling layer and an upsampling layer.
3. The image processing method according to claim 2, wherein the global feature acquisition module maps features from a coordinate space to an interaction space, performs further reasoning with a graph convolutional network to acquire global features, and finally maps the features back to the coordinate space.
4. The image processing method according to claim 3, wherein the global feature acquisition module is a GloRe unit.
5. The image processing method according to claim 4, wherein the U-shaped convolutional neural network is a UNet model, and the downsampling layer of the image segmentation model is composed of five groups of convolutional layer groups and one GloRe unit, wherein the GloRe unit is inserted between a first convolutional layer group and a second convolutional layer group.
6. The image processing method of claim 5, wherein each convolution layer group includes two convolution layers and one pooling layer, and the number of channels is doubled for each downsampling.
7. The image processing method according to claim 1, further comprising, before acquiring the input image:
acquiring a first image;
scaling the first image to obtain a second image; and
partitioning the second image into blocks using a sliding window to obtain a plurality of input images;
and further comprising, after obtaining the output images:
stitching the output images obtained from the input images to obtain a segmented image.
8. An image processing apparatus characterized by comprising:
at least one processor;
a computer storage medium storing a computer program which, when executed by the at least one processor, implements the method according to any one of claims 1-7.
9. A computer-readable storage medium storing a computer program, wherein the computer program realizes the method according to any one of claims 1-7 when executed by a processor.
10. A computer program product comprising a computer program, wherein the computer program realizes the method according to any of claims 1-7 when executed by a processor.
CN202210559683.9A 2022-05-20 2022-05-20 Image processing method and device Pending CN114842032A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210559683.9A CN114842032A (en) 2022-05-20 2022-05-20 Image processing method and device
DE102023113166.4A DE102023113166A1 (en) 2022-05-20 2023-05-19 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210559683.9A CN114842032A (en) 2022-05-20 2022-05-20 Image processing method and device

Publications (1)

Publication Number Publication Date
CN114842032A true CN114842032A (en) 2022-08-02

Family

ID=82571607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210559683.9A Pending CN114842032A (en) 2022-05-20 2022-05-20 Image processing method and device

Country Status (2)

Country Link
CN (1) CN114842032A (en)
DE (1) DE102023113166A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523888A (en) * 2023-05-08 2023-08-01 北京天鼎殊同科技有限公司 Pavement crack detection method, device, equipment and medium
CN116523888B (en) * 2023-05-08 2023-11-03 北京天鼎殊同科技有限公司 Pavement crack detection method, device, equipment and medium

Also Published As

Publication number Publication date
DE102023113166A1 (en) 2023-11-23

Similar Documents

Publication Publication Date Title
Tan et al. Automatic detection of sewer defects based on improved you only look once algorithm
CN108664981B (en) Salient image extraction method and device
CN109671071B (en) Underground pipeline defect positioning and grade judging method based on deep learning
CN110264444B (en) Damage detection method and device based on weak segmentation
CN114445366A (en) Intelligent long-distance pipeline radiographic image defect identification method based on self-attention network
CN112330593A (en) Building surface crack detection method based on deep learning network
CN110807749A (en) Single image raindrop removing method based on dense multi-scale generative adversarial network
CN111027539A (en) License plate character segmentation method based on spatial position information
CN113610778A (en) Bridge surface crack detection method and system based on semantic segmentation
CN115661505A (en) Semantic perception image shadow detection method
CN115147418A (en) Compression training method and device for defect detection model
CN114842032A (en) Image processing method and device
CN116994000A (en) Part edge feature extraction method and device, electronic equipment and storage medium
CN116703885A (en) Swin Transformer-based surface defect detection method and system
CN115410059A (en) Remote sensing image part supervision change detection method and device based on contrast loss
CN114612803A (en) Transmission line insulator defect detection method for improving CenterNet
CN115908793A (en) Coding and decoding structure semantic segmentation model based on position attention mechanism
CN114022865A (en) Image processing method, apparatus, device and medium based on lane line recognition model
CN113516652A (en) Battery surface defect and adhesive detection method, device, medium and electronic equipment
CN115830597B (en) Domain self-adaptive remote sensing image semantic segmentation method from local to global based on pseudo tag generation
CN115830514A (en) Method and system for calculating surface flow velocity of whole river section of riverway with curve
CN115661097A (en) Object surface defect detection method and system
CN116228637A (en) Electronic component defect identification method and device based on multi-task multi-size network
CN116363447A (en) Wafer defect detection method, defect detection model training method and device
CN115240163A (en) Traffic sign detection method and system based on one-stage detection network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination