CN115578565B - Attention scale perception guided lightweight U-net method, device and storage medium - Google Patents

Attention scale perception guided lightweight U-net method, device and storage medium

Info

Publication number
CN115578565B
CN115578565B (application CN202211394805.XA)
Authority
CN
China
Prior art keywords
feature
attention
features
deepest
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211394805.XA
Other languages
Chinese (zh)
Other versions
CN115578565A (en)
Inventor
周展
李朋超
蔡丽蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jushi Intelligent Technology Co ltd
Original Assignee
Beijing Jushi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jushi Intelligent Technology Co ltd filed Critical Beijing Jushi Intelligent Technology Co ltd
Priority to CN202211394805.XA
Publication of CN115578565A
Application granted
Publication of CN115578565B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an attention scale perception guided lightweight U-net method, device and storage medium, applied in the technical field of workpiece surface defect region segmentation, comprising the following steps: obtaining an average pooling result and a maximum pooling result of the deepest feature F through an attention scale perception module, and applying a weight-shared 1×1 convolutional layer to the average pooling result and the maximum pooling result respectively; generating attention A based on the resulting feature F1 and feature F2, generating a feature Fscale after a scaling operation, and finally performing an element summation operation between the feature Fscale and the input deepest feature F through a learnable parameter a to output the feature F_SSAM. In this scheme, by using the attention scale perception module, the scale features of multiple defect targets are learned and judged through a scale-aware attention mechanism, the discriminative features of the defect targets are effectively focused, complex background interference is suppressed, and the problem that features such as background textures are easily confused with defects is effectively avoided.

Description

Attention scale perception guided lightweight U-net method, device and storage medium
Technical Field
The invention relates to the technical field of workpiece surface defect region segmentation, in particular to an attention scale perception guided lightweight U-net method, device and storage medium.
Background
In recent years, high-precision workpiece surface defect region segmentation based on deep learning algorithms has developed rapidly. Current representative methods adopt an encoder-decoder framework such as U-Net, or the DeepLabV3 method, and realize effective fusion of multi-scale features by fusing multi-level features such as low-level spatial details and high-level discriminative semantics of an image, or aggregate context information over different distance ranges through dilated convolution pyramids with different receptive fields, so as to predict the defect region.
however, as industrial practical application scenes are complex and changeable, defect forms are varied, and features such as background textures and the like are easily confused with defects, the problems of large intra-class difference and small inter-class difference are formed, in the process of dividing a defect region, representative methods such as U-Net and the like only focus on fusion of multi-scale features of different levels, fusion of context information of different receptive fields or a focus mechanism of space and channel dimensions, and complex background interference is difficult to effectively suppress.
Disclosure of Invention
In view of this, an object of the present invention is to provide an attention scale perception guided lightweight U-net method, apparatus, and storage medium, so as to solve the problem in the prior art that the defect region segmentation process only fuses multi-scale features of different levels, fuses context information of different receptive fields, or applies attention mechanisms over the spatial and channel dimensions, while ignoring complex background interference, so that features such as background texture are easily confused with defects.
According to a first aspect of embodiments of the present invention, there is provided an attention-scale-aware-guided lightweight U-net method, comprising:
inputting an image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, and selecting a deepest feature F;
inputting the deepest feature F into an attention scale perception module for obtaining an average pooling result and a maximum pooling result of the deepest feature F;
the attention scale perception module applies a weight-shared 1×1 convolutional layer to the average pooling result and the maximum pooling result respectively to obtain a feature F1 and a feature F2;
generating attention A emphasizing corresponding features in the deepest features F based on the features F1 and the features F2;
generating a scaled feature Fscale based on the feature F1, the feature F2, and the attention a;
performing an element summation operation between the feature Fscale and the input deepest feature F through the learnable parameter a to obtain the feature F_SSAM;
performing an upsampling operation on the feature F_SSAM output by the attention scale perception module, fusing the sampling result with the multilayer feature maps of different levels to obtain a feature X_m, and outputting a segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolutional layer.
Preferably,
inputting the deepest layer features F into the attention scale perception module, wherein the step of obtaining an average pooling result and a maximum pooling result of the deepest layer features F comprises the following steps:
inputting the deepest layer features F into an attention scale perception module;
and the attention scale perception module carries out aggregation processing on the deepest layer characteristic F through parallel maximum pooling operation and average pooling operation respectively to obtain an average pooling result and a maximum pooling result of the deepest layer characteristic F.
Preferably,
the outputting of the segmentation result of the defect region of the image to be segmented includes:
performing an upsampling operation on the feature F_SSAM output by the attention scale perception module;
carrying out U-net type channel-dimension concatenation fusion of the sampling result with the feature map one level above the deepest feature F, and performing convolution and activation operations on the fusion result to obtain a feature X1;
then performing an upsampling operation on the feature X1, and repeating the above steps until U-net type channel-dimension concatenation fusion is carried out with the shallowest feature map, and performing convolution and activation operations on the fusion result to obtain the feature X_m;
passing the feature X_m through one convolutional layer and outputting the segmentation result of the defect region of the image to be segmented.
Preferably,
the generating of the scaled feature Fscale includes:
performing an element multiplication operation between the attention A and the feature F1 and the feature F2, respectively, to generate the scaled feature Fscale.
Preferably,
the generating of the attention A emphasizing the corresponding features in the deepest feature F based on the feature F1 and the feature F2 comprises:
based on the feature F1 and the feature F2, generating, through a softmax function, the attention A emphasizing the corresponding features in the deepest feature F.
Preferably,
inputting the image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operations, wherein the step of selecting the deepest feature F comprises the following steps:
inputting the image to be segmented into the segmentation network, and obtaining a feature M_N after convolution and pooling operations, the feature M_N being the shallowest feature map;
performing convolution and pooling operations on the feature M_N to obtain a feature M_{N-1};
repeating the above steps, and obtaining the deepest feature F after a preset number of convolution and pooling operations.
According to a second aspect of embodiments of the present invention, there is provided an attention scale perception guided lightweight U-net device, the device comprising:
a feature map acquisition module: used for inputting an image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operations, and selecting a deepest feature F;
an input module: used for inputting the deepest feature F into the attention scale perception module and obtaining an average pooling result and a maximum pooling result of the deepest feature F;
a convolution application module: used for applying, through the attention scale perception module, a weight-shared 1×1 convolutional layer to the average pooling result and the maximum pooling result respectively to obtain a feature F1 and a feature F2;
an attention map generation module: used for generating attention A emphasizing corresponding features in the deepest feature F based on the feature F1 and the feature F2;
a scaling module: used for generating a scaled feature Fscale based on the feature F1, the feature F2 and the attention A;
an element summation module: used for performing an element summation operation between the feature Fscale and the input deepest feature F through the learnable parameter a to obtain the feature F_SSAM;
an output module: used for performing an upsampling operation on the feature F_SSAM output by the attention scale perception module, fusing the sampling result with the multilayer feature maps of different levels to obtain a feature X_m, and outputting the segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolutional layer.
According to a third aspect of embodiments of the present invention, there is provided a storage medium storing a computer program which, when executed by a processor, implements each step in the attention scale perception guided lightweight U-net method as described in any one of the above.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
according to the method, an average pooling result and a maximum pooling result of the deepest feature F are obtained through an attention sensing module, and then 1 × 1 convolutional layers shared by weights are applied to the average pooling result and the maximum pooling result respectively through the attention sensing module to obtain a feature F1 and a feature F2; generating attention A emphasizing corresponding features in the deepest features F based on the features F1 and the features F2, generating features Fscale after scaling operation, and finally performing element summation operation between the features Fscale and the input deepest features F through learnable parameters a by an attention sensing module to output the features F SSAM (ii) a According to the scheme, the attention scale sensing module is used, the scale characteristics of the multi-defect target are learned and judged through the attention mechanism capable of sensing scales, the remote dependency relationship can be effectively captured with low calculation cost, the distinguishing characteristics of the defect target are effectively focused, the interference of a complex background is inhibited, and the problem that the characteristics such as background textures are easily confused with the defects is effectively avoided.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating an attention scale perception guided lightweight U-net method according to an exemplary embodiment;
FIG. 2 is a schematic diagram of an overall flow process shown in accordance with another exemplary embodiment;
FIG. 3 is a schematic diagram of a scaling process shown in accordance with another exemplary embodiment;
FIG. 4 is a system diagram illustrating an attention scale perception guided lightweight U-net device, according to another exemplary embodiment;
In the drawings: 1 - feature map acquisition module; 2 - input module; 3 - convolution application module; 4 - attention map generation module; 5 - scaling module; 6 - element summation module; 7 - output module.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings, in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Embodiment One
FIG. 1 is a flowchart illustrating an attention scale perception guided lightweight U-net method according to an exemplary embodiment; as shown in FIG. 1, the method includes:
s1, inputting an image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, and selecting a deepest feature F;
s2, inputting the deepest layer characteristics F into an attention scale sensing module for obtaining an average pooling result and a maximum pooling result of the deepest layer characteristics F;
s3, the attention sensing module respectively applies the 1 × 1 convolutional layers shared by the weights to the average pooling result and the maximum pooling result to obtain a feature F1 and a feature F2;
s4, generating attention A emphasizing corresponding features in the deepest features F based on the features F1 and the features F2;
s5, generating a scaled feature Fscale based on the feature F1, the feature F2 and the attention A;
s6, element summation operation is carried out between the characteristic Fscale and the input deepest characteristic F through the learnable parameter a, and the characteristic F is obtained SSAM
S7, outputting the characteristics F of the attention scale perception module SSAM Executing up-sampling operation, and fusing the sampling result with multi-layer characteristic graphs of different levels to obtain a characteristic X m The feature X m Outputting a segmentation result of a defect area of the image to be segmented after one layer of convolution;
it can be understood that the scheme provides a lightweight U-net based on attention scale perception guidance of an encoder-decoder structure, and a lightweight ResNet18 is adopted as an encoder to generate feature maps of different levels from different stages of convolution; the model structure is shown in fig. 2, wherein the bottommost feature map is processed by the attention scale perception module and then input into the decoder;
image to be segmented
Figure 304549DEST_PATH_IMAGE001
Inputting the characteristic graphs into a segmentation network, obtaining multilayer characteristic graphs with different levels after multilayer convolution and pooling operations, wherein the deepest characteristic graph is F, obtaining an average pooling result and a maximum pooling result of the deepest characteristic F through an attention sensing module, and respectively applying 1 multiplied by 1 convolution layers shared by weight to the average pooling result and the maximum pooling result through the attention sensing module to obtain a characteristic F1 and a characteristic F2; generating attention A emphasizing corresponding features in the deepest features F based on the features F1 and the features F2, generating features Fscale after zooming operation, and finally enabling the attention sensing module to learn through learningThe parameter a carries out element summation operation between the feature Fscale and the input deepest feature F, and the feature F is output SSAM (ii) a Features F of decoder to attention sensing module output SSAM Carrying out up-sampling operation, and fusing the sampling result with multi-layer characteristic graphs of different levels to obtain a characteristic X m The feature X m Outputting a segmentation result of a defect area of the image to be segmented after one layer of convolution; according to the scheme, the attention scale sensing module is used, the scale characteristics of the multi-defect target are learned and judged through the attention mechanism capable of sensing scales, the remote dependency relationship can be effectively captured with low calculation cost, the distinguishing characteristics of the defect target are effectively focused, the interference of a complex background is inhibited, and the problem that the characteristics such as background textures are easily confused with the defects is effectively avoided.
Preferably,
inputting the deepest layer features F into the attention scale perception module, wherein the step of obtaining an average pooling result and a maximum pooling result of the deepest layer features F comprises the following steps:
inputting the deepest layer features F into an attention scale perception module;
and the attention scale perception module carries out aggregation processing on the deepest feature F through parallel maximum pooling and average pooling operations respectively to obtain an average pooling result and a maximum pooling result of the deepest feature F;
It will be appreciated that, as shown in FIG. 3, when the feature F is processed by the attention scale perception module, the module first aggregates the feature F into F_avg and F_max using parallel average pooling and max pooling operations, whereby highly relevant context information is extracted from each line of the feature F. The operations P_avg and P_max are shown in equations (1) and (2), respectively:

F_avg = P_avg(F)    (1)

F_max = P_max(F)    (2)

where P_avg(·) and P_max(·) represent the average pooling and maximum pooling operations, respectively;
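By way of non-limiting illustration, equations (1) and (2) may be sketched in PyTorch as follows; the assumption that the pooling runs over the width axis (one context value per line of F), together with the tensor layout and the function name aggregate_rows, are choices made for this example and are not specified by the description:

```python
import torch

def aggregate_rows(feat: torch.Tensor):
    """Eqs. (1)-(2): parallel average / max pooling of the deepest feature F.

    Assumes pooling over the width axis, keeping one context value per
    line (row) of the feature map.
    feat: (B, C, H, W) -> F_avg, F_max, each of shape (B, C, H, 1).
    """
    f_avg = feat.mean(dim=3, keepdim=True)   # P_avg(F), eq. (1)
    f_max = feat.amax(dim=3, keepdim=True)   # P_max(F), eq. (2)
    return f_avg, f_max
```

Running the two pooling branches in parallel preserves both the smooth average context and the strongest activation of each line, giving the subsequent attention step two complementary views of scale.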
preferably, the first and second electrodes are formed of a metal,
the outputting of the segmentation result of the defective region of the image to be segmented comprises:
feature F output by attention scale perception module SSAM Performing an upsampling operation;
carrying out U-net type channel dimension splicing fusion on the sampling result and a feature map on the upper layer of the deepest feature F, and carrying out convolution operation and activation operation on the fusion result to obtain a feature X1;
then, the characteristic X1 is subjected to up-sampling operation, the steps are repeated until the U-net type channel dimensionality splicing fusion is carried out on the characteristic X and the shallowest layer characteristic graph, and the fusion result is subjected to convolution operation and activation operation to obtain the characteristic X m
Will be characterized by X m Outputting a segmentation result of a defect area of the image to be segmented after one layer of convolution;
it will be appreciated that the feature F output by the attention scale sensing module is shown in FIG. 2 SSAM Executing up-sampling operation, then carrying out U-net type channel dimension splicing and fusion with M1, then carrying out convolution and activation on the fusion result to obtain characteristic X1, and similarly carrying out up-sampling on X1, andand fusing the characteristics M2, convolving the fusion result, activating to obtain the characteristics M2, sequentially obtaining X3 and X4 in the same way, and finally outputting the segmentation result of the defect area after the X4 is convolved by one layer.
Preferably,
the generating of the attention A emphasizing the corresponding features in the deepest feature F based on the feature F1 and the feature F2 comprises:
generating attention A emphasizing corresponding features in the deepest feature F through a softmax function on the basis of the feature F1 and the feature F2;
It can be appreciated that the attention scale perception module applies a weight-shared 1×1 convolutional layer to F_avg and F_max to obtain the feature F1 and the feature F2 (equations (3) and (4)), transferring the height context information of each feature in F, and generates an attention map A using a softmax function (equation (5)) that emphasizes the importance of the corresponding features in F; the attention map can dynamically select features of a proper scale and fuses features of different scales through self-learning:

F1 = Conv1×1(F_avg)    (3)

F2 = Conv1×1(F_max)    (4)

A = softmax(F1 + F2)    (5)

where Conv1×1(·) represents the weight-shared 1×1 convolution operation.
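As a concrete reading of equations (3) to (5), the sketch below applies one 1×1 convolution (whose weights are therefore shared) to both pooled maps and obtains the attention map through a softmax; combining F1 and F2 by element-wise summation and normalizing the softmax over the row dimension are assumptions consistent with the text, not details it states, and the channel width 512 is illustrative:

```python
import torch
import torch.nn as nn

shared_conv = nn.Conv2d(512, 512, kernel_size=1)  # single layer reused on both branches

def scale_attention(f_avg: torch.Tensor, f_max: torch.Tensor):
    """Eqs. (3)-(5): weight-shared 1x1 conv on both pooled maps, then softmax."""
    f1 = shared_conv(f_avg)               # eq. (3)
    f2 = shared_conv(f_max)               # eq. (4)
    attn = torch.softmax(f1 + f2, dim=2)  # eq. (5), normalized over the rows (H)
    return f1, f2, attn
```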
Preferably,
the generating of the scaled feature Fscale includes:
performing element multiplication operations between the attention A and the feature F1 and the feature F2, respectively, to generate the scaled feature Fscale;
It will be appreciated that element multiplication operations are performed between the attention A and F1 and F2, respectively, to generate the scaled feature Fscale (equation (6)); finally, the learnable parameter a is used to perform an element summation operation between Fscale and the input feature F, obtaining the final output feature F_SSAM (equation (7)):

Fscale = A ⊙ F1 + A ⊙ F2    (6)

F_SSAM = a · Fscale + F    (7)
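Putting equations (1) to (7) together yields one possible sketch of the complete attention scale perception module; the class name SSAM, the pooling axis, the softmax dimension, the summed combination in equations (5) and (6), and the zero initialization of the learnable parameter a are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class SSAM(nn.Module):
    """Sketch of the attention scale perception module, eqs. (1)-(7)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)  # weight-shared 1x1 conv
        self.a = nn.Parameter(torch.zeros(1))                     # learnable parameter a

    def forward(self, feat: torch.Tensor) -> torch.Tensor:        # feat: (B, C, H, W)
        f_avg = feat.mean(dim=3, keepdim=True)                    # eq. (1)
        f_max = feat.amax(dim=3, keepdim=True)                    # eq. (2)
        f1 = self.conv(f_avg)                                     # eq. (3)
        f2 = self.conv(f_max)                                     # eq. (4)
        attn = torch.softmax(f1 + f2, dim=2)                      # eq. (5)
        f_scale = attn * f1 + attn * f2                           # eq. (6)
        return self.a * f_scale + feat                            # eq. (7), F_SSAM
```

Because f_scale has shape (B, C, H, 1) in this sketch, the residual sum in equation (7) broadcasts the scaled per-line context across the width of F, so the module adds only a single 1×1 convolution and one scalar parameter on top of the backbone, consistent with its lightweight design.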
preferably, the first and second electrodes are formed of a metal,
inputting the image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, wherein the step of selecting the deepest feature F comprises the following steps:
inputting an image to be segmented into a segmentation network, and obtaining a characteristic M after convolution and pooling N Said feature M N The feature map of the shallowest layer is obtained;
for feature M N After convolution and pooling operation, the characteristic M is obtained N-1
Repeating the steps, and obtaining the deepest layer characteristic F after convolution and pooling operation for preset times;
it will be appreciated that the images are shown in FIG. 2
Figure 917025DEST_PATH_IMAGE001
Inputting the segmentation network, and obtaining a plurality of characteristics with different levels after multi-layer convolution and poolingM4, M3, M2, M1 and F, wherein F is the deepest layer, and M4 is the shallowest layer.
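For completeness, the pipeline of FIG. 2 may be sketched end to end as follows, reusing the SSAM and DecoderBlock sketches above; the split of torchvision's ResNet18 into stages and the channel widths are illustrative choices, not values fixed by the description:

```python
import torch
import torch.nn as nn
import torchvision

class LightUNet(nn.Module):
    """Sketch: ResNet18 encoder (M4..M1, F) -> SSAM on F -> decoder X1..X4 -> 1-layer conv head."""
    def __init__(self, num_classes: int = 1):
        super().__init__()
        r = torchvision.models.resnet18(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu)               # -> M4, shallowest
        self.enc1 = nn.Sequential(r.maxpool, r.layer1)                  # -> M3
        self.enc2, self.enc3, self.enc4 = r.layer2, r.layer3, r.layer4  # -> M2, M1, F
        self.ssam = SSAM(512)                      # attention scale perception module
        self.dec1 = DecoderBlock(512, 256, 256)    # F_SSAM fused with M1 -> X1
        self.dec2 = DecoderBlock(256, 128, 128)    # X1 fused with M2 -> X2
        self.dec3 = DecoderBlock(128, 64, 64)      # X2 fused with M3 -> X3
        self.dec4 = DecoderBlock(64, 64, 64)       # X3 fused with M4 -> X4
        self.head = nn.Conv2d(64, num_classes, 1)  # one-layer convolution

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        m4 = self.stem(img)
        m3 = self.enc1(m4)
        m2 = self.enc2(m3)
        m1 = self.enc3(m2)
        f = self.enc4(m1)                 # deepest feature F
        x = self.dec1(self.ssam(f), m1)   # decoder consumes F_SSAM
        x = self.dec2(x, m2)
        x = self.dec3(x, m3)
        x = self.dec4(x, m4)
        return self.head(x)               # defect-region segmentation logits
```

For example, LightUNet()(torch.randn(1, 3, 256, 256)) returns logits at half the input resolution (M4's scale in this sketch), which would then be upsampled to the full image size to produce the final defect mask.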
Embodiment Two
This embodiment also discloses an attention scale perception guided lightweight U-net apparatus, a system schematic diagram of which is shown in FIG. 4, including:
the feature map acquisition module 1: used for inputting an image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operations, and selecting the deepest feature F;
the input module 2: used for inputting the deepest feature F into the attention scale perception module and obtaining an average pooling result and a maximum pooling result of the deepest feature F;
the convolution application module 3: used for applying, through the attention scale perception module, a weight-shared 1×1 convolutional layer to the average pooling result and the maximum pooling result respectively to obtain a feature F1 and a feature F2;
the attention map generation module 4: used for generating attention A emphasizing corresponding features in the deepest feature F based on the feature F1 and the feature F2;
the scaling module 5: used for generating a scaled feature Fscale based on the feature F1, the feature F2 and the attention A;
the element summation module 6: used for performing an element summation operation between the feature Fscale and the input deepest feature F through the learnable parameter a to obtain the feature F_SSAM;
the output module 7: used for performing an upsampling operation on the feature F_SSAM output by the attention scale perception module, fusing the sampling result with the multilayer feature maps of different levels to obtain a feature X_m, and outputting the segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolutional layer;
It can be understood that the image to be segmented is input into the segmentation network through the feature map acquisition module 1, and after multilayer convolution and pooling operations, multilayer feature maps of different levels are obtained and the deepest feature F is selected; the deepest feature F is input into the attention scale perception module through the input module 2 to obtain the average pooling result and the maximum pooling result of the deepest feature F. The convolution application module 3 applies, through the attention scale perception module, a weight-shared 1×1 convolutional layer to the average pooling result and the maximum pooling result respectively to obtain the feature F1 and the feature F2; the attention map generation module 4 generates the attention A emphasizing corresponding features in the deepest feature F based on the feature F1 and the feature F2; the scaling module 5 generates the scaled feature Fscale based on the feature F1, the feature F2 and the attention A; the element summation module 6 performs an element summation operation between the feature Fscale and the input deepest feature F through the learnable parameter a to obtain the feature F_SSAM; and the output module 7 performs an upsampling operation on the feature F_SSAM output by the attention scale perception module, fuses the sampling result with the multilayer feature maps of different levels to obtain the feature X_m, and outputs the segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolutional layer. In this scheme, by using the attention scale perception module, the scale features of multiple defect targets are learned and judged through a scale-aware attention mechanism, long-range dependencies can be effectively captured at low computational cost, the discriminative features of the defect targets are effectively focused, complex background interference is suppressed, and the problem that features such as background textures are easily confused with defects is effectively avoided.
Embodiment Three
This embodiment also discloses a storage medium storing a computer program which, when executed by a processor, implements each step in the attention scale perception guided lightweight U-net method as described in any one of the above;
it will be appreciated that the storage medium referred to above may be a read-only memory, a magnetic or optical disk, or the like.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (7)

1. An attention scale perception guided lightweight U-net method, the method comprising:
inputting an image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, and selecting a deepest feature F;
inputting the deepest layer features F into an attention scale perception module for obtaining an average pooling result and a maximum pooling result of the deepest layer features F;
the attention scale perception module applies a weight-shared 1×1 convolutional layer to the average pooling result and the maximum pooling result respectively to obtain a feature F1 and a feature F2;
generating attention A emphasizing corresponding features in the deepest features F based on the features F1 and the features F2;
the generating of the attention A emphasizing the corresponding feature in the deepest feature F based on the feature F1 and the feature F2 comprises:
based on the features F1 and the features F2, generating attention A emphasizing corresponding features in the deepest features F through a softmax function, wherein the attention A is used for emphasizing the importance of the corresponding features in the deepest features F;
generating a scaled feature Fscale based on the feature F1, the feature F2 and the attention A;
performing an element summation operation between the feature Fscale and the input deepest feature F through the learnable parameter a to obtain the feature F_SSAM;
performing an upsampling operation on the feature F_SSAM output by the attention scale perception module, fusing the sampling result with multilayer feature maps of different levels to obtain a feature X_m, and outputting the segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolutional layer.
2. The method of claim 1,
inputting the deepest layer features F into the attention scale perception module, wherein the step of obtaining an average pooling result and a maximum pooling result of the deepest layer features F comprises the following steps:
inputting the deepest layer characteristics F into an attention scale perception module;
and the attention scale perception module carries out aggregation processing on the deepest layer characteristic F through parallel maximum pooling operation and average pooling operation respectively to obtain an average pooling result and a maximum pooling result of the deepest layer characteristic F.
3. The method of claim 2,
the outputting of the segmentation result of the defect region of the image to be segmented includes:
performing an upsampling operation on the feature F_SSAM output by the attention scale perception module;
carrying out U-net type channel-dimension concatenation fusion of the sampling result with the feature map one level above the deepest feature F, and performing convolution and activation operations on the fusion result to obtain a feature X1;
then performing an upsampling operation on the feature X1, and repeating the above steps until U-net type channel-dimension concatenation fusion is carried out with the shallowest feature map, and performing convolution and activation operations on the fusion result to obtain the feature X_m;
passing the feature X_m through one convolutional layer and outputting the segmentation result of the defect region of the image to be segmented.
4. The method of claim 3,
the generating of the scaled feature Fscale comprises:
performing an element multiplication operation between the attention A and the feature F1 and the feature F2, respectively, to generate the scaled feature Fscale.
5. The method of claim 4,
inputting the image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, wherein the step of selecting the deepest feature F comprises the following steps:
inputting an image to be segmented into a segmentation network, and obtaining a feature M_N after convolution and pooling operations, the feature M_N being the shallowest feature map;
performing convolution and pooling operations on the feature M_N to obtain a feature M_{N-1};
repeating the above steps, and obtaining the deepest feature F after a preset number of convolution and pooling operations.
6. An attention scale perception guided lightweight U-net device, the device comprising:
a feature map acquisition module: used for inputting an image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operations, and selecting a deepest feature F;
an input module: used for inputting the deepest feature F into the attention scale perception module and obtaining an average pooling result and a maximum pooling result of the deepest feature F;
a convolution application module: used for applying, through the attention scale perception module, a weight-shared 1×1 convolutional layer to the average pooling result and the maximum pooling result respectively to obtain a feature F1 and a feature F2;
the attention map generation module: generating attention A for emphasizing corresponding features in the deepest features F based on the features F1 and the features F2;
the generating of the attention A emphasizing the corresponding feature in the deepest feature F based on the feature F1 and the feature F2 comprises:
based on the features F1 and the features F2, generating attention A emphasizing corresponding features in the deepest features F through a softmax function, wherein the attention A is used for emphasizing the importance of the corresponding features in the deepest features F;
a scaling module: for generating a scaled feature Fscale based on feature F1, feature F2 and attention a;
an element summation module: used for performing an element summation operation between the feature Fscale and the input deepest feature F through the learnable parameter a to obtain the feature F_SSAM;
an output module: used for performing an upsampling operation on the feature F_SSAM output by the attention scale perception module, fusing the sampling result with the multilayer feature maps of different levels to obtain a feature X_m, and outputting the segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolutional layer.
7. A storage medium storing a computer program which, when executed by a processor, performs the steps of the attention scale perception guided lightweight U-net method according to any of claims 1 to 5.
CN202211394805.XA 2022-11-09 2022-11-09 Attention scale perception guided lightweight U-net method, device and storage medium Active CN115578565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211394805.XA CN115578565B (en) 2022-11-09 2022-11-09 Attention scale perception guided lightweight U-net method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211394805.XA CN115578565B (en) 2022-11-09 2022-11-09 Attention scale perception guided lightweight U-net method, device and storage medium

Publications (2)

Publication Number Publication Date
CN115578565A CN115578565A (en) 2023-01-06
CN115578565B (en) 2023-04-14

Family

ID=84589244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211394805.XA Active CN115578565B (en) 2022-11-09 2022-11-09 Attention scale perception guided lightweight U-net method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115578565B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111681273A (en) * 2020-06-10 2020-09-18 创新奇智(青岛)科技有限公司 Image segmentation method and device, electronic equipment and readable storage medium
CN112465790A (en) * 2020-12-03 2021-03-09 天津大学 Surface defect detection method based on multi-scale convolution and trilinear global attention

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950467B (en) * 2020-08-14 2021-06-25 清华大学 Fusion network lane line detection method based on attention mechanism and terminal equipment
US20220335600A1 (en) * 2021-04-14 2022-10-20 Ping An Technology (Shenzhen) Co., Ltd. Method, device, and storage medium for lesion segmentation and recist diameter prediction via click-driven attention and dual-path connection
CN113362320B (en) * 2021-07-07 2024-05-28 北京工业大学 Wafer surface defect mode detection method based on deep attention network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111681273A (en) * 2020-06-10 2020-09-18 创新奇智(青岛)科技有限公司 Image segmentation method and device, electronic equipment and readable storage medium
CN112465790A (en) * 2020-12-03 2021-03-09 天津大学 Surface defect detection method based on multi-scale convolution and trilinear global attention

Also Published As

Publication number Publication date
CN115578565A (en) 2023-01-06

Similar Documents

Publication Publication Date Title
CN113822885B (en) Workpiece defect detection method and device integrating multi-attention machine system
CN110738697A (en) Monocular depth estimation method based on deep learning
CN110837811B (en) Method, device and equipment for generating semantic segmentation network structure and storage medium
Baker et al. Limits on super-resolution and how to break them
CN110264476B (en) Multi-scale serial convolution deep learning microscopic image segmentation method
CN109389667B (en) High-efficiency global illumination drawing method based on deep learning
CN111401436A (en) Streetscape image segmentation method fusing network and two-channel attention mechanism
CN116740162B (en) Stereo matching method based on multi-scale cost volume and computer storage medium
Yin et al. Local binary pattern metric-based multi-focus image fusion
CN115272437A (en) Image depth estimation method and device based on global and local features
CN114170438A (en) Neural network training method, electronic device and computer storage medium
KR102128789B1 (en) Method and apparatus for providing efficient dilated convolution technique for deep convolutional neural network
CN116844032A (en) Target detection and identification method, device, equipment and medium in marine environment
CN113158970B (en) Action identification method and system based on fast and slow dual-flow graph convolutional neural network
CN115578565B (en) Attention scale perception guided lightweight U-net method, device and storage medium
CN113313162A (en) Method and system for detecting multi-scale feature fusion target
CN116434303A (en) Facial expression capturing method, device and medium based on multi-scale feature fusion
CN113516670B (en) Feedback attention-enhanced non-mode image segmentation method and device
CN112927250B (en) Edge detection system and method based on multi-granularity attention hierarchical network
Whiteley et al. Direct image reconstruction from raw measurement data using an encoding transform refinement-and-scaling neural network
CN115713624A (en) Self-adaptive fusion semantic segmentation method for enhancing multi-scale features of remote sensing image
CN113807354B (en) Image semantic segmentation method, device, equipment and storage medium
WO2022188102A1 (en) Depth image inpainting method and apparatus, camera assembly, and electronic device
Wang Application of computer images in virtual simulation technology-apparel as an example
CN113642452A (en) Human body image quality evaluation method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant