CN115578565A - Attention scale perception guided lightweight U-net method, device and storage medium

Info

Publication number
CN115578565A
CN115578565A
Authority
CN
China
Prior art keywords
feature
attention
characteristic
deepest
features
Prior art date
Legal status
Granted
Application number
CN202211394805.XA
Other languages
Chinese (zh)
Other versions
CN115578565B (en)
Inventor
周展
李朋超
蔡丽蓉
Current Assignee
Beijing Jushi Intelligent Technology Co ltd
Original Assignee
Beijing Jushi Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jushi Intelligent Technology Co ltd
Priority to CN202211394805.XA
Publication of CN115578565A
Application granted
Publication of CN115578565B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an attention scale perception guided lightweight U-net method, device and storage medium, applied to the technical field of workpiece surface defect region segmentation, and comprising the following steps: obtaining an average pooling result and a maximum pooling result of the deepest feature F through the attention scale perception module, and applying weight-shared 1×1 convolutional layers to the average pooling result and the maximum pooling result respectively; generating an attention A based on the obtained features F1 and F2, generating a feature F_scale after a scaling operation, finally performing an element-wise summation between the feature F_scale and the input deepest feature F through a learnable parameter a, and outputting the feature F_SSAM. According to this scheme, by using the attention scale perception module, the scale features of multiple defect targets are learned and discriminated through a scale-aware attention mechanism, the discriminative features of defect targets are effectively focused, the interference of complex backgrounds is suppressed, and the problem that features such as background textures are easily confused with defects is effectively avoided.

Description

Attention scale perception guided lightweight U-net method, device and storage medium
Technical Field
The invention relates to the technical field of workpiece surface defect region segmentation, and in particular to an attention scale perception guided lightweight U-net method, device and storage medium.
Background
In recent years, high-accuracy segmentation of defect regions on workpiece surfaces based on deep learning algorithms has developed rapidly. Current representative methods adopt an encoder-decoder framework such as U-Net or the DeepLabV3 method, and either realize effective fusion of multi-scale features by fusing multi-level features such as low-level spatial details and high-level discriminative semantics of the image, or aggregate context information over different distance ranges through dilated convolution pyramids with different receptive fields, so as to predict the defect region.
however, as industrial practical application scenes are complex and changeable, defect forms are varied, and features such as background textures and the like are easily confused with defects, the problems of large intra-class difference and small inter-class difference are formed, in the process of dividing a defect region, representative methods such as U-Net and the like only focus on fusion of multi-scale features of different levels, fusion of context information of different receptive fields or a focus mechanism of space and channel dimensions, and complex background interference is difficult to effectively suppress.
Disclosure of Invention
In view of the above, an object of the present invention is to provide an attention scale perception guided lightweight U-net method, device and storage medium, so as to solve the problem in the prior art that, in the defect region segmentation process, only the fusion of multi-scale features at different levels, the fusion of context information from different receptive fields, or attention mechanisms over the spatial and channel dimensions are considered, complex background interference is ignored, and features such as background textures are therefore easily confused with defects.
According to a first aspect of embodiments of the present invention, there is provided an attention scale perception guided lightweight U-net method, comprising:
inputting an image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, and selecting a deepest feature F;
inputting the deepest layer characteristics F into an attention scale perception module for obtaining an average pooling result and a maximum pooling result of the deepest layer characteristics F;
the attention scale perception module respectively applies weight-shared 1×1 convolutional layers to the average pooling result and the maximum pooling result to obtain a feature F1 and a feature F2;
generating attention A emphasizing corresponding features in the deepest features F based on the features F1 and the features F2;
generating a scaled feature F_scale based on the feature F1, the feature F2 and the attention A;
performing an element-wise summation between the feature F_scale and the input deepest feature F through a learnable parameter a to obtain a feature F_SSAM;
performing an upsampling operation on the feature F_SSAM output by the attention scale perception module, fusing the sampling result with the multilayer feature maps of different levels to obtain a feature X_m, and outputting a segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolution layer.
Preferably,
inputting the deepest layer features F into the attention scale perception module, wherein the step of obtaining an average pooling result and a maximum pooling result of the deepest layer features F comprises the following steps:
inputting the deepest layer features F into an attention scale perception module;
and the attention scale perception module carries out aggregation processing on the deepest layer characteristic F through parallel maximum pooling operation and average pooling operation respectively to obtain an average pooling result and a maximum pooling result of the deepest layer characteristic F.
Preferably,
the outputting of the segmentation result of the defective region of the image to be segmented includes:
feature F output by attention scale perception module SSAM Performing an upsampling operation;
carrying out U-net type channel dimension splicing fusion on the sampling result and a feature map on the upper layer of the deepest feature F, and carrying out convolution operation and activation operation on the fusion result to obtain a feature X1;
then, the characteristic X1 is subjected to up-sampling operation, the steps are repeated until the U-net type channel dimensionality splicing fusion is carried out on the characteristic X and the shallowest layer characteristic graph, and the fusion result is subjected to convolution operation and activation operation to obtain the characteristic X m
Will feature X m And outputting a segmentation result of the defect area of the image to be segmented after one-layer convolution.
Preferably,
the generating a scaled feature F_scale comprises:
performing element-wise multiplication operations between the attention A and the feature F1 and the feature F2 respectively to generate the scaled feature F_scale.
Preferably,
the generating of the attention A emphasizing the corresponding feature in the deepest feature F based on the feature F1 and the feature F2 comprises:
based on the feature F1 and the feature F2, the attention A emphasizing the corresponding features in the deepest feature F is generated through a softmax function.
Preferably,
inputting the image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, wherein the step of selecting the deepest feature F comprises the following steps:
inputting the image to be segmented into the segmentation network, and obtaining a feature M_N after convolution and pooling operations, the feature M_N being the shallowest feature map;
performing convolution and pooling operations on the feature M_N to obtain a feature M_{N-1};
repeating the above step, and obtaining the deepest feature F after a preset number of convolution and pooling operations.
According to a second aspect of embodiments of the present invention, there is provided an attention scale perception guided lightweight U-net device, the device comprising:
a feature map acquisition module: used for inputting the image to be segmented into the segmentation network, obtaining multilayer feature maps of different levels after multilayer convolution and pooling operations, and selecting the deepest feature F;
an input module: used for inputting the deepest feature F into the attention scale perception module to obtain the average pooling result and the maximum pooling result of the deepest feature F;
a convolution application module: used for respectively applying weight-shared 1×1 convolutional layers to the average pooling result and the maximum pooling result through the attention scale perception module to obtain a feature F1 and a feature F2;
an attention map generation module: used for generating the attention A emphasizing the corresponding features in the deepest feature F based on the feature F1 and the feature F2;
a scaling module: used for generating a scaled feature F_scale based on the feature F1, the feature F2 and the attention A;
an element summation module: used for performing an element-wise summation between the feature F_scale and the input deepest feature F through the learnable parameter a to obtain the feature F_SSAM;
an output module: used for performing an upsampling operation on the feature F_SSAM output by the attention scale perception module, fusing the sampling result with the multilayer feature maps of different levels to obtain a feature X_m, and outputting the segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolution layer.
According to a third aspect of embodiments of the present invention, there is provided a storage medium storing a computer program which, when executed by a processor, implements each step in the attention scale perception guided lightweight U-net method as described in any one of the above.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
according to the method, an average pooling result and a maximum pooling result of the deepest feature F are obtained through an attention sensing module, and then 1 × 1 convolutional layers shared by weights are applied to the average pooling result and the maximum pooling result respectively through the attention sensing module to obtain a feature F1 and a feature F2; then, based on the features F1 and F2, attention emphasizing corresponding features in the deepest features F is generatedThe force A generates a characteristic Fscale after scaling operation, the attention perception module finally carries out element summation operation between the characteristic Fscale and the input deepest characteristic F through a learnable parameter a, and the characteristic F is output SSAM (ii) a According to the scheme, the attention scale sensing module is used, the scale characteristics of the multi-defect target are learned and judged through the attention mechanism capable of sensing scales, the remote dependency relationship can be effectively captured with low calculation cost, the distinguishing characteristics of the defect target are effectively focused, the interference of a complex background is inhibited, and the problem that the characteristics such as background textures are easily confused with the defects is effectively avoided.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flowchart illustrating an attention scale perception guided lightweight U-net method according to an exemplary embodiment;
FIG. 2 is a schematic diagram of an overall flow process shown in accordance with another exemplary embodiment;
FIG. 3 is a schematic diagram of a scaling process shown in accordance with another exemplary embodiment;
FIG. 4 is a system diagram illustrating an attention scale perception guided lightweight U-net device according to another exemplary embodiment;
In the drawings: 1 - feature map acquisition module; 2 - input module; 3 - convolution application module; 4 - attention map generation module; 5 - scaling module; 6 - element summation module; 7 - output module.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Example one
Fig. 1 is a flowchart illustrating an attention scale perception guided lightweight U-net method according to an exemplary embodiment; as shown in fig. 1, the method includes:
s1, inputting an image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, and selecting a deepest feature F;
s2, inputting the deepest layer characteristics F into an attention scale sensing module for obtaining an average pooling result and a maximum pooling result of the deepest layer characteristics F;
s3, the attention sensing module respectively applies the 1 × 1 convolutional layers shared by the weights to the average pooling result and the maximum pooling result to obtain a feature F1 and a feature F2;
s4, generating attention A emphasizing corresponding features in the deepest features F based on the features F1 and the features F2;
s5, generating a scaled feature Fscale based on the feature F1, the feature F2 and the attention A;
s6, element summation operation is carried out between the characteristic Fscale and the input deepest characteristic F through the learnable parameter a to obtain the characteristic F SSAM
S7, outputting the characteristics F to the attention scale sensing module SSAM Executing up-sampling operation, and fusing the sampling result with the multilayer feature graphs of different levels to obtain a feature X m Let the feature X m Outputting a segmentation result of a defect area of the image to be segmented after one layer of convolution;
it can be understood that the scheme provides a lightweight U-net based on attention scale perception guidance of an encoder-decoder structure, and a lightweight ResNet18 is adopted as an encoder to generate feature maps of different levels from different stages of convolution; the model structure is shown in fig. 2, wherein the bottommost feature map is processed by the attention scale perception module and then input into the decoder;
image to be segmented
Figure 304549DEST_PATH_IMAGE001
Inputting the characteristic graphs into a segmentation network, obtaining multilayer characteristic graphs with different levels after multilayer convolution and pooling operations, wherein the deepest characteristic graph is F, obtaining an average pooling result and a maximum pooling result of the deepest characteristic F through an attention sensing module, and respectively applying 1 multiplied by 1 convolution layers shared by weight to the average pooling result and the maximum pooling result through the attention sensing module to obtain a characteristic F1 and a characteristic F2; generating attention A emphasizing corresponding features in the deepest features F based on the features F1 and the features F2, generating features Fscale after scaling operation, and finally performing element summation operation between the features Fscale and the input deepest features F through learnable parameters a by an attention sensing module to output the features F SSAM (ii) a Features F of decoder output to attention sensing module SSAM Carrying out up-sampling operation, and fusing the sampling result with the multilayer feature maps with different levels to obtain a feature X m The feature X m Outputting a segmentation result of a defect area of the image to be segmented after one layer of convolution; according to the scheme, the attention scale sensing module is used, the scale characteristics of the multi-defect target are learned and judged through the attention mechanism capable of sensing scales, the remote dependency relationship can be effectively captured with low calculation cost, the distinguishing characteristics of the defect target are effectively focused, the interference of a complex background is inhibited, and the problem that the characteristics such as background textures are easily confused with the defects is effectively avoided.
Preferably,
inputting the deepest layer features F into the attention scale perception module, wherein the step of obtaining an average pooling result and a maximum pooling result of the deepest layer features F comprises the following steps:
inputting the deepest layer features F into an attention scale perception module;
the attention scale perception module carries out aggregation processing on the deepest layer characteristics F through parallel maximum pooling operation and average pooling operation to obtain average pooling results and maximum pooling results of the deepest layer characteristics F;
it will be appreciated that, as attachedAs shown in FIG. 3, the feature F is processed by the attention scale perception module, which first uses parallel max-pooling or average-pooling operations, which will be
Figure 860688DEST_PATH_IMAGE002
Is polymerized to
Figure 198129DEST_PATH_IMAGE003
Figure 654649DEST_PATH_IMAGE004
In this way, highly relevant context information is extracted from each line of the feature F, operation
Figure 52132DEST_PATH_IMAGE005
And
Figure 92638DEST_PATH_IMAGE006
respectively shown in formula 1 and formula 2:
Figure 233770DEST_PATH_IMAGE007
(1)
Figure 544796DEST_PATH_IMAGE008
(2)
in the formula (I), the compound is shown in the specification,
Figure 378760DEST_PATH_IMAGE009
represents a maximum pooling or average pooling operation;
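As a minimal illustration of equations (1) and (2), the row-wise aggregation can be sketched in PyTorch with adaptive pooling; the tensor layout (batch, C, H, W), the pooling axis, and all sizes below are assumptions inferred from the C×H×1 shape described above, not values taken from the patent:

```python
import torch
import torch.nn.functional as F

# Dummy deepest feature F with layout (batch, C, H, W).
feat = torch.randn(1, 512, 16, 16)

# Equations (1) and (2): parallel pooling along the width dimension,
# aggregating (C, H, W) into (C, H, 1) so each row keeps its own context.
f_max = F.adaptive_max_pool2d(feat, (feat.shape[2], 1))
f_avg = F.adaptive_avg_pool2d(feat, (feat.shape[2], 1))

print(f_max.shape, f_avg.shape)  # torch.Size([1, 512, 16, 1]) for both
```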
preferably, the first and second electrodes are formed of a metal,
the outputting of the segmentation result of the defective region of the image to be segmented includes:
feature F output by attention scale perception module SSAM Performing an upsampling operation;
carrying out U-net type channel dimension splicing fusion on the sampling result and the upper layer feature diagram of the deepest feature F, and carrying out convolution operation and activation operation on the fusion result to obtain a feature X1;
then performing an upsampling operation on the feature X1Repeating the steps until the U-net type channel dimensionality splicing fusion is carried out on the fusion result and the shallowest layer feature graph, and carrying out convolution operation and activation operation on the fusion result to obtain the feature X m
Will be characterized by X m Outputting a segmentation result of a defect area of the image to be segmented after one layer of convolution;
it will be appreciated that the feature F output by the attention scale sensing module is shown in FIG. 2 SSAM Performing up-sampling operation, performing U-net type channel dimension splicing and fusion with M1, performing convolution and activation on a fusion result to obtain a characteristic X1, performing up-sampling on X1, performing fusion with the characteristic M2, performing convolution and activation on the fusion result to obtain a characteristic M2, and sequentially obtaining X3 and X4 in the same way, and finally performing one-layer convolution on X4 to output a segmentation result of a defect region.
Preferably,
the generating of the attention A emphasizing the corresponding feature in the deepest feature F based on the feature F1 and the feature F2 comprises:
generating attention A emphasizing corresponding features in the deepest features F through a softmax function on the basis of the features F1 and the features F2;
it can be appreciated that the attention metric perception module applies the 1 × 1 convolutional layer for weight sharing to the 1 × 1 convolutional layer
Figure 643913DEST_PATH_IMAGE006
And
Figure 588735DEST_PATH_IMAGE005
in the above (equation 3 and equation 4), the features F1 and F2 are obtained, the height context information of each feature in F is transferred, and an attention map (equation 5) is generated by using a softmax function, so as to emphasize the importance of the corresponding feature in F, the attention map can dynamically select a proper scale feature, and the features of different scales are fused through self-learning:
Figure 754268DEST_PATH_IMAGE010
(3)
Figure 759133DEST_PATH_IMAGE011
(4)
Figure 197068DEST_PATH_IMAGE012
(5)
in the formula (I), the compound is shown in the specification,
Figure 929270DEST_PATH_IMAGE013
representing the convolution operation of weight sharing.
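Equations (3)-(5) can be sketched as follows; applying one nn.Conv2d instance to both pooled maps realizes the weight sharing, while taking the softmax over the height axis is an assumption about where the normalization runs:

```python
import torch
import torch.nn as nn

c = 512
shared = nn.Conv2d(c, c, kernel_size=1)  # weight-shared 1x1 convolution f

# Pooled maps from equations (1) and (2), shape (batch, C, H, 1); dummy data.
f_avg = torch.randn(1, c, 16, 1)
f_max = torch.randn(1, c, 16, 1)

f1 = shared(f_avg)                    # equation (3)
f2 = shared(f_max)                    # equation (4)
attn = torch.softmax(f1 + f2, dim=2)  # equation (5), normalized over the height axis

print(attn.shape)  # torch.Size([1, 512, 16, 1])
```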
Preferably,
the generating a scaled feature F_scale comprises:
performing element-wise multiplication operations between the attention A and the feature F1 and the feature F2 respectively to generate the scaled feature F_scale;
it will be appreciated that attention is drawn to A and F, respectively 1 、F 2 Performs element multiplication operation to generate scaled feature Fscale (equation 6), and finally, utilizes learnable parameters
Figure 464156DEST_PATH_IMAGE014
To F is aligned with scale And performing element summation operation between the sum and the input characteristic F to obtain the final output characteristic F SSAM (formula 7):
Figure 390655DEST_PATH_IMAGE015
(6)
Figure 378202DEST_PATH_IMAGE016
(7)
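Putting equations (1)-(7) together, the module can be sketched as a single PyTorch layer. This is a minimal sketch of one reading of the text rather than the patent's reference implementation: the pooling axis, the softmax axis, broadcasting the C×H×1 result back over the width in equation (7), and all names are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareAttention(nn.Module):
    """Attention scale perception module sketched from equations (1)-(7)."""
    def __init__(self, channels: int):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, kernel_size=1)  # weight-shared 1x1 conv f
        self.a = nn.Parameter(torch.zeros(1))                       # learnable parameter a

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        h = feat.shape[2]
        f_max = F.adaptive_max_pool2d(feat, (h, 1))  # equation (1)
        f_avg = F.adaptive_avg_pool2d(feat, (h, 1))  # equation (2)
        f1 = self.shared(f_avg)                      # equation (3)
        f2 = self.shared(f_max)                      # equation (4)
        attn = torch.softmax(f1 + f2, dim=2)         # equation (5)
        f_scale = attn * f1 + attn * f2              # equation (6)
        return self.a * f_scale + feat               # equation (7), broadcast over width

# Shape check on a dummy deepest feature F.
out = ScaleAwareAttention(512)(torch.randn(1, 512, 16, 16))
print(out.shape)  # torch.Size([1, 512, 16, 16])
```

Initializing a to zero would make the module start as an identity mapping and learn how much scaled attention to inject, a common choice for residual attention branches.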
preferably, the first and second electrodes are formed of a metal,
inputting the image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, wherein the step of selecting the deepest feature F comprises the following steps:
inputting the image to be segmented into the segmentation network, and obtaining a feature M_N after convolution and pooling operations, the feature M_N being the shallowest feature map;
performing convolution and pooling operations on the feature M_N to obtain a feature M_{N-1};
repeating the above step, and obtaining the deepest feature F after a preset number of convolution and pooling operations;
it will be appreciated that the images are shown in FIG. 2
Figure 917025DEST_PATH_IMAGE001
Inputting a segmentation network, and obtaining a plurality of characteristics M4, M3, M2, M1 and F of different levels after multilayer convolution and pooling, wherein F is the deepest layer, and M4 is the shallowest layer.
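A minimal sketch of the encoder side, assuming the stages of a torchvision ResNet18 (version 0.13 or later for the weights argument) supply M4 through M1 and the deepest feature F; the exact stage-to-feature mapping is an assumption, since the patent only names the levels:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNet18Encoder(nn.Module):
    """Collects feature maps from successive ResNet18 stages."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu)
        self.pool = net.maxpool
        self.stages = nn.ModuleList([net.layer1, net.layer2, net.layer3, net.layer4])

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        feats = [x]              # M4: shallowest feature map
        x = self.pool(x)
        for stage in self.stages:
            x = stage(x)
            feats.append(x)      # M3, M2, M1, then the deepest feature F
        return feats

feats = ResNet18Encoder()(torch.randn(1, 3, 256, 256))
print([tuple(f.shape[1:]) for f in feats])
# [(64, 128, 128), (64, 64, 64), (128, 32, 32), (256, 16, 16), (512, 8, 8)]
```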
Example two
This embodiment also discloses a system schematic diagram of an attention scale perception guided lightweight U-net apparatus, as shown in fig. 4, including:
the characteristic diagram acquisition module 1: the method comprises the steps that images to be segmented are input into a segmentation network, after multilayer convolution and pooling operation, multilayer feature maps with different levels are obtained, and the deepest feature F is selected;
an input module 2: the system comprises an attention scale sensing module, a clustering module and a storage module, wherein the attention scale sensing module is used for inputting the deepest layer characteristics F into the attention scale sensing module and obtaining an average pooling result and a maximum pooling result of the deepest layer characteristics F;
convolution application module 3: the 1 × 1 convolutional layer is used for respectively applying the weight sharing values to the average pooling result and the maximum pooling result through the attention sensing module to obtain a characteristic F1 and a characteristic F2;
the attention map generation module 4: an attention A for emphasizing a corresponding feature in the deepest feature F is generated based on the feature F1 and the feature F2;
the scaling module 5: for generating a scaled feature Fscale based on feature F1, feature F2 and attention a;
element summation module 6: the method is used for carrying out element summation operation on the feature Fscale and the input deepest feature F through the learnable parameter a to obtain the feature F SSAM
An output module 7: feature F for sensing module output for attention scale SSAM Perform an upsampling operation and willThe sampling result is fused with the characteristic graphs of multiple layers at different levels to obtain a characteristic X m The feature X m Outputting a segmentation result of a defect area of the image to be segmented after one layer of convolution;
it can be understood that, the image to be segmented is input into the segmentation network through the feature map acquisition module 1, after multilayer convolution and pooling operation, multilayer feature maps with different levels are obtained, the deepest feature F is selected, and the deepest feature F is input into the attention scale sensing module through the input module 2 for acquiring an average pooling result and a maximum pooling result of the deepest feature F; the convolution application module 3 respectively applies the 1 × 1 convolution layers shared by the weights to the average pooling result and the maximum pooling result through the attention sensing module to obtain a feature F1 and a feature F2; attention is sought to generate module 4: generating attention A for emphasizing corresponding features in the deepest features F based on the features F1 and the features F2; the attention map generation module 4 generates attention a emphasizing a corresponding feature in the deepest feature F based on the feature F1 and the feature F2; the scaling module 5 generates a scaled feature Fscale based on the feature F1, the feature F2 and the attention a; the element summation module 6 carries out element summation operation between the feature Fscale and the input deepest feature F through the learnable parameter a to obtain the feature F SSAM (ii) a The output module 7 is used for sensing the characteristics F output by the module on the attention scale SSAM Executing up-sampling operation, and fusing the sampling result with the multilayer feature graphs of different levels to obtain a feature X m The feature X m Outputting a segmentation result of a defect area of the image to be segmented after one layer of convolution; according to the scheme, the attention scale sensing module is used, the scale characteristics of the multi-defect target are learned and judged through the attention mechanism capable of sensing scales, the remote dependency relationship can be effectively captured with low calculation cost, the distinguishing characteristics of the defect target are effectively focused, the interference of a complex background is inhibited, and the problem that the characteristics such as background textures are easily confused with the defects is effectively avoided.
Example three
This embodiment also discloses a storage medium storing a computer program which, when executed by a processor, implements the steps of the attention scale perception guided lightweight U-net method according to any one of the above;
it will be appreciated that the storage medium referred to above may be a read-only memory, a magnetic or optical disk, or the like.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar contents in other embodiments may be referred to for the contents which are not described in detail in some embodiments.
It should be noted that, in the description of the present invention, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. An attention scale perception guided lightweight U-net method, the method comprising:
inputting an image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, and selecting a deepest feature F;
inputting the deepest layer features F into an attention scale perception module for obtaining an average pooling result and a maximum pooling result of the deepest layer features F;
the attention scale perception module respectively applies weight-shared 1×1 convolutional layers to the average pooling result and the maximum pooling result to obtain a feature F1 and a feature F2;
generating attention A emphasizing corresponding features in the deepest features F based on the features F1 and the features F2;
generating a scaled feature F_scale based on the feature F1, the feature F2 and the attention A;
performing an element-wise summation between the feature F_scale and the input deepest feature F through a learnable parameter a to obtain a feature F_SSAM;
performing an upsampling operation on the feature F_SSAM output by the attention scale perception module, fusing the sampling result with the multilayer feature maps of different levels to obtain a feature X_m, and outputting the segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolution layer.
2. The method of claim 1,
the inputting the deepest features F into the attention scale perception module for obtaining the average pooling result and the maximum pooling result of the deepest features F comprises:
inputting the deepest layer characteristics F into an attention scale perception module;
and the attention scale perception module carries out aggregation processing on the deepest layer characteristic F through parallel maximum pooling operation and average pooling operation respectively to obtain an average pooling result and a maximum pooling result of the deepest layer characteristic F.
3. The method of claim 2,
the outputting of the segmentation result of the defective region of the image to be segmented includes:
feature F output by attention scale perception module SSAM Performing an upsampling operation;
carrying out U-net type channel dimension splicing fusion on the sampling result and a feature map on the upper layer of the deepest feature F, and carrying out convolution operation and activation operation on the fusion result to obtain a feature X1;
then, the characteristic X1 is subjected to up-sampling operation, the steps are repeated until U-net type channel dimension splicing fusion is carried out on the characteristic X1 and the shallowest layer characteristic diagram, and convolution operation and activation operation are carried out on the fusion result to obtain the characteristic X m
Will be characterized by X m And outputting the segmentation result of the defect area of the image to be segmented after one layer of convolution.
4. The method of claim 3,
the generating scaled features Fscale includes:
an element multiplication operation is performed between the attention a, the feature F1 and the feature F2, respectively, for generating a scaled feature Fscale.
5. The method of claim 4,
the generating of the attention A emphasizing the corresponding feature in the deepest feature F based on the feature F1 and the feature F2 comprises:
based on the feature F1 and the feature F2, the attention A emphasizing the corresponding features in the deepest feature F is generated through a softmax function.
6. The method of claim 5,
inputting the image to be segmented into a segmentation network, obtaining a plurality of layers of feature maps with different levels after multilayer convolution and pooling operation, wherein the step of selecting the deepest feature F comprises the following steps:
inputting the image to be segmented into the segmentation network, and obtaining a feature M_N after convolution and pooling operations, the feature M_N being the shallowest feature map;
performing convolution and pooling operations on the feature M_N to obtain a feature M_{N-1};
repeating the above step, and obtaining the deepest feature F after a preset number of convolution and pooling operations.
7. An attention scale perception guided lightweight U-net apparatus, the apparatus comprising:
a feature map acquisition module: used for inputting the image to be segmented into the segmentation network, obtaining multilayer feature maps of different levels after multilayer convolution and pooling operations, and selecting the deepest feature F;
an input module: used for inputting the deepest feature F into the attention scale perception module to obtain the average pooling result and the maximum pooling result of the deepest feature F;
a convolution application module: used for respectively applying weight-shared 1×1 convolutional layers to the average pooling result and the maximum pooling result through the attention scale perception module to obtain a feature F1 and a feature F2;
an attention map generation module: used for generating the attention A emphasizing the corresponding features in the deepest feature F based on the feature F1 and the feature F2;
a scaling module: used for generating a scaled feature F_scale based on the feature F1, the feature F2 and the attention A;
an element summation module: used for performing an element-wise summation between the feature F_scale and the input deepest feature F through the learnable parameter a to obtain the feature F_SSAM;
an output module: used for performing an upsampling operation on the feature F_SSAM output by the attention scale perception module, fusing the sampling result with the multilayer feature maps of different levels to obtain a feature X_m, and outputting the segmentation result of the defect region of the image to be segmented after the feature X_m passes through one convolution layer.
8. A storage medium storing a computer program which, when executed by a processor, performs the steps of the attention scale perception guided lightweight U-net method according to any one of claims 1 to 6.
CN202211394805.XA 2022-11-09 2022-11-09 Attention scale perception guided lightweight U-net method, device and storage medium Active CN115578565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211394805.XA CN115578565B (en) 2022-11-09 2022-11-09 Attention scale perception guided lightweight U-net method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211394805.XA CN115578565B (en) 2022-11-09 2022-11-09 Attention scale perception guided lightweight U-net method, device and storage medium

Publications (2)

Publication Number Publication Date
CN115578565A true CN115578565A (en) 2023-01-06
CN115578565B CN115578565B (en) 2023-04-14

Family

ID=84589244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211394805.XA Active CN115578565B (en) 2022-11-09 2022-11-09 Attention scale perception guided lightweight U-net method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115578565B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111681273A (en) * 2020-06-10 2020-09-18 创新奇智(青岛)科技有限公司 Image segmentation method and device, electronic equipment and readable storage medium
CN112465790A (en) * 2020-12-03 2021-03-09 天津大学 Surface defect detection method based on multi-scale convolution and trilinear global attention
CN113362320A (en) * 2021-07-07 2021-09-07 北京工业大学 Wafer surface defect mode detection method based on deep attention network
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
US20220335600A1 (en) * 2021-04-14 2022-10-20 Ping An Technology (Shenzhen) Co., Ltd. Method, device, and storage medium for lesion segmentation and recist diameter prediction via click-driven attention and dual-path connection

Also Published As

Publication number Publication date
CN115578565B (en) 2023-04-14

Similar Documents

Publication Publication Date Title
CN113822885B (en) Workpiece defect detection method and device integrating multi-attention machine system
CN110738697A (en) Monocular depth estimation method based on deep learning
CN110837811B (en) Method, device and equipment for generating semantic segmentation network structure and storage medium
CN111738110A (en) Remote sensing image vehicle target detection method based on multi-scale attention mechanism
CN113139543B (en) Training method of target object detection model, target object detection method and equipment
CN109389667B (en) High-efficiency global illumination drawing method based on deep learning
Yin et al. Local binary pattern metric-based multi-focus image fusion
KR102128789B1 (en) Method and apparatus for providing efficient dilated convolution technique for deep convolutional neural network
CN115272437A (en) Image depth estimation method and device based on global and local features
CN113158970B (en) Action identification method and system based on fast and slow dual-flow graph convolutional neural network
CN114708189A (en) Deep learning-based multi-energy X-ray image fusion method and device
CN116740162B (en) Stereo matching method based on multi-scale cost volume and computer storage medium
CN114266711A (en) Generating type image restoration method based on attention cross-layer transfer mechanism
CN115578565B (en) Attention scale perception guided lightweight U-net method, device and storage medium
Wang et al. Multi-scale dense and attention mechanism for image semantic segmentation based on improved DeepLabv3+
CN113313162A (en) Method and system for detecting multi-scale feature fusion target
CN116468702A (en) Chloasma assessment method, device, electronic equipment and computer readable storage medium
CN116434303A (en) Facial expression capturing method, device and medium based on multi-scale feature fusion
CN112927250B (en) Edge detection system and method based on multi-granularity attention hierarchical network
CN113034432B (en) Product defect detection method, system, device and storage medium
CN115713624A (en) Self-adaptive fusion semantic segmentation method for enhancing multi-scale features of remote sensing image
CN115345917A (en) Multi-stage dense reconstruction method and device for low video memory occupation
CN113807354B (en) Image semantic segmentation method, device, equipment and storage medium
CN113012132B (en) Image similarity determination method and device, computing equipment and storage medium
CN113642452A (en) Human body image quality evaluation method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant