WO2023102724A1 - Image Processing Method and System - Google Patents

Image Processing Method and System

Info

Publication number
WO2023102724A1
WO2023102724A1 · PCT/CN2021/136054 · CN2021136054W
Authority
WO
WIPO (PCT)
Prior art keywords
image
sequence
fused
fusion
input
Prior art date
Application number
PCT/CN2021/136054
Other languages
English (en)
French (fr)
Inventor
王智玉
黄强威
黄伯雄
Original Assignee
Contemporary Amperex Technology Co., Ltd. (宁德时代新能源科技股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Contemporary Amperex Technology Co., Ltd. (宁德时代新能源科技股份有限公司)
Priority to CN202180078455.3A priority Critical patent/CN116848547A/zh
Priority to EP21960096.2A priority patent/EP4220543A4/en
Priority to PCT/CN2021/136054 priority patent/WO2023102724A1/zh
Priority to US18/140,642 priority patent/US11948287B2/en
Publication of WO2023102724A1 publication Critical patent/WO2023102724A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/803Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7784Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10141Special mode during image acquisition
    • G06T2207/10148Varying focus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Definitions

  • This application relates to computer technology, especially image processing technology.
  • Image processing by computer is widely used in various fields. Image processing can be used to improve the visual quality of images, extract features of specific objects in images, store and transmit images, and fuse image sequences. When shooting a target object, it is often necessary to shoot a series of images with different focal points to capture the target object. In such cases, it is desirable to fuse the captured image sequence for subsequent image processing.
  • The present application provides an image processing method and system capable of producing a fused image in which every pixel of a target object is in focus.
  • In a first aspect, the present application provides an image processing method, including: acquiring an input image sequence containing a target object; and performing multi-resolution fusion on the input image sequence to generate a single fused image, wherein each pixel of the fused image comes from the corresponding position of one input image in the sequence, and each pixel of the fused image that belongs to the target object comes from the corresponding position of the input image in which that part of the target object is in focus.
  • In the technical solution of the embodiments of the present application, the index of the input image in which each pixel of the target object is in focus is learned, and the sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, so that image sequences with different focus areas in the same scene are merged into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • In some embodiments, capturing the input image sequence further comprises: setting the step size of the camera used to capture the input image sequence based on the number of frames in the input image sequence and the size of the target object in the input image sequence. Setting the camera step size based on the size of the target object and the number of frames of the input image sequence ensures that the captured input image sequence covers all focus regions of the target object, thereby ensuring that every pixel of the target object in the fused image comes from an in-focus part.
  • In some embodiments, the input image sequence contains indices, and performing multi-resolution fusion on the input image sequence to generate a fused image further comprises: performing feature extraction on the input image sequence; performing multi-resolution fusion on the extracted features to obtain fused multi-resolution features; generating a prediction mask map based on the fused multi-resolution features, wherein each pixel of the prediction mask map indicates the index of an input image, and the index indicates the input image from which the corresponding pixel of the fused image is taken; and generating the fused image based on the prediction mask map and the input image sequence.
  • By using semantic segmentation to locate and fuse the sharpest parts of the target object across the multi-frame image sequence, the internal convolutions of the deep-learning semantic segmentation network can learn the relative position of each sharp pixel (i.e., the index of the input image sequence). The sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, merging image sequences with different focus areas in the same scene into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • In some embodiments, the method further comprises: applying a 2D fusion algorithm to the input image sequence to generate an initial fused image; and receiving ground-truth annotations for the initial fused image to generate an annotated mask map, wherein the annotated mask map indicates whether one or more pixels of the target object in the initial fused image are in focus.
  • The parts of the target object that remain blurred (out of focus) in the initial fused image are masked by semi-automatic annotation and excluded from the training sample set, yielding a real training data set that contains ground-truth labels only for in-focus pixels. This quickly produces a large amount of task-related training data and allows the semantic segmentation model to be trained on real, valid production-line data.
  • For different production lines, the solution of this application only needs to collect a small amount of real, valid data for fine-tuning; it can then be replicated in batches and extended to those production lines, covering actual requirements and truly bringing the technique into the practical application of each production line.
  • In some embodiments, the method further includes: calculating a loss rate between the prediction mask map and the annotated mask map; and feeding the calculated loss rate back to the multi-resolution fusion algorithm used to perform the multi-resolution fusion.
  • The loss rate between the prediction mask map and/or fused image output by the multi-resolution fusion algorithm and the ground-truth-annotated mask map reflects the similarity between that output and the original input images.
  • In some embodiments, the method further comprises: updating the multi-resolution fusion algorithm used to perform the multi-resolution fusion based on the loss rate, the annotated mask map, or a combination of both.
  • The calculated loss rate is fed back to the multi-resolution fusion algorithm together with the ground-truth-annotated mask map, and supervised training is performed on the algorithm's output. While achieving regression of the training fit, continuous training and learning improve the accuracy of the multi-resolution fusion algorithm in generating fully sharp fused images of the target object.
  • In a second aspect, the present application provides an image processing system, including: an acquisition module configured to acquire an input image sequence containing a target object; and a fusion module configured to perform multi-resolution fusion on the input image sequence to generate a single fused image, wherein each pixel of the fused image comes from the corresponding position of one input image in the sequence, and each pixel of the fused image that belongs to the target object comes from the corresponding position of the input image in which that part of the target object is in focus.
  • In the technical solution of the embodiments of the present application, the index of the input image in which each pixel of the target object is in focus is learned, and the sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, so that image sequences with different focus areas in the same scene are merged into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • In some embodiments, the acquisition module is further configured to set the step size of the camera used to acquire the input image sequence based on the number of frames in the input image sequence and the size of the target object in the input image sequence. Setting the camera step size based on the size of the target object and the number of frames of the input image sequence ensures that the captured input image sequence covers all focus regions of the target object, thereby ensuring that every pixel of the target object in the fused image comes from an in-focus part.
  • In some embodiments, the input image sequence contains indices, and the fusion module further comprises: an encoder configured to perform feature extraction on the input image sequence and perform multi-resolution fusion on the extracted features to obtain fused multi-resolution features; and a decoder configured to generate a prediction mask map based on the fused multi-resolution features, wherein each pixel of the prediction mask map indicates the index of an input image, and the index indicates the input image from which the corresponding pixel of the fused image is taken.
  • The multi-resolution fusion method of this application starts from the structure of a semantic segmentation neural network and proposes an end-to-end, adaptive, deep-learning-based multi-focus fusion scheme. The image sequence is passed through the model encoder to extract deep features, the features are fused, and the decoder generates the fused image, so that the deep-learning semantic segmentation network learns, through its internal convolutions, the relative position of each sharp pixel (i.e., the index of the input image sequence). The deep model reduces the dependence of traditional algorithms on thresholds and improves fusion robustness.
  • In some embodiments, the fusion module is further configured to generate the fused image based on the prediction mask map and the input image sequence.
  • By using semantic segmentation to locate and fuse the sharpest parts of the target object across the multi-frame image sequence, the internal convolutions of the deep-learning semantic segmentation network can learn the relative position of each sharp pixel (i.e., the index of the input image sequence). The sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, merging image sequences with different focus areas in the same scene into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • In some embodiments, the system further comprises: an initial fusion module configured to apply a 2D fusion algorithm to the input image sequence to generate an initial fused image; and an annotation receiving module configured to receive ground-truth annotations for the initial fused image to generate an annotated mask map, wherein the annotated mask map indicates whether one or more pixels of the target object in the initial fused image are in focus.
  • The parts of the target object that remain blurred (out of focus) in the initial fused image are masked by semi-automatic annotation and excluded from the training sample set, yielding a real training data set that contains ground-truth labels only for in-focus pixels. This quickly produces a large amount of task-related training data and allows the semantic segmentation model to be trained on real, valid production-line data.
  • For different production lines, the solution of this application only needs to collect a small amount of real, valid data for fine-tuning; it can then be replicated in batches and extended to those production lines, covering actual requirements and truly bringing the technique into the practical application of each production line.
  • In some embodiments, the system further includes a loss rate module configured to: calculate a loss rate between the prediction mask map and the annotated mask map; and feed the calculated loss rate back to the fusion module.
  • The loss rate between the prediction mask map and/or fused image output by the multi-resolution fusion algorithm and the ground-truth-annotated mask map reflects the similarity between that output and the original input images.
  • In some embodiments, the fusion module is further configured to update itself based on the loss rate, the annotated mask map, or a combination of both.
  • The calculated loss rate is fed back to the multi-resolution fusion algorithm together with the ground-truth-annotated mask map, and supervised training is performed on the algorithm's output. While achieving regression of the training fit, continuous training and learning improve the accuracy of the multi-resolution fusion algorithm in generating fully sharp fused images of the target object.
  • In a third aspect, the present application provides an image processing system, including: a memory storing computer-executable instructions; and a processor coupled to the memory, wherein the computer-executable instructions, when executed by the processor, cause the system to: acquire an input image sequence containing a target object; and perform multi-resolution fusion on the input image sequence to generate a single fused image, wherein each pixel of the fused image comes from the corresponding position of one input image in the sequence, and each pixel of the fused image that belongs to the target object comes from the corresponding position of the input image in which that part of the target object is in focus.
  • In the technical solution of the embodiments of the present application, the index of the input image in which each pixel of the target object is in focus is learned, and the sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, so that image sequences with different focus areas in the same scene are merged into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • Fig. 1 is a flowchart of an image processing method according to some embodiments of the present application.
  • FIG. 2 is a functional block diagram of an image processing system according to some embodiments of the present application.
  • Fig. 3 is a structural block diagram of a fusion module according to some embodiments of the present application.
  • FIG. 4 is a schematic diagram of a specific implementation of a fusion module according to some embodiments of the present application.
  • FIG. 5 is a structural block diagram of a computer system suitable for implementing an image processing system according to some embodiments of the present application.
  • Image processing by computer is widely used in various fields. Image processing can be used to improve the visual quality of images, extract features of specific objects in images, store and transmit images, and fuse image sequences. When shooting a target object, it is often necessary to shoot a series of images with different focal points to capture the target object. In such cases, it is desirable to fuse the captured image sequence for subsequent image processing.
  • Some existing fusion methods use, for example, a deconvolution network with low-pass and high-pass filters to extract the low-frequency and high-frequency information of the source images. Because such methods fail to make full use of the information in the intermediate network layers, the fused image obtained by convolving and summing the inferred fusion feature maps often loses some of the original, differently focused sharp information in the source image sequence.
  • Other fusion approaches generate training data by applying Gaussian blur to different regions of annotated images. Since such training data does not come from a real production line, it is difficult for it to simulate and cover actual requirements, and its practicality is poor.
  • In view of the above problems, the present application provides an image processing technique capable of producing a fused image in which every pixel of a target object is in focus.
  • The image processing method of the present application includes: acquiring an input image sequence containing a target object; and performing multi-resolution fusion on the input image sequence to generate a single fused image, wherein each pixel of the fused image comes from the corresponding position of one input image in the sequence, and each pixel of the fused image that belongs to the target object comes from the corresponding position of the input image in which that part of the target object is in focus.
  • The scheme of this application starts from the structure of a semantic segmentation neural network and proposes an end-to-end, adaptive, deep-learning-based multi-focus fusion scheme. The image sequence is passed through the model encoder to extract deep features, the features are fused, and the decoder generates the fused image, so that the deep-learning semantic segmentation network learns, through its internal convolutions, the relative position of each sharp pixel (i.e., the index of the input image sequence). This realizes a pixel-accurate, fully sharp fused image that preserves the detail of the target object, effectively improves the information utilization of the image, and, through the deep model, reduces the dependence of traditional algorithms on thresholds and improves fusion robustness.
  • The technical solutions of the embodiments of the present application are applicable to situations in which an input image sequence is fused and every pixel of the target object in the fused image is required to be sharp, including, but not limited to, imaging of parts such as tabs in lithium batteries.
  • In some embodiments, the method includes: at step 105, acquiring an input image sequence containing a target object; and at step 110, performing multi-resolution fusion on the input image sequence to generate a single fused image, wherein each pixel of the fused image comes from the corresponding position of one input image in the sequence, and each pixel of the fused image that belongs to the target object comes from the corresponding position of the input image in which that part of the target object is in focus.
  • In some embodiments, the input image sequence may include a series of images captured in the same scene while focusing on different parts of the target object, such as a sequence of images focusing on different pole pieces of a tab in the same scene, in which each image has a corresponding index, e.g., image 1, image 2, ..., image k.
  • In some embodiments, performing multi-resolution fusion on the input image sequence to generate a fused image may include inputting the input image sequence (e.g., image 1, image 2, ..., image k) into a fusion module that executes a multi-resolution fusion algorithm to generate a single fused image (e.g., image k+1).
  • In some embodiments, the multi-resolution fusion algorithm is an algorithm that can be implemented by a deep-learning semantic segmentation neural network; it learns, for each pixel of the target object in the input image sequence, the index of the image in which that pixel is in focus, extracts the corresponding pixels, and performs multi-resolution, pixel-level fusion, resulting in a fused image in which every pixel of the target object is in focus.
  • For example, if the multi-resolution fusion algorithm learns that the pixel at row i, column j of the input image (a pixel belonging to the target object) is in focus in image 2 of the image sequence 1-k, and that the pixel at row i, column j+1 is in focus in image k, then the image index value for the pixel at row i, column j is 2, the image index value for the pixel at row i, column j+1 is k, and so on. In this way, the set of image sequence indices in which each pixel of the target object is in focus is obtained, and the in-focus pixel value is extracted from the image sequence 1-k for each pixel (i.e., for the pixel at row i, column j the pixel value is taken from image 2, for the pixel at row i, column j+1 the pixel value is taken from image k, and so on).
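  • The following minimal sketch (an illustration only, not the patented implementation; the array names and shapes are assumptions) shows how such a per-pixel index map can be used to assemble the fused image from the stack of input images:

```python
import numpy as np

def fuse_by_index(images: np.ndarray, index_map: np.ndarray) -> np.ndarray:
    """Assemble a fused image from a focus stack.

    images:    array of shape (k, H, W) holding the k input images.
    index_map: array of shape (H, W); index_map[i, j] is the index (0..k-1)
               of the image in which pixel (i, j) is in focus.
    """
    k, H, W = images.shape
    rows, cols = np.indices((H, W))
    # For every pixel, pick the value from the image named by the index map.
    return images[index_map, rows, cols]

# Example: a 3-frame stack of 4x4 images, with every pixel taken from frame 1
stack = np.random.rand(3, 4, 4)
idx = np.ones((4, 4), dtype=int)
fused = fuse_by_index(stack, idx)
```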
  • In the technical solution of the embodiments of the present application, the index of the input image in which each pixel of the target object is in focus is learned, and the sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, so that image sequences with different focus areas in the same scene are merged into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • In some embodiments, step 105 further includes: setting the step size of the camera used to capture the input image sequence based on the number of frames in the input image sequence and the size of the target object in the input image sequence.
  • In some embodiments, the width L of the target object (such as a tab) can first be obtained from measurements (e.g., physical measurements via mechanical equipment), and the camera step size can then be set based on the number of frames of the continuously captured input image sequence so that the captured frames cover the width L.
  • Setting the camera step size based on the size of the target object and the number of frames of the input image sequence ensures that the captured input image sequence covers all focus regions of the target object, thereby ensuring that every pixel of the target object in the fused image comes from an in-focus part.
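  • As a minimal illustration (assuming, since the exact relation is garbled in this text, that the step size is simply the measured width L divided by the number of frames k), the step-size computation might look like:

```python
def camera_step_size(object_width_mm: float, num_frames: int) -> float:
    """Step size for the focus sweep, under the assumption that the object
    width L is divided evenly across the k frames of the sequence (step = L / k)."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return object_width_mm / num_frames

# Example: a 12 mm wide tab captured as a 24-frame sequence -> 0.5 mm per frame
step = camera_step_size(12.0, 24)
```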
  • Reference is now made to FIG. 2 and FIG. 3, which illustrate a fusion module according to some embodiments of the present application.
  • In some embodiments, step 110 further includes: performing feature extraction on the input image sequence; performing multi-resolution fusion on the extracted features to obtain fused multi-resolution features; generating a prediction mask map based on the fused multi-resolution features, wherein each pixel of the prediction mask map indicates the index of an input image, and the index indicates the input image from which the corresponding pixel of the fused image is taken; and generating the fused image based on the prediction mask map and the input image sequence.
  • In some embodiments, performing feature extraction on the input image sequence may include inputting each input image into the encoder of the fusion module that executes the multi-resolution fusion algorithm to obtain the multi-resolution features of that image, as shown in FIG. 2.
  • In some embodiments, the basic structure of the encoder may include a convolution layer, a batch normalization layer, and a non-linear activation (rectified linear unit, ReLU) layer, as shown in FIG. 3.
  • In some embodiments, performing multi-resolution fusion on the extracted features may include inputting the multi-resolution features of each image of the input image sequence 1-k into the encoder module of the fusion module that executes the multi-resolution fusion algorithm, which performs concatenation (concat) fusion on them, as shown in FIG. 2.
  • In some embodiments, generating a prediction mask map based on the fused multi-resolution features may include inputting the fused multi-resolution features into the decoder module of the fusion module that executes the multi-resolution fusion algorithm to output the prediction mask map, where each pixel in the prediction mask map indicates the index of the input image in which that pixel is in focus; for example, each pixel in the prediction mask map takes a value of 0, 1, ..., corresponding to the index of an input image in the sequence.
  • In some embodiments, the basic structure of the decoder may include a convolution layer, a batch normalization layer, a non-linear activation layer, and a bilinear upsampling layer, as shown in FIG. 3.
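  • A minimal PyTorch-style sketch of such an encoder/decoder pair is given below; the layer widths, the number of frames, and the overall network depth are illustrative assumptions rather than values taken from the patent:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Convolution + batch normalization + ReLU, as described for the encoder."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class DecoderBlock(nn.Module):
    """Convolution + batch normalization + ReLU + bilinear upsampling, as described for the decoder."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        return self.block(x)

class FusionNet(nn.Module):
    """Toy fusion network: per-frame encoding, concat fusion, decoding to per-pixel image-index logits."""
    def __init__(self, num_frames: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(EncoderBlock(1, 16), nn.MaxPool2d(2))
        self.decoder = DecoderBlock(16 * num_frames, 16)
        self.head = nn.Conv2d(16, num_frames, kernel_size=1)

    def forward(self, frames):               # frames: (B, k, 1, H, W)
        feats = [self.encoder(frames[:, i]) for i in range(frames.shape[1])]
        fused = torch.cat(feats, dim=1)      # concat fusion of per-frame features
        return self.head(self.decoder(fused))  # (B, k, H, W) index logits

# Usage: mask = FusionNet(4)(torch.rand(2, 4, 1, 64, 64)).argmax(dim=1)  # prediction mask map
```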
  • In some embodiments, generating the fused image based on the prediction mask map and the input image sequence may include obtaining, from the image sequence indices in which each pixel of the target object is in focus and the input image sequence 1-k, a fused image in which every pixel of the target object is in focus; for example, the pixel at row i, column j of the fused image takes the value of the corresponding pixel from image 2, the pixel at row i, column j+1 takes the value of the corresponding pixel from image k, and so on.
  • By using semantic segmentation to locate and fuse the sharpest parts of the target object across the multi-frame image sequence, the internal convolutions of the deep-learning semantic segmentation network can learn the relative position of each sharp pixel (i.e., the index of the input image sequence). The sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, merging image sequences with different focus areas in the same scene into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • In some embodiments, the method further includes: applying a 2D fusion algorithm to the input image sequence to generate an initial fused image; and receiving ground-truth annotations for the initial fused image to generate an annotated mask map, wherein the annotated mask map indicates whether one or more pixels of the target object in the initial fused image are in focus.
  • In some embodiments, applying a 2D fusion algorithm to the input image sequence to generate an initial fused image may include using a prior-art image fusion algorithm to obtain an initial fused image (e.g., image k+1′) of the input image sequence (e.g., image 1, image 2, ..., image k).
  • In some embodiments, receiving ground-truth (GT) annotations for the initial fused image to generate an annotated mask map may include receiving ground-truth annotations for the initial fused image (e.g., image k+1′) to generate an annotated mask map, wherein the annotated mask map indicates whether each pixel of the target object in the initial fused image is in focus.
  • In some embodiments, the value of each pixel of the annotated mask map may be 0 or 1, where 1 indicates that the pixel is in focus and 0 indicates that it is not.
  • In some embodiments, the annotated mask map causes the one or more out-of-focus pixels of the target object to be masked out of the data samples, so that the data fed into the multi-resolution fusion algorithm used to perform the multi-resolution fusion contains ground-truth labels only for in-focus pixels, as in the sketch below.
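  • A minimal sketch of this idea, assuming (for illustration) a mean-squared-error criterion restricted to the annotated in-focus pixels:

```python
import torch

def masked_mse(pred_mask: torch.Tensor,
               gt_mask: torch.Tensor,
               focus_mask: torch.Tensor) -> torch.Tensor:
    """MSE computed only over pixels whose ground-truth annotation marks them as in focus.

    pred_mask:  predicted mask map (float), shape (B, H, W)
    gt_mask:    ground-truth annotated mask map (float), shape (B, H, W)
    focus_mask: 1.0 where the annotation marks the pixel as in focus, 0.0 otherwise
    """
    sq_err = (pred_mask - gt_mask) ** 2
    # Out-of-focus pixels contribute nothing to the training signal,
    # mirroring the "focused-pixels-only" ground truth described above.
    return (sq_err * focus_mask).sum() / focus_mask.sum().clamp(min=1.0)
```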
  • The parts of the target object that remain blurred (out of focus) in the initial fused image are masked by semi-automatic annotation and excluded from the training sample set, yielding a real training data set that contains ground-truth labels only for in-focus pixels. This quickly produces a large amount of task-related training data and allows the semantic segmentation model to be trained on real, valid production-line data.
  • For different production lines, the solution of this application only needs to collect a small amount of real, valid data for fine-tuning; it can then be replicated in batches and extended to those production lines, covering actual requirements and truly bringing the technique into the practical application of each production line.
  • In some embodiments, the method further includes: calculating a loss rate between the prediction mask map and the annotated mask map; and feeding the calculated loss rate back to the multi-resolution fusion algorithm used to perform the multi-resolution fusion.
  • In some embodiments, calculating the loss rate between the prediction mask map and the annotated mask map may include using one or more of the MSE (mean squared error) loss function and the SSIM (structural similarity) loss function:
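  • The formulas themselves are not reproduced in this text. For reference, the conventional definitions consistent with the symbols described below are given here in standard notation, where σ² denotes the patch variance and σ_xy the patch covariance (a reconstruction, not a quotation from the patent):

$$\mathrm{MSE}(x, y) = \frac{1}{N}\sum_{p=1}^{N}\left(x_p - y_p\right)^2$$

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}, \qquad C_1 = (K_1 R)^2,\; C_2 = (K_2 R)^2$$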
  • Here, μ denotes the mean of all pixels in an image patch and σ denotes the variance of the pixels within the patch.
  • R is the dynamic range of the pixel values: 0 to 255 if the image data type is uint8, and -1 to 1 if the image data type is floating point.
  • The values of K1 and K2 can be derived heuristically; in some examples, K1 may be taken as 0.01 and K2 as 0.03.
  • The image patch can be selected using a sliding window, for example a window of size 11×11; any odd window side length can be used (so that a central pixel exists).
  • SSIM measures similarity in three aspects of the image: luminance (e.g., patch mean / gray value), contrast (e.g., patch variance), and structure (e.g., normalized pixel vectors).
  • In some embodiments, either or both of the MSE and SSIM loss functions can be used to measure the similarity between the output (prediction mask map / fused image) and the target (annotated mask map / input image sequence), so as to achieve regression of the training fit.
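  • A small sketch of such a combined criterion (the weighting and the whole-image simplification of SSIM, used here instead of 11×11 windows purely for brevity, are assumptions):

```python
import torch

def mse_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    return torch.mean((pred - target) ** 2)

def ssim_global(pred: torch.Tensor, target: torch.Tensor,
                data_range: float = 1.0, k1: float = 0.01, k2: float = 0.03) -> torch.Tensor:
    """Simplified SSIM computed over the whole image instead of 11x11 windows."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(unbiased=False), target.var(unbiased=False)
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    return ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / \
           ((mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))

def combined_loss(pred, target, alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of MSE and (1 - SSIM), using both criteria together."""
    return alpha * mse_loss(pred, target) + (1 - alpha) * (1 - ssim_global(pred, target))
```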
  • The loss rate between the prediction mask map and/or fused image output by the multi-resolution fusion algorithm and the ground-truth-annotated mask map reflects the similarity between that output and the original input images. Feeding this loss rate back to the multi-resolution fusion algorithm, together with the ground-truth-annotated mask map, and performing supervised training on its output improves the accuracy of the multi-resolution fusion algorithm in generating fully sharp fused images of the target object.
  • In some embodiments, the method further includes: updating the multi-resolution fusion algorithm used to perform the multi-resolution fusion based on the loss rate, the annotated mask map, or a combination of both.
  • The calculated loss rate is fed back to the multi-resolution fusion algorithm together with the ground-truth-annotated mask map, and supervised training is performed on the algorithm's output. While achieving regression of the training fit, continuous training and learning improve the accuracy of the multi-resolution fusion algorithm in generating fully sharp fused images of the target object.
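  • A toy sketch of this supervised feedback loop, reusing the hypothetical combined_loss above and assuming a model that maps the frame stack directly to a mask map with the same shape as the annotation (the optimizer choice and learning rate are also assumptions):

```python
import torch

def train_fusion(model, loader, epochs: int = 10, lr: float = 1e-4):
    """Supervised feedback loop: the loss between the predicted and annotated
    mask maps is computed on every batch and fed back to the fusion model."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for frames, annotated_mask in loader:      # real production-line samples
            optimizer.zero_grad()
            pred_mask = model(frames)              # assumed same shape as annotated_mask
            loss = combined_loss(pred_mask, annotated_mask)  # MSE + (1 - SSIM), see above
            loss.backward()                        # loss rate fed back to the algorithm
            optimizer.step()
    return model
```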
  • The present application provides an image processing method, comprising: acquiring an input image sequence containing a tab, the input image sequence containing indices 1...k, wherein the step size used by the CCD camera that captures the input image sequence is set according to the measured width L of the tab and the number of frames k of the input image sequence; applying a 2D fusion algorithm to the input image sequence to generate an initial fused image; receiving ground-truth annotations for the initial fused image to generate an annotated mask map, wherein the annotated mask map indicates whether one or more pixels of the tab in the initial fused image are in focus; performing deep feature extraction on the input image sequence; performing multi-resolution fusion on the extracted deep features to obtain fused multi-resolution features; generating a prediction mask map based on the fused multi-resolution features, wherein each pixel of the prediction mask map indicates the index of an input image, the index indicating the input image from which the corresponding pixel of the fused image is taken; and generating a single fused image according to the prediction mask map and the input image sequence, wherein each pixel of the fused image comes from the corresponding position of one input image in the sequence, and each pixel of the fused image that belongs to the tab comes from the corresponding position of the input image in which that part of the tab is in focus.
  • Reference is now made to FIG. 4, which shows a functional block diagram of an image processing system according to some embodiments of the present application; the present application provides an image processing system.
  • In the figure, bold rectangular boxes represent logic modules configured to perform the operations described above, while flag-shaped boxes represent the outputs of the preceding logic modules; arrows indicate the logical order and direction of the operations described above.
  • As shown in FIG. 4, the system includes: an acquisition module 405 configured to acquire an input image sequence containing a target object; and a fusion module 410 configured to perform multi-resolution fusion on the input image sequence to generate a single fused image, wherein each pixel of the fused image comes from the corresponding position of one input image in the sequence, and each pixel of the fused image that belongs to the target object comes from the corresponding position of the input image in which that part of the target object is in focus.
  • In the technical solution of the embodiments of the present application, the index of the input image in which each pixel of the target object is in focus is learned, and the sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, so that image sequences with different focus areas in the same scene are merged into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • In some embodiments, the acquisition module 405 is further configured to set the step size of the camera used to acquire the input image sequence based on the number of frames in the input image sequence and the size of the target object in the input image sequence.
  • Setting the camera step size based on the size of the target object and the number of frames of the input image sequence ensures that the captured input image sequence covers all focus regions of the target object, thereby ensuring that every pixel of the target object in the fused image comes from an in-focus part.
  • Reference is again made to FIG. 2 and FIG. 3, which illustrate a fusion module according to some embodiments of the present application.
  • In some embodiments, the input image sequence contains indices, and the fusion module 410 further includes: an encoder configured to perform feature extraction on the input image sequence and perform multi-resolution fusion on the extracted features to obtain fused multi-resolution features; and a decoder configured to generate a prediction mask map based on the fused multi-resolution features, wherein each pixel of the prediction mask map indicates the index of an input image, and the index indicates the input image from which the corresponding pixel of the fused image is taken.
  • By using semantic segmentation to locate and fuse the sharpest parts of the target object across the multi-frame image sequence, the internal convolutions of the deep-learning semantic segmentation network can learn the relative position of each sharp pixel (i.e., the index of the input image sequence). The sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, merging image sequences with different focus areas in the same scene into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • In some embodiments, the fusion module 410 is further configured to generate the fused image according to the prediction mask map and the input image sequence.
  • By using semantic segmentation to locate and fuse the sharpest parts of the target object across the multi-frame image sequence, the internal convolutions of the deep-learning semantic segmentation network can learn the relative position of each sharp pixel (i.e., the index of the input image sequence). The sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, merging image sequences with different focus areas in the same scene into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.
  • In some embodiments, the system further includes: an initial fusion module 415 configured to apply a 2D fusion algorithm to the input image sequence to generate an initial fused image; and an annotation receiving module 420 configured to receive ground-truth annotations for the initial fused image to generate an annotated mask map, wherein the annotated mask map indicates whether one or more pixels of the target object in the initial fused image are in focus.
  • The parts of the target object that remain blurred (out of focus) in the initial fused image are masked by semi-automatic annotation and excluded from the training sample set, yielding a real training data set that contains ground-truth labels only for in-focus pixels. This quickly produces a large amount of task-related training data and allows the semantic segmentation model to be trained on real, valid production-line data.
  • For different production lines, the solution of this application only needs to collect a small amount of real, valid data for fine-tuning; it can then be replicated in batches and extended to those production lines, covering actual requirements and truly bringing the technique into the practical application of each production line.
  • In some embodiments, the system further includes a loss rate module 425 configured to: calculate a loss rate between the prediction mask map and the annotated mask map; and feed the calculated loss rate back to the fusion module.
  • The loss rate between the prediction mask map and/or fused image output by the multi-resolution fusion algorithm and the ground-truth-annotated mask map reflects the similarity between that output and the original input images. Feeding this loss rate back to the multi-resolution fusion algorithm, together with the ground-truth-annotated mask map, and performing supervised training on its output improves the accuracy of the multi-resolution fusion algorithm in generating fully sharp fused images of the target object.
  • In some embodiments, the fusion module 410 is further configured to update the fusion module based on the loss rate, the annotated mask map, or a combination of both.
  • The calculated loss rate is fed back to the multi-resolution fusion algorithm together with the ground-truth-annotated mask map, and supervised training is performed on the algorithm's output. While achieving regression of the training fit, continuous training and learning improve the accuracy of the multi-resolution fusion algorithm in generating fully sharp fused images of the target object.
  • The present application provides an image processing system, including:
  • an acquisition module 405 configured to acquire an input image sequence containing a tab, the input image sequence containing indices 1...k, wherein the step size used by the CCD camera that captures the input image sequence is set according to the measured width L of the tab and the number of frames k of the continuously captured input image sequence;
  • a fusion module 410, wherein the fusion module includes:
  • an encoder configured to perform deep feature extraction on the input image sequence and perform multi-resolution fusion on the extracted deep features to obtain fused multi-resolution features; and
  • a decoder configured to generate a prediction mask map based on the fused multi-resolution features, wherein each pixel of the prediction mask map indicates the index of an input image, the index indicating the input image from which the corresponding pixel of the fused image is taken;
  • wherein the fusion module 410 is further configured to generate the fused image according to the prediction mask map and the input image sequence, wherein each pixel of the fused image comes from the corresponding position of one input image in the sequence, and each pixel of the fused image that belongs to the tab comes from the corresponding position of the input image in which that part of the tab is in focus, and to update the multi-resolution fusion algorithm used to perform the multi-resolution fusion based on the loss rate, the annotated mask map, or a combination of both;
  • an initial fusion module 415 configured to apply a 2D fusion algorithm to the input image sequence to generate an initial fused image; and
  • an annotation receiving module 420 configured to receive ground-truth annotations for the initial fused image to generate an annotated mask map, wherein the annotated mask map indicates whether one or more pixels of the tab in the initial fused image are in focus.
  • Reference is now made to FIG. 5, which is a structural block diagram of a computer system suitable for implementing an image processing system according to some embodiments of the present application.
  • In some embodiments, the system includes: a memory 028 storing computer-executable instructions; and a processor 016 coupled to the memory 028, wherein the computer-executable instructions, when executed by the processor 016, cause the system to perform the following operations: acquiring an input image sequence containing a target object; and performing multi-resolution fusion on the input image sequence to generate a single fused image, wherein each pixel of the fused image comes from the corresponding position of one input image in the sequence, and each pixel of the fused image that belongs to the target object comes from the corresponding position of the input image in which that part of the target object is in focus.
  • Figure 5 shows a block diagram of an exemplary computer system 012 suitable for use in implementing embodiments of the invention.
  • the computer system 012 shown in FIG. 5 is only an example, and should not limit the functions and scope of use of the embodiments of the present invention.
  • computer system 012 takes the form of a general-purpose computing device.
  • Components of the computer system 012 may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 connecting the various system components (including the system memory 028 and the processing unit 016).
  • Bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures.
  • These bus structures include, by way of example and without limitation, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Computer system 012 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by computer system 012 and include both volatile and nonvolatile media, removable and non-removable media.
  • System memory 028 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032 .
  • the computer system 012 may further include other removable/non-removable, volatile/nonvolatile computer system storage media.
  • storage system 034 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 5, commonly referred to as a "hard drive").
  • A magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk may be provided, as well as an optical disk drive for reading from and writing to a removable, non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media).
  • each drive may be connected to bus 018 via one or more data media interfaces.
  • Memory 028 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present invention.
  • Program modules 042 generally perform the functions and/or methods of the described embodiments of the present invention.
  • The computer system 012 can also communicate with one or more external devices 014 (such as a keyboard, a pointing device, a display 024, etc.), with one or more devices that enable a user to interact with the computer system 012, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer system 012 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 022. Moreover, the computer system 012 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 020. As shown, the network adapter 020 communicates with the other modules of the computer system 012 via the bus 018.
  • the processing unit 016 executes various functional applications and data processing by running the programs stored in the system memory 028 , such as implementing the method flow provided by the embodiment of the present invention.
  • The above-mentioned computer program may be provided in a computer storage medium; that is, the computer storage medium is encoded with a computer program which, when executed by one or more computers, causes the one or more computers to perform the method flows and/or device operations shown in the above-described embodiments of the present invention. For example, the method flows provided by the embodiments of the present invention are executed by the above-mentioned one or more processors.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples (non-exhaustive list) of computer readable storage media include: electrical connections with one or more leads, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), Erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including - but not limited to - electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including - but not limited to - wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out the operations of the present invention may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, via the Internet using an Internet service provider).
  • In the technical solution of the embodiments of the present application, the index of the input image in which each pixel of the target object is in focus is learned, and the sharpest corresponding regions of the input image sequence are extracted and fused at the pixel level, so that image sequences with different focus areas in the same scene are merged into a single, fully sharp image of the target object. This achieves a pixel-accurate, fully sharp fused image that preserves the detail of the target object and effectively improves the information utilization of the image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image processing method and system. The method includes: acquiring an input image sequence containing a target object; and performing multi-resolution fusion on the input image sequence to generate a single fused image, wherein a pixel of the fused image comprises a pixel at the corresponding position of one input image in the input image sequence, and each pixel of the fused image that contains the target object comprises a pixel at the corresponding position of the one input image in the input image sequence in which the corresponding part of the target object is in focus.

Description

图像的处理方法和系统 技术领域
本申请涉及计算机技术,尤其涉及图像的处理技术。
背景技术
利用计算机进行图像处理在各个领域被广泛应用。图像处理可以被用于提升图像的视觉质量、提取图像中的特定目标的特征、图像的存储和传输、图像序列的融合等。在拍摄目标对象时,往往需要拍摄一系列的焦点不同的图像来捕获目标对象。在此类情形中,将所拍摄的图像序列进行融合以供后续图像处理是合乎需要的。
因此,需要一种用于图像融合的改进的技术。
发明内容
鉴于上述问题,本申请提供了能够提供目标对象的每一像素均聚焦的融合图像的图像处理方法和系统。
第一方面,本申请提供了一种图像的处理方法,包括:采集包含目标对象的输入图像序列;以及对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
在本申请实施例的技术方案中,学习针对目标对象的每个聚焦像素点的输入图像序列的索引,提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
在一些实施例中,采集输入图像序列进一步包括:基于所述输入图像序列 的帧数和所述输入图像序列中的所述目标对象的尺寸来设置用于采集所述输入图像序列的相机的步长。基于目标对象的尺寸和输入图像序列的帧数来设置相机步长能够确保所采集的输入图像序列能够覆盖目标对象的所有聚焦区域,从而保证融合图像中的目标对象的每一像素均包括聚焦部分。
在一些实施例中,所述输入图像序列包含索引,对所述输入图像序列执行多分辨率融合以生成融合图像进一步包括:对所述输入图像序列执行特征提取;对所提取的特征执行多分辨率融合,得到融合后的多分辨率特征;基于融合后的多分辨率特征生成预测掩码图,其中所述预测掩码图的每一像素指示输入图像的索引,所述索引指示所述融合图像的每一像素源自的输入图像;以及根据所述预测掩码图和所述输入图像序列生成所述融合图像。通过语义分割的方式寻找到多帧图像序列中目标对象最清晰的部分进行融合,可以使得深度学习语义分割神经网络内部卷积学习每个清晰像素点的相对位置信息(即输入图像序列的索引),提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
在一些实施例中,所述方法进一步包括:将2D融合算法应用于所述输入图像序列以生成初始融合图像;以及接收对所述初始融合图像的真值标注以生成经标注掩码图,其中所述经标注掩码图指示所述初始融合图像中的所述目标对象的一个或多个像素是否聚焦。以半自动标注方式对初始融合图像中目标对象依旧模糊(非聚焦)的部分标注掩码,将其剔除出训练样本集,获取仅包含聚焦像素的真值标注的真实训练数据集,能够快速产生大量与任务相关的训练数据,并且能够通过真实有效的产线数据来进行语义分割模型训练。本申请的方案针对不同的产线只需要搜集部分真实有效数据进行微调训练,就能够批量复制并推广到这些不同的产线,能够覆盖实际需求,将该技术真正落实到各产线的实际应用中。
在一些实施例中,所述方法进一步包括:计算所述预测掩码图与所述经标注掩码图之间的损失率;将计算所得的所述损失率反馈至用于执行所述多分辨率融合的多分辨率融合算法。多分辨率融合算法输出的预测掩码图和/或融合图像与经真值标注的掩码图之间的损失率反应了多分辨率融合算法输出的预测掩码图和/或融合图像与原始输入图像之间的相似性。将该损失率反馈至多分辨率融合算法,与 经真值标注的掩码图一起对多分辨率融合算法输出执行有监督学习训练,在达到训练拟合回归性的同时通过不断的训练和学习提升多分辨率融合算法的生成目标对象全清晰的融合图像方面的准确性。
在一些实施例中,所述方法进一步包括:基于所述损失率或所述经标注掩码图或这两者的组合来更新用于执行所述多分辨率融合的多分辨率融合算法。将计算所得的损失率与经真值标注的掩码图一起反馈至多分辨率融合算法,对多分辨率融合算法输出执行有监督学习训练,在达到训练拟合回归性的同时通过不断的训练和学习提升多分辨率融合算法的生成目标对象全清晰的融合图像方面的准确性。
第二方面,本申请提供了图像的处理系统,包括:采集模块,其被配置成采集包含目标对象的输入图像序列;以及融合模块,其被配置成对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
在本申请实施例的技术方案中,学习针对目标对象的每个聚焦像素点的输入图像序列的索引,提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
在一些实施例中,所述采集模块被进一步配置成基于所述输入图像序列的帧数和所述输入图像序列中的所述目标对象的尺寸来设置用于采集所述输入图像序列的相机的步长。基于目标对象的尺寸和输入图像序列的帧数来设置相机步长能够确保所采集的输入图像序列能够覆盖目标对象的所有聚焦区域,从而保证融合图像中的目标对象的每一像素均包括聚焦部分。
在一些实施例中,所述输入图像序列包含索引,所述融合模块进一步包括:编码器,其被配置成:对所述输入图像序列执行特征提取;对所提取的特征执行多分辨率融合,得到融合后的多分辨率特征;以及解码器,其被配置成:基于融合后的多分辨率特征生成预测掩码图,其中所述预测掩码图的每一像素指示输入图像的索引,所述索引指示所述融合图像的每一像素源自的输入图像。本申请的多分辨率 融合方法从语义分割神经网络的结构入手,提出端到端的基于深度学习的自适应多聚焦融合方案。在本申请中,使图像序列通过模型编码器部分以提取深度特征,对特征进行融合,利用解码器生成融合图像,从而使得深度学习语义分割神经网络能够通过内部卷积学习到每个清晰像素点的相对位置信息(即输入图像序列的索引),通过深度模型减少传统算法对阈值的依赖性,增强融合鲁棒性。
在一些实施例中,所述融合模块被进一步配置成根据所述预测掩码图和所述输入图像序列生成所述融合图像。通过语义分割的方式寻找到多帧图像序列中目标对象最清晰的部分进行融合,可以使得深度学习语义分割神经网络内部卷积学习每个清晰像素点的相对位置信息(即输入图像序列的索引),提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
在一些实施例中,所述系统进一步包括:初始融合模块,其被配置成将2D融合算法应用于所述输入图像序列以生成初始融合图像;以及标注接收模块,其被配置成接收对所述初始融合图像的真值标注以生成经标注掩码图,其中所述经标注掩码图指示所述初始融合图像中的所述目标对象的一个或多个像素是否聚焦。以半自动标注方式对初始融合图像中目标对象依旧模糊(非聚焦)的部分标注掩码,将其剔除出训练样本集,获取仅包含聚焦像素的真值标注的真实训练数据集,能够快速产生大量与任务相关的训练数据,并且能够通过真实有效的产线数据来进行语义分割模型训练。本申请的方案针对不同的产线只需要搜集部分真实有效数据进行微调训练,就能够批量复制并推广到这些不同的产线,能够覆盖实际需求,将该技术真正落实到各产线的实际应用中。
在一些实施例中,所述系统进一步包括:损失率模块,其被配置成:计算所述预测掩码图与所述经标注掩码图之间的损失率;将计算所得的所述损失率反馈至所述融合模块。多分辨率融合算法输出的预测掩码图和/或融合图像与经真值标注的掩码图之间的损失率反应了多分辨率融合算法输出的预测掩码图和/或融合图像与原始输入图像之间的相似性。将该损失率反馈至多分辨率融合算法,与经真值标注的掩码图一起对多分辨率融合算法输出执行有监督学习训练,在达到训练拟合回归性的同时通过不断的训练和学习提升多分辨率融合算法的生成目标对象全清晰 的融合图像方面的准确性。
在一些实施例中,所述融合模块被进一步配置成基于所述损失率或所述经标注掩码图或这两者的组合来更新所述融合模块。将计算所得的损失率与经真值标注的掩码图一起反馈至多分辨率融合算法,对多分辨率融合算法输出执行有监督学习训练,在达到训练拟合回归性的同时通过不断的训练和学习提升多分辨率融合算法的生成目标对象全清晰的融合图像方面的准确性。
第三方面,本申请提供了一种图像的处理系统,包括:其上存储有计算机可执行指令存储器;以及与所述存储器耦合的处理器,其中所述计算机可执行指令在由所述处理器执行时致使所述系统执行如下操作:采集包含目标对象的输入图像序列;以及对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
在本申请实施例的技术方案中,学习针对目标对象的每个聚焦像素点的输入图像序列的索引,提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
上述说明仅是本申请技术方案的概述,为了能够更清楚了解本申请的技术手段,而可依照说明书的内容予以实施,并且为了让本申请的上述和其它目的、特征和优点能够更明显易懂,以下特举本申请的具体实施方式。
附图说明
通过阅读对下文优选实施方式的详细描述,各种其他的优点和益处对于本领域普通技术人员将变得清楚明了。附图仅用于示出优选实施方式的目的,而并不认为是对本申请的限制。而且在全部附图中,用相同的附图标号表示相同的部件。在附图中:
图1是根据本申请的一些实施例的图像的处理方法的流程图;
图2是根据本申请的一些实施例的图像处理系统的功能框图;
图3是根据本申请的一些实施例的融合模块的结构框图;
图4是根据本申请的一些实施例的融合模块的具体实现示意图;
图5是适于实现根据本申请的一些实施例的图像的处理系统的计算机系统的结构框图。
具体实施方式
下面将结合附图对本申请技术方案的实施例进行详细的描述。以下实施例仅用于更加清楚地说明本申请的技术方案,因此只作为示例,而不能以此来限制本申请的保护范围。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同;本文中所使用的术语只是为了描述具体的实施例的目的,不是旨在于限制本申请;本申请的说明书和权利要求书及上述附图说明中的术语“包括”和“具有”以及它们的任何变形,意图在于覆盖不排他的包含。
在本申请实施例的描述中,技术术语“第一”“第二”等仅用于区别不同对象,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量、特定顺序或主次关系。在本申请实施例的描述中,“多个”的含义是两个以上,除非另有明确具体的限定。
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
在本申请实施例的描述中,术语“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
利用计算机进行图像处理在各个领域被广泛应用。图像处理可以被用于提升图像的视觉质量、提取图像中的特定目标的特征、图像的存储和传输、图像序列的融合等。在拍摄目标对象时,往往需要拍摄一系列的焦点不同的图像来捕获目标对象。在此类情形中,将所拍摄的图像序列进行融合以供后续图像处理是合乎需要 的。
在动力锂电池生产过程中,由于工艺及设备原因,缺陷不可避免。贯穿产线的各个环节,检测锂电池的极耳是否存在翻折是至关重要的一环,其检测结果有效性确保了电池出厂的安全性。例如,在通过拍摄生产线上产出的锂电池的图像并且对图像中诸如极耳之类的目标对象执行缺陷检测的情形中,由于摄像机镜头受景深的限制,无法同时聚焦所有极片,因此拍摄的照片中往往部分极片清晰而部分极片模糊。因而,通常无法通过仅拍摄单张照片来获得其中极耳的所有极片都清晰的图像。在实践中,往往通过在同一场景下拍摄多张聚焦区域不同的图像并且将该多张图像融合成一张图像以供后续的缺陷检测。
一些融合图像的方法包括例如利用低通和高通滤波器的反卷积网络来提取源图像的低频和高频信息以融合图像。该方法由于未能充分利用网络中间层信息,因此根据推断的融合特征图与卷积求和而得到的融合图像往往丢失源图像序列中的部分不同清晰聚焦的原始信息。另一些融合图像的方法包括通过对标签图像的不同区域执行高斯模糊处理来作为训练数据。该方法由于训练数据并非来自真实产线,因而难以模拟和覆盖实际需求,实用性较差。
针对上述问题,本申请提供了能够提供目标对象的每一像素均聚焦的融合图像的图像处理技术。本申请的图像处理的方法包括:采集包含目标对象的输入图像序列;以及对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
本申请的方案从语义分割神经网络的结构入手,提出端到端的基于深度学习的自适应多聚焦融合方案。在本申请中,使图像序列通过模型编码器部分以提取深度特征,对特征进行融合,利用解码器生成融合图像,从而使得深度学习语义分割神经网络能够通过内部卷积学习到每个清晰像素点的相对位置信息(即输入图像序列的索引),提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率,并且通过深度模型减少传统算法对阈值的依赖性,增强融合鲁棒性。
本申请的实施例的技术方案适用于对输入图像序列进行融合并且要求融合图像中的目标对象的每一像素均具有高分辨率的情形,包括但不限于,锂电池中诸如极耳等部件的全清晰融合图像的获取,医学领域中病毒细胞的全清晰融合图像的获取、军事领域中目标设施或点位的全清晰融合图像的获取、以及任何其他适用的场景下对输入图像序列进行融合并且要求融合图像中的目标对象的每一像素均具有高分辨率的情形。
参照图1,其示出了根据本申请的一些实施例的图像的处理方法的流程图,本申请提供了一种图像的处理方法。如图1所示,该方法包括:在步骤105,采集包含目标对象的输入图像序列;以及在步骤110,对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
在一些示例中,输入图像序列可包括在同一场景对目标对象的不同部分进行聚焦所拍摄的一系列图像,诸如在同一场景下对极耳的不同极片进行聚焦的图像序列,该图像序列中的每一图像具有对应的索引,诸如图像1、图像2、……、图像k。在一些示例中,对所述输入图像序列执行多分辨率融合以生成融合图像可包括将输入图像序列(诸如图像1、图像2、……、图像k)输入到执行多分辨率融合算法的融合模块中以生成包含单张的融合图像(诸如图像k+1)。在一些示例中,该多分辨率融合算法是可由深度学习语义分割神经网络实现的算法,其学习输入图像序列中目标对象的每一个像素在其中聚焦的图像序列索引,提取对应于该图像序列索引的像素中的值并执行多分辨率像素级融合,从而生成目标对象的每一像素均聚焦的融合图像。例如,该多分辨率融合算法学习输入图像的第i行第j列像素(呈现目标对象的像素)在图像序列1-k之中在图像2中聚焦,输入图像的第i行第j+1列像素在图像序列1-k之中在图像k中聚焦,则可获得针对第i行第j列像素的图像索引值为2,针对第i行第j+1列像素的图像索引值为k,以此类推来获取输入图像中的目标对象的每一像素在其中聚焦的图像序列索引的集合,提取图像序列1-k中每一像素在其中聚焦的输入图像的像素(即针对第i行第j列像素从图像序列1-k中的图像2中提取对应像素的像素值,针对第i行第j+1列像素从图像序列1- k中的图像k中提取对应像素的像素值)并将这些像素融合在一起,从而生成其中目标对象的每一像素是聚焦的融合图像。
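As an illustrative, non-limiting sketch of the pixel-selection step described above, the following Python snippet shows how a fused image could be assembled once a per-pixel index map is available. The function name, the (k, H, W) array layout, and the convention that mask value 0 denotes background while 1..k denote the source-image index are assumptions made for illustration only, not details taken from the application.

```python
# A minimal sketch (not the patent's reference implementation) of assembling a
# fused image from a stack of k co-registered grayscale images and an index
# mask whose entries are 0 for background and 1..k for "sharp in image i".
import numpy as np

def compose_fused_image(image_stack: np.ndarray, index_mask: np.ndarray) -> np.ndarray:
    """image_stack: (k, H, W) focus sequence; index_mask: (H, W) with values in 0..k."""
    k, h, w = image_stack.shape
    fused = np.zeros((h, w), dtype=image_stack.dtype)
    for idx in range(1, k + 1):
        sel = index_mask == idx              # pixels sharpest in image `idx` (1-based)
        fused[sel] = image_stack[idx - 1][sel]
    return fused

# Toy usage: 3 images; the mask says row 0 is sharp in image 1, row 1 in image 3.
stack = np.stack([np.full((2, 4), v, dtype=np.uint8) for v in (10, 20, 30)])
mask = np.array([[1, 1, 1, 1],
                 [3, 3, 3, 3]])
print(compose_fused_image(stack, mask))      # [[10 10 10 10] [30 30 30 30]]
```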
在本申请实施例的技术方案中,学习针对目标对象的每个聚焦像素点的输入图像序列的索引,提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
根据本申请的一些实施例,可选地,步骤105进一步包括:基于所述输入图像序列的帧数和所述输入图像序列中的所述目标对象的尺寸来设置用于采集所述输入图像序列的相机的步长。
在一些示例中,例如在使用CCD相机来采集图像序列的情形中,首先可根据测量(例如,经由机械设备进行物理测量)得到目标对象(诸如极耳)的宽度L,接着可基于连续拍摄输入图像序列的帧数k来设置CCD相机所采用的步长,例如m=L/step。
基于目标对象的尺寸和输入图像序列的帧数来设置相机步长能够确保所采集的输入图像序列能够覆盖目标对象的所有聚焦区域,从而保证融合图像中的目标对象的每一像素均包括聚焦部分。
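A minimal sketch of one possible reading of the step-size rule above, under the assumption that the camera step is simply the measured target-object width L divided by the number of frames k; the function name, variable names, and example values are illustrative only and are not defined by the application.

```python
# Hypothetical reading of "m = L/step": k frames are spread evenly across the
# measured width L of the target object, so the camera advances L/k per frame.
def camera_step(width_mm: float, num_frames: int) -> float:
    return width_mm / num_frames

print(camera_step(6.0, 12))  # e.g. a 6 mm tab captured in 12 frames -> 0.5 mm per step
```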
根据本申请的一些实施例,可选地,进一步参考图2-图3,图2是根据本申请的一些实施例的融合模块的结构框图并且图3是根据本申请的一些实施例的融合模块的具体实现示意图,所述输入图像序列包含索引,步骤110进一步包括:对所述输入图像序列执行特征提取;对所提取的特征执行多分辨率融合,得到融合后的多分辨率特征;基于融合后的多分辨率特征生成预测掩码图,其中所述预测掩码图的每一像素指示输入图像的索引,所述索引指示所述融合图像的每一像素源自的输入图像;以及根据所述预测掩码图和所述输入图像序列生成所述融合图像。
在一些示例中,假设输入图像序列为分辨率为5120*5120*1的灰度图像序列(包括图像1、图像2、……、图像k),对所述输入图像序列执行特征提取可包括在将每一输入图像分别输入到用于执行多分辨率融合算法的融合模块中的编码器,以得到该图像的多分辨率特征,如图2所示。在一些示例中,该编码器的基本结构可包括卷积(convolution)层、批量归一化(batch normalization)层、以及非 线性激活(rectified linear unit,RLU)层,如图3所示。在一些示例中,对所提取的特征执行多分辨率融合可包括将输入图像序列1-k的每一图像的多分辨率特征输入用于执行多分辨率融合算法的融合模块中编码器模块中的融合层以对其执行拼接式(concatenation,或concat)融合,如图2所示。在一些示例中,基于融合后的多分辨率特征生成预测掩码图可包括将融合后的多分辨率特征输入用于执行多分辨率融合算法的融合模块中的解码器模块以输出预测掩码图,其中该预测掩码图的每一像素指示该像素在其中聚焦的输入图像的索引,例如预测掩码图中的每个像素点值为0、1、……k(0表示背景(非目标对象),而1、2、……k表示图像序列索引),如图2所示。在一些示例中,该解码器的基本结构可包括卷积层、批量归一化层、非线性激活层、双线性上采样(bilinear upsample)层,如图3所示。在一些示例中,根据所述预测掩码图和所述输入图像序列生成所述融合图像可包括根据其中每个像素点指示目标对象的每一像素在其中聚焦的图像序列索引和输入图像序列1-k,可得到目标对象的每一像素均聚焦的融合图像,例如,融合图像的第i行第j列像素包括来自图2的对应像素的值,融合图像的第i行第j+1列像素包括来自图k的对应像素的值,以此类推。
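The following PyTorch sketch illustrates the encoder / concatenation-fusion / decoder arrangement described above. Layer widths, depths, the single downsampling stage, and all hyperparameters are assumptions chosen for brevity; only the overall pattern (a shared conv–batch norm–ReLU encoder per frame, channel-wise concatenation of the extracted features, and a decoder with bilinear upsampling that emits a (k+1)-class predicted mask map) follows the description.

```python
# Illustrative sketch only; not the reference network of the application.
import torch
import torch.nn as nn

class ConvBNReLU(nn.Sequential):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__(
            nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

class FusionNet(nn.Module):
    def __init__(self, k_images: int, feat: int = 16):
        super().__init__()
        self.k = k_images
        # Shared encoder applied to every frame of the focus sequence.
        self.encoder = nn.Sequential(ConvBNReLU(1, feat), ConvBNReLU(feat, feat, stride=2))
        # Decoder: conv + BN + ReLU plus bilinear upsampling back to the input size.
        self.decoder = nn.Sequential(
            ConvBNReLU(feat * k_images, feat),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(feat, k_images + 1, 1),  # classes: 0 = background, 1..k = source index
        )

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, k, H, W) grayscale focus sequence.
        feats = [self.encoder(seq[:, i:i + 1]) for i in range(self.k)]
        fused = torch.cat(feats, dim=1)        # concatenation ("concat") fusion
        return self.decoder(fused)             # logits of the predicted mask map

logits = FusionNet(k_images=4)(torch.randn(1, 4, 128, 128))
pred_mask = logits.argmax(dim=1)               # per-pixel source-image index
print(logits.shape, pred_mask.shape)           # (1, 5, 128, 128) (1, 128, 128)
```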
通过语义分割的方式寻找到多帧图像序列中目标对象最清晰的部分进行融合,可以使得深度学习语义分割神经网络内部卷积学习每个清晰像素点的相对位置信息(即输入图像序列的索引),提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
根据本申请的一些实施例,可选地,所述方法进一步包括:将2D融合算法应用于所述输入图像序列以生成初始融合图像;以及接收对所述初始融合图像的真值标注以生成经标注掩码图,其中所述经标注掩码图指示所述初始融合图像中的所述目标对象的一个或多个像素是否聚焦。
在一些示例中,将2D融合算法应用于所述输入图像序列以生成初始融合图像可包括利用现有技术中的图像融合算法来获得输入图像序列(诸如图像1、图像2、……图像k)的初始融合图像(诸如图像k+1’)。在一些示例中,接收对所述初始融合图像的真值标注以生成经标注掩码图可包括接收对初始融合图像(诸如 图像k+1’)的真(ground truth,GT)值标注以生成经标注掩码器,其中该经标注掩码图指示初始融合图像中的所述目标对象的每一像素是否聚焦。在一些示例中,经标注掩码图的每一像素的值可为0或1,其中1指示该像素聚焦而0指示该像素不聚焦。在一些示例中,该经标注掩码图将不聚焦的目标对象的一个或多个像素通过掩码方式剔除出数据样本,从而使得被反馈至用于执行多分辨率融合的多分辨率融合算法的数据仅包含聚焦像素的真值标注。
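A hedged sketch of how the annotated mask could be used to exclude still-blurred pixels from the supervised training target; the ignore value of -1, the function name, and the array layout are illustrative assumptions, not details taken from the application.

```python
# Pixels that remain unfocused in the initial fused image are masked out of the
# training target so that they do not contribute to the supervised loss.
import numpy as np

IGNORE = -1

def build_training_target(index_labels: np.ndarray, focus_mask: np.ndarray) -> np.ndarray:
    """index_labels: (H, W) source-image indices from annotation;
    focus_mask: (H, W) with 1 where the fused pixel is sharp, 0 where it is still blurred."""
    target = index_labels.copy()
    target[focus_mask == 0] = IGNORE   # excluded from the supervised loss
    return target

labels = np.array([[1, 2], [3, 0]])
mask = np.array([[1, 0], [1, 1]])
print(build_training_target(labels, mask))   # [[ 1 -1] [ 3  0]]
```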
以半自动标注方式对初始融合图像中目标对象依旧模糊(非聚焦)的部分标注掩码,将其剔除出训练样本集,获取仅包含聚焦像素的真值标注的真实训练数据集,能够快速产生大量与任务相关的训练数据,并且能够通过真实有效的产线数据来进行语义分割模型训练。本申请的方案针对不同的产线只需要搜集部分真实有效数据进行微调训练,就能够批量复制并推广到这些不同的产线,能够覆盖实际需求,将该技术真正落实到各产线的实际应用中。
根据本申请的一些实施例,可选地,所述方法进一步包括:计算所述预测掩码图与所述经标注掩码图之间的损失率;将计算所得的所述损失率反馈至用于执行所述多分辨率融合的多分辨率融合算法。
在一些示例中,计算所述预测掩码图与所述经标注掩码图之间的损失率可包括如下利用MSE(平均平方误差损失函数)和SSIM(图像质量损失函数)中的一者或多者:
$$\mathrm{MSE}=\frac{1}{m\times n}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(y_{ij}-\hat{y}_{ij}\right)^{2}$$
其中 $y_{ij}$ 表示图像中第i行第j列像素值对应的真实标签值(经标注掩码图中的GT值),$\hat{y}_{ij}$ 表示图像中第i行第j列像素值对应的预测标签值(预测掩码图中的值),$m\times n$ 表示图像分辨率。
$$\mathrm{SSIM}(x,y)=\frac{\left(2u_{x}u_{y}+C_{1}\right)\left(2\sigma_{xy}+C_{2}\right)}{\left(u_{x}^{2}+u_{y}^{2}+C_{1}\right)\left(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}\right)}$$
其中u表示图像块(patch)的所有像素的平均值,σ表示图像块内的像素方差。在一些示例中,校正系数 $C_{1}=(K_{1}\times R)^{2}$,$C_{2}=(K_{2}\times R)^{2}$,其中R为根据图像数据类型所确定的动态范围,K为加权因子。在一些示例中,在图像数据类型为uint8的情形中,R的值为0到255,在图像数据类型为浮点的情形中,R的值为-1到1。在一些示例中,$K_{1}$ 和 $K_{2}$ 的值可根据试探法得出。在一些示例中,$K_{1}$ 可取0.01,而 $K_{2}$ 可取0.03。在一些示例中,图像块(patch)的选择可利用划窗方式来实现,诸如采用11×11的划窗大小来选择图像块,只要划窗边长为奇数即可(保证存在中心像素)。SSIM关注图像三个方面的相似性:图像照明度(诸如图像块均值、灰度值)、图像对比度(诸如图像块方差)以及图像结构比(诸如归一化后的像素向量)。在一些示例中,可以利用MSE和SSIM损失函数中的任何一者或两者来衡量融合图像(预测掩码图)与输入图像序列(经标注掩码图)的相似性,从而达成训练拟合回归性。
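For reference, a minimal NumPy sketch of the two similarity measures discussed above. For brevity, SSIM is computed from the global statistics of a single patch rather than averaged over an 11×11 sliding window; the constants follow the text (K1 = 0.01, K2 = 0.03, R = 255 for 8-bit data), and everything else is an illustrative assumption rather than the application's reference implementation.

```python
import numpy as np

def mse_loss(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean((y_true.astype(np.float64) - y_pred.astype(np.float64)) ** 2))

def ssim_patch(x: np.ndarray, y: np.ndarray, data_range: float = 255.0,
               k1: float = 0.01, k2: float = 0.03) -> float:
    x = x.astype(np.float64); y = y.astype(np.float64)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    ux, uy = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - ux) * (y - uy)).mean()
    return ((2 * ux * uy + c1) * (2 * cov + c2)) / ((ux**2 + uy**2 + c1) * (vx + vy + c2))

a = np.random.randint(0, 256, (11, 11)).astype(np.float64)
print(mse_loss(a, a), ssim_patch(a, a))   # 0.0 and 1.0 for identical patches
```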
多分辨率融合算法输出的预测掩码图和/或融合图像与经真值标注的掩码图之间的损失率反应了多分辨率融合算法输出的预测掩码图和/或融合图像与原始输入图像之间的相似性。将该损失率反馈至多分辨率融合算法,与经真值标注的掩码图一起对多分辨率融合算法输出执行有监督学习训练,在达到训练拟合回归性的同时通过不断的训练和学习提升多分辨率融合算法的生成目标对象全清晰的融合图像方面的准确性。
根据本申请的一些实施例,可选地,所述方法进一步包括:基于所述损失率或所述经标注掩码图或这两者的组合来更新用于执行所述多分辨率融合的多分辨率融合算法。
将计算所得的损失率与经真值标注的掩码图一起反馈至多分辨率融合算法,对多分辨率融合算法输出执行有监督学习训练,在达到训练拟合回归性的同时通过不断的训练和学习提升多分辨率融合算法的生成目标对象全清晰的融合图像方面的准确性。
根据本申请的一些实施例,参考图1-图3,本申请提供了一种图像的处理方法,包括:采集包含极耳的输入图像序列,所述输入图像序列包含索引1……k,其中根据测量得到的极耳的宽度L和连续拍摄输入图像序列的帧数k来设置用于采集输入图像序列的CCD相机所采用的步长,即m=L/step;将2D融合算法应用于所述输入图像序列以生成初始融合图像;接收对所述初始融合图像的真值标注以生成经标注掩码图,其中所述经标注掩码图指示所述初始融合图像中的极耳的一个或多个像素是否聚焦;对所述输入图像序列执行深度特征提取;对所提取的深度特征执行多分辨率融合,得到融合后的多分辨率特征;基于融合后的多分辨率特征生成预测掩码图,其中所述预测掩码图的每一像素指示输入图像的索引,所述索引指示 所述融合图像的每一像素源自的输入图像;根据所述预测掩码图和所述输入图像序列生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含极耳的每一像素包括所述输入图像序列中极耳的一部分在其中聚焦的一个输入图像的对应位置的像素;利用MSE和SSIM损失函数来计算所述预测掩码图与所述经标注掩码图之间的损失率,其中
$$\mathrm{MSE}=\frac{1}{m\times n}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(y_{ij}-\hat{y}_{ij}\right)^{2}$$
其中 $y_{ij}$ 表示图像中第i行第j列像素值对应的真实标签值(经标注掩码图中的GT值),$\hat{y}_{ij}$ 表示图像中第i行第j列像素值对应的预测标签值(预测掩码图中的值),$m\times n$ 表示图像分辨率,
$$\mathrm{SSIM}(x,y)=\frac{\left(2u_{x}u_{y}+C_{1}\right)\left(2\sigma_{xy}+C_{2}\right)}{\left(u_{x}^{2}+u_{y}^{2}+C_{1}\right)\left(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}\right)}$$
其中以11×11的划窗大小来选择图像块,u表示图像块(patch)的所有像素的平均值,σ表示图像块内的像素方差,$C_{1}=(K_{1}\times R)^{2}$,$C_{2}=(K_{2}\times R)^{2}$,R的值为0到255,$K_{1}=0.01$,而 $K_{2}=0.03$;将计算所得的所述损失率反馈至用于执行所述多分辨率融合的多分辨率融合算法;以及基于所述损失率或所述经标注掩码图或这两者的组合来更新用于执行所述多分辨率融合的多分辨率融合算法。
参照图4,其示出了根据本申请的一些实施例的图像的处理系统的功能框图,本申请提供了一种图像的处理系统。在图4中,加粗矩形框表示被配置成执行参考上文所描述的各操作的逻辑模块,而旗帜形框表示在前的逻辑模块的输出。在图4中,箭头表示参考上文所描述的各操作的逻辑顺序和方向。如图4所示,该系统包括:采集模块405,其被配置成采集包含目标对象的输入图像序列;以及融合模块410,其被配置成对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
在本申请实施例的技术方案中,学习针对目标对象的每个聚焦像素点的输入图像序列的索引,提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
根据本申请的一些实施例,可选地,所述采集模块405被进一步配置成基于所述输入图像序列的帧数和所述输入图像序列中的所述目标对象的尺寸来设置用于采集所述输入图像序列的相机的步长。
基于目标对象的尺寸和输入图像序列的帧数来设置相机步长能够确保所采集的输入图像序列能够覆盖目标对象的所有聚焦区域,从而保证融合图像中的目标对象的每一像素均包括聚焦部分。
根据本申请的一些实施例,可选地,进一步参考图2-图3,图2是根据本申请的一些实施例的融合模块的结构框图并且图3是根据本申请的一些实施例的融合模块的具体实现示意图,所述输入图像序列包含索引,所述融合模块410进一步包括:编码器,其被配置成:对所述输入图像序列执行特征提取;对所提取的特征执行多分辨率融合,得到融合后的多分辨率特征;以及解码器,其被配置成:基于融合后的多分辨率特征生成预测掩码图,其中所述预测掩码图的每一像素指示输入图像的索引,所述索引指示所述融合图像的每一像素源自的输入图像。
通过语义分割的方式寻找到多帧图像序列中目标对象最清晰的部分进行融合,可以使得深度学习语义分割神经网络内部卷积学习每个清晰像素点的相对位置信息(即输入图像序列的索引),提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
根据本申请的一些实施例,可选地,所述融合模块410被进一步配置成根据所述预测掩码图和所述输入图像序列生成所述融合图像。
通过语义分割的方式寻找到多帧图像序列中目标对象最清晰的部分进行融合,可以使得深度学习语义分割神经网络内部卷积学习每个清晰像素点的相对位置信息(即输入图像序列的索引),提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
根据本申请的一些实施例,可选地,所述系统进一步包括:初始融合模块415,其被配置成将2D融合算法应用于所述输入图像序列以生成初始融合图像; 以及标注接收模块420,其被配置成接收对所述初始融合图像的真值标注以生成经标注掩码图,其中所述经标注掩码图指示所述初始融合图像中的所述目标对象的一个或多个像素是否聚焦。
以半自动标注方式对初始融合图像中目标对象依旧模糊(非聚焦)的部分标注掩码,将其剔除出训练样本集,获取仅包含聚焦像素的真值标注的真实训练数据集,能够快速产生大量与任务相关的训练数据,并且能够通过真实有效的产线数据来进行语义分割模型训练。本申请的方案针对不同的产线只需要搜集部分真实有效数据进行微调训练,就能够批量复制并推广到这些不同的产线,能够覆盖实际需求,将该技术真正落实到各产线的实际应用中。
根据本申请的一些实施例,可选地,所述系统进一步包括:损失率模块425,其被配置成:计算所述预测掩码图与所述经标注掩码图之间的损失率;将计算所得的所述损失率反馈至所述融合模块。
多分辨率融合算法输出的预测掩码图和/或融合图像与经真值标注的掩码图之间的损失率反应了多分辨率融合算法输出的预测掩码图和/或融合图像与原始输入图像之间的相似性。将该损失率反馈至多分辨率融合算法,与经真值标注的掩码图一起对多分辨率融合算法输出执行有监督学习训练,在达到训练拟合回归性的同时通过不断的训练和学习提升多分辨率融合算法的生成目标对象全清晰的融合图像方面的准确性。
根据本申请的一些实施例,可选地,所述融合模块410被进一步配置成基于所述损失率或所述经标注掩码图或这两者的组合来更新所述融合模块。
将计算所得的损失率与经真值标注的掩码图一起反馈至多分辨率融合算法,对多分辨率融合算法输出执行有监督学习训练,在达到训练拟合回归性的同时通过不断的训练和学习提升多分辨率融合算法的生成目标对象全清晰的融合图像方面的准确性。
根据本申请的一些实施例,参考图2-图4,本申请提供了一种图像的处理系统,包括:
采集模块405,其被配置成采集包含极耳的输入图像序列,所述输入图像序列包含索引1……k,其中根据测量得到的极耳的宽度L和连续拍摄输入图像序列的帧数k来设置用于采集输入图像序列的CCD相机所采用的步长,即m=L/step;
融合模块410,所述融合模块包括:
编码器,其被配置成对所述输入图像序列执行深度特征提取;对所提取的深度特征执行多分辨率融合,得到融合后的多分辨率特征;
解码器,其被配置成基于融合后的多分辨率特征生成预测掩码图,其中所述预测掩码图的每一像素指示输入图像的索引,所述索引指示所述融合图像的每一像素源自的输入图像;
所述融合模块410被进一步配置成根据所述预测掩码图和所述输入图像序列生成所述融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含极耳的每一像素包括所述输入图像序列中极耳的一部分在其中聚焦的一个输入图像的对应位置的像素;以及基于所述损失率或所述经标注掩码图或这两者的组合来更新用于执行所述多分辨率融合的多分辨率融合算法;
初始融合模块415,其被配置成将2D融合算法应用于所述输入图像序列以生成初始融合图像;
标注接收模块420,其被配置成接收对所述初始融合图像的真值标注以生成经标注掩码图,其中所述经标注掩码图指示所述初始融合图像中的极耳的一个或多个像素是否聚焦;
损失率模块425,其被配置成利用MSE和SSIM损失函数来计算所述预测掩码图与所述经标注掩码图之间的损失率,其中
$$\mathrm{MSE}=\frac{1}{m\times n}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(y_{ij}-\hat{y}_{ij}\right)^{2}$$
其中 $y_{ij}$ 表示图像中第i行第j列像素值对应的真实标签值(经标注掩码图中的GT值),$\hat{y}_{ij}$ 表示图像中第i行第j列像素值对应的预测标签值(预测掩码图中的值),$m\times n$ 表示图像分辨率,
$$\mathrm{SSIM}(x,y)=\frac{\left(2u_{x}u_{y}+C_{1}\right)\left(2\sigma_{xy}+C_{2}\right)}{\left(u_{x}^{2}+u_{y}^{2}+C_{1}\right)\left(\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}\right)}$$
其中以11×11的划窗大小来选择图像块,u表示图像块(patch)的所有像素的平均值,σ表示图像块内的像素方差,$C_{1}=(K_{1}\times R)^{2}$,$C_{2}=(K_{2}\times R)^{2}$,R的值为0到255,$K_{1}=0.01$,而 $K_{2}=0.03$;将计算所得的所述损失率反馈至所述融合模块。
参照图5,其是适于实现根据本申请的一些实施例的图像的处理系统的计算机系统的结构框图。如图5所示,该系统包括:其上存储有计算机可执行指令存 储器028;以及与所述存储器028耦合的处理器016,其中所述计算机可执行指令在由所述处理器016执行时致使所述系统执行如下操作:采集包含目标对象的输入图像序列;以及对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
在一些示例中,图5示出了适于用来实现本发明实施方式的示例性计算机系统012的框图。图5显示的计算机系统012仅仅是一个示例,不应对本发明实施例的功能和使用范围带来任何限制。
如图5所示,计算机系统012以通用计算设备的形式表现。计算机系统012的组件可以包括但不限于:一个或者多个处理器或者处理单元016,系统存储器028,连接不同系统组件(包括系统存储器028和处理单元016)的总线018。
总线018表示几类总线结构中的一种或多种,包括存储器总线或者存储器控制器,外围总线,图形加速端口,处理器或者使用多种总线结构中的任意总线结构的局域总线。举例来说,这些体系结构包括但不限于工业标准体系结构(ISA)总线,微通道体系结构(MAC)总线,增强型ISA总线、视频电子标准协会(VESA)局域总线以及外围组件互连(PCI)总线。
计算机系统012典型地包括多种计算机系统可读介质。这些介质可以是任何能够被计算机系统012访问的可用介质,包括易失性和非易失性介质,可移动的和不可移动的介质。
系统存储器028可以包括易失性存储器形式的计算机系统可读介质,例如随机存取存储器(RAM)030和/或高速缓存存储器032。计算机系统012可以进一步包括其它可移动/不可移动的、易失性/非易失性计算机系统存储介质。仅作为举例,存储系统034可以用于读写不可移动的、非易失性磁介质(图5未显示,通常称为“硬盘驱动器”)。尽管图5中未示出,可以提供用于对可移动非易失性磁盘(例如“软盘”)读写的磁盘驱动器,以及对可移动非易失性光盘(例如CD-ROM、DVD-ROM或者其它光介质)读写的光盘驱动器。在这些情况下,每个驱动器可以通过一个或者多个数据介质接口与总线018相连。存储器028可以包括至少一个程序产品,该程序产品具有一组(例如至少一个)程序模块,这些程序模块被配置 以执行本发明各实施例的功能。
具有一组(至少一个)程序模块042的程序/实用工具040,可以存储在例如存储器028中,这样的程序模块042包括——但不限于——操作系统、一个或者多个应用程序、其它程序模块以及程序数据,这些示例中的每一个或某种组合中可能包括网络环境的实现。程序模块042通常执行本发明所描述的实施例中的功能和/或方法。
计算机系统012也可以与一个或多个外部设备014(例如键盘、指向设备、显示器024等)通信,在本发明中,计算机系统012与外部雷达设备进行通信,还可与一个或者多个使得用户能与该计算机系统012交互的设备通信,和/或与使得该计算机系统012能与一个或多个其它计算设备进行通信的任何设备(例如网卡,调制解调器等等)通信。这种通信可以通过输入/输出(I/O)接口022进行。并且,计算机系统012还可以通过网络适配器020与一个或者多个网络(例如局域网(LAN),广域网(WAN)和/或公共网络,例如因特网)通信。如图所示,网络适配器020通过总线018与计算机系统012的其它模块通信。应当明白,尽管图7中未示出,可以结合计算机系统012使用其它硬件和/或软件模块,包括但不限于:微代码、设备驱动器、冗余处理单元、外部磁盘驱动阵列、RAID系统、磁带驱动器以及数据备份存储系统等。
处理单元016通过运行存储在系统存储器028中的程序,从而执行各种功能应用以及数据处理,例如实现本发明实施例所提供的方法流程。
上述的计算机程序可以设置于计算机存储介质中,即该计算机存储介质被编码有计算机程序,该程序在被一个或多个计算机执行时,使得一个或多个计算机执行本发明上述实施例中所示的方法流程和/或装置操作。例如,被上述一个或多个处理器执行本发明实施例所提供的方法流程。
随着时间、技术的发展,介质含义越来越广泛,计算机程序的传播途径不再受限于有形介质,还可以直接从网络下载等。可以采用一个或多个计算机可读的介质的任意组合。
计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的 例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本文件中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。
计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括——但不限于——电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。
计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括——但不限于——无线、电线、光缆、RF等等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言或其组合来编写用于执行本发明操作的计算机程序代码,所述程序设计语言包括面向对象的程序设计语言-诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言-诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
在本申请实施例的技术方案中,学习针对目标对象的每个聚焦像素点的输入图像序列的索引,提取输入图像序列中对应于最清晰的部分以对其执行像素级融合以将同一场景下聚焦区域不同的图像序列融合成单张目标对象全清晰的图像,实现像素级精度的全清晰且保留目标对象细节信息的融合图像,有效提高图像的信息利用率。
最后应说明的是:以上各实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述各实施例对本申请进行了详细的说明,本领域的普通技术人员 应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围,其均应涵盖在本申请的权利要求和说明书的范围当中。尤其是,只要不存在结构冲突,各个实施例中所提到的各项技术特征均可以任意方式组合起来。本申请并不局限于文中公开的特定实施例,而是包括落入权利要求的范围内的所有技术方案。

Claims (14)

  1. 一种图像的处理方法,包括:
    采集包含目标对象的输入图像序列;以及
    对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
  2. 如权利要求1所述的方法,其特征在于,采集输入图像序列进一步包括:
    基于所述输入图像序列的帧数和所述输入图像序列中的所述目标对象的尺寸来设置用于采集所述输入图像序列的相机的步长。
  3. 如权利要求1-2中任一项所述的方法,其特征在于,所述输入图像序列包含索引,对所述输入图像序列执行多分辨率融合以生成融合图像进一步包括:
    对所述输入图像序列执行特征提取;
    对所提取的特征执行多分辨率融合,得到融合后的多分辨率特征;
    基于融合后的多分辨率特征生成预测掩码图,其中所述预测掩码图的每一像素指示输入图像的索引,所述索引指示所述融合图像的每一像素源自的输入图像;以及
    根据所述预测掩码图和所述输入图像序列,生成所述融合图像。
  4. 如权利要求1-3中任一项所述的方法,其特征在于,所述方法进一步包括:
    将2D融合算法应用于所述输入图像序列以生成初始融合图像;以及
    接收对所述初始融合图像的真值标注以生成经标注掩码图,其中所述经标注掩码图指示所述初始融合图像中的所述目标对象的一个或多个像素是否聚焦。
  5. 如权利要求4所述的方法,其特征在于,所述方法进一步包括:
    计算所述预测掩码图与所述经标注掩码图之间的损失率;
    将计算所得的所述损失率反馈至用于执行所述多分辨率融合的多分辨率融合算法。
  6. 如权利要求5所述的方法,其特征在于,所述方法进一步包括:
    基于所述损失率或所述经标注掩码图或这两者的组合来更新用于执行所述多分辨率融合的多分辨率融合算法。
  7. 一种图像的处理系统,包括:
    采集模块,其被配置成采集包含目标对象的输入图像序列;以及
    融合模块,其被配置成对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
  8. 如权利要求7所述的系统,其特征在于,所述采集模块被进一步配置成基于所述输入图像序列的帧数和所述输入图像序列中的所述目标对象的尺寸来设置用于采集所述输入图像序列的相机的步长。
  9. 如权利要求7-8中任一项所述的系统,其特征在于,所述输入图像序列包含索引,所述融合模块进一步包括:
    编码器,其被配置成:
    对所述输入图像序列执行特征提取;
    对所提取的特征执行多分辨率融合,得到融合后的多分辨率特征;以及
    解码器,其被配置成:
    基于融合后的多分辨率特征生成预测掩码图,其中所述预测掩码图的每一像素指示输入图像的索引,所述索引指示所述融合图像的每一像素源自的输入图像。
  10. 如权利要求9所述的系统,其特征在于,所述融合模块被进一步配置成 根据所述预测掩码图和所述输入图像序列生成所述融合图像。
  11. 如权利要求7-10中任一项所述的系统,其特征在于,所述系统进一步包括:
    初始融合模块,其被配置成将2D融合算法应用于所述输入图像序列以生成初始融合图像;以及
    标注接收模块,其被配置成接收对所述初始融合图像的真值标注以生成经标注掩码图,其中所述经标注掩码图指示所述初始融合图像中的所述目标对象的一个或多个像素是否聚焦。
  12. 如权利要求11所述的系统,其特征在于,所述系统进一步包括:
    损失率模块,其被配置成:
    计算所述预测掩码图与所述经标注掩码图之间的损失率;
    将计算所得的所述损失率反馈至所述融合模块。
  13. 如权利要求12所述的系统,其特征在于,所述融合模块被进一步配置成基于所述损失率或所述经标注掩码图或这两者的组合来更新用于执行所述多分辨率融合的多分辨率融合算法。
  14. 一种图像的处理系统,包括:
    其上存储有计算机可执行指令的存储器;以及
    与所述存储器耦合的处理器,其中所述计算机可执行指令在由所述处理器执行时致使所述系统执行如下操作:
    采集包含目标对象的输入图像序列;以及
    对所述输入图像序列执行多分辨率融合以生成单个融合图像,其中所述融合图像的像素包括所述输入图像序列中的一个输入图像的对应位置的像素,所述融合图像中包含所述目标对象的每一像素包括所述输入图像序列中所述目标对象的一部分在其中聚焦的一个输入图像的对应位置的像素。
PCT/CN2021/136054 2021-12-07 2021-12-07 图像的处理方法和系统 WO2023102724A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202180078455.3A CN116848547A (zh) 2021-12-07 2021-12-07 图像的处理方法和系统
EP21960096.2A EP4220543A4 (en) 2021-12-07 2021-12-07 IMAGE PROCESSING METHOD AND SYSTEM
PCT/CN2021/136054 WO2023102724A1 (zh) 2021-12-07 2021-12-07 图像的处理方法和系统
US18/140,642 US11948287B2 (en) 2021-12-07 2023-04-28 Image processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/136054 WO2023102724A1 (zh) 2021-12-07 2021-12-07 图像的处理方法和系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/140,642 Continuation US11948287B2 (en) 2021-12-07 2023-04-28 Image processing method and system

Publications (1)

Publication Number Publication Date
WO2023102724A1 true WO2023102724A1 (zh) 2023-06-15

Family

ID=86729496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/136054 WO2023102724A1 (zh) 2021-12-07 2021-12-07 图像的处理方法和系统

Country Status (4)

Country Link
US (1) US11948287B2 (zh)
EP (1) EP4220543A4 (zh)
CN (1) CN116848547A (zh)
WO (1) WO2023102724A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117078677B (zh) * 2023-10-16 2024-01-30 江西天鑫冶金装备技术有限公司 一种用于始极片的缺陷检测方法及系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103308452A (zh) * 2013-05-27 2013-09-18 中国科学院自动化研究所 一种基于景深融合的光学投影断层成像图像获取方法
CN104182952A (zh) * 2014-08-19 2014-12-03 中国科学院西安光学精密机械研究所 多聚焦序列图像融合方法
CN110334779A (zh) * 2019-07-16 2019-10-15 大连海事大学 一种基于PSPNet细节提取的多聚焦图像融合方法
CN110533623A (zh) * 2019-09-06 2019-12-03 兰州交通大学 一种基于监督学习的全卷积神经网络多聚焦图像融合方法
CN112241940A (zh) * 2020-09-28 2021-01-19 北京科技大学 一种多张多聚焦图像融合方法及装置
US20210065336A1 (en) * 2019-08-29 2021-03-04 Institut Mines Telecom Method for generating a reduced-blur digital image
CN113012174A (zh) * 2021-04-26 2021-06-22 中国科学院苏州生物医学工程技术研究所 一种图像融合方法、系统及设备

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103308452A (zh) * 2013-05-27 2013-09-18 中国科学院自动化研究所 一种基于景深融合的光学投影断层成像图像获取方法
CN104182952A (zh) * 2014-08-19 2014-12-03 中国科学院西安光学精密机械研究所 多聚焦序列图像融合方法
CN110334779A (zh) * 2019-07-16 2019-10-15 大连海事大学 一种基于PSPNet细节提取的多聚焦图像融合方法
US20210065336A1 (en) * 2019-08-29 2021-03-04 Institut Mines Telecom Method for generating a reduced-blur digital image
CN110533623A (zh) * 2019-09-06 2019-12-03 兰州交通大学 一种基于监督学习的全卷积神经网络多聚焦图像融合方法
CN112241940A (zh) * 2020-09-28 2021-01-19 北京科技大学 一种多张多聚焦图像融合方法及装置
CN113012174A (zh) * 2021-04-26 2021-06-22 中国科学院苏州生物医学工程技术研究所 一种图像融合方法、系统及设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4220543A4 *

Also Published As

Publication number Publication date
EP4220543A1 (en) 2023-08-02
US11948287B2 (en) 2024-04-02
EP4220543A4 (en) 2024-01-24
CN116848547A (zh) 2023-10-03
US20230267586A1 (en) 2023-08-24

Similar Documents

Publication Publication Date Title
CN108710885B (zh) 目标对象的检测方法和装置
Fan et al. Dual refinement underwater object detection network
WO2018196396A1 (zh) 基于一致性约束特征学习的行人再识别方法
JP7265034B2 (ja) 人体検出用の方法及び装置
US9400939B2 (en) System and method for relating corresponding points in images with different viewing angles
Ma et al. Stage-wise salient object detection in 360 omnidirectional image via object-level semantical saliency ranking
Azagra et al. Endomapper dataset of complete calibrated endoscopy procedures
CN112862877B (zh) 用于训练图像处理网络和图像处理的方法和装置
Liao et al. Model-free distortion rectification framework bridged by distortion distribution map
WO2019214321A1 (zh) 车辆损伤识别的处理方法、处理设备、客户端及服务器
CN113724135A (zh) 图像拼接方法、装置、设备及存储介质
CN111382647B (zh) 一种图片处理方法、装置、设备及存储介质
WO2021169642A1 (zh) 基于视频的眼球转向确定方法与系统
Yun et al. Panoramic vision transformer for saliency detection in 360∘ videos
US11967125B2 (en) Image processing method and system
WO2023102724A1 (zh) 图像的处理方法和系统
CN108229281B (zh) 神经网络的生成方法和人脸检测方法、装置及电子设备
US11699216B2 (en) Automatic fisheye camera calibration for video analytics
CN116664694A (zh) 图像亮度获取模型的训练方法、图像获取方法及移动终端
Ji et al. End to end multi-scale convolutional neural network for crowd counting
CN114863124A (zh) 模型训练方法、息肉检测方法、相应装置、介质及设备
CN111062479B (zh) 基于神经网络的模型快速升级方法及装置
CN110766611A (zh) 图像处理方法、装置、存储介质及电子设备
CN106296568A (zh) 一种镜头类型的确定方法、装置及客户端
US20230386063A1 (en) Method for generating depth in images, electronic device, and non-transitory storage medium

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021960096

Country of ref document: EP

Effective date: 20230427

WWE Wipo information: entry into national phase

Ref document number: 202180078455.3

Country of ref document: CN