WO2024021413A1 - Image segmentation method combined with superpixel and multi-scale hierarchical feature recognition - Google Patents

Image segmentation method combined with superpixel and multi-scale hierarchical feature recognition

Info

Publication number
WO2024021413A1
WO2024021413A1 PCT/CN2022/135319 CN2022135319W WO2024021413A1 WO 2024021413 A1 WO2024021413 A1 WO 2024021413A1 CN 2022135319 W CN2022135319 W CN 2022135319W WO 2024021413 A1 WO2024021413 A1 WO 2024021413A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
scale
pixel
superpixel
neural network
Prior art date
Application number
PCT/CN2022/135319
Other languages
English (en)
French (fr)
Inventor
张登银
倪文晔
金小飞
杜群荐
Original Assignee
南京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京邮电大学
Priority to US18/151,909 (US11847811B1)
Publication of WO2024021413A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/54 Extraction of image or video features relating to texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • The invention relates to an image segmentation method that combines superpixels and multi-scale hierarchical feature recognition, and belongs to the technical field of image processing.
  • Image recognition is based on template matching and evolved from human visual recognition.
  • A person recognizes a target object by comparing it with what is stored in memory and then deciding what the object is.
  • Image recognition follows the same principle.
  • In modern technology, image recognition plays a vital role; fields such as autonomous driving, face recognition, and missile tracking all depend on it.
  • Image recognition technology is developing ever faster. The main approaches to date include statistical recognition, fuzzy-set recognition, neural-network recognition, template-matching recognition, structural recognition, and support-vector-machine recognition.
  • Image segmentation saves considerable computing resources: it divides the image into regions according to certain parameters and extracts the useful regions from them. Segmentation methods based on thresholds, regions, and cluster analysis, as well as methods that introduce superpixels, are all in common use today.
  • Image segmentation is a crucial part of image recognition and a prerequisite for image understanding and analysis. Choosing a segmentation method suited to the usage scenario can greatly reduce recognition time and thus save considerable computing resources. For example, in autonomous driving, important traffic information such as lane lines, traffic signs, and intersection traffic lights can be identified quickly and accurately through image segmentation. Image segmentation is therefore of great importance.
  • Image segmentation refers to dividing pixels with similar characteristics in an image into a category, so that each category has different semantics.
  • Image segmentation methods can be roughly divided into two categories: traditional unsupervised methods and supervised deep learning methods.
  • Traditional image segmentation methods are mainly divided into five categories, namely: threshold-based image segmentation method, edge-based image segmentation method, graph-based image segmentation method, cluster-based image segmentation method and region-based image segmentation method.
  • Threshold-based image segmentation is a classic method. It classifies the grayscale histogram of the image by setting a threshold. In essence, it exploits the consistency of gray levels within a region and the diversity of gray levels between regions: a chosen threshold divides the image into target objects and background. This method is relatively simple to implement but often performs poorly on more complex segmentation tasks.
  • Edge-based image segmentation methods segment an image by detecting its edges. They typically exploit the fact that pixel gray values differ between regions and change sharply at region boundaries to obtain the edge points of the image, and then connect the edge points to form segmented regions. Unclosed boundaries, however, may lead to incomplete segmentation results.
  • Graph-based image segmentation converts the segmentation problem into a graph partitioning problem: the image is mapped to a weighted undirected graph, which is then divided into multiple classes by minimizing a given cost function. The computational cost is high and the algorithm is complex.
  • the clustering-based image segmentation method gathers similar pixels into a category and iterates repeatedly until the results converge, but it requires pre-setting the number of clusters.
  • Region-based image segmentation methods are divided into region growing methods and region splitting and merging methods.
  • The region growing method selects a group of seed points as starting points and, according to a growth criterion, merges nearby pixels that are similar to a seed point into that seed point's region, so that the region grows and expands. The region splitting and merging method uses a similarity criterion to split the image into regions with different characteristics and then merges regions with the same characteristics, repeating until no further splitting or merging occurs.
  • Region-based segmentation can significantly reduce noise interference and is more robust.
  • Supervised deep learning image segmentation methods: with the development of deep learning, researchers have increasingly adopted convolutional neural networks as the main tool for image processing, since they can make full use of the deep feature information of the image to complete the segmentation task.
  • The purpose of the present invention is to provide an image segmentation method that combines superpixels and multi-scale hierarchical feature recognition, to solve the following problems of prior-art image segmentation methods based on convolutional neural network structures: the dataset must be normalized, images of different sizes mixed together are difficult to train on, and targets in the image are not manually delineated during early image preprocessing, so pixels at overlapping edges of the image may be misclassified.
  • An image segmentation method that combines superpixel and multi-scale hierarchical feature recognition, including:
  • The multi-scale hierarchical feature matrix is passed through the multi-layer perceptron to obtain the pixel class distribution of the image to be segmented;
  • A superpixel method improved with LBP texture features segments the image to be segmented, whose pixel class distribution has been obtained, and regions of the image are merged by color mean to identify and segment the image targets.
  • The construction of the multi-scale Gaussian pyramid includes:
  • The first layer of the first group of the original image, after Gaussian convolution, is used as the second layer of the first group of the pyramid.
  • the two-dimensional Gaussian convolution function corresponding to the pixel position (x, y) on the image is:
  • The scale space $L(x,y,\sigma)$ of the image is defined as the convolution of a variable-scale Gaussian function $G(x,y,\sigma)$ with the original image $I(x,y)$, then:
  • The L-th layer image of each group is obtained in this way, where the image area of each group is 1/2 that of the previous group. Repeating this yields O groups of L layers each, O*L images in total, which form the multi-scale Gaussian pyramid SIFT.
  • the multi-scale convolutional neural network includes: three identical CNN structural networks.
  • the CNN structural network is composed of three stages.
  • The first and second stages each consist of a filter bank, the nonlinear activation function tanh, and a pooling operation; the third stage consists of a filter bank.
  • The filter bank of the first stage contains 16 convolution kernels, 8 of which are connected to the Y channel of the input image and 8 to the U and V channels, converting the original image into a 16-dimensional feature map;
  • The filter bank of the second stage is connected to the result of the max pooling operation of the first stage. It contains 64 convolution kernels, each connected to any 8 feature maps, and converts the 16-dimensional feature map of the previous stage into a 64-dimensional feature map.
  • The filter bank of the third stage is connected to the result of the max pooling operation of the second stage. It contains 256 convolution kernels, each connected to any 32 feature maps, and converts the 64-dimensional feature map of the previous stage into a 256-dimensional feature map.
  • The convolution kernels of the filter banks are all 7 × 7, and the pooling operation is 2 × 2 max pooling.
  • The multi-scale convolutional neural network is denoted $f_n$, with internal parameters $\theta_n$.
  • $\theta_0$ is the initial parameter of the model.
  • $W_S$ is the weight matrix of the S-th stage
  • $H_{S-1}$ is the output of the (S-1)-th stage
  • $H_0 = X_n$.
  • multi-scale hierarchical feature matrix formula is:
  • image pixel category distribution includes:
  • the softmax function is used to calculate the standardized prediction probability distribution that pixel i belongs to category a.
  • w is a temporary weight matrix only used for learning features
  • F i represents the multi-scale hierarchical feature expression vector corresponding to the location of pixel i.
  • the superpixel segmentation adopts an improved method based on LBP texture features.
  • the algorithm of this method is specifically implemented as follows:
  • $d_{lab}=\sqrt{(l_k-l_i)^2+(a_k-a_i)^2+(b_k-b_i)^2}$
  • d lab is the color distance
  • d xy is the spatial distance
  • m is the weight coefficient for adjusting the spatial distance
  • S is the distance between seed points. It can be seen that the smaller the D value, the more similar the pixels are;
  • superpixels are assigned to neighboring superpixels in a "Z" shape.
  • the superpixel segmentation method includes:
  • the LBP algorithm is used to compare the gray value of pixels in its neighborhood with the threshold, thereby obtaining a binary code to express local texture features.
  • the LBP value calculation formula is:
  • i c is the gray value of the central pixel
  • i p is the gray value of the pixel in the neighborhood
  • s is a sign function
  • The LBP texture feature is introduced into the SLIC algorithm.
  • When the improved SLIC algorithm initializes the seed points, the LBP value is added, that is, $C_k=[l_k,a_k,b_k,x_k,y_k,LBP_k]^T$
  • n is the weight coefficient for adjusting texture distance
  • the present invention can obtain accurate over-segmentation including the edge of the image target.
  • Before segmentation, the image is preprocessed and locally enhanced so that the image targets become more distinct and easier to separate from the background. The effect of LBP texture features on image segmentation is then exploited: LBP texture features are added during superpixel segmentation so that the resulting superpixel blocks fit the target edges more closely. Regions with similar color features are then merged, and the image is finally segmented.
  • This method can extract the image result completely and accurately, addresses the possible misclassification of pixels at overlapping edges of the image, and reduces the requirements on image preprocessing.
  • Figure 1 is a schematic diagram of the overall structure of multi-scale hierarchical feature extraction using the method of the present invention
  • Figure 2 is a schematic diagram of the image recognition process of the method of the present invention.
  • Figure 3 is a schematic structural diagram of the convolutional neural network of the method of the present invention.
  • an image segmentation method combining superpixel and multi-scale hierarchical feature recognition including:
  • Step 1: Input the image to be segmented into a pre-constructed multi-scale Gaussian pyramid to extract multi-scale hierarchical features and obtain multiple multi-scale images;
  • Step 2: Input the multi-scale images into the pre-trained multi-scale convolutional neural network to generate feature maps;
  • Step 3: Sample the feature maps and combine images of the same scale to generate a multi-scale hierarchical feature matrix;
  • Step 4: Pass the multi-scale hierarchical feature matrix through the multi-layer perceptron to obtain the pixel class distribution of the image to be segmented;
  • Step 5: Use the superpixel method improved with LBP texture features to segment the image to be segmented, whose pixel class distribution has been obtained, and merge regions of the image by color mean to identify and segment the image targets.
  • The construction of the multi-scale Gaussian pyramid includes:
  • Obtain the VOC2012 dataset as training samples and test samples;
  • The first layer of the first group of the original image, after Gaussian convolution, is used as the second layer of the first group of the pyramid.
  • The two-dimensional Gaussian convolution function corresponding to the pixel position (x, y) on the image is:
  • The scale space $L(x,y,\sigma)$ of the image is defined as the convolution of a variable-scale Gaussian function $G(x,y,\sigma)$ with the original image $I(x,y)$, then:
  • The L-th layer image of each group is obtained in this way, where the image area of each group is 1/2 that of the previous group. Repeating this yields O groups of L layers each, O*L images in total, which form the multi-scale Gaussian pyramid SIFT.
  • In step 1, the original image must be converted to the YUV color space before being input to the multi-scale Gaussian pyramid SIFT, after which the multi-scale Gaussian pyramid SIFT is constructed.
  • the multi-scale convolutional neural network includes: three identical CNN structural networks.
  • the CNN structural network consists of three stages.
  • The first and second stages each consist of a filter bank, the nonlinear activation function tanh, and a pooling operation; the third stage consists of a filter bank.
  • The filter bank of the first stage contains 16 convolution kernels, 8 of which are connected to the Y channel of the input image and 8 to the U and V channels, converting the original image into a 16-dimensional feature map.
  • The filter bank of the second stage is connected to the result of the max pooling operation of the first stage. It contains 64 convolution kernels, each connected to any 8 feature maps, and converts the 16-dimensional feature map of the previous stage into a 64-dimensional feature map.
  • The filter bank of the third stage is connected to the result of the max pooling operation of the second stage. It contains 256 convolution kernels, each connected to any 32 feature maps, and converts the 64-dimensional feature map of the previous stage into a 256-dimensional feature map.
  • The convolution kernels of the filter banks are all 7 × 7, and the pooling operation is 2 × 2 max pooling.
  • The data of adjacent areas of each image in the multi-scale pyramid are zero-meaned and normalized.
  • The multi-scale convolutional neural network is denoted $f_n$, with internal parameters $\theta_n$.
  • $\theta_0$ is the initial parameter of the model.
  • $W_S$ is the weight matrix of the S-th stage
  • $H_{S-1}$ is the output of the (S-1)-th stage
  • $H_0 = X_n$.
  • the output feature map of the multi-scale convolutional neural network model is upsampled, and images of the same scale are combined together to generate an N-dimensional feature matrix F, that is
  • the image pixel category distribution includes:
  • the softmax function is used to calculate the standardized prediction probability distribution that pixel i belongs to category a.
  • w is a temporary weight matrix only used for learning features
  • F i represents the multi-scale hierarchical feature expression vector corresponding to the location of pixel i.
  • the superpixel segmentation adopts an improved method based on LBP texture features.
  • the algorithm of this method is specifically implemented as follows:
  • $d_{lab}=\sqrt{(l_k-l_i)^2+(a_k-a_i)^2+(b_k-b_i)^2}$
  • d lab is the color distance
  • d xy is the spatial distance
  • m is the weight coefficient for adjusting the spatial distance
  • S is the distance between seed points. It can be seen that the smaller the D value, the more similar the pixels are;
  • superpixels are assigned to neighboring superpixels in a "Z" shape.
  • the superpixel segmentation method includes:
  • the LBP algorithm is used to compare the gray value of pixels in its neighborhood with the threshold, thereby obtaining a binary code to express local texture features.
  • the LBP value calculation formula is:
  • i c is the gray value of the central pixel
  • i p is the gray value of the pixel in the neighborhood
  • s is a sign function
  • The LBP texture feature is introduced into the SLIC algorithm.
  • When the improved SLIC algorithm initializes the seed points, the LBP value is added, that is, $C_k=[l_k,a_k,b_k,x_k,y_k,LBP_k]^T$
  • n is the weight coefficient for adjusting texture distance

Abstract

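The invention discloses an image segmentation method combining superpixels and multi-scale hierarchical feature recognition. The method is built on a convolutional neural network model: multi-scale hierarchical features extracted from the Gaussian pyramid of the image serve as the basis for recognition, and the network is connected to a multi-layer perceptron to recognize each pixel of the image. This addresses the problems that prior-art image segmentation methods based on convolutional neural network structures require the dataset to be normalized and that images of different sizes mixed together are difficult to train on. The method also performs superpixel segmentation of the image: a superpixel method improved with LBP texture features segments the original image so that the resulting superpixel blocks fit the target edges more closely, and regions of the original image are then merged by color mean, finally recognizing each target in the image. This addresses the problem that targets in the image are not manually delineated during early image preprocessing, so that pixels at overlapping edges of the image may be misclassified.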

Description

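Image segmentation method combined with superpixel and multi-scale hierarchical feature recognition
Technical Field
The invention relates to an image segmentation method combining superpixels and multi-scale hierarchical feature recognition, and belongs to the technical field of image processing.
Background Art
Image recognition works by template matching and evolved from human visual recognition. A person recognizes a target object by comparing it with what is stored in memory and then deciding what the object is. Image recognition follows the same principle: features are first extracted from the original image and then compared with the required target features, which yields the recognition result. In modern technology, image recognition plays a vital role; fields such as autonomous driving, face recognition, and missile tracking all depend on it. As science and technology advance, image recognition technology is developing ever faster. The main approaches to date include statistical recognition, fuzzy-set recognition, neural-network recognition, template-matching recognition, structural recognition, and support-vector-machine recognition. In addition, under certain special conditions, marker targets can be used to assist recognition.
Today, essentially all image recognition techniques involve image preprocessing, image segmentation, feature extraction, and feature matching and recognition. Among these steps, image segmentation saves considerable computing resources: it divides the image into regions according to certain parameters and extracts the useful regions from them. Segmentation methods based on thresholds, regions, and cluster analysis, as well as methods that introduce superpixels, are all in common use today.
Image segmentation is a crucial part of image recognition and a prerequisite for image understanding and analysis. Choosing a segmentation method suited to the usage scenario can greatly reduce recognition time and thus save considerable computing resources. For example, in autonomous driving, important traffic information such as lane lines, traffic signs, and intersection traffic lights can be identified quickly and accurately through image segmentation. Image segmentation is therefore of great importance.
As research in fields related to image segmentation has progressed in recent years, many mature segmentation techniques have emerged. Image segmentation means assigning pixels with similar characteristics to one class, so that each class carries a different semantic meaning. Segmentation methods fall roughly into two categories: traditional unsupervised methods and supervised deep learning methods.
Traditional image segmentation methods fall into five main categories: threshold-based, edge-based, graph-based, clustering-based, and region-based methods.
Threshold-based image segmentation is a classic method. It classifies the grayscale histogram of the image by setting a threshold. In essence, it exploits the consistency of gray levels within a region and the diversity of gray levels between regions: a chosen threshold divides the image into target objects and background. This method is relatively simple to implement but often performs poorly on more complex segmentation tasks.
Edge-based image segmentation methods segment an image by detecting its edges. They typically exploit the fact that pixel gray values differ between regions and change sharply at region boundaries to obtain the edge points of the image, and then connect the edge points to form segmented regions; unclosed boundaries, however, may lead to incomplete segmentation results. Graph-based image segmentation converts the segmentation problem into a graph partitioning problem: the image is mapped to a weighted undirected graph, which is then divided into multiple classes by minimizing a given cost function, but the computational cost is high and the algorithm is complex. Clustering-based image segmentation gathers similar pixels into one class and iterates until the result converges, but the number of clusters must be set in advance.
Region-based image segmentation methods are divided into region growing and region splitting and merging. The region growing method selects a group of seed points as starting points and, according to a growth criterion, merges nearby pixels that are similar to a seed point into that seed point's region, so that the region grows and expands. The region splitting and merging method uses a similarity criterion to split the image into regions with different characteristics and then merges regions with the same characteristics, repeating until no further splitting or merging occurs. Region-based segmentation can significantly reduce noise interference and is more robust. As for supervised deep learning segmentation methods, with the development of deep learning, researchers have increasingly adopted convolutional neural networks as the main tool for image processing, since they can make full use of the deep feature information of the image to complete the segmentation task.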
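Summary of the Invention
The purpose of the invention is to provide an image segmentation method combining superpixels and multi-scale hierarchical feature recognition, to solve the following problems of prior-art image segmentation methods based on convolutional neural network structures: the dataset must be normalized, images of different sizes mixed together are difficult to train on, and targets in the image are not manually delineated during early image preprocessing, so pixels at overlapping edges of the image may be misclassified.
Technical solution: to solve the above technical problems, the invention adopts the following technical solution:
An image segmentation method combining superpixels and multi-scale hierarchical feature recognition, comprising:
inputting the image to be segmented into a pre-constructed multi-scale Gaussian pyramid to extract multi-scale hierarchical features, obtaining multiple multi-scale images;
inputting the multi-scale images into a pre-trained multi-scale convolutional neural network to generate feature maps, sampling the feature maps, and combining images of the same scale to generate a multi-scale hierarchical feature matrix;
passing the multi-scale hierarchical feature matrix through a multi-layer perceptron to obtain the pixel class distribution of the image to be segmented;
segmenting the image to be segmented, whose pixel class distribution has been obtained, with a superpixel method improved with LBP texture features, and merging regions of the image by color mean to identify and segment the image targets.
Further, the construction of the multi-scale Gaussian pyramid comprises:
taking the first layer of the first group of the original image, after Gaussian convolution, as the second layer of the first group of the pyramid, where the two-dimensional Gaussian convolution function corresponding to pixel position (x, y) on the image is:
Figure PCTCN2022135319-appb-000001
where σ is the scale-space factor (σ = 1.6), and a larger σ gives a smoother image; m, n are the dimensions of the Gaussian matrix; M, N are the dimensions of the two-dimensional image;
the scale space $L(x,y,\sigma)$ of the image is defined as the convolution of a variable-scale Gaussian function $G(x,y,\sigma)$ with the original image $I(x,y)$, so that:
$L(x,y,\sigma)=G(x,y,\sigma)*I(x,y)$; this yields the L-th layer image of each group, where the image area of each group is 1/2 that of the previous group. Repeating this yields O groups of L layers each, O*L images in total, which form the multi-scale Gaussian pyramid SIFT.
Further, the multi-scale convolutional neural network comprises three identical CNN structural networks. Each CNN structural network consists of three stages: the first and second stages each consist of a filter bank, the nonlinear activation function tanh, and a pooling operation, and the third stage consists of a filter bank.
Further, the filter bank of the first stage contains 16 convolution kernels, 8 of which are connected to the Y channel of the input image and 8 to the U and V channels, converting the original image into a 16-dimensional feature map. The filter bank of the second stage is connected to the result of the max pooling operation of the first stage; it contains 64 convolution kernels, each connected to any 8 feature maps, and converts the 16-dimensional feature map of the previous stage into a 64-dimensional feature map. The filter bank of the third stage is connected to the result of the max pooling operation of the second stage; it contains 256 convolution kernels, each connected to any 32 feature maps, and converts the 64-dimensional feature map of the previous stage into a 256-dimensional feature map.
Further, the convolution kernels of the filter banks are all 7×7, and the pooling operation is 2×2 max pooling.
Further, the multi-scale convolutional neural network is denoted $f_n$, with internal parameters $\theta_n$. The multi-scale convolutional neural network consists of one convolutional neural network model per image scale, and all models share the same parameters, i.e. $\theta_n=\theta_0$, $n\in\{1,2,\dots,N\}$, where $\theta_0$ is the initial parameter of the model. For the convolutional neural network model at scale n, a multi-scale convolutional neural network $f_n$ with S stages satisfies
$f_n(X_n;\theta_n)=W_S H_{S-1}$
where $W_S$ is the weight matrix of the S-th stage, $H_{S-1}$ is the output of the (S-1)-th stage, and $H_0=X_n$.
Further, the multi-scale hierarchical feature matrix is given by:
$F=[f_1,\mu(f_2),\dots,\mu(f_N)]$, where μ is the upsampling function.
Further, the image pixel class distribution comprises:
adding a linear classifier to the multi-scale convolutional neural network model to learn the multi-scale hierarchical features, so as to produce a correct class prediction for each pixel of the image;
Figure PCTCN2022135319-appb-000002
is the linear classifier's normalized prediction vector for the class of pixel i. To compute the loss function, the softmax function is used to compute the normalized predicted probability distribution that pixel i belongs to class a
Figure PCTCN2022135319-appb-000003
Figure PCTCN2022135319-appb-000004
where w is a temporary weight matrix used only for learning the features, and $F_i$ is the multi-scale hierarchical feature vector corresponding to the location of pixel i.
Further, the superpixel segmentation uses an improved method based on LBP texture features, implemented as follows:
Seed point initialization: according to the set number of superpixels, seed points are distributed evenly over the image. Let the image have N pixels in total, pre-segmented into K superpixels of equal size, so that each superpixel has size N/K; the distance between adjacent seed points, i.e. the step, is
Figure PCTCN2022135319-appb-000005
and the seed point centers are then computed as $C_k=[l_k,a_k,b_k,x_k,y_k]^T$;
Distance measurement: each pixel in the neighborhood around each seed point is assigned a class label. Using the Euclidean distance, with a search range of 2S×2S, the distance D between the seed point at the cluster center and each pixel within the 2S×2S range is computed as
$d_{lab}=\sqrt{(l_k-l_i)^2+(a_k-a_i)^2+(b_k-b_i)^2}$
$d_{xy}=\sqrt{(x_k-x_i)^2+(y_k-y_i)^2}$
Figure PCTCN2022135319-appb-000006
where $d_{lab}$ is the color distance, $d_{xy}$ is the spatial distance, m is the weight coefficient that adjusts the spatial distance, and S is the distance between seed points; the smaller D is, the more similar the pixels are;
Iterative optimization: the seed point centers are updated until the error converges;
Enforcing connectivity: superpixels are assigned to neighboring superpixels following a "Z"-shaped traversal.
Further, the superpixel segmentation method comprises:
using the LBP algorithm, the gray values of the pixels in the neighborhood are compared with a threshold to obtain a binary code that describes the local texture feature; the LBP value is computed as:
Figure PCTCN2022135319-appb-000007
where $i_c$ is the gray value of the center pixel, $i_p$ is the gray value of a pixel in the neighborhood, and s is a sign function,
Figure PCTCN2022135319-appb-000008
The LBP texture feature is introduced into the SLIC algorithm: when the improved SLIC algorithm initializes the seed points, the LBP value is added, i.e.
$C_k=[l_k,a_k,b_k,x_k,y_k,LBP_k]^T$
In the distance measurement step, the texture distance is added, i.e.
$d_{LBP}=\sqrt{(LBP_k-LBP_i)^2}$
Figure PCTCN2022135319-appb-000009
where n is the weight coefficient that adjusts the texture distance;
Finally, adjacent regions with similar color features are merged to complete the segmentation of the image.
Compared with the prior art, the invention achieves the following beneficial effects. By incorporating the superpixel method, the invention obtains an accurate over-segmentation that includes the edges of the image targets. Before segmentation, the image is preprocessed and locally enhanced so that the image targets become more distinct and easier to separate from the background. The effect of LBP texture features on image segmentation is then exploited: LBP texture features are added during superpixel segmentation so that the resulting superpixel blocks fit the target edges more closely. Regions with similar color features are then merged, and the image is finally segmented. The method extracts the image result completely and accurately, addresses the possible misclassification of pixels at overlapping edges of the image, and reduces the requirements on image preprocessing.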
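Brief Description of the Drawings
Figure 1 is a schematic diagram of the overall structure of multi-scale hierarchical feature extraction in the method of the invention;
Figure 2 is a schematic flowchart of image recognition in the method of the invention;
Figure 3 is a schematic diagram of the convolutional neural network structure in the method of the invention.
Detailed Description
To make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
As shown in Figures 1 to 3, an image segmentation method combining superpixels and multi-scale hierarchical feature recognition is disclosed, comprising:
Step 1: input the image to be segmented into a pre-constructed multi-scale Gaussian pyramid to extract multi-scale hierarchical features, obtaining multiple multi-scale images;
Step 2: input the multi-scale images into a pre-trained multi-scale convolutional neural network to generate feature maps;
Step 3: sample the feature maps and combine images of the same scale to generate a multi-scale hierarchical feature matrix;
Step 4: pass the multi-scale hierarchical feature matrix through a multi-layer perceptron to obtain the pixel class distribution of the image to be segmented;
Step 5: segment the image to be segmented, whose pixel class distribution has been obtained, with the superpixel method improved with LBP texture features, and merge regions of the image by color mean to identify and segment the image targets.
The method is described in detail below:
The construction of the multi-scale Gaussian pyramid comprises:
obtaining the VOC2012 dataset as training samples and test samples;
taking the first layer of the first group of the original image, after Gaussian convolution, as the second layer of the first group of the pyramid, where the two-dimensional Gaussian convolution function corresponding to pixel position (x, y) on the image is:
Figure PCTCN2022135319-appb-000010
where σ is the scale-space factor (σ = 1.6), and a larger σ gives a smoother image; m, n are the dimensions of the Gaussian matrix; M, N are the dimensions of the two-dimensional image;
the scale space $L(x,y,\sigma)$ of the image is defined as the convolution of a variable-scale Gaussian function $G(x,y,\sigma)$ with the original image $I(x,y)$, so that:
$L(x,y,\sigma)=G(x,y,\sigma)*I(x,y)$; this yields the L-th layer image of each group, where the image area of each group is 1/2 that of the previous group. Repeating this yields O groups of L layers each, O*L images in total, which form the multi-scale Gaussian pyramid SIFT.
In step 1, the original image must be converted to the YUV color space before being input to the multi-scale Gaussian pyramid SIFT, after which the multi-scale Gaussian pyramid SIFT is constructed.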
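The pyramid construction can be illustrated with a minimal pure-Python sketch that works on a single-channel image stored as a list of rows. Several details here are assumptions rather than statements from the source: the function names, truncating the Gaussian at a radius of 3σ, replicating border pixels, growing σ geometrically inside a group, and downsampling between groups by keeping every second row and column (as in common SIFT implementations; the text itself only says each group has 1/2 the area of the previous one).

```python
import math

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian; truncation at 3*sigma is an assumption."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian convolution L = G * I with replicated borders (assumption)."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[k + radius] * row[clamp(x + k, width)]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for row in image]
    return [[sum(kernel[k + radius] * rows[clamp(y + k, height)][x]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]

def build_pyramid(image, groups, layers, sigma0=1.6):
    """Return `groups` lists of `layers` images each (groups * layers in total)."""
    pyramid = []
    base = image
    for _ in range(groups):
        group = [base]  # layer 1 is the unblurred base of the group
        for layer in range(1, layers):
            # Layer 2 uses sigma0; later layers use an assumed geometric schedule.
            group.append(blur(base, sigma0 * 2 ** ((layer - 1) / layers)))
        pyramid.append(group)
        # Assumed downsampling: keep every second row and column.
        base = [row[::2] for row in group[-1][::2]]
    return pyramid
```

For a YUV input, the same function could be applied to each channel separately; that per-channel treatment is also an assumption.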
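In step 2, as shown in Figure 3, the multi-scale convolutional neural network comprises three identical CNN structural networks. Each CNN structural network consists of three stages: the first and second stages each consist of a filter bank, the nonlinear activation function tanh, and a pooling operation, and the third stage consists of a filter bank.
The filter bank of the first stage contains 16 convolution kernels, 8 of which are connected to the Y channel of the input image and 8 to the U and V channels, converting the original image into a 16-dimensional feature map. The filter bank of the second stage is connected to the result of the max pooling operation of the first stage; it contains 64 convolution kernels, each connected to any 8 feature maps, and converts the 16-dimensional feature map of the previous stage into a 64-dimensional feature map. The filter bank of the third stage is connected to the result of the max pooling operation of the second stage; it contains 256 convolution kernels, each connected to any 32 feature maps, and converts the 64-dimensional feature map of the previous stage into a 256-dimensional feature map.
The convolution kernels of the filter banks are all 7×7, and the pooling operation is 2×2 max pooling.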
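As a worked check of the architecture just described, the following sketch counts the convolution weights per stage and traces the feature-map side length through the three stages. The kernel counts, connection counts, 7×7 kernels, and 2×2 pooling come from the text; the rest is assumed for illustration: biases are excluded, each of the 8 chroma kernels is taken to see both U and V, convolutions are unpadded, and pooling follows stages 1 and 2 only.

```python
KERNEL = 7  # 7x7 convolution kernels (from the text)
POOL = 2    # 2x2 max pooling (from the text)

# (number of kernels, input maps seen by each kernel) for every stage.
# Stage 1: 8 kernels see Y (1 map); 8 kernels see U and V (2 maps, assumed).
STAGES = [
    [(8, 1), (8, 2)],
    [(64, 8)],
    [(256, 32)],
]

def weights_per_stage(stages, kernel=KERNEL):
    """Count convolution weights per stage (biases excluded, an assumption)."""
    return [sum(n * fan_in * kernel * kernel for n, fan_in in stage)
            for stage in stages]

def output_side(side, stages=3, kernel=KERNEL, pool=POOL):
    """Side length after each stage for unpadded convolutions (assumption)."""
    sides = []
    for stage in range(stages):
        side = side - kernel + 1      # 'valid' convolution
        if stage < stages - 1:        # pooling after stages 1 and 2 only
            side //= pool
        sides.append(side)
    return sides

print(weights_per_stage(STAGES))  # [1176, 25088, 401408]
print(output_side(46))            # [20, 7, 1]
```

Under these assumptions a 46×46 input patch collapses to a single 256-dimensional output position, which gives a feel for the context each output feature covers.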
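The data of adjacent areas of each image in the multi-scale pyramid are zero-meaned and normalized. The multi-scale convolutional neural network is denoted $f_n$, with internal parameters $\theta_n$. The multi-scale convolutional neural network consists of one convolutional neural network model per image scale, and all models share the same parameters, i.e. $\theta_n=\theta_0$, $n\in\{1,2,\dots,N\}$, where $\theta_0$ is the initial parameter of the model. For the convolutional neural network model at scale n, a multi-scale convolutional neural network $f_n$ with S stages satisfies
$f_n(X_n;\theta_n)=W_S H_{S-1}$
where $W_S$ is the weight matrix of the S-th stage, $H_{S-1}$ is the output of the (S-1)-th stage, and $H_0=X_n$.
Finally, the output feature maps of the multi-scale convolutional neural network model are upsampled, and images of the same scale are combined to generate the N-dimensional feature matrix F, i.e.
$F=[f_1,\mu(f_2),\dots,\mu(f_N)]$, where μ is the upsampling function.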
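The assembly of F can be sketched as follows, assuming that μ is nearest-neighbour upsampling by an integer factor and that every scale's maps are brought to the resolution of the finest scale before concatenation. The source does not specify the form of μ, and the function names are hypothetical.

```python
def upsample_nearest(feature_map, factor):
    """Nearest-neighbour upsampling of a 2-D map (assumed form of mu)."""
    out = []
    for row in feature_map:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def stack_features(maps_per_scale):
    """Build F = [f_1, mu(f_2), ..., mu(f_N)] as one feature vector per pixel.

    maps_per_scale[n] is a list of 2-D feature maps produced at scale n, with
    index 0 the finest scale. Coarser maps are upsampled to the finest
    resolution, then all maps are concatenated along the feature axis.
    """
    height = len(maps_per_scale[0][0])
    width = len(maps_per_scale[0][0][0])
    aligned = []
    for maps in maps_per_scale:
        factor = height // len(maps[0])
        aligned.extend(upsample_nearest(m, factor) if factor > 1 else m
                       for m in maps)
    return [[[m[y][x] for m in aligned] for x in range(width)]
            for y in range(height)]
```

With one 2×2 map at the fine scale and one 1×1 map at the coarse scale, each pixel receives a two-element vector whose second entry is the shared coarse value.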
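The image pixel class distribution comprises:
adding a linear classifier to the multi-scale convolutional neural network model to learn the multi-scale hierarchical features, so as to produce a correct class prediction for each pixel of the image;
Figure PCTCN2022135319-appb-000011
is the linear classifier's normalized prediction vector for the class of pixel i. To compute the loss function, the softmax function is used to compute the normalized predicted probability distribution that pixel i belongs to class a
Figure PCTCN2022135319-appb-000012
Figure PCTCN2022135319-appb-000013
where w is a temporary weight matrix used only for learning the features, and $F_i$ is the multi-scale hierarchical feature vector corresponding to the location of pixel i.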
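A minimal sketch of the per-pixel softmax follows. The source shows the formula only as a figure, so the standard form is assumed here: the probability of class a is the exponential of the linear score w_a · F_i divided by the sum of exponentials over all classes. The function name and the list-of-rows layout of w are hypothetical.

```python
import math

def class_probabilities(weights, feature):
    """Softmax over the linear scores w_a . F_i for one pixel.

    weights: one weight row per class; feature: the vector F_i of pixel i.
    """
    scores = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted class of the pixel is then the index of the largest probability.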
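The superpixel segmentation uses an improved method based on LBP texture features, implemented as follows:
Seed point initialization: according to the set number of superpixels, seed points are distributed evenly over the image. Let the image have N pixels in total, pre-segmented into K superpixels of equal size, so that each superpixel has size N/K; the distance between adjacent seed points, i.e. the step, is
Figure PCTCN2022135319-appb-000014
and the seed point centers are then computed as $C_k=[l_k,a_k,b_k,x_k,y_k]^T$;
Distance measurement: each pixel in the neighborhood around each seed point is assigned a class label. Using the Euclidean distance, with a search range of 2S×2S, the distance D between the seed point at the cluster center and each pixel within the 2S×2S range is computed as
$d_{lab}=\sqrt{(l_k-l_i)^2+(a_k-a_i)^2+(b_k-b_i)^2}$
$d_{xy}=\sqrt{(x_k-x_i)^2+(y_k-y_i)^2}$
Figure PCTCN2022135319-appb-000015
where $d_{lab}$ is the color distance, $d_{xy}$ is the spatial distance, m is the weight coefficient that adjusts the spatial distance, and S is the distance between seed points; the smaller D is, the more similar the pixels are;
Iterative optimization: the seed point centers are updated until the error converges;
Enforcing connectivity: superpixels are assigned to neighboring superpixels following a "Z"-shaped traversal.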
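The seeding and distance steps can be sketched in a few lines. Two formulas appear only as figures in the source, so their usual SLIC forms are assumed here and may differ from the patent's: the step S = sqrt(N/K), and the combination D = sqrt((d_lab/m)^2 + (d_xy/S)^2). The function names are hypothetical.

```python
import math

def grid_step(num_pixels, num_superpixels):
    """Step between neighbouring seeds, assumed to be S = sqrt(N / K)."""
    return math.sqrt(num_pixels / num_superpixels)

def slic_distance(center, pixel, m, step):
    """Distance between a seed C_k = (l, a, b, x, y) and a pixel of the same layout.

    d_lab and d_xy follow the text; how they combine into D is an assumption.
    """
    d_lab = math.sqrt(sum((c - p) ** 2 for c, p in zip(center[:3], pixel[:3])))
    d_xy = math.sqrt(sum((c - p) ** 2 for c, p in zip(center[3:5], pixel[3:5])))
    return math.sqrt((d_lab / m) ** 2 + (d_xy / step) ** 2)

def assign(pixel, centers, m, step):
    """Index of the nearest seed whose 2S x 2S window contains the pixel, else None."""
    best, best_d = None, float("inf")
    for k, c in enumerate(centers):
        if abs(c[3] - pixel[3]) > step or abs(c[4] - pixel[4]) > step:
            continue  # pixel lies outside this seed's search window
        d = slic_distance(c, pixel, m, step)
        if d < best_d:
            best, best_d = k, d
    return best
```

Restricting the search to the 2S×2S window is what keeps the clustering cost roughly linear in the number of pixels instead of proportional to pixels times seeds.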
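The superpixel segmentation method comprises:
using the LBP algorithm, the gray values of the pixels in the neighborhood are compared with a threshold to obtain a binary code that describes the local texture feature; the LBP value is computed as:
Figure PCTCN2022135319-appb-000016
where $i_c$ is the gray value of the center pixel, $i_p$ is the gray value of a pixel in the neighborhood, and s is a sign function,
Figure PCTCN2022135319-appb-000017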
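A sketch of the LBP computation on a 3×3 neighbourhood follows. Because the source gives the formulas only as figures, the standard definition is assumed: the code is the sum over the 8 neighbours of 2^p · s(i_p − i_c), with s(x) = 1 for x ≥ 0 and 0 otherwise, so the center gray value acts as the threshold. The neighbour ordering and the treatment of border pixels are illustrative choices.

```python
# Offsets (dy, dx) of the 8 neighbours, clockwise from the top-left. The order
# only decides which bit each neighbour sets and is an assumption.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]

def sign(x):
    """s(x) = 1 if x >= 0 else 0."""
    return 1 if x >= 0 else 0

def lbp_value(gray, y, x):
    """LBP code of the interior pixel (y, x)."""
    center = gray[y][x]
    return sum(sign(gray[y + dy][x + dx] - center) << p
               for p, (dy, dx) in enumerate(NEIGHBOURS))

def lbp_image(gray):
    """LBP codes for all interior pixels; border pixels stay 0 (assumption)."""
    height, width = len(gray), len(gray[0])
    codes = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            codes[y][x] = lbp_value(gray, y, x)
    return codes
```

For the patch `[[5, 9, 1], [4, 6, 7], [2, 6, 8]]`, the neighbours 9, 7, 8, and 6 are at least as bright as the center value 6, which sets bits 1, 3, 4, and 5 and gives the code 58.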
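The LBP texture feature is introduced into the SLIC algorithm: when the improved SLIC algorithm initializes the seed points, the LBP value is added, i.e.
$C_k=[l_k,a_k,b_k,x_k,y_k,LBP_k]^T$
In the distance measurement step, the texture distance is added, i.e.
$d_{LBP}=\sqrt{(LBP_k-LBP_i)^2}$
Figure PCTCN2022135319-appb-000018
where n is the weight coefficient that adjusts the texture distance;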
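The texture-augmented distance can be sketched as below. The combined formula is shown only as a figure in the source, so the form used here, D = sqrt((d_lab/m)^2 + (d_xy/S)^2 + (d_LBP/n)^2), is an assumption modeled on the usual SLIC normalization; only the three component distances are taken from the text.

```python
import math

def texture_slic_distance(center, pixel, m, step, n):
    """Distance between C_k = (l, a, b, x, y, LBP) and a pixel of the same layout."""
    d_lab = math.sqrt(sum((c - p) ** 2 for c, p in zip(center[:3], pixel[:3])))
    d_xy = math.sqrt(sum((c - p) ** 2 for c, p in zip(center[3:5], pixel[3:5])))
    d_lbp = math.sqrt((center[5] - pixel[5]) ** 2)  # equals |LBP_k - LBP_i|
    # Assumed combination: each term is divided by its weight before summing.
    return math.sqrt((d_lab / m) ** 2 + (d_xy / step) ** 2 + (d_lbp / n) ** 2)
```

With this form, a smaller n makes texture differences count more, so pixels whose local texture differs from the seed are pushed toward other superpixels even when their color and position match.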
Finally, adjacent regions with similar color features are merged to complete the segmentation of the image.
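The final merging step can be illustrated with a small union-find sketch that joins 4-adjacent superpixels whose mean colors are close. The source says only that adjacent regions with similar color features are merged using the color mean; the Euclidean color distance, the fixed threshold, and computing the means once before merging are all assumptions, and the function name is hypothetical.

```python
def merge_by_mean_color(labels, colors, threshold):
    """Merge 4-adjacent superpixels whose mean colors differ by less than `threshold`.

    labels: 2-D list of superpixel ids; colors: 2-D list of 3-component tuples.
    Returns a 2-D list of merged region ids.
    """
    height, width = len(labels), len(labels[0])
    sums, counts = {}, {}
    for y in range(height):
        for x in range(width):
            k = labels[y][x]
            s = sums.setdefault(k, [0.0, 0.0, 0.0])
            for c in range(3):
                s[c] += colors[y][x][c]
            counts[k] = counts.get(k, 0) + 1
    means = {k: [v / counts[k] for v in s] for k, s in sums.items()}

    parent = {k: k for k in means}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for y in range(height):
        for x in range(width):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < height and nx < width:
                    a, b = find(labels[y][x]), find(labels[ny][nx])
                    if a != b:
                        ma, mb = means[labels[y][x]], means[labels[ny][nx]]
                        dist = sum((p - q) ** 2 for p, q in zip(ma, mb)) ** 0.5
                        if dist < threshold:
                            parent[b] = a
    return [[find(k) for k in row] for row in labels]
```

Because the means are those of the original superpixels, a chain of gradually changing superpixels can end up in one region; recomputing the mean after each merge would be a stricter variant.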
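The above is only a preferred embodiment of the invention. It should be noted that a person of ordinary skill in the art may make further improvements and variations without departing from the technical principles of the invention, and such improvements and variations shall also fall within the scope of protection of the invention.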

Claims (10)

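  1. An image segmentation method combining superpixels and multi-scale hierarchical feature recognition, characterized by comprising:
    inputting the image to be segmented into a pre-constructed multi-scale Gaussian pyramid to extract multi-scale hierarchical features, obtaining multiple multi-scale images;
    inputting the multi-scale images into a pre-trained multi-scale convolutional neural network to generate feature maps, sampling the feature maps, and combining images of the same scale to generate a multi-scale hierarchical feature matrix;
    passing the multi-scale hierarchical feature matrix through a multi-layer perceptron to obtain the pixel class distribution of the image to be segmented;
    segmenting the image to be segmented, whose pixel class distribution has been obtained, with a superpixel method improved with LBP texture features, and merging regions of the image by color mean to identify and segment the image targets.
  2. The image segmentation method combining superpixels and multi-scale hierarchical feature recognition according to claim 1, characterized in that the construction of the multi-scale Gaussian pyramid comprises:
    taking the first layer of the first group of the original image, after Gaussian convolution, as the second layer of the first group of the pyramid, where the two-dimensional Gaussian convolution function corresponding to pixel position (x, y) on the image is:
    Figure PCTCN2022135319-appb-100001
    where σ is the scale-space factor (σ = 1.6), and a larger σ gives a smoother image; m, n are the dimensions of the Gaussian matrix; M, N are the dimensions of the two-dimensional image;
    the scale space $L(x,y,\sigma)$ of the image is defined as the convolution of a variable-scale Gaussian function $G(x,y,\sigma)$ with the original image $I(x,y)$, so that:
    $L(x,y,\sigma)=G(x,y,\sigma)*I(x,y)$; this yields the L-th layer image of each group, where the image area of each group is 1/2 that of the previous group. Repeating this yields O groups of L layers each, O*L images in total, which form the multi-scale Gaussian pyramid SIFT.
  3. The image segmentation method combining superpixels and multi-scale hierarchical feature recognition according to claim 1, characterized in that the multi-scale convolutional neural network comprises three identical CNN structural networks, each CNN structural network consists of three stages, the first and second stages each consist of a filter bank, the nonlinear activation function tanh, and a pooling operation, and the third stage consists of a filter bank.
  4. The image segmentation method combining superpixels and multi-scale hierarchical feature recognition according to claim 3, characterized in that the filter bank of the first stage contains 16 convolution kernels, 8 of which are connected to the Y channel of the input image and 8 to the U and V channels, converting the original image into a 16-dimensional feature map; the filter bank of the second stage is connected to the result of the max pooling operation of the first stage, contains 64 convolution kernels, each connected to any 8 feature maps, and converts the 16-dimensional feature map of the previous stage into a 64-dimensional feature map; the filter bank of the third stage is connected to the result of the max pooling operation of the second stage, contains 256 convolution kernels, each connected to any 32 feature maps, and converts the 64-dimensional feature map of the previous stage into a 256-dimensional feature map.
  5. The image segmentation method combining superpixels and multi-scale hierarchical feature recognition according to claim 4, characterized in that the convolution kernels of the filter banks are all 7×7, and the pooling operation is 2×2 max pooling.
  6. The image segmentation method combining superpixels and multi-scale hierarchical feature recognition according to claim 3, characterized in that the multi-scale convolutional neural network is denoted $f_n$, with internal parameters $\theta_n$; the multi-scale convolutional neural network consists of one convolutional neural network model per image scale, and all models share the same parameters, i.e. $\theta_n=\theta_0$, $n\in\{1,2,\dots,N\}$, where $\theta_0$ is the initial parameter of the model; for the convolutional neural network model at scale n, a multi-scale convolutional neural network $f_n$ with S stages satisfies
    $f_n(X_n;\theta_n)=W_S H_{S-1}$
    where $W_S$ is the weight matrix of the S-th stage, $H_{S-1}$ is the output of the (S-1)-th stage, and $H_0=X_n$.
  7. The image segmentation method combining superpixels and multi-scale hierarchical feature recognition according to claim 6, characterized in that the multi-scale hierarchical feature matrix is given by:
    $F=[f_1,\mu(f_2),\dots,\mu(f_N)]$, where μ is the upsampling function.
  8. The image segmentation method combining superpixels and multi-scale hierarchical feature recognition according to claim 1, characterized in that the image pixel class distribution comprises:
    adding a linear classifier to the multi-scale convolutional neural network model to learn the multi-scale hierarchical features, so as to produce a correct class prediction for each pixel of the image;
    Figure PCTCN2022135319-appb-100002
    is the linear classifier's normalized prediction vector for the class of pixel i; to compute the loss function, the softmax function is used to compute the normalized predicted probability distribution that pixel i belongs to class a
    Figure PCTCN2022135319-appb-100003
    Figure PCTCN2022135319-appb-100004
    where w is a temporary weight matrix used only for learning the features, and $F_i$ is the multi-scale hierarchical feature vector corresponding to the location of pixel i.
  9. The image segmentation method combining superpixels and multi-scale hierarchical feature recognition according to claim 1, characterized in that the superpixel segmentation uses an improved method based on LBP texture features, implemented as follows:
    seed point initialization: according to the set number of superpixels, seed points are distributed evenly over the image; let the image have N pixels in total, pre-segmented into K superpixels of equal size, so that each superpixel has size N/K; the distance between adjacent seed points, i.e. the step, is
    Figure PCTCN2022135319-appb-100005
    and the seed point centers are then computed as $C_k=[l_k,a_k,b_k,x_k,y_k]^T$;
    distance measurement: each pixel in the neighborhood around each seed point is assigned a class label; using the Euclidean distance, with a search range of 2S×2S, the distance D between the seed point at the cluster center and each pixel within the 2S×2S range is computed as
    $d_{lab}=\sqrt{(l_k-l_i)^2+(a_k-a_i)^2+(b_k-b_i)^2}$
    $d_{xy}=\sqrt{(x_k-x_i)^2+(y_k-y_i)^2}$
    Figure PCTCN2022135319-appb-100006
    where $d_{lab}$ is the color distance, $d_{xy}$ is the spatial distance, m is the weight coefficient that adjusts the spatial distance, and S is the distance between seed points; the smaller D is, the more similar the pixels are;
    iterative optimization: the seed point centers are updated until the error converges;
    enforcing connectivity: superpixels are assigned to neighboring superpixels following a "Z"-shaped traversal.
  10. The image segmentation method combining superpixels and multi-scale hierarchical feature recognition according to claim 9, characterized in that the superpixel segmentation method comprises:
    using the LBP algorithm, the gray values of the pixels in the neighborhood are compared with a threshold to obtain a binary code that describes the local texture feature; the LBP value is computed as:
    Figure PCTCN2022135319-appb-100007
    where $i_c$ is the gray value of the center pixel, $i_p$ is the gray value of a pixel in the neighborhood, and s is a sign function,
    Figure PCTCN2022135319-appb-100008
    the LBP texture feature is introduced into the SLIC algorithm: when the improved SLIC algorithm initializes the seed points, the LBP value is added, i.e.
    $C_k=[l_k,a_k,b_k,x_k,y_k,LBP_k]^T$
    in the distance measurement step, the texture distance is added, i.e.
    $d_{LBP}=\sqrt{(LBP_k-LBP_i)^2}$
    Figure PCTCN2022135319-appb-100009
    where n is the weight coefficient that adjusts the texture distance;
    finally, adjacent regions with similar color features are merged to complete the segmentation of the image.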
PCT/CN2022/135319 2022-07-26 2022-11-30 Image segmentation method combined with superpixel and multi-scale hierarchical feature recognition WO2024021413A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/151,909 US11847811B1 (en) 2022-07-26 2023-01-09 Image segmentation method combined with superpixel and multi-scale hierarchical feature recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210886251.9A CN115170805A (zh) 2022-07-26 2022-07-26 Image segmentation method combined with superpixel and multi-scale hierarchical feature recognition
CN202210886251.9 2022-07-26

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/151,909 Continuation US11847811B1 (en) 2022-07-26 2023-01-09 Image segmentation method combined with superpixel and multi-scale hierarchical feature recognition

Publications (1)

Publication Number Publication Date
WO2024021413A1 (zh) 2024-02-01

Family

ID=83497634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/135319 WO2024021413A1 (zh) 2022-07-26 2022-11-30 Image segmentation method combined with superpixel and multi-scale hierarchical feature recognition

Country Status (2)

Country Link
CN (1) CN115170805A (zh)
WO (1) WO2024021413A1 (zh)

Also Published As

Publication number Publication date
CN115170805A (zh) 2022-10-11

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 22952847; country of ref document: EP; kind code of ref document: A1)