CN103678552A

CN103678552A - Remote-sensing image retrieving method and system based on salient regional features

Info

Publication number: CN103678552A
Application number: CN201310652866.6A
Authority: CN
Inventors: 邵振峰; 周维勋; 王星
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2013-12-05
Filing date: 2013-12-05
Publication date: 2014-03-26

Abstract

The invention discloses a remote sensing image retrieval method and system based on salient region features, comprising the steps of: using a visual attention model to obtain an image saliency map; using an adaptive threshold method to binarize the image saliency map; combining the original image with the corresponding binary image The value-based saliency map is used to perform "mask" operation to obtain the salient areas of the image; the salient points in the salient areas of the image are extracted, and the salient points are clustered based on the features of the salient points to obtain the feature vectors describing the features of the salient areas; finally, according to the preset Similarity metrics for image retrieval. While ensuring the feature extraction efficiency, the invention improves the retrieval precision rate, improves the retrieval results, and conforms to the visual characteristics of human eyes.

Description

Remote Sensing Image Retrieval Method and System Based on Significant Regional Features

技术领域technical field

本发明属于遥感图像处理影像检索技术领域，涉及一种基于显著区域特征的遥感影像检索方法及系统。The invention belongs to the technical field of remote sensing image processing and image retrieval, and relates to a remote sensing image retrieval method and system based on salient region features.

背景技术Background technique

随着遥感技术发展，多源遥感影像数据的获取变得日益简单。海量遥感影像数据在为科学研究提供更多数据选择的同时也带来了诸多亟待解决的难题。一方面，现阶段影像数据的处理和分析能力有限，使得遥感影像数据的使用效率低下；另一方面，遥感影像数据具有空间性、多样性、复杂性等特点，而目前遥感影像数据的组织、管理、浏览和查询的发展程度远远滞后于遥感影像数据的增长速度，使得针对特定应用往往不能快速找到所需的遥感影像。海量遥感影像数据的有效检索方法的缺乏已成为制约遥感影像数据应用的瓶颈，研究高效的遥感影像检索方法势在必行。With the development of remote sensing technology, the acquisition of multi-source remote sensing image data has become increasingly simple. Massive remote sensing image data not only provides more data options for scientific research, but also brings many problems that need to be solved urgently. On the one hand, the current processing and analysis capabilities of image data are limited, which makes the use of remote sensing image data inefficient; on the other hand, remote sensing image data has the characteristics of space, diversity, and complexity. The development of management, browsing and query lags far behind the growth rate of remote sensing image data, making it often impossible to quickly find the required remote sensing image for specific applications. The lack of effective retrieval methods for massive remote sensing image data has become a bottleneck restricting the application of remote sensing image data, and it is imperative to study efficient remote sensing image retrieval methods.

传统的遥感影像检索方法包括基于关键字的检索和基于内容的检索。基于关键字的影像检索技术通过人工标注方式用一系列的关键字对数据库中的影像进行描述，检索时输入关键字，系统就会返回与之相符的影像，该方法虽然检索过程简单且容易理解，但手工标注效率低，且主观性强。基于内容的影像检索技术通过提取影像的低层视觉特征（光谱特征、纹理特征和形状特征）进行检索，该技术虽然一定程度上改善了检索结果，但光谱特征和纹理特征是影像的全局特征，未考虑影像前景和背景的特点，不能很好地描述影像的语义信息；形状特征虽然跟影像上特定目标相关，但是形状特征的提取往往需要对影像进行分割，而分割本身就是计算机视觉领域的一大难题。可以看出，由于语义鸿沟的存在，低层视觉特征不能有效反映影像的本质内容。Traditional remote sensing image retrieval methods include keyword-based retrieval and content-based retrieval. The keyword-based image retrieval technology uses a series of keywords to describe the images in the database through manual annotation. When the keywords are entered during retrieval, the system will return matching images. Although the retrieval process of this method is simple and easy to understand , but manual labeling is inefficient and highly subjective. Content-based image retrieval technology retrieves by extracting low-level visual features (spectral features, texture features and shape features) of images. Although this technology improves the retrieval results to a certain extent, spectral features and texture features are global features of images. Considering the characteristics of the foreground and background of the image, the semantic information of the image cannot be well described; although the shape feature is related to the specific target on the image, the extraction of the shape feature often requires the segmentation of the image, and segmentation itself is a big problem in the field of computer vision. problem. It can be seen that due to the existence of the semantic gap, the low-level visual features cannot effectively reflect the essential content of the image.

根据人眼视觉理论，对于一幅影像人们关注的不是整幅影像的内容而是影像的显著区域，为此人们将视觉注意模型引入影像检索领域,研究利用视觉注意模型计算影像的显著图并提取颜色、纹理、边缘等特征进行检索。相比低层视觉特征，该方法更符合人们的查询意图并且能有效地弥合语义鸿沟，然而显著图是模糊的灰度图像，直接提取显著图的颜色、纹理、边缘等特征十分困难。According to the theory of human vision, what people pay attention to in an image is not the content of the whole image but the salient area of the image. For this reason, people introduce the visual attention model into the field of image retrieval, and study the use of the visual attention model to calculate the saliency map of the image and extract it. Color, texture, edge and other features are retrieved. Compared with low-level visual features, this method is more in line with people's query intention and can effectively bridge the semantic gap. However, the saliency map is a fuzzy grayscale image, and it is very difficult to directly extract the color, texture, edge and other features of the saliency map.

发明内容Contents of the invention

针对现有技术存在的不足，本发明提供了一种能更好反映检索需求的、基于显著区域特征的遥感影像检索方法及系统，本发明通过分析人眼视觉注意特性，从复杂的遥感影像中提取出显著区域，通过提取显著区域特征实现遥感影像检索。Aiming at the deficiencies in the prior art, the present invention provides a remote sensing image retrieval method and system based on salient region features that can better reflect the retrieval requirements. Extract salient areas, and realize remote sensing image retrieval by extracting salient area features.

本发明的技术方案如下：Technical scheme of the present invention is as follows:

一、基于显著区域特征的遥感影像检索方法，包括步骤：1. A remote sensing image retrieval method based on salient regional features, including steps:

步骤1，利用视觉注意模型获取影像的显著图；Step 1, using the visual attention model to obtain the saliency map of the image;

步骤2，将步骤1所得显著图转化为对应的二值化显著图；Step 2, converting the saliency map obtained in step 1 into a corresponding binarized saliency map;

步骤3，基于原始影像和步骤2所得二值化显著图获取影像的显著区域；Step 3, based on the original image and the binarized saliency map obtained in step 2, the salient area of the image is obtained;

步骤4，提取步骤3所获影像显著区域的显著点，基于显著点特征，采用聚类方法对显著点进行聚类，获得描述显著区域特征的特征向量；Step 4, extracting the salient points in the salient area of the image obtained in step 3, based on the salient point features, using a clustering method to cluster the salient points, and obtaining a feature vector describing the salient area features;

步骤5，基于步骤4所获特征向量，根据预设的相似性度量准则进行影像检索。Step 5, based on the feature vector obtained in step 4, perform image retrieval according to a preset similarity measurement criterion.

步骤1具体为：Step 1 is specifically:

通过构造不同尺度的高斯金字塔得到影像的亮度特征、颜色特征和方向特征，融合影像的亮度特征、颜色特征和方向特征获得与原始影像尺寸一致的显著图。The brightness, color and direction features of the image are obtained by constructing Gaussian pyramids of different scales, and the saliency map with the same size as the original image is obtained by fusing the brightness, color and direction features of the image.

步骤2采用自适应阈值法将显著图转化为对应的二值化显著图，二值化阈值采用大津法确定。Step 2 uses the adaptive threshold method to transform the saliency map into the corresponding binarized saliency map, and the binarization threshold is determined by the Otsu method.

步骤3中通过对原始影像与其对应的二值化显著图进行“掩膜”运算，获取影像的显著区域。In step 3, the "mask" operation is performed on the original image and its corresponding binarized saliency map to obtain the salient region of the image.

步骤4中所述的特征为词袋特征，词袋特征根据文本检索中Bag of Words算法原理获得。The features described in step 4 are bag-of-words features, and the bag-of-words features are obtained according to the Bag of Words algorithm principle in text retrieval.

步骤5进一步包括子步骤：Step 5 further includes sub-steps:

5.1预设相似性度量准则；5.1 Preset similarity measurement criteria;

5.2基于影像显著区域的特征向量，采用相似性度量准则逐一获取待检索影像和影像库中影像的相似性值；5.2 Based on the eigenvectors of the salient areas of the image, use the similarity measurement criterion to obtain the similarity values of the image to be retrieved and the image in the image library one by one;

5.3按照相似性值大小对影像库中影像排序后输出。5.3 The images in the image library are sorted according to the similarity value and then output.

二、基于显著区域特征的遥感影像检索系统，包括：2. Remote sensing image retrieval system based on salient regional features, including:

显著图获取模块，用来利用视觉注意模型获取影像的显著图；A saliency map acquisition module, which is used to obtain a saliency map of an image using a visual attention model;

二值化显著图获取模块，用来将显著图转化为对应的二值化显著图；The binarized saliency map acquisition module is used to convert the saliency map into a corresponding binarized saliency map;

显著区域获取模块，用来基于原始影像和二值化显著图获取影像的显著区域；The salient area acquisition module is used to acquire the salient area of the image based on the original image and the binarized saliency map;

特征向量获取模块，用来提取影像显著区域的显著点，基于显著点特征，采用聚类方法对显著点进行聚类，获得描述显著区域特征的特征向量；The feature vector acquisition module is used to extract the salient points of the salient areas of the image, based on the features of the salient points, the clustering method is used to cluster the salient points, and the feature vector describing the features of the salient areas is obtained;

影像检索模块，用来基于特征向量，根据预设的相似性度量准则进行影像检索。The image retrieval module is used to perform image retrieval based on feature vectors and according to preset similarity measurement criteria.

与现有技术相比，本发明具有如下特点和有益效果，Compared with the prior art, the present invention has the following characteristics and beneficial effects,

1、采用视觉注意模型获取影像的显著区域，为了弥补视觉注意模型的缺点，将原始影像和二值化显著图进行“掩膜”运算获得影像的显著区域。1. Use the visual attention model to obtain the salient area of the image. In order to make up for the shortcomings of the visual attention model, the original image and the binarized saliency map are subjected to a "mask" operation to obtain the salient area of the image.

2、提取显著区域的特征，克服了直接从显著图中提取特征的困难；基于提取的显著区域特征进行影像检索，不仅符合人眼视觉特点且能更好地反映检索需求，缩小了低层视觉特征和高层语义之间的距离，能有效提高遥感影像检索的查准率和查全率。2. Extracting the features of salient regions overcomes the difficulty of directly extracting features from salient images; image retrieval based on the extracted salient region features not only conforms to the characteristics of human vision but also better reflects the retrieval needs, reducing the low-level visual features The distance between high-level semantics and high-level semantics can effectively improve the precision and recall of remote sensing image retrieval.

3、扩展性好，用于检索的显著区域的特征包括但不限于词袋特征，只要能够描述显著区域内容的特征均可。3. Good scalability. The features of salient regions used for retrieval include but not limited to bag-of-words features, as long as they can describe the content of salient regions.

附图说明Description of drawings

图1为本发明实施例的流程图；Fig. 1 is the flowchart of the embodiment of the present invention;

图2为本发明实施例中不同返回影像数下的平均查准率图。Fig. 2 is a graph of the average precision rate under different numbers of returned images in the embodiment of the present invention.

具体实施方式Detailed ways

本发明方的一种具体实施方法为：利用Itti视觉注意模型计算影像库中所有影像的显著图，并采用“大津法”自适应确定阈值将显著图二值化；将二值化显著图与相应的原始影像进行“掩膜”运算得到影像的显著区域；利用SIFT算子提取影像显著区域的显著点，经聚类得到描述显著区域词袋特征的视觉单词特征向量（visual words）；按照预设的相似性度量准则检索影像库中影像。A kind of specific implementation method of the present invention is: utilize Itti visual attention model to calculate the saliency map of all images in the image storehouse, and adopt " Otsu method " self-adaptive determination threshold value to binarize the saliency map; Binarize the saliency map with The corresponding original image is subjected to "mask" operation to obtain the salient area of the image; the salient points of the salient area of the image are extracted by using the SIFT operator, and the visual word feature vector (visual words) describing the bag-of-words feature of the salient area is obtained through clustering; Retrieve the images in the image library using the set similarity measure criterion.

下面将结合图1详细说明本发明的具体实施方式，具体包括步骤：The specific embodiment of the present invention will be described in detail below in conjunction with Fig. 1, specifically include steps:

步骤1，获取遥感影像库中影像的显著图。Step 1, obtain the saliency map of the image in the remote sensing image database.

首先，构建检索用遥感影像库。First, build a remote sensing image database for retrieval.

本实施例检索影像库中采用的影像数据来自于分辨率为30cm的美国几大城市的航空影像，按照Tiles无重叠分块方式将影像分割为256*256大小的子块，构成包含飞机、稀疏居民区、建筑物以及停车场4类地物的检索影像库，其中每类地物包含100幅影像。The image data used in the retrieval image library in this embodiment comes from the aerial images of several major cities in the United States with a resolution of 30 cm. According to the Tiles non-overlapping block method, the image is divided into sub-blocks of 256*256 size to form sub-blocks containing aircraft, sparse The retrieved image library of four types of ground objects, including residential areas, buildings, and parking lots, where each type of ground features contains 100 images.

然后，获取遥感影像库中影像的显著图。Then, obtain the saliency map of the image in the remote sensing image library.

本实施例采用效果较好且较成熟的Itti视觉注意模型获取影像的显著图。通过构造不同尺度的高斯金字塔得到影像的亮度、颜色、方向三种特征。该三种特征的计算见式（1）～（3）。In this embodiment, the effective and mature Itti visual attention model is used to obtain the saliency map of the image. By constructing Gaussian pyramids of different scales, the brightness, color, and direction of the image are obtained. The calculation of the three characteristics is shown in formulas (1) to (3).

I＝(r+g+b)/3 （1）I＝(r+g+b)/3 （1）

式（1）中，I为影像的亮度特征，r、g、b分别为颜色的三个分量。In formula (1), I is the brightness feature of the image, and r, g, and b are the three components of the color respectively.

$\{\begin{matrix} RG RG ((c c,, s the s)) = = | | ((R R ((c c)) - - G G ((c c)))) Θ Θ ((G G ((s the s)) - - R R ((s the s)))) | | \\ BY BY ((c c,, s the s)) = = | | ((B B ((c c)) - - Y Y ((c c)))) Θ Θ ((Y Y ((s the s)) - - B B ((s the s)))) | | \end{matrix} - - - - - - ((22))$

式（2）中：In formula (2):

RG和BY分别代表红绿间及蓝黄间的颜色差异，即影像的颜色特征；RG and BY represent the color difference between red and green and blue and yellow, respectively, that is, the color characteristics of the image;

R、G、B、Y分别表示红、绿、蓝、黄颜色通道，R＝r-(g+b)/2，G＝g-(r+b)/2，B＝b-(r+g)/2，Y＝(r+g)/2-|r-g|/2-b，r、g、b分别为颜色的三个分量；R, G, B, and Y represent red, green, blue, and yellow color channels respectively, R=r-(g+b)/2, G=g-(r+b)/2, B=b-(r+ g)/2, Y=(r+g)/2-|r-g|/2-b, r, g, b are three components of color respectively;

c、s分别代表中央尺度和周边尺度；c and s represent the central scale and peripheral scale, respectively;

Θ代表“中央—周边（center—surround）”尺度算子。Θ represents the "center-surround" scaling operator.

O(c,s,θ)＝|O(c,θ)ΘO(s,θ)| （3）O(c,s,θ)＝|O(c,θ)ΘO(s,θ)| （3）

式（3）中，In formula (3),

O(c,s,θ)代表不同尺度影像的方向特征；O(c,s,θ) represents the direction characteristics of different scale images;

O(c,θ)和O(s,θ)分别代表中央尺度影像特征和周边尺度影像特征；O(c,θ) and O(s,θ) represent the central scale image features and peripheral scale image features respectively;

θ为Gabor滤波器的方向；θ is the direction of the Gabor filter;

Θ含义同式（2）。Θ has the same meaning as formula (2).

最后通过多特征融合方法将亮度特征、颜色特征和方向特征进行融合，得到与原影像尺寸一致的显著图。Finally, the brightness feature, color feature and direction feature are fused by multi-feature fusion method to obtain a saliency map with the same size as the original image.

具体实施时，可通过求取三种特征的算术平均值或根据三种特征的重要性求取加权平均值进行多特征融合，得到最终显著图。In specific implementation, multi-feature fusion can be performed by calculating the arithmetic mean of the three features or calculating the weighted average according to the importance of the three features to obtain the final saliency map.

通过求取三种特征的算术平均值进行多特征融合的公式如下：The formula for multi-feature fusion by calculating the arithmetic mean of the three features is as follows:

$S S = = \frac{11}{33} ((Intensity Intensity + + Color color + + Orientation Orientation)) - - - - - - ((44))$

式（4）中，S代表最终显著图，Intensity表示亮度特征，Color表示颜色特征，Orientation表示方向特征。In formula (4), S represents the final saliency map, Intensity represents the brightness feature, Color represents the color feature, and Orientation represents the direction feature.

根据三种特征的重要性求取加权平均值进行多特征融合的公式如下：The formula for calculating the weighted average value for multi-feature fusion according to the importance of the three features is as follows:

$S S = = \frac{α α * * Intensity Intensity + + β β * * Color color + + γ γ * * Orietation Orientation}{Intensity Intensity + + Color color + + Orientation Orientation} - - - - - - ((55))$

式（5）中，S代表最终显著图；Intensity表示亮度特征，Color表示颜色特征，Orientation表示方向特征；α、β、γ分别为亮度特征、颜色特征及方向特征的权值，根据三种特征的重要性设定，且α+β+γ=1。In formula (5), S represents the final saliency map; Intensity represents the brightness feature, Color represents the color feature, and Orientation represents the orientation feature; α, β, and γ are the weights of the brightness feature, color feature, and orientation feature, respectively. The importance setting of , and α+β+γ=1.

步骤2，显著图二值化。Step 2, saliency map binarization.

本实施例采用“大津法（Otsu）”自适应地确定显著图的二值化阈值。大津法是一种较成熟的确定图像二值化分割阈值的算法，根据该算法使影像前景和背景的类间方差达到最大值的阈值即为适合的二值化阈值，类间方差的计算见式（6）：In this embodiment, the "Otsu method (Otsu)" is adopted to adaptively determine the binarization threshold of the saliency map. The Otsu method is a relatively mature algorithm for determining the threshold value of image binarization segmentation. According to this algorithm, the threshold value at which the variance between classes of the foreground and background of the image reaches the maximum value is the appropriate threshold value for binarization. For the calculation of the variance between classes, see Formula (6):

v＝w₀w₁(μ₀-μ₁)² （6）v＝w ₀ w ₁ (μ ₀ -μ ₁ ) ² (6)

式（6）中：v表示类间方差；w₀表示显著图前景像素数占整幅显著图像素数的比例；w₁表示显著图背景像素数占整幅显著图像素数的比例；μ₀表示显著图前景像素的平均灰度；μ₁表示显著图背景像素的平均灰度。In formula (6): v represents the variance between classes; w ₀ represents the proportion of the foreground pixels of the saliency map to the pixels of the entire saliency map; w ₁ represents the proportion of the background pixels of the saliency map to the pixels of the entire saliency map; μ ₀ represents the significant The average gray level of the foreground pixels in the map; μ ₁ represents the average gray level of the background pixels in the salient map.

步骤3，获取影像的显著区域。Step 3, get the salient area of the image.

从数字图像处理的角度来说，所谓“掩膜”运算就是遮掩图像中的某些像素而保留所需像素，进行掩膜运算时要求两幅影像大小相同。From the perspective of digital image processing, the so-called "mask" operation is to cover some pixels in the image and retain the required pixels. When performing the mask operation, the two images are required to have the same size.

数字图像是由以灰度值为元素的矩阵来表示，区域灰度值越大表示该区域越亮，反之表示该区域越暗。二值化显著图其灰度值只有0和1两种，其与原始影像进行掩膜运算时，将原始影像与二值化显著图对应的矩阵进行数组乘法运算即可，此时原始影像上与二值化显著图中灰度值为1的位置相应的像素被保留下来，其余像素灰度值变为0。通过“掩膜”运算即可从原始影像上分离出影像的显著区域。A digital image is represented by a matrix whose elements are gray values. The larger the gray value of an area, the brighter the area, and the darker the area. The gray value of the binarized saliency map is only 0 and 1. When performing mask operation with the original image, the matrix corresponding to the original image and the binarized saliency map can be multiplied by array. At this time, the original image The pixels corresponding to the positions with a gray value of 1 in the binarized saliency map are retained, and the gray values of other pixels become 0. The salient areas of the image can be separated from the original image through the "mask" operation.

设F代表原始影像，S代表原始影像F对应的二值化显著图，R代表“掩膜”运算结果，掩膜运算见式（7）：Let F represent the original image, S represent the binarized saliency map corresponding to the original image F, and R represent the result of the "mask" operation. The mask operation is shown in formula (7):

R＝F*S （7）R＝F*S （7）

其中，in,

$F f = = [\begin{matrix} f f ((0,0 0,0)) & f f ((0,1 0,1)) & \cdot &Center Dot; \cdot &Center Dot; \cdot \cdot & f f ((00,, n no - - 11)) \\ f f ((1,0 1,0)) & f f ((1,1 1,1)) & \cdot &Center Dot; \cdot &Center Dot; \cdot \cdot & f f ((11,, n no - - 11)) \\ \cdot \cdot & \cdot \cdot \\ \cdot &Center Dot; & \cdot \cdot \\ \cdot &Center Dot; & \cdot \cdot \\ f f ((m m - - 1,0 1,0)) & f f ((m m - - 1,1 1,1)) & \cdot &Center Dot; \cdot \cdot \cdot &Center Dot; & f f ((m m - - 11,, n no - - 11)) \end{matrix}];;$

$S S = = [\begin{matrix} Logical Logical & Logical Logical & \cdot &Center Dot; \cdot &Center Dot; \cdot \cdot & Logical Logical \\ Logical Logical & Logical Logical & \cdot \cdot \cdot \cdot \cdot \cdot & Logical Logical \\ \cdot \cdot & \cdot \cdot \\ \cdot \cdot & \cdot \cdot \\ \cdot \cdot & \cdot \cdot \\ Logical Logical & Logical Logical & \cdot \cdot \cdot \cdot \cdot \cdot & Logical Logical \end{matrix}];;$

F中各元素为0～255间的灰度值，S中Logical为逻辑值0或1。Each element in F is a gray value between 0 and 255, and Logical in S is a logical value of 0 or 1.

步骤4，提取影像显著区域的词袋特征。Step 4, extract the bag-of-words features of the salient regions of the image.

本实施例中采用SIFT算子提取影像显著区域的词袋特征。对于各影像显著区域，采用SIFT算子可提取获得K×128维的数组，即影像显著区域的显著点特征，其中K为提取的显著点数目，K×128维数组每行表示一个显著点对应的128维特征向量。In this embodiment, the SIFT operator is used to extract the bag-of-words features of the salient regions of the image. For each salient area of the image, the SIFT operator can be used to extract a K×128-dimensional array, that is, the salient point features of the salient area of the image, where K is the number of salient points extracted, and each row of the K×128-dimensional array represents a salient point corresponding to The 128-dimensional feature vector of .

为了获得描述词袋特征的视觉单词特征向量，本实施例中采用K-means聚类算法对每幅影像显著区域的显著点进行聚类，得到1*k维的描述影像显著区域词袋特征的视觉单词特征向量V，其中，k表示聚类中心的数目，视觉单词特征向量V如下：In order to obtain the visual word feature vector describing the feature of the bag of words, in this embodiment, the K-means clustering algorithm is used to cluster the salient points in the salient area of each image, and a 1*k-dimensional bag of word feature describing the salient area of the image is obtained. Visual word feature vector V, where k represents the number of cluster centers, visual word feature vector V is as follows:

V=[a₁ a₂ … a_k] (8)V=[a ₁ a ₂ … a _k ] (8)

式（8）中，V中各元素表示各聚类中心对应的显著点数目。In formula (8), each element in V represents the number of significant points corresponding to each cluster center.

步骤5，基于相似度准则的影像检索。Step 5, image retrieval based on similarity criterion.

完成步骤4后，遥感影像库中的影像均采用视觉单词特征向量进行了描述，根据预设的相似性度量准则进行影像检索。常用的相似性度量方法包括城区距离、欧氏距离、直方图相交法、二次式距离、余弦距离、相关系数以及KL散度等。After step 4 is completed, the images in the remote sensing image library are described by visual word feature vectors, and image retrieval is performed according to the preset similarity measurement criteria. Commonly used similarity measurement methods include urban distance, Euclidean distance, histogram intersection method, quadratic distance, cosine distance, correlation coefficient, and KL divergence.

本实施例中采用较简单的欧氏距离作为相似性度量准则来衡量待检索影像和影像库中影像的相似性大小，欧氏距离的计算见公式（9）。按公式（9）分别计算待检索影像与影像库中所有影像的相似性大小，并按一定顺序返回检索结果，通常将相似性大小按降序方式进行排列后输出，这样输出结果中位置越靠前的影像表示其与待检索影像相似性越高。In this embodiment, the simpler Euclidean distance is used as the similarity measure criterion to measure the similarity between the images to be retrieved and the images in the image database. The calculation of the Euclidean distance is shown in formula (9). Calculate the similarity between the image to be retrieved and all the images in the image library according to formula (9), and return the retrieval results in a certain order. Usually, the similarity is sorted in descending order and then output, so that the higher the position in the output results An image of means that it has a higher similarity with the image to be retrieved.

$L L = = (({I I}_{11},, {I I}_{22})) = = {(({Σ Σ}_{i i = = 11}^{n no} {(({a a}_{i i} - - {b b}_{i i}))}^{22}))}^{\frac{11}{22}} - - - - - - ((99))$

式（9）中，I₁和I₂表示进行相似性计算的两幅影像；L(I₁,I₂)表示I₁和I₂的欧式距离；a_i、b_i分别表示影像I₁和I₂的视觉单词特征向量中第i个元素，n为特征向量维数。In formula (9), I ₁ and I ₂ represent two images for similarity calculation; L(I ₁ , I ₂ ) represents the Euclidean distance between I ₁ and I ₂ ; a _i and b _i represent images I ₁ and The ith element in the visual word feature vector of I ₂ , n is the feature vector dimension.

计算欧氏距离时要求影像的特征向量维数相同，同时公式（9）计算出来的数值严格来说并不表示影像间的相似性大小而是表示影像间的差距大小，因此，L(I₁,I₂)值越小表示影像I₁和I₂相似性越大。When calculating the Euclidean distance, the feature vector dimensions of the images are required to be the same. At the same time, the value calculated by formula (9) strictly speaking does not indicate the similarity between the images but the gap between the images. Therefore, L(I ₁ , I ₂ ) The smaller the value, the greater the similarity between images I ₁ and I ₂ .

综上所述，本发明方法在采用Itti视觉注意模型计算影像显著图后，将显著图进行二值化并与原始影像进行“掩膜”运算，得到符合人眼视觉特点的显著区域影像；利用SIFT算子提取影像显著区域的显著点。同时，为了得到描述显著区域影像词袋特征的视觉单词特征向量，本发明采用K-means聚类算法将显著点聚类得到视觉单词特征向量。本发明方法可有效改善检索结果，提高检索查准率。In summary, after the method of the present invention uses the Itti visual attention model to calculate the saliency map of the image, it binarizes the saliency map and performs a "mask" operation with the original image to obtain a salient area image that conforms to the visual characteristics of the human eye; The SIFT operator extracts the salient points of the salient regions of the image. At the same time, in order to obtain the feature vector of visual words describing the feature of bag-of-words in the salient region, the present invention uses the K-means clustering algorithm to cluster the salient points to obtain the feature vector of visual words. The method of the invention can effectively improve retrieval results and improve retrieval accuracy.

以下通过仿真实验来验证本发明的有益效果：The beneficial effect of the present invention is verified by simulation experiment as follows:

本实验采用4类共400幅遥感影像构建检索用影像库，每类遥感影像均包含100幅影像，大小均为256*256。本实验采用平均查准率和不同返回影像数下的查准率两个评价指标来评价影像检索效果。In this experiment, a total of 400 remote sensing images of 4 categories were used to construct an image database for retrieval. Each type of remote sensing image contained 100 images, and the size was 256*256. In this experiment, two evaluation indexes, the average precision rate and the precision rate under different numbers of returned images, are used to evaluate the image retrieval effect.

查准率指返回的影像中相似的影像数与所有返回的影像数的比值，通过设定返回的影像数目即可得到在不同返回影像数下的影像的查准率，进而计算各方法的平均查准率。The precision rate refers to the ratio of the number of similar images in the returned images to the number of all returned images. By setting the number of returned images, the precision rate of images under different numbers of returned images can be obtained, and then the average of each method can be calculated. Precision.

见图2，其中，方法1是传统的基于颜色特征的影像检索方法，方法2是传统的基于纹理特征的影像检索方法，方法3是利用SIFT算子直接从原始影像上提取词袋特征进行检索的方法，方法4是本发明方法。从图2可以看出，在返回影像数目较少时，上述方法均能保持较高的查准率；但随着返回影像数目增加，方法1～3的检索查准率下降很快，而本发明方法的检索查准率下降较慢，在返回影像数目很多时仍能保持较高的查准率。See Figure 2, where method 1 is a traditional image retrieval method based on color features, method 2 is a traditional image retrieval method based on texture features, and method 3 is to use the SIFT operator to directly extract word bag features from the original image for retrieval method, method 4 is the method of the present invention. It can be seen from Figure 2 that when the number of returned images is small, the above methods can maintain a high precision rate; but as the number of returned images increases, the retrieval precision rate of methods 1-3 drops rapidly, while this method The retrieval precision rate of the inventive method drops slowly, and can still maintain a high precision rate when the number of returned images is large.

方法1～4对四类地物的平均查准率如表1所示：The average precision of methods 1 to 4 for the four types of ground objects is shown in Table 1:

表1平均查准率统计表Table 1 Average precision rate statistics table

平均查准率average precision 方法1method 1 0.41500.4150 方法2Method 2 0.48100.4810 方法3Method 3 0.36870.3687 方法4Method 4 0.57370.5737

从表1可看出本发明方法具有最优的平均查准率，能很好地描述各类影像的显著内容，检索效果最好。It can be seen from Table 1 that the method of the present invention has the best average precision rate, can well describe the salient content of various images, and has the best retrieval effect.

以上内容是结合最佳实施方案对本发明所做的进一步详细说明，不能认定本发明的具体实施只限于这些说明。本领域的技术人员应能理解，在不脱离由所附权利要求书限定的情况下，可以在细节上进行各种修改，都应当视为本发明的保护范围。The above content is a further detailed description of the present invention in combination with the best embodiments, and it cannot be assumed that the specific implementation of the present invention is limited to these descriptions. It should be understood by those skilled in the art that various modifications can be made in the details without departing from the conditions defined by the appended claims, which should be regarded as the protection scope of the present invention.

Claims

1. The remote sensing image retrieval method based on salient regional features, is characterized in that, comprises the steps:

Step 1, using the visual attention model to obtain the saliency map of the image;

Step 2, converting the saliency map obtained in step 1 into a corresponding binarized saliency map;

Step 3, based on the original image and the binarized saliency map obtained in step 2, the salient area of the image is obtained;

Step 4, extracting the salient points in the salient area of the image obtained in step 3, based on the salient point features, using a clustering method to cluster the salient points, and obtaining a feature vector describing the salient area features;

Step 5, based on the feature vector obtained in step 4, perform image retrieval according to a preset similarity measurement criterion.

2. The remote sensing image retrieval method based on salient regional features as claimed in claim 1, characterized in that:

Step 1 is specifically:

The brightness, color and direction features of the image are obtained by constructing Gaussian pyramids of different scales, and the saliency map with the same size as the original image is obtained by fusing the brightness, color and direction features of the image.

3. The remote sensing image retrieval method based on salient regional features as claimed in claim 1, characterized in that:

Step 2 uses the adaptive threshold method to transform the saliency map into the corresponding binarized saliency map, and the binarization threshold is determined by the Otsu method.

4. The remote sensing image retrieval method based on salient regional features as claimed in claim 1, characterized in that:

In step 3, the "mask" operation is performed on the original image and its corresponding binarized saliency map to obtain the salient region of the image.

5. The remote sensing image retrieval method based on salient regional features as claimed in claim 1, characterized in that:

The features described in step 4 are bag-of-words features.

6. The remote sensing image retrieval method based on salient regional features as claimed in claim 1, characterized in that:

Step 5 further includes sub-steps:

5.1 Preset similarity measurement criteria;

5.2 Based on the feature vector of the salient area of the image, use the similarity measurement criterion to obtain the similarity value of the image to be retrieved and the image in the image library one by one;

5.3 After sorting the images in the image library according to the similarity value, output them.

7. A remote sensing image retrieval system based on salient regional features, characterized in that it includes:

A saliency map acquisition module, which is used to obtain a saliency map of an image using a visual attention model;

The binarized saliency map acquisition module is used to convert the saliency map into a corresponding binarized saliency map;

The salient area acquisition module is used to acquire the salient area of the image based on the original image and the binarized saliency map;

The feature vector acquisition module is used to extract the salient points of the salient areas of the image, based on the features of the salient points, the clustering method is used to cluster the salient points, and the feature vector describing the features of the salient areas is obtained;

The image retrieval module is used to perform image retrieval based on feature vectors and according to preset similarity measurement criteria.