CN106127197B

CN106127197B - Image saliency target detection method and device based on saliency label sorting

Info

Publication number: CN106127197B
Application number: CN201610219337.0A
Authority: CN
Inventors: 郎丛妍; 李尊; 何伟明; 于兆鹏; 杜雪涛; 杜刚; 朱艳云
Original assignee: Beijing Jiaotong University; China Mobile Group Design Institute Co Ltd
Current assignee: Beijing Jiaotong University; China Mobile Group Design Institute Co Ltd
Priority date: 2016-04-09
Filing date: 2016-04-09
Publication date: 2020-07-07
Anticipated expiration: 2036-04-09
Also published as: CN106127197A

Abstract

The embodiment of the invention provides a method and a device for detecting an image saliency target based on saliency label sequencing. The method mainly comprises the following steps: dividing each image in the image sample set into a plurality of image areas by using a SLIC segmentation method, and extracting visual features and background contrast features for each image area; forming a training set and a test set according to the visual features, the background contrast features and the significant value labels of each image region, and learning the significant value of each image region in each image by using an algorithm based on significant label sequencing; and recovering the significant value of each region in the image by using a low-rank matrix recovery theory, and detecting a significant target in the image. The method disclosed by the invention has the advantages that the complexity of the nuclear norm control model of the matrix is fully utilized, the visual feature similarity and the semantic label similarity are combined, and the correlation between graph Laplace regularization constraints is utilized, so that the problems that the significant label space is large but the number of training images is limited are effectively solved.

Description

Image saliency target detection method and device based on saliency label sorting

Technical Field

The invention relates to the technical field of article image processing, in particular to a method and a device for detecting an image saliency target based on saliency label sequencing.

Background

In recent years, with the rapid development of internet technology and multimedia information technology, multimedia information using images as carriers is becoming an important means for people to transmit and acquire information. However, the computational resources available to process multimedia information are very limited compared to the explosive growth of image data. Therefore, the significance detection technology can be combined with the information selection capability of a human cognitive system to extract interested contents from a complex image, so that complex massive multimedia visual information resources are reasonably and effectively utilized, and the significance detection technology plays an important role in the field of image analysis and understanding.

Recently, a data-driven top-down method is utilized to achieve a good effect in the field of saliency extraction of the image, the existing supervision algorithm considers the saliency detection problem as a two-classification or regression problem, and in order to learn a reliable model, the existing supervision algorithm mostly depends on a large-scale training data set and has certain limitation. Therefore, it is necessary to develop a simple and effective saliency target detection algorithm.

Disclosure of Invention

The embodiment of the invention provides a method and a device for detecting an image salient object based on salient label sorting, which are used for effectively detecting the salient object in an image.

In order to achieve the purpose, the invention adopts the following technical scheme.

According to one aspect of the invention, an image saliency target detection method based on saliency label sorting is provided, which comprises the following steps:

establishing an image sample set, dividing each image in the image sample set into a plurality of image areas by using a super-pixel segmentation SLIC (linear segmentation in particular) segmentation method, and extracting visual features and background contrast features of each image area;

extracting a saliency target of each image in the image sample set by using an image saliency detection algorithm to obtain a saliency label of each image area in each image;

forming a training set and a testing set according to the visual features, the background contrast features and the significant value labels of each image region, and learning the significant value of each image region in each image by using an algorithm based on significant label sequencing;

and recovering a saliency map of each image by using the low-rank matrix recovery theory and the saliency value of each image region, and detecting a salient target in the image.

Further, the extracting of the visual feature and the background contrast feature for each image region includes:

the visual features comprise color features and texture features, wherein the color features comprise average RGB, LAB and HSV color values of pixel points contained in each image region and corresponding color space histograms; the texture features comprise LBP and LM filter distribution features of the image area; the background contrast characteristic adopts a certain number of peripheral edge regions as a background, and respectively extracts the color texture of the background region and the contrast characteristic between the color texture and the contrast characteristic;

the background contrast characteristic of the image area is defined as follows:

using the peripheral region of the boundary as a pseudo-background region for each image region, image region R_tThe background value of (a) may be expressed as:

where B represents the entire pseudo-background area,

feature vector, v, representing each small region in a pseudo-background^BAn overall feature vector representing the entire pseudo-background; image region R for pseudo background_tRegion R of an image_tThe background contrast characteristic of (a) is defined as follows:

wherein λ_jIs to the region R_tConstraint parameter of area, p_tAnd p_jAre respectively corresponding image areas R_tAnd R_jIs a spatial weight coefficient, N is the total number of images in the image sample set,

representing feature vectors in each channel

And

the histogram distance between;

and splicing various visual features and background contrast features of the image area to obtain a feature vector of the image area.

Further, the method for learning the saliency value of each image region in each image by using the algorithm based on the saliency label ranking comprises the following steps:

regarding the significant value labels of each region as 256 classes, regarding the significant value of an image region as a positive label, regarding a complementary set of the significant value in a set {0,1 … 255} as a negative label of the image region, composing the positive label, the negative label and the feature vector of the image region into a sample set, selecting one part from the sample set as a training set, and regarding the rest part as a test set;

and establishing a significance target detection parameter model framework by using the training set and the test set, establishing an error loss model, and then optimizing the significance target detection model by using the error loss model to obtain parameters so as to obtain a significance value of each image area in each image.

Further, the establishing a frame of a saliency target detection parameter model by using the training set and the test set, then establishing an error loss model, then performing optimization solution on the saliency target detection model by using the error loss model to obtain parameters, and obtaining a saliency value of each image region in each image includes:

the significance detection is regarded as a multi-classification problem, a classification model is found through a multi-label learning algorithm based on ordering, and a training set of all image region features of each image is represented as I ═ r₁,r₂...r_nR, each image area characteristic r_i∈R^dIs a d-dimensional vector, n is the total number of the training set, and the saliency labels corresponding to all the image regions of each image are represented as τ ═ l₁,l₂...l_mUsing y ═ y }₁,y₂...y_n)∈{0,1}^m×nSignificance labels, y, representing correspondences in the training set_i∈{0,1}^mIndicating the saliency labels assigned to the ith region, using y_ji1 denotes a saliency label l_jIs assigned to region r_iOn the contrary, y_ji0; m belongs to the set 0, 1.. 255, representing the corresponding saliency values of the label.

For the image region r_iIf y is_ji1 and y_kiWhen the sequence is equal to 0, the sequence function f of the ith label is predicted by using a multi-label sequence method_i(r) for this image area r_iThe loss between the positive and negative tags is defined as follows:

ε_j,k(r,y)＝I(y_j≠y_k)l((y_j-y_k)(f_j(r)-f_k(r))) (1)

where I (z) represents an indicator function, and outputs 1 when z is true; otherwise, 0 is output, and a linear function is used to represent the prediction function, defined as f_i(g)＝w_i ^Tg, wherein W ═ W₁,w₂...w_m]∈R^d×mAccording to equation (1), the error loss model for all image regions in the training set is defined as follows:

and (3) utilizing regularization to constrain the error loss model, regarding W as a low-rank matrix, and introducing a nuclear norm, wherein a minimized loss function is as follows:

where λ is a constraint parameter.

For two region feature vectors r_iAnd r_jDefining a similarity matrix S ═ S_ij]_n×nWherein s is_ij＝e(-||r_i-r_j||²/σ²) If and only if x_i∈N_k(r_j) Or x_j∈N_k(r_i)，s_ijRepresenting the visual similarity between two regional features, N_kAnd (r) is k adjacent sets of the region r, and if the visual features of the two regions are similar by combining the graph Laplacian regularization theory, the corresponding label spaces of the two regions also have similarity. The visual constraint regularization term is defined as follows:

wherein

Is a diagonal matrix and L is a laplacian matrix, and in combination with equations (3) (4), the optimization problem is abstracted as the following objective function:

wherein α, λ is the balance parameter, L ═ E^-1/2(E-s)E^-1/2Is a normalized graph laplacian matrix.

Solving the formula (5) by using an APG method, solving a feature similarity matrix L of a training set, and iteratively solving W as follows:

wherein the optimization problem is solved as

Is to f (W)_t) Finding the gradient, W_t'＝UΣV^TIs the singular value decomposition of W' and,

is a diagonal matrix calculated as

η_tIs the step size of the update.

According to another aspect of the present invention, there is provided an image saliency target detection apparatus based on saliency label sorting, including:

the image area characteristic acquisition module is used for establishing an image sample set, dividing each image in the image sample set into a plurality of image areas by using a super-pixel segmentation SLIC (linear segmentation in particular) segmentation method, and extracting visual characteristics and background contrast characteristics for each image area;

the image area significant value label acquisition module is used for extracting a significant target from each image in the image sample set by using an image significance detection algorithm to obtain a significant value label of each image area in each image;

the image area significant value acquisition module is used for forming a training set and a test set according to the visual features, the background contrast features and the significant value labels of each image area and learning the significant value of each image area in each image by using an algorithm based on significant label sequencing;

and the salient target acquisition module of the image recovers a salient image of each image by using a low-rank matrix recovery theory and the salient value of each image area, and detects the salient target in the image.

Further, the image region feature obtaining module is specifically configured to set the visual features to include color features and texture features, where the color features include average RGB, LAB, HSV color values of pixel points included in each image region and corresponding color space histograms; the texture features comprise LBP and LM filter distribution features of the image area; the background contrast characteristic adopts a certain number of peripheral edge regions as a background, and respectively extracts the color texture of the background region and the contrast characteristic between the color texture and the contrast characteristic;

the background contrast characteristic of the image area is defined as follows:

where B represents the entire pseudo-background area,

representing feature vectors in each channel

And

the histogram distance between;

Further, the significant value obtaining module of the image region is specifically configured to regard the significant value labels of each region as 256 classes, regard the significant value of the image region as a positive label, regard a complement of the significant value in a set {0,1 … 5} as a negative label of the image region, form a sample set by the positive label, the negative label and a feature vector of the image region, select a part from the sample set as a training set, and use the remaining part as a test set;

Further, the saliency value acquisition module of the image region is specifically configured to regard saliency detection as a multi-classification problem, find a classified model through a multi-label learning algorithm based on ranking, and represent a training set of all image region features of each image as I ═ { r ═ r₁,r₂...r_nR, each image area characteristic r_i∈R^dIs a d-dimensional vector, n is the total number of the training set, and the saliency labels corresponding to all the image regions of each image are represented as τ ═ l₁,l₂...l_mUsing y ═ y }₁,y₂...y_n)∈{0,1}^m×nSignificance labels, y, representing correspondences in the training set_i∈{0,1}^mIndicating the saliency labels assigned to the ith region, using y_ji1 denotes a saliency label l_jIs assigned to region r_iOn the contrary, y_ji0; m belongs to the set 0, 1.. 255, representing the corresponding saliency values of the label.

ε_j,k(r,y)＝I(y_j≠y_k)l((y_j-y_k)(f_j(r)-f_k(r))) (1)

where λ is a constraint parameter.

wherein

wherein the optimization problem is solved as

is a diagonal matrix calculated as

η_tIs the step size of the update.

According to the technical scheme provided by the embodiment of the invention, the method of the invention fully utilizes the complexity of the nuclear norm control model of the matrix, combines the visual characteristic similarity and the semantic label similarity, and utilizes the correlation between the graph Laplace regularization constraints, thereby effectively solving the problems of large significant label space and limited training image quantity.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.

Fig. 1 is a flowchart of an image saliency target detection method based on saliency tag sorting according to an embodiment of the present invention;

FIG. 2 is a schematic model diagram of an image saliency target detection algorithm based on saliency label sorting according to an embodiment of the present invention;

fig. 3 is a specific structural diagram of an image saliency target detection apparatus based on saliency tag sorting according to an embodiment of the present invention, and includes an image region feature acquisition module 31, an image region saliency value tag acquisition module 32, and an image region saliency value acquisition module 33.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.

Example one

An embodiment of the present invention provides a flow chart of a method for detecting an image saliency target based on saliency label sorting, as shown in fig. 1, the method includes the following steps:

step S110: establishing an image sample set from existing data sets containing a salient object;

the data set includes MSRA1000, ECSSD, and ICOSEG.

Step S120: each image in the image sample set is divided into t image regions using a slic (simple linear iterative) segmentation method, where t is a natural number, preferably 150. And extracting visual features and background contrast features for each image region, wherein the visual features comprise color features and texture features. Each image in the sample set of images is represented using features of the image region.

The color characteristics of the image comprise average RGB, LAB and HSV color values of pixel points contained in each image area and corresponding color space histograms; the texture features comprise LBP and LMfilter distribution features of the image area; the background features adopt a certain number of peripheral edge regions as background regions, and respectively extract color textures of the background regions and contrast features between the color textures.

The color characteristics are calculated as follows. Research shows that when the significance target is detected, the RGB color space and the LAB color space play a complementary role, and the HSV color space can more accurately describe the visual perception capability of human eyes. Then, for each divided image region, the average RGB color, LAB color, and HSV color thereof as color features represent a color contrast vector of each image region.

Wherein the texture features are calculated as follows: using LBP and LMFilter to represent texture feature descriptors, extracting LBP (Local Binary Pattern) histograms of 8 x 8 of each image area, and calculating chi between LBP histograms of two adjacent areas²The distances are as follows:

wherein h is_iAn LBP histogram representing the image area i.

In the same way for the LMFilter, extracting 8X 8 LMFilter histogram of each image area, and calculating chi between the LMFilter histograms of two adjacent areas²Distance.

The background contrast characteristic is calculated as follows. Background contrast features in salient object detectionOften as an inhibition feature to assist in the extraction of salient objects. Using the peripheral region of the boundary as a pseudo-background region for each image region, image region R_tThe background value of (a) may be expressed as:

where B represents the entire pseudo-background area,

feature vector, v, representing each small region in a pseudo-background^BThe global feature vector representing the entire pseudo-background is obtained by calculation in the above-described manner. The feature vector of the pseudo background is mainly composed of color and texture, so that the calculation method is similar to the feature vector acquisition method of the region; thus, the image region R for the pseudo background_tRegion R of an image_tThe background contrast characteristic of (a) is defined as follows:

wherein λ_jIs to the region R_tConstraint parameter of area, p_tAnd p_jAre respectively corresponding image areas R_tAnd R_jAverage position distance between, p_tAnd p_jAre respectively corresponding image areas R_tAnd_Rjaverage position coordinates of regional pixel points

σ is a spatial weight coefficient, N is a total number of images in the image sample set,

representing feature vectors in each channel

And

the histogram distance between. In digital image processing, an image is composed of three color channels of R, G and B, and a feature vector in the channel is obtained by calculating some features of the three channels.

Thus, regional contrast features of 3 colors are obtained, and finally each image region represents color features using a 28-dimensional feature vector.

By splicing the above various features, 74-dimensional feature vectors can be obtained for each image region

Step S130: and extracting a saliency target of each image in the sample set by using the existing image saliency detection algorithm to obtain a saliency label of each image area in each image. The conventional image saliency detection algorithm is based on a structured matrix decomposition algorithm, and adopts the idea of separating a background from a target in an image to obtain a saliency map of the image, and then an average value of saliency values of pixels of each image area is taken as a saliency value label of the image area.

Step S140: and dividing the visual features and the corresponding significant value label set of each image area into a training set and a testing set, and learning the significant value corresponding to each image area in each image by using an algorithm based on significant label ordering.

Regarding the significant value labels of each region as 256 classes, regarding the significant value of an image region as a positive label, regarding a complement of the significant value in a set {0,1 … 255} as a negative label of the image region, composing the positive label, the negative label and the feature vector of the image region into a sample set, selecting one part from the sample set as a training set, and regarding the rest part as a test set.

And establishing a significance target detection parameter model framework by utilizing the training set and the test set, then establishing an error loss model, and then optimizing the significance target detection model by utilizing the error loss model to obtain parameters so as to obtain a significance value of each image area in each image.

Through the proposed optimization algorithm, a model parameter W can be trained on a training data set, then for each region in each image, the probability of occurrence of a significant value label of each region is obtained by multiplying W by a feature vector X, the probabilities are arranged in a descending order, and the position of the column coordinate of W corresponding to the maximum probability value is taken as the significant value of all pixel points in the region.

Fig. 2 shows a model diagram of an image saliency target detection algorithm based on saliency label ranking provided by the present invention. As shown in fig. 2, the saliency detection is regarded as a multi-classification problem, and a classification model is found through a multi-label learning algorithm based on ranking. This algorithm is particularly suitable for situations where large-scale classes are learned in a limited set of training samples. First, the training set of all region features in step S120 is represented as I ═ r₁,r₂...r_nR of each region r_i∈R^dIs a d-dimensional vector and n is the total number of training sets. The saliency labels corresponding to all the regions in step S130 are denoted as τ ═ { l₁,l₂...l_mUsing y ═ y }₁,y₂...y_n)∈{0,1}^m×nSignificance label, y, corresponding to the training feature set_i∈{0,1}^mIndicating the saliency labels assigned to the ith region, using y_ji1 denotes a saliency label l_jIs assigned to region r_iOn the contrary, y_ji0; m belongs to the set 0, 1.. 255, representing the corresponding saliency values of the label.

The present invention addresses this salient object detection problem, for region r_iIf y is_ji1 and y_kiWhen the sequence is equal to 0, the sequence function f of the ith label is predicted by using a multi-label sequence method_i(r), this function can be assigned to the label/_iA high significance value, and give label l_kA low significance value. Therefore, the loss between the positive and negative tags in this area is defined as follows:

ε_j,k(r,y)＝I(y_j≠y_k)l((y_j-y_k)(f_j(r)-f_k(r))) (1)

where I (z) represents an indicator function, outputs 1 when z is true and 0 otherwise. For convenient and efficient calculation, the prediction function is expressed by a linear function, defined as f_i(g)＝w_i ^Tg, wherein W ═ W₁,w₂...w_m]∈R^d×m. Then, according to equation (1), the loss function for all regions in the training set is defined as follows:

in the invention, the overfitting condition of data is prevented while the complexity of the model is controlled, regularization is utilized to carry out constraint on the model, W is regarded as a low-rank matrix, a nuclear norm is introduced, and a minimized loss function is as follows:

where λ is a constraint parameter.

In addition, in order to better solve the problem of salient object detection, the visual similarity characteristics of image areas are fully considered, and the characteristics r of the two areas are subjected to two-area detection_iAnd r_jDefining a similarity matrix S ═ S_ij]_n×nWherein s is_ij＝e(-||r_i-r_j||²/σ²) If and only if x_i∈N_k(r_j) Or x_j∈N_k(r_i)，s_ijRepresenting the visual similarity between two regional features, N_k(r) is k immediately adjacent sets of regions r, preferably 0.01 x n. Then, in combination with the graph laplacian regularization theory, if the visual features of the two regions are similar, the corresponding label spaces also have similarity. The visual constraint regularization term is defined as follows:

wherein

Is a diagonal matrix and L is a laplacian matrix. Combining equations (3) and (4), the above optimization problem is abstracted into the following objective function:

In the invention, the function designed by fully considering the optimization problem is introduced into the non-convex function with the nuclear norm, so that the APG (estimated formal gradient) method is used for solving the step (5). Firstly, a feature similarity matrix L of a training set is solved, and then, the iterative solution of W is as follows:

wherein the optimization problem is solved as

Is to f (W)_t) And (5) calculating a gradient. Thus, W_t'＝UΣV^TIs the singular value decomposition of W' and,

is a diagonal matrix calculated as

η_tIs the step size of the update.

Step S150: and recovering the saliency value of each image region to the image in the sample set by using a low-rank matrix recovery theory, detecting the saliency target in the image, and obtaining the final saliency map.

Example two

The embodiment provides an image saliency target detection device based on saliency label sorting, the device has a specific structure as shown in fig. 3, and the device comprises:

an image area feature obtaining module 31, configured to establish an image sample set, divide each image in the image sample set into a plurality of image areas by using an SLIC segmentation method, and extract a visual feature and a background contrast feature for each image area;

a significant value label obtaining module 32 of the image area, configured to extract a significant target from each image in the image sample set by using an image significance detection algorithm, so as to obtain a significant value label of each image area in each image;

the salient value acquiring module 33 of the image area is configured to form a training set and a testing set according to the visual feature, the background contrast feature and the salient value label of each image area, and learn the salient value of each image area in each image by using an algorithm based on the ordering of the salient labels;

and the image salient target acquiring module 34 recovers the salient map of each image by using the salient value of each image region according to a low-rank matrix recovery theory, and detects the salient target in the image.

Further, the image region feature obtaining module 31 is specifically configured to set the visual features to include color features and texture features, where the color features include average RGB, LAB, HSV color values of pixel points included in each image region and corresponding color space histograms; the texture features comprise LBP and LM filter distribution features of the image area; the background contrast characteristic adopts a certain number of peripheral edge regions as a background, and respectively extracts the color texture of the background region and the contrast characteristic between the color texture and the contrast characteristic;

the background contrast characteristic of the image area is defined as follows:

using the peripheral region of the boundary as a pseudo-background region for each image region, image region R_tIs a background ofThe value may be expressed as:

where B represents the entire pseudo-background area,

representing feature vectors in each channel

And

the histogram distance between;

Further, the significant value obtaining module 33 of the image region is specifically configured to regard the significant value labels of each region as 256 classes, regard the significant value of the image region as a positive label, regard a complement of the significant value in the set {0,1 … 255} as a negative label of the image region, form a sample set by the positive label, the negative label and the feature vector of the image region, select a part of the sample set as a training set, and use the rest of the sample set as a test set;

and establishing a significance target detection parameter model framework by using the training set and the test set, establishing an error loss model, and optimizing the significance target detection model by using the error loss model to obtain parameters so as to obtain a significance value of each image area in each image.

ε_j,k(r,y)＝I(y_j≠y_k)l((y_j-y_k)(f_j(r)-f_k(r))) (1)

where λ is a constraint parameter.

wherein

wherein α, λ is balance parameterNumber, L ═ E^-1/2(E-s)E^-1/2Is a normalized graph laplacian matrix.

wherein the optimization problem is solved as

is a diagonal matrix calculated as

η_tIs the step size of the update.

The specific process of detecting the image salient object based on the salient label sorting by using the device of the embodiment of the invention is similar to that of the method embodiment, and is not repeated here.

In summary, the salient object detection algorithm provided by the embodiment of the present invention regards the salient detection problem as a multi-classification problem, performs the sequence mapping from the salient labels to the matrix recovery problem, and fully utilizes the human visual cognition process to detect the salient object in the image by combining the visual and contrast characteristics.

The salient object detection algorithm provided by the invention fully utilizes the complexity of a nuclear norm control model of a matrix, combines visual feature similarity and semantic label similarity, and utilizes the correlation between graph Laplacian regularization constraints, thereby effectively solving the problems of large salient label space and limited training image quantity.

All modules of the system are automatically completed without manual intervention, can be simply and conveniently embedded into other semantic analysis systems of the image, and has wide and universal application prospect.

Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.

From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to method embodiments, they are described in relative terms, as long as they are described in partial descriptions of method embodiments. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An image saliency target detection method based on saliency label sorting is characterized by comprising the following steps:

recovering a saliency map of each image by using the saliency value of each image region based on a low-rank matrix recovery theory, and detecting a salient target in the image;

the method for extracting the visual feature and the background contrast feature for each image area comprises the following steps:

the background contrast characteristic of the image area is defined as follows:

for each image areaThe region around the boundary is used as a pseudo background region, image region R_tThe background values of (a) are expressed as:

where B represents the entire pseudo-background area,

feature vector, v, representing each small region in a pseudo-background^BAn overall feature vector representing the entire pseudo-background,

representing an image region R_tA background value of (d); b is a symbolic representation of a background value;

image region R 'for pseudo background'_tOf image region R'_tThe background contrast characteristic of (a) is defined as follows:

wherein λ_jIs to the region R_tConstraint parameter of area, p_tAnd p_jAre respectively corresponding image areas R_tAnd R_jIs a spatial weight coefficient, Nt is the total number of images in the image sample set,

representing feature vectors in each channel

And

the histogram distance between; k represents the number of adjacent regions of any region, c represents contrast, which is a symbolic representation of contrast values, and j represents the index of an image region, namely the jth region;

2. The method for detecting image saliency objects based on saliency label ordering according to claim 1, wherein said forming a training set and a testing set according to the visual features, the background contrast features and the saliency labels of each image region, and learning the saliency value of each image region in each image using an algorithm based on saliency label ordering includes:

dividing the estimation values of the significant values of all the regions in the image into 256 classes, wherein the estimation value of the significant value of each region is a value between 0 and 255, the estimation value of the significant value of each region is used as a positive label of the region, meanwhile, a complementary set of the estimation values in all class label sets {0,1 … 255} is used as a negative label corresponding to the image region, the positive label, the negative label and the feature vector of the image region form a sample set, one part of the sample set is selected as a training set, and the other part of the sample set is used as a test set;

and establishing a significance target detection parameter model framework by using the training set and the test set, establishing an error loss model, and then optimizing the significance target detection model by using the error loss model to obtain parameters so as to obtain an accurate value of the significance value of each image area in each image.

3. The method for detecting the image salient object based on the salient label sorting of claim 2, wherein the step of establishing a salient object detection parameter model framework by using the training set and the test set, then establishing an error loss model, and then performing optimization solution on the salient object detection model by using the error loss model to obtain the accurate value of the salient value of each image area in each image comprises the steps of:

the significance detection is regarded as a multi-classification problem, a classification model is found through a multi-label learning algorithm based on ordering, and a training set of all image region features of each image is represented as I ═ r₁,r₂...r_nR, each image area characteristic r_i∈R^dIs a d-dimensional vector, n is the total number of the training set, and the saliency labels corresponding to all the image regions of each image are represented as τ ═ l₁,l₂...l_mUsing y ═ y }₁,y₂...y_n)∈{0,1}^m×nSignificance labels, y, representing correspondences in the training set_i∈{0,1}^mIndicating the saliency labels assigned to the ith region, using y_ji1 denotes a saliency label l_jIs assigned to region r_iOn the contrary, y_ji0; m belongs to the set {0, 1.. 255} and represents a significant value corresponding to the label;

for the ith image region feature r_iIf y is_ji1 and y_kiWhen the sequence is equal to 0, the sequence function f of the ith label is predicted by using a multi-label sequence method_i(r) for this image area r_iThe loss between the positive and negative tags is defined as follows:

ε_j,k(r,y)＝I(y_j≠y_k) ℓ((y_j-y_k)(f_j(r)-f_k(r))) (1)

wherein λ is a constraint parameter;

for two region feature vectors r_iAnd r_jDefining a similarity matrix S ═ S_ij]_n×nWherein s is_ij＝e(-||r_i-r_j||²/σ²) If and only if x_i∈N_k(r_j) Or x_j∈N_k(r_i)，x_iRepresenting the original image block, x, corresponding to the ith area_jRepresenting an original image block corresponding to the jth area;

s_ijrepresenting the visual similarity between two regional features, N_k(r) is k adjacent sets of the region r, and in combination with the graph laplacian regularization theory, if the visual features of the two regions are similar, the corresponding label spaces also have similarity, and a visual constraint regularization term is defined as follows:

wherein

Is a diagonal matrix, r is the sum of all regions, r_iRepresenting the ith image area characteristic, r_jRepresenting the jth image region feature, L is a feature similarity matrix, and in combination with equations (3) (4), the optimization problem is abstracted as the following objective function:

wherein α is a balance parameter, L ═ E^-1/2(E-s)E^-1/2S is a normalized graph Laplace matrix, L is a feature similarity matrix, and each element in the matrix represents a region and its surrounding faciesThe similarity of features between adjacent regions is such that,^Tr represents a trace function;

wherein the optimization problem is solved as

Is to f (W)_t) Obtaining gradient of W'_t＝U∑V^TIs the singular value decomposition of W' and,

is a diagonal matrix calculated as

η_tFor the updated step size, U ═ U₁,u₂,...u_mIs a matrix of size m, each element in the matrix being WW^TIs also referred to as the left singular eigenvector of the matrix W, and similarly, V represents the right singular eigenvector of the matrix W.

4. An image saliency target detection apparatus based on saliency label ordering, comprising:

the salient target acquisition module of the image recovers a salient image of each image by using a low-rank matrix recovery theory and the salient value of each image area to detect a salient target in the image;

the image region feature obtaining module is specifically configured to set the visual features to include color features and texture features, where the color features include average RGB, LAB, HSV color values of pixel points included in each image region and corresponding color space histograms; the texture features comprise LBP and LM filter distribution features of the image area; the background contrast characteristic adopts a certain number of peripheral edge regions as a background, and respectively extracts the color texture of the background region and the contrast characteristic between the color texture and the contrast characteristic;

the background contrast characteristic of the image area is defined as follows:

where B represents the entire pseudo-background area,

representing feature vectors in each channel

And

the histogram distance between;

5. The apparatus according to claim 4, wherein:

the image region significant value acquisition module is specifically configured to divide the estimated values of significant values of all regions in an image into 256 classes, where the estimated value of the significant value of each region is a value between 0 and 255, the estimated value of the significant value of each region is used as a positive label of the region, meanwhile, a complement of the estimated value in all class label sets {0,1 … 255} is used as a negative label corresponding to the image region, the positive label, the negative label and a feature vector of the image region form a sample set, a part of the sample set is selected as a training set, and the rest of the sample set is used as a test set;

6. The apparatus according to claim 5, wherein:

the saliency value acquisition module of the image region is specifically configured to consider saliency detection as a multi-classification problem, find a classified model through a multi-label learning algorithm based on ranking, and represent a training set of all image region features of each image as i ═ r₁,r₂...r_nR, each image area characteristic r_i∈R^dIs a d-dimensional vector, n is the total number of the training set, and the saliency labels corresponding to all the image regions of each image are represented as τ ═ l₁,l₂...l_mUsing y ═ y }₁,y₂...y_n)∈{0,1}^m×nSignificance labels, y, representing correspondences in the training set_i∈{0,1}^mIndicating the saliency labels assigned to the ith region, using y_ji1 denotes a saliency label l_jIs assigned to region r_iOn the contrary, y_ji0; m belongs to the set {0, 1.. 255} and represents a significant value corresponding to the label;

ε_j,k(r,y)＝I(y_j≠y_k)l((y_j-y_k)(f_j(r)-f_k(r))) (1)

wherein λ is a constraint parameter;

for two region feature vectors r_iAnd r_jDefining a similarity matrix S ═ S_ij]_n×nWherein s is_ij＝e(-||r_i-r_j||²/σ²) If and only if x_i∈N_k(r_j) Or x_j∈N_k(r_i)，s_ijRepresenting the visual similarity between two regional features, N_k(r) is k adjacent sets of the region r, and in combination with the graph laplacian regularization theory, if the visual features of the two regions are similar, the corresponding label spaces also have similarity, and a visual constraint regularization term is defined as follows:

wherein

Is a diagonal matrix, r is the sum of all regions, r_iRepresenting the ith image area characteristic, r_jRepresenting the jth image region feature, L is a feature similarity matrix, in combination with equations (3) (4), abstracting the optimization problem as the following objective function:

wherein α is a balance parameter, L ═ E^-1/2(E-s)E^-1/2S is a normalized graph laplacian matrix, L is a feature similarity matrix, each element in the matrix represents the feature similarity between a certain region and its surrounding neighboring regions,^Tr represents a trace function;

wherein the optimization problem is solved as

Is to f (W)_t) Obtaining gradient of W'_t＝U∑V^TIs the singular value decomposition of W' and,is a diagonal matrix calculated as

η_tIs the step size of the update.