CN111583290A - Cultural relic salient region extraction method based on visual saliency - Google Patents
- Publication number: CN111583290A
- Application number: CN202010510025.1A
- Authority
- CN
- China
- Prior art keywords: image, cultural relic, matrix, significant, super
- Prior art date: 2020-06-06
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/136—Image analysis; Segmentation; Edge detection involving thresholding
- G06V10/25—Image preprocessing; Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/267—Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06T2207/10004—Image acquisition modality; Still image; Photographic image
- G06T2207/20221—Image fusion; Image merging
Abstract
A method for extracting salient regions of cultural relic images based on visual saliency, belonging to the field of national-culture preservation. Addressing the complex backgrounds and small foreground-background differences typical of cultural relic images, the method automatically extracts a salient square region after a user uploads a cultural relic image, for use as the display image on a home page or index page; it thereby remedies the inaccurate and unclear relic-image displays in current domestic digital museums and aids the preservation and transmission of national culture. By combining two image-processing approaches, the extraction method analyzes and processes image blocks directly, greatly reducing the computational and time complexity of the operation. At the same time, it preserves shape features in regions where boundary information is sparse and expresses boundary information more accurately in boundary-rich regions.
Description
Technical Field
The invention relates to the field of national-culture preservation, and in particular to a method for extracting salient regions of cultural relic images based on visual saliency.
Background
With the development of society and rising living standards, people are shifting their attention from material needs to spiritual and cultural needs. Digital museums have come into public view: they both satisfy these needs and promote traditional Chinese culture. A problem arises, however. Many cultural relic images are not square, while the images displayed on a digital museum's home page, and even its index pages, are. Some digital museums crop pictures at a fixed position, so that while some relics happen to be cropped at a suitable spot, for others only the background is captured. Other museums simply compress the pictures into squares. These seemingly minor issues seriously degrade the display of the pictures and hinder the spread of traditional Chinese culture.
When traditional saliency-extraction algorithms are applied to cultural relic images, whose backgrounds are complex and whose foregrounds differ little from their backgrounds, defects readily appear and the salient region is confused with the background. A method for extracting the salient region of a cultural relic is therefore needed that targets these characteristics and automatically extracts a salient square region after a user uploads a cultural relic image, for use as the display image on a home page or index page, solving the inaccurate or unclear relic-image displays in current domestic digital museums.
Disclosure of Invention
To solve these problems, the application provides a method for extracting salient regions of cultural relics based on visual saliency: after a user uploads a cultural relic image, a salient square region is extracted automatically and used as the display image on a home page or index page.
The technical scheme adopted by the invention is as follows. A method for extracting salient regions of cultural relic images based on visual saliency comprises the following steps:
Step 1: preprocess the uploaded cultural relic image with a superpixel segmentation method to obtain a superpixel image.
Step 2: extract the salient image from the superpixel image with the LC saliency algorithm to obtain a coarse saliency matrix m_s.
Step 3: process the uploaded cultural relic image with a semantic segmentation model to obtain its label matrix m_l.
Step 4: assign weights to the label matrix m_l according to visual saliency to obtain a weight matrix m_w, and fuse the weight matrix m_w with the coarse saliency matrix m_s to obtain a doubly optimized saliency matrix m_o.
Step 5: obtain the optimal threshold of the saliency matrix m_o with the Otsu algorithm, extract the square region containing the most salient points, and store it as a salient square image.
Step 6: store the address of the salient square image obtained in step 5, together with the cultural relic image information, in a database.
In the superpixel segmentation method, pixels with similar characteristics are aggregated into superpixels by the simple linear iterative clustering (SLIC) algorithm, and the superpixels serve as the basic units in subsequent image processing.
The LC algorithm takes the aggregated superpixels as the basic units of image processing, computes the saliency value of every superpixel in the superpixel image, and obtains the coarse saliency matrix m_s. The saliency value of superpixel k is calculated by

SalS(I_k) = Σ_i ||I_k − I_i||

where SalS(I_k) is the saliency value of superpixel k in the superpixel image, I_k is the color of superpixel k in RGB color space, and I_i is the color of superpixel i in RGB color space.
The semantic segmentation model identifies the cultural relic image with a DeepLabv3+ model to obtain a label matrix m_l with clear boundaries; multilevel weights are assigned to the label matrix m_l to obtain a weight matrix m_w, and the weight matrix m_w is multiplied by the coarse saliency matrix m_s to obtain the saliency matrix m_o.
The Otsu algorithm obtains the saliency map by binarizing the image at the threshold, and a square image containing the most salient regions is selected, the side length of which is the length of the shorter side of the cultural relic image.
The beneficial effects of the invention are as follows. Targeting the complex backgrounds and small foreground-background differences of cultural relic images, the method automatically extracts a salient square region after a user uploads a cultural relic image, for display on a home page or index page; it solves the inaccurate and unclear relic-image displays in current domestic digital museums and aids the preservation and transmission of national culture. By combining superpixel saliency detection with a semantic segmentation model, the extraction method greatly reduces the time complexity of the operation, preserves shape features in regions with sparse boundary information, and expresses boundary information accurately in boundary-rich regions.
Drawings
Fig. 1 is the original cultural relic image in the embodiment.
Fig. 2 is the superpixel image of the cultural relic image in the embodiment.
Fig. 3 is the coarse saliency matrix map of the cultural relic image in the embodiment.
Fig. 4 is a comparison of the weight matrix of the cultural relic image in the embodiment.
Fig. 5 is the saliency matrix map of the cultural relic image in the embodiment.
Fig. 6 is the salient square image of the cultural relic in the embodiment.
Detailed Description
The invention is described in detail below with reference to the drawings and embodiments:
After a user uploads a cultural relic image (as shown in fig. 1), the image is preprocessed using the SLIC algorithm to obtain a superpixel image (as shown in fig. 2).
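As a rough illustration of this preprocessing step, the sketch below aggregates pixels into block "superpixels" by averaging colors over a regular grid. This is only a stand-in: the patent uses the SLIC algorithm proper (irregular, boundary-adherent clusters), and the grid partition here merely shows the label-map and mean-color data structure that the later steps consume.

```python
import numpy as np

def grid_superpixels(img, block=4):
    """Toy stand-in for SLIC: partition the image into square blocks and
    use each block's mean color as a 'superpixel'. Returns a per-pixel
    label map and the mean color of each superpixel."""
    h, w, c = img.shape
    labels = np.zeros((h, w), dtype=int)
    means = []
    k = 0
    for y in range(0, h, block):
        for x in range(0, w, block):
            labels[y:y + block, x:x + block] = k
            means.append(img[y:y + block, x:x + block].reshape(-1, c).mean(axis=0))
            k += 1
    return labels, np.array(means)

img = np.zeros((8, 8, 3))
img[2:6, 2:6] = 1.0                     # bright square on dark background
labels, means = grid_superpixels(img, block=4)
print(labels.max() + 1)                 # → 4 superpixels
```

Replacing the grid partition with a real SLIC segmentation (e.g. from an image-processing library) changes only how `labels` is produced; the downstream saliency computation is unchanged.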
Next, the salient image in the superpixel image is extracted with the LC saliency algorithm; the saliency values of all superpixels in fig. 2 are calculated by the following formula to obtain the coarse saliency matrix m_s (as shown in fig. 3):

SalS(I_k) = Σ_i ||I_k − I_i||

where SalS(I_k) is the saliency value of superpixel k in the superpixel image, I_k is the color of superpixel k in RGB color space, and I_i is the color of superpixel i in RGB color space.
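Under this formula, the LC saliency of each superpixel is simply the sum of its Euclidean color distances to every other superpixel. A minimal sketch, assuming the superpixel mean colors have already been computed:

```python
import numpy as np

def lc_saliency(colors):
    """LC saliency over superpixel mean colors: the saliency of superpixel
    k is the sum of Euclidean color distances to all other superpixels."""
    diff = colors[:, None, :] - colors[None, :, :]   # pairwise differences
    return np.sqrt((diff ** 2).sum(-1)).sum(axis=1)  # row sums of distances

# Three dark superpixels and one bright one: the bright outlier
# accumulates the largest total distance, hence the highest saliency.
colors = np.array([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.], [1., 1., 1.]])
sal = lc_saliency(colors)
print(sal)
```

Because each superpixel stands in for all the pixels it contains, this sum runs over a few hundred superpixels rather than millions of pixels, which is exactly the complexity reduction the patent claims.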
The cultural relic image is then identified with the DeepLabv3+ model, a semantic segmentation model, to obtain a label matrix m_l with clear boundaries. In fig. 4, (a) is the original cultural relic image and (b) is the map corresponding to the label matrix m_l. Weights are assigned to the different labels according to visual saliency (for example, the horse label may be assigned a weight of 1.5 and the background label a weight of 1), yielding the weight matrix m_w. The weight matrix m_w is multiplied by the coarse saliency matrix m_s to obtain the optimized saliency matrix m_o (as shown in fig. 5).
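The weighting-and-fusion step can be sketched as an element-wise product of the coarse saliency matrix with a weight matrix looked up from the label map. The label ids and weight values below are illustrative assumptions, not the actual DeepLabv3+ label set:

```python
import numpy as np

# Hypothetical label ids: 0 = background, 13 = "horse". The real ids
# depend on which DeepLabv3+ label set is used; these are illustrative.
weights = {13: 1.5}            # every other label defaults to 1.0

def fuse(label_map, coarse_sal, weights, default=1.0):
    """Build the weight matrix m_w by looking each label up in the weight
    table, then multiply element-wise with the coarse saliency m_s to
    obtain the optimized saliency m_o."""
    m_w = np.vectorize(lambda t: weights.get(t, default))(label_map)
    return m_w * coarse_sal

labels = np.array([[0, 13], [13, 0]])
sal = np.ones((2, 2))          # stand-in coarse saliency matrix
m_o = fuse(labels, sal, weights)
print(m_o)
```

The effect is that pixels belonging to visually important labels have their coarse saliency boosted before thresholding, while background pixels are left unchanged.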
Finally, the optimal threshold of the saliency matrix m_o is obtained with the Otsu algorithm; the matrix is binarized at this threshold to obtain the salient image, and the square region containing the most salient points is cut out and stored as the salient square image (as shown in fig. 6), the side length of the square being the length of the image's shorter side. The address of the resulting salient square image, together with the cultural relic image information, is stored in the image database.
The superpixel segmentation method refers to the simple linear iterative clustering (SLIC) algorithm, which aggregates pixels with similar characteristics into basic units for image processing. Compared with the usual pixel-by-pixel processing of natural images, an image that has undergone superpixel segmentation can be analyzed and processed directly per image block (superpixel), greatly reducing computational and time complexity. At the same time, superpixel segmentation preserves shape features in regions where boundary information is sparse and expresses boundary information more accurately in boundary-rich regions. The LC algorithm provides a bottom-up spatio-temporal attention model. Psychological research shows that the human perceptual system is highly sensitive to contrast in the color, intensity, texture, etc. of a visual signal. On this basic assumption, the LC algorithm offers an efficient way to compute a spatial saliency map from the image's color statistics; its computational complexity is linear in the number of image pixels. The saliency map of an image is built on the color contrast between its pixels: the sum of the Euclidean color distances between a pixel and all other pixels in the image is taken as that pixel's saliency value. Because the complexity is linear in the number of pixels, each superpixel here replaces all the pixels it contains, further reducing the time complexity of the saliency computation.
Cultural relic images have complex backgrounds and blurred foreground-background boundaries, so image boundaries obtained by superpixel segmentation alone are badly flawed; the DeepLab v3+ model is therefore used to complete the boundary information. Multilevel weights can be formulated for the DeepLab v3+ labels according to visual saliency, and the weight matrix m_w obtained from the label matrix. The DeepLabv3+ semantic segmentation model yields clearer boundaries and a label for each element of the cultural relic image. Assigning weights to labels means that all of the image's labels are known before the DeepLab v3+ model is applied; the labels are then grouped into three or four classes according to visual saliency and the points users attend to when viewing cultural relic images, and different classes receive different weights. For example, human labels receive a weight of 2, animal and flower labels 1.5, and background labels 1.
The optimal threshold of the optimized salient image is selected by the Otsu algorithm, which takes each value from 0 to 255 in turn as the threshold and computes the between-class variance of the two resulting parts; the larger the between-class variance, the larger the gray-level difference between the two parts. All 256 thresholds from 0 to 255 are tried, and the value with the largest between-class variance is the optimal threshold. The between-class variance is calculated by

g = w0 · w1 · (u0 − u1)^2

where g is the between-class variance, w0 is the proportion of the image occupied by the foreground, w1 is the proportion occupied by the background, u0 is the average gray level of the foreground, and u1 is the average gray level of the background.
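A direct transcription of this exhaustive search, under the assumption of 8-bit gray levels, might look like this:

```python
import numpy as np

def otsu_threshold(gray):
    """Try every threshold 0..255 and keep the one that maximizes the
    between-class variance g = w0 * w1 * (u0 - u1)**2."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    hist /= hist.sum()                       # normalize to probabilities
    best_t, best_g = 0, -1.0
    for t in range(256):
        w0 = hist[:t + 1].sum()              # foreground proportion
        w1 = hist[t + 1:].sum()              # background proportion
        if w0 == 0 or w1 == 0:
            continue                         # degenerate split
        u0 = (np.arange(t + 1) * hist[:t + 1]).sum() / w0
        u1 = (np.arange(t + 1, 256) * hist[t + 1:]).sum() / w1
        g = w0 * w1 * (u0 - u1) ** 2         # between-class variance
        if g > best_g:
            best_t, best_g = t, g
    return best_t

gray = np.array([[10, 10, 200], [200, 10, 200]], dtype=np.uint8)
print(otsu_threshold(gray))                  # → 10
```

In practice the saliency matrix m_o would first be rescaled to the 0-255 range; library implementations of Otsu thresholding compute the same optimum without the explicit loop.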
A binary image is then obtained. Let the long side of the image be L and the short side S. A square with side length S is constructed, the difference between L and S divided by 10 is taken as the offset step (dividing the range into 10 parts), and the square image containing the most salient pixels is found; this is the final display image required (as shown in fig. 6).
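The square-selection step can be sketched as sliding an S x S window along the long side in steps of (L - S)/10 and keeping the offset whose window covers the most salient pixels. This is a simplification of the step described above (it returns the offset and count rather than saving the crop):

```python
import numpy as np

def best_square(binary):
    """Slide an S x S window along the long side and return the offset
    whose window contains the most salient (non-zero) pixels."""
    h, w = binary.shape
    S, L = min(h, w), max(h, w)
    step = max((L - S) // 10, 1)             # ten offsets across the range
    best_off, best_cnt = 0, -1
    for off in range(0, L - S + 1, step):
        win = binary[:, off:off + S] if w >= h else binary[off:off + S, :]
        cnt = int(win.sum())
        if cnt > best_cnt:
            best_off, best_cnt = off, cnt
    return best_off, best_cnt

binary = np.zeros((4, 14), dtype=int)
binary[:, 9:13] = 1                          # salient blob near the right edge
off, cnt = best_square(binary)
print(off, cnt)                              # → 9 16
```

The winning window `binary[:, off:off + S]` (or its counterpart for tall images) is the square region that would be cropped from the original image and stored as the display image.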
Claims (5)
1. A method for extracting salient regions of cultural relic images based on visual saliency, characterized by comprising the following steps:
step 1, preprocessing the uploaded cultural relic image with a superpixel segmentation method to obtain a superpixel image;
step 2, extracting the salient image from the superpixel image with the LC saliency algorithm to obtain a coarse saliency matrix m_s;
step 3, processing the uploaded cultural relic image with a semantic segmentation model to obtain a label matrix m_l of the cultural relic image;
step 4, assigning weights to the label matrix m_l according to visual saliency to obtain a weight matrix m_w, and fusing the weight matrix m_w with the coarse saliency matrix m_s to obtain a doubly optimized saliency matrix m_o;
step 5, obtaining the optimal threshold of the saliency matrix m_o with the Otsu algorithm, extracting the square region containing the most salient points, and storing it as a salient square image;
step 6, storing the address of the salient square image obtained in step 5, together with the cultural relic image information, in a database.
2. The method for extracting salient regions of cultural relic images based on visual saliency of claim 1, characterized in that: in the superpixel segmentation method, pixels with similar characteristics are aggregated into superpixels by the simple linear iterative clustering (SLIC) algorithm, and the superpixels serve as the basic units in image processing.
3. The method for extracting salient regions of cultural relic images based on visual saliency of claim 1, characterized in that: the LC algorithm takes the aggregated superpixels as the basic units of image processing, computes the saliency values of all superpixels in the superpixel image, and obtains the coarse saliency matrix m_s; the saliency value of superpixel k is calculated by

SalS(I_k) = Σ_i ||I_k − I_i||

where SalS(I_k) is the saliency value of superpixel k in the superpixel image, I_k is the color of superpixel k in RGB color space, and I_i is the color of superpixel i in RGB color space.
4. The method for extracting salient regions of cultural relic images based on visual saliency of claim 1, characterized in that: the semantic segmentation model identifies the cultural relic image with a DeepLabv3+ model to obtain a label matrix m_l with clear boundaries; multilevel weights are assigned to the label matrix m_l to obtain a weight matrix m_w, and the weight matrix m_w is multiplied by the coarse saliency matrix m_s to obtain the saliency matrix m_o.
5. The method for extracting salient regions of cultural relic images based on visual saliency of claim 1, characterized in that: the Otsu algorithm obtains the saliency map by binarization at the threshold, and a square image containing the most salient regions is selected, the side length of which is the length of the shorter side of the cultural relic image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010510025.1A CN111583290A (en) | 2020-06-06 | 2020-06-06 | Cultural relic salient region extraction method based on visual saliency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111583290A true CN111583290A (en) | 2020-08-25 |
Family
ID=72112349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010510025.1A Pending CN111583290A (en) | 2020-06-06 | 2020-06-06 | Cultural relic salient region extraction method based on visual saliency |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111583290A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104103082A (en) * | 2014-06-06 | 2014-10-15 | 华南理工大学 | Image saliency detection method based on region description and priori knowledge |
US20160104054A1 (en) * | 2014-10-08 | 2016-04-14 | Adobe Systems Incorporated | Saliency Map Computation |
CN108764027A (en) * | 2018-04-13 | 2018-11-06 | 上海大学 | A kind of sea-surface target detection method calculated based on improved RBD conspicuousnesses |
CN109522908A (en) * | 2018-11-16 | 2019-03-26 | 董静 | Image significance detection method based on area label fusion |
CN110188763A (en) * | 2019-05-28 | 2019-08-30 | 江南大学 | A kind of image significance detection method based on improvement graph model |
Non-Patent Citations (1)
Title |
---|
CUI Liqun; YANG Zhenzhong; DUAN Tianlong; LI Wenqing: "Salient object detection method based on composite prior knowledge", Laser & Optoelectronics Progress * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113159026A (en) * | 2021-03-31 | 2021-07-23 | 北京百度网讯科技有限公司 | Image processing method, image processing apparatus, electronic device, and medium |
CN113553966A (en) * | 2021-07-28 | 2021-10-26 | 中国科学院微小卫星创新研究院 | Method for extracting effective starry sky area of single star map |
CN113553966B (en) * | 2021-07-28 | 2024-03-26 | 中国科学院微小卫星创新研究院 | Method for extracting effective starry sky area of single star map |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||