CN102360431A - Method for automatically describing image - Google Patents

Method for automatically describing image

Info

Publication number
CN102360431A
Authority
CN (China)
Prior art keywords
image
color
pixel
value
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103026211A
Other languages
Chinese (zh)
Inventor
汲业
陈燕
李桃迎
牟向伟
屈莉莉
Current Assignee
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date
Filing date
Publication date
Application filed by Dalian Maritime University
Priority to CN2011103026211A
Publication of CN102360431A
Legal status: Pending

Abstract

The invention discloses a method for automatically describing an image, comprising the following steps: segmenting the image at three levels; extracting an image texture feature; extracting an image color feature; and describing the image with multiple keywords. According to the invention, after the texture and color features of each subimage have been extracted they are merged, so that each subimage is represented by a single merged feature vector. Each feature vector is input to a pre-trained support vector machine, which converts the image library into a text library of image descriptions, and an index is created for the text library in the manner of text retrieval. When a user submits a query, the text index is searched for the image descriptions that match the query, and the images corresponding to those descriptions are returned. The invention thus converts image search into text search and avoids the cost that content-based image retrieval incurs in computing high-dimensional image feature vectors one by one, so that search efficiency and accuracy are improved.

Description

A method for automatically describing images
Technical field
The present invention relates to an image description method, and in particular to a method for automatically describing images.
Background technology
Today, as informatization deepens, people acquire ever larger volumes of digital image data, and helping people find useful image information quickly has become an important task of image analysis and processing. Traditional image analysis focuses on the low-level features of the image resource, and the corresponding retrieval techniques likewise rely on low-level feature matching: the user submits an example image, and the system queries the database using that image's color, texture, shape and similar information. But with the exponential growth of image data and the rapid expansion of image categories, image retrieval based on low-level feature matching has become inadequate, and cannot meet users' requirements for retrieval accuracy and efficiency. The most significant problem is the "semantic gap": current image analysis methods can only extract features expressing low-level visual properties, such as color distribution, spatial texture and region shape, whereas people describe image content with semantic concepts rather than with visual features such as color and texture. It is therefore difficult for existing methods to establish a clear and stable correspondence between these two ways of representing an image; a large gap exists between the high-level semantics an image carries and its low-level features, and this gap limits the effectiveness of content-based image retrieval (CBIR). Moreover, these low-level features are usually expressed as feature vectors of very high dimensionality, so CBIR turns into a search of a high-dimensional vector space; as the number of images grows rapidly, searching that space quickly and accurately becomes a very difficult problem. Establishing a semantic representation and search mechanism for images is therefore imperative.
Summary of the invention
To address the above problems, the present invention proposes a method for automatically describing images. It can obtain the semantic information of an image automatically and convert the image into a concise multi-keyword semantic description, so that an index can be built over these descriptions in the manner of text retrieval, converting image search into text search and improving search efficiency and accuracy.
To achieve these goals, the technical scheme of the present invention is as follows. A method for automatically describing an image comprises the following steps:
A. Performing a three-level segmentation of the image
The image is divided into three levels as follows:
Level-1 image: the original image, without segmentation;
Level-2 subimages: the image is divided into the four sub-blocks of a 2×2 grid, and in addition the centre part of the image is split off, giving five subimages in total;
Level-3 subimages: the image is divided into the sixteen sub-blocks of a 4×4 grid.
Thus one image is divided into 22 subimages, and steps B and C are performed on each subimage.
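The splitting rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the patent does not say how the level-2 centre block is positioned, so a centred crop of half the image size is assumed, and all names are hypothetical.

```python
import numpy as np

def three_level_split(img):
    """Split an image into the 22 subimages of step A:
    level 1: the original image (1 subimage),
    level 2: the four 2x2-grid quadrants plus a centre crop (5 subimages),
    level 3: the sixteen 4x4-grid blocks (16 subimages)."""
    h, w = img.shape[:2]
    subs = [img]                                              # level 1: the original
    h2, w2 = h // 2, w // 2
    for r in range(2):                                        # level 2: four quadrants
        for c in range(2):
            subs.append(img[r * h2:(r + 1) * h2, c * w2:(c + 1) * w2])
    subs.append(img[h // 4:h // 4 + h2, w // 4:w // 4 + w2])  # level 2: centre part
    h4, w4 = h // 4, w // 4
    for r in range(4):                                        # level 3: sixteen blocks
        for c in range(4):
            subs.append(img[r * h4:(r + 1) * h4, c * w4:(c + 1) * w4])
    return subs
```

Steps B and C are then applied to each of the 22 entries of the returned list.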
B. Extracting the image texture feature
For an image, compute the gray value of each pixel with formula (1):
I = 0.299R + 0.587G + 0.114B    (1)
To characterize the gray-level variation of each pixel within a neighborhood, consider the 3×3 neighborhood of that pixel, which contains 9 pixels. Let I_i (i = 0, 1, ..., 8) denote the gray value of the image at each of these pixels, with I_0 at the centre position, written as the matrix
\begin{pmatrix} I_1 & I_2 & I_3 \\ I_4 & I_0 & I_5 \\ I_6 & I_7 & I_8 \end{pmatrix}
The gray-variation value of pixel I_0 is then given by formula (2) (the formula image did not survive extraction; the local-binary-pattern form below is reconstructed from the statement that T is an eight-bit binary number):
T = \sum_{i=1}^{8} s(I_i - I_0)\, 2^{i-1}, \qquad s(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}    (2)
It is easy to see from formula (2) that T can be regarded as an eight-bit binary number, so its value satisfies T ∈ {0, 1, ..., 255}.
Compute the T value of all pixels of the image. Let T(i, j) denote the value at pixel (i, j), and let h_k (k = 0, 1, ..., 255) denote the ratio of the number of pixels whose T value is k to the total number of pixels; then:
h_k = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} f(i, j, k)    (3)
where n and m are respectively the height and the width of the image, and f(i, j, k) is:
f(i, j, k) = \begin{cases} 1 & \text{if } T(i, j) = k \\ 0 & \text{otherwise} \end{cases}    (4)
This yields the image texture feature vector space model {h_0, h_1, ..., h_255}.
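The whole of step B can be sketched as below. Formula (2) is read as a local binary pattern — an assumption, since the formula image did not survive — and the standard BT.601 gray weights are used (the patent prints 0.229, presumably a typo for 0.299). Function names are hypothetical.

```python
import numpy as np

def texture_histogram(rgb):
    """256-bin texture feature of step B: grayscale via formula (1),
    an 8-bit neighbourhood code T per interior pixel (formula (2),
    reconstructed as a local binary pattern), then the normalized
    histogram h_k of formulas (3)-(4)."""
    # formula (1); 0.299/0.587/0.114 assumed (patent prints 0.229)
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    m, n = gray.shape
    hist = np.zeros(256)
    # offsets of the 8 neighbours I_1..I_8 around the centre I_0
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for i in range(1, m - 1):
        for j in range(1, n - 1):
            t = 0
            for bit, (di, dj) in enumerate(offs):
                if gray[i + di, j + dj] >= gray[i, j]:  # compare neighbour to I_0
                    t |= 1 << bit                        # set the corresponding bit
            hist[t] += 1
    return hist / hist.sum()                             # proportions h_k
```

On a constant image every neighbour equals the centre, so every interior pixel codes to T = 255 and the histogram collapses into that single bin.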
C. Extracting the image color feature
For each pixel of the image, transform from RGB to R'G'B':
R' = \frac{\max(R,G,B) - R}{\max(R,G,B) - \min(R,G,B)}, \quad G' = \frac{\max(R,G,B) - G}{\max(R,G,B) - \min(R,G,B)}, \quad B' = \frac{\max(R,G,B) - B}{\max(R,G,B) - \min(R,G,B)}    (5)
Then transform from R'G'B' to HSV (formula (6); the formula image did not survive extraction — in the standard conversion V = \max(R,G,B), S = (\max(R,G,B) - \min(R,G,B))/\max(R,G,B), and H is derived from R', G' and B'), where H ∈ [0, 360], S ∈ [0, 1], V ∈ [0, 1].
In these formulas, R, G and B are respectively the red, green and blue components of the RGB color space. The hue H of the HSV color space is what color names such as red, orange and green distinguish; it is measured as an angle from 0° to 360°. The value V is the brightness of the color, usually measured as a percentage from black (0%) to white (100%). The saturation S is the depth of the color: the same red, for example, can be dark red or pale red depending on its concentration; it is also measured as a percentage, from 0% to fully saturated at 100%.
The three components H, S and V are quantized with unequal intervals according to color perception. From a coarse analysis of the color model, the hue space H is divided into 8 parts and the saturation space S and value space V into 3 parts each, quantized according to the ranges of the colors. The quantized hue, saturation and value are H', S' and V' respectively:
H' = \begin{cases} 0 & H \in (315, 360] \cup [0, 20] \\ 1 & H \in (20, 40] \\ 2 & H \in (40, 75] \\ 3 & H \in (75, 155] \\ 4 & H \in (155, 190] \\ 5 & H \in (190, 270] \\ 6 & H \in (270, 295] \\ 7 & H \in (295, 315] \end{cases} \qquad S' = \begin{cases} 0 & S \in (0, 0.2] \\ 1 & S \in (0.2, 0.7] \\ 2 & S \in (0.7, 1] \end{cases}    (7)
V' = \begin{cases} 0 & V \in (0, 0.2] \\ 1 & V \in (0.2, 0.7] \\ 2 & V \in (0.7, 1] \end{cases}
According to the quantization levels above, the three color components are combined into a one-dimensional feature value:
L = H' Q_S Q_V + S' Q_V + V'    (8)
where Q_S and Q_V are the numbers of quantization levels of the components S and V. From formula (7), S and V are each quantized into the three levels 0, 1 and 2, so Q_S = 3 and Q_V = 3, and formula (8) becomes:
L = 9H' + 3S' + V'    (9)
which converts the three components H', S' and V' into a one-dimensional value. From formulas (7) and (9):
L ∈ {0, 1, ..., 71}    (10)
Compute the L value of all pixels of the image. Let L(i, j) denote the value at pixel (i, j), and let l_k (k = 0, 1, ..., 71) denote the ratio of the number of pixels whose L value is k to the total number of pixels; then:
l_k = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} g(i, j, k)    (11)
where n and m are respectively the height and the width of the image, and g(i, j, k) is:
g(i, j, k) = \begin{cases} 1 & \text{if } L(i, j) = k \\ 0 & \text{otherwise} \end{cases}    (12)
This yields the image color feature vector space model {l_0, l_1, ..., l_71}.
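The per-pixel part of step C can be sketched as below. The standard-library conversion `colorsys.rgb_to_hsv` stands in for formulas (5)-(6), and the quantization follows formula (7), with the printed interval (296, 315] read as (295, 315] so that the hue bins are contiguous. The function name is hypothetical.

```python
import colorsys

def color_bin(r, g, b):
    """Map one RGB pixel (components 0-255) to the quantized
    one-dimensional color index L = 9*H' + 3*S' + V' of formula (9)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    h *= 360.0                                  # hue as an angle in [0, 360)
    if h > 315 or h <= 20:                      # hue bins of formula (7)
        hq = 0
    elif h <= 40:  hq = 1
    elif h <= 75:  hq = 2
    elif h <= 155: hq = 3
    elif h <= 190: hq = 4
    elif h <= 270: hq = 5
    elif h <= 295: hq = 6
    else:          hq = 7
    sq = 0 if s <= 0.2 else (1 if s <= 0.7 else 2)   # S' bins
    vq = 0 if v <= 0.2 else (1 if v <= 0.7 else 2)   # V' bins
    return 9 * hq + 3 * sq + vq                 # formula (9), L in {0, ..., 71}
```

Counting these indices over all pixels and normalizing, exactly as in step B, gives the 72-bin color histogram {l_0, ..., l_71}.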
D. Multiple-keyword description
After the texture and color feature extraction of each subimage is complete, the texture feature and the color feature of each subimage are merged, and each subimage is represented by the single merged feature vector. Each feature vector is input to a pre-trained support vector machine. This support vector machine uses a radial basis kernel function and was trained on 2582 pictures belonging to eight classes: animal, plant, interior decoration, building, car, people, sky and space.
Let R(i, j, k) be the classification decision, i.e. whether the k-th subimage of segmentation level j, image_k, belongs to the i-th class Category_i:
R(i, j, k) = \begin{cases} 1 & \text{if } image_k \in Category_i \\ 0 & \text{otherwise} \end{cases}    (13)
From this, compute the quantized value r_i with which the entire image belongs to class i. Considering the different contributions of the whole image and of local regions to understanding the image content, a corresponding weighting strategy is adopted:
r_i = w_1 R(i, 1, 1) + w_2 \sum_{k=1}^{5} R(i, 2, k) + w_3 \sum_{k=1}^{16} R(i, 3, k)    (14)
where the weight coefficients are w_1 = 1, w_2 = 0.2 and w_3 = 0.0625. Clearly r_i lies in [0, 3]. When r_i is greater than 0.3, the class name can be assigned to the image as a keyword.
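The weighted aggregation of formula (14) and the 0.3 threshold can be sketched as below. The data layout (nested lists of 0/1 decisions per level) and the function name are hypothetical; the weights and threshold are the patent's.

```python
def image_keywords(decisions, classes, threshold=0.3):
    """Aggregate per-subimage SVM decisions into image keywords
    (formula (14) with w1=1, w2=0.2, w3=0.0625, keyword if r_i > 0.3).

    decisions[j] holds the level-(j+1) subimage results: 1, 5 and 16
    entries respectively; each entry is a 0/1 list R(i, j, k) over
    the classes i."""
    weights = [1.0, 0.2, 0.0625]                  # w1, w2, w3
    keywords = []
    for i, name in enumerate(classes):
        # r_i = w1*R(i,1,1) + w2*sum_k R(i,2,k) + w3*sum_k R(i,3,k)
        r_i = sum(w * sum(sub[i] for sub in level)
                  for w, level in zip(weights, decisions))
        if r_i > threshold:                       # keep classes above 0.3
            keywords.append(name)
    return keywords
```

With the given weights each level contributes at most 1 to r_i, so a class backed only by a single level-3 block (r_i = 0.0625) is screened out, while agreement at any full level clears the threshold.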
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention uses a three-level image segmentation, which magnifies local elements of the image and aids the recognition of image detail.
2. The present invention quantizes image information with two kinds of features, texture and color, so that the quantized feature vectors express the information contained in the image more accurately, which benefits the accuracy of image classification.
3. The present invention gathers and screens the classification results of the multi-level subimages through a weighting algorithm, obtaining a multi-keyword textual description of the image.
4. The present invention processes all images in the image library one by one, converting the image library into a text library of image descriptions and building an index for that text library in the manner of text search. When a query is submitted, the text index is searched for the image descriptions that match the query, and the corresponding images are returned. The invention thus converts image search into text search, avoiding the per-image computation of high-dimensional feature vectors required by content-based image retrieval and improving search efficiency and accuracy.
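The retrieval pipeline of effect 4 — keyword descriptions into a text index, queries answered from the index — can be sketched with a minimal inverted index. This is an illustration only; a production system would use a text search engine (one of the later citing patents, CN103955514A, uses a Lucene inverted index), and all names are hypothetical.

```python
def build_index(descriptions):
    """Inverted index: keyword -> set of image ids, built from the
    text library of per-image keyword descriptions."""
    index = {}
    for image_id, words in descriptions.items():
        for w in words:
            index.setdefault(w, set()).add(image_id)
    return index

def search(index, query_words):
    """Return the ids of images whose description contains every
    query keyword (conjunctive text search over the index)."""
    result = None
    for w in query_words:
        hits = index.get(w, set())
        result = hits if result is None else result & hits
    return result if result else set()
```

Because queries touch only the small keyword index, no high-dimensional feature vectors are compared at search time.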
Description of drawings
The present invention has two accompanying drawings, in which:
Fig. 1 is a schematic diagram of the three-level image segmentation method.
Fig. 2 is the flow chart of the present invention.
Embodiment
The present invention is further described below with reference to the accompanying drawings. As shown in Fig. 1, the image is divided into a set of 22 subimages at three levels. The texture and color features of each subimage are then extracted according to the flow shown in Fig. 2 and merged into one feature vector; each merged feature vector is classified by the support vector machine model, and the quantized value with which the entire image belongs to each class is computed from the classification results of the subimages. When the value for a class exceeds 0.3, that class keyword becomes a keyword describing the image, yielding a multi-keyword textual description of the entire image.

Claims (1)

1. A method for automatically describing an image, characterized in that it comprises the following steps:
A. Performing a three-level segmentation of the image
The image is divided into three levels as follows:
Level-1 image: the original image, without segmentation;
Level-2 subimages: the image is divided into the four sub-blocks of a 2×2 grid, and in addition the centre part of the image is split off, giving five subimages in total;
Level-3 subimages: the image is divided into the sixteen sub-blocks of a 4×4 grid.
Thus one image is divided into 22 subimages, and steps B and C are performed on each subimage.
B. Extracting the image texture feature
For an image, compute the gray value of each pixel with formula (1):
I = 0.299R + 0.587G + 0.114B    (1)
To characterize the gray-level variation of each pixel within a neighborhood, consider the 3×3 neighborhood of that pixel, which contains 9 pixels. Let I_i (i = 0, 1, ..., 8) denote the gray value of the image at each of these pixels, with I_0 at the centre position, written as the matrix
\begin{pmatrix} I_1 & I_2 & I_3 \\ I_4 & I_0 & I_5 \\ I_6 & I_7 & I_8 \end{pmatrix}
The gray-variation value of pixel I_0 is then given by formula (2) (the formula image did not survive extraction; the local-binary-pattern form below is reconstructed from the statement that T is an eight-bit binary number):
T = \sum_{i=1}^{8} s(I_i - I_0)\, 2^{i-1}, \qquad s(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}    (2)
It is easy to see from formula (2) that T can be regarded as an eight-bit binary number, so its value satisfies T ∈ {0, 1, ..., 255}.
Compute the T value of all pixels of the image. Let T(i, j) denote the value at pixel (i, j), and let h_k (k = 0, 1, ..., 255) denote the ratio of the number of pixels whose T value is k to the total number of pixels; then:
h_k = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} f(i, j, k)    (3)
where n and m are respectively the height and the width of the image, and f(i, j, k) is:
f(i, j, k) = \begin{cases} 1 & \text{if } T(i, j) = k \\ 0 & \text{otherwise} \end{cases}    (4)
This yields the image texture feature vector space model {h_0, h_1, ..., h_255}.
C. Extracting the image color feature
For each pixel of the image, transform from RGB to R'G'B':
R' = \frac{\max(R,G,B) - R}{\max(R,G,B) - \min(R,G,B)}, \quad G' = \frac{\max(R,G,B) - G}{\max(R,G,B) - \min(R,G,B)}, \quad B' = \frac{\max(R,G,B) - B}{\max(R,G,B) - \min(R,G,B)}    (5)
Then transform from R'G'B' to HSV (formula (6); the formula image did not survive extraction — in the standard conversion V = \max(R,G,B), S = (\max(R,G,B) - \min(R,G,B))/\max(R,G,B), and H is derived from R', G' and B'), where H ∈ [0, 360], S ∈ [0, 1], V ∈ [0, 1].
In these formulas, R, G and B are respectively the red, green and blue components of the RGB color space. The hue H of the HSV color space is what color names such as red, orange and green distinguish; it is measured as an angle from 0° to 360°. The value V is the brightness of the color, usually measured as a percentage from black (0%) to white (100%). The saturation S is the depth of the color: the same red, for example, can be dark red or pale red depending on its concentration; it is also measured as a percentage, from 0% to fully saturated at 100%.
The three components H, S and V are quantized with unequal intervals according to color perception. From a coarse analysis of the color model, the hue space H is divided into 8 parts and the saturation space S and value space V into 3 parts each, quantized according to the ranges of the colors. The quantized hue, saturation and value are H', S' and V' respectively:
H' = \begin{cases} 0 & H \in (315, 360] \cup [0, 20] \\ 1 & H \in (20, 40] \\ 2 & H \in (40, 75] \\ 3 & H \in (75, 155] \\ 4 & H \in (155, 190] \\ 5 & H \in (190, 270] \\ 6 & H \in (270, 295] \\ 7 & H \in (295, 315] \end{cases} \qquad S' = \begin{cases} 0 & S \in (0, 0.2] \\ 1 & S \in (0.2, 0.7] \\ 2 & S \in (0.7, 1] \end{cases}    (7)
V' = \begin{cases} 0 & V \in (0, 0.2] \\ 1 & V \in (0.2, 0.7] \\ 2 & V \in (0.7, 1] \end{cases}
According to the quantization levels above, the three color components are combined into a one-dimensional feature value:
L = H' Q_S Q_V + S' Q_V + V'    (8)
where Q_S and Q_V are the numbers of quantization levels of the components S and V. From formula (7), S and V are each quantized into the three levels 0, 1 and 2, so Q_S = 3 and Q_V = 3, and formula (8) becomes:
L = 9H' + 3S' + V'    (9)
which converts the three components H', S' and V' into a one-dimensional value. From formulas (7) and (9):
L ∈ {0, 1, ..., 71}    (10)
Compute the L value of all pixels of the image. Let L(i, j) denote the value at pixel (i, j), and let l_k (k = 0, 1, ..., 71) denote the ratio of the number of pixels whose L value is k to the total number of pixels; then:
l_k = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} g(i, j, k)    (11)
where n and m are respectively the height and the width of the image, and g(i, j, k) is:
g(i, j, k) = \begin{cases} 1 & \text{if } L(i, j) = k \\ 0 & \text{otherwise} \end{cases}    (12)
This yields the image color feature vector space model {l_0, l_1, ..., l_71}.
D. Multiple-keyword description
After the texture and color feature extraction of each subimage is complete, the texture feature and the color feature of each subimage are merged, and each subimage is represented by the single merged feature vector. Each feature vector is input to a pre-trained support vector machine. This support vector machine uses a radial basis kernel function and was trained on 2582 pictures belonging to eight classes: animal, plant, interior decoration, building, car, people, sky and space.
Let R(i, j, k) be the classification decision, i.e. whether the k-th subimage of segmentation level j, image_k, belongs to the i-th class Category_i:
R(i, j, k) = \begin{cases} 1 & \text{if } image_k \in Category_i \\ 0 & \text{otherwise} \end{cases}    (13)
From this, compute the quantized value r_i with which the entire image belongs to class i. Considering the different contributions of the whole image and of local regions to understanding the image content, a corresponding weighting strategy is adopted:
r_i = w_1 R(i, 1, 1) + w_2 \sum_{k=1}^{5} R(i, 2, k) + w_3 \sum_{k=1}^{16} R(i, 3, k)    (14)
where the weight coefficients are w_1 = 1, w_2 = 0.2 and w_3 = 0.0625. Clearly r_i lies in [0, 3]. When r_i is greater than 0.3, the class name can be assigned to the image as a keyword.
CN2011103026211A 2011-10-08 2011-10-08 Method for automatically describing image Pending CN102360431A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103026211A CN102360431A (en) 2011-10-08 2011-10-08 Method for automatically describing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011103026211A CN102360431A (en) 2011-10-08 2011-10-08 Method for automatically describing image

Publications (1)

Publication Number Publication Date
CN102360431A true CN102360431A (en) 2012-02-22

Family

ID=45585758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103026211A Pending CN102360431A (en) 2011-10-08 2011-10-08 Method for automatically describing image

Country Status (1)

Country Link
CN (1) CN102360431A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955514A (en) * 2014-05-05 2014-07-30 陈浩 Image feature indexing method based on Lucene inverted index
CN104281849A (en) * 2013-07-03 2015-01-14 广州盖特软件有限公司 Fabric image color feature extraction method
CN105677735A (en) * 2015-12-30 2016-06-15 腾讯科技(深圳)有限公司 Video search method and apparatus
CN106908452A (en) * 2017-04-24 2017-06-30 武汉理工大学 Engine lubricating oil quality monitoring device based on machine vision
CN108509521A (en) * 2018-03-12 2018-09-07 华南理工大学 A kind of image search method automatically generating text index
CN111339340A (en) * 2018-12-18 2020-06-26 顺丰科技有限公司 Training method of image description model, image searching method and device

Citations (1)

Publication number Priority date Publication date Assignee Title
CN102163277A (en) * 2010-02-24 2011-08-24 中国科学院自动化研究所 Area-based complexion dividing method

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN102163277A (en) * 2010-02-24 2011-08-24 中国科学院自动化研究所 Area-based complexion dividing method

Non-Patent Citations (3)

Title
Ye Ji, Yan Chen: "Multiple Keywords Assignment to Images Using SVM", Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, 15 July 2008, pages 2569-2573, XP031318489 *
Ye Ji, Yan Chen: "Rendering Greyscale Image Using Color Feature", Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, 15 July 2008, pages 3017-3021, XP031318572 *
Ye Ji: "Color Transfer to Greyscale Images Using Texture Spectrum", Proceedings of the Third International Conference on Machine Learning and Cybernetics, 29 August 2004, pages 4057-4061, XP010763160 *

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN104281849A (en) * 2013-07-03 2015-01-14 广州盖特软件有限公司 Fabric image color feature extraction method
CN104281849B (en) * 2013-07-03 2017-09-19 广州盖特软件有限公司 A kind of cloth color of image feature extracting method
CN103955514A (en) * 2014-05-05 2014-07-30 陈浩 Image feature indexing method based on Lucene inverted index
CN105677735A (en) * 2015-12-30 2016-06-15 腾讯科技(深圳)有限公司 Video search method and apparatus
US10642892B2 (en) 2015-12-30 2020-05-05 Tencent Technology (Shenzhen) Company Limited Video search method and apparatus
CN106908452A (en) * 2017-04-24 2017-06-30 武汉理工大学 Engine lubricating oil quality monitoring device based on machine vision
CN108509521A (en) * 2018-03-12 2018-09-07 华南理工大学 A kind of image search method automatically generating text index
CN111339340A (en) * 2018-12-18 2020-06-26 顺丰科技有限公司 Training method of image description model, image searching method and device

Similar Documents

Publication Publication Date Title
CN101763429B (en) Image retrieval method based on color and shape features
EP2701098B1 (en) Region refocusing for data-driven object localization
CN102012939B (en) Method for automatically tagging animation scenes for matching through comprehensively utilizing overall color feature and local invariant features
CN101551823B (en) Comprehensive multi-feature image retrieval method
CN102073748B (en) Visual keyword based remote sensing image semantic searching method
CN102622420B (en) Trademark image retrieval method based on color features and shape contexts
CN102360431A (en) Method for automatically describing image
CN101297318B (en) Data organization and access for mixed media document system
CN106126585B (en) The unmanned plane image search method combined based on quality grading with perceived hash characteristics
CN102176208B (en) Robust video fingerprint method based on three-dimensional space-time characteristics
EP2615572A1 (en) Image segmentation based on approximation of segmentation similarity
Khokher et al. Content-based image retrieval: Feature extraction techniques and applications
US20180129658A1 (en) Color sketch image searching
CN101714257A (en) Method for main color feature extraction and structuring description of images
JP2007206920A (en) Image processor and image processing method, retrieving device and method, program and recording medium
JP2011154687A (en) Method and apparatus for navigating image data set, and program
CN102576372A (en) Content-based image search
CN105022752A (en) Image retrieval method and apparatus
CN103853724A (en) Multimedia data sorting method and device
CN104462481A (en) Comprehensive image retrieval method based on colors and shapes
CN108829711B (en) Image retrieval method based on multi-feature fusion
CN101388020A (en) Composite image search method based on content
Saad et al. Image retrieval based on integration between YCbCr color histogram and texture feature
Pattanaik et al. Efficient content based image retrieval system using mpeg-7 features
CN102024029B (en) Local visual attention-based color image retrieving method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120222