CN110334778B - Image comprehensive similarity analysis method based on description content and image content characteristics - Google Patents
Image comprehensive similarity analysis method based on description content and image content characteristics
- Publication number
- CN110334778B (application CN201910639482.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- content
- similarity
- description
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/40—Analysis of texture
- G06T7/41—Analysis of texture based on statistical description of texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an image comprehensive similarity analysis method based on description content and image content characteristics, which comprises the following steps: extracting the image color information, namely the RGB (red, green, blue) model, and the image texture information from the image content characteristic information; extracting the image title content and description content from the image description content characteristic information and performing word segmentation; converting the RGB model into an HSV model and obtaining the HSV values of the image to produce the 24-dimensional image color feature information; calculating the gray values of the image for the image texture information; calculating cosine similarity to obtain the image content similarity; obtaining the features of the image title content and the image description content; calculating the cosine similarity of the title content and description content of the two images to obtain the image description content similarity; calculating the image feature similarity; combining the image content features and the image description content features to generate the composite features of the image; calculating the cosine similarity of the image composite features to obtain the image composite feature similarity; and judging that the images are similar when both the image feature similarity and the image composite feature similarity are greater than or equal to the threshold value.
Description
Technical Field
The invention relates to a method for generating image features in image similarity analysis and a method for generating text features in text similarity analysis.
Background
In image recognition technology, the common approach is to match text based on OCR recognition of the context of an image. This approach has limitations, because a textual description cannot accurately express the content characteristics of an image, and so it cannot meet the requirements of image similarity detection.
Because a full-text similarity analysis system for published works places extremely high demands on the correctness of image similarity comparison results, and the existing comparison methods cannot meet those demands, an image similarity analysis method that satisfies academic misconduct detection quality requirements needs to be developed, so as to improve detection efficiency and quality while reducing cost.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a method for analyzing comprehensive similarity of images based on the characteristics of description content and image content, which can greatly improve the efficiency of comparing image similarity and reduce the cost.
The purpose of the invention is realized by the following technical scheme:
an image comprehensive similarity analysis method based on description content and image content features comprises the following steps:
a, extracting image content characteristic information and image description content characteristic information;
b, extracting image color information, namely RGB (red, green and blue) model and image texture information in the image content characteristic information; extracting image title content and description content in the image description content characteristic information, and performing word segmentation processing;
c, converting the RGB model into an HSV model to obtain HSV values of the image, and enabling the HSV values of the image to pass through a fuzzy filter to obtain 24-dimensional characteristic information of the image color;
d, calculating the gray value of the image in the image texture information through the YIQ model, and calculating the edge texture information of the image through a digital filter, wherein the edge texture information is a 6-dimensional histogram;
e, combining 24-dimensional feature information of image colors with a 6-dimensional histogram of image edge textures to generate 144-dimensional image content features;
f, comparing the image content characteristics of the two images to be compared, and calculating cosine similarity to obtain image content similarity;
g, calculating a TF-IDF value for each word in the image title content and the description content to obtain the characteristics of the image title content and the image description content;
h, calculating the cosine similarity of the title content and the description content of the two images to obtain the similarity of the description content of the images;
i, multiplying the similarity of the image content and the similarity of the image description content by a weight value respectively to obtain the image feature similarity;
j, combining the image content characteristics with the image description content characteristics to generate composite characteristics of the image;
k, calculating the cosine similarity of the composite features of the two images to be compared to obtain the similarity of the composite features of the images;
and l, judging that the images are similar when both the image feature similarity and the image composite feature similarity are greater than or equal to the threshold value.
One or more embodiments of the present invention may have the following advantages over the prior art:
the method has the characteristics of high feature extraction speed and small occupied space of feature descriptors, and the accuracy and efficiency are greatly improved on the basis of the original method.
Drawings
FIG. 1 is a flow chart of a method for image synthesis similarity analysis based on descriptive content and image content characteristics;
FIG. 2 is a block diagram of a method for analyzing the comprehensive similarity of images based on the description content and the characteristics of the image content;
FIG. 3 is a schematic diagram of a 10-bins blur filter;
FIG. 4 is a schematic diagram of a 24-bins blur filter;
fig. 5 is a schematic diagram of an edge histogram digital filter.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
As shown in fig. 1 and fig. 2, a flow chart of an image comprehensive similarity analysis method and a structural block diagram of the analysis method based on description content and image content features are provided. The method comprises the following steps:
102, extracting image color information, namely RGB (red, green and blue) model and image texture information in the image content characteristic information; extracting image title content and description content in the image description content characteristic information, and performing word segmentation processing;
104, calculating the gray value of the image in the image texture information through a YIQ model, and calculating the edge texture information of the image through a digital filter, wherein the edge texture information is a 6-dimensional histogram;
and step 112, judging that the images are similar when the image feature similarity and the image composite feature similarity are both larger than or equal to the threshold value.
1. The extraction of the image content features comprises two aspects of color and texture:
(1) when image color information is extracted, the perception and the discrimination capability of a person on colors can be well expressed by using an HSV model, wherein in the HSV model, H represents hue, S represents saturation and V represents brightness; therefore, when extracting the image color information, RGB-HSV model conversion is required to be performed on the image pixels, and the conversion formula is as follows:
V = max(R, G, B)
S = (V - min(R, G, B)) / V, with S = 0 when V = 0
If V = R and G ≥ B: H = 60 × (G - B) / (V - min(R, G, B))
If V = R and G < B: H = 60 × (G - B) / (V - min(R, G, B)) + 360
If V = G: H = 60 × (B - R) / (V - min(R, G, B)) + 120
If V = B: H = 60 × (R - G) / (V - min(R, G, B)) + 240
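For reference, the conversion above can be sketched in Python. This is an illustrative transcription, not part of the patent, assuming 8-bit RGB inputs and H expressed in degrees:

```python
def rgb_to_hsv(r, g, b):
    """Convert RGB components in [0, 255] to (H, S, V) with
    H in degrees [0, 360) and S, V in [0, 1]."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    v = max(r, g, b)
    c = v - min(r, g, b)          # chroma: V - min(R, G, B)
    s = 0.0 if v == 0 else c / v
    if c == 0:
        h = 0.0                   # achromatic: hue undefined, use 0
    elif v == r:
        h = 60 * ((g - b) / c)
        if h < 0:
            h += 360              # the G < B case wraps around by +360
    elif v == g:
        h = 60 * ((b - r) / c) + 120
    else:                         # v == b
        h = 60 * ((r - g) / c) + 240
    return h, s, v
```

Pure red maps to H = 0, pure green to H = 120, pure blue to H = 240, matching the case analysis above.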
and (3) the HSV value passes through a 10-bins filter (figure 3) to obtain a 10-dimensional vector containing color information, each color region output by the 10 bins is divided into 3H value regions through a 24-bins filter (figure 4), and 24-dimensional color information is obtained through calculation.
(2) When extracting image texture information, the pixel gray values are calculated through the YIQ model, and then the edge-histogram texture information of the image is extracted. The YIQ color space belongs to the NTSC system; its components describe the luminance (gray value), hue, and saturation attributes of an image. Converting a color image from RGB to YIQ space separates the luminance information from the chrominance information; the RGB-YIQ correspondence is as follows:
Y = 0.299R + 0.587G + 0.114B
I = 0.596R - 0.275G - 0.321B
Q = 0.212R - 0.523G + 0.311B
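A direct Python transcription of the RGB-YIQ correspondence (using the standard NTSC luminance coefficient 0.299; the minus signs on the chrominance terms follow the usual NTSC matrix, since the printed formulas appear to have lost them):

```python
def rgb_to_yiq(r, g, b):
    """NTSC RGB -> YIQ for components in [0, 1].  Y is the gray
    (luminance) value used for texture analysis; I and Q carry
    the chrominance information."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.275 * g - 0.321 * b
    q = 0.212 * r - 0.523 * g + 0.311 * b
    return y, i, q
```

For a pure gray input (R = G = B) the chrominance components I and Q vanish, which is exactly the luminance/chrominance separation described above.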
when extracting texture information from an image, the image is divided into a plurality of cells, each of which is then subdivided into 4 sub-cells of equal size, and 5 digital filters (fig. 5 (a) (b) (c) (d) (e)) are used to extract texture edge information, and the region to which the texture edge information acts is divided into five types, i.e., vertical direction, horizontal direction, 45-degree direction, 135-degree direction, and non-direction, and the average gray values of the four sub-cells in the (< i, j) th cell are represented by go < i, j), gi < i, j), g2< i, j), and g3< i, j), respectively. a } (k), a, k, aa-as (k), ae-i3s (k) and (k) respectively represent parameters when the average gray values of four sub-cells pass through the filter, the value of each sub-cell is the parameter of the filter, wherein the value range of k is an integer from 0 to 3, and represents four sub-cells in the cell. n } (I, j), nh (I, j), } a-as (I, j, nd-I3s (1, j) and (I, j) are values of each determined direction in the (I, j) th cell.
Finding the maximum value
mmax=max(nv,nh,nd-45,nd-135,nnd)
Normalizing all the n values
By calculation, the information of the image edge in each cell can be obtained, and the texture information is extracted as a histogram.
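The per-cell edge computation can be illustrated as follows. The filter coefficients mirror the MPEG-7 edge-histogram scheme that this step resembles, the sub-cell ordering and the threshold value are illustrative assumptions, and nothing here is taken verbatim from the patent:

```python
import math

# Coefficients a(k) applied to the four sub-cell gray means g0..g3 of a cell
# (assumed order: top-left, top-right, bottom-left, bottom-right), following
# the MPEG-7 edge-histogram scheme this step resembles.
FILTERS = {
    "vertical":   (1.0, -1.0, 1.0, -1.0),
    "horizontal": (1.0, 1.0, -1.0, -1.0),
    "diag45":     (math.sqrt(2), 0.0, 0.0, -math.sqrt(2)),
    "diag135":    (0.0, math.sqrt(2), -math.sqrt(2), 0.0),
    "nondir":     (2.0, -2.0, -2.0, 2.0),
}

def edge_strengths(g):
    """g: the four average gray values g0..g3 of one cell's sub-cells.
    Returns the response n = |sum_k g(k) * a(k)| for each direction."""
    return {name: abs(sum(gk * ak for gk, ak in zip(g, a)))
            for name, a in FILTERS.items()}

def cell_direction(g, threshold=11.0):
    """Dominant edge direction of one cell, or None when the strongest
    response stays below the threshold (a placeholder value here)."""
    n = edge_strengths(g)
    best = max(n, key=n.get)
    return best if n[best] >= threshold else None
```

Counting the dominant direction of every cell into five bins (plus normalization) yields the 6-dimensional edge histogram referred to above.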
Finally, the 24-dimensional color information extracted by the color module is combined with each of the 6 dimensions of the edge-texture histogram, merging color and texture to obtain the 144-dimensional content feature information of the image.
2. Comparing the similarity of the image contents:
after the image content features are obtained through calculation in the step 1, the image content feature values are analyzed through a cosine similarity formula, and the cosine similarity of the image content is obtained. The calculation formula is as follows:
cos(di, dj) = Σ_k (d_ik × d_jk) / (sqrt(Σ_k d_ik²) × sqrt(Σ_k d_jk²))
On the left of the equal sign, cos(di, dj) is the cosine of the angle between the two image content feature vectors di and dj. On the right, the first factor in the denominator is the square root of the sum of the squares of each dimension of di, and the second factor is the square root of the sum of the squares of each dimension of dj; the numerator is the sum of the products of the corresponding dimensions of di and dj.
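As a sketch outside the patent text, the cosine similarity computation translates directly to Python:

```python
import math

def cosine_similarity(di, dj):
    """cos(di, dj): the sum of the element-wise products divided by
    the product of the two vector norms."""
    dot = sum(a * b for a, b in zip(di, dj))
    norm_i = math.sqrt(sum(a * a for a in di))
    norm_j = math.sqrt(sum(b * b for b in dj))
    if norm_i == 0 or norm_j == 0:
        return 0.0                 # a zero vector has no defined angle
    return dot / (norm_i * norm_j)
```

For the non-negative histogram and TF-IDF features used here the result lies in [0, 1], with 1 meaning identical direction.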
The description content and title of each image are extracted, and the image titles and image description contents are segmented into words to calculate a weight value for every word; the weight value is the word's TF-IDF value. The main idea of TF-IDF theory is as follows: the importance of a word increases in proportion to the number of times it appears in a document, but at the same time decreases in inverse proportion to the frequency with which it appears in the corpus; that is, specific words that appear in only a few documents are weighted more heavily than words that appear in many documents. This theory applies equally to the comparison of image description content and image title content, and the calculation process is as follows:
Vocabulary frequency: TF_W = N_W / N, where N represents the total number of words in the text and N_W represents the number of times the word W appears in the image description content and the image title content; the greater the TF value, the stronger the correlation between the word W and the text.
Inverse document frequency: IDF_W = log(D / D_W), where D_W represents the number of documents containing the word W and D represents the total number of documents in the corpus; the greater the IDF value, the more specific the word is to a few documents. The TF-IDF value of word i in document d_j is then computed as TF-IDF_ij = TF_ij × IDF_i.
After calculating the TF-IDF value of the feature item, each word in the image description content and the image title content can be represented by a vector. For example, the image description content and the image title content may be expressed as:
(1.163151, 0.7855668, 0.7440107, 0.2310491, 0.2310491, 0.2288602, 0.2079442, 0.1938585, 0.1848393, 0.1653357, ...). The similarity between the description contents and title contents of two images can be measured by the cosine of the angle between the two vectors, where a larger value means higher similarity. Therefore, the similarity of the image description contents can also be calculated with the cosine similarity formula of section 1(2), finally yielding the cosine similarity of the image description contents.
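The TF-IDF weighting described above can be sketched as follows. The unsmoothed variant IDF = log(D / D_W) is assumed, matching the formula given; production systems often add smoothing to avoid division issues, which is omitted here:

```python
import math
from collections import Counter

def tf_idf_vector(doc_words, corpus):
    """doc_words: token list for one image's title + description content.
    corpus: list of all such token lists.  Returns {word: TF-IDF weight}."""
    n = len(doc_words)                 # total word count N of the text
    d = len(corpus)                    # total number of documents D
    weights = {}
    for w, nw in Counter(doc_words).items():
        tf = nw / n                    # TF = N_W / N
        dw = sum(1 for doc in corpus if w in doc)  # documents containing W
        idf = math.log(d / dw)         # IDF = log(D / D_W), unsmoothed
        weights[w] = tf * idf
    return weights
```

A word that occurs in every document gets IDF = log(1) = 0 and therefore zero weight, which is the "common words carry little information" behavior the theory calls for.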
3. Comprehensive similarity comparison combining the image content similarity and the image description content similarity:
the method comprises the following steps:
and finally, carrying out comprehensive comparison analysis on the image content feature similarity and the image description feature similarity in order to improve the accuracy of the final comparison result. And setting the similarity of the image content features and the similarity of image description as a and b. Then the final formula for calculating the result of the final synthetic similarity comparison S is as follows:
S=k1a+k2b
where k1 and k2 are the weight values for the image content features and the image description features, and the sum of k1 and k2 should be 1. The larger the comprehensive similarity comparison result S, the higher the image similarity.
The second method comprises the following steps:
combining the image content feature values (144 dimensions) of the first portion with the image description feature values (256 dimensions) of the second portion generates a new composite vector (144+256 dimensions). When the image similarity is compared, the cosine similarity between the compound vectors of the two images is calculated by using a cosine similarity calculation formula (2 in the above 1).
The similarity analysis results of method one and method two are then integrated: two thresholds are set, and the two images are judged to be similar only when both analysis results are greater than their corresponding thresholds.
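The two decision methods can be combined in a short sketch. The weights k1, k2 and the two thresholds t1, t2 are illustrative placeholders, since the patent does not fix their values:

```python
import math

def cos_sim(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def images_similar(content1, desc1, content2, desc2,
                   k1=0.5, k2=0.5, t1=0.8, t2=0.8):
    """Method one: weighted sum S = k1*a + k2*b of the two similarities.
    Method two: cosine similarity over the concatenated composite vectors.
    Images are judged similar only if both results reach their thresholds."""
    a = cos_sim(content1, content2)          # image content similarity
    b = cos_sim(desc1, desc2)                # description content similarity
    s1 = k1 * a + k2 * b                     # method one
    s2 = cos_sim(content1 + desc1,           # method two: composite vectors
                 content2 + desc2)
    return s1 >= t1 and s2 >= t2
```

Requiring both scores to clear their thresholds is a conjunctive decision rule: either test alone can veto a match, which matches the high-correctness requirement stated in the background section.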
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (3)
1. The image comprehensive similarity analysis method based on the description content and the image content features is characterized by comprising the following steps of:
a, extracting image content characteristic information and image description content characteristic information;
b, extracting image color information, namely RGB (red, green and blue) model and image texture information in the image content characteristic information; extracting image title content and description content in the image description content characteristic information, and performing word segmentation processing;
c, converting the RGB model into an HSV model to obtain HSV values of the image, and enabling the HSV values of the image to pass through a fuzzy filter to obtain 24-dimensional characteristic information of the image color;
d, calculating the gray value of the image in the image texture information through the YIQ model, and calculating the edge texture information of the image through a digital filter, wherein the edge texture information is a 6-dimensional histogram;
e, combining 24-dimensional feature information of image colors with a 6-dimensional histogram of image edge textures to generate 144-dimensional image content features;
f, comparing the image content characteristics of the two images to be compared, and calculating cosine similarity to obtain image content similarity;
g, calculating a TF-IDF value for each word in the image title content and the description content to obtain the characteristics of the image title content and the image description content;
h, calculating the cosine similarity of the title content and the description content of the two images to obtain the similarity of the description content of the images;
i, multiplying the similarity of the image content and the similarity of the image description content by a weight value respectively to obtain the image feature similarity;
j, combining the image content characteristics with the image description content characteristics to generate composite characteristics of the image;
k, calculating the cosine similarity of the composite features of the two images to be compared to obtain the similarity of the composite features of the images;
l, when the image feature similarity and the image composite feature similarity are both larger than or equal to the threshold value, judging that the images are similar;
the similarity comparison of the image contents comprises the following steps:
after the image content features are obtained through calculation, the image content feature values are analyzed through a cosine similarity formula to obtain the cosine similarity of the image content, and the calculation formula is as follows:
cos(di, dj) = Σ_k (d_ik × d_jk) / (sqrt(Σ_k d_ik²) × sqrt(Σ_k d_jk²)), where cos(di, dj) on the left of the equal sign is the cosine of the angle between the two image content feature vectors di and dj; on the right, the first factor in the denominator is the square root of the sum of the squares of each dimension of di and the second factor is the square root of the sum of the squares of each dimension of dj, while the numerator is the sum of the products of the corresponding dimensions of di and dj;
extracting description contents and image titles of the images, carrying out word segmentation on the titles and the image description contents of the images to calculate weight values of all words, wherein the weight values adopt TF-IDF values of the words, and after calculating TF-IDF values of the characteristic items, each word in the image description contents and the image title contents can be represented by a vector;
the similarity between the two image description contents and the title contents is measured by using a cosine included angle of the two vectors or calculated by using a cosine similarity calculation formula, and the cosine similarity of the image description contents is obtained, wherein the larger the value is, the higher the similarity is.
2. The method of claim 1, wherein the HSV values of the image are passed through two fuzzy filters, namely a 10-bins fuzzy filter and a 24-bins fuzzy filter, to obtain the 24-dimensional feature information of the image colors.
3. The method as claimed in claim 1, wherein the image texture information is extracted by dividing the image into a plurality of cells, then dividing each cell into four sub-cells with equal size, extracting the image texture edge information through a plurality of digital filters, dividing the region of action into five types of vertical direction, horizontal direction, 45 degree direction, 135 degree direction and non-direction, and determining the value of each direction through the average gray value of the cell and the filter parameters, thereby obtaining the information of the image edge in each cell.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910639482.8A CN110334778B (en) | 2019-07-16 | 2019-07-16 | Image comprehensive similarity analysis method based on description content and image content characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110334778A CN110334778A (en) | 2019-10-15 |
CN110334778B true CN110334778B (en) | 2021-08-06 |
Family
ID=68145172
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910639482.8A Active CN110334778B (en) | 2019-07-16 | 2019-07-16 | Image comprehensive similarity analysis method based on description content and image content characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334778B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111259966A (en) * | 2020-01-17 | 2020-06-09 | 青梧桐有限责任公司 | Method and system for identifying homonymous cell with multi-feature fusion |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002318812A (en) * | 2001-04-23 | 2002-10-31 | Olympus Optical Co Ltd | Similar image retrieval device, similar image retrieval method and similar image retrieval program |
CN104298749A (en) * | 2014-10-14 | 2015-01-21 | 杭州淘淘搜科技有限公司 | Commodity retrieval method based on image visual and textual semantic integration |
CN104376105A (en) * | 2014-11-26 | 2015-02-25 | 北京航空航天大学 | Feature fusing system and method for low-level visual features and text description information of images in social media |
CN105912642A (en) * | 2016-04-08 | 2016-08-31 | 世纪禾光科技发展(北京)有限公司 | Product price data acquisition method and system |
CN107766349A (en) * | 2016-08-16 | 2018-03-06 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus, equipment and client for generating text |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100451649B1 (en) * | 2001-03-26 | 2004-10-08 | 엘지전자 주식회사 | Image search system and method |
CN101996191B (en) * | 2009-08-14 | 2013-08-07 | 北京大学 | Method and system for searching for two-dimensional cross-media element |
CN109766465A (en) * | 2018-12-26 | 2019-05-17 | 中国矿业大学 | A kind of picture and text fusion book recommendation method based on machine learning |
- 2019-07-16 CN CN201910639482.8A patent/CN110334778B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002318812A (en) * | 2001-04-23 | 2002-10-31 | Olympus Optical Co Ltd | Similar image retrieval device, similar image retrieval method and similar image retrieval program |
CN104298749A (en) * | 2014-10-14 | 2015-01-21 | 杭州淘淘搜科技有限公司 | Commodity retrieval method based on image visual and textual semantic integration |
CN104376105A (en) * | 2014-11-26 | 2015-02-25 | 北京航空航天大学 | Feature fusing system and method for low-level visual features and text description information of images in social media |
CN105912642A (en) * | 2016-04-08 | 2016-08-31 | 世纪禾光科技发展(北京)有限公司 | Product price data acquisition method and system |
CN107766349A (en) * | 2016-08-16 | 2018-03-06 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus, equipment and client for generating text |
Non-Patent Citations (2)
Title |
---|
Android手机上图像分类技术的研究;李东阳;《中国优秀硕士学位论文全文数据库 信息科技辑》;20131115(第11期);第3章 * |
IRMA-Improvisation of image retrieval with Markov chain based on annotation;G Nandha Kumar et al.;《international conference on information communication and embedded systems(CICIE2014)》;20150209;第1-7页 * |
Also Published As
Publication number | Publication date |
---|---|
CN110334778A (en) | 2019-10-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111401372B (en) | Method for extracting and identifying image-text information of scanned document | |
Nishiyama et al. | Aesthetic quality classification of photographs based on color harmony | |
CN104376105B (en) | The Fusion Features system and method for image low-level visual feature and text description information in a kind of Social Media | |
CN107066972B (en) | Natural scene Method for text detection based on multichannel extremal region | |
Feng et al. | An efficient detection method for rare colored capsule based on RGB and HSV color space | |
CN101551823A (en) | Comprehensive multi-feature image retrieval method | |
US10866984B2 (en) | Sketch-based image searching system using cell-orientation histograms and outline extraction based on medium-level features | |
CN109871902A (en) | It is a kind of to fight the SAR small sample recognition methods for generating cascade network based on super-resolution | |
CN106960182A (en) | A kind of pedestrian integrated based on multiple features recognition methods again | |
TWI411968B (en) | A method for image characterization and a method for image search | |
CN114359323B (en) | Image target area detection method based on visual attention mechanism | |
Saad et al. | Image retrieval based on integration between YCbCr color histogram and texture feature | |
CN108711160B (en) | Target segmentation method based on HSI (high speed input/output) enhanced model | |
Sudhir et al. | An efficient CBIR technique with YUV color space and texture features | |
CN108399335A (en) | A kind of malicious code visual analysis method based on local entropy | |
CN108805139A (en) | A kind of image similarity computational methods based on frequency-domain visual significance analysis | |
CN110334778B (en) | Image comprehensive similarity analysis method based on description content and image content characteristics | |
CN103049754B (en) | The picture recommendation method of social networks and device | |
CN104091357A (en) | Method for generating mosaic images on line through subject-related images | |
Reddy | Extraction of image features for an effective CBIR system | |
CN103871084B (en) | Indigo printing fabric pattern recognition method | |
CN109784168B (en) | High-resolution remote sensing transmission channel inspection method and system | |
CN115393748A (en) | Method for detecting infringement trademark based on Logo recognition | |
Varish et al. | A content based image retrieval using color and texture features | |
Jyothi et al. | Computational color naming for human-machine interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||