CN112528905B - Image processing method, device and computer storage medium - Google Patents
Image processing method, device and computer storage medium
- Publication number
- CN112528905B CN202011505344.XA
- Authority
- CN
- China
- Prior art keywords
- image
- vector
- queried
- visual feature
- visual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5838—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5862—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/467—Encoded features or binary features, e.g. local binary patterns [LBP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses an image processing method, an image processing device, and a computer storage medium, wherein the method comprises the following steps: preprocessing an image to be queried; performing visual feature extraction on the preprocessed image to be queried to obtain a visual feature extraction result; obtaining a visual feature vector based on the visual feature extraction result; extracting text information from the preprocessed image to be queried to obtain a text feature vector; fusing the visual feature vector and the text feature vector to obtain a fused query image vector; and performing similarity calculation between the fused query image vector and the image vectors in a database to obtain a picture similar to the image to be queried. By applying the embodiments of the invention, the local binary pattern (LBP) of each single pixel channel is calculated, and the single-channel LBPs are combined into a maximum multi-channel LBP to capture the inter-channel texture information of a color image; the method therefore does not depend on the number of channels of the color image and avoids the curse of dimensionality, while the fusion of the image's text information and visual features improves retrieval accuracy.
Description
Technical Field
The present invention relates to the field of image retrieval technologies, and in particular, to an image processing method, an image processing device, and a computer storage medium.
Background
The content information of an image mainly comprises text semantic information and visual content information, and existing image retrieval technology falls into two categories according to the retrieval object: one is text-based image retrieval (TBIR), and the other is content-based image retrieval (CBIR), which operates on the content of the image itself. TBIR builds an index from the image's file name and surrounding text, so that image retrieval is converted into text retrieval: the required images are returned by matching keywords entered by the user against the image library index. However, the text information of an image is mostly obtained through manual annotation, and with the rapid growth in the number of images, TBIR faces problems such as the time and labor cost of manual annotation, subjective differences, and uncertainty. CBIR emerged to overcome the problems faced by TBIR. CBIR builds feature vectors from the visual features of the image (color, texture, and shape features), avoiding human subjectivity and greatly improving accuracy, and improves image retrieval precision through similarity measurement algorithms. However, the visual features of an image cannot fully represent its information, nor can they embody human understanding and perception of the image. When observing an image, a person not only sees visual information such as color and texture, but also understands the image through learned visual ability, perceiving its semantics and emotion, which low-level visual features cannot express. This results in an unavoidable bottleneck: the "semantic gap" between low-level visual features and high-level semantics.
In merchant signboard image retrieval, a user photographs a signboard with a mobile phone through specific software in order to retrieve the corresponding merchant and obtain its online information and services. Most signboard images have a simple background that highlights the shop name, and such backgrounds may repeat across shops; moreover, photographs taken with a mobile phone are prone to tilted angles, uneven lighting, and blur, which brings certain difficulty to image retrieval.
Disclosure of Invention
The invention aims to provide an image processing method, an image processing device, and a computer storage medium that overcome the above defects and achieve a better retrieval effect through the fusion of image visual features and text information.
In order to achieve the above object, the present invention provides an image processing method including:
preprocessing an image to be queried;
performing visual feature extraction on the preprocessed image to be queried to obtain a visual feature extraction result, wherein the visual feature extraction result comprises: color features and texture features;
obtaining a visual feature vector based on the visual feature extraction result;
extracting text information from the preprocessed image to be queried to obtain text feature vectors;
fusing the visual feature vector and the text feature vector to obtain a fused query image vector;
and carrying out similarity calculation on the fused query image vector and the image vectors in the database to obtain a picture similar to the image to be queried.
Optionally, the step of extracting visual features of the preprocessed image to be queried includes:
extracting each color channel of the preprocessed image to be queried, and generating an adder map using the local binary pattern of each channel's pixels;
based on the adder map, calculating the local binary pattern of the maximum channel of each pixel in the image to form a texture histogram.
In one implementation, the expression of the visual feature vector is:
wherein F_Texture denotes the texture feature descriptor of the image, F_Color denotes the color feature descriptor, len(F_Texture) and len(F_Color) denote the lengths of the texture and color feature descriptors respectively, and ImageSize(I) is the number of pixels in the input image.
Optionally, extracting text information from the preprocessed image to be queried to obtain a text feature vector comprises:
identifying text in the image to be queried through an open-source optical character recognition tool;
obtaining a text feature vector based on the identified text.
The invention also provides a step of acquiring the image vectors in the database, which comprises the following steps:
preprocessing each image to be placed in the database;
performing visual feature extraction on each preprocessed image to obtain a visual feature extraction result corresponding to the image, wherein the visual feature extraction result comprises: color features and texture features;
obtaining a visual feature vector corresponding to a visual feature extraction result of each image;
extracting text information from each image to obtain a text feature vector corresponding to the image;
and fusing the corresponding visual feature vector and the text feature vector of each image to obtain a fused image vector corresponding to the image, and storing the fused image vector into a database.
Further, an image processing apparatus is disclosed, the apparatus comprising a processor, and a memory connected to the processor through a communication bus; wherein,
the memory is used for storing an image processing program;
the processor is configured to execute the image processing program to implement any one of the image processing methods.
And, the present invention also discloses a computer storage medium storing one or more programs executable by one or more processors to cause the one or more processors to perform the steps of any of the image processing methods.
The image processing method, the device and the computer storage medium provided by the embodiment of the invention have the following beneficial effects:
(1) The present invention calculates the LBP of a single channel of pixels and integrates them into a maximum multi-channel LBP (MMLBP) to acquire inter-channel texture information of a color image, independent of the number of channels of the color image and solving the dimension disaster problem.
(2) The invention fully utilizes the fusion of the text information and visual characteristics of the image, and improves the retrieval accuracy.
Drawings
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the invention.
Fig. 2 is a schematic diagram of visual feature extraction according to an embodiment of the invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied through other, different embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit and scope of the present invention.
Please refer to fig. 1-2. It should be noted that, the illustrations provided in the present embodiment merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complex.
The present invention provides an image processing method as shown in fig. 1, comprising:
s110, preprocessing an image to be queried;
it should be noted that, before indexing and retrieval, the image to be queried must be preprocessed. For example, histogram equalization, filtering, and gamma transformation are applied to the image to be queried to increase its local contrast, smooth and denoise it, and enhance it, which facilitates feature extraction in later stages; the specific processing procedures are not repeated in the embodiments of the present invention.
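As a hedged illustration (not the patent's prescribed implementation), two of the preprocessing operations named above, histogram equalization and gamma transformation, can be sketched for an 8-bit grayscale image stored as a flat list; the function names and the toy image are assumptions for this sketch:

```python
# Illustrative sketch: histogram equalization and gamma correction on an
# 8-bit grayscale image represented as a flat list of pixel values.

def equalize_histogram(pixels, levels=256):
    """Map each intensity through the normalized cumulative histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    n = len(pixels)
    # Scale the CDF so the output again spans [0, levels - 1].
    return [round(cdf[p] / n * (levels - 1)) for p in pixels]

def gamma_correct(pixels, gamma=0.5, levels=256):
    """Apply the power-law (gamma) transform to brighten or darken."""
    return [round(((p / (levels - 1)) ** gamma) * (levels - 1)) for p in pixels]

image = [10, 10, 50, 50, 200, 200, 250, 250]
print(equalize_histogram(image))
print(gamma_correct(image))
```

A real system would operate on 2-D color arrays with a library such as OpenCV; the pure-Python version only shows the arithmetic.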
S120, performing visual feature extraction on the preprocessed image to be queried to obtain a visual feature extraction result, wherein the visual feature extraction result comprises: color features and texture features;
it can be understood that visual feature extraction is performed on the preprocessed image, with the color features and the texture features of the image to be queried extracted separately during the overall visual feature extraction process.
Specifically, the color information of each pixel is extracted by quantizing the RGB color space and computing a color histogram; the RGB color space has the advantages of low computational complexity and strong robustness. In this color space, the R, G, and B channels are uniformly quantized into Q_R, Q_G, and Q_B levels, respectively. The color of a single image is characterized by quantizing the entire color space into Q_R * Q_G * Q_B colors, indexed from 0 to Q_R * Q_G * Q_B - 1, defined as:
Q_RGB = Q_R * Q_G * R + Q_B * G + B
Here, Q_R, Q_G, and Q_B are each set to 4, so that the quantized color space has 64 different intensity values from 0 to 63.
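The quantization above can be sketched directly. With Q_R = Q_G = Q_B = 4, each 8-bit channel value maps to a 2-bit level, and the combined index Q_RGB = 16 * R' + 4 * G' + B' falls in [0, 63]; the helper names below are illustrative, not from the patent:

```python
# Sketch of the 64-bin color histogram described above: each RGB channel
# is uniformly quantized to 4 levels, and the combined index is
# Q_RGB = QR*QG*R' + QB*G' + B' = 16*R' + 4*G' + B' (all quantizers = 4).

QR = QG = QB = 4

def quantize_channel(v, levels=QR, depth=256):
    """Uniformly map an 8-bit channel value to one of `levels` bins."""
    return v * levels // depth

def color_histogram(pixels):
    """pixels: iterable of (R, G, B) tuples with 8-bit channels."""
    hist = [0] * (QR * QG * QB)
    for r, g, b in pixels:
        idx = QR * QG * quantize_channel(r) + QB * quantize_channel(g) + quantize_channel(b)
        hist[idx] += 1
    return hist

pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
h = color_histogram(pixels)
print([i for i, c in enumerate(h) if c])  # occupied bins
```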
Further, each color channel of the image is extracted, an adder map is generated using the local binary pattern of each channel's pixels, and, based on the adder map, the maximum multi-channel local binary pattern (MMLBP) of each pixel in the image is calculated to form a texture histogram. Let I be a multi-channel image of size P x Q x C, where P, Q, and C denote the numbers of rows, columns, and channels of the image, respectively. I_c is the c-th channel of I, with c ∈ [1, C], and N neighborhood pixels are taken at equal radial spacing around an arbitrary pixel I_c(a, b).
The local binary pattern LBP_c(a, b) is obtained by thresholding the N neighborhood pixels against the center pixel I_c(a, b) and converting the resulting binary pattern to its equivalent decimal value through the weight function, where n ∈ [1, N], a ∈ [1, P], and b ∈ [1, Q]. An adder map is then defined over the per-channel patterns, and from the adder map the maximum multi-channel local binary pattern (MMLBP) of each pixel in the image is calculated.
After the MMLBP of each pixel is calculated, the texture features of the image are represented as a histogram of the frequency of occurrence of MMLBP values in the image, defined as F_Texture = h(r_k) = n_k, where r_k is the MMLBP value of a pixel in the generated image, r_k ∈ [0, 255], and n_k is the number of pixels whose MMLBP value equals r_k, as shown in Fig. 2.
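The patent's exact MMLBP formulas are not reproduced in this text, so the following is only a plausible sketch: a standard 8-neighbour LBP computed per channel, with the multi-channel value taken as the maximum of the per-channel codes. The neighbourhood ordering and the toy channels are assumptions:

```python
# Hedged sketch: a standard 8-neighbour LBP per channel, with the
# multi-channel value taken as the maximum over channels; one plausible
# reading of the MMLBP step described above, not the patent's exact formula.

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp(channel, a, b):
    """LBP code of interior pixel (a, b) in a 2-D list `channel`."""
    centre = channel[a][b]
    code = 0
    for n, (da, db) in enumerate(NEIGHBOURS):
        if channel[a + da][b + db] >= centre:
            code |= 1 << n
    return code

def mmlbp(channels, a, b):
    """Maximum of the per-channel LBP codes at (a, b)."""
    return max(lbp(c, a, b) for c in channels)

def texture_histogram(channels):
    """256-bin histogram of MMLBP values over all interior pixels."""
    rows, cols = len(channels[0]), len(channels[0][0])
    hist = [0] * 256
    for a in range(1, rows - 1):
        for b in range(1, cols - 1):
            hist[mmlbp(channels, a, b)] += 1
    return hist

r = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
g = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(mmlbp([r, g], 1, 1))
```

Taking the maximum over channels yields a single 8-bit code per pixel regardless of how many channels the image has, which matches the stated independence from channel count.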
S130, obtaining a visual feature vector based on the visual feature extraction result;
the color feature and the texture feature are extracted based on the visual feature respectively, and the formed color feature vector and the texture feature vector are fused into a single visual feature vector. The fusion formula is as follows:
wherein F is Feature Representation of the drawingsLike descriptors of texture features, F Color Descriptor, len (F Feature ) The length sum, len (F Color ) Representing the length of the color feature descriptor, imageSize (I) is the number of pixels in the input image.
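The exact fusion formula is not reproduced in this text; one common scheme consistent with its mention of the image's pixel count is to convert each histogram to frequencies by dividing by the number of pixels and then concatenate. The sketch below follows that assumption (the normalization choice is not confirmed by the source):

```python
# Hedged sketch of visual-vector fusion: normalize the texture and color
# histograms by the number of pixels, then concatenate into one vector.

def fuse_visual(texture_hist, color_hist, image_size):
    norm_t = [v / image_size for v in texture_hist]
    norm_c = [v / image_size for v in color_hist]
    return norm_t + norm_c

v = fuse_visual([2, 0, 2], [1, 3], 4)
print(v)
```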
S140, extracting text information from the preprocessed image to be queried to obtain text feature vectors;
it can be understood that the text in the image to be queried is recognized through an existing open-source optical character recognition (OCR) tool, and the recognized words are used as labels and keywords to index the image, forming a text feature vector.
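As a sketch of this step, assuming the OCR stage (e.g. an open-source tool such as Tesseract) has already produced the recognized text, the recognized words can be turned into a fixed-length bag-of-words vector by feature hashing; the dimension of 16 and the helper names are assumptions for illustration:

```python
# Sketch of the text-feature step: OCR is assumed to have already run;
# the recognized words are hashed into a fixed-length bag-of-words vector.

import hashlib

DIM = 16  # illustrative dimension, not from the patent

def text_feature_vector(recognized_text):
    vec = [0] * DIM
    for word in recognized_text.lower().split():
        # Stable hash (unlike built-in hash(), which is salted per process).
        h = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
        vec[h % DIM] += 1
    return vec

v = text_feature_vector("Happy Noodle House noodle")
print(sum(v))
```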
S150, fusing the visual feature vector and the text feature vector to obtain a fused query image vector;
it should be noted that feature fusion combines two feature vectors into one feature vector with more discriminative power than either input vector alone. Feature fusion can be implemented at different levels, such as the matching level, the feature level, and the decision level. The present invention adopts a feature-level fusion method because two data modalities are produced, namely the visual feature vector and the text feature vector. In this fusion method, the features extracted from the input entities are first combined and then further processed as a single unit for the final fusion analysis. Canonical correlation analysis (CCA) is used to handle the mutual representation between the two random feature vectors. The aim of CCA is to find two sets of projection directions such that the correlation between the two projected feature vectors is maximized. In the final representation, X and Y denote the visual feature vector and the text feature vector, respectively, and W_x and W_y denote a pair of projection directions for X and Y.
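A minimal sketch of the projection step, assuming CCA training has already produced the projection directions W_x and W_y (their computation is omitted here): each modality is projected onto its direction and the projections are concatenated into the fused vector. The concrete numbers are assumptions for illustration:

```python
# Hedged sketch: apply given CCA projection directions to the visual
# vector X and text vector Y, then concatenate the scalar projections.

def project(w, x):
    """Dot product of a projection direction with a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fuse(wx, x, wy, y):
    """Fused representation: projected visual + projected text features."""
    return [project(wx, x), project(wy, y)]

wx = [0.6, 0.8]       # assumed CCA direction for visual features
wy = [1.0, 0.0, 0.0]  # assumed CCA direction for text features
z = fuse(wx, [1.0, 2.0], wy, [3.0, 4.0, 5.0])
print(z)
```

In practice each modality would be projected onto several directions, yielding longer fused vectors; the single-direction case only shows the mechanics.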
S160, similarity calculation is carried out between the fused query image vector and the image vectors in the database so as to obtain a picture similar to the image to be queried.
The invention also provides a step of acquiring images in a database, comprising:
preprocessing each image to be placed in the database;
performing visual feature extraction on each preprocessed image to obtain a visual feature extraction result corresponding to the image, wherein the visual feature extraction result comprises: color features and texture features;
obtaining a visual feature vector corresponding to a visual feature extraction result of each image;
extracting text information from each image to obtain a text feature vector corresponding to the image;
and fusing the corresponding visual feature vector and the text feature vector of each image to obtain a fused image vector corresponding to the image, and storing the fused image vector into a database.
The image vectors in the database are obtained through the same processing steps as the image to be queried, specifically steps S110 to S150, which are not repeated here.
It should be noted that the images stored in the database are represented by their fused visual and text feature vectors. The distance between the image to be queried and each image stored in the database is calculated using the Euclidean distance; the images most similar to the image to be queried are then output in ascending order of distance, and the retrieval effect of the fused representation is measured by common indexes such as precision and recall. The Euclidean distance is calculated as:
d(F_d, F_q) = sqrt( sum_i ( F_d(i) - F_q(i) )^2 )
where F_d denotes the feature vector of a database image and F_q denotes the feature vector of the image to be queried.
It can be understood that the calculated Euclidean distance represents the dissimilarity between two images: the smaller the distance, the more similar the images. A minimum value can therefore be selected from the calculated Euclidean distances, and the image corresponding to this minimum distance taken as the image most similar to the image to be queried.
In addition, a threshold can be set; when the Euclidean distance is smaller than the threshold, all corresponding images are returned as the images most similar to the image to be queried.
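The retrieval step above can be sketched as follows: Euclidean distance between the fused query vector and each stored vector, sorted in ascending order so the smallest distance (most similar image) comes first. The toy database is an assumption:

```python
# Sketch of the retrieval step: Euclidean distance between the fused
# query vector and each database vector, ranked ascending (nearest first).

import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def retrieve(query, database, top_k=2):
    """database: list of (image_id, vector); returns top_k ids by distance."""
    ranked = sorted(database, key=lambda item: euclidean(query, item[1]))
    return [image_id for image_id, _ in ranked[:top_k]]

db = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [5.0, 5.0])]
print(retrieve([0.9, 1.1], db))  # "b" is nearest
```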
Further, an image processing apparatus is disclosed, the apparatus comprising a processor, and a memory connected to the processor through a communication bus; wherein,
the memory is used for storing an image processing program;
the processor is configured to execute the image processing program to implement any one of the image processing methods.
And, the present invention also discloses a computer storage medium storing one or more programs executable by one or more processors to cause the one or more processors to perform the steps of any of the image processing methods. The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations completed by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.
Claims (3)
1. An image processing method, comprising:
preprocessing an image to be queried;
performing visual feature extraction on the preprocessed image to be queried to obtain a visual feature extraction result, wherein the visual feature extraction result comprises: color features and texture features;
obtaining a visual feature vector based on the visual feature extraction result;
extracting text information from the preprocessed image to be queried to obtain text feature vectors;
fusing the visual feature vector and the text feature vector to obtain a fused query image vector;
performing similarity calculation on the fused query image vector and the image vectors in the database to obtain a picture similar to the image to be queried;
the step of extracting the visual characteristics of the preprocessed image to be queried comprises the following steps:
extracting each color channel of the preprocessed image to be queried, and generating an adder map using the local binary pattern of each channel's pixels;
based on the adder map, calculating the local binary pattern of the maximum channel of each pixel in the image to form a texture histogram;
the expression of the visual characteristic vector is as follows:
wherein F_Texture denotes the texture feature descriptor of the image, F_Color denotes the color feature descriptor, len(F_Texture) and len(F_Color) denote the lengths of the texture and color feature descriptors respectively, and ImageSize(I) is the number of pixels in the input image;
extracting text information from the preprocessed image to be queried to obtain a text feature vector comprises:
identifying texts in the image to be queried through an open source optical character identification tool;
acquiring a text feature vector based on the identified text;
a step of acquiring an image vector in a database, comprising:
preprocessing each image to be placed in the database;
performing visual feature extraction on each preprocessed image to obtain a visual feature extraction result corresponding to the image, wherein the visual feature extraction result comprises: color features and texture features;
obtaining a visual feature vector corresponding to a visual feature extraction result of each image;
extracting text information from each image to obtain a text feature vector corresponding to the image;
and fusing the corresponding visual feature vector and the text feature vector of each image to obtain a fused image vector corresponding to the image, and storing the fused image vector into a database.
2. An image processing apparatus, comprising a processor, and a memory coupled to the processor via a communication bus; wherein,
the memory is used for storing an image processing program;
the processor is configured to execute the image processing program to implement the steps of the image processing method according to claim 1.
3. A computer storage medium storing one or more programs executable by one or more processors to cause the one or more processors to perform the steps of the image processing method of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011505344.XA CN112528905B (en) | 2020-12-18 | 2020-12-18 | Image processing method, device and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011505344.XA CN112528905B (en) | 2020-12-18 | 2020-12-18 | Image processing method, device and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112528905A CN112528905A (en) | 2021-03-19 |
CN112528905B true CN112528905B (en) | 2024-04-05 |
Family
ID=75001505
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011505344.XA Active CN112528905B (en) | 2020-12-18 | 2020-12-18 | Image processing method, device and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112528905B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116797533B (en) * | 2023-03-24 | 2024-01-23 | 东莞市冠锦电子科技有限公司 | Appearance defect detection method and system for power adapter |
CN116595064A (en) * | 2023-04-28 | 2023-08-15 | 华为技术有限公司 | Data mining system, method and device based on graphic and text information combination |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104376105A (en) * | 2014-11-26 | 2015-02-25 | 北京航空航天大学 | Feature fusing system and method for low-level visual features and text description information of images in social media |
CN106708943A (en) * | 2016-11-22 | 2017-05-24 | 安徽睿极智能科技有限公司 | Image retrieval reordering method and system based on arrangement fusion |
WO2018188240A1 (en) * | 2017-04-10 | 2018-10-18 | 北京大学深圳研究生院 | Cross-media retrieval method based on deep semantic space |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9075825B2 (en) * | 2011-09-26 | 2015-07-07 | The University Of Kansas | System and methods of integrating visual features with textual features for image searching |
-
2020
- 2020-12-18 CN CN202011505344.XA patent/CN112528905B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104376105A (en) * | 2014-11-26 | 2015-02-25 | 北京航空航天大学 | Feature fusing system and method for low-level visual features and text description information of images in social media |
CN106708943A (en) * | 2016-11-22 | 2017-05-24 | 安徽睿极智能科技有限公司 | Image retrieval reordering method and system based on arrangement fusion |
WO2018188240A1 (en) * | 2017-04-10 | 2018-10-18 | 北京大学深圳研究生院 | Cross-media retrieval method based on deep semantic space |
Non-Patent Citations (2)
Title |
---|
Shan Qiang; Sun Xiaoming. Medical equipment image retrieval method with multi-feature hierarchical fusion. Journal of Harbin University of Science and Technology. 2017, (02), full text. *
Zhang Xia; Zheng Fengbin. Image retrieval algorithm based on multi-level visual semantic feature fusion. Packaging Engineering. 2018, (19), full text. *
Also Published As
Publication number | Publication date |
---|---|
CN112528905A (en) | 2021-03-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11657084B2 (en) | Correlating image annotations with foreground features | |
CN108694225B (en) | Image searching method, feature vector generating method and device and electronic equipment | |
Sajjad et al. | Integrating salient colors with rotational invariant texture features for image representation in retrieval systems | |
US20180129658A1 (en) | Color sketch image searching | |
CN112528905B (en) | Image processing method, device and computer storage medium | |
Rastegar et al. | Designing a new deep convolutional neural network for content-based image retrieval with relevance feedback | |
CN114972847A (en) | Image processing method and device | |
CN113269170A (en) | Intelligent portrait building block matching method and system based on feature similarity measurement | |
CN116562270A (en) | Natural language processing system supporting multi-mode input and method thereof | |
CN116244464A (en) | Hand-drawing image real-time retrieval method based on multi-mode data fusion | |
CN112561976A (en) | Image dominant color feature extraction method, image retrieval method, storage medium and device | |
CN105095468A (en) | Novel image retrieval method and system | |
CN111695507B (en) | Static gesture recognition method based on improved VGGNet network and PCA | |
CN103020630A (en) | Processing method and device of image features | |
Lizarraga-Morales et al. | Improving a rough set theory-based segmentation approach using adaptable threshold selection and perceptual color spaces | |
Parra et al. | Automatic gang graffiti recognition and interpretation | |
Varish et al. | A content based image retrieval using color and texture features | |
CN111754459B (en) | Dyeing fake image detection method based on statistical depth characteristics and electronic device | |
CN114842301A (en) | Semi-supervised training method of image annotation model | |
CN114329024A (en) | Icon searching method and system | |
Gupta et al. | A Framework for Semantic based Image Retrieval from Cyberspace by mapping low level features with high level semantics | |
Admile | A survey on different image retrieval techniques | |
CN113538214A (en) | Method and system for controlling makeup migration and storage medium | |
Khwildi et al. | A new indexing method of HDR images using color histograms | |
CN113129410B (en) | Sketch image conversion method and related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |