CN111523554A - Image recognition method based on reverse bag-of-words model - Google Patents

Image recognition method based on reverse bag-of-words model

Info

Publication number
CN111523554A
CN111523554A (application CN202010292713.5A)
Authority
CN
China
Prior art keywords: image, bag-of-words model, descriptors, word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010292713.5A
Other languages
Chinese (zh)
Inventor
Pei Yunqiang (裴云强)
Wu Yadong (吴亚东)
Wang Fupan (王赋攀)
Hou Zhiwei (侯志伟)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University of Science and Engineering
Original Assignee
Sichuan University of Science and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University of Science and Engineering filed Critical Sichuan University of Science and Engineering
Priority to CN202010292713.5A
Publication of CN111523554A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/464 - Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/232 - Non-hierarchical techniques
    • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an image recognition method based on a reverse bag-of-words model, which comprises the following steps. Reverse bag-of-words model generation: the SURF feature points of all reference images are extracted at the server side, the corresponding 64-dimensional descriptors are obtained, and a reverse bag-of-words model whose leaf nodes correspond to "visual words" is built with K-means. Target image transmission: a Web camera is opened to capture an image, which is uploaded directly to the server, where its SURF feature points are extracted and the corresponding descriptors obtained. Image word vector generation: all SURF descriptors of the reference images and the target image are classified with the established reverse bag-of-words model, the classification result of each descriptor is associated with a visual word, and the corresponding word vectors are then computed. Target image recognition: the Euclidean distances between the target image word vector and the reference image word vectors are computed, and the reference image at the minimum distance is taken as the recognition result. The method ensures both recognition speed and recognition matching accuracy.

Description

Image recognition method based on reverse bag-of-words model
Technical Field
The invention relates to an image recognition method, in particular to an image recognition method based on a reverse bag-of-words model.
Background
Image recognition refers to the technique by which a computer processes, analyzes and understands images in order to recognize targets and objects in their many different patterns; it is a practical application of deep-learning algorithms. Current image recognition technology is generally divided into face recognition and commodity recognition: face recognition is mainly applied to security inspection, identity verification and mobile payment, while commodity recognition is mainly applied to the commodity circulation process, in particular to unmanned retail settings such as unmanned shelves and intelligent vending cabinets.
The basic flow of image recognition comprises four steps: image acquisition → image preprocessing → feature extraction → image recognition. Traditional image recognition methods are mainly based on vocabulary-tree retrieval, taking as their main technical route a bag-of-words model built in the retrieval direction that matches images under visual words (leaf nodes), and they fall into two types. In the first type, the target image is compared one-to-one against the reference images, and the reference image with the minimum descriptor distance over all feature points of the target image is found by calculation and comparison; the time overhead of the retrieval algorithm grows linearly and the matching time grows with the number of feature points before the matching image with the shortest descriptor distance is finally found. In the second type, the user must apply for a Token in the server-side image database for every target image and upload the image to that database; at recognition time the corresponding image on the server is looked up with the user's Token for matching, and this redundant, time-consuming application step only degrades the experience of users recognizing images through their devices.
In summary, conventional image recognition methods have the following defects. First, if some of the reference images in the database are too similar, tree building produces too many layers and excessive data-processing delay. Second, if the number of layers is limited in order to shorten retrieval time, accuracy becomes insufficient. Third, after the tree-building operation of each layer, all feature-point descriptors contained in every node of that layer must be stored in order to obtain the average nodes of the next layer, so storage space quickly becomes insufficient.
Disclosure of Invention
The present invention is directed to solving the above problems, and its object is to provide an image recognition method based on a reverse bag-of-words model that ensures both recognition speed and recognition matching accuracy.
The invention achieves this object through the following technical scheme:
an image recognition method based on a reverse bag-of-words model comprises the following steps:
step 1, reverse bag-of-words model generation: extract the SURF feature points of all reference images at the server side, obtain the corresponding 64-dimensional descriptors, and build a reverse bag-of-words model with K-means, in which the leaf nodes correspond to "visual words";
step 2, target image transmission: open a Web camera to capture an image, upload it directly to the server, extract the SURF feature points of the image and obtain the corresponding descriptors;
step 3, image word vector generation: classify all SURF descriptors of the reference images and the target image with the established reverse bag-of-words model, associate the classification result of each descriptor with a visual word, and then compute the corresponding word vectors;
step 4, target image recognition: compute the Euclidean distances between the target image word vector and the reference image word vectors, and take the reference image at the minimum distance as the recognition result.
SURF is short for Speeded-Up Robust Features, an improvement on the SIFT algorithm whose main characteristic is high speed; it is prior art. K-means denotes the K-means clustering algorithm, an iteratively solved cluster-analysis algorithm, which is likewise prior art.
Preferably, the descriptors in step 1 and step 2 are obtained as follows: first acquire the reference or target image; expand each image, based on box filters, to build an image pyramid; then extract all SURF feature points with the SURF algorithm at each scale of the image, and generate a corresponding descriptor for each SURF feature point, as sketched below.
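A minimal sketch of this extraction step, assuming OpenCV's contrib build (opencv-contrib-python with the nonfree modules enabled); the function name and the Hessian threshold value are illustrative assumptions, not taken from the patent:

```python
import cv2

def extract_surf_descriptors(image_path, hessian_threshold=400):
    """Extract SURF keypoints and 64-dimensional descriptors from one image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # extended=False keeps the standard 64-dimensional descriptor;
    # SURF builds its box-filter image pyramid internally and returns
    # one descriptor per detected feature point, across all scales.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold,
                                       extended=False)
    keypoints, descriptors = surf.detectAndCompute(img, None)
    return keypoints, descriptors  # descriptors: (num_points, 64) float32
```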
Preferably, in step 1, the database of the reverse bag-of-words model is established by the following steps (a sketch is given after step 1.7):
step 1.1, defining an N-ary tree structure in which the number of layers is K, the number of nodes in the i-th layer is N_i, and i denotes the layer index;
step 1.2, determining the number of leaf nodes in the last layer according to the product's application field and the analyzed scale of the database;
step 1.3, clustering all descriptors of the database images with the K-means algorithm to obtain a classification into N child nodes, taking the cluster center of all descriptors in each node as that node's descriptor, and ordering the nodes by their position numbers;
step 1.4, computing the Euclidean distances between the descriptor of the node in the first position and the descriptors of the remaining N-1 nodes of the current layer, finding the node at minimum distance and swapping it into the current second position; leaving matched nodes aside, performing the same operation in turn on the nodes at the remaining odd positions until the layer has been traversed, then taking the average descriptor of each pair of matched nodes as the descriptor of the corresponding parent node;
step 1.5, moving to layer K-1 and repeating step 1.4 until the second layer is reached, generating the N-ary tree structure model of the image database;
step 1.6, maintaining the tree downward from the second layer: except for the second layer, adjusting each node of each layer back to the position it occupied before matching, according to Euclidean distance, until the leaf nodes are correctly restored, each leaf node corresponding to an independent visual word; this completes the maintenance of the N-ary tree and generates the reverse bag-of-words model;
step 1.7, storing the reverse bag-of-words model in the server-side database, establishing an image database with an N-ary tree structure.
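The following is a minimal sketch of steps 1.1-1.7 under stated assumptions: N = 2 (nodes are matched and averaged pairwise, as in step 1.4), scikit-learn's K-means stands in for the clustering of step 1.3, and the position-maintenance bookkeeping of step 1.6 is omitted; all names and parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_reverse_bow_model(all_descriptors, num_leaves=256):
    # Step 1.3: cluster every descriptor of the reference database;
    # each cluster centre becomes the descriptor of one leaf node,
    # i.e. one "visual word".
    kmeans = KMeans(n_clusters=num_leaves, n_init=10).fit(all_descriptors)
    layer = list(kmeans.cluster_centers_)
    tree = [layer]  # tree[0] is the leaf layer; the last entry is the root layer

    # Steps 1.4-1.5: bottom-up ("reverse") building. Pair the first
    # unmatched node with its nearest remaining neighbour by Euclidean
    # distance, replace each pair by its mean descriptor (the parent),
    # and repeat layer by layer until a single root layer remains.
    while len(layer) > 1:
        unmatched = list(layer)
        parents = []
        while len(unmatched) > 1:
            first = unmatched.pop(0)
            dists = [np.linalg.norm(first - d) for d in unmatched]
            partner = unmatched.pop(int(np.argmin(dists)))
            parents.append((first + partner) / 2.0)
        if unmatched:                  # odd node count: carry the leftover upward
            parents.append(unmatched.pop())
        layer = parents
        tree.append(layer)

    # Step 1.7: in a real deployment, kmeans and tree would now be
    # persisted to the server-side database.
    return kmeans, tree
```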
Preferably, in step 3, the reference image word vectors are generated by the following steps (a hedged sketch follows the list):
step 3.1.1, assigning a unique number to the reference image;
step 3.1.2, classifying all SURF descriptors of the reference image with the established reverse bag-of-words model; after all descriptors of the reference image have been assigned to their corresponding leaf nodes, computing the word vector of the reference image from the word frequencies using TF-IDF, and storing the word vector together with the number of the corresponding reference image in the server-side database;
step 3.1.3, performing steps 3.1.1-3.1.2 on each reference image to obtain its word vector and storing it in the server-side database.
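A hedged sketch of steps 3.1.1-3.1.3 follows; the patent names TF-IDF but does not spell out the exact weighting, so the standard tf x log(M/M_w) form and the L2 normalization used here are assumptions, and kmeans is the leaf-level quantizer from the tree-building sketch above:

```python
import numpy as np

def compute_idf(reference_descriptor_sets, kmeans):
    # idf_w = log(M / M_w): M reference images in total, M_w of them
    # containing at least one descriptor assigned to visual word w.
    M = len(reference_descriptor_sets)
    document_freq = np.zeros(kmeans.n_clusters)
    for descriptors in reference_descriptor_sets:
        words = kmeans.predict(descriptors.astype(np.float32))
        document_freq[np.unique(words)] += 1
    return np.log(M / np.maximum(document_freq, 1))

def image_word_vector(descriptors, kmeans, idf):
    # Steps 3.1.2 / 3.2.2: assign each descriptor to a leaf (visual word),
    # count word frequencies, then weight by TF-IDF.
    words = kmeans.predict(descriptors.astype(np.float32))
    tf = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    tf /= max(tf.sum(), 1.0)
    vector = tf * idf
    norm = np.linalg.norm(vector)
    return vector / norm if norm > 0 else vector
```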
Preferably, in step 3, the target image word vector is generated by the following steps:
step 3.2.1, the user opening a Web camera to capture an image, which is uploaded directly to the server, where the SURF feature points of the target image are extracted and the corresponding descriptors obtained;
step 3.2.2, classifying all SURF descriptors of the target image with the established reverse bag-of-words model, associating the classification result of each descriptor with a visual word, and then computing the word vector of the target image.
The invention has the beneficial effects that:
the method can ensure the recognition speed and the recognition matching degree; specifically, by establishing a reverse bag-of-words model, compared with establishing a forward bag-of-words model, the generated tree structure is ensured to be a full N-tree, the minimum number of model layers is further ensured, the time overhead of the descriptors distributed to the words is reduced, and after each layer of tree establishment operation, all nodes (all descriptors in the layer) are not required to be stored to obtain the average node of the next layer (the next time of reverse tree establishment is a parent node layer, and the next layer of forward tree establishment is a child node layer), so that the space overhead of the characteristic point of the layer required to be stored when the next layer of nodes is calculated is reduced; after the reverse bag-of-words model is searched, the best matching image is found by utilizing the Euclidean distance, so that the searching precision is greatly improved while the quick searching is ensured. Meanwhile, when the method is adopted and the preprocessed database is matched, the descriptor to be matched and the most similar image of the descriptor can be matched without comparing all reference images one by one, even the time overhead of a fixed constant can be achieved, the time delay of a processing part is greatly reduced, so that the retrieval time is not linearly increased according to the scale of the database, but is fixed to be a constant level and is far smaller than a linear processing task.
Drawings
FIG. 1 is a schematic comparison, in terms of time overhead and recognition accuracy, of a product using the reverse bag-of-words model of the present invention with other products of the same type.
Detailed Description
The invention will be further illustrated below with reference to the preferred embodiment and the accompanying drawing:
the preferred embodiment:
an image recognition method based on a reverse bag-of-words model comprises the following steps:
step 1, reverse bag-of-words model generation: extract the SURF feature points of all reference images at the server side, obtain the corresponding 64-dimensional descriptors, and build a reverse bag-of-words model with K-means, in which the leaf nodes correspond to "visual words". In this step, the specific method for obtaining the descriptors is: first acquire the reference images; expand each image, based on box filters, to build an image pyramid; then extract all SURF feature points with the SURF algorithm at each scale of the image, and generate a corresponding descriptor for each SURF feature point;
in this step, the database of the reverse bag-of-words model is established by the following steps:
step 1.1, defining an N-ary tree structure in which the number of layers is K, the number of nodes in the i-th layer is N_i, and i denotes the layer index;
step 1.2, determining the number of leaf nodes in the last layer according to the product's application field and the analyzed scale of the database;
step 1.3, clustering all descriptors of the database images with the K-means algorithm to obtain a classification into N child nodes, taking the cluster center of all descriptors in each node as that node's descriptor, and ordering the nodes by their position numbers;
step 1.4, computing the Euclidean distances between the descriptor of the node in the first position and the descriptors of the remaining N-1 nodes of the current layer, finding the node at minimum distance and swapping it into the current second position; leaving matched nodes aside, performing the same operation in turn on the nodes at the remaining odd positions until the layer has been traversed, then taking the average descriptor of each pair of matched nodes as the descriptor of the corresponding parent node;
step 1.5, moving to layer K-1 and repeating step 1.4 until the second layer is reached, generating the N-ary tree structure model of the image database;
step 1.6, maintaining the tree downward from the second layer: except for the second layer, adjusting each node of each layer back to the position it occupied before matching, according to Euclidean distance, until the leaf nodes are correctly restored, each leaf node corresponding to an independent visual word; this completes the maintenance of the N-ary tree and generates the reverse bag-of-words model;
step 1.7, storing the reverse bag-of-words model in the server-side database, establishing an image database with an N-ary tree structure;
step 2, target image transmission: open a Web camera to capture an image, upload it directly to the server, extract the SURF feature points of the image and obtain the corresponding descriptors. In this step, the specific method for obtaining the descriptors is: first acquire the target image, i.e. the image captured by the Web camera; expand each image, based on box filters, to build an image pyramid; then extract all SURF feature points with the SURF algorithm at each scale of the image, and generate a corresponding descriptor for each SURF feature point;
step 3, image word vector generation: classify all SURF descriptors of the reference images and the target image with the established reverse bag-of-words model, associate the classification result of each descriptor with a visual word, and then compute the corresponding word vectors;
in this step, the reference image word vectors are generated by the following steps:
step 3.1.1, assigning a unique number to the reference image;
step 3.1.2, classifying all SURF descriptors of the reference image with the established reverse bag-of-words model; after all descriptors of the reference image have been assigned to their corresponding leaf nodes, computing the word vector of the reference image from the word frequencies using TF-IDF, and storing the word vector together with the number of the corresponding reference image in the server-side database;
step 3.1.3, performing steps 3.1.1-3.1.2 on each reference image to obtain its word vector and storing it in the server-side database;
the target image word vector is generated by the following steps:
step 3.2.1, the user opening a Web camera to capture an image, which is uploaded directly to the server, where the SURF feature points of the target image are extracted and the corresponding descriptors obtained;
step 3.2.2, classifying all SURF descriptors of the target image with the established reverse bag-of-words model, associating the classification result of each descriptor with a visual word, and then computing the word vector of the target image;
step 4, target image recognition: compute the Euclidean distances between the target image word vector and the reference image word vectors, and take the reference image at the minimum distance as the recognition result; a minimal sketch of this matching step is given below.
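The sketch reuses the hypothetical helpers from the earlier examples; reference_vectors stands for the {image number: word vector} mapping stored in the server-side database in step 3 and is an assumed name:

```python
import numpy as np

def recognize(target_descriptors, kmeans, idf, reference_vectors):
    # Step 4: the reference image whose word vector lies at minimum
    # Euclidean distance from the target's word vector is the result.
    target_vector = image_word_vector(target_descriptors, kmeans, idf)
    best_id, best_dist = None, float("inf")
    for image_id, ref_vector in reference_vectors.items():
        dist = np.linalg.norm(target_vector - ref_vector)
        if dist < best_dist:
            best_id, best_dist = image_id, dist
    return best_id, best_dist
```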
The advantages of the method of the present invention over other conventional methods can be further understood from FIG. 1:
FIG. 1 is a schematic comparison, in terms of time overhead and recognition accuracy, of a product using the reverse bag-of-words model of the present invention with other products of the same type. As can be seen from FIG. 1, the method of the present invention, labeled "Invert BoW", shows significantly reduced time overhead and significantly improved accuracy, with clear advantages over the other conventional methods. The numbers in parentheses in FIG. 1 indicate the total number of reference images used with each method.
The above embodiment is only a preferred embodiment of the present invention and is not intended to limit its technical solutions; any solution that can be realized on the basis of the above embodiment without creative effort shall be considered to fall within the protection scope of this patent.

Claims (5)

1. An image recognition method based on a reverse bag-of-words model, characterized in that it comprises the following steps:
step 1, reverse bag-of-words model generation: extracting the SURF feature points of all reference images at the server side, obtaining the corresponding 64-dimensional descriptors, and building a reverse bag-of-words model with K-means, in which the leaf nodes correspond to "visual words";
step 2, target image transmission: opening a Web camera to capture an image, uploading it directly to the server, extracting the SURF feature points of the image and obtaining the corresponding descriptors;
step 3, image word vector generation: classifying all SURF descriptors of the reference images and the target image with the established reverse bag-of-words model, associating the classification result of each descriptor with a visual word, and then computing the corresponding word vectors;
step 4, target image recognition: computing the Euclidean distances between the target image word vector and the reference image word vectors, and taking the reference image at the minimum distance as the recognition result.
2. The image recognition method based on the reverse bag-of-words model according to claim 1, characterized in that the descriptors in steps 1 and 2 are obtained as follows: first acquiring the reference or target image; expanding each image, based on box filters, to build an image pyramid; then extracting all SURF feature points with the SURF algorithm at each scale of the image, and generating a corresponding descriptor for each SURF feature point.
3. The image recognition method based on the reverse bag-of-words model according to claim 1 or 2, characterized in that in step 1 the database of the reverse bag-of-words model is established by the following steps:
step 1.1, defining an N-ary tree structure in which the number of layers is K, the number of nodes in the i-th layer is N_i, and i denotes the layer index;
step 1.2, determining the number of leaf nodes in the last layer according to the product's application field and the analyzed scale of the database;
step 1.3, clustering all descriptors of the database images with the K-means algorithm to obtain a classification into N child nodes, taking the cluster center of all descriptors in each node as that node's descriptor, and ordering the nodes by their position numbers;
step 1.4, computing the Euclidean distances between the descriptor of the node in the first position and the descriptors of the remaining N-1 nodes of the current layer, finding the node at minimum distance and swapping it into the current second position; leaving matched nodes aside, performing the same operation in turn on the nodes at the remaining odd positions until the layer has been traversed, then taking the average descriptor of each pair of matched nodes as the descriptor of the corresponding parent node;
step 1.5, moving to layer K-1 and repeating step 1.4 until the second layer is reached, generating the N-ary tree structure model of the image database;
step 1.6, maintaining the tree downward from the second layer: except for the second layer, adjusting each node of each layer back to the position it occupied before matching, according to Euclidean distance, until the leaf nodes are correctly restored, each leaf node corresponding to an independent visual word; this completes the maintenance of the N-ary tree and generates the reverse bag-of-words model;
step 1.7, storing the reverse bag-of-words model in the server-side database, establishing an image database with an N-ary tree structure.
4. The image recognition method based on the reverse bag-of-words model according to claim 1 or 2, characterized in that in step 3 the reference image word vectors are generated by the following steps:
step 3.1.1, assigning a unique number to the reference image;
step 3.1.2, classifying all SURF descriptors of the reference image with the established reverse bag-of-words model; after all descriptors of the reference image have been assigned to their corresponding leaf nodes, computing the word vector of the reference image from the word frequencies using TF-IDF, and storing the word vector together with the number of the corresponding reference image in the server-side database;
step 3.1.3, performing steps 3.1.1-3.1.2 on each reference image to obtain its word vector and storing it in the server-side database.
5. The image recognition method based on the reverse bag-of-words model according to claim 1 or 2, characterized in that in step 3 the target image word vector is generated by the following steps:
step 3.2.1, the user opening a Web camera to capture an image, which is uploaded directly to the server, where the SURF feature points of the target image are extracted and the corresponding descriptors obtained;
step 3.2.2, classifying all SURF descriptors of the target image with the established reverse bag-of-words model, associating the classification result of each descriptor with a visual word, and then computing the word vector of the target image.
CN202010292713.5A 2020-04-13 2020-04-13 Image recognition method based on reverse bag-of-words model Pending CN111523554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010292713.5A CN111523554A (en) 2020-04-13 2020-04-13 Image recognition method based on reverse bag-of-words model

Publications (1)

Publication Number Publication Date
CN111523554A (en) 2020-08-11

Family

ID=71912035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010292713.5A Pending CN111523554A (en) 2020-04-13 2020-04-13 Image recognition method based on reverse bag-of-words model

Country Status (1)

Country Link
CN (1) CN111523554A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170255840A1 (en) * 2014-11-26 2017-09-07 Captricity, Inc. Analyzing content of digital images
CN104778284A (en) * 2015-05-11 2015-07-15 苏州大学 Spatial image inquiring method and system
CN107193965A (en) * 2017-05-24 2017-09-22 哈尔滨工业大学 A kind of quick indoor orientation method based on BoVW algorithms
CN109753646A (en) * 2017-11-01 2019-05-14 深圳市腾讯计算机系统有限公司 A kind of article attribute recognition approach and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PEI YUNQIANG: "WAVis: A Web-based Augmented Reality Text Data Visual Analysis Tool", 2019 International Conference on Virtual Reality and Visualization (ICVRV), 1 November 2019 (2019-11-01) *
LUO SHENGSI: "Research on loop closure detection methods for BoW-based visual SLAM", China Master's Theses Full-text Database (Information Science and Technology), 15 December 2018 (2018-12-15) *


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200811)