CN110399897B - Image recognition method and device - Google Patents


Info

Publication number
CN110399897B
CN110399897B (application CN201910286523.XA)
Authority
CN
China
Prior art keywords
image
matrix
images
group
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910286523.XA
Other languages
Chinese (zh)
Other versions
CN110399897A (en)
Inventor
崔泽鹏
明悦
范春晓
翟正元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baizhuo Network Technology Co ltd
Beijing University of Posts and Telecommunications
Original Assignee
Beijing Baizhuo Network Technology Co ltd
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baizhuo Network Technology Co ltd and Beijing University of Posts and Telecommunications
Priority to CN201910286523.XA
Publication of CN110399897A
Application granted
Publication of CN110399897B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an image recognition method and device. The method comprises the following steps: selecting a first image group and a second image group from L images; calculating the image correlation between each image in the first image group and each image in the second image group according to the image features of each image in the two groups; initializing the parameters of an objective function; iteratively updating the parameters of the objective function at least once to obtain an updated objective function and determine the clustering centers of the L images; performing binary coding on the image features of an image to be recognized according to the hash function in the updated objective function to obtain binary coded data; and recognizing the image to be recognized according to its binary coded data and the binary coded data of each image in the clustering centers of the L images. The accuracy of image recognition is thereby improved.

Description

Image recognition method and device
Technical Field
The embodiment of the invention relates to image processing technology, and in particular to an image recognition method and device.
Background
Image recognition is an important field of artificial intelligence: the technology of performing object recognition on images so as to recognize targets and objects across different modalities. For example, if the object in an image of a first modality is a running Husky and the object in an image of a second modality is a standing Akita, an image recognition technique should recognize the object in both images as a dog.
Image recognition algorithms can be divided into those based on global features and those based on local features. A local-feature-based algorithm regards an image as a combination of several local blocks, extracts local features from each block, and concatenates them into a single vector that represents the image. When the human perceptual system distinguishes images, it does so according to the salient features of the objects in them, so local-feature-based algorithms accord with the way the human perceptual system understands images.
Local-feature-based image recognition algorithms can further be divided into manually designed algorithms and feature-learning algorithms. In the prior art, when a feature-learning algorithm trains an objective function on training images, the optimization of the hash function parameters and the optimization of the cluster centers are usually separated: the optimized hash function parameters are computed first, and the optimized cluster centers afterwards. The resulting hash function and cluster centers adapt poorly to the images, so applying them reduces the accuracy of image recognition and causes recognition errors.
Disclosure of Invention
The embodiment of the invention provides an image recognition method and device, aiming to solve the problem that prior-art image recognition methods cannot accurately recognize images.
In a first aspect, an embodiment of the present invention provides an image recognition method, including: selecting a first image group and a second image group from L images, wherein the first image group comprises N images, the second image group comprises M images, and the images of the first image group are not completely the same as the images of the second image group; wherein L ≥ 2, 1 ≤ N ≤ L, and 1 ≤ M ≤ L;
calculating the image relevance of each image in the first image group and each image in the second image group according to the image characteristics of each image in the first image group and the second image group;
initializing parameters of an objective function according to the image relevance of each image in the first image group and each image in the second image group, wherein the objective function comprises a hash function and a cluster center function, the hash function is used for carrying out binary coding on image features, the cluster center function is used for obtaining a cluster center with image feature consistency in the L images, and the cluster center comprises at least one image; the parameters of the hash function comprise a first binary coding parameter, a second binary coding parameter and a prediction scaling variable, and the parameters of the clustering center function comprise a first orthogonal projection matrix and a second orthogonal projection matrix;
iteratively updating the parameters of the objective function to obtain an iteratively updated objective function, and determining the clustering centers of the L images, wherein the number of iterative updates is at least one;
performing binary coding on the image characteristics of the image to be identified according to the hash function in the updated target function to obtain binary coded data;
and identifying the image to be identified according to the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images.
Optionally, the initializing a parameter of an objective function according to an image association between each image in the first image group and each image in the second image group includes:
obtaining an image relevance matrix according to the image relevance of each image in the first image group and each image in the second image group;
initializing parameters of an objective function according to the image correlation matrix;
the image correlation matrix is an N × M matrix, in which the element in the i-th row and j-th column represents the image correlation between the i-th image in the first image group and the j-th image in the second image group; or,
the image correlation matrix is an M x N image correlation matrix, and the elements of the jth row and the ith column in the image correlation matrix represent the image correlation between the jth image in the second image group and the ith image in the first image group;
wherein 1 ≤ i ≤ N and 1 ≤ j ≤ M.
Optionally, the initializing parameters of the objective function according to the image correlation matrix includes:
acquiring a transpose matrix of the image correlation matrix;
respectively acquiring a covariance matrix of the image correlation matrix and a covariance matrix of a transpose matrix of the image correlation matrix;
acquiring a partial eigenvector of a covariance matrix of the image correlation matrix according to the projection matrix of the image correlation matrix, and acquiring a partial eigenvector of a covariance matrix of the transpose matrix according to the projection matrix of the transpose matrix;
initializing the first binary encoding parameter according to the first orthogonal projection matrix and a partial eigenvector of a covariance matrix of the image correlation matrix, and initializing the second binary encoding parameter according to the second orthogonal projection matrix and a partial eigenvector of a covariance matrix of the transpose matrix.
Optionally, the iteratively updating the parameter of the objective function to obtain an iteratively updated objective function includes:
determining one parameter as a parameter to be updated from the parameters of the target function during each iterative update, fixing other parameters, and updating the parameter to be updated to enable the target function to meet a preset condition;
and recording the iteration times, and stopping updating the parameters of the objective function when the iteration times are more than or equal to the preset iteration times to obtain the iteratively updated objective function.
Optionally, the identifying the image to be identified according to the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images includes:
calculating the Euclidean distance between the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images;
and identifying the image to be identified according to the Euclidean distance.
Optionally, the method further includes:
dividing the image into at least two local blocks;
acquiring the image characteristics of each local block;
and combining the image characteristics of the at least two local blocks to obtain the image characteristics of the image.
Optionally, the obtaining of the image feature of each local block includes:
and acquiring the image features of each local block by using a histogram of oriented gradients (HOG) algorithm.
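As one illustration of this optional step, the sketch below computes a simplified histogram-of-oriented-gradients feature per local block and concatenates the blocks into one image feature vector. The function names, grid size, and bin count are hypothetical, and a production HOG implementation would also add block normalization:

```python
import numpy as np

def local_block_hog(block, num_bins=9):
    """Simplified HOG for one local block (grayscale array): one unnormalized
    orientation histogram, gradient magnitudes voted into angle bins."""
    gy, gx = np.gradient(block.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned gradients, angles folded into [0, 180) degrees as in classic HOG
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((angle / (180.0 / num_bins)).astype(int), num_bins - 1)
    hist = np.zeros(num_bins)
    for b in range(num_bins):
        hist[b] = magnitude[bins == b].sum()
    return hist

def image_feature(image, grid=(2, 2), num_bins=9):
    """Divide the image into grid blocks, compute a HOG per block,
    and concatenate the per-block features into one vector."""
    h, w = image.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = [local_block_hog(image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw], num_bins)
             for r in range(grid[0]) for c in range(grid[1])]
    return np.concatenate(feats)
```

With a 2 × 2 grid and 9 bins, a grayscale image yields a 36-dimensional feature vector.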
In a second aspect, an embodiment of the present invention provides an image recognition apparatus, including:
a selecting module, configured to select a first image group and a second image group from L images, wherein the first image group comprises N images, the second image group comprises M images, and the images of the first image group are not identical to the images of the second image group; wherein L ≥ 2, 1 ≤ N ≤ L, and 1 ≤ M ≤ L;
the association module is used for calculating the image association of each image in the first image group and each image in the second image group according to the image characteristics of each image in the first image group and the second image group;
the initialization module is used for initializing parameters of an objective function according to the image relevance of each image in the first image group and each image in the second image group, wherein the objective function comprises a hash function and a cluster center function, the hash function is used for carrying out binary coding on image features, the cluster center function is used for acquiring cluster centers with image feature consistency in the L images, and the cluster centers comprise at least one image; the parameters of the hash function comprise a first binary coding parameter, a second binary coding parameter and a prediction scaling variable, and the parameters of the clustering center function comprise a first orthogonal projection matrix and a second orthogonal projection matrix;
the updating module is used for performing iterative updating on the parameters of the target function to obtain an iteratively updated target function, and determining the clustering centers of the L images, wherein the iterative updating is performed at least once;
the encoding module is used for carrying out binary encoding on the image characteristics of the image to be identified according to the hash function in the updated target function to obtain binary encoded data;
and the identification module is used for identifying the image to be identified according to the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images.
Optionally, the initialization module includes:
the first acquisition module is used for acquiring an image relevance matrix according to the image relevance of each image in the first image group and each image in the second image group;
the sub-initialization module is used for initializing parameters of an objective function according to the image correlation matrix;
the image correlation matrix is an N × M matrix, in which the element in the i-th row and j-th column represents the image correlation between the i-th image in the first image group and the j-th image in the second image group; or,
the image correlation matrix is an M x N image correlation matrix, and the elements of the jth row and the ith column in the image correlation matrix represent the image correlation between the jth image in the second image group and the ith image in the first image group;
wherein 1 ≤ i ≤ N and 1 ≤ j ≤ M.
Optionally, the sub-initialization module is specifically configured to:
acquiring a transpose matrix of the image correlation matrix;
respectively acquiring a covariance matrix of the image correlation matrix and a covariance matrix of a transpose matrix of the image correlation matrix;
acquiring a partial eigenvector of a covariance matrix of the image correlation matrix according to the projection matrix of the image correlation matrix, and acquiring a partial eigenvector of a covariance matrix of the transpose matrix according to the projection matrix of the transpose matrix;
initializing the first binary encoding parameter according to the first orthogonal projection matrix and a partial eigenvector of a covariance matrix of the image correlation matrix, and initializing the second binary encoding parameter according to the second orthogonal projection matrix and a partial eigenvector of a covariance matrix of the transpose matrix.
Optionally, the update module includes:
the sub-updating module is used for determining one parameter as a parameter to be updated from the parameters of the target function during each iterative updating, fixing other parameters and updating the parameter to be updated so that the target function meets a preset condition;
and the counting module is used for recording the iteration times, and stopping updating the parameters of the objective function when the iteration times are more than or equal to the preset iteration times to obtain the iteratively updated objective function.
Optionally, the identification module includes:
the calculation module is used for calculating the Euclidean distance between the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images;
and the sub-identification module is used for identifying the image to be identified according to the Euclidean distance.
Optionally, the apparatus further comprises: an extraction module;
the extraction module is used for dividing the image into at least two local blocks; acquiring the image characteristics of each local block; and combining the image characteristics of the at least two local blocks to obtain the image characteristics of the image.
Optionally, when the extraction module obtains the image feature of each local block, the extraction module is specifically configured to:
and acquiring the image features of each local block by using a histogram of oriented gradients (HOG) algorithm.
In a third aspect, an embodiment of the present invention provides an image recognition apparatus, including: at least one processor and memory;
the memory stores computer-executable instructions; the at least one processor executes computer-executable instructions stored by the memory to perform the method of any one of the first aspect of the embodiments of the invention.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which program instructions are stored, and when the program instructions are executed by a processor, the method according to any one of the first aspect of the present invention is implemented.
In a fifth aspect, this application provides a program product, which includes a computer program stored in a readable storage medium, from which the computer program can be read by at least one processor of an image recognition apparatus, and the execution of the computer program by the at least one processor causes the image recognition apparatus to implement the method according to any one of the first aspect of the embodiments of the present invention.
The embodiment of the invention provides an image recognition method and device in which the parameters of the hash function influence the cluster centers obtained by the cluster center function, and the cluster centers obtained by the cluster center function are fed back into the optimization of the hash function parameters, so that the parameters of the cluster center function and the parameters of the hash function are computed cooperatively and optimized at the same time; that is, the cluster center function and the hash function are optimized collaboratively. This yields both better binary-coded representations of image features from the hash function and cluster centers with highly consistent image features, so the resulting hash function and cluster centers are more robust and adapt better to the features of images in different modalities. Finally, the image features of the image to be recognized are binary coded according to the hash function, and the image is recognized by comparing these coded features with those of each image in the cluster centers, which improves the accuracy of image recognition.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of an image recognition method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an image recognition apparatus according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an image recognition method according to an embodiment of the present invention, and as shown in fig. 1, the method according to the embodiment may include:
s101, selecting a first image group and a second image group from the L images.
In this embodiment, N images are selected from L candidate images as a first image group, and M images are selected as a second image group. When selecting the N images of the first image group and the M images of the second image group from the L images, the two groups must not be completely identical; that is, at least one image must differ between the first image group and the second image group. Here L ≥ 2, 1 ≤ N ≤ L, and 1 ≤ M ≤ L.
It should be noted that the relationship between N and M is not limited in this embodiment.
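A minimal sketch of this selection step, assuming random sampling (the patent does not prescribe how the groups are chosen); the `select_groups` helper and its re-sampling loop are illustrative only:

```python
import random

def select_groups(images, n, m, seed=None):
    """Pick a first group of n images and a second group of m images from the
    candidates, re-sampling until the two groups are not completely identical
    (hypothetical helper; any selection strategy meeting the constraint works)."""
    assert len(images) >= 2 and 1 <= n <= len(images) and 1 <= m <= len(images)
    if n == m == len(images):
        raise ValueError("groups would always be completely identical")
    rng = random.Random(seed)
    while True:
        first = rng.sample(images, n)
        second = rng.sample(images, m)
        if set(first) != set(second):  # at least one image must differ
            return first, second
```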
S102, calculating the image relevance of each image in the first image group and each image in the second image group according to the image characteristics of each image in the first image group and the second image group.
In this embodiment, after the first and second image groups are obtained, the image features of each image in the first group are compared with the image features of each image in the second group to obtain the correlation, or in other words the similarity, between each pair of images. The higher the similarity between two images, the stronger their correlation. For example, the correlation between two images can be represented by a number, defined so that when the compared images from the first and second groups are the same image the number is 1, and when they are completely different the number is 0. Accordingly, when the number obtained by comparison lies in (0, 1), a larger number indicates a stronger correlation between the two images.
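One plausible realization of such a correlation in [0, 1] is cosine similarity over non-negative feature vectors; the patent leaves the exact measure open, so this choice is an assumption:

```python
import numpy as np

def image_correlation(feat_a, feat_b):
    """Similarity in [0, 1]: 1 for identical features, 0 for unrelated ones.
    Cosine similarity is one plausible choice of correlation measure."""
    na, nb = np.linalg.norm(feat_a), np.linalg.norm(feat_b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.clip(np.dot(feat_a, feat_b) / (na * nb), 0.0, 1.0))
```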
S103, initializing parameters of an objective function according to the image relevance of each image in the first image group and each image in the second image group.
In this embodiment, the target function includes a hash function and a clustering center function, where the hash function is used to perform binary encoding on the image features to obtain the image features represented by the binary encoded data; the cluster center function is used for acquiring a cluster center with image feature consistency from the L images, and the cluster center comprises at least one image. For example, when the target object of the image to be recognized is a dog, the L images may include images of the dog as well as images of other target objects (e.g., hamsters and buildings), and the cluster center obtained by the cluster center function may represent an image feature of the dog.
In this embodiment, the parameters of the hash function include a first binary coding parameter, a second binary coding parameter, and a prediction scaling variable, and the parameters of the cluster center function include a first orthogonal projection matrix and a second orthogonal projection matrix.
And S104, iteratively updating the parameters of the objective function to obtain an iteratively updated objective function, and determining the clustering center of the L images.
In this embodiment, the first binary coding parameter, the second binary coding parameter, the prediction scaling variable, the first orthogonal projection matrix, and the second orthogonal projection matrix are iteratively updated according to the objective of the objective function; the parameters after each iterative update and the correspondingly updated objective function are obtained, and the number of iterative updates is at least one. For example, if the objective is to minimize the objective function, the parameters are updated so that the objective value decreases, and the parameters corresponding to the minimal objective value are the updated parameters. In each iteration, all five variables are updated once: the first binary coding parameter, the second binary coding parameter, the prediction scaling variable, the first orthogonal projection matrix, and the second orthogonal projection matrix. After each iterative update of the parameters, the cluster center function in the objective function can obtain updated cluster centers from the updated parameters.
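The alternating scheme described above can be sketched generically as block coordinate descent. The toy objective and the closed-form update rules below are placeholders for the patent's five parameters and its actual objective function:

```python
def alternating_minimize(objective, params, update_rules, num_iters=10):
    """Each iteration updates every parameter in turn while the others are held
    fixed, so the objective value never increases when each rule minimizes the
    objective over its own parameter (sketch; the real per-parameter updates
    are defined by the patent's objective function)."""
    history = [objective(params)]
    for _ in range(num_iters):
        for name, rule in update_rules.items():
            params[name] = rule(params)  # update one parameter, others fixed
        history.append(objective(params))
    return params, history

# Toy example: minimize (x - 3)^2 + (y + 1)^2 with exact per-coordinate updates.
obj = lambda p: (p["x"] - 3.0) ** 2 + (p["y"] + 1.0) ** 2
rules = {"x": lambda p: 3.0, "y": lambda p: -1.0}
final, history = alternating_minimize(obj, {"x": 0.0, "y": 0.0}, rules, num_iters=3)
```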
And S105, carrying out binary coding on the image features of the image to be identified according to the hash function in the updated target function to obtain binary coded data.
In this embodiment, when the iterative update of the parameters of the target function is stopped, the target function and the clustering center corresponding to the last updated parameter are obtained, and the image features of the image to be identified are obtained. And carrying out binary coding on the image features of the image to be identified according to the hash function in the target function, thereby obtaining the image features represented by the binary coded data.
And S106, identifying the image to be identified according to the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images.
In this embodiment, the binary coded data of the image to be recognized is compared with the binary coded data of each image in the clustering centers of the L images, and the image to be recognized is recognized according to the comparison result between the binary coded data of the image to be recognized and the binary coded data of each image in the clustering centers of the L images.
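A sketch of this comparison step, assuming each cluster-center image carries a class label and using Euclidean distance between binary codes (for 0/1 codes this ranking coincides with Hamming distance); the names are illustrative:

```python
import numpy as np

def recognize(query_code, center_codes, center_labels):
    """Return the label of the cluster-center image whose binary code is
    nearest to the query image's binary code in Euclidean distance."""
    dists = [np.linalg.norm(query_code - code) for code in center_codes]
    return center_labels[int(np.argmin(dists))]
```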
In this embodiment, the parameters of the hash function affect the cluster centers obtained by the cluster center function, and the cluster centers obtained by the cluster center function are fed back into the optimization of the hash function parameters, so that the parameters of the cluster center function and the parameters of the hash function are computed cooperatively and optimized at the same time; that is, the cluster center function and the hash function are optimized collaboratively. This yields both better binary-coded representations of image features from the hash function and cluster centers with highly consistent image features, so the resulting hash function and cluster centers are more robust and adapt better to the features of images in different modalities. Finally, the image features of the image to be recognized are binary coded according to the hash function, and the image is recognized by comparing these coded features with those of each image in the cluster centers, which improves the accuracy of image recognition.
The technical solution of the embodiment of the method shown in fig. 1 will be described in detail below by using several specific examples.
In some embodiments, one possible implementation of S103 is:
and S1031, obtaining an image relevance matrix according to the image relevance between each image in the first image group and each image in the second image group.
S1032, initializing parameters of an objective function according to the image relevance matrix.
In this embodiment, one of the images in the first image group is compared with one of the images in the second image group to obtain the relevance between the two images, where the relevance may be represented by a number, and the relevance between the two images is taken as an element in a relevance matrix, so as to obtain an image relevance matrix according to the image relevance between each image in the first image group and each image in the second image group.
The image correlation matrix can be an N × M matrix, in which the element in the i-th row and j-th column represents the image correlation between the i-th image in the first image group and the j-th image in the second image group; or,
the image correlation matrix can also be an M × N matrix, in which the element in the j-th row and i-th column represents the image correlation between the j-th image in the second image group and the i-th image in the first image group. The embodiment of the invention takes an N × M image correlation matrix as an example; here 1 ≤ i ≤ N and 1 ≤ j ≤ M.
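Assembling the N × M matrix from pairwise correlations might look like the sketch below; the `corr` callable stands in for whatever correlation measure is used:

```python
import numpy as np

def correlation_matrix(first_feats, second_feats, corr):
    """U[i, j] = correlation between the i-th image of the first group and the
    j-th image of the second group, giving an N x M image correlation matrix."""
    return np.array([[corr(a, b) for b in second_feats] for a in first_feats])
```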
In some embodiments, one possible implementation of S1032 is:
S10321, acquiring a transposed matrix of the image correlation matrix.
In this embodiment, for example, the image correlation matrix is U = (u_ij) ∈ R^{N×M}, where u_ij denotes the relevance between the i-th image in the first image group and the j-th image in the second image group. The transpose of the image correlation matrix U is obtained and recorded as U^T, where U^T is an M × N matrix.
S10322, respectively obtaining a covariance matrix of the image correlation matrix and a covariance matrix of a transposed matrix of the image correlation matrix.
In this embodiment, the mean vector u_h of the transposed matrix U^T and the mean vector u_g of the image correlation matrix U are obtained. The covariance matrix of the transposed matrix U^T may then be computed, for example, as (U^T − u_h·1^T)^T (U^T − u_h·1^T), and the covariance matrix of the image correlation matrix U, for example, as (U − u_g·1^T)^T (U − u_g·1^T).
S10323, obtaining a partial eigenvector of the covariance matrix of the image correlation matrix according to the projection matrix of the image correlation matrix, and obtaining a partial eigenvector of the covariance matrix of the transposed matrix according to the projection matrix of the transposed matrix.
In this embodiment, first obtain the projection matrix P_h of the transposed matrix U^T and the projection matrix P_g of the image correlation matrix. Partial eigenvectors of the covariance matrix of U^T, for example its first b eigenvectors, are obtained from the projection matrix P_h and the covariance matrix of U^T; the calculation formula is, for example, Equation 1. Similarly, the first b eigenvectors of the covariance matrix of the image correlation matrix U are obtained, for example, by Equation 2:

U_h = P_h(U^T − u_h·1^T)    (Equation 1)

U_g = P_g(U − u_g·1^T)    (Equation 2)
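Equations 1 and 2 project the mean-centered matrices through the projection matrices P_h and P_g. A PCA-style sketch, under the assumption (not stated explicitly in the text) that each projection matrix is formed from the top-b eigenvectors of the corresponding covariance matrix:

```python
import numpy as np

def top_b_projection(X, b):
    """Return (projection matrix P, projected data P @ (X - mean)).

    X: (d, n) matrix whose columns are samples. The covariance is computed on
    the mean-centered data, mirroring (X - mean*1^T)^T (X - mean*1^T) in the
    text, and P holds the eigenvectors of the b largest eigenvalues as rows.
    This eigenvector construction of P is an assumption for illustration.
    """
    mean = X.mean(axis=1, keepdims=True)   # mean vector (u_h or u_g)
    Xc = X - mean                          # X - mean * 1^T
    cov = Xc @ Xc.T                        # d x d covariance (unnormalized)
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    P = vecs[:, ::-1][:, :b].T             # top-b eigenvectors as rows: (b, d)
    return P, P @ Xc                       # analogue of Equation 1 / 2

rng = np.random.default_rng(1)
U = rng.normal(size=(5, 7))                # toy 5 x 7 correlation matrix
P_g, U_g = top_b_projection(U, b=3)        # Equation 2: U_g = P_g (U - u_g 1^T)
P_h, U_h = top_b_projection(U.T, b=3)      # Equation 1, applied to the transpose
print(U_g.shape, U_h.shape)  # (3, 7) (3, 5)
```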
S10324, initializing the first binary coding parameter according to the first orthogonal projection matrix and a partial eigenvector of a covariance matrix of the image correlation matrix, and initializing the second binary coding parameter according to the second orthogonal projection matrix and a partial eigenvector of a covariance matrix of the transpose matrix.
In this embodiment, the second binary coding parameter G is initialized according to the first orthogonal projection matrix R_g and the partial eigenvectors U_g of the covariance matrix of the image correlation matrix; for example, it may be written as G = sign(R_g·U_g).
The first binary coding parameter H is initialized according to the second orthogonal projection matrix R_h and the partial eigenvectors U_h of the covariance matrix of the transposed matrix; for example, it may be written as H = sign(R_h·U_h).
The present embodiment does not limit the manner in which the prediction scaling variable, the first orthogonal projection matrix, and the second orthogonal projection matrix are initialized, and for example, the values of the prediction scaling variable, the first orthogonal projection matrix, and the second orthogonal projection matrix may be randomly obtained.
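Combining S10324 with the random initialization just described, the initialization can be sketched as follows (all shapes are illustrative, and the QR-based random orthogonal construction is one possible choice, not the patent's prescribed one):

```python
import numpy as np

def random_orthogonal(b, rng):
    """Random b x b orthogonal matrix via QR decomposition; one way to randomly
    initialize the orthogonal projection matrices, as the text permits."""
    q, _ = np.linalg.qr(rng.normal(size=(b, b)))
    return q

rng = np.random.default_rng(2)
b, n, m = 4, 6, 5
U_h = rng.normal(size=(b, n))     # stands in for the top-b projection of U^T
U_g = rng.normal(size=(b, m))     # stands in for the top-b projection of U
R_h = random_orthogonal(b, rng)   # second orthogonal projection matrix
R_g = random_orthogonal(b, rng)   # first orthogonal projection matrix

H = np.sign(R_h @ U_h)            # H = sign(R_h U_h)
G = np.sign(R_g @ U_g)            # G = sign(R_g U_g)
H = np.where(H == 0, 1, H)        # map exact zeros to +1 so entries are {-1,+1}
G = np.where(G == 0, 1, G)
print(H.shape, G.shape)  # (4, 6) (4, 5)
```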
In this embodiment, the parameters of the objective function are initialized from the image correlation matrix between each image in the first image group and each image in the second image group, and from its transpose. This further strengthens the robustness of the obtained hash function and cluster centers, and their adaptability to the image features of images in different modalities.
In some embodiments, one possible implementation of S104 is:
S1041, during each iterative update, determining one parameter from the parameters of the objective function as the parameter to be updated, fixing the other parameters, and updating the parameter to be updated so that the objective function meets a preset condition.
In this embodiment, the objective function is, for example, Equation 3:

[Equation 3: the objective function, the sum of the three terms E1 + E2 + E3 described below; the expression is rendered only as an image in the original.]
The first two terms of the objective function represent the hash function, and the third term represents the cluster center function.
In E1, the partial eigenvectors (for example, the first b eigenvectors) of the covariance matrix of the transposed matrix U^T are:

U_h = P_h(U^T − u_h·1^T)

and the partial eigenvectors (for example, the first b eigenvectors) of the covariance matrix of the image correlation matrix U are:

U_g = P_g(U − u_g·1^T)

where U_h may be used to represent the image features of the first image group and U_g the image features of the second image group. The first orthogonal projection matrix R_g and the second orthogonal projection matrix R_h are both b × b matrices.
In E2, σ is the prediction scaling variable and co(h_i, g_j) indicates the relevance between the first image group and the second image group, where co(h_i, g_j) is given, for example, by Equation 4:

[Equation 4: the expression for co(h_i, g_j), rendered only as an image in the original.]
In E3, C = [c_1, c_2, …, c_K] ∈ R^{b×K} represents the cluster centers of the L images, where K is the number of images in the cluster centers and c_k in Equation 3 represents the image features, expressed as binary coded data, of the k-th image in the cluster.
For convenience of calculation and simplification, E1 + E2 in the objective function is written in matrix form, as shown in Equation 5:

[Equation 5: the matrix form of E1 + E2, rendered only as an image in the original.]

where H ∈ {−1, +1}^{b×n}, G ∈ {−1, +1}^{b×m}, and J is an n × m matrix whose elements are all 1. In order to reduce redundancy in the binary codes and make them carry as much information as possible, the constraints HH^T = nI and GG^T = mI may be imposed, where I is the identity matrix.
The process of updating the parameters in each iteration is as follows:
(1) First, fix the second binary coding parameter G, the prediction scaling variable σ, the first orthogonal projection matrix R_g and the second orthogonal projection matrix R_h, and update the first binary coding parameter H.
Substitute the values of the second binary coding parameter G, the prediction scaling variable σ, the first orthogonal projection matrix R_g and the second orthogonal projection matrix R_h obtained after initialization or after the previous iterative update into Equation 5; Equation 5 then reduces to Equation 6:

[Equation 6: rendered only as an image in the original.]
where the auxiliary quantity in Equation 6 is given by an expression rendered only as an image in the original, HH^T = nI, and H ∈ {−1, +1}^{b×n}. By relaxing the discrete constraint on the first binary coding parameter H, H can be solved by singular value decomposition; that is, the singular value decomposition yields a closed-form solution (rendered only as an image in the original).

Therefore, the updated first binary coding parameter is given by the resulting closed-form expression (rendered only as an image in the original).
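The patent's closed-form SVD solution survives only as images. As a generic sketch of this kind of relaxation (not necessarily the patent's exact formula): to maximize tr(H·Qᵀ) subject to H·Hᵀ = nI with H relaxed to real values, take the thin SVD Q = A·Σ·Bᵀ and set H = √n · A·Bᵀ:

```python
import numpy as np

def solve_relaxed(Q, n):
    """Maximize tr(H @ Q.T) subject to H @ H.T = n*I, with H relaxed to real
    values. Standard solution: with the thin SVD Q = A @ diag(s) @ Bt, the
    maximizer is H = sqrt(n) * A @ Bt; the binary codes would then be sign(H).
    This is a generic sketch, not the patent's image-only closed form.
    """
    A, s, Bt = np.linalg.svd(Q, full_matrices=False)
    return np.sqrt(n) * A @ Bt

rng = np.random.default_rng(3)
b, n = 4, 6
Q = rng.normal(size=(b, n))       # stands in for the data-dependent term
H = solve_relaxed(Q, n)
# The relaxed solution satisfies the orthogonality constraint exactly:
print(np.allclose(H @ H.T, n * np.eye(b)))  # True
```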
(2) Next, fix the first binary coding parameter H, the prediction scaling variable σ, the first orthogonal projection matrix R_g and the second orthogonal projection matrix R_h, and update the second binary coding parameter G.
Since the updated first binary coding parameter H was obtained in (1), when updating the second binary coding parameter G, substitute the updated H from (1) together with the values of the prediction scaling variable σ, the first orthogonal projection matrix R_g and the second orthogonal projection matrix R_h obtained after initialization or after the previous iterative update into Equation 5; Equation 5 then reduces to Equation 7:

[Equation 7: rendered only as an image in the original.]
where the auxiliary quantity in Equation 7 is given by an expression rendered only as an image in the original, G ∈ {−1, +1}^{b×m}, and GG^T = mI. By relaxing the discrete constraint on G, the second binary coding parameter G can be solved by singular value decomposition; that is, the singular value decomposition yields a closed-form solution (rendered only as an image in the original).

Therefore, the updated second binary coding parameter is given by the resulting closed-form expression (rendered only as an image in the original).
(3) Then fix the first binary coding parameter H, the second binary coding parameter G, the prediction scaling variable σ and the second orthogonal projection matrix R_h, and update the first orthogonal projection matrix R_g.
Since the updated first binary coding parameter H was obtained in (1) and the updated second binary coding parameter G in (2), when updating the first orthogonal projection matrix R_g, substitute the updated H from (1), the updated G from (2), and the values of the prediction scaling variable σ and the second orthogonal projection matrix R_h obtained after initialization or after the previous iterative update into Equation 3; Equation 3 then reduces to Equation 8:

[Equation 8: rendered only as an image in the original.]
the clustering centers of the L images can be obtained according to the first binary coding parameter H and the second binary coding parameter G and a clustering algorithm (e.g., K-Means algorithm), and are related to the first binary coding parameter H and the second binary coding parameter G. The specific implementation of obtaining the clustering centers of the L images through the first binary coding parameter H and the second binary coding parameter G and the clustering algorithm may refer to the prior art, and will not be described herein again.
The updated first orthogonal projection matrix R_g is then obtained by a stochastic gradient descent method; for the specific stochastic gradient descent procedure, reference may be made to existing methods, which are not described again here.
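The exact gradient is determined by Equation 8, which survives only as an image. As an illustrative stand-in, the sketch below performs gradient descent on a simple fitting term ‖G − R·U_g‖² and re-projects R onto the orthogonal group after each step via SVD, a common way to keep an orthogonal projection matrix orthogonal; the patent may handle the constraint differently:

```python
import numpy as np

def update_orthogonal(R, G, U, lr=0.01, steps=50):
    """Gradient descent on ||G - R @ U||_F^2 with re-projection of R onto the
    orthogonal group via SVD after each step. Illustrative stand-in for the
    patent's stochastic-gradient update, whose objective is image-only."""
    for _ in range(steps):
        grad = -2.0 * (G - R @ U) @ U.T   # d/dR of ||G - R U||_F^2
        R = R - lr * grad
        A, _, Bt = np.linalg.svd(R)       # nearest orthogonal matrix to R
        R = A @ Bt
    return R

rng = np.random.default_rng(5)
b, m = 4, 5
R_true, _ = np.linalg.qr(rng.normal(size=(b, b)))
U_g = rng.normal(size=(b, m))
G = R_true @ U_g                          # synthetic, consistent target
R0, _ = np.linalg.qr(rng.normal(size=(b, b)))
R = update_orthogonal(R0, G, U_g)
print(np.allclose(R @ R.T, np.eye(b)))    # True: R stays orthogonal
```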
(4) Then fix the first binary coding parameter H, the second binary coding parameter G, the prediction scaling variable σ and the first orthogonal projection matrix R_g, and update the second orthogonal projection matrix R_h.
When updating the second orthogonal projection matrix R_h, substitute the updated first binary coding parameter H from (1), the updated second binary coding parameter G from (2), the updated first orthogonal projection matrix R_g from (3), and the value of the prediction scaling variable σ obtained after initialization or after the previous iterative update into Equation 3; Equation 3 then reduces to Equation 9:

[Equation 9: rendered only as an image in the original.]
The updated second orthogonal projection matrix R_h is obtained by a stochastic gradient descent method; for the specific procedure, reference may be made to existing methods, which are not described again here.
(5) Finally, fix the first binary coding parameter H, the second binary coding parameter G, the first orthogonal projection matrix R_g and the second orthogonal projection matrix R_h, and update the prediction scaling variable σ.
When updating the prediction scaling variable σ, substitute the values of the first binary coding parameter H, the second binary coding parameter G, the first orthogonal projection matrix R_g and the second orthogonal projection matrix R_h obtained in this iteration into Equation 5; Equation 5 then reduces to Equation 10:

[Equation 10: rendered only as an image in the original.]

By introducing the substitution shown in the original (only as an image), Equation 5 can be further simplified, and an approximate solution for σ can be obtained; both of these expressions are rendered only as images in the original.
Through steps (1)-(5) above, the parameters of the objective function are updated once per iteration.
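The overall alternating scheme, which fixes all parameters but one, updates that one, and repeats for a preset number of iterations, can be sketched abstractly (the parameter names and update rules below are placeholders, not the patent's):

```python
def alternate_optimize(params, update_fns, max_iters):
    """Coordinate-wise iteration: each pass updates one parameter at a time
    while the others stay fixed, and stops after max_iters passes.

    params:     dict of parameter name -> current value.
    update_fns: dict of parameter name -> function(all_params) -> new value.
    """
    for _ in range(max_iters):               # stop at a preset iteration count
        for name, fn in update_fns.items():  # update one parameter per step
            params[name] = fn(params)        # the others stay fixed inside fn
    return params

# Toy illustration: alternately pull two scalars toward each other.
updates = {
    "x": lambda p: 0.5 * (p["x"] + p["y"]),
    "y": lambda p: 0.5 * (p["x"] + p["y"]),
}
result = alternate_optimize({"x": 0.0, "y": 8.0}, updates, max_iters=10)
print(abs(result["x"] - result["y"]) < 1e-3)  # True: the two have converged
```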
It should be noted that the embodiment of the present invention does not limit the order in which the parameters are updated in each iteration. For example, the parameters may be updated in the order: prediction scaling variable σ, second orthogonal projection matrix R_h, first binary coding parameter H, first orthogonal projection matrix R_g, and second binary coding parameter G.
S1042, recording the number of iterations, and stopping the updating of the parameters of the objective function when the number of iterations is greater than or equal to a preset number of iterations, to obtain the iteratively updated objective function.
In this embodiment, the number of iterative updates is recorded after each iteration; when it is greater than or equal to the preset number of iterations, the updating of the parameters of the objective function stops and the iteratively updated objective function is obtained, yielding the updated hash parameters and the cluster centers of the L images. Let

[the matrices W and V used in Equations 11 and 12 below be defined as in the original, where their definitions are rendered only as images.]
For an image feature x of any input image, the hash function binary-codes x; the first binary coded data is obtained, for example, by Equation 11, and the second binary coded data, for example, by Equation 12:

[Equations 11 and 12: rendered only as images in the original.]

where i in Equation 11 denotes the i-th row of the matrix W and w_i denotes the elements of the i-th row of W, and i in Equation 12 denotes the i-th row of the matrix V and v_i denotes the elements of the i-th row of V.
In this embodiment, in each iteration the parameters of the hash function affect the cluster centers obtained by the cluster center function, and the obtained cluster centers are fed back into the optimization of the hash function parameters, so that the parameters of the cluster center function and of the hash function are computed and optimized cooperatively; that is, the cluster center function and the hash function are optimized cooperatively. This improves the robustness of the hash function and the cluster centers, and their adaptability to the image features of images in different modalities, thereby improving the accuracy of image recognition.
In some embodiments, one possible implementation of S106 is:
S1061, calculating the Euclidean distance between the binary coded data of the image to be identified and the binary coded data of each image in the cluster centers of the L images.
In this embodiment, the image features of the image to be recognized represented by the first binary coded data and those represented by the second binary coded data can both be obtained from the hash function. The embodiment of the present invention does not limit which of these representations is compared with the binary coded data of each image in the cluster centers of the L images. For example, the Euclidean distance between the image features represented by the first binary coded data and the binary coded data of each image in the cluster centers may be calculated; or the Euclidean distance between the image features represented by the second binary coded data and the binary coded data of each image in the cluster centers; or both of these Euclidean distances may be calculated separately.
The binary coded data may be, for example, 1 and 0.
S1062, identifying the image to be identified according to the Euclidean distance.
In this embodiment, the image to be recognized is recognized based on the Euclidean distance obtained in S1061.
In the embodiment, the image to be recognized is recognized by comparing the image features represented by the binary coded data of the image to be recognized with the image features represented by the binary coded data of the image at the clustering center, and the speed of image recognition is improved because only 1 and 0 need to be compared.
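S1061-S1062 can be sketched as a nearest-center search in Euclidean distance (using {−1, +1} codes for illustration; the text notes the coded values may equally be 1 and 0):

```python
import numpy as np

def recognize(code, centers):
    """Return the index of the cluster center nearest to `code` in Euclidean
    distance; `code` is a b-dim binary code, `centers` a (K, b) array."""
    dists = np.linalg.norm(centers - code, axis=1)  # S1061: Euclidean distances
    return int(dists.argmin())                      # S1062: nearest center wins

centers = np.array([[ 1,  1,  1,  1],
                    [-1, -1,  1,  1],
                    [-1, -1, -1, -1]], dtype=float)
query = np.array([-1, -1, 1, 1], dtype=float)       # code of the query image
print(recognize(query, centers))  # 1
```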
In some embodiments, the method according to the embodiments of the present invention may further include:
s201, dividing the image into at least two local blocks.
S202, acquiring the image characteristics of each local block.
Alternatively, the image feature of each local block may be obtained by using a Histogram of Oriented Gradient (HOG) algorithm.
S203, combining the image characteristics of the at least two local blocks to obtain the image characteristics of the image.
In this embodiment, when acquiring the image features, the image is divided into at least two local blocks, and the image features of each local block are acquired by the HOG algorithm. For example, cells of 8 × 8 pixels are formed, every 2 × 2 cells form a block, and the stride is 8 pixels. Each cell generates 9 features, so one block generates 36 features, i.e., a 36-dimensional feature vector. Finally, the image features of the local blocks are concatenated in a preset order to obtain the image features of the whole image. For example, if the image is divided into 4 local blocks, each a 36-dimensional feature vector, the feature vector of the whole image is composed of the 4 36-dimensional feature vectors.
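A simplified sketch of S201-S203: split the image into local blocks, compute a gradient-orientation histogram per block, and concatenate the block features in a fixed order. This is a toy stand-in for full HOG; it computes one 9-bin histogram per block instead of grouping 2 × 2 cells, so here 4 blocks happen to yield a 36-dimensional feature as well:

```python
import numpy as np

def block_features(img, blocks_per_side=2, bins=9):
    """Split `img` into blocks_per_side^2 local blocks, compute a gradient
    orientation histogram per block (magnitude-weighted), and concatenate the
    block features in a fixed order. A toy stand-in for HOG: no cell grouping,
    no block overlap."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    h = img.shape[0] // blocks_per_side
    w = img.shape[1] // blocks_per_side
    feats = []
    for by in range(blocks_per_side):
        for bx in range(blocks_per_side):
            sl = (slice(by * h, (by + 1) * h), slice(bx * w, (bx + 1) * w))
            hist, _ = np.histogram(ang[sl], bins=bins, range=(0, np.pi),
                                   weights=mag[sl])
            feats.append(hist / (np.linalg.norm(hist) + 1e-12))
    return np.concatenate(feats)              # S203: combined image feature

rng = np.random.default_rng(6)
img = rng.integers(0, 256, size=(16, 16))
f = block_features(img)
print(f.shape)  # (36,)
```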
An apparatus for carrying out the above method is described below.
Fig. 2 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present invention. As shown in fig. 2, the image recognition apparatus provided in this embodiment may include: a selection module 21, an association module 22, an initialization module 23, an update module 24, an encoding module 25 and an identification module 26.
Optionally, the initialization module 23 includes: a first acquisition module 231 and a sub-initialization module 232.
Optionally, the update module 24 includes: a sub-update module 241 and a statistics module 242.
Optionally, the identification module 26 includes: a calculation module 261 and a sub-identification module 262.
Optionally, on the basis of the embodiment shown in fig. 2, the image recognition apparatus may further include: an extraction module 27.
The selection module 21 is configured to select a first image group and a second image group from the L images, where the first image group includes N images, the second image group includes M images, and the images of the first image group are not identical to the images of the second image group; wherein, L is more than or equal to 2, N is more than or equal to 1 and less than or equal to L, and M is more than or equal to 1 and less than or equal to L.
And the association module 22 is configured to calculate an image association between each image in the first image group and each image in the second image group according to the image characteristics of each image in the first image group and the second image group.
The initialization module 23 is configured to initialize a parameter of an objective function according to an image association between each image in the first image group and each image in the second image group, where the objective function includes a hash function and a cluster center function, the hash function is used to perform binary encoding on image features, the cluster center function is used to obtain a cluster center with image feature consistency in L images, and the cluster center includes at least one image; the parameters of the hash function comprise a first binary coding parameter, a second binary coding parameter and a prediction scaling variable, and the parameters of the clustering center function comprise a first orthogonal projection matrix and a second orthogonal projection matrix.
And the updating module 24 is configured to perform iterative updating on the parameters of the objective function to obtain an iteratively updated objective function, and determine a clustering center of the L images, where the number of iterative updating is at least one.
And the encoding module 25 is configured to perform binary encoding on the image features of the image to be recognized according to the hash function in the updated target function, so as to obtain binary encoded data.
And the identification module 26 is used for identifying the image to be identified according to the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images.
Optionally, the first obtaining module 231 is configured to obtain an image relevance matrix according to an image relevance between each image in the first image group and each image in the second image group.
And a sub-initialization module 232, configured to initialize parameters of the objective function according to the image correlation matrix.
The image correlation matrix is an N x M image correlation matrix, and the elements of the ith row and the jth column in the image correlation matrix represent the image correlation between the ith image in the first image group and the jth image in the second image group; alternatively,
the image correlation matrix is an M x N image correlation matrix, and the elements of the jth row and the ith column in the image correlation matrix represent the image correlation between the jth image in the second image group and the ith image in the first image group.
Wherein i is greater than or equal to 1 and less than or equal to N, and j is greater than or equal to 1 and less than or equal to M.
Optionally, the sub-initialization module 232 is specifically configured to:
and acquiring a transposed matrix of the image correlation matrix.
And respectively acquiring the covariance matrix of the image correlation matrix and the covariance matrix of the transpose matrix of the image correlation matrix.
And acquiring a partial eigenvector of the covariance matrix of the image correlation matrix according to the projection matrix of the image correlation matrix, and acquiring a partial eigenvector of the covariance matrix of the transposed matrix according to the projection matrix of the transposed matrix.
Initializing a first binary coding parameter according to the first orthogonal projection matrix and a partial eigenvector of a covariance matrix of an image correlation matrix, and initializing a second binary coding parameter according to the second orthogonal projection matrix and a partial eigenvector of a covariance matrix of a transposed matrix.
Optionally, the sub-updating module 241 is configured to determine one parameter from the parameters of the objective function as a parameter to be updated, fix other parameters, and update the parameter to be updated so that the objective function meets a preset condition.
And the counting module 242 is configured to record the iteration number, and when the iteration number is greater than or equal to a preset iteration number, stop updating the parameter of the objective function to obtain the iteratively updated objective function.
Optionally, the calculating module 261 is configured to calculate a euclidean distance between the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images;
and the sub-identification module 262 is used for identifying the image to be identified according to the Euclidean distance.
Optionally, the extracting module 27 is configured to divide the image into at least two local blocks; acquiring the image characteristics of each local block; and combining the image characteristics of the at least two local blocks to obtain the image characteristics of the image.
Optionally, when the extracting module 27 obtains the image feature of each local block, it is specifically configured to:
and acquiring the image characteristics of each local block by using a directional gradient histogram algorithm.
The image recognition apparatus described above in this embodiment may be configured to execute the technical solutions in the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 3 is a schematic structural diagram of an image recognition apparatus according to another embodiment of the present invention. As shown in fig. 3, the image recognition apparatus may be a network device or a chip of a network device, and the apparatus may include: at least one processor 31 and a memory 32. Fig. 3 shows an image recognition apparatus taking one processor as an example, in which:
and a memory 32 for storing programs. In particular, the program may include program code comprising computer operating instructions. The Memory 32 may include a Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory.
The processor 31 is configured to execute the computer-executable instructions stored in the memory 32 to implement the image recognition method in the foregoing embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The processor 31 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement the embodiments of the present Application.
Alternatively, in a specific implementation, if the communication interface, the memory 32 and the processor 31 are implemented independently, the communication interface, the memory 32 and the processor 31 may be connected to each other through a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on, but this does not mean that there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the communication interface, the memory 32 and the processor 31 are integrated on a chip, the communication interface, the memory 32 and the processor 31 may complete the same communication through an internal interface.
The image recognition apparatus described above in this embodiment may be configured to execute the technical solutions in the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An image recognition method, comprising:
selecting a first image group and a second image group from L images, wherein the first image group comprises N images, the second image group comprises M images, and the images of the first image group are not completely the same as the images of the second image group; wherein L is more than or equal to 2, N is more than or equal to 1 and less than or equal to L, and M is more than or equal to 1 and less than or equal to L;
calculating the image relevance of each image in the first image group and each image in the second image group according to the image characteristics of each image in the first image group and the second image group;
initializing parameters of an objective function according to the image relevance of each image in the first image group and each image in the second image group, wherein the objective function comprises a hash function and a cluster center function, the hash function is used for carrying out binary coding on image features, the cluster center function is used for obtaining a cluster center with image feature consistency in the L images, and the cluster center comprises at least one image; the parameters of the hash function comprise a first binary coding parameter, a second binary coding parameter and a prediction scaling variable, and the parameters of the clustering center function comprise a first orthogonal projection matrix and a second orthogonal projection matrix;
according to the target of the target function, carrying out iterative updating on the parameters of the target function to obtain an iteratively updated target function, and determining the clustering center of the L images, wherein the iterative updating times are at least one time, and 5 variables of the first binary coding parameter, the second binary coding parameter, the prediction scaling variable, the first orthogonal projection matrix and the second orthogonal projection matrix are updated once during each iterative updating;
performing binary coding on the image characteristics of the image to be identified according to the hash function in the updated target function to obtain binary coded data;
and identifying the image to be identified according to the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images.
2. The method according to claim 1, wherein initializing parameters of an objective function according to the image correlation of each image in the first image group and each image in the second image group comprises:
obtaining an image relevance matrix according to the image relevance of each image in the first image group and each image in the second image group;
initializing parameters of an objective function according to the image correlation matrix;
the image correlation matrix is an N x M image correlation matrix, and the elements of the ith row and the jth column in the image correlation matrix represent the image correlation between the ith image in the first image group and the jth image in the second image group; alternatively,
the image correlation matrix is an M x N image correlation matrix, and the elements of the jth row and the ith column in the image correlation matrix represent the image correlation between the jth image in the second image group and the ith image in the first image group;
wherein i is greater than or equal to 1 and less than or equal to N, and j is greater than or equal to 1 and less than or equal to M.
3. The method of claim 2, wherein initializing parameters of an objective function according to the image correlation matrix comprises:
acquiring a transpose matrix of the image correlation matrix;
respectively acquiring a covariance matrix of the image correlation matrix and a covariance matrix of a transpose matrix of the image correlation matrix;
acquiring a partial eigenvector of a covariance matrix of the image correlation matrix according to the projection matrix of the image correlation matrix, and acquiring a partial eigenvector of a covariance matrix of the transpose matrix according to the projection matrix of the transpose matrix;
initializing the first binary encoding parameter according to the first orthogonal projection matrix and a partial eigenvector of a covariance matrix of the image correlation matrix, and initializing the second binary encoding parameter according to the second orthogonal projection matrix and a partial eigenvector of a covariance matrix of the transpose matrix.
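A sketch of the initialization in claim 3. Two points are assumptions, since the claim leaves them open: "partial eigenvectors" is read as the eigenvectors of the k largest eigenvalues (PCA-style), and the two orthogonal projection matrices are drawn as random k x k orthogonal matrices:

```python
import numpy as np

def top_k_eigvecs(cov, k):
    """Eigenvectors of the k largest eigenvalues (the 'partial eigenvectors')."""
    vals, vecs = np.linalg.eigh(cov)               # eigh returns ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:k]]

def init_binary_coding_params(S, k, rng=None):
    """Initialize the two binary coding parameters per claim 3.

    S: N x M image correlation matrix.
    Assumed details: 'partial eigenvectors' = top-k eigenvectors of each
    covariance matrix; the orthogonal projection matrices are random
    orthogonal (obtained via QR decomposition).
    """
    rng = np.random.default_rng(rng)
    cov_s = np.cov(S, rowvar=True)                 # covariance of S    (N x N)
    cov_st = np.cov(S.T, rowvar=True)              # covariance of S^T  (M x M)
    E1 = top_k_eigvecs(cov_s, k)                   # N x k partial eigenvectors
    E2 = top_k_eigvecs(cov_st, k)                  # M x k partial eigenvectors
    R1, _ = np.linalg.qr(rng.normal(size=(k, k)))  # first orthogonal projection matrix
    R2, _ = np.linalg.qr(rng.normal(size=(k, k)))  # second orthogonal projection matrix
    W1 = E1 @ R1                                   # first binary coding parameter
    W2 = E2 @ R2                                   # second binary coding parameter
    return W1, W2
```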
4. The method of claim 1, wherein iteratively updating the parameters of the objective function to obtain an iteratively updated objective function comprises:
during each iterative update, determining one parameter from among the parameters of the objective function as the parameter to be updated, fixing the other parameters, and updating the parameter to be updated so that the objective function meets a preset condition;
and recording the iteration count, and stopping the updating of the parameters of the objective function when the iteration count is greater than or equal to a preset number of iterations, to obtain the iteratively updated objective function.
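The update scheme of claim 4 is block coordinate descent: each parameter is updated in turn with the others held fixed, and the loop stops after a preset number of iterations. A generic sketch (the per-parameter update rules themselves come from the objective function, which this excerpt does not fully specify):

```python
def alternating_update(params, update_fns, max_iters):
    """Block-coordinate update per claim 4: at each iteration every parameter
    is updated in turn while the others are held fixed; updating stops once
    the preset iteration count is reached.

    params:     dict of parameter name -> current value
    update_fns: dict of parameter name -> fn(params) returning that
                parameter's new value given all (fixed) current values
    """
    for _ in range(max_iters):
        for name, fn in update_fns.items():
            params[name] = fn(params)  # the other parameters stay fixed
    return params
```

For example, coordinate-wise minimization of f(x, y) = (x - 3)^2 + (y - x)^2 converges to x = y = 3 with the closed-form updates x = (3 + y) / 2 and y = x.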
5. The method according to claim 1, wherein the identifying the image to be identified according to the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images comprises:
calculating the Euclidean distance between the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images;
and identifying the image to be identified according to the Euclidean distance.
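A minimal sketch of the identification step in claim 5: the query's binary coded data is compared against the binary code of each cluster-center image by Euclidean distance, and the nearest center identifies the image. Note that for ±1 (or 0/1) codes the squared Euclidean distance is a monotone function of the Hamming distance, so the two produce the same ranking:

```python
import numpy as np

def identify(query_code, center_codes):
    """Claim 5: Euclidean distance between the query's binary code and the
    binary code of each cluster-center image; returns the index of the
    nearest center together with all distances.
    """
    dists = np.linalg.norm(center_codes - query_code, axis=1)
    return int(np.argmin(dists)), dists
```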
6. The method according to any one of claims 1-5, further comprising:
dividing the image into at least two local blocks;
acquiring the image characteristics of each local block;
and combining the image characteristics of the at least two local blocks to obtain the image characteristics of the image.
7. The method of claim 6, wherein the obtaining of the image features of each local block comprises:
and acquiring the image characteristics of each local block by using a direction gradient histogram algorithm.
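Claims 6 and 7 together describe splitting the image into local blocks, computing a histogram-of-oriented-gradients feature per block, and concatenating the block features. The sketch below uses a simplified numpy-only HOG (one magnitude-weighted orientation histogram per block, without the cell/block normalization of the full descriptor), which is an assumption for illustration:

```python
import numpy as np

def block_hog_features(img, blocks=(2, 2), bins=9):
    """Claims 6-7: split a grayscale image into local blocks, compute a
    histogram-of-oriented-gradients feature for each block, and concatenate.
    Simplified HOG: one orientation histogram per block, no cell/block
    normalization as in the full descriptor.
    """
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation in [0, pi)
    h, w = img.shape
    bh, bw = h // blocks[0], w // blocks[1]
    feats = []
    for r in range(blocks[0]):
        for c in range(blocks[1]):
            sl = (slice(r * bh, (r + 1) * bh), slice(c * bw, (c + 1) * bw))
            hist, _ = np.histogram(ang[sl], bins=bins, range=(0, np.pi),
                                   weights=mag[sl])
            hist = hist / (np.linalg.norm(hist) + 1e-12)  # per-block normalization
            feats.append(hist)
    return np.concatenate(feats)  # claim 6: combined features of all local blocks
```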
8. An image recognition apparatus, comprising:
the image processing device comprises a selecting module, a processing module and a processing module, wherein the selecting module is used for selecting a first image group and a second image group from L images, the first image group comprises N images, the second image group comprises M images, and the images of the first image group are not identical to the images of the second image group; wherein L is more than or equal to 2, N is more than or equal to 1 and less than or equal to L, and M is more than or equal to 1 and less than or equal to L;
the association module is used for calculating the image association of each image in the first image group and each image in the second image group according to the image characteristics of each image in the first image group and the second image group;
the initialization module is used for initializing parameters of an objective function according to the image relevance of each image in the first image group and each image in the second image group, wherein the objective function comprises a hash function and a cluster center function, the hash function is used for carrying out binary coding on image features, the cluster center function is used for acquiring cluster centers with image feature consistency in the L images, and the cluster centers comprise at least one image; the parameters of the hash function comprise a first binary coding parameter, a second binary coding parameter and a prediction scaling variable, and the parameters of the clustering center function comprise a first orthogonal projection matrix and a second orthogonal projection matrix;
the updating module is used for iteratively updating the parameters of the objective function according to the objective of the objective function to obtain an iteratively updated objective function, and determining the clustering centers of the L images, wherein the iterative updating is performed at least once, and the five variables, namely the first binary coding parameter, the second binary coding parameter, the predictive scaling variable, the first orthogonal projection matrix and the second orthogonal projection matrix, are each updated once during each iterative update;
the encoding module is used for binary-encoding the image features of the image to be identified according to the hash function in the updated objective function to obtain binary coded data;
and the identification module is used for identifying the image to be identified according to the binary coded data of the image to be identified and the binary coded data of each image in the clustering centers of the L images.
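The encoding module applies the learned hash function to the query's image features. The exact hash function (with its two binary coding parameters and predictive scaling variable) is defined in the full specification, not in this claims excerpt; the common linear-projection-plus-sign form below is assumed purely as an illustration:

```python
import numpy as np

def hash_encode(x, W, scale=1.0):
    """Binary-encode an image feature vector with a learned hash function.
    sign(scale * W^T x) is an assumed common construction, not necessarily
    the patent's exact hash function.
    """
    return np.where(scale * (W.T @ x) >= 0, 1, -1)
```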
9. An image recognition apparatus, comprising: a memory for storing program instructions and a processor for calling the program instructions in the memory to perform the image recognition method of any one of claims 1 to 7.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program; the computer program, when executed, implements an image recognition method as claimed in any one of claims 1-7.
CN201910286523.XA 2019-04-10 2019-04-10 Image recognition method and device Active CN110399897B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910286523.XA CN110399897B (en) 2019-04-10 2019-04-10 Image recognition method and device

Publications (2)

Publication Number Publication Date
CN110399897A CN110399897A (en) 2019-11-01
CN110399897B true CN110399897B (en) 2021-11-02

Family

ID=68322286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910286523.XA Active CN110399897B (en) 2019-04-10 2019-04-10 Image recognition method and device

Country Status (1)

Country Link
CN (1) CN110399897B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069457A (en) * 2015-07-15 2015-11-18 杭州朗和科技有限公司 Image identification method and device
CN105930834A (en) * 2016-07-01 2016-09-07 北京邮电大学 Face identification method and apparatus based on spherical hashing binary coding
CN107341178A (en) * 2017-05-24 2017-11-10 北京航空航天大学 An adaptive binary quantization hash coding method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018532198A (en) * 2015-10-12 2018-11-01 コミサリヤ・ア・レネルジ・アトミク・エ・オ・エネルジ・アルテルナテイブ Method and device for detecting a copy in a stream of visual data
US10706327B2 (en) * 2016-08-03 2020-07-07 Canon Kabushiki Kaisha Information processing apparatus, information processing method, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Sparse projections matrix binary descriptors for face recognition; Chunxiao Fan et al.; Neurocomputing; 2018-02-07; 8-21 *
A fast massive-image retrieval method fusing gravity information; Zhang Yunchao et al.; Acta Automatica Sinica; 2016-10-31; 1501-1511 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant