CN110895700A - Image recognition method and system - Google Patents

Image recognition method and system

Info

Publication number
CN110895700A
Authority
CN
China
Prior art keywords
image
recognition method
target model
image recognition
prediction function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811062833.5A
Other languages
Chinese (zh)
Inventor
祖辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811062833.5A
Publication of CN110895700A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Abstract

An embodiment of the invention provides an image recognition method and system, relating to the field of computer technology. The method comprises the following steps: extracting image features of a plurality of modalities from image training samples; constructing a plurality of hypergraphs according to the image features of the plurality of modalities; performing joint learning on the plurality of hypergraphs; and performing image classification according to the locally optimal solution obtained by the joint learning. According to the image recognition method provided by the embodiment of the invention, the plurality of hypergraphs are learned jointly during hypergraph construction, and image classification is performed according to the locally optimal solution obtained by the joint learning. Similarity between samples is no longer measured with the long feature vector obtained by concatenating all the modal feature vectors of a sample, which ensures the similarity of the sample nodes within the same hyperedge and preserves, to the greatest extent, the complementary and correlated information among the image features of the different modalities.

Description

Image recognition method and system
Technical Field
The invention relates to the field of computer technology, and in particular to an image recognition method and system.
Background
In everyday life, humans perceive the world through multiple senses such as vision, hearing, touch, and smell: objects are seen, sounds are heard, textures are felt, and odors are smelled. The distinct kinds of features that describe an object are called modal features. In computer vision and multimedia analysis, multiple modal features are often used to characterize the same object from different perspectives. For example, to characterize an image of a natural scene well, a set of visual features describing its color, texture, or shape is typically extracted. How to fully exploit the complementary and correlated information of these different modal features, so as to further improve image recognition performance, has become a hot and difficult problem in computer vision. Hypergraphs are widely used in image recognition because they can capture complex relationships among many different objects.
In the prior art, when a hypergraph is constructed, the modal features of each sample are concatenated into a single long feature vector, hyperedges are built from this long feature vector, and a hypergraph with n nodes is finally generated. This way of constructing the hypergraph ignores the differences between modal features, cannot guarantee the similarity of the sample nodes within the same hyperedge, and does not fully exploit the complementary and correlated information among the different modal features.
Disclosure of Invention
In view of this, embodiments of the present invention provide an image recognition method and system, which complete image recognition based on joint hypergraph learning.
According to an aspect of the present invention, there is provided an image recognition method including: extracting image features of a plurality of modalities from an image training sample; constructing a plurality of hypergraphs according to the image features of the plurality of modalities; performing joint learning on the plurality of hypergraphs; and carrying out image classification according to the local optimal solution obtained by the joint learning.
Preferably, the plurality of hypergraphs each include a respective set of vertices and a respective set of hyperedges.
Preferably, the step of jointly learning the plurality of hypergraphs comprises: constructing a target model, wherein the target model comprises hyperedge weights and a prediction function; performing alternating iterative optimization of the hyperedge weights and the prediction function in the target model; and obtaining a locally optimal solution of the prediction function when the alternating iterations converge.
Preferably, the alternating iterative optimization comprises: fixing the hyperedge weights and optimizing the prediction function, wherein the terms of the target model that do not depend on the prediction function variable are removed to obtain an analytic solution of the prediction function; and fixing the prediction function and optimizing the hyperedge weights, wherein the terms of the target model that do not depend on the hyperedge weight variable are removed to obtain an analytic solution of the hyperedge weights.
Preferably, the image recognition method further includes: collecting an image test sample and acquiring a category label vector corresponding to the image test sample; and verifying the target model by using the image test sample, and adjusting and optimizing parameters of the target model according to a verification result.
Preferably, the verifying the target model by using the image test sample and adjusting and optimizing the parameters of the target model according to the verification result includes: applying the image test sample to the prediction function to obtain a prediction label vector corresponding to the image test sample; and comparing the prediction label vector with the category label vector corresponding to the image test sample.
Preferably, the verifying the target model by using the image test sample and adjusting and optimizing the parameters of the target model according to the verification result further includes: according to the comparison result, if the alternating iterative optimization process has not reached a locally optimal solution of the prediction function, performing further alternating iterative optimization of the hyperedge weights and the prediction function in the target model; and, according to the comparison result, if the alternating iterative optimization process has converged to a locally optimal solution of the prediction function, obtaining the prediction function of the target model.
Preferably, the classifying the image according to the locally optimal solution obtained by the joint learning includes: acquiring an unknown image; and applying the unknown image to the prediction function to obtain a recognition result of the unknown image.
Preferably, the hyperedge sets are constructed using the k-nearest neighbor method respectively according to the image features of the plurality of modalities.
Preferably, the image features of the plurality of modalities include at least one of: color moment feature vectors, local two-dimensional histogram feature vectors, and histogram of oriented gradients feature vectors.
Preferably, the specific steps of extracting the color moment feature vector include: dividing each image of the image training samples into a plurality of non-overlapping grid cells; calculating, for each grid cell, the feature vectors of the color mean, the color variance and the color skewness of the image over a plurality of channels; and concatenating the color mean, color variance and color skewness feature vectors computed in each grid cell to form the color moment feature vector.
Preferably, the specific steps of extracting the local two-dimensional histogram feature vector include: dividing each image of the image training samples into a plurality of non-overlapping grid cells; and obtaining the local two-dimensional histogram feature vector by comparing the image pixels of the central cell with the image pixels of the surrounding cells of each image.
Preferably, the local two-dimensional histogram feature vector has good illumination invariance.
Preferably, the specific steps of extracting the histogram of oriented gradients feature vector include: dividing each image of the image training samples into a plurality of blocks; calculating the histogram of oriented gradients feature vector of each block; and normalizing the per-block feature vectors block by block to obtain the histogram of oriented gradients feature vector.
According to another aspect of the present invention, there is provided an image recognition system including: a modal feature extraction module for extracting image features of a plurality of modalities from an image training sample; a target model building module for constructing a plurality of hypergraphs from the image features of the plurality of modalities; a joint learning module for performing joint learning on the plurality of hypergraphs; and a prediction module for performing image classification according to the locally optimal solution obtained by the joint learning.
Preferably, the image recognition system further comprises: an image acquisition module for acquiring an image training sample and obtaining the class label vector corresponding to the image training sample, and for acquiring an image test sample and obtaining the category label vector corresponding to the image test sample; and a test module for verifying the target model by using the image test sample and adjusting and optimizing parameters of the target model according to the verification result.
According to yet another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions which, when executed, implement the image recognition method as described above.
According to still another aspect of the present invention, there is provided a control apparatus for image recognition, comprising: a memory for storing computer instructions; and a processor coupled to the memory, the processor being configured to perform the image recognition method described above based on the computer instructions stored in the memory.
One embodiment of the present invention has the following advantages or beneficial effects: the plurality of hypergraphs are learned jointly, and image classification is performed according to the locally optimal solution obtained by the joint learning. Similarity between samples is no longer measured with the long feature vector obtained by concatenating all the modal feature vectors of a sample, which ensures the similarity of the sample nodes within the same hyperedge and preserves, to the greatest extent, the complementary and correlated information among the image features of the different modalities.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing embodiments of the present invention with reference to the following drawings, in which:
fig. 1 shows a flow chart of an image recognition method according to an embodiment of the present invention.
Fig. 2 shows a flow chart of an image recognition method according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an image recognition system according to an embodiment of the present invention.
Fig. 4 is a block diagram of a control apparatus for image recognition according to an embodiment of the present invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to these examples. In the following detailed description of the present invention, certain specific details are set forth; it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures and processes have not been described in detail so as not to obscure the present invention. The figures are not necessarily drawn to scale.
In image recognition, a hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{w})$ consists of a vertex set $\mathcal{V} = \{v_1, v_2, \dots, v_n\}$, a hyperedge set $\mathcal{E} = \{e_1, e_2, \dots, e_d\}$, and a hyperedge weight vector $\mathbf{w}$. The weight of a hyperedge $e \in \mathcal{E}$ is denoted $w(e)$ and is set to 1 by default. The hypergraph $\mathcal{G}$ can be represented by an incidence matrix $H$ of size $|\mathcal{V}| \times |\mathcal{E}|$:

$$H(v, e) = \begin{cases} 1, & v \in e \\ 0, & v \notin e \end{cases} \quad (1)$$

Based on the incidence matrix $H$, the degree of a node $v \in \mathcal{V}$ is defined as

$$d(v) = \sum_{e \in \mathcal{E}} w(e) H(v, e), \quad (2)$$

and, similarly, the degree of a hyperedge $e \in \mathcal{E}$ is defined as

$$\delta(e) = \sum_{v \in \mathcal{V}} H(v, e). \quad (3)$$

$D_v$ and $D_e$ denote the diagonal matrices whose diagonal elements are the node degrees and the hyperedge degrees, respectively. $D_w$ denotes the hyperedge weight matrix of size $|\mathcal{E}| \times |\mathcal{E}|$, whose diagonal elements are the hyperedge weights $w(e)$.
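As a purely illustrative aside (not part of the patent text; NumPy is assumed as the numerical library), the incidence matrix and the degree matrices defined above can be computed as follows:

```python
import numpy as np

def hypergraph_degrees(H, w):
    """Given the |V| x |E| incidence matrix H and the hyperedge weight vector w,
    return the node degrees d(v), the hyperedge degrees delta(e), and the
    diagonal matrices Dv, De and Dw."""
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = H @ w                      # formula (2): d(v) = sum_e w(e) H(v, e)
    delta_e = H.sum(axis=0)          # formula (3): delta(e) = sum_v H(v, e)
    return d_v, delta_e, np.diag(d_v), np.diag(delta_e), np.diag(w)

# Toy hypergraph: 4 nodes, 2 hyperedges, default weight 1 for every hyperedge.
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 1]])
d_v, delta_e, Dv, De, Dw = hypergraph_degrees(H, np.ones(2))
```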
Assume that there are n samples, each of which has features from m modalities. For each sample, the feature vectors of the m modalities are concatenated in series into one long feature vector. Hyperedges are constructed from these long feature vectors, and a hypergraph $\mathcal{G}$ with n nodes is finally generated. The objective function of hypergraph learning is defined as

$$\arg\min_{f} \; \Omega_f + \lambda R_{emp}(f), \quad (4)$$

where $\Omega_f$ is the hypergraph Laplacian regularization term and $R_{emp}(f)$ is the empirical loss term, for which either a mean square error loss or a hinge loss can be used. $\lambda > 0$ is a regularization parameter that balances the relative weights of the two terms. The hypergraph Laplacian regularization term $\Omega_f$ is defined as

$$\Omega_f = \frac{1}{2} \sum_{e \in \mathcal{E}} \sum_{u, v \in \mathcal{V}} \frac{w(e) H(u, e) H(v, e)}{\delta(e)} \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^2. \quad (5)$$

Intuitively, formula (5) takes a small value when all nodes within the same hyperedge have similar class labels. Letting $\Lambda = D_v^{-1/2} H D_w D_e^{-1} H^T D_v^{-1/2}$ and $L = I - \Lambda$, the hypergraph regularization term can be written in the compact form

$$\Omega_f = f^T L f, \quad (6)$$

where $L$ is a positive semi-definite matrix called the hypergraph Laplacian and $f = [f_1, f_2, \dots, f_n]^T$ is the prediction function, defined on $(-1, 1)$. When the mean square error loss $R_{emp}(f) = \lVert f - y \rVert^2$ is used as the empirical loss term, the objective function has the closed-form solution

$$f = \left( I + \frac{1}{\lambda} L \right)^{-1} y. \quad (7)$$

In the above, the weights of the hyperedges are all assumed to be identical and equal to 1. Finally, the image is recognized via the prediction function $f$.
In summary, the inventors found that the image recognition method based on hypergraph learning described above has the following disadvantages:
in the process of constructing the hypergraph, the image identification method splices the modal features of each sample into a long feature vector. And constructing a superedge according to the long feature vector, and finally generating a hypergraph with n nodes. The method for constructing the hypergraph ignores the difference of different modal characteristics, cannot ensure the similarity of sample nodes in the same hyperedge, and does not fully utilize complementary information and related information among the different modal characteristics.
In the process of constructing the hypergraph, this image recognition method concatenates the modal features of each sample into one long feature vector. If the feature dimension of the $m$-th modal feature of a sample is $p_m$, the long feature vector obtained by concatenating all the modal features of a sample has dimension $\sum_{m} p_m$. Such high-dimensional feature vectors easily lead to the curse of dimensionality: the pairwise similarity distances between samples are difficult to measure accurately in a high-dimensional feature space, so the hypergraph cannot be constructed accurately.
In the process of constructing the hypergraph, this image recognition method sets the weight of every hyperedge to 1 by default. However, different hyperedges contain samples of different categories and are therefore of different importance: hyperedges containing more samples of the same class should be given larger weights, while hyperedges containing more samples of different classes should be given smaller weights. Using a single, uniform hyperedge weight reduces the discriminative ability of the hypergraph-learning-based image recognition method.
Fig. 1 is a schematic flow chart of an image recognition method according to an embodiment of the present invention, which specifically includes the following steps.
In step S101, image features of a plurality of modalities are extracted from an image training sample.
In step S102, a plurality of hypergraphs is constructed from image features of the plurality of modalities.
In step S103, joint learning is performed on the plurality of hypergraphs.
In step S104, image classification is performed based on the locally optimal solution obtained by the joint learning.
In one embodiment of the invention, image features of a plurality of modalities are first extracted from an image training sample. Then, a plurality of hypergraphs is constructed from the image features of the plurality of modalities. Next, joint learning is performed on the plurality of hypergraphs. Finally, image classification is performed according to the locally optimal solution obtained by the joint learning.
According to this embodiment of the invention, the plurality of hypergraphs are learned jointly during hypergraph construction, and image classification is performed according to the locally optimal solution obtained by the joint learning. Similarity between samples is no longer measured with the long feature vector obtained by concatenating all the modal feature vectors of a sample, which ensures the similarity of the sample nodes within the same hyperedge and preserves, to the greatest extent, the complementary and correlated information among the image features of the different modalities.
Fig. 2 is a schematic flow chart of an image recognition method according to an embodiment of the present invention, which specifically includes the following steps:
in step S201, an image training sample is acquired and a class label vector corresponding to the image training sample is acquired.
In step S202, image features of a plurality of modalities are extracted from an image training sample.
In step S203, a plurality of hypergraphs is constructed from the image features of the plurality of modalities.
In step S204, a target model is constructed, alternating iterative optimization is performed on the hyperedge weights and the prediction function in the target model, and a locally optimal solution of the prediction function is obtained when the alternating iterative optimization converges.
In step S205, an image test sample is acquired and a category label vector corresponding to the image test sample is acquired.
In step S206, the target model is verified by using the image test sample, and parameters of the target model are adjusted and optimized according to a verification result.
In one embodiment of the invention, an image training sample is acquired and the class label vector corresponding to the image training sample is obtained. Image features of a plurality of modalities are extracted from the image training sample, and a plurality of hypergraphs is constructed from these image features; the plurality of hypergraphs respectively include their own sets of vertices and hyperedges. A target model is constructed, alternating iterative optimization is performed on the hyperedge weights and the prediction function in the target model, and a locally optimal solution of the prediction function is obtained when the alternating iteration converges. An image test sample is acquired and the category label vector corresponding to the image test sample is obtained. The target model is verified by using the image test sample, and the parameters of the target model are adjusted and optimized according to the verification result.
According to this embodiment of the invention, during hypergraph construction a plurality of hypergraphs are built from the image features of the plurality of modalities, each hypergraph with its own sets of vertices and hyperedges. The hyperedges are constructed without concatenating the modal features of each sample into a long feature vector, which reduces the feature dimensionality involved and improves the accuracy of the constructed hypergraphs.
In one embodiment of the present invention, the objective of the target model is

$$\arg\min_{f,\,W} \; \Omega_f(W) + \lambda \lVert f - y \rVert^2 + \gamma \lVert W \rVert^2,$$

where $\Omega_f(W) = f^T L(W) f$ is the hypergraph Laplacian regularization term built jointly from the plurality of hypergraphs, $f = [f_1, f_2, \dots, f_n]^T$ is the prediction function defined on $(-1, 1)$, $y = [y_1, y_2, \dots, y_n]^T \in \mathbb{R}^n$ is the class label vector of the $n$ training samples, $W = [w_1, w_2, \dots, w_m] \in \mathbb{R}^{d \times m}$ is the weight matrix composed of the hyperedge weight vectors of the image features of the different modalities, and $\lambda$ and $\gamma$ are regularization parameters.

Because the variables $W$ and $f$ in the target model are coupled, the hyperedge weights and the prediction function are optimized by alternating iterations. When $W$ is fixed in order to optimize $f$, the terms of the target model that do not depend on $f$ are removed, which gives the optimization problem

$$\min_{f} \; f^T L(W) f + \lambda \lVert f - y \rVert^2.$$

Taking the derivative with respect to $f$ and setting it to zero yields the analytic solution of the prediction function:

$$f = \lambda \left( L(W) + \lambda I \right)^{-1} y.$$

Then $f$ is fixed and $W$ is optimized: the terms of the target model that do not depend on $W$ are removed, which gives the optimization problem

$$\min_{W} \; f^T L(W) f + \gamma \lVert W \rVert^2.$$

Taking the derivative with respect to $W$ and setting it to zero yields the analytic solution of the hyperedge weights.

The values of $f$ and $W$ obtained after each iteration are used to initialize the next iteration, and the iterative optimization is repeated in this way; a locally optimal solution of the prediction function is obtained when the alternating iterations of the prediction function and the hyperedge weights converge.
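As an illustrative aside, the alternating scheme above can be sketched as follows under simplifying assumptions that are not taken from the patent: the joint regularizer is approximated by a weighted sum of per-modality hypergraph Laplacians with one weight per modality, and the weights are constrained to sum to one so that the weight update has a closed form; all names are illustrative.

```python
import numpy as np

def joint_hypergraph_learning(L_list, y, lam=1.0, gamma=1.0, n_iter=20):
    """Alternately optimize the prediction function f and the weight vector w."""
    m = len(L_list)
    n = y.shape[0]
    w = np.full(m, 1.0 / m)                  # start from uniform weights

    f = np.zeros(n)
    for _ in range(n_iter):
        # Fix w, optimize f:  f = lam * (L(w) + lam * I)^{-1} y.
        L = sum(w_k * L_k for w_k, L_k in zip(w, L_list))
        f = np.linalg.solve(L + lam * np.eye(n), lam * y)

        # Fix f, optimize w: minimize sum_k w_k * (f^T L_k f) + gamma * ||w||^2
        # subject to sum(w) = 1 (an assumed constraint); the Lagrangian gives
        # the closed-form update below.
        r = np.array([f @ L_k @ f for L_k in L_list])
        w = (2.0 * gamma + r.sum()) / (2.0 * gamma * m) - r / (2.0 * gamma)
        w = np.clip(w, 0.0, None)
        w = w / w.sum()                      # renormalize after clipping
    return f, w
```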
According to this embodiment of the invention, during hypergraph construction the hyperedge weights and the prediction function in the target model are optimized by alternating iterations, and a locally optimal solution of the prediction function is obtained when the alternating iterations converge. The resulting hyperedge weights accurately reflect the importance of each hyperedge, which improves the discriminative ability of the image recognition method.
In one embodiment, the image features of the plurality of modalities extracted from the image training samples include at least one of: color moment feature vectors, local two-dimensional histogram feature vectors, and histogram of oriented gradients feature vectors.
The specific steps of extracting the color moment feature vector are as follows: each image of the image training samples is divided into a non-overlapping 3 × 3 grid of cells; for each grid cell, the feature vectors of the color mean, the color variance and the color skewness of the image are calculated over its three channels; and the color mean, color variance and color skewness feature vectors computed in all the grid cells are concatenated to form a color moment feature vector with a feature dimension of 81. In one embodiment, let the sample matrix of color moment feature vectors be $X_{cm} \in \mathbb{R}^{n \times d_{cm}}$, where $n$ is the number of samples and $d_{cm}$ is the feature dimension of the color moment feature vector. Taking a sample $x_i$ as a node, its $k$ nearest neighbor samples are computed, and the set of these $k + 1$ samples is taken as one hyperedge. A hyperedge is generated in this way for each sample, so the image features of the color moment modality generate $n$ hyperedges.
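As an illustrative aside (NumPy assumed; helper names are not from the patent), the color moment extraction and the k-nearest-neighbor hyperedge construction just described can be sketched as follows, with the skewness taken here as the signed cube root of the third central moment:

```python
import numpy as np

def color_moment_features(image):
    """image: H x W x 3 array -> 81-dimensional color moment feature vector
    (9 grid cells x 3 channels x 3 moments)."""
    h, w, _ = image.shape
    feats = []
    for i in range(3):
        for j in range(3):
            cell = image[i * h // 3:(i + 1) * h // 3,
                         j * w // 3:(j + 1) * w // 3, :].reshape(-1, 3).astype(float)
            mean = cell.mean(axis=0)
            var = cell.var(axis=0)
            skew = np.cbrt(((cell - mean) ** 3).mean(axis=0))  # signed cube root
            feats.extend([mean, var, skew])
    return np.concatenate(feats)

def knn_incidence_matrix(X, k):
    """Hyperedge i = sample i together with its k nearest neighbours in one
    modality's feature space; returns the n x n incidence matrix H."""
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # O(n^2) sketch
    H = np.zeros((n, n))
    for i in range(n):
        neighbours = np.argsort(dists[i])[:k + 1]   # includes sample i itself
        H[neighbours, i] = 1.0
    return H
```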
The specific steps of extracting the local two-dimensional histogram feature vector are as follows: each image of the image training samples is divided into a non-overlapping 3 × 3 grid of cells, and a local two-dimensional histogram feature vector with a feature dimension of 58 is obtained by comparing the image pixels of the central cell with the image pixels of the surrounding cells of each image. The local two-dimensional histogram feature vector has good illumination invariance.
The specific steps of extracting the histogram of oriented gradients feature vector are as follows: each image of the image training samples is divided into a plurality of blocks; the histogram of oriented gradients feature vector of each block is calculated; and the per-block feature vectors are normalized block by block to obtain a histogram of oriented gradients feature vector with a feature dimension of 31.
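As an illustrative aside, one way to compute a block-normalized histogram of oriented gradients feature is sketched below using scikit-image, which is an assumption (the patent does not name a library); the cell and block sizes are illustrative, and the resulting dimensionality depends on the image size rather than being fixed at 31:

```python
import numpy as np
from skimage.feature import hog  # scikit-image is an assumed dependency

def hog_features(gray_image):
    """gray_image: 2-D array -> block-normalized HOG feature vector."""
    return hog(np.asarray(gray_image, dtype=float),
               orientations=9,            # number of gradient orientation bins
               pixels_per_cell=(8, 8),    # cell size (illustrative)
               cells_per_block=(2, 2),    # cells per normalization block
               block_norm='L2-Hys',
               feature_vector=True)
```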
Fig. 3 is a schematic structural diagram of an image recognition system according to an embodiment of the present invention. As shown in fig. 3, the system 30 includes: a modal feature extraction module 301, a target model building module 302, a joint learning module 303, a prediction module 304, an image acquisition module 305, and a test module 306.
The modal feature extraction module 301 is used for extracting image features of a plurality of modalities from an image training sample.
The target model building module 302 is used for constructing a plurality of hypergraphs from the image features of the plurality of modalities.
The joint learning module 303 is used for performing joint learning on the plurality of hypergraphs.
The prediction module 304 is used for performing image classification according to the locally optimal solution obtained by the joint learning.
The image acquisition module 305 is used for acquiring an image training sample and obtaining the class label vector corresponding to the image training sample, and for acquiring an image test sample and obtaining the category label vector corresponding to the image test sample.
The test module 306 is used for verifying the target model by using the image test sample and for adjusting and optimizing the parameters of the target model according to the verification result.
In one embodiment of the invention, the test module 306 applies the image test sample to the prediction function to obtain the prediction label vector corresponding to the image test sample, and compares the prediction label vector with the category label vector corresponding to the image test sample. According to the comparison result, if the alternating iterative optimization process has not reached a locally optimal solution of the prediction function, the hyperedge weights and the prediction function in the target model are further optimized by alternating iterations; if the alternating iterative optimization process has converged to a locally optimal solution of the prediction function, the prediction function of the target model is obtained.
In one embodiment, the prediction module 304 obtains an unknown image and applies the unknown image to the prediction function to obtain the recognition result of the unknown image.
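As an illustrative aside, the comparison performed by the test module can be sketched as follows, assuming binary +1/-1 category labels and known test-node positions (names are illustrative):

```python
import numpy as np

def evaluate(f, y_test, test_idx):
    """f: prediction function values for all nodes; y_test: +1/-1 labels of the
    test samples; test_idx: positions of the test samples among the nodes."""
    pred = np.sign(f[test_idx])           # predicted label vector
    accuracy = float((pred == y_test).mean())
    return pred, accuracy
```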
Fig. 4 is a structural diagram of a control apparatus of image recognition according to an embodiment of the present invention. The apparatus shown in fig. 4 is only an example and should not limit the functionality and scope of use of embodiments of the present invention in any way.
Referring to fig. 4, the apparatus includes a processor 401, a memory 402, and an input-output device 403 connected by a bus. The memory 402 includes read-only memory (ROM) and random access memory (RAM); the computer instructions and data required to perform the system functions are stored in the memory 402, and the processor 401 reads the computer instructions from the memory 402 to perform the appropriate actions and processing. The input-output device 403 includes an input section such as a keyboard and a mouse; an output section such as a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage section such as a hard disk; and a communication section such as a network interface card, for example a LAN card or a modem. The memory 402 also stores computer instructions for performing the operations specified by the image recognition method of the embodiment of the present invention: extracting image features of a plurality of modalities from an image training sample; constructing a plurality of hypergraphs according to the image features of the plurality of modalities; performing joint learning on the plurality of hypergraphs; and performing image classification according to the locally optimal solution obtained by the joint learning.
Accordingly, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions which, when executed, implement the operations specified by the image recognition method described above.
The flowcharts and block diagrams in the figures illustrate possible architectures, functions and operations of the systems, methods and apparatuses according to the embodiments of the present invention. Each block may represent a module, a program segment, or simply a piece of code consisting of executable instructions for implementing a specified logical function. It should also be noted that the executable instructions implementing the specified logical functions may be recombined to create new modules and program segments. The blocks of the drawings, and their order, are therefore provided only to better illustrate the processes and steps of the embodiments and should not be taken as limiting the invention itself.
The above description is only a few embodiments of the present invention, and is not intended to limit the present invention, and various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (18)

1. An image recognition method, comprising:
extracting image features of a plurality of modalities from an image training sample;
constructing a plurality of hypergraphs according to the image features of the plurality of modalities;
performing joint learning on the plurality of hypergraphs; and
and carrying out image classification according to the local optimal solution obtained by the joint learning.
2. The image recognition method of claim 1, wherein the plurality of hypergraphs each include a respective set of vertices and a respective set of hyperedges.
3. The image recognition method of claim 1, wherein the step of jointly learning the plurality of hypergraphs comprises:
constructing a target model, wherein the target model comprises hyperedge weights and a prediction function;
performing alternating iterative optimization of the hyperedge weights and the prediction function in the target model; and obtaining a locally optimal solution of the prediction function when the alternating iterations converge.
4. The image recognition method of claim 3, wherein the alternating iterative optimization comprises:
fixing the hyperedge weights and optimizing the prediction function, wherein the terms of the target model that do not depend on the prediction function variable are removed to obtain an analytic solution of the prediction function; and
fixing the prediction function and optimizing the hyperedge weights, wherein the terms of the target model that do not depend on the hyperedge weight variable are removed to obtain an analytic solution of the hyperedge weights.
5. The image recognition method according to claim 4, further comprising: collecting an image test sample and acquiring a category label vector corresponding to the image test sample;
and verifying the target model by using the image test sample, and adjusting and optimizing parameters of the target model according to a verification result.
6. The image recognition method of claim 5, wherein the verifying the target model by using the image test sample, and adjusting and optimizing the parameters of the target model according to the verification result comprises: applying the image test sample to the prediction function to obtain a prediction label vector corresponding to the image test sample; and
the predictor label vector and the category label vector corresponding to the image test sample are compared.
7. The image recognition method of claim 6, wherein the verifying the target model by using the image test sample, and adjusting and optimizing the parameters of the target model according to the verification result further comprises: according to the comparison result, if the alternative iterative optimization process does not reach the local optimal solution of the prediction function, further alternative iterative optimization is carried out on the weight of the excess edge in the target model and the prediction function;
and according to the comparison result, if the alternative iterative optimization process converges to the local optimal solution of the prediction function, obtaining the prediction function of the target model.
8. The image recognition method of claim 7, wherein the classifying the image according to the locally optimal solution obtained by the joint learning comprises: acquiring an unknown image; and
and applying the unknown image to the prediction function to obtain the recognition result of the unknown image.
9. The image recognition method according to claim 2, wherein the hyper-edge sets are constructed using a k-nearest neighbor method respectively according to the image features of the plurality of modalities.
10. The image recognition method of claim 1, wherein the image features of the plurality of modalities include at least one of: color moment feature vectors, local two-dimensional histogram feature vectors, and histogram of oriented gradients feature vectors.
11. The image recognition method of claim 10, wherein the step of extracting the color moment feature vector comprises: dividing each image of the image training samples into a plurality of non-overlapping grid cells;
calculating, for each grid cell, the feature vectors of the color mean, the color variance and the color skewness of the image over a plurality of channels; and
concatenating the color mean, color variance and color skewness feature vectors computed in each grid cell to form the color moment feature vector.
12. The image recognition method of claim 10, wherein the step of extracting the local two-dimensional histogram feature vector comprises: dividing each image of the image training samples into a plurality of non-overlapping grid cells;
and obtaining the local two-dimensional histogram feature vector by comparing the image pixels of the central cell with the image pixels of the surrounding cells of each image.
13. The image recognition method of claim 12, wherein the local two-dimensional histogram feature vector has good illumination invariance.
14. The image recognition method of claim 10, wherein the step of extracting the histogram of oriented gradients feature vector comprises: segmenting each image of the image training sample into a plurality of blocks;
calculating the histogram of oriented gradients feature vector of each block; and
normalizing the per-block feature vectors block by block to obtain the histogram of oriented gradients feature vector.
15. An image recognition system, comprising:
a modal feature extraction module: for extracting image features of a plurality of modalities from an image training sample;
a target model building module: for constructing a plurality of hypergraphs from image features of the plurality of modalities;
a joint learning module: for joint learning of the plurality of hypergraphs;
a prediction module: for performing image classification according to the locally optimal solution obtained by the joint learning.
16. The image recognition system of claim 15, further comprising:
an image acquisition module: for acquiring an image training sample and obtaining the class label vector corresponding to the image training sample, and for acquiring an image test sample and obtaining the category label vector corresponding to the image test sample; and
a test module: for verifying the target model by using the image test sample and adjusting and optimizing parameters of the target model according to the verification result.
17. A computer-readable storage medium storing computer instructions which, when executed, implement the image recognition method of any one of claims 1 to 14.
18. An image recognition control device, comprising:
a memory for storing computer instructions;
a processor coupled to the memory, the processor being configured to perform the image recognition method of any one of claims 1 to 14 based on the computer instructions stored in the memory.
CN201811062833.5A 2018-09-12 2018-09-12 Image recognition method and system Pending CN110895700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811062833.5A CN110895700A (en) 2018-09-12 2018-09-12 Image recognition method and system

Publications (1)

Publication Number Publication Date
CN110895700A (en) 2020-03-20

Family

ID=69784803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811062833.5A Pending CN110895700A (en) 2018-09-12 2018-09-12 Image recognition method and system

Country Status (1)

Country Link
CN (1) CN110895700A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268140A (en) * 2014-07-31 2015-01-07 浙江大学 Image retrieval method based on weight learning hypergraphs and multivariate information combination
CN106776554A (en) * 2016-12-09 2017-05-31 厦门大学 A kind of microblog emotional Forecasting Methodology based on the study of multi-modal hypergraph
CN107507195A (en) * 2017-08-14 2017-12-22 四川大学 The multi-modal nasopharyngeal carcinoma image partition methods of PET CT based on hypergraph model
CN107731283A (en) * 2017-10-23 2018-02-23 清华大学 A kind of image radio system based on more subspace modelings
EP3333771A1 (en) * 2016-12-09 2018-06-13 Fujitsu Limited Method, program, and apparatus for comparing data hypergraphs
CN108170729A (en) * 2017-12-13 2018-06-15 西安电子科技大学 Utilize the image search method of hypergraph fusion multi-modal information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination