CN108681746B - Image identification method and device, electronic equipment and computer readable medium


Info

Publication number
CN108681746B
Authority
CN
China
Prior art keywords
image
classifier
sub
feature
target
Prior art date
Legal status
Active
Application number
CN201810443324.0A
Other languages
Chinese (zh)
Other versions
CN108681746A (en)
Inventor
Wei Xiushen (魏秀参)
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201810443324.0A priority Critical patent/CN108681746B/en
Publication of CN108681746A publication Critical patent/CN108681746A/en
Application granted granted Critical
Publication of CN108681746B publication Critical patent/CN108681746B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/29 - Graphical models, e.g. Bayesian networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image recognition method, an image recognition device, an electronic device and a computer-readable medium, relating to the technical field of image recognition. The method comprises the following steps: obtaining a test sample, wherein the test sample comprises test example images and an image to be identified, and the categories of the objects in the test example images include the category of the object in the image to be identified; performing feature extraction on the test example images to obtain a feature set, wherein the feature set comprises a plurality of sub-vectors, each sub-vector being a part feature vector of an object in an example image; and mapping each sub-vector to a sub-classifier of the corresponding type through a segmented classifier mapping model, determining a target classifier from the sub-classifiers, and performing image recognition on the image to be recognized through the target classifier, wherein the segmented classifier mapping model is a model obtained after small-sample learning. The method frees conventional fine-grained image recognition from its dependence on massive quantities of fine-grained images.

Description

Image identification method and device, electronic equipment and computer readable medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to an image recognition method, an image recognition apparatus, an electronic device, and a computer-readable medium.
Background
With the rapid development of artificial intelligence, image recognition and image detection technologies have also developed rapidly and are now widely applied in daily life. Image recognition techniques may be used to identify the type of object contained in an image, for example, whether the object in an image is a dog or a cat. With this rapid development, another image recognition technology has also emerged: fine-grained image recognition.
Fine-grained image recognition is an important research subject in the field of computer vision. Current research mainly focuses on how to find object parts with discriminative power in fine-grained images, or on constructing novel network models suited to fine-grained recognition tasks. However, whichever of the above deep learning methods is used, it depends on massive quantities of fine-grained images, which limits the development of fine-grained image recognition and its application in real scenarios.
Disclosure of Invention
In view of the above, the present invention provides an image recognition method, an image recognition apparatus, an electronic device, and a computer-readable medium, which free conventional fine-grained image recognition from its dependence on massive quantities of fine-grained images.
In a first aspect, an embodiment of the present invention provides an image recognition method, including: obtaining a test sample, wherein the test sample comprises test example images and an image to be identified, and the categories of the objects in the test example images include the category of the object in the image to be identified; performing feature extraction on the test example images to obtain a feature set, wherein the feature set comprises a plurality of sub-vectors, each sub-vector being a part feature vector of an object in an example image; and mapping each sub-vector to a sub-classifier of the corresponding type through a segmented classifier mapping model, and determining a target classifier from the sub-classifiers, so that image recognition is performed on the image to be identified through the target classifier, wherein the segmented classifier mapping model is a model obtained after small-sample learning.
Further, there is at least one category of object in the test example images. Performing feature extraction on the test sample images to obtain a feature set comprises: performing feature extraction on the test example images of category $A_i$ in the test sample to obtain a feature set $X_i$, where category $A_i$ is the i-th of the plurality of categories, i runs from 1 to k, and k is the number of categories of the test sample images. Mapping each sub-vector to a sub-classifier of the corresponding type through the segmented classifier mapping model to obtain a target classifier comprises: mapping each sub-vector in the feature set $X_i$ to a sub-classifier of the corresponding type through the segmented classifier mapping model, and determining a target classifier $F_i$ from the sub-classifiers.
Further, mapping each sub-vector in the feature set $X_i$ to a sub-classifier of the corresponding type through the segmented classifier mapping model and determining a target classifier $F_i$ from the sub-classifiers comprises: mapping each sub-vector in the feature set $X_i$ to a sub-classifier of the corresponding type through a segmented classification mapping function in the segmented classifier mapping model, to obtain a plurality of sub-classifiers; and cascading the plurality of sub-classifiers to obtain the target classifier $F_i$.
Further, mapping each sub-vector in the feature set $X_i$ to a sub-classifier of the corresponding type by a segmented classification mapping function in the segmented classifier mapping model comprises: mapping the t-th sub-vector of the feature set $X_i$ to a sub-classifier of the corresponding type by the formula

$$F_i^{(t)} = \mathcal{M}^{(t)}\big(x_i^{(t)}\big), \quad t = 1, \dots, n_B,$$

where $x_i^{(t)}$ denotes the t-th sub-vector of the feature set $X_i$, $F_i^{(t)}$ is the sub-classifier corresponding to it, $\mathcal{M}^{(t)}$ denotes the segmented classification mapping function, and $n_B$ is the number of sub-vectors in the feature set $X_i$.
Further, performing feature extraction on the test example images of category $A_i$ in the test sample to obtain the feature set $X_i$ comprises: performing feature extraction on each test example image of category $A_i$ in the test sample to obtain a feature set $\{x_j\}_{j=1}^{N_e}$, where $x_j$ denotes the j-th test example image of category $A_i$ and $N_e$ is the number of test example images of category $A_i$ in the test sample; and computing the feature set $X_i$ from $\{x_j\}_{j=1}^{N_e}$ according to the formula

$$X_i = \frac{1}{N_e} \sum_{j=1}^{N_e} x_j.$$
Further, performing feature extraction on the test sample image to obtain a feature set, including: and performing feature extraction on the test example image through a bilinear neural network to obtain a bilinear feature set of the test example image, wherein the bilinear feature set comprises a plurality of sub-vectors.
Further, performing feature extraction on the test example image through a bilinear neural network to obtain the bilinear feature set of the test example image includes: extracting a first bilinear feature set of the test example image through a first branch feature extraction network in the bilinear neural network; extracting a second bilinear feature set of the test example image through a second branch feature extraction network in the bilinear neural network; and performing an outer product operation on the first bilinear feature set and the second bilinear feature set to obtain the bilinear feature set of the test example image.
Further, performing image recognition on the image to be recognized through the target classifier further includes: performing feature extraction on the image to be recognized to obtain a target feature matrix, wherein the target feature matrix comprises feature information of the image to be recognized; and performing image recognition on the target feature matrix through the target classifier.
Further, the number of target classifiers is plural, and performing image recognition on the target feature matrix of the image to be recognized through the target classifiers includes: performing image recognition on the target feature matrix through each target classifier to obtain a plurality of category confidences; and determining the category of the image to be recognized based on the category of the target classifier corresponding to the target category confidence among the plurality of category confidences, wherein the target category confidence is a confidence among the plurality of category confidences that is greater than a preset threshold.
Further, performing image recognition on the feature vectors of the image to be recognized through each target classifier to obtain a plurality of category confidences comprises: performing an inner product operation between the feature matrix of the target classifier and the target feature matrix, and taking the result of the operation as the category confidence.
Further, the method further comprises: acquiring a training sample set, wherein the training sample set comprises training images and label information of the training images, the label information is used for characterizing the categories of the training images, and the training images comprise training example images and query set images; performing feature extraction on the training example images to obtain a feature set of the training example images; and performing small-sample training on the segmented classification mapping function in an original segmented classifier mapping model through the feature set of the training example images and the label information of the training example images, to obtain a trained original segmented classifier mapping model.
Further, the method further comprises: performing image recognition on the query set images through the trained original segmented classifier mapping model, and determining the value of a classification loss function based on the recognition result; and adjusting the parameters of the original segmented classification mapping function based on the value of the classification loss function.
In a second aspect, an embodiment of the present invention provides an image recognition apparatus, including: an acquisition unit, configured to acquire a test sample, wherein the test sample comprises test example images and an image to be recognized, and the categories of the objects in the test example images include the category of the object in the image to be recognized; a feature extraction unit, configured to perform feature extraction on the test example images to obtain a feature set, wherein the feature set comprises a plurality of sub-vectors, each being a part feature vector of an object in an example image; and a mapping identification unit, configured to map each sub-vector to a sub-classifier of the corresponding type through a segmented classifier mapping model and determine a target classifier from the sub-classifiers, so as to perform image recognition on the image to be recognized through the target classifier, wherein the segmented classifier mapping model is a model obtained after small-sample learning.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the image recognition method described in any one of the above when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the image recognition method of any one of the above claims.
In the embodiment of the invention, a test sample is first obtained, wherein the test sample comprises test example images and an image to be identified, and the categories of the objects in the test example images include the category of the object in the image to be identified; then, feature extraction is performed on the test example images to obtain a feature set, wherein the feature set comprises a plurality of sub-vectors, each being a part feature vector of an object in an example image; finally, each sub-vector is mapped to a sub-classifier of the corresponding type through a segmented classifier mapping model, a target classifier is determined from the sub-classifiers, and image recognition is performed on the image to be recognized through the target classifier, wherein the segmented classifier mapping model is a model obtained after small-sample learning.
In this embodiment, the segmented classifier mapping model may also be referred to as a fine-grained image recognition model. Through small-sample learning tasks, the model learns the learning paradigm of such tasks and is then used at test time to perform accurate image recognition on the image to be recognized. This solves the technical problem that conventional fine-grained image recognition depends too heavily on massive quantities of fine-grained images, achieving the technical effect that fine-grained image recognition no longer depends on massive quantities of fine-grained images.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of an electronic device;
FIG. 2 is a flow chart of an image recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a fine-grained level image recognition model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an image recognition apparatus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, not all, embodiments of the present invention. First, an electronic device 100 for implementing an embodiment of the present invention, which may be used to execute the image recognition method of embodiments of the present invention, is described with reference to fig. 1.
As shown in FIG. 1, electronic device 100 includes one or more processors 102, one or more memories 104, an input device 106, an output device 108, and an image collector 110, which are interconnected via a bus system 112 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 102 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), or an Application Specific Integrated Circuit (ASIC). The processor 102 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the client-side functionality (implemented by the processor) and/or other desired functionality in the embodiments of the invention described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
The image collector 110 is configured to collect images, and the collected data is input to the image recognition method for processing. For example, the image collector may capture an image desired by the user (e.g., a photo or a video), which is then input to the image recognition method; the image collector may also store the captured image in the memory 104 for use by other components.
Exemplarily, an electronic device for implementing an image recognition method according to an embodiment of the present invention may be implemented as a smart terminal such as a video camera, a snapshot machine, a smart phone, a tablet computer, and the like.
As can be seen from the description of the background art, fine-grained image recognition technology depends on massive quantities of fine-grained images, which limits its development and its application in real scenarios. Humans, by contrast, can learn new concepts from very little supervision; for example, an average adult can learn to recognize a new bird species from only a few images.
In order to give a fine-grained image recognition model a human-like ability to learn from a small number of training samples, the invention is the first to propose and study the few-shot learning task for fine-grained image recognition. The fine-grained image recognition task based on a small number of training samples requires that, given only a few (generally one or five) labeled samples, the model be trained into an ideal fine-grained object classifier that completes the recognition task. These few labeled samples are often referred to as "example images", "examples" (exemplars), or "example samples". Because fine-grained image labels are hard to acquire and massive data is hard to collect, this task has enormous prospects in practical applications; but since the supervision provided by a small number of samples is extremely limited, the task difficulty increases greatly. The image recognition method is described below with reference to specific embodiments.
In accordance with an embodiment of the present invention, there is provided an embodiment of an image recognition method, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
Fig. 2 is a flowchart of an image recognition method according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps:
step S202, a test sample is obtained, wherein the test sample comprises a test example image and an image to be identified, and the category of the object in the test example image comprises the category of the object in the image to be identified.
In an embodiment of the present invention, the test example image is the labeled sample described above, that is, a sample labeled with its category, where the category in this embodiment is determined by the category of the object contained in the image.
The test example image further carries label information, and the label information is used to characterize the category of the corresponding test example image.
Step S204, extracting the characteristics of the test sample image to obtain a characteristic set, wherein the characteristic set comprises a plurality of subvectors, and the subvectors are the component characteristic vectors of the object in the test sample image.
In the embodiment of the present invention, after the feature extraction is performed on the test sample image, the obtained feature set includes a plurality of sub-vectors, and each sub-vector is used for characterizing a feature of an implicit component of an object in the corresponding test sample image.
It should be noted that, in this embodiment, the component refers to a component obtained by implicitly dividing the image, that is, the component may be a local area of an object in the test sample image.
Implicit division means that an image block is implicitly divided into a plurality of parts, recording only the position information of those parts within the image block; the implicitly divided image block keeps its original representation. Explicit division, by contrast, means the image block is actually decomposed into a plurality of small part image blocks, so that the original image block no longer exists.
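As an illustrative sketch only (the 7×7 grid and 224×224 input size are assumptions for illustration, not values from this embodiment), each spatial cell of a convolutional feature map can play the role of one implicit part: its position within the image block is recorded, while the block itself keeps its original representation.

```python
def implicit_part_region(row: int, col: int, img_size: int = 224, grid: int = 7):
    """Illustrative only: map one cell of a grid x grid conv feature map back
    to the image region it roughly covers. The part's position is recorded
    without cropping, so the image block itself stays intact."""
    cell = img_size // grid          # 32 pixels per cell under these sizes
    top, left = row * cell, col * cell
    return (top, left, top + cell, left + cell)   # (top, left, bottom, right)

# e.g. the implicit part at cell (2, 3) covers roughly pixels (64, 96)..(96, 128)
print(implicit_part_region(2, 3))
```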
Step S206, mapping each sub-vector to a sub-classifier of the corresponding type through a segmented classifier mapping model, and determining a target classifier from the sub-classifiers, so as to perform image recognition on the image to be recognized through the target classifier, wherein the segmented classifier mapping model is a model obtained after small-sample learning.
In the embodiment of the present invention, the segmented Classifier Mapping model (PCM) may also be referred to as the fine-grained image recognition deep network model PCM.
In this embodiment, "segmented" means that each sub-vector is mapped to a sub-classifier of the corresponding type by its own mapping function, and the target classifier is then obtained from these sub-classifiers.
In this embodiment, the segmented classifier mapping model may also be referred to as a fine-grained image recognition model. Through small-sample learning tasks, the model learns the learning paradigm of such tasks, and is then used at test time to perform accurate image recognition on the image to be recognized. This solves the technical problem that conventional fine-grained image recognition depends too heavily on massive quantities of fine-grained images, achieving the technical effect that fine-grained image recognition no longer depends on massive quantities of fine-grained images.
In an alternative embodiment, if the test sample images have a plurality of categories, then in step S204, performing feature extraction on the test sample images to obtain the feature set comprises: performing feature extraction on the test example images of category $A_i$ in the test sample to obtain a feature set $X_i$, where category $A_i$ is the i-th of the plurality of categories, i runs from 1 to k, and k is the number of categories of the test sample images.
If the test sample images have a plurality of categories, then in step S206, mapping each sub-vector to a sub-classifier of the corresponding type through the segmented classifier mapping model to obtain the target classifier comprises: mapping each sub-vector in the feature set $X_i$ to a sub-classifier of the corresponding type through the segmented classifier mapping model, and determining the target classifier $F_i$ from the sub-classifiers.
Specifically, suppose the test sample images have several categories, say k. In that case, feature extraction can be performed on the test example images of each category to obtain the corresponding feature set; for example, feature extraction is performed in turn on the test example images of categories $A_1$ to $A_k$, obtaining the corresponding feature sets $X_1$ to $X_k$.
After the corresponding feature sets $X_1$ to $X_k$ are obtained, each sub-vector in each feature set can be mapped to a sub-classifier of the corresponding type through the segmented classifier mapping model, and a target classifier is then determined from the sub-classifiers.
Take one category $A_i$ among $A_1$ to $A_k$ as an example. First, feature extraction is performed on the test example images of category $A_i$ in the test sample to obtain the feature set $X_i$, which contains a plurality of sub-vectors. Then, each sub-vector in the feature set $X_i$ is mapped to a sub-classifier of the corresponding type through the segmented classifier mapping model, obtaining a plurality of sub-classifiers. Finally, a target classifier $F_i$ can be determined from the plurality of sub-classifiers.
As can be seen from the above description, this embodiment adopts a segmented mapping manner, that is, each sub-vector in the feature set $X_i$ is mapped to a sub-classifier of the corresponding type. Compared with the global mapping manner, segmented mapping greatly simplifies the mapping and can reduce the training difficulty of the network.
Optionally, in this embodiment, when the category in the test sample images is $A_i$, feature extraction can be performed on the test example images in the following implementation manner to obtain the feature set, which specifically comprises:
and performing feature extraction on the test example image through a bilinear neural network to obtain a bilinear feature set of the test example image, wherein the bilinear feature set comprises a plurality of sub-vectors.
In this embodiment, feature extraction may be performed on the example image through a bilinear neural network (bilinear network), so as to obtain a bilinear feature set.
For example, feature extraction is performed on the test example images of category $A_i$ in the test sample to obtain the feature set $X_i$, also called a bilinear feature set, which contains $n_B$ sub-vectors $x^{(t)}$; that is, the set is represented as

$$X_i = \big\{x^{(1)}, x^{(2)}, \dots, x^{(n_B)}\big\}.$$

In this embodiment, in the feature set $\{x_j\}_{j=1}^{N_e}$, the subscript identifies a test example image within the same category, and the superscript identifies a part within a sample.

For example, $x_j$ in the feature set $\{x_j\}_{j=1}^{N_e}$ denotes the j-th test example image belonging to category $A_i$, and can be expressed as $x_j = \{x_j^{(1)}, x_j^{(2)}, \dots, x_j^{(n_B)}\}$, where $x_j^{(t)}$ denotes the sub-vector of part t in the j-th test example image.
In this embodiment, the bilinear neural network makes it possible to obtain a set of implicit part feature vectors of the object in an image, which cannot be obtained with other models. Extracting this set of implicit part feature vectors can effectively aid fine-grained object recognition and thereby improve fine-grained recognition accuracy.
Further, performing feature extraction on the test example image through the bilinear neural network to obtain its bilinear feature set includes the following steps:
Step S11, extracting a first bilinear feature set of the test example image through the first branch feature extraction network in the bilinear neural network;
Step S12, extracting a second bilinear feature set of the test example image through the second branch feature extraction network in the bilinear neural network;
Step S13, performing an outer product operation on the first bilinear feature set and the second bilinear feature set to obtain the bilinear feature set of the test example image.
In this embodiment, the bilinear neural network comprises two network branches: a first branch feature extraction network $f_A$ and a second branch feature extraction network $f_B$. In the present invention, feature extraction is performed on the test example image by $f_A$ and $f_B$ respectively, yielding the first bilinear feature set and the second bilinear feature set; an outer product operation is then performed on the two sets.
For example, each $x^{(t)}$ in the set $X_i = \{x^{(1)}, \dots, x^{(n_B)}\}$ can be seen as the outer product of a feature obtained by the first branch feature extraction network $f_A$ and a feature obtained by the second branch feature extraction network $f_B$.
The specific outer product computation is as follows. Assume the first bilinear feature set contains 49 512-dimensional sub-vectors; by the nature of a CNN, each sub-vector actually corresponds to an image region of the original image. Likewise, assume the second bilinear feature set contains 49 512-dimensional sub-vectors. The outer product refers to an outer product operation between sub-vectors obtained by the two network branches of the bilinear neural network. Specifically, for the first sub-vector of the second bilinear feature set, its first dimension is multiplied by the first sub-vector of the first bilinear feature set, giving a 512-dimensional result; its second dimension is then multiplied by the same first sub-vector of the first bilinear feature set, giving another 512-dimensional result; these operations are repeated until every dimension of the first sub-vector in the second bilinear feature set has acted on the first sub-vector in the first bilinear feature set. At this point 512 results of 512 dimensions each are obtained, and they are concatenated directly into a 262144-dimensional vector, which is the outer product of the first sub-vector in the second bilinear feature set and the first sub-vector in the first bilinear feature set. The same operations are then performed on the second, third, ..., forty-ninth sub-vectors: all dimensions of the second sub-vector in the second bilinear feature set act on the second sub-vector in the first bilinear feature set, and so on, until all dimensions of the forty-ninth sub-vector in the second bilinear feature set have acted on the forty-ninth sub-vector in the first bilinear feature set. This yields 49 outer-product results of 262144 dimensions, which are finally averaged to obtain the final 262144-dimensional outer-product result vector.
Here, $x^{(t)}$ in the set $X_i$ is the t-th of the averaged 512-dimensional results.
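A minimal sketch of this outer-product pooling, assuming PyTorch and the 49×512 shapes used in the example above; the two branch networks $f_A$ and $f_B$ themselves are omitted:

```python
import torch

def bilinear_outer_product(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """feat_a, feat_b: (L, D) local features from the two branches, e.g. L = 49
    sub-vectors of D = 512 dimensions each. Returns the (D, D) location-averaged
    outer-product matrix; flattened, it is the 262144-dimensional vector
    described above, and row t is the t-th 512-dimensional sub-vector x^(t)."""
    assert feat_a.shape == feat_b.shape
    # outer product at every spatial location, averaged over the L locations
    return torch.einsum('ld,le->de', feat_b, feat_a) / feat_a.shape[0]

feat_a = torch.randn(49, 512)    # local features from branch f_A
feat_b = torch.randn(49, 512)    # local features from branch f_B
parts = bilinear_outer_product(feat_a, feat_b)   # (512, 512): 512 implicit parts
```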
After feature extraction has been performed in the manner described above on the test example images of category $A_i$ in the test sample to obtain the feature set $X_i$, each sub-vector in the feature set $X_i$ is mapped to a sub-classifier of the corresponding type through the segmented classifier mapping model, obtaining the target classifier $F_i$.
If there are several test example images of category $A_i$, performing feature extraction on each of them in the manner described above yields the set $\{x_j\}_{j=1}^{N_e}$.

On this basis, when feature extraction is performed on the test example images of category $A_i$ in the test sample, feature extraction is performed on each test example image of category $A_i$ to obtain the feature set $\{x_j\}_{j=1}^{N_e}$, where $x_j$ denotes the j-th test example image of category $A_i$ and $N_e$ is the number of test example images of category $A_i$ in the test sample. Here $x_j$ can be expressed as

$$x_j = \big\{x_j^{(1)}, x_j^{(2)}, \dots, x_j^{(n_B)}\big\}.$$

In this embodiment, feature extraction can be performed on each test example image of category $A_i$ by the method described in steps S11 to S13 above, obtaining the feature set of each test example image; once the feature set of each test example image is obtained, the set $\{x_j\}_{j=1}^{N_e}$ is available.

After the feature set $\{x_j\}_{j=1}^{N_e}$ is obtained, the feature set $X_i$ can be computed according to the formula

$$X_i = \frac{1}{N_e} \sum_{j=1}^{N_e} x_j.$$

Here $X_i$ can be expressed as $X_i = \{x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(n_B)}\}$, where the subscript is the sample (category) identifier and the superscript is the identifier of part t in the sample; that is, $x_i^{(t)}$ represents the feature of part t for category i.
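Continuing the sketch above with the same assumed shapes, the class-level feature set $X_i$ is simply the mean of the per-example part features over the $N_e$ test example images of the class:

```python
import torch

def class_feature_set(example_parts: list) -> torch.Tensor:
    """example_parts: N_e tensors of shape (n_B, D), one per test example image
    of class A_i. Returns X_i = (1 / N_e) * sum_j x_j, also of shape (n_B, D)."""
    return torch.stack(example_parts).mean(dim=0)

# N_e = 5 example images of one class, reusing bilinear_outer_product from above
x_i = class_feature_set([bilinear_outer_product(torch.randn(49, 512),
                                                torch.randn(49, 512))
                         for _ in range(5)])
```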
In an optional embodiment, mapping each sub-vector in the feature set $X_i$ to a sub-classifier of the corresponding type through the segmented classifier mapping model to obtain the target classifier $F_i$ comprises:
first, mapping each sub-vector in the feature set $X_i$ to a sub-classifier of the corresponding type through the segmented classification mapping function in the segmented classifier mapping model, obtaining a plurality of sub-classifiers;
then, cascading the plurality of sub-classifiers to obtain the target classifier $F_i$.
The main function of the segmented classifier mapping model is to map the feature set (or bilinear feature set) of the test example images to the classifier of the corresponding category. An intuitive solution is to map the bilinear representation directly from the feature space to the classifier space through a global mapping; taking a linear mapping as an example: $F_i = W_g X_i + b_g$, where $W_g \in \mathbb{R}^{D \times D}$ and $b_g \in \mathbb{R}^{D}$ are the parameters of the global mapping function.
Careful analysis shows that the above global mapping has two disadvantages. First, because the example feature $X_i$ contains category-level global information, its distribution is complex, and learning a global mapping on such features makes finding the corresponding category classifier a significant challenge. Second, the high dimensionality of the bilinear feature makes the number of parameters the mapping function must learn extremely large, which brings great difficulty to model training.
To address the above problems, the invention provides a novel mapping strategy, namely the segmented mapping method of the segmented Classifier Mapping model (PCM). As above, the bilinear feature set $X_i$ can be viewed as a set of sub-vectors $\{x_i^{(t)}\}_{t=1}^{n_B}$, and each sub-vector can be regarded as an implicit part feature set. Intuitively, in fine-grained recognition, if the similarity between a part and the corresponding part in the example image can be judged part by part, the fine-grained category of the image to be recognized can be determined. This motivates mapping the feature set of each part, at the part level, to the category classifier corresponding to that part using the segmented classification mapping function in the segmented classifier mapping model, and then cascading the classifiers corresponding to the respective parts into the whole object-level classifier. The whole learning process of the segmented mapping is shown in fig. 3, whose working process is described in the embodiment below.
In this embodiment, optionally, each sub-vector in the feature set $X_i$ is mapped to a sub-classifier of the corresponding type in the following manner: by the formula

$$F_i^{(t)} = \mathcal{M}^{(t)}\big(x_i^{(t)}\big), \quad t = 1, \dots, n_B,$$

the t-th sub-vector of the feature set $X_i$ is mapped to a sub-classifier of the corresponding type, where $x_i^{(t)}$ denotes the t-th sub-vector of the feature set $X_i$, $F_i^{(t)}$ is the sub-classifier corresponding to it, $\mathcal{M}^{(t)}$ denotes the segmented classification mapping function, and $n_B$ is the number of sub-vectors in the feature set $X_i$.

Specifically, the t-th sub-vector $x_i^{(t)}$ of the feature set $X_i$ is first mapped to a sub-classifier of the corresponding type by a multilayer perceptron $\mathcal{M}^{(t)}$; that is, the sub-vector $x_i^{(t)}$ is mapped to a sub-classifier of the corresponding type through the formula above. Here the segmented classification mapping function can be expressed concretely as the family

$$\big\{\mathcal{M}^{(1)}, \mathcal{M}^{(2)}, \dots, \mathcal{M}^{(n_B)}\big\},$$

in which $\mathcal{M}^{(1)}$ maps the sub-vector $x_i^{(1)}$ to a sub-classifier of the corresponding type, $\mathcal{M}^{(2)}$ maps the sub-vector $x_i^{(2)}$ to a sub-classifier of the corresponding type, and so on, up to $\mathcal{M}^{(n_B)}$, which maps the sub-vector $x_i^{(n_B)}$ to a sub-classifier of the corresponding type.

After the t-th sub-vector of the feature set $X_i$ has been mapped to a sub-classifier of the corresponding type in the manner described above, a plurality of sub-classifiers is obtained. Cascading the plurality of sub-classifiers then gives the target classifier $F_i$, expressed as $F_i = [F_i^{(1)}; F_i^{(2)}; \dots; F_i^{(n_B)}]$.
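A sketch of this segmented mapping and cascade under the shapes assumed earlier. The description above says each $\mathcal{M}^{(t)}$ is a multilayer perceptron; a single linear layer per part is used here for brevity, which also matches the $512^3$ parameter count discussed below:

```python
import torch
import torch.nn as nn

class SegmentedClassifierMapping(nn.Module):
    """Maps each part sub-vector x_i^(t) to its sub-classifier F_i^(t) with an
    independent mapping M^(t), then cascades the n_B sub-classifiers into the
    target classifier F_i."""
    def __init__(self, n_parts: int = 512, dim: int = 512):
        super().__init__()
        # one independent mapping function M^(t) per implicit part
        self.maps = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_parts))

    def forward(self, x_i: torch.Tensor) -> torch.Tensor:   # x_i: (n_parts, dim)
        sub_classifiers = [m(x_i[t]) for t, m in enumerate(self.maps)]
        return torch.stack(sub_classifiers)   # F_i = [F^(1); F^(2); ...; F^(n_B)]

pcm = SegmentedClassifierMapping()
f_i = pcm(x_i)   # target classifier for class A_i, with x_i from the sketch above
```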
It should be noted that if the test sample images have k categories, k target classifiers are obtained, namely: $F_1, F_2, \dots, F_i, \dots, F_k$.
As can be seen from the above description, the image recognition method provided in this embodiment uses a segmented function mapping to map each sub-vector in the bilinear feature set to a sub-classifier of the corresponding type, which greatly simplifies the aforementioned global mapping and further reduces the training difficulty of the network.
It should be noted that the segmented function mapping described in this embodiment also greatly reduces the network parameters in the classifier generation phase. Taking a single-layer mapping as an example, assume $n_A = n_B = 512$; this gives a $512^2$-dimensional bilinear feature set. In this case, the global mapping model requires $512^4$ parameters, while the segmented mapping method needs only $(512 \times 512) \times 512 = 512^3$ mapping parameters. Here $n_A$ is the dimension of the first bilinear feature set extracted by the first branch feature extraction network, and $n_B$ is the dimension of the second bilinear feature set extracted by the second branch feature extraction network; in general, $n_A = n_B$.
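The parameter-count comparison, spelled out under the same assumed dimensions:

```python
D = 512                                        # n_A = n_B = 512
bilinear_dim = D * D                           # 512^2-dimensional bilinear feature
global_params = bilinear_dim * bilinear_dim    # 512^4: W_g maps R^(D^2) to R^(D^2)
segmented_params = D * (D * D)                 # 512^3: 512 maps, each D -> D
print(global_params // segmented_params)       # 512x fewer parameters
```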
After the k target classifiers $F_1, F_2, \dots, F_i, \dots, F_k$ have been obtained in this embodiment by the method described above, image recognition can be performed on the image to be recognized through the k target classifiers.
In an optional embodiment, the image recognition of the image to be recognized by the target classifier further comprises: firstly, extracting the characteristics of the image to be recognized to obtain a target characteristic matrix, wherein the target characteristic matrix comprises the characteristic information of the image to be recognized; then, the target feature matrix is subjected to image recognition through the target classifier.
Specifically, in this embodiment, feature extraction can be performed on the image to be recognized through the bilinear neural network to obtain the target feature matrix of the image to be recognized, which can be represented as

$$N = \big\{N_1, N_2, \dots, N_{n_B}\big\},$$

where $N_t$ represents the feature vector of part t of the image to be recognized. After this bilinear feature matrix of the image to be recognized is obtained, image recognition can be performed on the target feature matrix through the target classifier.
Optionally, if there are several target classifiers, performing image recognition on the target feature matrix of the image to be recognized through the target classifiers includes the following steps:
first, performing image recognition on the target feature matrix through each target classifier to obtain a plurality of category confidences;
here, performing image recognition on the feature vectors of the image to be recognized through each target classifier to obtain the plurality of category confidences comprises: performing an inner product operation between the feature matrix of the target classifier and the target feature matrix, and taking the result of the operation as the category confidence;
then, determining the category of the image to be recognized based on the category of the target classifier corresponding to the target category confidence among the plurality of category confidences, where the target category confidence is a confidence among the plurality that is greater than a preset threshold.
If there are k (k > 1) target classifiers, the specific recognition process is as follows: image recognition is performed on the target feature matrix of the image to be recognized through each of the k target classifiers in turn, obtaining a corresponding recognition result. The recognition result can be a category confidence, which represents the likelihood that the object to be recognized belongs to the category corresponding to that target classifier, and can be a value in the range 0 to 1. After the k target classifiers have recognized the image to be recognized, k category confidences are obtained. The category of the image to be recognized can then be determined by selecting the category of the target classifier corresponding to the target category confidence among the k category confidences; alternatively, the category of the target classifier corresponding to the maximum confidence can be determined as the category of the image to be recognized.
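A sketch of this recognition step; the sigmoid squashing the inner-product score into a 0 to 1 confidence is an assumption, since the description only states that the confidence is a value in that range:

```python
import torch

def recognize(target_classifiers: torch.Tensor,
              query_parts: torch.Tensor,
              threshold: float = 0.5):
    """target_classifiers: (k, n_B, D), one cascaded F_i per class.
    query_parts: (n_B, D) target feature matrix of the image to be recognized."""
    # inner product of each classifier's feature matrix with the target matrix
    scores = torch.einsum('kpd,pd->k', target_classifiers, query_parts)
    conf = torch.sigmoid(scores)       # assumed mapping of scores into [0, 1]
    best = int(conf.argmax())
    # keep the top class only if its confidence exceeds the preset threshold
    return best if conf[best] > threshold else None
```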
It should be noted that, in the embodiment of the present invention, before the test sample is recognized, the initial model of the segmented classifier mapping model (i.e., the original segmented classifier mapping model) needs to be learned and trained. The specific training process is as follows:
step S21, acquiring a training sample set; the training sample set comprises training images and label information of the training images, the label information is used for characterizing the categories of the training images, and the training images comprise training example images and query set images;
step S22, extracting the features of the training example image to obtain a feature set of the training example image;
step S23, carrying out small sample training on the segmentation classification mapping function in the original segmentation classifier mapping model through the feature set of the training example image and the label information of the training example image to obtain the trained original segmentation classifier mapping model;
step S24, performing image recognition on the query set images through the trained original segmented classifier mapping model, and determining the value of a classification loss function based on the recognition result;
and step S25, adjusting the parameters of the segmented classification mapping function based on the values of the classification loss function.
Specifically, in this embodiment, an auxiliary data set $B$ containing $R$ labeled images is first acquired, $B = \{(I_1, y_1), (I_2, y_2), \dots, (I_i, y_i), \dots, (I_R, y_R)\}$, where $I_i$ is an image sample and $y_i \in \{1, 2, \dots, C_B\}$ is the category label information of the image.
Before training the original segmented classifier mapping model, few-shot recognition tasks similar to the test environment need to be constructed from the auxiliary data set B. Specifically, at least one meta-training set is first randomly sampled from the auxiliary data set B; each meta-training set contains $C_E < C_B$ randomly sampled categories and the image samples belonging to those categories. Each meta-training set is then divided into two parts, a training example set E and a query set Q, where the image samples in the training example set E play the role of the small number of training samples, and the query set Q is used to evaluate the recognition performance of the classifier after learning.
After each meta-training set is divided in this way, a training sample set is obtained; the training sample set comprises training images and their label information, the label information characterizes the categories of the training images, and the training images comprise training example images and query set images. The training example images are the images in the training example set, and the query set images are the images in the query set.
Specifically, the training example set E contains training samples of several categories, and each category contains $N_e$ (typically 1 or 5) training samples. The image samples in the query set Q are the remaining images in the meta-training set other than the example set.
After the training sample set is obtained, the original segmented classification mapping function can be trained with it. The main idea of the training in this embodiment is for the model to learn the learning paradigm of fine-grained object recognition under a small number of samples; specifically, through the training sample set, the original segmented classifier mapping model can learn the mapping from "examples to category classifier", thereby learning the learning paradigm of the fine-grained object recognition model and performing the image recognition task based on the learning result.
Before the original segmented classification mapping function is trained with the training sample set, feature extraction is first performed on the training example images in the training sample set to obtain their feature set; then, small-sample training is performed on the segmented classification mapping function in the original segmented classifier mapping model through the feature set of the training example images and their label information, obtaining the trained original segmented classifier mapping model. It should be noted that, in this embodiment, the training process of the original segmented classifier mapping model can be understood as the process of learning and training the segmented classification mapping function within it.
In this embodiment, after the original segmented classifier mapping model has been trained through the training example images, image recognition can be performed on the query set images through the trained model, generating a classification error (a classification loss, i.e., a value of the loss function). The resulting classification error is used to update the parameters of the segmented classification mapping function.
The process of evaluating the trained original segmented classifier mapping model on the query set images can also be described as

$$\min_{\lambda} \; \mathcal{L}\big(F_E(Q)\big),$$

where $\lambda$ denotes the parameters that generate the target classifier $F_E$ corresponding to the training example set E, $\mathcal{L}$ is the loss function, and $F_E(Q)$ denotes applying the target classifier $F_E$ generated from the training example set E to the query set Q.
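Putting the pieces together, one meta-training episode might look like the sketch below. The episode container (example_images, query_images, query_labels), the feature_net returning an image's two branch feature maps, and the optimizer wiring are all hypothetical names for illustration; bilinear_outer_product is reused from the earlier sketch, and only the example-to-classifier-to-query-loss flow comes from the description above.

```python
import torch
import torch.nn as nn

def meta_train_step(pcm, feature_net, episode, optimizer):
    """One few-shot episode: the example set E (N_e images per class) generates
    the classifiers; the query set Q supplies the classification loss used to
    update the parameters of the mapping functions M^(t)."""
    # generate one target classifier F_i per class from its example images
    classifiers = []
    for class_images in episode.example_images:           # one list per class A_i
        parts = torch.stack([bilinear_outer_product(*feature_net(img))
                             for img in class_images])    # (N_e, n_B, D)
        classifiers.append(pcm(parts.mean(dim=0)))        # X_i -> F_i
    f = torch.stack(classifiers)                          # (C_E, n_B, D)

    # score every query image against every generated classifier
    logits = torch.stack([torch.einsum('kpd,pd->k', f,
                                       bilinear_outer_product(*feature_net(q)))
                          for q in episode.query_images])
    loss = nn.functional.cross_entropy(logits, episode.query_labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```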
The above embodiment is further described with reference to fig. 3, which is a schematic structural diagram of the fine-grained image recognition model. As shown in fig. 3, the fine-grained image recognition model comprises a representation learning module and a classifier mapping module. The representation learning module comprises the bilinear neural network; the classifier mapping module comprises the segmented classifier mapping model, which in turn comprises a segmented mapping network. The segmented mapping network implements the mapping of the sub-vectors in the feature set through the segmented classification mapping functions $\mathcal{M}^{(1)}$ to $\mathcal{M}^{(n_B)}$.
As shown in fig. 3, in the present embodiment, first, an Input image (Input Images) is acquired, where the Input image includes an image in a training sample set or an image in a test sample in the above-described embodiment. After the Input Images are acquired, feature extraction may be performed on the Input Images through a bilinear neural network (bilinear network). Specifically, the feature extraction may be performed in the manner described in steps S11 to S13 in the above embodiment, and details are not repeated in this embodiment.
After feature extraction is performed on the input image through the bilinear neural network (bilinear network), the bilinear feature set (or segmented bilinear features) of the input image is obtained. As shown in fig. 3, after the bilinear feature set of the input image is obtained, the sub-vectors in the feature set can be mapped to sub-classifiers of the corresponding types through the segmented classification mapping functions $\mathcal{M}^{(t)}$ in the segmented classifier mapping model. As shown in fig. 3, the segmented mapping network comprises: a segmented bilinear feature layer, a hidden layer, and a segmented category classifier layer.
As can be seen from the above description, in the present embodiment, an image recognition method is proposed, which performs image recognition by using a fine-grained level image recognition deep network model PCM based on a small number of training samples, where PCM may also be referred to as a segmented classifier mapping model.
In this embodiment, the segmented classifier mapping model may also be referred to as a fine-grained image recognition model. The network can learn the learning paradigm of small-sample learning tasks through such tasks and then use it at test time to perform accurate image recognition on the image to be recognized, thereby solving the technical problem that conventional fine-grained image recognition depends too heavily on massive quantities of fine-grained images, and achieving the technical effect that fine-grained image recognition no longer depends on massive quantities of fine-grained images.
The embodiment of the present invention further provides an image recognition apparatus, which is mainly used for executing the image recognition method provided by the foregoing content of the embodiment of the present invention, and the image recognition apparatus provided by the embodiment of the present invention is specifically described below.
Fig. 4 is a schematic diagram of an image recognition apparatus according to an embodiment of the present invention, as shown in fig. 4, the image recognition apparatus mainly includes an acquisition unit 10, a feature extraction unit 20, and a mapping recognition unit 30, wherein:
the system comprises an acquisition unit 10, a recognition unit and a recognition unit, wherein the acquisition unit is used for acquiring a test sample, the test sample comprises a test example image and an image to be recognized, and the category of an object in the test example image comprises the category of the object in the image to be recognized;
a feature extraction unit 20, configured to perform feature extraction on the test sample image to obtain a feature set, where the feature set includes a plurality of sub-vectors, and the sub-vectors are feature vectors of components of an object in the sample image;
the mapping identification unit 30 is configured to map each sub-vector to a sub-classifier of a corresponding type through a segmented classifier mapping model, and determine a target classifier through the sub-classifier, so as to perform image identification on the image to be identified through the target classifier, where the segmented classifier mapping model is a model obtained after small sample learning.
In this embodiment, as noted above, the segmented classifier mapping model may also be referred to as a fine-grained image recognition model. Through small sample learning tasks, the network can learn a learning paradigm for such tasks and apply that paradigm at test time to recognize the image to be recognized accurately, thereby solving the technical problem that conventional fine-grained image recognition technology depends too heavily on massive fine-grained images and achieving the technical effect of making fine-grained image recognition independent of massive fine-grained images.
Optionally, the feature extraction unit includes: a first feature extraction module, configured to perform feature extraction on the test sample images of class A_i to obtain a feature set X_i, where class A_i is the i-th category among the plurality of categories, i takes 1 to k in sequence, and k is the number of categories of the test sample images. The mapping identification unit includes: a mapping module, configured to map each sub-vector in the feature set X_i to a sub-classifier of the corresponding type through the segmented classifier mapping model, and determine a target classifier F_i through the sub-classifiers.
Optionally, the mapping module is configured to: map each sub-vector in the feature set X_i to a sub-classifier of the corresponding type through the segmented classification mapping function in the segmented classifier mapping model to obtain a plurality of sub-classifiers; and cascade the plurality of sub-classifiers to obtain the target classifier F_i.
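The mapping module can be pictured with a short sketch. The following is a minimal, assumed implementation of the segmented classifier mapping M(·) of fig. 3 (piecewise linear feature layer, hidden layer, piecewise classifier layer): each sub-vector X_i^t is mapped to the weights of one sub-classifier F_i^t, and the sub-classifiers are cascaded (concatenated) into the target classifier F_i. All layer sizes are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class SegmentedClassifierMapping(nn.Module):
    """Maps each sub-vector to one sub-classifier, then cascades them."""
    def __init__(self, dim=512, hidden=1024):
        super().__init__()
        self.mapping = nn.Sequential(
            nn.Linear(dim, hidden),   # piecewise linear feature layer
            nn.ReLU(),                # hidden layer non-linearity
            nn.Linear(hidden, dim),   # piecewise classifier layer
        )

    def forward(self, sub_vectors):
        # sub_vectors: (n_B, dim) -- the n_B sub-vectors of feature set X_i.
        sub_classifiers = self.mapping(sub_vectors)      # F_i^t = M(X_i^t)
        target_classifier = sub_classifiers.reshape(-1)  # cascade into F_i
        return target_classifier
```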
Optionally, the mapping module is further configured to: map the t-th sub-vector of the feature set X_i to a sub-classifier of the corresponding type by the formula

F_i^t = M(X_i^t)

where F_i^t represents the sub-classifier corresponding to the t-th sub-vector X_i^t of the feature set X_i, M(·) represents the segmented classification mapping function, t takes 1 to n_B in sequence, and n_B is the number of sub-vectors in the feature set X_i.
Optionally, the first feature extraction module is configured to: perform feature extraction on each test example image of class A_i in the test sample images to obtain a feature set {x_j}, j = 1, ..., n_e, where x_j represents the feature of the j-th test example image of class A_i, and n_e is the number of test example images of class A_i in the test sample images; and calculate the feature set X_i from the feature set {x_j} according to the formula

X_i = (1/n_e) Σ_{j=1}^{n_e} x_j.
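A one-function sketch of this averaging step, under the reconstruction above (X_i taken as the element-wise mean of the n_e example features), could look as follows:

```python
import torch

def class_feature_set(example_features: torch.Tensor) -> torch.Tensor:
    """X_i = (1/n_e) * sum_j x_j over the class's example features."""
    # example_features: (n_e, n_B, dim) -- one bilinear feature set per image.
    return example_features.mean(dim=0)
```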
Optionally, the feature extraction unit further includes: a second feature extraction module, configured to perform feature extraction on the test sample image through a bilinear neural network to obtain a bilinear feature set of the test sample image, where the bilinear feature set includes a plurality of sub-vectors.
Optionally, the second feature extraction module is configured to: extract a first bilinear feature set of the test example image through a first branch feature extraction network in the bilinear neural network; extract a second bilinear feature set of the test example image through a second branch feature extraction network in the bilinear neural network; and perform an outer product operation on the first bilinear feature set and the second bilinear feature set to obtain the bilinear feature set of the test example image.
Optionally, the mapping identification unit further includes: a feature extraction module, configured to perform feature extraction on the image to be recognized to obtain a target feature matrix, where the target feature matrix includes the feature information of the image to be recognized; and an identification module, configured to perform image recognition on the target feature matrix through the target classifier.
Optionally, the identification module is further configured to: when there are a plurality of target classifiers, perform image recognition on the target feature matrix through each target classifier to obtain a plurality of class confidences; and determine the category of the image to be recognized based on the category of the target classifier corresponding to a target class confidence among the plurality of class confidences, where the target class confidence is a confidence, among the plurality of class confidences, that is greater than a preset threshold.
Optionally, the identification module is further configured to: perform an inner product operation on the feature matrix of the target classifier and the target feature matrix, and take the operation result as the class confidence.
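For illustration, a hedged sketch of the recognition step just described: the class confidence is the inner product of each target classifier with the target feature matrix, and the predicted category is the one whose confidence exceeds the preset threshold. The dictionary layout and the threshold value of 0.5 are assumptions of the sketch.

```python
from typing import Dict, Optional
import torch

def classify(target_feature: torch.Tensor,
             target_classifiers: Dict[int, torch.Tensor],
             threshold: float = 0.5) -> Optional[int]:
    """Return the class whose confidence exceeds the preset threshold."""
    best_class, best_conf = None, threshold
    for class_id, classifier in target_classifiers.items():
        # Class confidence = inner product of the classifier's feature
        # matrix with the target feature matrix of the image.
        conf = torch.dot(classifier.flatten(), target_feature.flatten()).item()
        if conf > best_conf:
            best_class, best_conf = class_id, conf
    return best_class  # None when no confidence exceeds the threshold
```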
Optionally, the apparatus is further configured to: acquire a training sample set, where the training sample set includes training images and label information of the training images, the label information is used for representing the categories of the training images, and the training images include training example images and query set images; perform feature extraction on the training example images to obtain a feature set of the training example images; and perform small sample training on the segmented classification mapping function in the original segmented classifier mapping model through the feature set of the training example images and the label information of the training example images to obtain the trained original segmented classifier mapping model.
Optionally, the apparatus is further configured to: perform image recognition on the query set images through the trained original segmented classifier mapping model, and determine the value of a classification loss function based on the recognition result; and adjust the parameters of the original segmented classification mapping function based on the value of the classification loss function.
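The training procedure above can be summarized in a sketch of one small-sample (episodic) training step, assuming the extractor and mapper sketched earlier. `sample_episode` is a hypothetical helper that returns support (training example) and query images with labels 0..k-1, so that the label order matches the stacked classifiers; it is not part of the described embodiment.

```python
import torch
import torch.nn.functional as F

def train_episode(extractor, mapper, optimizer, sample_episode):
    support_images, support_labels, query_images, query_labels = sample_episode()
    # Build one target classifier per class from the support (example) images.
    classifiers = []
    for c in support_labels.unique():                           # sorted 0..k-1
        feats = extractor(support_images[support_labels == c])  # (n_e, C, C)
        x_i = feats.mean(dim=0)                                 # class feature set X_i
        classifiers.append(mapper(x_i))                         # target classifier F_i
    weights = torch.stack(classifiers)                          # (k, D)
    # Class confidences as inner products, then a standard classification loss.
    query_feats = extractor(query_images).flatten(1)            # (N, D)
    logits = query_feats @ weights.t()                          # (N, k)
    loss = F.cross_entropy(logits, query_labels)
    optimizer.zero_grad()
    loss.backward()   # gradients flow into the mapping function's parameters
    optimizer.step()
    return loss.item()
```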
In another embodiment, a computer-readable medium having non-volatile program code executable by a processor is also provided, where the program code causes the processor to perform the method of any of the above-described method embodiments.
In another embodiment, a computer program is also provided, which may be stored on a storage medium in the cloud or locally. When executed by a computer or a processor, the computer program performs the corresponding steps of the image recognition method of the embodiments of the present invention and implements the corresponding modules in the image recognition apparatus according to the embodiments of the present invention.
The apparatus provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, where the apparatus embodiments are not mentioned, reference may be made to the corresponding contents in the method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (15)

1. An image recognition method, comprising:
obtaining a test sample, wherein the test sample comprises a test example image and an image to be identified, and the category of an object in the test example image comprises the category of the object in the image to be identified;
extracting features of the test sample image to obtain a feature set, wherein the feature set comprises a plurality of sub-vectors, and the sub-vectors are feature vectors of parts of objects in the sample image;
mapping each sub-vector to a sub-classifier of a corresponding type through a segmented classifier mapping model, and determining a target classifier through the sub-classifiers, wherein the segmented classifier mapping model is a model obtained after small sample learning;
performing image recognition on the image to be recognized through the target classifier;
wherein the segmented classifier mapping model is a fine-grained level image recognition model.
2. The method of claim 1, wherein there is at least one category of objects in the test example images;
performing feature extraction on the test sample images to obtain a feature set comprises: performing feature extraction on the test sample images of class A_i to obtain a feature set X_i, wherein class A_i is the i-th category among the plurality of categories, i takes 1 to k in sequence, and k is the number of categories of the test sample images;
mapping each sub-vector to a sub-classifier of a corresponding type through the segmented classifier mapping model to obtain a target classifier comprises: mapping each sub-vector in the feature set X_i to a sub-classifier of the corresponding type through the segmented classifier mapping model, and determining a target classifier F_i through the sub-classifiers.
3. The method according to claim 2, wherein mapping each sub-vector in the feature set X_i to a sub-classifier of the corresponding type through the segmented classifier mapping model and determining a target classifier F_i through the sub-classifiers comprises:
mapping each sub-vector in the feature set X_i to a sub-classifier of the corresponding type through the segmented classification mapping function in the segmented classifier mapping model to obtain a plurality of sub-classifiers;
cascading the plurality of sub-classifiers to obtain the target classifier F_i.
4. The method of claim 3, wherein mapping each sub-vector in the feature set X_i to a sub-classifier of the corresponding type through the segmented classification mapping function in the segmented classifier mapping model comprises:
mapping the t-th sub-vector of the feature set X_i to a sub-classifier of the corresponding type by the formula

F_i^t = M(X_i^t)

wherein F_i^t represents the sub-classifier corresponding to the t-th sub-vector X_i^t of the feature set X_i, M(·) represents the segmented classification mapping function, t takes 1 to n_B in sequence, and n_B is the number of sub-vectors in the feature set X_i.
5. The method of claim 2, wherein performing feature extraction on the test sample images of class A_i to obtain the feature set X_i comprises:
performing feature extraction on each test example image of class A_i in the test sample images to obtain a feature set {x_j}, j = 1, ..., n_e, wherein x_j represents the feature of the j-th test example image of class A_i, and n_e is the number of test example images of class A_i in the test sample images;
calculating the feature set X_i from the feature set {x_j} according to the formula

X_i = (1/n_e) Σ_{j=1}^{n_e} x_j.
6. The method of any one of claims 1 to 5, wherein performing feature extraction on the test example image to obtain a feature set comprises:
performing feature extraction on the test example image through a bilinear neural network to obtain a bilinear feature set of the test example image, wherein the bilinear feature set comprises a plurality of sub-vectors.
7. The method of claim 6, wherein the feature extraction of the test example image through a bilinear neural network to obtain a bilinear feature set of the test example image comprises:
extracting a first bilinear feature set of the test example image through a first branch feature extraction network in the bilinear neural network;
extracting a second bilinear feature set of the test example image through a second branch feature extraction network in the bilinear neural network;
performing an outer product operation on the first bilinear feature set and the second bilinear feature set to obtain the bilinear feature set of the test example image.
8. The method of claim 1, wherein performing image recognition on the image to be recognized through the target classifier comprises:
extracting the features of the image to be recognized to obtain a target feature matrix, wherein the target feature matrix comprises feature information of the image to be recognized;
and carrying out image recognition on the target feature matrix through the target classifier.
9. The method according to claim 8, wherein there are a plurality of target classifiers, and performing image recognition on the target feature matrix of the image to be recognized through the target classifiers comprises:
performing image recognition on the target feature matrix through each target classifier to obtain a plurality of class confidences;
determining the category of the image to be recognized based on the category of the target classifier corresponding to a target class confidence among the plurality of class confidences, wherein the target class confidence is a confidence, among the plurality of class confidences, that is greater than a preset threshold.
10. The method of claim 9, wherein performing image recognition on the target feature matrix of the image to be recognized through each of the target classifiers to obtain a plurality of class confidences comprises:
performing an inner product operation on the feature matrix of the target classifier and the target feature matrix, and taking the operation result as the class confidence.
11. The method of claim 1, further comprising:
acquiring a training sample set; wherein the training sample set comprises training images and label information of the training images, the label information is used for representing the categories of the training images, and the training images comprise training example images and query set images;
extracting features of the training example images to obtain a feature set of the training example images;
and carrying out small sample training on a segmented classification mapping function in an original segmented classifier mapping model through the feature set of the training example images and the label information of the training example images to obtain the trained original segmented classifier mapping model.
12. The method of claim 11, further comprising:
carrying out image recognition on the query set images through the trained original segmented classifier mapping model, and determining the value of a classification loss function based on the recognition result;
and adjusting the parameters of the original segmented classification mapping function based on the value of the classification loss function.
13. An image recognition apparatus, comprising:
an acquisition unit, configured to acquire a test sample, wherein the test sample comprises a test example image and an image to be recognized, and the category of an object in the test example image comprises the category of the object in the image to be recognized;
a feature extraction unit, configured to perform feature extraction on the test sample image to obtain a feature set, wherein the feature set comprises a plurality of sub-vectors, and the sub-vectors are part feature vectors of an object in the sample image;
a mapping identification unit, configured to map each sub-vector to a sub-classifier of a corresponding type through a segmented classifier mapping model, determine a target classifier through the sub-classifiers, and perform image recognition on the image to be recognized through the target classifier, wherein the segmented classifier mapping model is a model obtained after small sample learning;
wherein the segmented classifier mapping model is a fine-grained level image recognition model.
14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of the preceding claims 1 to 12 when executing the computer program.
15. A computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any of the preceding claims 1 to 12.
CN201810443324.0A 2018-05-10 2018-05-10 Image identification method and device, electronic equipment and computer readable medium Active CN108681746B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810443324.0A CN108681746B (en) 2018-05-10 2018-05-10 Image identification method and device, electronic equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN108681746A CN108681746A (en) 2018-10-19
CN108681746B true CN108681746B (en) 2021-01-12

Family

ID=63805790

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522970B (en) * 2018-11-28 2021-05-04 南京旷云科技有限公司 Image classification method, device and system
CN111325225B (en) * 2018-12-13 2023-03-21 富泰华工业(深圳)有限公司 Image classification method, electronic device and storage medium
CN109740676B (en) * 2019-01-07 2022-11-22 电子科技大学 Object detection and migration method based on similar targets
CN111460880B (en) * 2019-02-28 2024-03-05 杭州芯影科技有限公司 Multimode biological feature fusion method and system
CN111797865A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Data processing method, data processing device, storage medium and electronic equipment
CN111061898A (en) * 2019-12-13 2020-04-24 Oppo(重庆)智能科技有限公司 Image processing method, image processing device, computer equipment and storage medium
CN111126396B (en) * 2019-12-25 2023-08-22 北京科技大学 Image recognition method, device, computer equipment and storage medium
CN111191587B (en) * 2019-12-30 2021-04-09 兰州交通大学 Pedestrian re-identification method and system
CN111242230A (en) * 2020-01-17 2020-06-05 腾讯科技(深圳)有限公司 Image processing method and image classification model training method based on artificial intelligence
CN111368893B (en) * 2020-02-27 2023-07-25 Oppo广东移动通信有限公司 Image recognition method, device, electronic equipment and storage medium
CN111310858B (en) * 2020-03-26 2023-06-30 北京百度网讯科技有限公司 Method and device for generating information
CN111783889B (en) * 2020-07-03 2022-03-01 北京字节跳动网络技术有限公司 Image recognition method and device, electronic equipment and computer readable medium
KR20220018467A (en) * 2020-08-01 2022-02-15 센스타임 인터내셔널 피티이. 리미티드. Target object recognition method, device and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7451065B2 (en) * 2002-03-11 2008-11-11 International Business Machines Corporation Method for constructing segmentation-based predictive models
CN101373518A (en) * 2008-06-28 2009-02-25 合肥工业大学 Method for constructing prototype vector and reconstructing sequence parameter based on semantic information in image comprehension
CN105005794A (en) * 2015-07-21 2015-10-28 太原理工大学 Image pixel semantic annotation method with combination of multi-granularity context information
US20170287170A1 (en) * 2016-04-01 2017-10-05 California Institute Of Technology System and Method for Locating and Performing Fine Grained Classification from Multi-View Image Data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Bilinear CNN Models for Fine-grained Visual Recognition";Tsung-Yu Lin etc.;《2015 IEEE International Conference on Computer Vision》;20151231;第1449-1456页 *

Also Published As

Publication number Publication date
CN108681746A (en) 2018-10-19

Similar Documents

Publication Publication Date Title
CN108681746B (en) Image identification method and device, electronic equipment and computer readable medium
Albattah et al. A novel deep learning method for detection and classification of plant diseases
Tang et al. Deepchart: Combining deep convolutional networks and deep belief networks in chart classification
Quoc Bao et al. Plant species identification from leaf patterns using histogram of oriented gradients feature space and convolution neural networks
WO2017075939A1 (en) Method and device for recognizing image contents
WO2020164278A1 (en) Image processing method and device, electronic equipment and readable storage medium
CN108229347A Method and apparatus for the deep layer displacement of the plan gibbs structure sampling of people's identification
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
Kishorjit Singh et al. Image classification using SLIC superpixel and FAAGKFCM image segmentation
Babu Sam et al. Completely self-supervised crowd counting via distribution matching
Wu et al. A multi-level descriptor using ultra-deep feature for image retrieval
Najibi et al. Towards the success rate of one: Real-time unconstrained salient object detection
Nguyen Thanh et al. Depth learning with convolutional neural network for leaves classifier based on shape of leaf vein
Liu et al. Tread pattern image classification using convolutional neural network based on transfer learning
Jadhav et al. Comprehensive review on machine learning for plant disease identification and classification with image processing
Akusok et al. Image-based classification of websites
Juefei-Xu et al. DeepGender2: A generative approach toward occlusion and low-resolution robust facial gender classification via progressively trained attention shift convolutional neural networks (PTAS-CNN) and deep convolutional generative adversarial networks (DCGAN)
CN114155388B (en) Image recognition method and device, computer equipment and storage medium
Dalara et al. Entity Recognition in Indian Sculpture using CLAHE and machine learning
Masita et al. Refining the efficiency of R-CNN in Pedestrian Detection
Bajpai et al. Real Time Face Recognition with limited training data: Feature Transfer Learning integrating CNN and Sparse Approximation
Banerjee et al. Random Forest boosted CNN: An empirical technique for plant classification
Channayanamath et al. Dynamic hand gesture recognition using 3d-convolutional neural network
Jun et al. Two-view correspondence learning via complex information extraction
Quazi et al. Image Classification and Semantic Segmentation with Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant