CN110837846B

CN110837846B - Image recognition model construction method, image recognition method and device

Info

Publication number: CN110837846B
Application number: CN201910966342.1A
Authority: CN
Inventors: 李一力; 尉桦; 邵新庆; 刘强; 徐�明
Original assignee: Shenzhen ZNV Technology Co Ltd; Nanjing ZNV Software Co Ltd
Current assignee: Shenzhen ZNV Technology Co Ltd; Nanjing ZNV Software Co Ltd
Priority date: 2019-10-12
Filing date: 2019-10-12
Publication date: 2023-10-31
Anticipated expiration: 2039-10-12
Also published as: CN110837846A

Abstract

A method for constructing an image recognition model, an image recognition method, and a device. The construction method comprises: acquiring sample images and training on them to obtain a first network model and a second network model for image recognition; calculating the maximum mean discrepancy (MMD) between the first network model and the second network model to form a loss function; and performing knowledge distillation on the first network model according to the loss function, and training the second network model with the distilled image recognition information to obtain the corresponding image recognition model. Because the loss function is built from the maximum mean discrepancy, which measures the difference between the two models, knowledge distillation remains possible even when their feature layers have different dimensions. This ensures that the second network model learns effectively from the first, makes multidimensional feature processing possible, and overcomes a limitation on applying knowledge distillation in image recognition.

Description

Image recognition model construction method, image recognition method and device

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a method for constructing an image recognition model, and an image recognition method and apparatus.

Background

A complex neural network model is either an ensemble of several individual models or a single very large model trained under strong constraints. Once the complex model has been trained, a different kind of training, called "distillation", can be used to extract from it the reduced model that is to be deployed at the application end. This is the origin of the concept of distilling neural networks.

Hinton's paper "Distilling the Knowledge in a Neural Network" first proposed the concept of knowledge distillation, in which knowledge is transferred by introducing a teacher network that guides the training of a student network. The idea resembles transfer learning, but the implementation differs, and the term "distillation" describes the process vividly. Knowledge distillation transfers the "dark knowledge" in a complex model (the teacher model) to a simple model (the student model). In general, the teacher model is powerful and performs well, while the student model is more compact. Knowledge distillation aims to bring the student model as close as possible to, or even beyond, the teacher model, achieving similar predictive performance at lower complexity. The teacher model is typically large, structurally complex, computationally expensive, and accurate, for example a ResNet-100 model with accuracy up to 99% and a size of 200M; the student model is smaller, simpler, cheaper to run, and less accurate, for example a MobileNet model with 60% accuracy and a size of 20M.

The current approach to knowledge distillation is as follows. The teacher model cannot be used in practical applications because of constraints such as model size and inference time, while the student model performs too poorly for practical scenarios. To give the student model performance comparable to the teacher model while keeping its own structure, knowledge distillation has the student model learn the class probabilities output by the teacher model. Specifically, in a general classification task, both models output, for any input, the probability that the input belongs to each class; if both models perform well, their outputs for the same input should agree. Therefore, during training the student model can refer not only to the true label of the input (the hard label) but also to the output of the teacher model (the soft label), with the goal that, once training is complete, the outputs of the two models are as close as possible. Face recognition, however, is a special case: the number of classes is often in the hundreds of thousands to millions or even more, so distilling knowledge directly at the classification layer can consume too much GPU memory and make convergence difficult. For face recognition, knowledge distillation is therefore usually performed at the feature layer.
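The hard-label/soft-label scheme described above is commonly written as a weighted sum of a cross-entropy on the true class and a cross-entropy against the teacher's temperature-softened class probabilities. The following is a minimal Python sketch of that classification-layer formulation, assuming the temperature `temperature` and weight `alpha` of Hinton-style distillation; neither value is given in this document, and the function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a temperature above 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, hard_label, temperature=4.0, alpha=0.5):
    """Hinton-style distillation loss (temperature and alpha are assumed values)."""
    # Hard-label term: cross-entropy against the ground-truth class.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[hard_label])
    # Soft-label term: cross-entropy against the teacher's softened probabilities,
    # scaled by T^2 so its gradient magnitude stays comparable to the hard term.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(soft_teacher, soft_student))
    return alpha * hard + (1 - alpha) * soft * temperature ** 2


teacher = [6.0, 2.0, -1.0]   # hypothetical teacher logits for 3 classes
student = [3.0, 1.0, 0.0]    # hypothetical student logits for the same input
print(kd_loss(student, teacher, hard_label=0))
```

Both terms compare class-probability vectors entry by entry, so each model needs one output per class; this is why the formulation becomes expensive when the number of classes reaches the hundreds of thousands.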

In addition, the student model uses cross-entropy as the loss function when learning from the hard and soft labels, which requires the feature layers of the two models to have equal dimensions. In image recognition, knowledge distillation can therefore be performed only when the feature layers of the teacher model and the student model have the same dimension, which limits the application of knowledge distillation in this field and hinders technical development.
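To make the dimension constraint concrete, here is a minimal Python sketch, assuming the two feature-layer outputs are compared entry by entry after softmax, as a cross-entropy loss does. The helper name is hypothetical. An element-wise loss has no defined value when the teacher and student feature vectors differ in length, so the sketch rejects that case explicitly instead of letting `zip` silently drop the extra dimensions.

```python
import math


def softmax(values):
    """Numerically stable softmax over a feature vector."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def feature_cross_entropy(teacher_features, student_features):
    """Element-wise cross-entropy between softmaxed feature vectors (hypothetical helper)."""
    if len(teacher_features) != len(student_features):
        # Dimension i of one model has no counterpart in the other model.
        raise ValueError(
            f"feature dims differ: teacher={len(teacher_features)}, "
            f"student={len(student_features)}"
        )
    p = softmax(teacher_features)
    q = softmax(student_features)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


print(feature_cross_entropy([0.5] * 256, [0.5] * 256))  # equal dims: defined
try:
    feature_cross_entropy([0.5] * 512, [0.5] * 256)  # 512-d teacher, 256-d student
except ValueError as err:
    print(err)
```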

Disclosure of Invention

The technical problem addressed by the present application is how to overcome this limitation of knowledge distillation in image recognition. A technical solution is provided that performs knowledge distillation even when the feature layers of the teacher model and the student model have different dimensions.

According to a first aspect, an embodiment provides a method for constructing an image recognition model, comprising: acquiring sample images and training on them to obtain a first network model and a second network model for image recognition, where the first network model has a greater knowledge capacity for image recognition information and a higher feature-layer dimension than the second network model; calculating the maximum mean discrepancy between the first network model and the second network model to form a loss function; and performing knowledge distillation on the first network model according to the loss function, and training the second network model with the distilled image recognition information to obtain the corresponding image recognition model.

Calculating the maximum mean discrepancy between the first network model and the second network model to form a loss function comprises: applying normalized exponential (softmax) processing to the feature information output by the feature layer of the first network model to obtain a first probability distribution X, and applying the same processing to the feature information output by the feature layer of the second network model to obtain a second probability distribution Y; and calculating the maximum mean discrepancy from X and Y as

$$\mathrm{MMD}(X,Y)=\left\|\frac{1}{n}\sum_{i=1}^{n}\phi(x_i)-\frac{1}{m}\sum_{j=1}^{m}\phi(y_j)\right\|_{H}^{2}$$

where n and i are the feature-layer dimension and dimension index of the first network model, x is the feature information output by the feature layer of the first network model, m and j are the feature-layer dimension and dimension index of the second network model, y is the feature information output by the feature layer of the second network model, H is the feature space, φ(·) is a mapping function, and ‖·‖_H is the norm in H. A loss function is formed from MMD(X, Y) and is used to measure the difference between the feature layers of the first and second network models.

After the maximum mean discrepancy is obtained, the method further comprises an optimization step: expanding MMD(X, Y) to obtain

$$\mathrm{MMD}'(X,Y)=\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{i'=1}^{n}\phi(x_i)^{\top}\phi(x_{i'})-\frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\phi(x_i)^{\top}\phi(y_j)+\frac{1}{m^{2}}\sum_{j=1}^{m}\sum_{j'=1}^{m}\phi(y_j)^{\top}\phi(y_{j'})$$

transforming the expanded result MMD'(X, Y) with a kernel function k(·) to obtain

$$\mathrm{MMD}''(X,Y)=\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{i'=1}^{n}k(x_i,x_{i'})-\frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}k(x_i,y_j)+\frac{1}{m^{2}}\sum_{j=1}^{m}\sum_{j'=1}^{m}k(y_j,y_{j'})$$

and taking the transformed result MMD''(X, Y) as the loss function.

The kernel function k(·) is a Gaussian kernel or a polynomial kernel, expressed respectively as

$$k(u,v)=\exp\left(-\frac{\|u-v\|^{2}}{2\sigma^{2}}\right)$$

$$k(u,v)=\left(\langle u,v\rangle+R\right)^{d}$$

where u and v are the inputs of the kernel function, exp(·) denotes the exponential function, ⟨u, v⟩ denotes the inner product, and σ, R and d are the variance (bandwidth) parameter, the constant term, and the degree, respectively.
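The two kernels can be written directly in code. A minimal Python sketch, assuming u and v are equal-length sequences of floats and using illustrative default values for σ, R and d, which the document does not fix:

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / (2 * sigma^2)); sigma is an assumed default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(u, v, R=1.0, d=2):
    """k(u, v) = (<u, v> + R) ** d; R and d are assumed defaults."""
    inner = sum(a * b for a, b in zip(u, v))
    return (inner + R) ** d


print(gaussian_kernel([0.2, 0.3], [0.2, 0.3]))    # 1.0 (identical inputs)
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0]))  # 144.0, i.e. (11 + 1) ** 2
```

The Gaussian kernel is bounded in (0, 1] and decays with distance, while the polynomial kernel grows with the inner product; σ, R and d would have to be tuned for real feature distributions.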

Performing knowledge distillation on the first network model according to the loss function and training the second network model with the distilled image recognition information to obtain the corresponding image recognition model comprises: determining whether the loss function has converged; if not, continuing to transfer knowledge from the first network model to the second network model through knowledge distillation, and training the second network model with the probability distribution information and image recognition information obtained by the knowledge transfer; and once the loss function has converged, taking the trained second network model as the image recognition model.

According to a second aspect, an embodiment provides an image recognition method, comprising: acquiring an image of an object to be detected; extracting feature information from the image according to a pre-constructed image recognition model obtained by the construction method of the first aspect; and identifying the object to be detected according to the extracted feature information.

According to a third aspect, an embodiment provides an image recognition device, comprising: an image acquisition unit for acquiring an image of an object to be detected; a feature extraction unit for extracting feature information from the image according to a pre-constructed image recognition model obtained by the construction method of the first aspect; and an identification unit for identifying the object to be detected according to the extracted feature information.

The image recognition device further comprises a model construction unit connected to the feature extraction unit. The model construction unit comprises: a first training module for acquiring sample images and training on them to obtain a first network model and a second network model for image recognition, where the first network model has a greater knowledge capacity for image recognition information and a higher feature-layer dimension than the second network model; a calculation module for calculating the maximum mean discrepancy between the first and second network models to form a loss function; and a second training module for performing knowledge distillation on the first network model according to the loss function and training the second network model with the distilled image recognition information to obtain the corresponding image recognition model.

The model construction unit further comprises a judging module and an output module. The judging module determines whether the loss function has converged; if not, the second training module continues to transfer knowledge from the first network model to the second network model through knowledge distillation and trains the second network model with the probability distribution information and image recognition information obtained by the transfer. If the judging module determines that the loss function has converged, the output module takes the second network model trained by the second training module as the image recognition model and outputs it.

According to a fourth aspect, an embodiment provides a computer-readable storage medium storing a program executable by a processor to implement the method of the first or second aspect.

The beneficial effects of the application are as follows:

According to the above embodiments, the method and device for constructing an image recognition model involve: acquiring sample images and training on them to obtain a first network model and a second network model for image recognition, where the first network model has a greater knowledge capacity for image recognition information and a higher feature-layer dimension than the second network model; calculating the maximum mean discrepancy between the first and second network models to form a loss function; and performing knowledge distillation on the first network model according to the loss function, and training the second network model with the distilled image recognition information to obtain the corresponding image recognition model.

First, because the loss function is built from the maximum mean discrepancy, which measures the difference between the first and second network models, knowledge distillation can still be performed when the feature layers have different dimensions. This ensures that the second network model learns effectively from the first, makes multidimensional feature processing possible, and overcomes a limitation on applying knowledge distillation in image recognition.

Second, knowledge distillation slims down the first network model, yielding a simplified image recognition model without sacrificing recognition accuracy. The resulting model is less complex than the original first network model and performs better overall than the original second network model, which facilitates practical application.

Third, the image recognition method can extract feature information from images quickly and accurately with this image recognition model, avoiding the heavy use of computing resources that recognizing images with the first network model would entail and effectively improving recognition efficiency.

Drawings

FIG. 1 is a flow chart of a method for constructing an image recognition model in the present application;

FIG. 2 is a flow chart of forming a loss function based on the maximum mean discrepancy;

FIG. 3 is a flow chart of the optimization calculation step for the maximum mean discrepancy;

FIG. 4 is a flow chart of training a second network model to obtain a corresponding image recognition model;

FIG. 5 is a schematic diagram of training to obtain a first network model and a second network model;

FIG. 6 is a flow chart of an image recognition method of the present application;

FIG. 7 is a schematic structural diagram of an image recognition device in the present application.

Detailed Description

The present application is described in further detail below through specific embodiments with reference to the drawings. Like elements in different embodiments use associated reference numerals. In the following embodiments, many specific details are given so that the present application can be better understood. However, those skilled in the art will readily recognize that some of these features may be omitted, or replaced by other elements, materials, or methods, in different situations. In some instances, operations related to the present application are not shown or described in the specification, so as not to obscure its core; a detailed description of these operations is unnecessary for those skilled in the art, who can understand them fully from the description and general technical knowledge in the field.

Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. The steps or actions in the method descriptions may also be reordered or modified in ways apparent to those of ordinary skill in the art. The orders shown in the description and drawings therefore serve only to describe certain embodiments clearly and do not imply a required order unless otherwise stated.

Ordinal terms used for components, such as "first" and "second", serve only to distinguish the described objects and carry no sequential or technical meaning. Unless otherwise stated, the term "coupled" includes both direct and indirect coupling.

Embodiment 1

Referring to fig. 1, the present application discloses a method for constructing an image recognition model, which includes steps S100-S300, respectively described below.

Step S100, acquiring sample images and training on them to obtain a first network model and a second network model for image recognition, where the first network model has a greater knowledge capacity for image recognition information and a higher feature-layer dimension than the second network model. The sample images may be multiple frames of one or more recognition objects, such as face, animal, vehicle, plant, or building images.

In a specific embodiment, referring to FIG. 5, the sample images are input into a complex neural network, which is trained with them as the training set to obtain the first network model; the complex neural network may include multiple convolution layers (convolution layer 1 … convolution layer l₁), a feature layer, and a classification layer. The sample images are also input into a simple neural network, which is trained with them as the training set to obtain the second network model; the simple neural network may include multiple convolution layers (convolution layer 1 … convolution layer l₂, with l₂ ≪ l₁), a feature layer, and a classification layer. The convolution layers apply successive convolutions to the sample image. The feature layer is a network structure characteristic of image recognition scenarios: it is usually the fully connected layer that follows the convolution operations, it holds the feature data (information) extracted from the sample image, and its dimension is typically 256 or 512. The classification layer depends on the specific classification task; its dimension equals the number of classes, which is high in image recognition scenarios. Since convolution layers, feature layers, and classification layers are common structures in artificial neural networks, they are not described in detail here.

Note that the first network model can also be called the teacher model and the second network model the student model. Compared with the second network model, the first network model has more network layers, a larger model size, higher complexity, and lower computational efficiency, but it has a greater knowledge capacity for image recognition information and a higher feature-layer dimension, and therefore a better ability to extract image features.

Step S200, calculating the maximum mean difference between the first network model and the second network model to form a loss function. In one embodiment, see FIG. 2, the step S200 may include steps S210-S230, each of which is described below.

Step S210, applying normalized exponential processing to the feature information output by the feature layer of the first network model to obtain a first probability distribution X, and to the feature information output by the feature layer of the second network model to obtain a second probability distribution Y.

Note that normalized exponential processing here means converting a multi-class output into probabilities with an exponential function followed by normalization: the exponential function maps each output into the range (0, +∞), and normalization then yields approximate probabilities. It is typically implemented with the softmax function (normalized exponential function), which "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z) whose elements each lie in (0, 1) and sum to 1. Applied to a multi-class problem, the softmax function maps the outputs of multiple neurons into the interval (0, 1), so that classification can be treated from a probabilistic perspective. Since normalized exponential processing is prior art, it is not described in detail here.
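Step S210 can be sketched in a few lines of Python. This is an illustration under stated assumptions: the feature values below are hypothetical, and the widths (8 for the teacher, 4 for the student) stand in for realistic sizes such as 512 and 256. The point is that X and Y are each valid probability distributions even though their lengths differ.

```python
import math


def normalized_exponential(features):
    """Softmax: map a real-valued feature vector to a probability distribution."""
    top = max(features)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(f - top) for f in features]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical feature-layer outputs: the teacher has a wider feature layer than the student.
teacher_features = [0.9, -1.2, 0.3, 2.1, -0.4, 0.0, 1.5, -2.2]  # n = 8
student_features = [1.1, -0.7, 0.6, 1.8]                        # m = 4

X = normalized_exponential(teacher_features)
Y = normalized_exponential(student_features)
print(len(X), round(sum(X), 6))
print(len(Y), round(sum(Y), 6))
```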

Step S220, calculating the maximum mean discrepancy from the first probability distribution X and the second probability distribution Y, expressed as

$$\mathrm{MMD}(X,Y)=\left\|\frac{1}{n}\sum_{i=1}^{n}\phi(x_i)-\frac{1}{m}\sum_{j=1}^{m}\phi(y_j)\right\|_{H}^{2}$$

where n and i are the feature-layer dimension and dimension index of the first network model, and x is the feature information output by the feature layer of the first network model; m and j are the feature-layer dimension and dimension index of the second network model, and y is the feature information output by the feature layer of the second network model; H is the feature space, φ(·) is a mapping function, and ‖·‖_H is the norm in H.

Note that the MMD (Maximum Mean Discrepancy) measures the similarity between two distributions, mainly the distance between two different but related distributions, and is often used in transfer learning to measure the difference between the source domain and the target domain. The underlying idea is that, given enough samples from two distributions, if for every function f in a sufficiently rich class defined on the sample space the mean of f over the samples of one distribution equals its mean over the samples of the other, the two distributions can be regarded as identical. The MMD is the largest such difference in means; mapping the samples into the feature space H with φ(·) makes it computable as the distance between the mean embeddings of the two sample sets.
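The definition can be evaluated literally once a feature map φ is chosen. A minimal Python sketch, assuming (as the symbol definitions of step S220 suggest) that each dimension of X and Y is treated as one scalar sample, computing the MMD in squared form, and using a hypothetical explicit map φ(t) = (t, t²) purely for illustration; in practice φ stays implicit and is replaced by a kernel. The example shows that X and Y may have different lengths, that reordering the same samples gives an MMD of (numerically) zero, and that different distributions give a positive value.

```python
def phi(t):
    """Hypothetical explicit feature map for a scalar sample: t -> (t, t^2)."""
    return (t, t * t)


def mean_embedding(samples):
    """Average of phi over the samples: (1/n) * sum_i phi(x_i)."""
    mapped = [phi(s) for s in samples]
    return [sum(column) / len(samples) for column in zip(*mapped)]


def mmd_direct(xs, ys):
    """Squared distance between mean embeddings; len(xs) and len(ys) may differ."""
    mx = mean_embedding(xs)
    my = mean_embedding(ys)
    return sum((a - b) ** 2 for a, b in zip(mx, my))


X = [0.05, 0.10, 0.15, 0.20, 0.25, 0.25]          # n = 6, sums to 1
X_shuffled = [0.25, 0.20, 0.05, 0.25, 0.15, 0.10]  # same samples, new order
Y = [0.40, 0.30, 0.20, 0.10]                      # m = 4, sums to 1
print(mmd_direct(X, X_shuffled))
print(mmd_direct(X, Y))
```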

In step S230, a loss function is formed according to the maximum mean difference MMD (X, Y), and the loss function is mainly used for measuring the difference between the feature layer of the first network model and the feature layer of the second network model.

In a preferred embodiment, referring to FIG. 3, an optimization calculation step follows the calculation of the maximum mean discrepancy in step S220. It may include steps S221 to S223, described below.

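Step S221, expanding the maximum mean discrepancy MMD(X, Y) to obtain

$$\mathrm{MMD}'(X,Y)=\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{i'=1}^{n}\phi(x_i)^{\top}\phi(x_{i'})-\frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\phi(x_i)^{\top}\phi(y_j)+\frac{1}{m^{2}}\sum_{j=1}^{m}\sum_{j'=1}^{m}\phi(y_j)^{\top}\phi(y_{j'})$$

where x_i and x_{i'} both denote feature information output by the feature layer of the first network model, and y_j and y_{j'} likewise denote feature information output by the feature layer of the second network model.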
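The expansion in step S221 is just the squared norm ‖a − b‖² = ⟨a, a⟩ − 2⟨a, b⟩ + ⟨b, b⟩ applied to the two mean embeddings. The Python sketch below checks this numerically, assuming scalar samples and the same hypothetical explicit map φ(t) = (t, t²); the function names are illustrative.

```python
def phi(t):
    """Hypothetical explicit feature map for a scalar sample: t -> (t, t^2)."""
    return (t, t * t)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mmd_direct(xs, ys):
    """|| (1/n) sum phi(x_i) - (1/m) sum phi(y_j) ||^2 from the mean embeddings."""
    mx = [sum(c) / len(xs) for c in zip(*map(phi, xs))]
    my = [sum(c) / len(ys) for c in zip(*map(phi, ys))]
    diff = [a - b for a, b in zip(mx, my)]
    return dot(diff, diff)


def mmd_expanded(xs, ys):
    """MMD'(X, Y): the same quantity written as three double sums of inner products."""
    n, m = len(xs), len(ys)
    xx = sum(dot(phi(a), phi(b)) for a in xs for b in xs) / n ** 2
    xy = sum(dot(phi(a), phi(b)) for a in xs for b in ys) / (n * m)
    yy = sum(dot(phi(a), phi(b)) for a in ys for b in ys) / m ** 2
    return xx - 2 * xy + yy


X = [0.05, 0.10, 0.15, 0.20, 0.25, 0.25]
Y = [0.40, 0.30, 0.20, 0.10]
print(mmd_direct(X, Y), mmd_expanded(X, Y))
```

Every term of the expanded form is an inner product of two mapped samples, which is what allows step S222 to substitute a kernel for it.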

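Step S222, transforming the expanded result MMD'(X, Y) with the kernel function k(·) to obtain

$$\mathrm{MMD}''(X,Y)=\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{i'=1}^{n}k(x_i,x_{i'})-\frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}k(x_i,y_j)+\frac{1}{m^{2}}\sum_{j=1}^{m}\sum_{j'=1}^{m}k(y_j,y_{j'})$$

In this embodiment the kernel function k(·) is a Gaussian kernel or a polynomial kernel, expressed respectively as

$$k(u,v)=\exp\left(-\frac{\|u-v\|^{2}}{2\sigma^{2}}\right)$$

$$k(u,v)=\left(\langle u,v\rangle+R\right)^{d}$$

where σ, R and d are the variance (bandwidth) parameter, the constant term, and the degree, respectively.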

Note that a kernel function expresses the inner product of mapped vectors. The mapping function φ(·) is only a mapping relationship and does not by itself raise the dimension, but the properties of kernel functions can be used to construct kernels that correspond to a higher-dimensional mapping. Here k(·) replaces the explicit inner products ⟨φ(x_i), φ(x_i')⟩ and ⟨φ(y_j), φ(y_j')⟩ (and likewise ⟨φ(x_i), φ(y_j)⟩), so that the value of the whole expression is computed directly, which improves computational efficiency. The reason is that the feature space may have a very high dimension, so computing the inner product there directly is expensive, whereas a function K(x, x') on the low-dimensional input space can equal that inner product exactly, i.e., K(x, x') = ⟨φ(x), φ(x')⟩. The inner product after the nonlinear transformation is then obtained directly from K(x, x') without carrying out the complex nonlinear transformation, which greatly simplifies the calculation.
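The identity K(x, x') = ⟨φ(x), φ(x')⟩ can be checked numerically for the polynomial kernel. For scalar inputs and d = 2, (uv + R)² expands to u²v² + 2Ruv + R², which is the inner product of φ(u) = (u², √(2R)·u, R) with φ(v). The Python sketch below (names are illustrative) confirms that evaluating k directly gives the same number as mapping both inputs and taking the inner product; only the second route ever forms φ. For the Gaussian kernel the corresponding φ is infinite-dimensional, which is where evaluating k directly becomes indispensable.

```python
import math


def poly_kernel(u, v, R=1.0):
    """Degree-2 polynomial kernel on scalars: (u * v + R) ** 2."""
    return (u * v + R) ** 2


def phi(t, R=1.0):
    """Explicit feature map whose inner product equals poly_kernel (degree 2, scalar input)."""
    return (t * t, math.sqrt(2.0 * R) * t, R)


def explicit_inner(u, v, R=1.0):
    """<phi(u), phi(v)>: map both inputs first, then take the inner product."""
    return sum(a * b for a, b in zip(phi(u, R), phi(v, R)))


for u, v in [(0.3, 0.7), (-1.5, 2.0), (0.125, 0.25)]:
    print(poly_kernel(u, v), explicit_inner(u, v))
```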

In step S223, the transformed result MMD''(X, Y) is used as the loss function; that is, MMD''(X, Y) measures the difference between the feature layer of the first network model and the feature layer of the second network model, and the magnitude of the maximum mean discrepancy characterizes how different the two are even though their dimensions differ.
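The loss of step S223 only needs kernel evaluations between scalar samples, so X and Y can come from feature layers of different widths. A minimal Python sketch, assuming the Gaussian kernel with an illustrative bandwidth σ = 0.5 and the biased estimator given by the double sums of step S222; the function names and input values are hypothetical.

```python
import math


def gaussian_kernel(u, v, sigma=0.5):
    """Gaussian kernel on scalar samples (sigma is an assumed bandwidth)."""
    return math.exp(-((u - v) ** 2) / (2.0 * sigma ** 2))


def mmd_loss(xs, ys, kernel=gaussian_kernel):
    """MMD''(X, Y): kernelised squared MMD; xs and ys may have different lengths."""
    n, m = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (n * n)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (n * m)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (m * m)
    return k_xx - 2.0 * k_xy + k_yy


def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical feature-layer outputs of different widths (n = 8, m = 4).
X = softmax([0.9, -1.2, 0.3, 2.1, -0.4, 0.0, 1.5, -2.2])
Y = softmax([1.1, -0.7, 0.6, 1.8])
print(mmd_loss(X, Y))
print(mmd_loss(X, X))  # identical inputs give zero
```

The cost is O((n + m)²) kernel evaluations per pair of feature vectors, which stays small for feature layers of a few hundred dimensions.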

Step S300, performing knowledge distillation on the first network model according to the loss function MMD''(X, Y), and training the second network model with the distilled image recognition information to obtain the corresponding image recognition model. In one embodiment, referring to FIG. 4, step S300 may include steps S310 to S340, described below.

Step S310, performing knowledge distillation on the first network model according to the loss function, thereby transferring knowledge between the first network model and the second network model.

Step S320, training the second network model with the probability distribution information and image recognition information obtained by the knowledge transfer, and using the loss function to determine how well the second network model, with its reduced feature dimension, has learned from the first network model.

Step S330, determining whether the loss function has converged; if so, proceed to step S340, otherwise return to step S310. After returning to step S310, knowledge transfer between the first and second network models continues through knowledge distillation, and the second network model is trained with the probability distribution information and image recognition information obtained by the transfer.

Step S340, taking the trained second network model as the image recognition model once the loss function has converged.
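The loop of steps S310 to S340 (train, check convergence, repeat, then keep the student) can be illustrated with a toy example. This Python sketch rests on assumptions that do not come from the document: the "student" is reduced to four free logits rather than a network, only the MMD term is optimized (no hard-label term), gradients are estimated by finite differences, and convergence is declared when the loss improvement falls below a hypothetical tolerance `tol`.

```python
import math


def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_kernel(u, v, sigma=0.5):
    return math.exp(-((u - v) ** 2) / (2.0 * sigma ** 2))


def mmd_loss(xs, ys):
    n, m = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b) for a in xs for b in xs) / (n * n)
    k_xy = sum(gaussian_kernel(a, b) for a in xs for b in ys) / (n * m)
    k_yy = sum(gaussian_kernel(a, b) for a in ys for b in ys) / (m * m)
    return k_xx - 2.0 * k_xy + k_yy


def train_student(teacher_probs, student_logits, lr=0.5, tol=1e-10, max_steps=500, eps=1e-5):
    """Toy distillation loop: adjust student logits until the MMD loss converges.

    Gradients use central finite differences; a real model would backpropagate
    through the whole student network instead.
    """
    params = list(student_logits)
    loss = mmd_loss(teacher_probs, softmax(params))
    for _ in range(max_steps):
        grad = []
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += eps
            down[i] -= eps
            grad.append((mmd_loss(teacher_probs, softmax(up))
                         - mmd_loss(teacher_probs, softmax(down))) / (2 * eps))
        trial = [p - lr * g for p, g in zip(params, grad)]
        new_loss = mmd_loss(teacher_probs, softmax(trial))
        if new_loss < loss:
            converged = loss - new_loss < tol  # S330: has the loss stopped improving?
            params, loss = trial, new_loss
            if converged:
                break  # S340: keep the trained student
        else:
            lr *= 0.5  # step overshot: shrink it and retry
            if lr < 1e-12:
                break
    return params, loss


teacher = softmax([0.9, -1.2, 0.3, 2.1, -0.4, 0.0, 1.5, -2.2])  # n = 8
start = [2.0, -1.0, 0.5, 0.0]                                    # m = 4
initial = mmd_loss(teacher, softmax(start))
final_params, final = train_student(teacher, start)
print(initial, final)
```

Accepting a step only when it lowers the loss keeps the toy loop monotone; with a real network, the same convergence test would typically be applied to a running average of the loss over mini-batches.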

Note that the first network model (teacher model) can be regarded as a complex neural network. In the distillation metaphor, the complex network structure is the impurity, and extracting the probability distribution from it is the distillation; the probability distribution is then used to guide the training of the second network model (the student model, i.e., the simplified neural network), and this learned probability distribution is the essence. After distillation, the simplified network learns the probability distribution of the complex network, and the loss function supervises this learning process.

In this embodiment, the loss function is formed from the maximum mean discrepancy and measures the difference between the first and second network models. Through this training process, the second network model is trained not only on the hard labels but also on the soft labels output by the first network model. Soft labels have higher information entropy and therefore carry more information, which makes them more useful for training the second network model. As a result, the second network model achieves the same or similar performance as the first network model while being much smaller.

Those skilled in the art will appreciate that constructing the image recognition model through steps S100 to S300 has several technical advantages. On the one hand, because the loss function is formed from the maximum mean discrepancy, which measures the difference between the first and second network models, knowledge distillation can still be performed when the feature layers have different dimensions; this ensures that the second network model learns effectively from the first, makes multidimensional feature processing possible, and avoids limiting the application of knowledge distillation in image recognition. On the other hand, knowledge distillation slims down the first network model, yielding a simplified image recognition model without sacrificing recognition accuracy; the resulting model is less complex than the original first network model and performs better overall than the original second network model, which makes it more suitable for practical application.

Embodiment 2

Referring to FIG. 6, the present application further provides an image recognition method based on the method for constructing an image recognition model in Embodiment 1. The image recognition method includes steps S410 to S430, described below.

In step S410, an image of the object to be detected is acquired. In a specific embodiment, the image may be captured with a video camera, a still camera, or similar equipment, and the object to be detected may be a human face, an animal, a plant, a vehicle, a building, or the like.

Step S420, extracting characteristic information in the image of the object to be detected according to the pre-constructed image recognition model.

Note that the image recognition model is obtained by the construction method disclosed in Embodiment 1; see steps S100 to S300 for details, which are not repeated here. Moreover, extracting feature information (such as feature vectors) from an image with an already-trained model is widely used in current image processing and can be done by a person skilled in the art without inventive effort, so the application of the image recognition model is not described in detail here.

Step S430, identifying the object to be detected according to the feature information extracted in step S420.

For example, if the object to be detected is a Chinese person, that person's facial feature information can be extracted with the constructed image recognition model and matched against a database through large-scale data operations. When the matching score exceeds a preset threshold, the face is considered highly similar to the matched face in the database, and the two faces are judged to belong to the same person, thereby achieving face recognition. Since such data query and matching processes belong to the prior art, they are not described in detail here.
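The threshold-based matching described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the source: cosine similarity is used as the matching score, the database is a plain dictionary of hypothetical identities mapped to feature vectors, and the threshold of 0.8 is arbitrary. The names `cosine_similarity` and `match_face` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)

def match_face(query, gallery, threshold=0.8):
    """Return (identity, score) of the best gallery match when its score exceeds
    the threshold; otherwise return (None, best_score)."""
    best_id, best_score = None, -1.0
    for identity, feature in gallery.items():
        score = cosine_similarity(query, feature)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score > threshold:
        return best_id, best_score
    return None, best_score

# Hypothetical database of extracted facial features.
gallery = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
identity, score = match_face([0.9, 0.1, 0.0], gallery)
```

Here the query is judged to be `person_a`; a query whose best score stays below the threshold returns `None`, i.e. no identity is asserted. In practice the score function and threshold would be chosen to suit the features produced by the image recognition model.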

Those skilled in the art will appreciate that the object to be detected can be identified through steps S410-S430 above. The image recognition method of the present application uses the image recognition model to extract feature information from images quickly and accurately. It avoids the heavy consumption of computing resources that would result from performing recognition with the first network model, and thus effectively improves recognition efficiency.

Third embodiment

Referring to fig. 7, on the basis of the image recognition method disclosed in the second embodiment, the present application also correspondingly discloses an image recognition apparatus 1, which mainly includes an image acquisition unit 11, a feature extraction unit 12, and a recognition unit 13, which are described below.

The image acquisition unit 11 is configured to acquire an image of an object to be detected. Specifically, the image acquisition unit 11 may acquire the image through an image capturing device such as a video camera or a still camera, or even from media video. For the specific function of the image acquisition unit 11, refer to step S410 in the second embodiment; it is not described in detail here.

The feature extraction unit 12 is configured to extract feature information from the image of the object to be detected according to a pre-constructed image recognition model. The image recognition model is obtained by the construction method disclosed in the first embodiment; for the specific function of the feature extraction unit 12, refer to step S420 in the second embodiment, which is not repeated here.

The recognition unit 13 is connected to the feature extraction unit 12 and is configured to identify the object to be detected according to the extracted feature information. For the specific function of the recognition unit 13, refer to step S430 in the second embodiment; it is not described in detail here.
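The composition of units 11 to 13 can be sketched as a small pipeline. This is a hypothetical illustration, not the apparatus itself: each unit is modelled as a plain callable, an "image" is simplified to a flat list of pixel values, and the class name `ImageRecognitionApparatus` and the toy units in the usage example are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class ImageRecognitionApparatus:
    """Hypothetical wiring of units 11-13; each unit is a plain callable."""
    acquire: Callable[[], Sequence[float]]                  # image acquisition unit 11
    extract: Callable[[Sequence[float]], Sequence[float]]   # feature extraction unit 12
    recognize: Callable[[Sequence[float]], Optional[str]]   # recognition unit 13

    def run(self) -> Optional[str]:
        image = self.acquire()             # step S410
        features = self.extract(image)     # step S420
        return self.recognize(features)    # step S430

# Toy units standing in for a camera, the distilled model, and database matching.
apparatus = ImageRecognitionApparatus(
    acquire=lambda: [0.9, 0.8, 0.7],
    extract=lambda img: [sum(img) / len(img)],
    recognize=lambda f: "bright" if f[0] > 0.5 else "dark",
)
result = apparatus.run()
```

Passing the units in as callables keeps the feature extraction unit independent of how the image recognition model was built, so a model produced by the model construction unit can be swapped in without changing the other units.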

Referring again to fig. 7, the image recognition apparatus 1 of the present embodiment further includes a model construction unit 14 connected to the feature extraction unit 12; the model construction unit 14 is configured to construct the image recognition model by a knowledge distillation method.

In a specific embodiment, referring to fig. 7, the model construction unit 14 includes a first training module 141, a calculation module 142, and a second training module 143, each of which is described below.

The first training module 141 is configured to acquire sample images and to train, with the sample images, a first network model and a second network model for image recognition, where the first network model exceeds the second network model both in knowledge capacity for image recognition information and in feature-layer dimension.

The calculation module 142 is configured to calculate a maximum mean difference between the first network model and the second network model to form a loss function.
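The calculation performed by the calculation module 142 can be sketched as follows, under stated assumptions. Following the variable definitions in claim 1, each dimension of the softmax-normalized feature-layer output is treated as one scalar sample, so X has n samples and Y has m samples and the two sets may differ in size. The Gaussian kernel, its bandwidth `sigma=0.1`, the biased (V-statistic) estimator of the squared MMD, and the example feature values are assumptions for illustration, not taken from the source.

```python
import math

def softmax(logits):
    """Normalized exponential; subtracting the max keeps exp() from overflowing."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_kernel(u, v, sigma=0.1):
    """Gaussian (RBF) kernel on scalars; sigma is an assumed bandwidth."""
    return math.exp(-((u - v) ** 2) / (2.0 * sigma ** 2))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased empirical MMD^2 between scalar samples xs (size n) and ys (size m).
    n and m may differ, which is what allows feature layers of different
    dimensions to be compared."""
    n, m = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (n * n)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (m * m)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (n * m)
    return k_xx - 2.0 * k_xy + k_yy

teacher_features = [2.0, 1.0, 0.5, 0.1, -0.3, -1.2]  # hypothetical, n = 6
student_features = [1.5, 0.2, -0.8]                  # hypothetical, m = 3
X = softmax(teacher_features)  # first probability distribution X
Y = softmax(student_features)  # second probability distribution Y
loss = mmd_squared(X, Y)       # value used to form the loss function
```

The loss is zero when the two sample sets coincide and positive otherwise, so it can serve as the distillation objective regardless of whether n equals m.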

The second training module 143 is connected to the calculation module 142 and is configured to perform knowledge distillation on the first network model according to the loss function and to train the second network model with the distilled image recognition information, so as to obtain the corresponding image recognition model.

In addition, the model construction unit 14 includes a judging module 144 and an output module 145. The output module 145 is connected to the second training module 143, and the judging module 144 is connected to the calculation module 142 and the output module 145.

The judging module 144 is configured to judge whether the loss function has converged. If not, the second training module 143 continues knowledge migration between the first network model and the second network model through knowledge distillation, and trains the second network model with the probability distribution information and image recognition information obtained by the knowledge migration. If the judging module 144 determines that the loss function has converged, the output module 145 takes the second network model trained by the second training module 143 as the image recognition model and outputs it.
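The interplay between the second training module, the judging module and the output module can be sketched as a convergence-checked loop. This is a toy under heavy assumptions: a real second network model would be a neural network trained by backpropagation, whereas here the "student" is only a vector of output logits updated by finite-difference gradient descent with backtracking. The Gaussian-kernel MMD on scalar softmax outputs, `sigma=0.1`, the learning rate, the tolerance `tol`, the iteration cap, and the example logits are all illustrative assumptions, and `distill` is a hypothetical name.

```python
import math

def softmax(logits):
    """Normalized exponential (softmax) with max-subtraction for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mmd_squared(xs, ys, sigma=0.1):
    """Biased empirical MMD^2 with a Gaussian kernel on scalar samples."""
    def k(u, v):
        return math.exp(-((u - v) ** 2) / (2.0 * sigma ** 2))
    n, m = len(xs), len(ys)
    return (sum(k(a, b) for a in xs for b in xs) / (n * n)
            - 2.0 * sum(k(a, b) for a in xs for b in ys) / (n * m)
            + sum(k(a, b) for a in ys for b in ys) / (m * m))

def numerical_grad(f, params, eps=1e-6):
    """Central finite-difference gradient of f at params."""
    grad = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad

def distill(teacher_logits, student_logits, lr=0.5, tol=1e-10, max_iters=500):
    """Toy distillation loop; returns (student_logits, final_loss, steps)."""
    target = softmax(teacher_logits)  # first probability distribution X (fixed)

    def loss_of(p):
        return mmd_squared(target, softmax(p))  # Y comes from the student

    params = list(student_logits)
    prev = loss_of(params)
    for step in range(1, max_iters + 1):
        grad = numerical_grad(loss_of, params)
        step_size = lr
        candidate = [p - step_size * g for p, g in zip(params, grad)]
        curr = loss_of(candidate)
        # Backtracking: halve the step until the loss does not increase.
        while curr > prev and step_size > 1e-8:
            step_size *= 0.5
            candidate = [p - step_size * g for p, g in zip(params, grad)]
            curr = loss_of(candidate)
        if curr > prev:
            return params, prev, step  # no improving step found: stop here
        params = candidate
        # Judging step: stop once the loss change falls below tol; the
        # trained student would then be output as the recognition model.
        if prev - curr < tol:
            return params, curr, step
        prev = curr
    return params, prev, max_iters

teacher = [2.0, 1.0, 0.5, 0.1, -0.3, -1.2]  # hypothetical, n = 6
student_init = [0.3, 0.0, -0.2]             # hypothetical, m = 3
student, final_loss, steps = distill(teacher, student_init)
```

The backtracking guarantees the loss never increases, so "has the loss stopped changing" is a meaningful convergence test. A symmetric starting point such as all-zero logits is avoided because it is a stationary point of this toy loss.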

It should be noted that, for the first training module 141 and the calculation module 142, reference may be made to steps S100 and S200 in the first embodiment, and for the specific functions of the second training module 143, the judging module 144 and the output module 145, reference may be made to step S300 in the first embodiment; they are not described again here.

Those skilled in the art will appreciate that all or part of the functions of the methods in the above embodiments may be implemented by hardware or by a computer program. When they are implemented by a computer program, the program may be stored in a computer-readable storage medium, such as a read-only memory, a random access memory, a magnetic disk, an optical disk or a hard disk, and the above functions are realized when a computer executes the program. For example, the program is stored in the memory of the device, and all or part of the above functions are realized when a processor executes the program in the memory. In addition, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk or a removable hard disk, and downloaded or copied into the memory of a local device, or used to update the system version of the local device; all or part of the functions in the above embodiments are then realized when a processor executes the program in the memory.

The foregoing description of the application has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the application pertains, based on the idea of the application.

Claims

1. A method for constructing an image recognition model, characterized by comprising the following steps:

acquiring a sample image, and training with the sample image to obtain a first network model and a second network model for image recognition, wherein the first network model is higher than the second network model in knowledge capacity of image recognition information and in feature-layer dimension;

performing normalized exponential (softmax) processing on the feature information output by the feature layer of the first network model to obtain a first probability distribution X, and performing normalized exponential (softmax) processing on the feature information output by the feature layer of the second network model to obtain a second probability distribution Y;

calculating a maximum mean difference according to the first probability distribution X and the second probability distribution Y, the maximum mean difference being expressed as follows:

$$\mathrm{MMD}(X,Y)=\left\|\frac{1}{n}\sum_{i=1}^{n}\phi(x_i)-\frac{1}{m}\sum_{j=1}^{m}\phi(y_j)\right\|_{H}$$

wherein n and i respectively denote the feature-layer dimension and dimension index of the first network model, x denotes feature information output by the feature layer of the first network model, m and j respectively denote the feature-layer dimension and dimension index of the second network model, y denotes feature information output by the feature layer of the second network model, H denotes the feature space, φ(·) is a mapping function, and ‖·‖_H denotes the norm in H;

forming a loss function from the maximum mean difference MMD(X, Y), the loss function being used to measure the difference between each feature layer of the first network model and each feature layer of the second network model;

performing knowledge distillation processing on the first network model according to the loss function, and performing knowledge migration between the first network model and the second network model through the knowledge distillation processing;

training the second network model with the probability distribution information and image recognition information obtained by knowledge migration, and determining, by means of the loss function, the degree to which the second network model has learned from the first network model after dimension reduction;

judging whether the loss function has converged; if not, continuing knowledge migration between the first network model and the second network model through knowledge distillation, and training the second network model with the probability distribution information and image recognition information obtained by the knowledge migration; if so, taking the trained second network model as the image recognition model.

2. The construction method according to claim 1, further comprising an optimization calculation step after obtaining the maximum mean difference, the optimization calculation step comprising:

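expanding the maximum mean difference MMD(X, Y) to obtain

$$\mathrm{MMD}^2(X,Y)=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{i'=1}^{n}k(x_i,x_{i'})-\frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}k(x_i,y_j)+\frac{1}{m^2}\sum_{j=1}^{m}\sum_{j'=1}^{m}k(y_j,y_{j'})$$

wherein k(u, v) = ⟨φ(u), φ(v)⟩_H is a kernel function.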

3. The construction method according to claim 2, wherein the kernel function k() is a Gaussian kernel function or a polynomial kernel function, respectively expressed as

$k(u,v)=(\langle u,v\rangle+R)^{d}$;

4. An image recognition method, comprising:

acquiring an image of an object to be detected;

extracting characteristic information in the image of the object to be detected according to a pre-constructed image recognition model; the image recognition model is obtained by the construction method according to any one of claims 1 to 3;

and identifying the object to be detected according to the extracted characteristic information.

5. An image recognition apparatus, comprising:

an image acquisition unit for acquiring an image of an object to be detected;

the feature extraction unit is used for extracting feature information in the image of the object to be detected according to a pre-constructed image recognition model; the image recognition model is obtained by the construction method according to any one of claims 1 to 3;

and the identification unit is used for identifying the object to be detected according to the extracted characteristic information.

6. The image recognition apparatus according to claim 5, further comprising a model construction unit connected to the feature extraction unit, the model construction unit comprising:

the first training module is used for acquiring a sample image and training with the sample image to obtain a first network model and a second network model for image recognition, wherein the first network model is higher than the second network model in knowledge capacity of image recognition information and in feature-layer dimension;

the calculation module is used for calculating the maximum mean difference between the first network model and the second network model to form a loss function;

and the second training module is used for performing knowledge distillation on the first network model according to the loss function, and training the second network model with the distilled image recognition information to obtain a corresponding image recognition model.

7. The image recognition apparatus according to claim 6, wherein the model construction unit further comprises a judgment module and an output module;

the judging module is used for judging whether the loss function has converged; if not, the second training module continues knowledge migration between the first network model and the second network model through knowledge distillation, and trains the second network model with the probability distribution information and image recognition information obtained by the knowledge migration;

and if the judging module judges that the loss function reaches convergence, the output module takes the second network model trained by the second training module as the image recognition model and outputs the image recognition model.

8. A computer readable storage medium comprising a program executable by a processor to implement the method of any one of claims 1-3.
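Claims 2 and 3 rest on the standard kernel trick: expanding the squared norm in the maximum mean difference of claim 1 turns every inner product ⟨φ(u), φ(v)⟩ into a kernel evaluation k(u, v), so φ never has to be computed explicitly. The sketch below checks this numerically for the polynomial kernel of claim 3 with the assumed values R = 1 and d = 2, for which an explicit scalar feature map φ(u) = (u², √2·u, 1) exists. The parameter values and sample distributions are hypothetical.

```python
import math

R, D = 1.0, 2  # assumed values for R and d in the polynomial kernel of claim 3

def poly_kernel(u, v):
    """Polynomial kernel (<u, v> + R)^d for scalars u, v."""
    return (u * v + R) ** D

def phi(u):
    """Explicit feature map, valid only for R = 1, d = 2:
    <phi(u), phi(v)> = u^2 v^2 + 2uv + 1 = (uv + 1)^2."""
    return (u * u, math.sqrt(2.0) * u, 1.0)

def mmd_squared_explicit(xs, ys):
    """Squared norm of the difference of mean embeddings (claim 1 form, squared)."""
    mean_x = [sum(c) / len(xs) for c in zip(*(phi(x) for x in xs))]
    mean_y = [sum(c) / len(ys) for c in zip(*(phi(y) for y in ys))]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))

def mmd_squared_kernel(xs, ys):
    """Kernel-expanded form: only kernel evaluations, no explicit phi."""
    n, m = len(xs), len(ys)
    return (sum(poly_kernel(a, b) for a in xs for b in xs) / n ** 2
            - 2.0 * sum(poly_kernel(a, b) for a in xs for b in ys) / (n * m)
            + sum(poly_kernel(a, b) for a in ys for b in ys) / m ** 2)

X = [0.53, 0.20, 0.12, 0.08, 0.05, 0.02]  # hypothetical softmax outputs, n = 6
Y = [0.60, 0.25, 0.15]                    # hypothetical softmax outputs, m = 3
explicit = mmd_squared_explicit(X, Y)
via_kernel = mmd_squared_kernel(X, Y)
```

The two values agree up to floating-point rounding. For the Gaussian kernel the corresponding feature map is infinite-dimensional, which is why the kernel-expanded form is the practical way to compute the loss.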