CN111242217A - Training method and device of image recognition model, electronic equipment and storage medium

Info

Publication number: CN111242217A
Application number: CN202010031241.8A
Authority: CN
Inventors: 郭明宇; 徐崴
Original assignee: Alipay Labs Singapore Pte Ltd
Current assignee: Alipay Labs Singapore Pte Ltd
Priority date: 2020-01-13
Filing date: 2020-01-13
Publication date: 2020-06-05

Abstract

The application discloses a training method and device of an image recognition model, electronic equipment and a storage medium, wherein the method comprises the following steps: initializing a preset feature extraction layer parameter of a neural network; performing feature extraction on sample images in a target sample data set based on the initialized feature extraction layer parameters to obtain sample image features of different categories, wherein the target sample data set comprises sample images of different categories; constructing an image feature matrix based on the sample image features of different classes; initializing full-connection layer parameters of the neural network based on the image feature matrix; and carrying out fine tuning training on the initialized neural network based on the target sample data set to obtain an image recognition model. The method and the device can improve the convergence speed and performance of the image recognition model, finally improve the efficiency of the training process of the whole image recognition model, and reduce the consumption of computing resources of the electronic equipment for executing the model training task.

Description

Training method and device of image recognition model, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for training an image recognition model, an electronic device, and a storage medium.

Background

Training a model on a large amount of sample image data to obtain a highly accurate image recognition model for image recognition tasks is an active research topic in the field of image recognition.

At present, when an image recognition model is trained, an existing image recognition model trained on another image data set is usually used as a base model for fine-tuning (Finetune) training on one's own image data set. That is, the parameters of a pre-created model to be trained are initialized with the parameters of the existing model, and the initialized model is then trained on one's own image data set to obtain an image recognition model that can recognize one's own image data. However, because the image data used to train the existing model differs from one's own image data, performing Finetune training on a model initialized with the parameters of the existing model results in slow convergence and poor model performance, so the training process is time-consuming and consumes a large amount of computing resources.

Disclosure of Invention

An embodiment of the present application provides a training method and apparatus for an image recognition model, an electronic device, and a storage medium, so as to at least solve the problems of slow convergence rate and poor model performance in the image recognition model training in the related art.

In order to solve the technical problem, the embodiment of the application adopts the following technical scheme:

in a first aspect, a training method for an image recognition model is provided, which includes:

initializing a preset feature extraction layer parameter of a neural network;

performing feature extraction on sample images in a target sample data set based on the initialized feature extraction layer parameters to obtain sample image features of different categories, wherein the target sample data set comprises sample images of different categories;

constructing an image feature matrix based on the sample image features of different classes;

initializing full-connection layer parameters of the neural network based on the image feature matrix;

and carrying out fine tuning training on the initialized neural network based on the target sample data set to obtain an image recognition model.

Optionally, the initializing a feature extraction layer parameter of a preset neural network includes:

acquiring a feature extraction layer parameter of a preset original model, wherein the original model is obtained by training based on a source sample data set different from the target sample data set;

initializing the feature extraction layer parameters of the neural network to the feature extraction layer parameters of the original model.

Optionally, the initializing full-connection layer parameters of the neural network based on the image feature matrix includes:

initializing a weight matrix of a fully connected layer of the neural network to the image feature matrix;

initializing the bias matrix of the full connection layer to a preset matrix.

Optionally, the performing feature extraction on the sample images in the target sample data set based on the initialized feature extraction layer parameters to obtain different types of sample image features includes:

for each category in the target sample data set, randomly screening a sample image from all sample images of the category to serve as a category representation image of the category;

performing feature extraction on the category representation images of the categories based on the initialized feature extraction layer parameters to obtain image features of the category representation images of the categories;

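and taking the image features of the category representation image as the sample image features of the category.

Optionally, the performing feature extraction on the sample images in the target sample data set based on the initialized feature extraction layer parameters to obtain sample image features of different categories includes:

for each category in the target sample data set, performing feature extraction on each sample image of the category based on the initialized feature extraction layer parameters to obtain image features of each sample image of the category;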

determining the center of the category based on the image features of the sample images of the category and determining the distance between the sample images of the category and the center of the category;

and screening out the sample image with the minimum distance from all the sample images of the category, and taking the image characteristic of the sample image with the minimum distance as the sample image characteristic of the category.

Optionally, for each class in the target sample data set, the determining a center of the class based on image features of sample images of the class comprises:

determining an average value of image features of each sample image of the category;

determining the average as the center of the category.

Optionally, the feature extraction layer comprises a convolutional layer, a pooling layer, and a fully-connected layer.

In a second aspect, there is provided an apparatus for training an image recognition model, including:

the first initialization unit is used for initializing the feature extraction layer parameters of a preset neural network;

the characteristic extraction unit is used for extracting the characteristics of the sample images in the target sample data set based on the initialized characteristic extraction layer parameters to obtain the characteristics of the sample images of different types, wherein the target sample data set comprises the sample images of different types;

the construction unit is used for constructing an image feature matrix based on the sample image features of different categories;

the second initialization unit is used for initializing the full connection layer parameters of the neural network based on the image feature matrix;

and the fine tuning unit is used for carrying out fine tuning training on the initialized neural network based on the target sample data set so as to obtain an image recognition model.

Optionally, the first initialization unit is specifically configured to:

Optionally, the second initialization unit is specifically configured to:

initializing the bias matrix of the full connection layer to a preset matrix.

Optionally, the feature extraction unit is specifically configured to:

Optionally, the feature extraction unit is further specifically configured to:

determining the average as the center of the category.

In a third aspect, an electronic device is provided, including:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to:

initializing a preset feature extraction layer parameter of a neural network;

In a fourth aspect, a computer-readable storage medium is provided that stores one or more programs that, when executed by an electronic device that includes a plurality of application programs, cause the electronic device to:

initializing a preset feature extraction layer parameter of a neural network;

The embodiment of the application adopts at least one technical scheme which can achieve the following beneficial effects:

according to the training method of the image recognition model provided by the embodiments of the present application, the feature extraction layer parameters of the preset neural network are first initialized, so that the feature extraction layer of the neural network is capable of extracting image features. Next, feature extraction is performed on the sample images in the target sample data set based on the initialized feature extraction layer parameters, an image feature matrix is constructed from the extracted sample image features of different categories, and the fully connected layer parameters of the neural network are initialized based on this matrix, so that the initialized fully connected layer parameters are adapted to the target sample data set. Finally, fine-tuning (Finetune) training is performed on the initialized neural network based on the target sample data set. This reduces the loss of the model at the initial stage of training, improves the convergence speed and performance of the image recognition model, improves the efficiency of the overall training process, and reduces the computing resources consumed by the electronic device that executes the model training task.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flowchart of a training method of an image recognition model in the related art;

FIG. 2 is a flowchart of a training method of an image recognition model according to an embodiment of the present disclosure;

FIG. 3 is a flowchart of another training method of an image recognition model according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of yet another training method of an image recognition model according to an embodiment of the present disclosure;

FIG. 5 is a block diagram illustrating a structure of an apparatus for training an image recognition model according to an embodiment of the present disclosure;

FIG. 6 is a block diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

As mentioned above, when an image recognition model is trained, the parameters of the model to be trained are generally initialized from an existing model, and Finetune training is then performed on the initialized model. For example, as shown in fig. 1, some processing layer parameters of the model to be trained (such as the parameters of the pooling layers and convolutional layers used to extract image features) are first initialized to the corresponding processing layer parameters of an existing model (hereinafter referred to as the "original model", to distinguish it from the image recognition model obtained after Finetune training). Then, using one's own sample image data as training data, Finetune training of the initialized model is started to obtain an image recognition model that can recognize one's own image data. However, this training approach leads to slow convergence and poor model performance, so the training process is time-consuming and consumes a large amount of computing resources.

In view of this, the present application aims to provide a technical solution for training an image recognition model, which enables model training to converge quickly, so as to accelerate the model training process and reduce the occupation of computing resources.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

Referring to fig. 2, fig. 2 is a flowchart of a training method for an image recognition model according to an embodiment of the present application, where an execution subject of the method may be a server; of course, the execution subject of the method when implemented may also be various other electronic devices with computing and processing functions, such as a computer. As shown in fig. 2, the method comprises the steps of:

step S10, initializing the feature extraction layer parameters of the preset neural network.

In this embodiment of the application, the preset neural Network may be a Convolutional Neural Network (CNN), or may also be another multilayer neural Network, which is not limited in this embodiment of the application.

The feature extraction layer is a processing layer used to extract image features. For example, for a convolutional neural network, the feature extraction layer may include, but is not limited to, a convolutional layer and a pooling layer (for example, a max pooling layer), and may of course also include fully connected layers other than the last layer, which outputs the image recognition result (referred to as the output layer, such as a Softmax layer). Accordingly, the feature extraction layer parameters may include, but are not limited to, the number, size, and number of channels of the convolution kernels in the convolutional layer, the pooling mode of the pooling layer, the size of the pooling region, and the weight matrix W and bias matrix B of the fully connected layers.

And step S20, performing feature extraction on the sample images in the target sample data set based on the initialized feature extraction layer parameters to obtain different types of sample image features.

Wherein the target sample data set comprises sample images of different categories. As mentioned above, according to the difference of the image recognition tasks executed by the image recognition model, the target sample data sets used for training the image recognition model are different. For example, if the image recognition model is used to perform a face recognition task, the target sample data set may include face images of different users; if the image recognition model is used for executing a gesture recognition task, the target sample data set may include different types of gesture images; if the image recognition model is used to perform a scene recognition task, the target sample data set may include scene images of different scenes, and so on.

After initializing the parameters of the preset feature extraction layer of the neural network, the target sample data set can be input into the feature extraction layer of the neural network to obtain sample image features of different categories.

And step S30, constructing an image feature matrix based on the sample image features of different classes.

In particular, each class of sample image features can be used as a row vector, and thus, an image feature matrix can be constructed based on different classes of sample image features.

For example, suppose the target sample data set includes N categories of sample images, and the sample image feature obtained for each category by the feature extraction in step S20 has length F. An image feature matrix M of size N × F is then obtained, in which each row vector corresponds to the sample image feature of one category.
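The construction of the N × F matrix can be sketched as follows. This is a minimal illustration in plain Python, not the implementation described in the application: the function name `build_feature_matrix`, the category names, and the feature values are all hypothetical, and plain lists stand in for tensors.

```python
from typing import Dict, List, Sequence, Tuple


def build_feature_matrix(
    class_features: Dict[str, Sequence[float]],
) -> Tuple[List[str], List[List[float]]]:
    """Stack one feature vector per category into an N x F matrix."""
    if not class_features:
        raise ValueError("at least one category is required")
    labels = sorted(class_features)  # fixed order: row i belongs to labels[i]
    feature_len = len(class_features[labels[0]])
    matrix = []
    for label in labels:
        row = list(class_features[label])
        if len(row) != feature_len:
            raise ValueError(
                f"category {label!r} has feature length {len(row)}, "
                f"expected {feature_len}"
            )
        matrix.append(row)
    return labels, matrix


# Hypothetical per-category features: N = 3 categories, feature length F = 4.
class_features = {
    "gesture_ok": [0.2, 0.1, 0.7, 0.0],
    "gesture_fist": [0.9, 0.0, 0.1, 0.3],
    "gesture_wave": [0.0, 0.8, 0.2, 0.5],
}
labels, M = build_feature_matrix(class_features)
print(labels)
print(len(M), "x", len(M[0]))
```

Keeping the returned label order alongside M matters in practice, because row i of M later becomes the weights of output unit i, so the classification labels used during training need to follow the same order.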

And step S40, initializing the parameters of the full connection layer of the neural network based on the constructed image feature matrix.

Wherein, the full connection layer parameters of the neural network can comprise a weight matrix W and a bias matrix B.

And step S50, carrying out fine tuning training on the initialized neural network based on the target sample data set to obtain an image recognition model.

After the parameters of the neural network are initialized based on the above steps S10 to S40, the initialized neural network may be trained by taking the target sample data set as input and the classification labels of the different types of sample images in the target sample data set as output, so as to obtain a final image recognition model. The classification label may be obtained by manually labeling based on a category to which each sample image belongs.
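As a rough illustration of step S50, the sketch below applies a few gradient steps of softmax cross-entropy to the output layer only. It rests on several assumptions that are not stated in the application: a softmax output layer with logits computed as W·x + B, plain stochastic gradient descent, a learning rate of 0.1, and a single hypothetical training sample. Real fine-tuning would typically update the feature extraction layers as well.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, bias, x):
    # One logit per category: dot(row, x) + b
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def loss(weights, bias, x, label):
    return -math.log(softmax(forward(weights, bias, x))[label])


def sgd_step(weights, bias, x, label, lr):
    """One in-place gradient step of softmax cross-entropy on the output layer."""
    probs = softmax(forward(weights, bias, x))
    for k, row in enumerate(weights):
        grad = probs[k] - (1.0 if k == label else 0.0)  # dLoss / dlogit_k
        for j in range(len(row)):
            row[j] -= lr * grad * x[j]
        bias[k] -= lr * grad


# Hypothetical initialized output layer (2 categories, feature length 2)
# and one labelled sample feature.
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
feature, label = [1.0, 0.5], 0

loss_before = loss(weights, bias, feature, label)
for _ in range(20):
    sgd_step(weights, bias, feature, label, lr=0.1)
loss_after = loss(weights, bias, feature, label)
print(loss_before, loss_after)
```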

It should be noted that, in the embodiment of the present application, the training method may be applied to different practical application scenarios to train different image recognition models. For example, the training method may be used to train an image recognition model for recognizing international face images, and in this application scenario, the target sample data set used to train the image recognition model may include face image sets of different users (e.g., face image sets of different races), where the face image set of each user may include multiple face images of each user in different light, scenes, poses, and different definitions.

The technical solution of the present application is explained in detail below.

First, regarding the step S10, in an alternative embodiment, as shown in fig. 3, the step S10 may include:

and step S11, acquiring the preset feature extraction layer parameters of the original model.

Step S12, initializing the feature extraction layer parameters of the neural network to the feature extraction layer parameters of the original model.

The source sample data set used for training the original model is different from the target sample data set used for training the image recognition model, namely the original model is obtained by training based on the source sample data set different from the target sample data set. In addition, the neural network has a structure similar to that of the original model.
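Steps S11 and S12 amount to copying parameters from one model to another. The sketch below assumes, purely for illustration, that each model's parameters are held in a dictionary keyed by layer name and that the output layer's names start with the hypothetical prefix `fc_out.`; neither convention comes from the application.

```python
import copy
from typing import Any, Dict


def init_feature_layers(
    target_params: Dict[str, Any],
    original_params: Dict[str, Any],
    output_prefix: str = "fc_out.",
) -> Dict[str, Any]:
    """Return target parameters with feature extraction layers copied
    from the original model; the output layer is left untouched."""
    initialized = dict(target_params)
    for name, value in original_params.items():
        if name.startswith(output_prefix):
            continue  # the output layer is initialized separately
        if name not in target_params:
            continue  # structures are only similar, so skip unmatched layers
        initialized[name] = copy.deepcopy(value)
    return initialized


# Hypothetical parameter sets: the original model has 1 output unit,
# the new network has 2.
original_params = {
    "conv1.weight": [[1.0, 2.0], [3.0, 4.0]],
    "conv1.bias": [0.5, 0.5],
    "fc_out.weight": [[9.0, 9.0]],
}
target_params = {
    "conv1.weight": [[0.0, 0.0], [0.0, 0.0]],
    "conv1.bias": [0.0, 0.0],
    "fc_out.weight": [[0.0, 0.0], [0.0, 0.0]],
}
initialized = init_feature_layers(target_params, original_params)
print(initialized["conv1.weight"])
```

The deep copy keeps the two models independent, so later fine-tuning of the new network does not modify the original model's parameters.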

In specific implementation, the original models used for initializing the parameters of the feature extraction layer of the neural network are different according to different image recognition tasks executed by the image recognition models.

For example, if the image recognition model is an international face recognition model for recognizing international face image data, the original model may be an existing domestic face recognition model for recognizing domestic face data. Accordingly, the target sample data set used for training the image recognition model may include face images of different international races, and the source sample data set used for training the original model may include face images of different domestic users.

Of course, the image recognition model in the embodiment of the present application is not limited to the international face recognition model, and may also be an image recognition model for recognizing various living bodies or non-living bodies, for example, which is not limited in the embodiment of the present application.

It should be understood that, by this embodiment, the fast and accurate initialization of the feature extraction layer parameters of the preset neural network can be realized, and since the preset neural network has a similar structure to the original model, the feature extraction layer parameters of the neural network are initialized through the feature extraction layer parameters of the original model, so that the initialized feature extraction layer of the neural network can be used for fast extracting image features.

Of course, in some other alternative embodiments, parameters of the feature extraction layer of the neural network may also be initialized according to a priori experience, or randomly generated initial parameters are given to the feature extraction layer of the neural network, which is not limited in this application.

Secondly, for the step S20, since factors such as lighting and image definition all affect the accuracy of the image recognition result, the target sample data set includes multiple sample images for each category. In this case, there are two ways to obtain the sample image features of each category. In the first, feature extraction is performed on every sample image of each category based on the initialized feature extraction layer parameters, and the sample image features of each category are determined from the image features of its sample images; for example, the average value of the image features of the sample images of a category may be determined as the sample image features of that category. In the second, in order to reduce the feature extraction workload and further reduce the computing resources consumed by the electronic device that executes the model training task, as shown in fig. 4, one sample image is selected from all sample images of each category as the category representation image of that category, and the image features of the category representation image are determined as the sample image features of that category. Further, an image feature matrix is constructed based on the sample image features of each category, the fully connected layer parameters of the neural network are initialized based on the image feature matrix, and finally Finetune training of the neural network is started to obtain the image recognition model.

For the second embodiment, optionally, as shown in fig. 3, the step S20 may include:

step S21, for each category in the target sample data set, randomly screening out one sample image from all sample images of the category as a category representation image of the category.

Step S22, performing feature extraction on the category representation images of the categories based on the initialized feature extraction layer parameters, to obtain image features of the category representation images of the categories.

Step S23, taking the image features of the category representation image as the sample image features of the category.

It can be understood that, in this embodiment, one sample image is randomly selected from sample images of each category as a category representing image of a corresponding category, and feature extraction is performed only on the category representing image of each category based on the feature extraction layer parameters obtained by initialization, so that simplicity and high efficiency are achieved, and compared with performing feature extraction on all sample images, consumption of computing resources of the electronic device can be further reduced.
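Steps S21 to S23 can be sketched as follows. The function `toy_extract` is a stand-in for the initialized feature extraction layers and is purely an assumption, as are the category names and the tiny "images" (short lists of numbers).

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Image = Sequence[float]


def representative_features(
    images_by_class: Dict[str, List[Image]],
    extract: Callable[[Image], List[float]],
    seed: Optional[int] = None,
) -> Dict[str, List[float]]:
    """Pick one random image per category and extract only its features."""
    rng = random.Random(seed)
    features = {}
    for label, images in images_by_class.items():
        if not images:
            raise ValueError(f"category {label!r} has no sample images")
        representative = rng.choice(images)        # S21: random selection
        features[label] = extract(representative)  # S22 and S23
    return features


def toy_extract(image: Image) -> List[float]:
    """Hypothetical feature extractor: mean and maximum pixel value."""
    return [sum(image) / len(image), max(image)]


images_by_class = {
    "user_a": [[0.1, 0.2, 0.3], [0.2, 0.2, 0.2]],
    "user_b": [[0.9, 0.8, 0.7]],
}
features = representative_features(images_by_class, toy_extract, seed=0)
print(features)
```

The extractor runs once per category rather than once per image, which is where the saving in computing resources described above comes from.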

Alternatively, as shown in fig. 3, the step S20 may include:

step S24, for each category in the target sample data set, performing feature extraction on each sample image of the category based on the initialized feature extraction layer parameters, to obtain image features of each sample image of the category.

Step S25, determining the center of the category based on the image features of each sample image of the category and determining the distance between each sample image of the category and the center of the category.

The distance between each sample image of the category and the center of the category may be any distance measure, such as the Euclidean distance or the Mahalanobis distance, which is not limited in the embodiment of the present application.

For the determination of the center of each category, in particular implementation, an average value of the image features of each sample image of the category may be determined and the average value may be determined as the center of the category.

Of course, a method such as a clustering algorithm may also be used to determine the center of the category based on the image features of each sample image of the category, which is not limited in the embodiment of the present application.

Step S26, selecting the sample image with the minimum distance from all the sample images of the category, and using the image feature of the sample image with the minimum distance as the sample image feature of the category.
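Steps S24 to S26 for a single category can be sketched as follows, assuming the mean of the features as the category center and the Euclidean distance as the distance measure (both are options the text allows, not requirements). The feature values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of equally long feature vectors (the center)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_to_center(
    features: Sequence[Sequence[float]],
) -> Tuple[int, List[float], List[float]]:
    """Return the index and feature of the sample nearest the center,
    together with the center itself."""
    center = mean_vector(features)                          # S25: center
    distances = [euclidean(f, center) for f in features]    # S25: distances
    best = min(range(len(features)), key=distances.__getitem__)  # S26
    return best, list(features[best]), center


# Hypothetical image features of the three sample images of one category.
category_features = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
index, feature, center = closest_to_center(category_features)
print(index, feature, center)
```

Here the center is [2.0, 2.0], and the second sample (index 1) lies nearest to it, so its feature is used as the sample image feature of the category.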

It can be understood that the sample images of each category differ in lighting, definition, and the pose of the object in the image. A randomly selected sample image may be dimly lit, blurred, and so on, so the image features extracted from it may be inaccurate, and the image feature matrix constructed from them, as well as the image recognition model obtained through Finetune training, may in turn be less accurate. In this embodiment, for each category, the image features of all sample images of the category are considered together when selecting a sample image, so that the selected sample image is more representative. The image feature matrix constructed from the image features of the selected sample images then provides better initial parameters for the fully connected layer, and the image recognition model obtained by the subsequent Finetune training is more accurate.

Next, regarding the step S40, in an alternative embodiment, as shown in fig. 3, the step S40 may include: initializing the weight matrix W of the fully connected layer of the neural network to the image feature matrix and initializing the bias matrix B of the fully connected layer to a preset matrix.

The preset matrix can be customized according to actual service requirements. For example, the preset matrix may be a zero matrix.
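A minimal sketch of this initialization, assuming the fully connected layer computes one logit per category as W·x + B with W of size N × F, and taking the zero matrix as the preset bias. The function names and the matrix values are hypothetical.

```python
from typing import List, Sequence, Tuple


def init_fully_connected(
    feature_matrix: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[float]]:
    """W is a copy of the N x F image feature matrix; B is all zeros."""
    weights = [list(row) for row in feature_matrix]
    bias = [0.0] * len(weights)
    return weights, bias


def fc_forward(
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """One logit per category: dot(row, x) + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Hypothetical image feature matrix: N = 3 categories, F = 2.
M = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
W, B = init_fully_connected(M)

# A feature identical to the second category's row scores highest there.
logits = fc_forward(W, B, [0.0, 1.0])
print(logits)
```

Because each row of W is a category's own feature vector, each logit is the dot product between the input feature and that category's representative feature.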

It can be understood that, in this embodiment, the image feature matrix is constructed from the sample image features of the different categories in the target sample data set. After the weight matrix of the fully connected layer is initialized to the image feature matrix and the bias matrix is initialized to the preset matrix, the dimension of the sample image features that the feature extraction layer outputs to the fully connected layer during Finetune training exactly matches the dimension of the weight matrix. As a result, the loss of the model at the initial stage of training is reduced and the convergence of the model training is accelerated.
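The effect on the initial loss can be illustrated with a toy comparison. The sketch assumes a softmax cross-entropy loss and a sample whose feature is close to its own category's row of the image feature matrix; the numbers are hypothetical, and the comparison is against an all-zero weight matrix, which gives a uniform prediction. It illustrates the idea only and is not evidence about real data sets.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(weights, bias, x, label):
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return -math.log(softmax(logits)[label])


# Hypothetical image feature matrix (2 categories, feature length 2).
M = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]

# A sample of category 0 whose feature resembles row 0 of M.
sample, label = [0.9, 0.1], 0

loss_feature_init = cross_entropy(M, zero_bias, sample, label)
loss_zero_init = cross_entropy([[0.0, 0.0], [0.0, 0.0]], zero_bias, sample, label)
print(loss_feature_init, loss_zero_init)
```

With all-zero weights the prediction is uniform and the loss equals ln 2 for two categories, while the feature-matrix initialization already favours the correct category, so training starts from a lower loss.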

It should be noted that the execution subjects of the steps of the method provided in embodiment 1 may be the same device, or different devices may be used as the execution subjects of the method. For example, the execution subject of steps 21 and 22 may be device 1, and the execution subject of step 23 may be device 2; for another example, the execution subject of step 21 may be device 1, and the execution subjects of steps 22 and 23 may be device 2; and so on.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Corresponding to the embodiment of the method, the application also provides a training device of the image recognition model.

Referring to fig. 5, fig. 5 is a schematic structural diagram of an apparatus for training an image recognition model according to an embodiment of the present disclosure, where the apparatus 500 is applicable to a server; of course, the apparatus 500 may also be applied to other various electronic devices with computing and processing functions, such as a computer. As shown in fig. 5, the apparatus 500 may include:

a first initialization unit 501, configured to initialize the feature extraction layer parameters of a preset neural network;

a feature extraction unit 502, configured to perform feature extraction on sample images in a target sample data set based on the initialized feature extraction layer parameters to obtain sample image features of different categories, where the target sample data set includes sample images of different categories;

a constructing unit 503, configured to construct an image feature matrix based on the sample image features of different categories;

a second initialization unit 504, configured to initialize the fully connected layer parameters of the neural network based on the image feature matrix;

and the fine tuning unit 505 is configured to perform fine tuning training on the initialized neural network based on the target sample data set to obtain an image recognition model.

Optionally, the first initializing unit 501 obtains a preset feature extraction layer parameter of an original model, where the original model is obtained by training based on a source sample data set different from the target sample data set; initializing the feature extraction layer parameters of the neural network to the feature extraction layer parameters of the original model.

Optionally, the second initializing unit 504 initializes a weight matrix of a fully connected layer of the neural network to the image feature matrix; initializing the bias matrix of the full connection layer to a preset matrix.

Optionally, the feature extraction unit 502 randomly selects, for each category in the target sample data set, one sample image from all sample images of the category as a category representation image of the category; performing feature extraction on the category representation images of the categories based on the initialized feature extraction layer parameters to obtain image features of the category representation images of the categories; and taking the image characteristics of the class representation images as sample image characteristics of the class.

Optionally, the feature extraction unit 502, for each category in the target sample data set, performs feature extraction on each sample image of the category based on the initialized feature extraction layer parameters to obtain the image features of each sample image of the category; determines the center of the category based on the image features of the sample images of the category, and determines the distance between each sample image of the category and the center of the category; and screens out, from all sample images of the category, the sample image with the minimum distance, and takes the image features of that sample image as the sample image features of the category.

Optionally, the feature extraction unit 502 determines the average value of the image features of the sample images of the category, and determines the average value as the center of the category.
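The center-based selection described in the two preceding paragraphs can be sketched as follows. The center is the element-wise average of the category's image features, as stated above; the use of Euclidean distance is an assumption of this sketch, since the passage only speaks of "the distance". Function names are hypothetical.

```python
import math


def class_center(features):
    """Element-wise average of the image features of one category."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def nearest_to_center(features):
    """Image feature of the sample closest to the category center."""
    center = class_center(features)
    # Assumption: Euclidean distance between a feature vector and the center.
    distances = [math.dist(feature, center) for feature in features]
    return features[distances.index(min(distances))]
```

Compared with random selection, choosing the sample nearest to the center makes the representative less likely to be an outlier of its category, at the cost of extracting features for every sample image.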

With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to Fig. 6, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.

The processor, the network interface, and the memory may be connected to each other via the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 6, but this does not mean that there is only one bus or one type of bus.

The memory is used for storing programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the training apparatus for the image recognition model at the logical level. The processor executes the program stored in the memory, and is specifically configured to perform the following operations:

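initializing a preset feature extraction layer parameter of a neural network;

performing feature extraction on sample images in a target sample data set based on the initialized feature extraction layer parameters to obtain sample image features of different categories, wherein the target sample data set comprises sample images of different categories;

constructing an image feature matrix based on the sample image features of different categories;

initializing fully connected layer parameters of the neural network based on the image feature matrix;

and performing fine-tuning training on the initialized neural network based on the target sample data set to obtain an image recognition model.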

The method performed by the training apparatus for the image recognition model disclosed in the embodiment shown in Fig. 1 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

The electronic device may further execute the method shown in Fig. 1 and implement the functions of the training apparatus for the image recognition model in the embodiments shown in Fig. 1 and Fig. 2, which are not described again here.

Of course, in addition to a software implementation, the electronic device of the present application does not exclude other implementations, such as a logic device or a combination of software and hardware. That is, the execution subject of the following processing flow is not limited to the individual logic units, and may also be hardware or a logic device.

Embodiments of the present application also provide a computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a portable electronic device that includes a plurality of application programs, enable the portable electronic device to perform the method of the embodiment shown in Fig. 1, and specifically to perform the following operations:

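initializing a preset feature extraction layer parameter of a neural network;

performing feature extraction on sample images in a target sample data set based on the initialized feature extraction layer parameters to obtain sample image features of different categories, wherein the target sample data set comprises sample images of different categories;

constructing an image feature matrix based on the sample image features of different categories;

initializing fully connected layer parameters of the neural network based on the image feature matrix;

and performing fine-tuning training on the initialized neural network based on the target sample data set to obtain an image recognition model.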

In short, the above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiment.

Claims

1. A training method of an image recognition model comprises the following steps:

initializing a preset feature extraction layer parameter of a neural network;

2. The method of claim 1, wherein initializing feature extraction layer parameters of a preset neural network comprises:

3. The method of claim 1, wherein initializing the fully connected layer parameters of the neural network based on the image feature matrix comprises:

initializing the bias matrix of the fully connected layer to a preset matrix.

4. The method according to claim 1, wherein the performing feature extraction on the sample images in the target sample data set based on the initialized feature extraction layer parameters to obtain different types of sample image features comprises:

5. The method according to claim 1, wherein the performing feature extraction on the sample images in the target sample data set based on the initialized feature extraction layer parameters to obtain different types of sample image features comprises:

6. The method of claim 5, for each class in the target sample data set, said determining a center of the class based on image features of sample images of the class, comprising:

determining the average as the center of the category.

7. The method of any of claims 1-6, the feature extraction layers comprising convolutional layers, pooling layers, and fully-connected layers.

8. An apparatus for training an image recognition model, comprising:

a first initialization unit, configured to initialize a preset feature extraction layer parameter of a neural network;

a feature extraction unit, configured to perform feature extraction on sample images in a target sample data set based on the initialized feature extraction layer parameters to obtain sample image features of different categories, wherein the target sample data set comprises sample images of different categories;

a construction unit, configured to construct an image feature matrix based on the sample image features of different categories;

a second initialization unit, configured to initialize the fully connected layer parameters of the neural network based on the image feature matrix;

and a fine-tuning unit, configured to perform fine-tuning training on the initialized neural network based on the target sample data set to obtain an image recognition model.

9. An electronic device, comprising:

a processor; and

initializing a preset feature extraction layer parameter of a neural network;

10. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:

initializing a preset feature extraction layer parameter of a neural network;