CN112381169B - Image identification method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN112381169B
CN112381169B (application CN202011320831.9A)
Authority
CN
China
Prior art keywords
training
classification layer
image recognition
initial
label
Prior art date
Legal status
Active
Application number
CN202011320831.9A
Other languages
Chinese (zh)
Other versions
CN112381169A (en)
Inventor
韩泽
梁潇
焦任直
王哲
谢会斌
李聪廷
Current Assignee
Jinan Boguan Intelligent Technology Co Ltd
Original Assignee
Jinan Boguan Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jinan Boguan Intelligent Technology Co Ltd
Priority to CN202011320831.9A
Publication of CN112381169A
Application granted
Publication of CN112381169B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image recognition method, an image recognition apparatus, an electronic device, and a computer-readable storage medium. The method includes: acquiring an initial classification layer, an initial image recognition model, and a training set; extracting first training data corresponding to a first label from the training set, and determining a first class center vector corresponding to the first label in the initial classification layer; generating a training classification layer using the first class center vector, and training the training classification layer and the initial image recognition model in video memory using the first training data to obtain a first classification layer; updating the initial classification layer using the first classification layer, and updating the first label until training of the initial image recognition model is finished, obtaining an image recognition model; and acquiring an image to be recognized and inputting it into the image recognition model to obtain a recognition result. The method allows a larger classification layer to be adopted during training, yields a recognition model with higher recognition accuracy, and reduces the requirements on GPU devices.

Description

Image identification method and device, electronic equipment and readable storage medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to an image recognition method, an image recognition apparatus, an electronic device, and a computer-readable storage medium.
Background
Face recognition is an important application of deep learning and has been widely used in fields such as public security and finance. The face recognition task is an open-set task: the training set cannot contain samples of all people, so recognition mainly relies on extracting feature vectors from the input and performing identity verification by feature-vector similarity. In the related art, training a face recognition model is generally treated as a classification task: a classification layer is connected after the feature layer and trained with a loss function such as softmax loss, and the classification layer is removed in actual use. To make the feature vectors extracted by the neural network more discriminative, the more class IDs the training set contains, the better the index of the resulting face image recognition model; however, this also means a larger classification layer. Because the neural network and the classification layer are mainly trained on a GPU (Graphics Processing Unit) device, whose video memory usually ranges from 2 GB to 32 GB and cannot be expanded at will, and because the available memory is also consumed by the image recognition model, the input image size, and other factors, a sufficiently large classification layer cannot be used; the recognition effect of the trained face image recognition model is therefore poor and its recognition accuracy low. If a large enough classification layer is used, a GPU device with a large enough video memory is needed, which imposes a high requirement on the GPU device.
Therefore, the problems of low recognition accuracy and high requirement on GPU equipment in the related art are technical problems that need to be solved by those skilled in the art.
Disclosure of Invention
In view of this, an object of the present application is to provide an image recognition method, an image recognition apparatus, an electronic device, and a computer-readable storage medium, which improve recognition accuracy and reduce requirements for GPU devices.
In order to solve the above technical problem, the present application provides an image recognition method, including:
acquiring an initial classification layer, an initial image recognition model and a training set;
extracting first training data corresponding to a first label from the training set, and determining a first class central vector corresponding to the first label in the initial classification layer;
generating a training classification layer by using the first class central vector, and training the training classification layer and the initial image recognition model in a video memory by using the first training data to obtain a first classification layer;
updating the initial classification layer by using the first classification layer, and updating the first label until the initial image recognition model is trained, so as to obtain an image recognition model;
and acquiring an image to be recognized, and inputting the image to be recognized into the image recognition model to obtain a recognition result.
Optionally, the updating the first tag includes:
acquiring all training labels corresponding to the training set;
extracting a number of target training labels from the training labels based on the uniform distribution, and determining the target training labels as the first labels.
Optionally, the generating a training classification layer by using the first type center vector includes:
generating a second label corresponding to the first label;
and sequencing the first class central vectors according to the second label to obtain the training classification layer.
Optionally, training the training classification layer and the initial image recognition model in a video memory by using the first training data includes:
performing identification processing on the first training data by using the second label to obtain second training data;
and training the training classification layer and the initial image recognition model in a video memory by using the second training data based on a target loss function.
Optionally, the sorting the first class central vectors according to the second label to obtain the training classification layer includes:
sequencing the first type central vectors according to the second labels to obtain a vector matrix;
carrying out normalization processing on the vector matrix to obtain the training classification layer;
correspondingly, the method also comprises the following steps:
and carrying out output normalization processing on the initial image recognition model.
Optionally, training the training classification layer and the initial image recognition model in a video memory by using the first training data includes:
acquiring a first learning rate corresponding to the training classification layer and a second learning rate corresponding to the initial image recognition model by using training frequency information; a first convergence rate corresponding to the first learning rate is greater than a second convergence rate corresponding to the second learning rate, and a first initial value corresponding to the first learning rate is greater than a second initial value corresponding to the second learning rate;
training the training classification layer and the initial image recognition model in the video memory based on the first learning rate and the second learning rate.
Optionally, the updating the initial classification layer by using the first classification layer includes:
determining a corresponding second classification layer of the first classification layer in the initial classification layer;
extracting a first parameter corresponding to the first classification layer and a second parameter corresponding to the second classification layer;
calculating a target parameter using a sliding update factor, the first parameter, and the second parameter, and replacing the second parameter with the target parameter.
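The sliding-update step above blends the newly trained sub-classifier parameters back into the full layer. A minimal sketch follows; the exponential-moving-average form of the blend is an assumption, since the exact formula is not stated here.

```python
def sliding_update(old_params, new_params, factor):
    """Blend trained sub-classifier parameters back into the full layer.

    `factor` is the sliding update factor; the EMA-style blend below is an
    assumed instantiation, not necessarily the exact formula of the method.
    """
    return [factor * new + (1.0 - factor) * old
            for old, new in zip(old_params, new_params)]
```

With `factor = 1.0` the second parameter is simply replaced by the first; smaller factors damp oscillation between training rounds.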
The present application also provides an image recognition apparatus, including:
the initial data acquisition module is used for acquiring an initial classification layer, an initial image recognition model and a training set;
the data extraction module is used for extracting first training data corresponding to a first label from the training set and determining a first class central vector corresponding to the first label in the initial classification layer;
the training module is used for generating a training classification layer by using the first class central vector, and training the training classification layer and the initial image recognition model in a video memory by using the first training data to obtain a first classification layer;
the updating module is used for updating the initial classification layer by using the first classification layer and updating the first label until the initial image recognition model is trained, so as to obtain an image recognition model;
and the identification module is used for acquiring an image to be identified and inputting the image to be identified into the image identification model to obtain an identification result.
The present application further provides an electronic device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the image recognition method.
The present application also provides a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the image recognition method described above.
The image identification method provided by the application comprises the steps of obtaining an initial classification layer, an initial image identification model and a training set; extracting first training data corresponding to the first label from the training set, and determining a first class central vector corresponding to the first label in the initial classification layer; generating a training classification layer by using the first class central vector, and training the training classification layer and an initial image recognition model in a video memory by using first training data to obtain a first classification layer; updating the initial classification layer by using the first classification layer, and updating the first label until the training of the initial image recognition model is finished to obtain an image recognition model; and acquiring an image to be recognized, and inputting the image to be recognized into the image recognition model to obtain a recognition result.
Therefore, after the training set is obtained, the initial classification layer and the initial image recognition model are not trained by all data in each training, and first training data with first labels are extracted from the initial classification layer and the initial image recognition model. Correspondingly, the first class central vector corresponding to the first label in the initial classification layer is extracted, and a training classification layer is formed by using the first class central vector, namely, the training set and the corresponding classification layer are compressed. And training the training classification layer and the initial image recognition model in a video memory to obtain a trained first classification layer. After one-time training is finished, the initial classification layer is updated by the first classification layer, meanwhile, the first label is updated so as to carry out iterative training, an image recognition model is obtained until the initial image recognition model is trained, and the image to be recognized is recognized by the image recognition model. The method allows a larger classification layer to be set, so that the identification accuracy of the obtained image identification model is improved; the requirements on GPU equipment can be reduced, and GPU equipment with smaller video memory is selected for training. Specifically, by selecting part of data in the training set and training the corresponding part in the initial classification layer by using the data, the occupation of the training data and the classification layer on the video memory space can be reduced, so that the requirement on GPU equipment is reduced, and the GPU equipment with larger video memory is not required to be adopted because of adopting a larger classification layer. 
Meanwhile, a larger classification layer can be adopted, and the image recognition model obtained by training has stronger recognition capability and higher recognition accuracy. The problems of low identification accuracy and high requirement on GPU equipment in the related technology are solved.
In addition, the application also provides an image recognition device, electronic equipment and a computer readable storage medium, and the image recognition device, the electronic equipment and the computer readable storage medium also have the beneficial effects.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or related technologies of the present application, the drawings used in the description of the embodiments or related technologies are briefly introduced below, it is obvious that the drawings in the description below are only the embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of an image recognition method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a specific training process of an image recognition model according to an embodiment of the present disclosure;
FIG. 3 is a graph of learning rate provided by an embodiment of the present application;
fig. 4 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of an image recognition method according to an embodiment of the present disclosure. The method comprises the following steps:
s101: and acquiring an initial classification layer, an initial image recognition model and a training set.
The initial classification layer and the initial image recognition model are untrained classification layers and recognition models, and the specific form is not limited. The training set comprises training data, each training data has a corresponding label, and the training data with the same label belongs to the training data with the same category. It will be appreciated that the number of labels (i.e. the number of classes) corresponding to all training data in the training set is limited, and may be represented by N, for example, and in correspondence with which the initial classification layer should also have the same number of class center vectors, i.e. N class center vectors, corresponding to each label. Therefore, when the training set comprises a larger number of labels, the class center vectors in the classification layer are larger, so that the classification layer is larger, and the recognition capability of the image recognition model obtained by training through the classification layer is stronger, and the recognition accuracy is higher. The embodiment does not limit the specific content of the training data in the training set, and may be, for example, a human face image, or a flower image. In this embodiment, the initial classification level may be represented by W, which is specifically N × L dimension, where L is a feature vector dimension, i.e., a class center vector dimension.
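As a toy illustration of the shapes involved (the sizes below are illustrative only; real N and L depend on the training set and model):

```python
# Toy shapes: N identities, each with an L-dimensional class-center vector.
# The initial classification layer W is an N x L matrix.
N, L = 6, 4
W = [[0.0] * L for _ in range(N)]
```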
In the related art, because the GPU device has limited video memory, the size of the classification layer is constrained by many factors and cannot be enlarged arbitrarily; therefore, to improve the recognition accuracy of the image recognition model, a GPU device with larger video memory must be used, which raises the requirement on the GPU device. If a GPU device with larger video memory cannot be adopted, the size of the classification layer is limited and the recognition accuracy of the image recognition network suffers.
S102: and extracting first training data corresponding to the first label from the training set, and determining a first class central vector corresponding to the first label in the initial classification layer.
In order to solve the above problem, the present embodiment does not train with all the data in the training set in each round, but extracts part of the data to participate in training. Specifically, the first label may be one or more arbitrary labels corresponding to data in the training set; its specific content is not limited, and in this embodiment M represents the number of first labels. The first label is the label of the data participating in the current round of training, so it is used to filter the training set and extract the corresponding first training data; the specific quantity of first training data is not limited. Correspondingly, the corresponding first class center vectors can be selected from the initial classification layer according to the first label, so that they can be trained. It can be understood that extracting the first training data and the first class center vectors effectively prunes the training set and the initial classification layer, and this pruned subset is then used to train the recognition capability of the initial recognition model on the first training data.
It should be noted that, in this embodiment, a specific obtaining process of the first tag is not limited, for example, the first number of tags may be determined, where the number is smaller than the number of tags corresponding to the training set, and the first tag is randomly selected based on the first number of tags. Or the first tag can be selected according to preset information or the acquired first tag selection instruction.
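A minimal sketch of one of the options above — drawing M distinct first labels at random from the N training labels; the uniform sampling-without-replacement choice is an assumption:

```python
import random

def sample_first_labels(num_labels, m, rng=None):
    """Draw M distinct first labels uniformly from labels 0..N-1."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(num_labels), m))
```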
S103: and generating a training classification layer by using the first class central vector, and training the training classification layer and the initial image recognition model in the video memory by using first training data to obtain the first classification layer.
After the first class center vectors are obtained, a training classification layer, i.e. a sub-classification layer of the initial classification layer, is generated from them. The specific way of generating the training classification layer is not limited; for example, the first class center vectors may be concatenated to form it. It can be understood that the training classification layer is necessarily smaller than the initial classification layer, because it contains only the first class center vectors. After the training classification layer is obtained, it and the initial image recognition model are trained in video memory with the first training data; that is, the first training data, the training classification layer, and the initial image recognition model are loaded into the video memory of the GPU device so that training takes place there. The specific training process is not limited; during training, the parameters of the training classification layer and of the initial image recognition model change, and the training classification layer obtained after training is finished is the first classification layer. In this embodiment, the training classification layer may be represented by W_M.
Specifically, in a possible implementation, the step of generating the training classification layer by using the first-class center vector may include:
step 11: and generating a second label corresponding to the first label.
Step 12: and sequencing the first class central vectors according to the second label to obtain a training classification layer.
In this embodiment, the first label may be called the old label and the second label the new label. Each class center vector in the training classification layer corresponds to a first label and to first training data. The first label is generally a numeric label: when there are N labels in total, the first label ranges from 0 to N-1, and first class center vectors are matched to first labels by label order. The selected first labels may not be consecutive, so, to indicate the correspondence between label, training data, and class center vector accurately and intuitively, a second label may be generated and used as a new label during training.
Specifically, when there are M first labels, the second labels may be 0 to M-1, with each first label corresponding one-to-one to a second label. The specific correspondence is not limited: for example, the first labels may be sorted by size and matched to the second labels in order, with each first class center vector then inheriting the second label of its first label; or, when the first labels have a selection order, they may be matched to the second labels by that order. After the second labels are obtained, the first class center vectors are sorted according to the second labels so that they correspond to the second labels, giving the training classification layer. In this way, the second label intuitively indicates the correspondence between label, training data, and class center vector, and only the second label needs to be considered during training.
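The relabeling described above amounts to gathering the sampled class-center vectors and mapping the old labels to 0..M-1. A sketch, using the sort-by-size correspondence (one of the options the text allows):

```python
def build_training_layer(initial_layer, first_labels):
    """Gather the first class center vectors and assign second labels 0..M-1
    in ascending order of the first labels (one permitted correspondence)."""
    ordered = sorted(first_labels)
    old_to_new = {old: new for new, old in enumerate(ordered)}
    training_layer = [initial_layer[old] for old in ordered]
    return training_layer, old_to_new
```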
Further, in order to reduce the calculation scale of the data, the step of ranking the first-class central vectors according to the second label to obtain the training classification layer may include:
step 21: and sequencing the first type of central vectors according to the second label to obtain a vector matrix.
Step 22: and carrying out normalization processing on the vector matrix to obtain a training classification layer.
Correspondingly, the method also comprises the following steps:
step 23: and carrying out output normalization processing on the initial image recognition model.
After the first class center vectors are sorted according to the second labels, a vector matrix is obtained directly. The training classification layer is obtained by normalizing this vector matrix; the specific normalization process is not limited and may follow the related art. Corresponding to the training classification layer, output normalization should also be performed on the initial image recognition model so that the output feature vectors match the classification layer. In this embodiment, the vector matrix may be represented by W′, and the normalization is

W′_j = W_j / ‖W_j‖₂

where W_j is the first class center vector with first label j.
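The row-wise L2 normalization of the class-center matrix can be sketched as:

```python
import math

def l2_normalize_rows(matrix):
    """Normalize each class-center row to unit length: W'_j = W_j / ||W_j||_2."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        # Leave an all-zero row unchanged to avoid division by zero.
        normalized.append([x / norm for x in row] if norm else list(row))
    return normalized
```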
After the training classification layer is obtained by ordering the first-class central vectors by using the second label, the step of training the training classification layer and the initial image recognition model in the video memory by using the first training data may include:
step 31: and identifying the first training data by using the second label to obtain second training data.
Step 32: and training the training classification layer and the initial image recognition model in the video memory by using second training data based on the target loss function.
Since the first class center vectors are sorted according to the second labels and correspond to them, the first training data must be relabeled with the second labels to obtain the corresponding second training data, and training is performed with the second training data. The training classification layer is trained for S iterations in total; in each iteration the second training data participating in training may be selected according to the batch size, training proceeds based on the target loss function, and the current iteration count of the training classification layer may be represented by s. The specific target loss function is not limited; for example, it may be the ArcFace loss, an improvement on the softmax loss that normalizes the image feature vector and the classification layer separately, so that their product is exactly the cosine of the angle between the image feature vector and each class center vector of the classification layer, and that adds a margin to the classification boundary to make the separation between classes clearer.
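A pure-Python sketch of an ArcFace-style loss for a single sample, assuming the feature and class centers are already normalized so that the logits are cosines. The margin and scale values are the commonly used defaults, not values taken from this text:

```python
import math

def arcface_loss(cos_theta, label, margin=0.5, scale=64.0):
    """ArcFace-style cross-entropy for one sample.

    cos_theta: cosines between the normalized feature and each normalized
    class-center vector. The angular margin is added only to the target class.
    """
    logits = []
    for j, c in enumerate(cos_theta):
        angle = math.acos(max(-1.0, min(1.0, c)))
        if j == label:
            angle += margin  # push the decision boundary away from the class
        logits.append(scale * math.cos(angle))
    mx = max(logits)  # stable log-sum-exp
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[label]
```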
Further, the total training times of the initial image recognition network may be represented by T, and the current training times may be represented by T. Because the initial image recognition network participates in the whole training, and each class center vector in the classification layer only participates in the partial training, the total training times of the initial classification layer is smaller than the total training times of the initial image recognition network. In order to balance the training degree of the classification layer and the image recognition network, the step of training the classification layer and the initial image recognition model in the video memory by using the first training data may include:
step 41: and acquiring a first learning rate corresponding to the training classification layer and a second learning rate corresponding to the initial image recognition model by using the training frequency information.
Step 42: and training a training classification layer and an initial image recognition model in the video memory based on the first learning rate and the second learning rate.
In order to balance the training degrees of the classification layer and the image recognition network, different learning rates may be set for them: the training classification layer corresponds to a first learning rate and the initial image recognition model to a second learning rate, where the first convergence rate corresponding to the first learning rate is greater than the second convergence rate corresponding to the second learning rate, and the first initial value corresponding to the first learning rate is greater than the second initial value corresponding to the second learning rate. That is, the first learning rate is larger than the second at the start of training and smaller near the end, so that the training degrees stay balanced even though the initial classification layer is trained fewer times than the initial image recognition model. Specifically, after the training classification layer is determined, the first and second learning rates may be obtained from the training frequency information, which may include the total training times of the initial image recognition network, the current training times of the initial image recognition network, the total training times of the current training classification layer, and the current training times of the current training classification layer.
The total training times of the initial image recognition network are preset training times in the whole training process, namely parameters T, and the specific size of the parameters T can be set as required; the current training times of the initial image recognition network are the times of the initial image recognition network which has been trained, namely the parameters t; the total training times of the current training classification layer is the preset training times of the current training classification layer, namely a parameter S, and the specific size of the parameter S can be set according to needs; the current training times of the current training classification layer is the time when the current training classification layer is already trained, namely the parameter s.
Further, to ensure that the training classification layer can be sufficiently trained, the first learning rate should be zeroed out at s = S. Based on this, in a specific embodiment, the first learning rate may be represented by Lr_class^t and the second learning rate by Lr_face^t, and then:
(The decay formulas for Lr_class^t and Lr_face^t appear only as equation images in the source; they specify that Lr_class^t decays from the initial value Lr_class at rate β and reaches zero at s = S, while Lr_face^t decays from Lr_face at the slower rate α.)
wherein Lr_class is the first initial learning rate, Lr_face is the second initial learning rate, β is the first decay rate, α is the second decay rate, and β is greater than α. The specific sizes of Lr_class, Lr_face, β, α, and S may be set as required, and this embodiment is not limited thereto. Referring to fig. 3, fig. 3 is a graph of the learning rates according to an embodiment of the present disclosure, in which the classification-layer learning rate is the first learning rate and the recognition-network learning rate is the second learning rate. It can be seen that both learning rates decrease as t increases; at the start of training, the first learning rate is larger than the second learning rate and then converges quickly to 0 at the faster first convergence rate. After the training classification layer is regenerated and the first training data is updated, the first learning rate is again greater than the second learning rate at the start of training and again converges quickly to 0 at the faster first convergence rate. It should be noted that step 41 and step 42 may be executed separately, and may also be executed on the basis of the schemes of step 11 to step 12, step 21 to step 23, step 31 to step 32, and the like.
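Since the decay formulas survive only as image placeholders in this text, the following is a hypothetical pair of schedules that merely satisfies the stated properties (larger initial value and faster decay for the classification layer, first learning rate exactly zero at s = S); the function names, the initial values 0.1 and 0.01, and the decay exponents are illustrative assumptions, not taken from the patent.

```python
def classification_lr(s, S, lr0=0.1, beta=2.0):
    """First learning rate Lr_class^t: decays at rate beta, reaching 0 at s == S."""
    return lr0 * (1.0 - s / S) ** beta

def recognition_lr(t, T, lr0=0.01, alpha=1.0):
    """Second learning rate Lr_face^t: smaller initial value, slower decay at rate alpha."""
    return lr0 * (1.0 - t / T) ** alpha

# The classification-layer rate starts higher and hits exactly zero at s == S,
# while the recognition-network rate decays gently over the full T iterations.
S, T = 100, 10000
print(classification_lr(0, S))    # larger initial value
print(recognition_lr(0, T))       # smaller initial value
print(classification_lr(S, S))    # fully decayed when s == S
```

Any schedule with these properties (polynomial, exponential, or cosine decay) would fit the description; the polynomial form above is only one choice.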
Referring to fig. 2, fig. 2 is a flowchart of a specific training process of an image recognition model according to an embodiment of the present application. After the initial classification layer of size N × L is obtained, M first-class central vectors are selected from it, and normalization processing is performed on the M first-class central vectors to obtain the corresponding training classification layer. At the same time, output normalization processing is performed on the initial recognition network, whose output is X_i, so that its output matches the training classification layer. The first training data, the training classification layer, and the initial recognition network are then used for training based on the Softmax loss function, and the first classification layer is obtained after training.
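The training step just described — selecting M class-center vectors, L2-normalizing both them and the network output, and computing the Softmax loss over the resulting cosine logits — can be sketched as follows. This is a minimal NumPy illustration; the scale factor, array shapes, and function names are assumptions rather than the patent's actual implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Normalization processing: scale rows to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax_loss(features, class_centers, labels, scale=64.0):
    """Cross-entropy over cosine logits between the normalized network
    outputs X_i and the M selected, normalized class-center vectors."""
    f = l2_normalize(features)        # (batch, L)  output normalization
    w = l2_normalize(class_centers)   # (M, L)      training classification layer
    logits = scale * f @ w.T          # (batch, M)  scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
loss = softmax_loss(rng.normal(size=(8, 128)),    # batch of features X_i
                    rng.normal(size=(32, 128)),   # M = 32 selected centers
                    rng.integers(0, 32, size=8))  # remapped (second) labels
print(round(float(loss), 4))
```

Normalizing both sides makes each logit a bounded cosine similarity, which is why the output normalization of the recognition network must accompany the normalization of the classification layer.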
S104: and updating the initial classification layer by using the first classification layer, and updating the first label until the training of the initial image recognition model is finished, so as to obtain the image recognition model.
After the training of the initial image recognition network and the training classification layer is completed, and the trained initial image recognition network and the trained first classification layer are obtained, the initial classification layer is updated by using the first classification layer, and the first label is updated at the same time, so that the first label is used again to determine the first training data and perform iterative training until the training of the initial image recognition model is completed, namely when t = T, at which point the initial image recognition model is determined to be the image recognition model. Updating the initial classification layer according to the first classification layer completes the training of the initial classification layer, so that the initial image recognition model can be trained more effectively during subsequent iterative training. This embodiment does not limit the specific manner of updating the initial classification layer; for example, the parameters in the first classification layer may be used to replace the corresponding parameters in the initial classification layer. Specifically, in a possible implementation manner, in order to prevent the problem of an unstable training direction, a moving-average strategy may be adopted to update the initial classification layer in combination with the historical parameters. In this case, the step of updating the initial classification layer with the first classification layer may include:
step 51: and determining a corresponding second classification layer of the first classification layer in the initial classification layers.
Step 52: and extracting a first parameter corresponding to the first classification layer and a second parameter corresponding to the second classification layer.
Step 53: and calculating a target parameter by using the sliding update factor, the first parameter and the second parameter, and replacing the second parameter by using the target parameter.
Because the first classification layer only comprises part of the trained first-class central vectors, only the second classification layer corresponding to the first classification layer within the initial classification layer can be updated, the second classification layer being the classification layer in the initial classification layer that corresponds to the trained first-class central vectors. Specifically, each first-class central vector in the first classification layer corresponds to a second label, the second label corresponds to the first label, and the first label corresponds to a first-class central vector in the initial classification layer; the second classification layer corresponding to the first classification layer may be determined by using this correspondence. After the second classification layer is determined, the first parameter is extracted from the first classification layer, and the second parameter is extracted from the second classification layer as a historical parameter. The sliding update factor determines the weights of the first parameter and the second parameter in the target parameter during updating, and its specific size is not limited. The target parameter calculated based on the sliding update factor and the second parameter does not change too severely, so the problem of an unstable training direction can be avoided. In one embodiment, the target parameter may be represented by W_j′, wherein:
W_j′ = (1 − λ)·W_j + λ·W_i
wherein W_i is the first parameter, W_j is the second parameter, and λ is the sliding update factor, which takes a value in (0, 1).
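A minimal sketch of the moving-average update of steps 51 to 53, assuming the classification-layer parameters are arrays; the function name and the example value λ = 0.1 are illustrative.

```python
import numpy as np

def sliding_update(w_j, w_i, lam=0.1):
    """Target parameter W_j' = (1 - lam) * W_j + lam * W_i, lam in (0, 1):
    blends the freshly trained first parameter into the historical second
    parameter so the update direction does not change too severely."""
    return (1.0 - lam) * w_j + lam * w_i

w_j = np.zeros(4)  # historical second-classification-layer parameter
w_i = np.ones(4)   # trained first-classification-layer parameter
print(sliding_update(w_j, w_i, lam=0.1))  # → [0.1 0.1 0.1 0.1]
```

A small λ keeps the target parameter close to the historical value; λ near 1 would effectively replace the old parameter outright, which is the direct-replacement variant also mentioned above.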
By updating the first label, the first training data can be reselected, and different first class central vectors in the initial classification layer are trained by using the new first training data, so that the training of the whole initial classification layer is completed. In a specific embodiment, the step of updating the first tag may include:
step 61: and acquiring all training labels corresponding to the training set.
Step 62: and extracting a plurality of target training labels from the training labels based on the uniform distribution, and determining the target training labels as first labels.
In this embodiment, the target training labels may be extracted from all the training labels corresponding to the training set in a uniformly distributed manner, and the extracted target training labels are the first labels. Extracting the first labels based on a uniform distribution ensures that each training label is extracted a similar number of times, so that the initial classification layer is trained as comprehensively as possible. By using the updating method of step 61 to step 62, the first label may be updated; after the update, the first training data may be reselected according to the first label, and the training classification layer reselected from the initial classification layer (i.e., a training classification layer composed of first-class central vectors different from the previous ones) is trained with the new first training data. Through multiple iterations of training and updating, the training of the entire initial classification layer and the initial image recognition model is completed. It should be noted that the method of steps 61 to 62 can also be used to select the first label before the first training starts.
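The uniform label extraction of steps 61 to 62 can be sketched as follows; the class count N = 100000, the sample size M = 1024, and the function name are illustrative assumptions.

```python
import numpy as np

def pick_first_labels(num_classes, m, rng):
    """Step 61-62: draw M distinct target training labels uniformly
    from the N training labels of the training set."""
    return rng.choice(num_classes, size=m, replace=False)

rng = np.random.default_rng(0)
labels = pick_first_labels(num_classes=100000, m=1024, rng=rng)
print(len(labels), len(set(labels.tolist())))  # M labels, all distinct
```

Because every label is equally likely on each draw, repeated iterations give each class center a similar expected number of training rounds, which is the stated goal of the uniform distribution.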
S105: and acquiring an image to be recognized, and inputting the image to be recognized into the image recognition model to obtain a recognition result.
After the image recognition model is obtained, the image to be recognized can be recognized by using the image recognition model to obtain a corresponding recognition result. The specific recognition process is not limited; reference may be made to the related art, and details are not described herein again.
By applying the image recognition method provided by the embodiment of the application, after the training set is obtained, the initial classification layer and the initial image recognition model are not trained with all the data in each round of training; instead, first training data with a first label is extracted from the training set. Correspondingly, the first-class central vectors corresponding to the first label in the initial classification layer are extracted and used to form a training classification layer, i.e., the training set and the corresponding classification layer are compressed. The training classification layer and the initial image recognition model are trained in the video memory to obtain a trained first classification layer. After one round of training is finished, the initial classification layer is updated with the first classification layer, and the first label is updated at the same time for iterative training, until the training of the initial image recognition model is completed and the image recognition model is obtained; the image to be recognized is then recognized by the image recognition model. The method allows a larger classification layer to be set, so that the recognition accuracy of the obtained image recognition model is improved; it also reduces the requirements on GPU equipment, allowing GPU equipment with smaller video memory to be selected for training. Specifically, by selecting part of the data in the training set and using it to train the corresponding part of the initial classification layer, the video-memory footprint of the training data and the classification layer can be reduced, so that the requirement on GPU equipment is reduced, and GPU equipment with larger video memory is not required merely because a larger classification layer is adopted.
Meanwhile, since a larger classification layer can be adopted, the image recognition model obtained by training has stronger recognition capability and higher recognition accuracy. This solves the problems of low recognition accuracy and high requirements on GPU equipment in the related art.
In the following, the image recognition apparatus provided by the embodiment of the present application is introduced, and the image recognition apparatus described below and the image recognition method described above may be referred to correspondingly.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present disclosure, including:
an initial data obtaining module 110, configured to obtain an initial classification layer, an initial image recognition model, and a training set;
the data extraction module 120 is configured to extract first training data corresponding to the first label from the training set, and determine a first class central vector corresponding to the first label in the initial classification layer;
the training module 130 is configured to generate a training classification layer by using the first class central vector, and train the training classification layer and the initial image recognition model in the video memory by using the first training data to obtain a first classification layer;
the updating module 140 is configured to update the initial classification layer by using the first classification layer, and update the first label until the initial image recognition model is trained, so as to obtain an image recognition model;
and the identification module 150 is configured to obtain an image to be identified, and input the image to be identified into the image identification model to obtain an identification result.
Optionally, the updating module 140 includes:
the training label acquiring unit is used for acquiring all training labels corresponding to the training set;
and the uniform distribution extraction unit is used for extracting a plurality of target training labels from the training labels based on uniform distribution and determining the target training labels as the first labels.
Optionally, the training module 130 comprises:
the second label generating unit is used for generating a second label corresponding to the first label;
and the sequencing unit is used for sequencing the first type central vectors according to the second label to obtain a training classification layer.
Optionally, the training module 130 comprises:
the identification unit is used for carrying out identification processing on the first training data by utilizing the second label to obtain second training data;
and the first training unit is used for training the training classification layer and the initial image recognition model in the video memory based on the target loss function by utilizing the second training data.
Optionally, the sorting unit comprises:
the sorting subunit is used for sorting the first type of central vectors according to the second label to obtain a vector matrix;
the normalization processing subunit is used for performing normalization processing on the vector matrix to obtain a training classification layer;
correspondingly, the method also comprises the following steps:
and the output normalization processing subunit is used for performing output normalization processing on the initial image identification model.
Optionally, the training module 130 comprises:
the learning rate obtaining unit is used for obtaining a first learning rate corresponding to the training classification layer and a second learning rate corresponding to the initial image recognition model by using the training frequency information; a first convergence rate corresponding to the first learning rate is greater than a second convergence rate corresponding to the second learning rate, and a first initial value corresponding to the first learning rate is greater than a second initial value corresponding to the second learning rate;
and the second training unit is used for training the training classification layer and the initial image recognition model in the video memory based on the first learning rate and the second learning rate.
Optionally, the updating module 140 includes:
the second classification layer determining unit is used for determining a corresponding second classification layer of the first classification layer in the initial classification layers;
the parameter extraction unit is used for extracting a first parameter corresponding to the first classification layer and a second parameter corresponding to the second classification layer;
and the parameter updating unit is used for calculating the target parameter by using the sliding updating factor, the first parameter and the second parameter and replacing the second parameter by using the target parameter.
In the following, the electronic device provided by the embodiment of the present application is introduced, and the electronic device described below and the image recognition method described above may be referred to correspondingly.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Wherein the electronic device 100 may include a processor 101 and a memory 102, and may further include one or more of a multimedia component 103, an information input/information output (I/O) interface 104, and a communication component 105.
The processor 101 is configured to control the overall operation of the electronic device 100 to complete all or part of the steps in the image recognition method; the memory 102 is used to store various types of data to support operation on the electronic device 100, such as instructions for any application or method operating on the electronic device 100 and application-related data. The memory 102 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as one or more of Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The multimedia component 103 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 102 or transmitted through the communication component 105. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 104 provides an interface between the processor 101 and other interface modules, such as a keyboard, a mouse, or buttons, which may be virtual or physical. The communication component 105 is used for wired or wireless communication between the electronic device 100 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them, so the corresponding communication component 105 may include a Wi-Fi component, a Bluetooth component, and an NFC component.
The electronic device 100 may be implemented by one or more Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, and is configured to perform the image recognition method according to the above embodiments.
The following describes a computer-readable storage medium provided in an embodiment of the present application, and the computer-readable storage medium described below and the image recognition method described above may be referred to correspondingly.
The present application further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the image recognition method described above.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should be further noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Also, the terms "comprise", "include", or any variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that includes a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (9)

1. An image recognition method, comprising:
acquiring an initial classification layer, an initial image recognition model and a training set;
extracting first training data corresponding to a first label from the training set, and determining a first class central vector corresponding to the first label in the initial classification layer;
generating a training classification layer by using the first class central vector, and training the training classification layer and the initial image recognition model in a video memory by using the first training data to obtain a first classification layer;
updating the initial classification layer by using the first classification layer, and updating the first label until the initial image recognition model is trained, so as to obtain an image recognition model;
acquiring an image to be recognized, and inputting the image to be recognized into the image recognition model to obtain a recognition result;
wherein the updating the first tag comprises: acquiring all training labels corresponding to the training set; extracting a plurality of target training labels from the training labels based on uniform distribution, and determining the target training labels as the first labels;
the initial classification layer and the initial image recognition model are untrained; the training set comprises training data, each training data has a corresponding label, and the training data with the same label belongs to the training data with the same category; the initial classification layer has a class center vector corresponding to a label.
2. The image recognition method of claim 1, wherein the generating a training classification layer using the first class of center vectors comprises:
generating a second label corresponding to the first label;
and sequencing the first class central vectors according to the second label to obtain the training classification layer.
3. The image recognition method of claim 2, wherein the training classification layer and the initial image recognition model in a video memory using the first training data comprises:
performing identification processing on the first training data by using the second label to obtain second training data;
and training the training classification layer and the initial image recognition model in a video memory by using the second training data based on a target loss function.
4. The image recognition method of claim 2, wherein the ranking the first class of center vectors according to the second label to obtain the training classification layer comprises:
sequencing the first type central vectors according to the second labels to obtain a vector matrix;
normalizing the vector matrix to obtain the training classification layer;
correspondingly, the method also comprises the following steps:
and carrying out output normalization processing on the initial image recognition model.
5. The image recognition method of claim 1, wherein the training classification layer and the initial image recognition model in a video memory by using the first training data comprises:
acquiring a first learning rate corresponding to the training classification layer and a second learning rate corresponding to the initial image recognition model by using training frequency information; a first convergence rate corresponding to the first learning rate is greater than a second convergence rate corresponding to the second learning rate, and a first initial value corresponding to the first learning rate is greater than a second initial value corresponding to the second learning rate;
training the training classification layer and the initial image recognition model in the video memory based on the first learning rate and the second learning rate.
6. The image recognition method of claim 1, wherein the updating the initial classification layer with the first classification layer comprises:
determining a corresponding second classification layer of the first classification layer in the initial classification layer;
extracting a first parameter corresponding to the first classification layer and a second parameter corresponding to the second classification layer;
calculating a target parameter using a sliding update factor, the first parameter, and the second parameter, and replacing the second parameter with the target parameter.
7. An image recognition apparatus, comprising:
the initial data acquisition module is used for acquiring an initial classification layer, an initial image recognition model and a training set;
the data extraction module is used for extracting first training data corresponding to a first label from the training set and determining a first class central vector corresponding to the first label in the initial classification layer;
the training module is used for generating a training classification layer by using the first class central vector, and training the training classification layer and the initial image recognition model in a video memory by using the first training data to obtain a first classification layer;
the updating module is used for updating the initial classification layer by using the first classification layer and updating the first label until the initial image recognition model is trained, so as to obtain an image recognition model;
the identification module is used for acquiring an image to be identified and inputting the image to be identified into the image identification model to obtain an identification result;
wherein the update module is specifically configured to: acquiring all training labels corresponding to the training set; extracting a plurality of target training labels from the training labels based on uniform distribution, and determining the target training labels as the first labels;
the image recognition device is specifically used for enabling the initial classification layer and the initial image recognition model to be untrained; the training set comprises training data, each training data has a corresponding label, and the training data with the same label belongs to the training data with the same category; the initial classification layer has a class center vector corresponding to a label.
8. An electronic device comprising a memory and a processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to implement the image recognition method according to any one of claims 1 to 6.
9. A computer-readable storage medium for storing a computer program, wherein the computer program when executed by a processor implements the image recognition method according to any one of claims 1 to 6.
CN202011320831.9A 2020-11-23 2020-11-23 Image identification method and device, electronic equipment and readable storage medium Active CN112381169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011320831.9A CN112381169B (en) 2020-11-23 2020-11-23 Image identification method and device, electronic equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN112381169A CN112381169A (en) 2021-02-19
CN112381169B true CN112381169B (en) 2023-01-13

Family

ID=74588092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011320831.9A Active CN112381169B (en) 2020-11-23 2020-11-23 Image identification method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112381169B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343938B (en) * 2021-07-16 2023-01-31 浙江大学 Image identification method, device, equipment and computer readable storage medium
CN114494747A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Model training method, image processing method, device, electronic device and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103208008A (en) * 2013-03-21 2013-07-17 北京工业大学 Fast adaptation method for traffic video monitoring target detection based on machine vision
CN105913073A (en) * 2016-04-05 2016-08-31 西安电子科技大学 SAR image target identification method based on depth increment support vector machine
CN107358257A (en) * 2017-07-07 2017-11-17 华南理工大学 Under a kind of big data scene can incremental learning image classification training method
CN109583297A (en) * 2018-10-25 2019-04-05 清华大学 Retina OCT volume data identification method and device
CN110660045A (en) * 2019-08-30 2020-01-07 杭州电子科技大学 Lymph node identification semi-supervision method based on convolutional neural network
CN111199242A (en) * 2019-12-18 2020-05-26 浙江工业大学 Image increment learning method based on dynamic correction vector
CN111310456A (en) * 2020-02-13 2020-06-19 支付宝(杭州)信息技术有限公司 Entity name matching method, device and equipment


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Beyond image classification: zooplankton identification with deep vector space embeddings; Ketil Malde et al.; arxiv.org; 2019-09-25; pp. 3437-3444 *
Multi-class active learning algorithm based on multiple clustering algorithms and multiple linear regression; Wang Min et al.; Journal of Computer Applications (《计算机应用》); 2020-10-26; pp. 1-18 *

Also Published As

Publication number Publication date
CN112381169A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
CN106651542B (en) Article recommendation method and device
WO2022057658A1 (en) Method and apparatus for training recommendation model, and computer device and storage medium
CN110796190A (en) Exponential modeling with deep learning features
CN112381169B (en) Image identification method and device, electronic equipment and readable storage medium
CN107807914A (en) Sentiment orientation recognition method, object classification method and data processing system
CN111309887B (en) Method and system for training text key content extraction model
CN108959453B (en) Information extraction method and device based on text clustering and readable storage medium
CN108764114B (en) Signal identification method and device, storage medium and terminal thereof
CN112884569A (en) Credit assessment model training method, device and equipment
CN112836502A (en) Implicit causal relationship extraction method for events in financial field
CN111709225A (en) Event cause and effect relationship judging method and device and computer readable storage medium
CN113642727B (en) Training method of neural network model and processing method and device of multimedia information
CN107943788A (en) Enterprise's abbreviation generation method, device and storage medium
CN110807693A (en) Album recommendation method, device, equipment and storage medium
CN117077679B (en) Named entity recognition method and device
CN113762019B (en) Training method of feature extraction network, face recognition method and device
CN112115997B (en) Training method, system and device of object recognition model
CN107315807A (en) Talent recommendation method and apparatus
CN110033098A (en) Online GBDT model learning method and device
CN117332090B (en) Sensitive information identification method, device, equipment and storage medium
CN111178817A (en) Judgment result obtaining method and device based on deep learning
CN115240658A (en) Training method of audio text recognition model and audio text recognition method
CN117436484A (en) Image recognition model construction method and device, computer equipment and storage medium
CN117376410A (en) Service pushing method, device, computer equipment and storage medium
CN117313863A (en) Construction of causal inference model based on gradient boosting tree and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant