CN112560791B - Recognition model training method, recognition method and device and electronic equipment

Info

Publication number
CN112560791B
CN112560791B (application number CN202011582051.1A)
Authority
CN
China
Prior art keywords
image
recognition model
network
loss
sample image
Prior art date
Legal status
Active
Application number
CN202011582051.1A
Other languages
Chinese (zh)
Other versions
CN112560791A (en)
Inventor
史晓丽
张震国
吴剑平
Current Assignee
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd filed Critical Suzhou Keda Technology Co Ltd
Priority to CN202011582051.1A
Publication of CN112560791A
Application granted
Publication of CN112560791B
Legal status: Active
Anticipated expiration


Classifications

    • G06V 40/161 - Recognition of human faces: detection; localisation; normalisation
    • G06F 18/214 - Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 - Neural networks: combinations of networks
    • G06N 3/08 - Neural networks: learning methods
    • G06V 40/168 - Recognition of human faces: feature extraction; face representation


Abstract

The invention relates to the technical field of image processing, and in particular to a training method for a recognition model, a recognition method, corresponding devices, and electronic equipment. The training method comprises: acquiring a first sample image and a second sample image corresponding to the first sample image, wherein the image quality of the first sample image is higher than that of the second sample image; inputting the first sample image into a teacher network to obtain a first feature; inputting the second sample image into a recognition model to obtain a second feature, wherein the recognition model sequentially comprises a first image quality conversion network and a student network; determining an image loss based on the image output by the first image quality conversion network and the first sample image; and updating the parameters of the recognition model according to the image loss and the feature loss between the first feature and the second feature, so as to determine a target recognition model. The teacher network is used to guide the training of the recognition model, which improves its recognition accuracy in complex environments.

Description

Recognition model training method, recognition method and device and electronic equipment
Technical Field
The present invention relates to the technical field of image processing, and in particular to a training method for a recognition model, a recognition method, corresponding apparatuses, and electronic equipment.
Background
Face recognition technology uses computer analysis to extract effective identification information from a face image for identity recognition; owing to characteristics such as high practicality and high usability, it is one of the current mainstream research directions. Existing face recognition methods perform face recognition on an image using a face recognition model based on deep learning.
In recent years, with the continuous development of face recognition technology, its application scenarios have kept expanding. With the construction of safe cities and smart cities, video surveillance is being rapidly popularized and upgraded, and many video surveillance applications urgently need efficient, real-time identification and comparison against a face database in complex environments, in order to achieve rapid identity recognition and intelligent early warning. With existing recognition approaches, face recognition can achieve a relatively high recognition rate under controlled, cooperative conditions, but in complex environments, for example when image quality is low, the recognition accuracy drops sharply.
Disclosure of Invention
In view of this, embodiments of the present invention provide a training method for a recognition model, a recognition method, corresponding apparatuses, and an electronic device, so as to solve the problem of low recognition accuracy.
According to a first aspect, an embodiment of the present invention provides a training method for a recognition model, including:
acquiring a first sample image and a second sample image corresponding to the first sample image, wherein the image quality of the first sample image is higher than that of the second sample image;
inputting the first sample image into a teacher network to obtain a first feature;
inputting the second sample image into a recognition model to obtain a second feature, wherein the recognition model sequentially comprises a first image quality conversion network and a student network, and the network structure of the student network is the same as that of the teacher network;
determining an image loss based on the image output by the first image quality conversion network and the first sample image;
and updating the parameters of the recognition model according to the feature loss corresponding to the first feature and the second feature and the image loss, so as to determine the target recognition model.
In the training method for a recognition model provided by the embodiment of the present invention, a first image quality conversion network is arranged in the recognition model to convert an input low-quality image into a high-quality image. The image output by the first image quality conversion network is compared with the first sample image input to the teacher network to determine an image loss, and on the basis of this image loss the teacher network is used to guide the training of the recognition model, which improves the recognition accuracy of the trained model in complex environments.
With reference to the first aspect, in a first implementation manner of the first aspect, the updating parameters of the recognition model according to the feature loss corresponding to the first feature and the second feature and the image loss to determine a target recognition model includes:
acquiring a target category of the second sample image and a prediction category output by the recognition model;
determining a class loss for the recognition model using the target class and the prediction class;
updating the parameters of the recognition model based on the image loss, the category loss and the feature loss, and determining the target recognition model.
In the training method for a recognition model provided by the embodiment of the present invention, the recognition model is trained using the image loss, the category loss and the feature loss, so that the feature loss between the first feature output by the teacher network and the second feature output by the student network in the recognition model is gradually reduced; in this way the teacher network guides the training of the recognition model.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the updating parameters of the recognition model based on the image loss, the category loss, and the feature loss to determine the target recognition model includes:
fixing parameters of the teacher network;
determining a joint loss using the image loss, the class loss, and the feature loss;
and updating the parameters of the recognition model based on the joint loss to determine the target recognition model.
In the training method for a recognition model provided by the embodiment of the present invention, the first sample image and the second sample image are input into the teacher network and the recognition model, respectively, and the parameters of the teacher network are fixed during training, so that the teacher network guides the training of the recognition model and the recognition accuracy of the trained model in complex environments is improved.
With reference to the first aspect, in a third implementation manner of the first aspect, the acquiring a first sample image and a corresponding second sample image includes:
acquiring a first sample image;
and inputting the first sample image into the second image quality conversion network to obtain the second sample image.
In the training method for a recognition model provided by the embodiment of the present invention, the second sample image is obtained by online conversion. Because the second image quality conversion network converts a high-quality image into a low-quality image, the first sample image can be converted through the second image quality conversion network while the recognition model is being trained, directly yielding the second sample image. This saves memory and reduces the amount of data that must be stored.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the acquiring a first sample image and a corresponding second sample image includes:
acquiring a first training sample set and a second training sample set, wherein images in the first training sample set correspond to images in the second training sample set one by one;
and extracting a first sample image in the first training sample set and a second sample image corresponding to the first sample image in the second training sample set.
According to the training method for the recognition model provided by the embodiment of the invention, the second sample image can be obtained in an off-line mode, namely the first training sample set and the second training sample set can be determined before the recognition model is trained, and the corresponding sample images can be directly extracted in the training process, so that the training efficiency is improved.
According to a second aspect, an embodiment of the present invention further provides an identification method, where the method includes:
acquiring an image to be identified;
inputting the image to be recognized into a target recognition model to obtain the recognition features of the image to be recognized, wherein the target recognition model is obtained by training according to the training method of the recognition model in the first aspect of the invention or any one embodiment of the first aspect;
and comparing the identification features with the features to be matched to determine the identification result of the image to be identified.
In the recognition method provided by the embodiment of the present invention, the image to be recognized is recognized with a target recognition model trained for recognition accuracy, so the accuracy of object recognition in complex environments can be improved.
According to a third aspect, an embodiment of the present invention further provides a training apparatus for recognizing a model, including:
a first acquisition module, configured to acquire a first sample image and a second sample image corresponding to the first sample image, wherein the image quality of the first sample image is higher than that of the second sample image;
a first input module, configured to input the first sample image into a teacher network to obtain a first feature;
a second input module, configured to input the second sample image into a recognition model to obtain a second feature, wherein the recognition model sequentially comprises a first image quality conversion network and a student network, and the network structure of the student network is the same as that of the teacher network;
a determining module, configured to determine an image loss based on the image output by the first image quality conversion network and the first sample image;
and an updating module, configured to update the parameters of the recognition model according to the feature loss corresponding to the first feature and the second feature and the image loss, so as to determine the target recognition model.
In the training apparatus for a recognition model provided by the embodiment of the present invention, a first image quality conversion network is arranged in the recognition model to convert an input low-quality image into a high-quality image. The image output by the first image quality conversion network is compared with the first sample image input to the teacher network to determine an image loss, and on the basis of this image loss the teacher network is used to guide the training of the recognition model, which improves the recognition accuracy of the trained model in complex environments.
According to a fourth aspect, an embodiment of the present invention further provides an identification apparatus, including:
the second acquisition module is used for acquiring an image to be identified;
a third input module, configured to input the image to be recognized into a target recognition model, so as to obtain recognition features of the image to be recognized, where the target recognition model is obtained by training according to the training method of the recognition model in the first aspect of the present invention or any embodiment of the first aspect;
and the identification module is used for comparing the identification features with the features to be matched and determining the identification result of the image to be identified.
In the recognition apparatus provided by the embodiment of the present invention, the image to be recognized is recognized with a target recognition model trained for recognition accuracy, so the accuracy of object recognition in complex environments can be improved.
According to a fifth aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing therein computer instructions, and the processor executing the computer instructions to perform the training method for the recognition model according to the first aspect or any one of the embodiments of the first aspect, or to perform the recognition method according to the second aspect.
According to a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to execute the training method of the recognition model described in the first aspect or any one of the implementation manners of the first aspect, or execute the recognition method described in the second aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow diagram of a training method of a recognition model according to an embodiment of the invention;
FIG. 2 is a flow diagram of a training method of a recognition model according to an embodiment of the present invention;
FIG. 3 is a flow diagram of a training method of a recognition model according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a training structure of a recognition model according to an embodiment of the present invention;
FIG. 5 is a flow chart of an identification method according to an embodiment of the invention;
FIG. 6 is a block diagram of a training apparatus for recognizing a model according to an embodiment of the present invention;
fig. 7 is a block diagram of a structure of a recognition apparatus according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For the purposes of the following description, the terms used in the embodiments of the present invention are to be construed as follows:
Teacher network: its input is an image captured under a normal environment, and its output is the features of that image.
Student network: has the same network structure as the teacher network and is used to extract features from the input image. The specific structures of the student network and the teacher network are not limited in any way, as long as they can extract features from the input image.
First image quality conversion network: used to convert a low-quality image into a high-quality image.
Second image quality conversion network: used to convert a high-quality image into a low-quality image. The first image quality conversion network and the second image quality conversion network may adopt a generative adversarial network (for example, a CycleGAN generator network) or other network structures; their specific network structure is not limited in any way.
In accordance with an embodiment of the present invention, there is provided an embodiment of a training method for recognition models, it is noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
In this embodiment, a training method for a recognition model is provided, which can be used in electronic devices such as computers, mobile phones and tablet computers. Fig. 1 is a flowchart of a training method of a recognition model according to an embodiment of the present invention; as shown in fig. 1, the flow includes the following steps:
and S11, acquiring the first sample image and the corresponding second sample image.
Wherein the image quality of the first sample image is higher than the image quality of the second sample image.
The first sample image is a high-quality sample image, the second sample image is a low-quality sample image, and specifically, a sample image data set under a complex environment can be obtained as a low-quality training sample image data set, and sample image data under a controllable environment can be obtained as a high-quality training sample data set.
Whether an acquired sample image is a high-quality image or a low-quality image may be determined by an image quality evaluation network, or by manual labeling, among other options; the specific determination manner is not limited in any way.
After the high-quality image is acquired, the electronic equipment converts the high-quality image into a corresponding low-quality image by using an image quality conversion network; or after the low-quality image is acquired, the low-quality image is converted into a corresponding high-quality image by using an image quality conversion network. That is, the sample images used for training the recognition model are paired, i.e., the first sample image and the second sample image correspond to each other one by one.
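For illustration only, the following minimal PyTorch sketch shows how such a pair could be produced; the name degrade_net stands for the second image quality conversion network and is an assumption of this sketch, not part of the claimed method.

```python
import torch

@torch.no_grad()
def make_pair(high_quality: torch.Tensor, degrade_net: torch.nn.Module):
    """Build a paired (first sample image, second sample image) batch.

    high_quality: batch of high-quality images, shape (B, 3, H, W).
    degrade_net:  second image quality conversion network (high -> low),
                  assumed to be already trained.
    """
    degrade_net.eval()
    low_quality = degrade_net(high_quality)  # corresponding low-quality images
    return high_quality, low_quality
```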
Details about this step will be described later.
And S12, inputting the first sample image into the teacher network to obtain the first characteristic.
The teacher network may be obtained by training using a high-quality image, for example, after the electronic device acquires the high-quality training sample data set, the teacher network may be trained using an image in the high-quality training sample data set; the teacher network may be trained in a third-party device, and the electronic device may directly acquire the teacher network from the third-party device. The manner in which the teacher's network is obtained is not limited in any way.
After the electronic equipment acquires the teacher network, the first sample image is input into the teacher network, and the first characteristic is output. The teacher network is used for extracting features of the input image to obtain first features. The specific network structure of the teacher network can be set correspondingly according to actual conditions.
And S13, inputting the second sample image into the recognition model to obtain a second feature.
The recognition model sequentially comprises a first image quality conversion network and a student network, and the network structure of the student network is the same as that of the teacher network.
The electronic device inputs the second sample image corresponding to the first sample image into the recognition model to obtain the second feature.
As described above, the network structure of the student network is the same as that of the teacher network; the input of the student network is the high-quality sample image converted by the first image quality conversion network, and its output is the second feature.
Further, the electronic device inputs the first sample image into the teacher network and, at the same time, inputs the second sample image into the first image quality conversion network of the recognition model. The teacher network outputs the first feature; the first image quality conversion network performs quality conversion on the second sample image to obtain a high-quality sample image; this converted image is input into the student network, which performs feature extraction on it to obtain the second feature.
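A condensed sketch of this forward pass, under the same illustrative assumptions (the PyTorch modules restore_net and student stand for the first image quality conversion network and the student network):

```python
import torch.nn as nn

class RecognitionModel(nn.Module):
    """Recognition model: first image quality conversion network followed
    by a student network with the same structure as the teacher network."""
    def __init__(self, restore_net: nn.Module, student: nn.Module):
        super().__init__()
        self.restore_net = restore_net  # low-quality -> high-quality image
        self.student = student          # feature extractor

    def forward(self, low_quality):
        restored = self.restore_net(low_quality)  # converted high-quality image
        feature = self.student(restored)          # second feature
        return restored, feature

# One training step then runs, e.g.:
#   first_feature = teacher(first_sample)            # first feature
#   restored, second_feature = model(second_sample)  # second feature
```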
S14, an image loss is determined based on the image output by the first image quality conversion network and the first sample image.
Adding the first image quality conversion network to the recognition model allows the recognition model to be trained with second sample images from complex environments, yielding a recognition model capable of recognizing images in such environments.
Since the first image quality conversion network is used to convert a low-quality image into a high-quality image, and the second sample image input into the first image quality conversion network corresponds to the first sample image, the high-quality image output by the first image quality conversion network should theoretically be identical to the first sample image. Therefore, the electronic device determines the image loss from the difference between the image output by the first image quality conversion network and the first sample image, and training drives this image loss toward zero.
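As a sketch of this difference computation, assuming the L2 form of the image loss used in the embodiment of FIG. 4:

```python
import torch.nn.functional as F

def image_loss(restored, first_sample):
    # L2 (mean squared) difference between the output of the first image
    # quality conversion network and the corresponding first sample image
    return F.mse_loss(restored, first_sample)
```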
And S15, updating the parameters of the recognition model according to the feature loss and the image loss corresponding to the first feature and the second feature so as to determine the target recognition model.
The electronic device updates the parameters of the recognition model based on the image loss and the feature loss corresponding to the first feature and the second feature; that is, the teacher network is used to guide the training of the student network. Furthermore, the prediction category output by the recognition model can also be used to update its parameters, which improves the recognition accuracy of the trained target recognition model.
In the training method for a recognition model provided by this embodiment, a first image quality conversion network is set in the recognition model to convert an input low-quality image into a high-quality image, and the image output by the first image quality conversion network is compared with the first sample image input to the teacher network to determine an image loss. On the basis of this image loss, the teacher network guides the training of the recognition model, which improves the recognition accuracy of the trained model in complex environments.
In this embodiment, a training method for a recognition model is provided, which can be used in electronic devices such as computers, mobile phones and tablet computers. Fig. 2 is a flowchart of a training method of a recognition model according to an embodiment of the present invention; as shown in fig. 2, the flow includes the following steps:
and S21, acquiring the first sample image and the corresponding second sample image.
Wherein the image quality of the first sample image is higher than the image quality of the second sample image.
Please refer to S11 in fig. 1, which is not described herein again.
And S22, inputting the first sample image into the teacher network to obtain the first characteristic.
Please refer to S12 in fig. 1, which is not described herein again.
And S23, inputting the second sample image into the recognition model to obtain a second feature.
The recognition model sequentially comprises a first image quality conversion network and a student network, and the network structure of the student network is the same as that of the teacher network.
Please refer to S13 in fig. 1, which is not described herein again.
S24, an image loss is determined based on the image output by the first image quality conversion network and the first sample image.
Please refer to S14 in fig. 1, which is not described herein again.
And S25, updating the parameters of the recognition model according to the feature loss and the image loss corresponding to the first feature and the second feature so as to determine the target recognition model.
Specifically, the above S25 includes the following steps:
and S251, acquiring the target class of the second sample image and the prediction class output by the recognition model.
The target category of the second sample image is the same as that of the first sample image. It may be determined by manual labeling or in other ways, which are not limited here, as long as the electronic device can acquire the target category of the second sample image.
The electronic device may predict the category of the second sample image based on the second feature, resulting in a prediction category. For example, the recognition model includes the first image quality conversion network and the student network connected in sequence, and a fully connected layer is further connected to the output of the student network to perform category prediction based on the second feature output by the student network, yielding the prediction category.
S252 determines the category loss using the target category and the prediction category.
As described above, the electronic device obtains the prediction category from the output of the fully connected layer following the student network, and calculates the difference between the target category and the prediction category using a loss function, so that the category loss can be determined.
And S253, updating the parameters of the recognition model based on the image loss, the category loss and the feature loss, and determining the target recognition model.
The image loss arises from the paired samples input into the teacher network and the first image quality conversion network, together with the processing performed by the first image quality conversion network. Because the first image quality conversion network converts a low-quality image into a high-quality one, the recognition model becomes able to recognize images captured in complex environments.
From the training perspective, the electronic device updates the parameters of the recognition model by using the image loss, the category loss and the feature loss, so that the feature loss is gradually reduced, and the target recognition model can be determined. For example, the electronic device may set a threshold value of the feature loss, update the parameters of the recognition model after each training, calculate the feature loss, and stop the training of the recognition model if the calculated feature loss is within the threshold value range of the feature loss.
The student network and the teacher network have the same network structure; accordingly, the second feature output by the student network is comparable to the first feature output by the teacher network, and the electronic device can calculate the difference between the first feature and the second feature using a loss function, so that the feature loss can be determined.
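Putting the pieces together, a condensed training loop might look as follows. This is a sketch under stated assumptions: MSE stands in for the feature loss, plain cross-entropy stands in for the classification loss (the embodiment of FIG. 4 uses ArcFace, shown later), the three losses are summed unweighted, and loader yields the paired samples with their target categories.

```python
import torch
import torch.nn.functional as F

def train(model, teacher, classifier, loader, optimizer,
          feat_threshold=1e-3, max_epochs=100):
    teacher.eval()  # the teacher's parameters are not updated
    for _ in range(max_epochs):
        for first_sample, second_sample, target in loader:
            with torch.no_grad():
                first_feature = teacher(first_sample)      # first feature
            restored, second_feature = model(second_sample)
            img_loss = F.mse_loss(restored, first_sample)  # image loss
            feat_loss = F.mse_loss(second_feature, first_feature)
            cls_loss = F.cross_entropy(classifier(second_feature), target)
            loss = img_loss + cls_loss + feat_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if feat_loss.item() < feat_threshold:  # stop once within threshold
            break
    return model
```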
As an optional implementation manner of this embodiment, the step S253 may include the following steps:
(1) parameters of the teacher network are fixed.
In the process of training the recognition model, the electronic device keeps the parameters of the teacher network unchanged and uses the teacher network to guide the training of the recognition model.
(2) The joint loss is determined using the image loss, the class loss, and the feature loss.
The electronic device may obtain weights of the image loss, the category loss, and the feature loss, and calculate a weighted sum of the image loss, the category loss, and the feature loss using the obtained weights, thereby determining the joint loss.
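A sketch of such a weighted combination; the weights are hyperparameters, and the embodiments do not fix particular values:

```python
def joint_loss(img_loss, cls_loss, feat_loss,
               w_img=1.0, w_cls=1.0, w_feat=1.0):
    # weighted sum of image loss, category loss and feature loss
    return w_img * img_loss + w_cls * cls_loss + w_feat * feat_loss
```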
(3) The parameters of the recognition model are updated based on the joint loss to determine the target recognition model.
After determining the joint loss, the electronic device updates parameters of the recognition model based on the joint loss to reduce the characteristic loss, and then determines the target recognition model.
The first sample image and the second sample image are input into the teacher network and the recognition model, respectively, and the parameters of the teacher network are fixed during the training process, so that the teacher network guides the training of the recognition model and the accuracy of the trained model is improved.
In the training method for a recognition model provided by this embodiment, the recognition model is trained using the image loss, the category loss and the feature loss, so that the difference between the second feature output by the student network in the recognition model and the first feature output by the teacher network is gradually reduced; in this way the teacher network guides the training of the recognition model.
In this embodiment, a training method for a recognition model is provided, which can be used in electronic devices such as computers, mobile phones and tablet computers. Fig. 3 is a flowchart of a training method of a recognition model according to an embodiment of the present invention; as shown in fig. 3, the flow includes the following steps:
and S31, acquiring the first sample image and the corresponding second sample image.
Wherein the image quality of the first sample image is higher than the image quality of the second sample image.
As described above, because second sample images are difficult to acquire, their number needs to be expanded by means of an image quality conversion network. In order to simulate face images in complex environments, the training data is augmented with an image quality conversion network: face images in complex environments cannot be collected in sufficient quantity in practice and can only be obtained by algorithmic augmentation, so they are added to the training samples in this way.
The image augmentation can be carried out online or offline. In the online mode, while the recognition model is being trained, a high-quality image is converted into a low-quality image by an image quality conversion network, and the converted low-quality image is then input into the first image quality conversion network of the recognition model. In the offline mode, high-quality images are converted into low-quality images by an image quality conversion network before the recognition model is trained, so that the corresponding sample images can be extracted directly during training.
Regarding the acquisition of the sample image, the electronic device may acquire the original image, perform target detection on the original image to obtain target key points in the original image, and then scale the original image to a preset size based on positions of the target key points in the original image to obtain the sample image. The target may be a human face, a vehicle, or a license plate, etc., and the target is not limited herein, and may be set according to an actual situation.
The original images are scaled to the preset size, so that the recognition model only needs to recognize the images with the preset size, and the recognition accuracy of the recognition model can be improved.
Further optionally, the original image may be an image captured in a surveillance scene. Since targets in a surveillance scene are constantly moving, continuous still pictures are rare. Therefore, when performing target detection, the electronic device can apply the target detection algorithm to motion regions only, which improves the efficiency of target detection.
The target detection algorithm may be the Multi-Task Convolutional Neural Network (MTCNN) algorithm, the DenseBox algorithm, the SSH algorithm, or the like; the specific type of target detection algorithm is not limited in any way in the embodiments of the present invention.
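A sketch of the cropping-and-scaling step described above, assuming a detector callable that returns a bounding box and key points in the style of MTCNN (this interface is an assumption of the sketch):

```python
import cv2
import numpy as np

def crop_to_sample(original: np.ndarray, detector, size=(112, 112)):
    """Detect the target and scale its region to a preset size."""
    box, keypoints = detector(original)   # e.g. face box and 5 landmarks
    x1, y1, x2, y2 = (int(v) for v in box)
    crop = original[y1:y2, x1:x2]
    # keypoints could additionally be used to align the target before
    # resizing; here only the scaling to the preset size is shown
    return cv2.resize(crop, size)
```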
Taking online image quality conversion as an example, the above S31 may include the following steps:
s311, a first sample image is acquired.
The first sample image may be an image acquired under a controlled environment, which corresponds to a high quality image.
S312, the first sample image is input into a second image quality conversion network, so as to obtain a second sample image.
When the recognition model is trained, the first sample image is input into a second image quality conversion network to obtain a second sample image. And then training the recognition model by utilizing the second sample image and the first sample image corresponding to the second sample image.
The second image quality conversion network is used for converting a high-quality image into a low-quality image, and the first image quality conversion network in the recognition model is used for converting the low-quality image into the high-quality image.
Because the first image quality conversion network is used to convert a low-quality image into a high-quality image, the second sample image is converted by the first image quality conversion network into a high-quality image, and the difference between this converted image and the first sample image is computed to update the parameters of the first image quality conversion network. In this way the output of the first image quality conversion network approaches the corresponding first sample image, which improves the accuracy of the recognition model.
As an optional implementation manner of this embodiment, the above S31 may also be processed offline. Specifically, the above S31 may include the following steps:
(1) a first training sample set and a second training sample set are obtained.
And the images in the first training sample set correspond to the images in the second training sample set in a one-to-one mode.
The images in the first training sample set are first sample images, i.e., high-quality images acquired in a controllable environment, and the images in the second training sample set are the second sample images corresponding to those first sample images. The second training sample set may be obtained by performing image processing on the first sample images in the first training sample set; for example, before training the recognition model, the first sample images are converted into second sample images by the second image quality conversion network.
(2) And extracting a first sample image in the first training sample set and a second sample image corresponding to the first sample image in the second training sample set.
After the first training sample set and the second training sample set are obtained, the electronic device can train the recognition model. When the recognition model is trained, first sample images are respectively extracted from a first training sample set, and second sample images corresponding to the first sample images are extracted from a second training sample set.
The second sample image can be obtained in an off-line mode, namely the first training sample set and the second training sample set can be determined before the recognition model is trained, and the corresponding sample image can be directly extracted in the training process, so that the training efficiency is improved.
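A minimal sketch of such an offline paired data set (names are illustrative):

```python
from torch.utils.data import Dataset

class PairedSampleSet(Dataset):
    """Offline pairing: the i-th image of the second training sample set
    corresponds one-to-one to the i-th image of the first set."""
    def __init__(self, first_set, second_set, targets):
        assert len(first_set) == len(second_set) == len(targets)
        self.first_set = first_set
        self.second_set = second_set
        self.targets = targets

    def __len__(self):
        return len(self.first_set)

    def __getitem__(self, i):
        return self.first_set[i], self.second_set[i], self.targets[i]
```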
And S32, inputting the first sample image into the teacher network to obtain the first characteristic.
Please refer to S22 in fig. 2 for details, which are not described herein.
And S33, inputting the second sample image into the recognition model to obtain a second feature.
The recognition model sequentially comprises a first image quality conversion network and a student network, and the network structure of the student network is the same as that of the teacher network.
Please refer to S23 in fig. 2 for details, which are not described herein.
S34, an image loss is determined based on the image output by the first image quality conversion network and the first sample image.
Specifically, the above S34 includes the following steps:
S341, the image output by the first image quality conversion network is extracted.
Wherein the input of the first image quality conversion network is the second sample image.
As described above, the recognition model sequentially includes the first image quality conversion network and the student network, wherein the first image quality conversion network is used to convert the second sample image into a high quality image, that is, the electronic device can extract the converted image from the output of the first image quality conversion network.
S342, an image loss is determined based on a difference between the extracted image and the first sample image.
The electronic device can determine the image loss by calculating the difference between the extracted image and the first sample image. The image loss is used for training the recognition model so as to ensure that the difference of the images is within a controllable range.
And S35, updating the parameters of the recognition model according to the feature loss and the image loss corresponding to the first feature and the second feature so as to determine the target recognition model.
Please refer to S25 in fig. 2 for details, which are not described herein.
In the training method for a recognition model provided in this embodiment, because the first image quality conversion network converts a low-quality image into a high-quality one, the second sample image is converted by the first image quality conversion network into a high-quality image, and the difference between this converted image and the first sample image is computed to update the parameters of the first image quality conversion network. The output of the first image quality conversion network thus approaches the corresponding first sample image, which improves the accuracy of the recognition model.
As an alternative implementation of this embodiment, fig. 4 shows the system architecture for training the recognition model. As shown in fig. 4, the feature extraction layers of the teacher network form a residual network and its classification layer is a fully connected network. Specifically, the teacher network and the recognition model each include an input layer 31, a convolutional layer 32 (Conv), a pooling layer 33 (Pooling), residual units 34 (Resblock), a fully connected layer 35 (fc), and loss function layers 36, 37 and 39. The recognition model further includes the first image quality conversion network, namely a CycleGAN generation network 38. The structure of each layer is described in the following table:
(Layer-by-layer structure table reproduced only as images in the original publication.)
The loss function adopted by the loss function layer of the recognition model is the ArcFace function, which can be expressed as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+t)}}{e^{s\cos(\theta_{y_i}+t)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}$$

where N is the total number of second sample images; i indexes the i-th second sample image, with 1 ≤ i ≤ N; y_i is the target class of the i-th second sample image; s is a scaling coefficient; θ_j is the angle between the j-th class weight vector of the recognition model and the feature vector of the second feature of the i-th second sample image; and t is the angular margin.
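A common PyTorch implementation consistent with the formula above is sketched below; the defaults s = 64 and t = 0.5 are typical ArcFace values, not values fixed by this embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    """ArcFace loss with scaling coefficient s and angular margin t."""
    def __init__(self, feat_dim, num_classes, s=64.0, t=0.5):
        super().__init__()
        self.s, self.t = s, t
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, features, labels):
        # cosine of the angle between normalized features and class weights
        cos = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        # the margin t is added only to the target-class angle
        logits = torch.where(target, torch.cos(theta + self.t), cos)
        return F.cross_entropy(self.s * logits, labels)
```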
As shown in fig. 4, the first image quality conversion network is a CycleGAN generation network 38, and the recognition model is obtained by connecting the CycleGAN generation network 38 in front of a student network whose structure corresponds to the teacher network. The recognition model is obtained by fine-tuning the CycleGAN generation network 38 and the student network under the guidance of the teacher network. During training, the input data of the teacher network is the first sample image, and this input is converted into a low-quality image by a second image quality conversion network (not shown in the figure), another CycleGAN generation network, to serve as the input data of the recognition model; the inputs of the teacher network and of the recognition model thus correspond one to one. The output features of the teacher network and of the recognition model can be drawn together using the maximum mean discrepancy (MMD) loss, although other measures may also be used, without any limitation.
The recognition model updates its parameters using three loss functions: first, an L2 loss between the output image of the CycleGAN generation network 38 and the corresponding first sample image input to the teacher network, which gives the image loss; second, the ArcFace loss applied to the prediction category output by the recognition model; and third, the feature loss. During training, the parameters of the teacher network are frozen, and only the parameters of the recognition model are updated.
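Freezing the teacher while optimizing only the recognition model can be sketched as follows (the optimizer line is illustrative):

```python
import torch

def freeze(teacher: torch.nn.Module):
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad = False  # teacher parameters stay fixed

# only the recognition model (and its ArcFace head) is optimized, e.g.:
# optimizer = torch.optim.SGD(
#     list(model.parameters()) + list(arcface.parameters()), lr=0.01)
```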
In accordance with an embodiment of the present invention, there is provided an identification method embodiment, it should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
In this embodiment, a recognition method is provided, which can be used in electronic devices such as computers, mobile phones and tablet computers. Fig. 5 is a flowchart of a recognition method according to an embodiment of the present invention; as shown in fig. 5, the flow includes the following steps:
and S41, acquiring the image to be recognized.
The image to be recognized may be obtained by the electronic device from surveillance equipment and then preprocessed, where the preprocessing may consist of performing target detection on the captured image and cropping the detected target from it to serve as the image to be recognized; alternatively, the image to be recognized may be acquired by the electronic device directly from an external source, and so on.
And S42, inputting the image to be recognized into the target recognition model to obtain the recognition characteristics of the image to be recognized.
The target recognition model is obtained by training by using the training method of the recognition model of any one of the above embodiments. The electronic device inputs the image to be recognized acquired in S41 to the target recognition model, and obtains the recognition features of the image to be recognized. For details of the specific structure of the target recognition model, please refer to the detailed description of the above embodiments, which is not repeated herein.
And S43, comparing the recognition features with the features to be matched, and determining the recognition result of the image to be recognized.
The electronic device may store features to be matched, where the features to be matched may be obtained by inputting images of respective targets to a teacher network and performing feature extraction using the teacher network.
And the electronic equipment compares the identification features obtained in the step S42 with the features to be matched to determine the identification result of the image to be identified. The identification result may be a category corresponding to the identification feature, or a category and category information corresponding to the identification feature, such as a target name, an age, and an identification number. The specific content of the recognition result is not limited at all, and may be set according to actual requirements.
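As a sketch of this comparison step, assuming cosine similarity as the comparison measure and a fixed decision threshold (both are assumptions of the sketch; the embodiments do not prescribe them):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def identify(model, image, gallery_features, threshold=0.5):
    """Compare the recognition feature with the stored features to be matched.

    gallery_features: (K, D) tensor of features extracted in advance,
    e.g. by the teacher network; returns the best match or None.
    """
    _, feature = model(image.unsqueeze(0))                 # recognition feature
    sims = F.cosine_similarity(feature, gallery_features)  # one score per entry
    score, idx = sims.max(dim=0)
    if score.item() >= threshold:
        return idx.item(), score.item()
    return None, score.item()
```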
In the recognition method provided by this embodiment, the image to be recognized is recognized with a target recognition model trained for recognition accuracy, so the accuracy of object recognition in complex environments can be improved.
In this embodiment, a training apparatus for a recognition model and a recognition apparatus are further provided. These apparatuses are used to implement the foregoing embodiments and preferred implementations; descriptions already given are not repeated. As used below, the term "module" may be a combination of software and/or hardware implementing a predetermined function. Although the apparatuses described in the following embodiments are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
The present embodiment provides a training apparatus for recognizing a model, as shown in fig. 6, including:
a first obtaining module 51, configured to obtain a first sample image and a corresponding second sample image, where an image quality of the first sample image is higher than an image quality of the second sample image;
a first input module 52, configured to input the first sample image into a teacher network to obtain a first feature;
a second input module 53, configured to input the second sample image into an identification model to obtain a second feature, where the identification model sequentially includes a first image quality conversion network and a student network, and the network structure of the student network is the same as that of the teacher network;
a determining module 54, configured to determine an image loss based on the image output by the first image quality conversion network and the first sample image;
and an updating module 55, configured to update parameters of the recognition model according to the feature loss of the first feature and the second feature and the image loss, so as to determine the target recognition model.
In the training apparatus for a recognition model provided in this embodiment, a first image quality conversion network is set in the recognition model to convert an input low-quality image into a high-quality image, and the image output by the first image quality conversion network is compared with the first sample image input to the teacher network to determine an image loss. On the basis of this image loss, the teacher network guides the training of the recognition model, which improves the recognition accuracy of the trained model in complex environments.
The present embodiment further provides an identification apparatus, as shown in fig. 7, including:
the second obtaining module 61 is used for obtaining an image to be identified;
a third input module 62, configured to input the image to be recognized into a target recognition model, so as to obtain the recognition features of the image to be recognized, where the target recognition model is obtained by training according to the training method of the recognition model in the foregoing embodiment;
and the identification module 63 is configured to compare the identification features with the features to be matched, and determine an identification result of the image to be identified.
In the recognition apparatus provided by this embodiment, the image to be recognized is recognized with a target recognition model trained for recognition accuracy, so the accuracy of object recognition in complex environments can be improved.
The training apparatus for the recognition model, or the recognition apparatus, in this embodiment is presented in the form of functional units, where a unit refers to an ASIC, a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above-described functionality.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, which has the training apparatus for the recognition model shown in fig. 6 or the recognition apparatus shown in fig. 7.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an alternative embodiment of the present invention. As shown in fig. 8, the electronic device may include: at least one processor 71, such as a CPU (Central Processing Unit), at least one communication interface 73, a memory 74, and at least one communication bus 72. The communication bus 72 is used to enable communication among these components. The communication interface 73 may include a display (Display) and a keyboard (Keyboard), and optionally may also include a standard wired interface and a standard wireless interface. The memory 74 may be a high-speed RAM (volatile random access memory) or a non-volatile memory, such as at least one disk memory. The memory 74 may optionally also be at least one storage device located remotely from the processor 71. The processor 71 may be connected with the apparatus described in fig. 6 or fig. 7; an application program is stored in the memory 74, and the processor 71 calls the program code stored in the memory 74 to perform any of the above-mentioned method steps.
The communication bus 72 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus 72 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The memory 74 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 74 may also comprise a combination of the above kinds of memory.
The processor 71 may be a Central Processing Unit (CPU), a Network Processor (NP), or a combination of CPU and NP.
The processor 71 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
Optionally, the memory 74 is also used for storing program instructions. Processor 71 may call program instructions to implement a training method for a recognition model as shown in the embodiments of fig. 1 to 3 of the present application, or a recognition method as shown in the embodiment of fig. 5.
Embodiments of the present invention further provide a non-transitory computer storage medium, where the computer storage medium stores computer-executable instructions that can execute the training method of the recognition model or the recognition method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A training method for recognition models, comprising:
acquiring a first sample image and a second sample image corresponding to the first sample image, wherein the image quality of the first sample image is higher than that of the second sample image;
inputting the first sample image into a teacher network to obtain a first feature;
inputting the second sample image into a recognition model to obtain a second feature, wherein the recognition model comprises, in sequence, a first image quality conversion network and a student network, and the network structure of the student network is the same as that of the teacher network;
determining an image loss based on the image output by the first image quality conversion network and the first sample image;
and updating the parameters of the recognition model according to the image loss and the feature loss between the first feature and the second feature, so as to determine a target recognition model.
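By way of illustration only, the following Python (PyTorch) sketch shows one possible training step corresponding to claim 1. The toy network architectures, the use of L1 loss as the image loss, mean-squared error as the feature loss, and the SGD optimizer are all assumptions; the claim does not prescribe them.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the networks of claim 1 (the real architectures are assumptions).
quality_net = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))         # first image quality conversion network
student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())     # student network -> second feature
teacher = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())     # same structure as the student
for p in teacher.parameters():
    p.requires_grad = False                                        # the teacher's parameters stay fixed

optimizer = torch.optim.SGD(
    list(quality_net.parameters()) + list(student.parameters()), lr=1e-3)

def train_step(first_img, second_img):
    # first_img: high-quality first sample image; second_img: its low-quality counterpart
    first_feat = teacher(first_img)                     # first feature
    restored = quality_net(second_img)                  # image output by the quality conversion network
    second_feat = student(restored)                     # second feature
    image_loss = F.l1_loss(restored, first_img)         # image loss (L1 is an assumption)
    feature_loss = F.mse_loss(second_feat, first_feat)  # feature loss (MSE is an assumption)
    loss = image_loss + feature_loss                    # equal weighting is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# One step on random tensors standing in for a sample image pair.
train_step(torch.rand(4, 3, 32, 32), torch.rand(4, 3, 32, 32))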
2. The training method according to claim 1, wherein updating the parameters of the recognition model according to the image loss and the feature loss between the first feature and the second feature to determine the target recognition model comprises:
acquiring a target category of the second sample image and a prediction category output by the recognition model;
determining a class loss for the recognition model using the target class and the prediction class;
updating the parameters of the recognition model based on the image loss, the category loss and the feature loss, and determining the target recognition model.
3. The training method according to claim 2, wherein updating the parameters of the recognition model based on the image loss, the class loss, and the feature loss to determine the target recognition model comprises:
fixing parameters of the teacher network;
determining a joint loss using the image loss, the class loss, and the feature loss;
and updating the parameters of the recognition model based on the joint loss, so as to determine the target recognition model.
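By way of illustration, the class loss of claim 2 and the joint loss of claim 3 might be computed as in the following sketch; the cross-entropy criterion and the loss weights are assumptions, not requirements of the claims.

import torch
import torch.nn as nn

# Class loss per claim 2: compare the recognition model's predicted category
# with the target category of the second sample image.
criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 10, requires_grad=True)   # stand-in for predicted class scores
targets = torch.randint(0, 10, (4,))              # stand-in for target categories
class_loss = criterion(logits, targets)

# Joint loss per claim 3, computed while the teacher's parameters stay fixed
# (see the sketch under claim 1). The weights are hypothetical.
ALPHA, BETA, GAMMA = 1.0, 1.0, 1.0

def joint_loss(image_loss, class_loss, feature_loss):
    return ALPHA * image_loss + BETA * class_loss + GAMMA * feature_loss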
4. The training method according to claim 1, wherein acquiring the first sample image and the second sample image corresponding to the first sample image comprises:
acquiring a first sample image;
and inputting the first sample image into a second image quality conversion network to obtain a second sample image.
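A minimal sketch of one possible second image quality conversion network follows; the downsample-and-noise degradation is an assumption, since claim 4 leaves the form of this network open.

import torch
import torch.nn.functional as F

def second_quality_conversion(first_img, scale=0.25, noise=0.05):
    # Degrade a high-quality first sample image into a low-quality second
    # sample image (claim 4): downsample, upsample back, and add Gaussian noise.
    h, w = first_img.shape[-2:]
    small = F.interpolate(first_img, scale_factor=scale, mode="bilinear", align_corners=False)
    low = F.interpolate(small, size=(h, w), mode="bilinear", align_corners=False)
    return (low + noise * torch.randn_like(low)).clamp(0.0, 1.0)

second_sample = second_quality_conversion(torch.rand(1, 3, 64, 64))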
5. The training method according to claim 1, wherein acquiring the first sample image and the second sample image corresponding to the first sample image comprises:
acquiring a first training sample set and a second training sample set, wherein the images in the first training sample set correspond one-to-one with the images in the second training sample set;
and extracting a first sample image in the first training sample set and a second sample image corresponding to the first sample image in the second training sample set.
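The one-to-one correspondence of claim 5 can be pictured as parallel indexing of two sample sets, as in the sketch below; the file names are hypothetical.

import random

# Index i in the first set pairs with index i in the second set.
first_set = [f"hq_{i:04d}.png" for i in range(1000)]   # first training sample set
second_set = [f"lq_{i:04d}.png" for i in range(1000)]  # second training sample set

i = random.randrange(len(first_set))
first_sample, second_sample = first_set[i], second_set[i]  # a corresponding pair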
6. An identification method, characterized in that the method comprises:
acquiring an image to be identified;
inputting the image to be recognized into a target recognition model to obtain the recognition features of the image to be recognized, wherein the target recognition model is obtained by training according to the training method of the recognition model of any one of claims 1-5;
and comparing the identification features with the features to be matched to determine the identification result of the image to be identified.
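By way of illustration, the comparison step of claim 6 might look like the following sketch, where cosine similarity and the acceptance threshold are assumptions.

import torch
import torch.nn.functional as F

def recognize(image, target_model, features_to_match, threshold=0.5):
    # Claim 6: extract the recognition feature with the trained target
    # recognition model, then compare it against the features to be matched.
    feat = target_model(image)                               # recognition feature, shape (1, D)
    sims = F.cosine_similarity(feat, features_to_match)      # one score per candidate
    best = int(sims.argmax())
    return best if float(sims[best]) >= threshold else None  # matched index, or no match

# Usage with a dummy model standing in for the trained target recognition model.
dummy_model = lambda x: x.flatten(1)
gallery = torch.rand(5, 3 * 16 * 16)                         # features to be matched
result = recognize(torch.rand(1, 3, 16, 16), dummy_model, gallery)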
7. A training apparatus for a recognition model, comprising:
a first acquisition module, configured to acquire a first sample image and a second sample image corresponding to the first sample image, wherein the image quality of the first sample image is higher than that of the second sample image;
a first input module, configured to input the first sample image into a teacher network to obtain a first feature;
a second input module, configured to input the second sample image into a recognition model to obtain a second feature, wherein the recognition model comprises, in sequence, a first image quality conversion network and a student network, and the network structure of the student network is the same as that of the teacher network;
a determining module, configured to determine an image loss based on the image output by the first image quality conversion network and the first sample image;
and an updating module, configured to update the parameters of the recognition model according to the image loss and the feature loss between the first feature and the second feature, so as to determine a target recognition model.
8. An identification device, comprising:
a second acquisition module, configured to acquire an image to be identified;
a third input module, configured to input the image to be recognized into a target recognition model to obtain the recognition features of the image to be recognized, wherein the target recognition model is trained by the training method of the recognition model according to any one of claims 1 to 5;
and an identification module, configured to compare the identification features with features to be matched to determine the identification result of the image to be identified.
9. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the training method of the recognition model according to any one of claims 1 to 5, or to perform the recognition method according to claim 6.
10. A computer-readable storage medium storing computer instructions for causing a computer to execute the training method of the recognition model according to any one of claims 1 to 5 or execute the recognition method according to claim 6.
CN202011582051.1A 2020-12-28 2020-12-28 Recognition model training method, recognition method and device and electronic equipment Active CN112560791B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011582051.1A CN112560791B (en) 2020-12-28 2020-12-28 Recognition model training method, recognition method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011582051.1A CN112560791B (en) 2020-12-28 2020-12-28 Recognition model training method, recognition method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112560791A CN112560791A (en) 2021-03-26
CN112560791B true CN112560791B (en) 2022-08-09

Family

ID=75034043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011582051.1A Active CN112560791B (en) 2020-12-28 2020-12-28 Recognition model training method, recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112560791B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033525B (en) * 2021-05-26 2021-08-13 北京的卢深视科技有限公司 Training method of image recognition network, electronic device and storage medium
CN113435522A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Image classification method, device, equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288078B (en) * 2017-12-07 2020-09-29 腾讯科技(深圳)有限公司 Method, device and medium for recognizing characters in image
CN108875722A (en) * 2017-12-27 2018-11-23 北京旷视科技有限公司 Character recognition and identification model training method, device and system and storage medium
CN109117848B (en) * 2018-09-07 2022-11-18 泰康保险集团股份有限公司 Text line character recognition method, device, medium and electronic equipment
CN109657738B (en) * 2018-10-25 2024-04-30 平安科技(深圳)有限公司 Character recognition method, device, equipment and storage medium
CN111724310B (en) * 2019-03-21 2023-08-15 马上消费金融股份有限公司 Training method of image restoration model, image restoration method and device
CN111161193B (en) * 2019-12-31 2023-09-05 深圳度影医疗科技有限公司 Ultrasonic image quality optimization method, storage medium and terminal equipment
CN111340195B (en) * 2020-03-09 2023-08-22 创新奇智(上海)科技有限公司 Training method and device for network model, image processing method and storage medium

Also Published As

Publication number Publication date
CN112560791A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN110020592B (en) Object detection model training method, device, computer equipment and storage medium
US8792722B2 (en) Hand gesture detection
US8750573B2 (en) Hand gesture detection
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
CN111597884A (en) Facial action unit identification method and device, electronic equipment and storage medium
CN112560791B (en) Recognition model training method, recognition method and device and electronic equipment
CN110059646B (en) Method for training action planning model and target searching method
CN113869449A (en) Model training method, image processing method, device, equipment and storage medium
CN110059794A (en) Man-machine recognition methods and device, electronic equipment, storage medium
CN113627361B (en) Training method and device for face recognition model and computer program product
CN112036381A (en) Visual tracking method, video monitoring method and terminal equipment
CN114330565A (en) Face recognition method and device
CN111046742B (en) Eye behavior detection method, device and storage medium
CN114663726A (en) Training method of target type detection model, target detection method and electronic equipment
CN110717407A (en) Human face recognition method, device and storage medium based on lip language password
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN118262388A (en) Fingerprint identification method and device, electronic equipment and storage medium
CN111798376A (en) Image recognition method and device, electronic equipment and storage medium
CN112668637B (en) Training method, recognition method and device of network model and electronic equipment
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN114627534A (en) Living body discrimination method, electronic device, and storage medium
CN115131291A (en) Object counting model training method, device, equipment and storage medium
CN110807423B (en) Method and device for processing fingerprint image under screen and electronic equipment
CN112070022A (en) Face image recognition method and device, electronic equipment and computer readable medium
CN112990045B (en) Method and apparatus for generating image change detection model and image change detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant