CN115063876A - Image recognition rate improving method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115063876A
CN115063876A (application CN202210986278.5A)
Authority
CN
China
Prior art keywords
image
training
module
loss function
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210986278.5A
Other languages
Chinese (zh)
Other versions
CN115063876B (en)
Inventor
付鑫
韩乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ji Hua Laboratory
Original Assignee
Ji Hua Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ji Hua Laboratory
Priority to CN202210986278.5A (granted as CN115063876B)
Publication of CN115063876A
Application granted
Publication of CN115063876B
Active legal status
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T3/4076Super resolution, i.e. output image resolution higher than sensor resolution by iteratively correcting the provisional high resolution image using the original low-resolution image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of image recognition, and in particular to a method and device for improving image recognition rate, an electronic device, and a storage medium. The image recognition rate improving method comprises the following steps: acquiring an image to be processed; scaling the image to be processed to obtain a first image of a preset size; and inputting the first image into a pre-trained conversion model to obtain a high-recognition-rate image whose recognition rate is higher than that of the first image. The conversion model comprises an SR (super-resolution) module, a recognition module, and a fully connected ("full link") network, and the first image is processed by the SR module, the recognition module, and the full link network in sequence. The SR module has no BN (batch normalization) layer. Because images to be processed are not sorted by a manually set threshold and all of them are processed by the SR module, misjudgments caused by human factors are eliminated and a high-recognition-rate image is reliably obtained.

Description

Image recognition rate improving method and device, electronic equipment and storage medium
Technical Field
The invention relates to the field of image recognition, in particular to a method and a device for improving image recognition rate, electronic equipment and a storage medium.
Background
Most image recognition algorithms, such as face recognition algorithms, mainly perform recognition on high-quality images. In some cases, because shooting conditions are not ideal, only low-quality, Low-Resolution images (lr images) can be acquired. For such cases, the prior art generally configures an SR module and performs recognition on the lr images after they have been processed by the SR module. An existing SR module mainly has two functions:
1. processing the collected lr image with a Super-Resolution method to obtain a super-resolution image (sr image);
2. changing the data distribution through a BN layer, processing the lr image into an sr image that meets the size requirement of the face recognition algorithm.
In practical applications, however, on the one hand an acquired image needs to be processed into an sr image by the SR module only when it is detected to be an lr image; when it is not, the SR module is skipped and the image is recognized directly to extract face recognition information. Whether an image is an lr image is generally decided by a manually set limit, for example a resolution threshold: below the threshold the image is treated as an lr image, above it as a non-lr image such as a High-Resolution image (hr image). Such a manually set limit depends on personal judgment and therefore carries a large error, which often causes some lr images to skip the SR module and enter the recognition module directly, so that face recognition fails.
On the other hand, because non-lr images skip the SR module and are recognized directly, and non-lr images of non-uniform size can also be used directly for recognition, the recognition network of the recognition algorithm is changed every time an image of a different size is recognized, which reduces recognition efficiency.
Accordingly, the prior art is in need of improvement and development.
Disclosure of Invention
The invention aims to provide an image recognition rate improving method and device, an electronic device, and a storage medium, which can effectively eliminate misjudgments caused by human factors and help ensure that a high-recognition-rate image is obtained.
In a first aspect, the present application provides an image recognition rate improving method, applied to an image processing system in image recognition, where the image recognition rate improving method includes the following steps:
s1, acquiring an image to be processed;
S2, scaling the image to be processed to obtain a first image of a preset size;
S3, inputting the first image into a pre-trained conversion model to obtain a high-recognition-rate image whose recognition rate is higher than that of the first image; the conversion model comprises an SR module, a recognition module, and a full link network; the first image is processed by the SR module, the recognition module, and the full link network in sequence; the SR module has no BN layer;
the conversion model is obtained by training through the following training steps:
A1. acquiring a plurality of groups of training data, wherein each group of training data comprises an lr image and a corresponding hr image; the lr image is obtained by up-sampling and down-sampling the hr image, and its size is consistent with that of the hr image;
A2. separately pre-training the SR module to be trained according to the training data, to obtain a pre-trained SR module;
A3. separately pre-training the recognition module to be trained according to the training data, to obtain a pre-trained recognition module;
A4. separately pre-training the full link network to be trained, to obtain a pre-trained full link network;
A5. carrying out overall training on the pre-trained conversion model according to the training data, to obtain the trained conversion model.
The acquired images are not classified or screened at all, which removes human influence, and every image must be processed by the SR module, which effectively guarantees that no lr image enters the recognition module directly. Pre-training the SR module, the recognition module, and the full link network separately speeds up the subsequent overall training of the conversion model.
Further, the specific steps in step A2 comprise executing the following steps in a loop:
A21. inputting the lr image into the SR module to be trained to obtain an sr image; the sr image is consistent in size with the lr image;
A22. calculating a first loss function value from the sr image and the hr image, the first loss function value being calculated by the first loss function formula:

$$L_1 = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(sr_{i,j} - hr_{i,j}\right)^{2}$$

wherein $L_1$ is the first loss function value, $M$ is the pixel length of the sr image, $N$ is the pixel width of the sr image, $sr_{i,j}$ is the value of the pixel at horizontal position $i$ and vertical position $j$ in the sr image, and $hr_{i,j}$ is the value of the pixel at horizontal position $i$ and vertical position $j$ in the hr image;
A23. if the first loss function value converges into the first preset range, ending the loop; otherwise, adjusting the parameters of the SR module.
The first loss function is set so that its value decreases gradually toward convergence during training, which makes the training effect of the SR module easy to determine.
Further, the specific steps in step A3 comprise executing the following steps in a loop:
A31. inputting the sr image into the recognition module to be trained to obtain a first high-dimensional vector;
A32. inputting the corresponding hr image into the recognition module to be trained to obtain a second high-dimensional vector;
A33. calculating a second loss function value from the first high-dimensional vector and the second high-dimensional vector, the second loss function value being calculated by the second loss function formula:

$$L_2 = d_{\cos}(s1, h1)$$

wherein $L_2$ is the second loss function value, $s1$ is the first high-dimensional vector, $h1$ is the second high-dimensional vector, and $d_{\cos}(s1, h1)$ is the cosine distance between the first high-dimensional vector and the second high-dimensional vector;
A34. if the second loss function value converges into the second preset range, ending the loop; otherwise, adjusting the parameters of the recognition module.
If the second loss function value decreases gradually during training, the feature vectors obtained from the sr image and the hr image through the same recognition module become closer and closer, which makes the training effect of the recognition module easy to determine.
Further, the specific steps in step A4 comprise executing the following steps in a loop:
A41. inputting the first high-dimensional vector into the full link network to be trained and calculating a third loss function value from it; the third loss function value is calculated by a third loss function, which is a softmax function or an ArcFace function;
A42. if the third loss function value converges into the third preset range, ending the loop; otherwise, adjusting the parameters of the full link network.
Further, the specific steps in step A5 include:
after the pre-trained SR module, the pre-trained recognition module, and the pre-trained full link network are obtained, freezing the pre-trained recognition module and executing the following steps in a loop:
A51. inputting the training data into the conversion model to be trained and calculating a first overall loss function value; the first overall loss function value is calculated by a first overall loss function, which is obtained by weighting the first loss function corresponding to the pre-trained SR module, the second loss function corresponding to the pre-trained recognition module, and the third loss function corresponding to the pre-trained full link network;
A52. if the first overall loss function value converges into a fourth preset range, ending the loop; otherwise, adjusting the parameters of the conversion model.
Further, in another embodiment, the specific steps in step A5 include:
after the pre-trained SR module, the pre-trained recognition module, and the pre-trained full link network are obtained, formulating different training strategy parameters for each of them and executing the following steps in a loop:
A53. inputting the training data into the conversion model to be trained and calculating a second overall loss function value; the second overall loss function value is calculated by a second overall loss function, which is obtained by weighting the first loss function corresponding to the pre-trained SR module, the second loss function corresponding to the pre-trained recognition module, and the third loss function corresponding to the pre-trained full link network;
A54. if the second overall loss function value converges into the fourth preset range, ending the loop; otherwise, adjusting the parameters of the conversion model.
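The freezing strategy in step A5 above can be sketched as follows. This is an illustrative toy, not the patent's implementation: the module names and scalar "weights" stand in for real parameter tensors, and the gradient values are arbitrary.

```python
# Sketch of step A5's freezing strategy: the pre-trained recognition module is
# frozen (its parameters are not updated) while the rest of the conversion
# model is trained as a whole. Names and values are illustrative.
params = {
    "sr_module":         {"w": 0.5,  "frozen": False},
    "recognition":       {"w": 1.2,  "frozen": True},   # frozen after pre-training
    "full_link_network": {"w": -0.3, "frozen": False},
}

def sgd_step(params, grads, lr=0.1):
    """One gradient-descent update that skips frozen modules."""
    for name, p in params.items():
        if not p["frozen"]:
            p["w"] -= lr * grads[name]
    return params

# Arbitrary gradients for one iteration of the overall-training loop.
grads = {"sr_module": 1.0, "recognition": 1.0, "full_link_network": 1.0}
sgd_step(params, grads)
```

After the step, only the unfrozen modules have moved; the recognition module keeps its pre-trained value.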
In a second aspect, the present invention further provides an image recognition rate improving apparatus, which is applied to an image processing system in image recognition, and the image recognition rate improving apparatus includes:
the acquisition module, configured to acquire an image to be processed;
the scaling module, configured to scale the image to be processed to obtain a first image of a preset size;
the processing module, configured to input the first image into a pre-trained conversion model to obtain a high-recognition-rate image whose recognition rate is higher than that of the first image; the conversion model comprises an SR module, a recognition module, and a full link network; the first image is processed by the SR module, the recognition module, and the full link network in sequence; the SR module has no BN layer;
the conversion model in the processing module is obtained by training through the following training steps:
A1. acquiring a plurality of groups of training data, wherein each group of training data comprises an lr image and a corresponding hr image; the lr image is obtained by up-sampling and down-sampling the corresponding hr image, and its size is consistent with that of the hr image;
A2. separately pre-training the SR module to be trained according to the training data, to obtain a pre-trained SR module;
A3. separately pre-training the recognition module to be trained according to the training data, to obtain a pre-trained recognition module;
A4. separately pre-training the full link network to be trained, to obtain a pre-trained full link network;
A5. carrying out overall training on the pre-trained conversion model according to the training data, to obtain the trained conversion model.
All images to be processed are uniformly scaled and input into the conversion model to be processed in sequence by the SR module, the recognition module, and the full link network; no classification is performed, so the influence of human factors is avoided.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, perform the steps of the image recognition rate improving method described above.
In a fourth aspect, the present invention provides a storage medium on which a computer program is stored, the computer program, when executed by a processor, performing the steps of the image recognition rate improving method described above.
As can be seen from the above, the present application does not need a manually set threshold to distinguish lr images from non-lr images: all images are processed by the SR module into sr images, which ensures that no lr image mistakenly skips the SR module. In addition, acquired images are scaled to a uniform size before entering the SR module, so the recognition module can remain unchanged and recognition efficiency is not reduced; at the same time, removing the BN layer from the SR module removes the step in which the SR module enforces a uniform size and speeds up convergence during training.
Drawings
Fig. 1 is a flowchart of an image recognition rate improving method according to an embodiment of the present disclosure.
Fig. 2 is a schematic structural diagram of an image recognition rate improving apparatus according to an embodiment of the present disclosure.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Most image recognition algorithms in the prior art mainly target high-quality images. However, shooting conditions are often not ideal: illumination, occlusion, distance, angle, motion, and similar factors generally lower the quality of acquired images, and the resulting lr images suffer from heavy noise, blur, distortion, and low resolution, which greatly degrades recognition. At present, an SR module (a super-resolution method) is generally used to process an lr image into an sr image before recognition. Most sr images obtained this way, however, only improve resolution: they give the viewer a high-definition impression, but do not necessarily contain the information required for image recognition, so an sr image may be displayed clearly yet still fail to be recognized effectively.
A normal face recognition process generally includes two steps: face detection and face recognition. Face detection finds the position of a face in the whole image and outputs a face image, whose size is generally fixed at 224x224 pixels; the face image can then be recognized directly. (Face images of different sizes can also be recognized, but then the recognition network in the face recognition algorithm is changed frequently, which affects recognition speed; unifying the face image size helps keep the recognition speed stable.) Therefore, when an SR module is placed between face detection and face recognition to process the image, the SR module is also generally required to output face images scaled to a uniform size (most prior-art SR modules include such a scaling step).
In some embodiments, an image recognition rate improving method applied to an image processing system in image recognition includes:
s1, acquiring an image to be processed;
S2, scaling the image to be processed to obtain a first image of a preset size;
S3, inputting the first image into a pre-trained conversion model to obtain a high-recognition-rate image whose recognition rate is higher than that of the first image; the conversion model comprises an SR module, a recognition module, and a full link network; the first image is processed by the SR module, the recognition module, and the full link network in sequence; there is no BN layer in the SR module.
In this embodiment, the image to be processed is scaled to the preset size after it is acquired, and the BN layer in the SR module is deleted, so the first image is not rescaled after passing through the SR module.
Acquired images are not distinguished as lr or hr: every image to be processed is scaled and input into the conversion model to be processed in sequence by the SR module, the recognition module, and the full link network. Omitting the lr/hr distinction improves processing efficiency and, more importantly, effectively eliminates misjudgments caused by human factors and helps ensure that a high-recognition-rate image is obtained.
It should be noted that the BN layer is a specific component of the SR network in the SR module. In the prior art the BN layer is used to change the data distribution (that is, the size of the input image), but in the present application the first image is not rescaled by the SR module: the input first image and the sr image output by the SR module have the same size, hence the same data distribution. Forcing a BN layer in would not only slow the convergence of the first loss function value (described in detail below) but also reduce the recognition effect of the subsequent recognition module.
It should be further noted that the recognition module is configured to perform the face recognition process.
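Steps S1–S3 can be sketched as follows. This is a minimal illustration under assumptions: the 224x224 target size is borrowed from the face-detection discussion above, and the SR, recognition, and full link modules are toy stand-ins rather than the patent's networks.

```python
import numpy as np

TARGET_SIZE = (224, 224)  # assumed preset size (from the face-detection discussion)

def scale_to_preset(img, size=TARGET_SIZE):
    """S2: nearest-neighbour resize of the to-be-processed image to the preset size."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]

def conversion_model(first_image, sr_module, recognition_module, full_link_network):
    """S3: the first image passes through the SR module, the recognition module,
    and the full link network in sequence; the SR module changes no sizes."""
    sr_image = sr_module(first_image)         # same size in and out (no BN layer)
    embedding = recognition_module(sr_image)  # high-dimensional feature vector
    return full_link_network(embedding)

# Toy stand-ins: identity SR, truncated-flatten "embedding", normalising head.
sr = lambda x: x
recog = lambda x: x.reshape(-1)[:256].astype(np.float64)
head = lambda v: v / (np.linalg.norm(v) + 1e-12)

raw = np.random.default_rng(0).random((120, 90))  # S1: an acquired image
first = scale_to_preset(raw)                      # S2
out = conversion_model(first, sr, recog, head)    # S3
```

Note how the pipeline never branches on whether `raw` is an lr image: every input takes the same path through the SR module.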
In some embodiments, the conversion model is obtained through the following training steps:
A1. acquiring a plurality of groups of training data, wherein each group of training data comprises an lr image and a corresponding hr image; the lr image is obtained by up-sampling and down-sampling the corresponding hr image, and its size is consistent with that of the hr image;
A2. separately pre-training the SR module to be trained according to the training data, to obtain a pre-trained SR module;
A3. separately pre-training the recognition module to be trained according to the training data, to obtain a pre-trained recognition module;
A4. separately pre-training the full link network to be trained, to obtain a pre-trained full link network;
A5. carrying out overall training on the pre-trained conversion model according to the training data, to obtain the trained conversion model.
In this embodiment, the hr image is up-sampled by bicubic interpolation, Gaussian noise is added, and the image is then down-sampled to obtain the corresponding lr image; each group of training data consists of an lr image and its corresponding hr image.
The SR module, the recognition module, and the full link network are then pre-trained separately; pre-training gives them more accurate initial parameters, which speeds up the subsequent overall training of the conversion model.
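The degradation pipeline described above can be sketched as follows. The function name, scale factor, and noise level are assumptions, and nearest-neighbour up-sampling with mean-pool down-sampling stands in for the bicubic interpolation mentioned in the text.

```python
import numpy as np

def make_lr(hr, scale=2, noise_sigma=0.05, seed=0):
    """Step A1 degradation sketch: up-sample the hr image, add Gaussian noise,
    then down-sample back so the lr image matches the hr image in size.
    Nearest-neighbour up-sampling and mean-pool down-sampling are simplified
    stand-ins for bicubic interpolation."""
    rng = np.random.default_rng(seed)
    up = np.kron(hr, np.ones((scale, scale)))            # up-sample
    up = up + rng.normal(0.0, noise_sigma, up.shape)     # add Gaussian noise
    h, w = hr.shape
    lr = up.reshape(h, scale, w, scale).mean(axis=(1, 3))  # down-sample back
    return np.clip(lr, 0.0, 1.0)

hr = np.random.default_rng(1).random((64, 64))  # a toy hr image
lr = make_lr(hr)                                # its degraded lr counterpart
```

The lr output has the same size as the hr input, matching the size-consistency requirement in step A1.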
In certain embodiments, the specific steps in step A2 comprise executing the following steps in a loop:
A21. inputting the lr image into the SR module to be trained to obtain an sr image; the size of the sr image is consistent with that of the lr image;
A22. calculating a first loss function value from the sr image and the hr image, the first loss function value being calculated by the first loss function formula:

$$L_1 = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(sr_{i,j} - hr_{i,j}\right)^{2}$$

wherein $L_1$ is the first loss function value, $M$ is the pixel length of the sr image, $N$ is the pixel width of the sr image, $sr_{i,j}$ is the value of the pixel at horizontal position $i$ and vertical position $j$ in the sr image, and $hr_{i,j}$ is the value of the pixel at horizontal position $i$ and vertical position $j$ in the hr image;
A23. if the first loss function value converges into the first preset range, ending the loop; otherwise, adjusting the parameters of the SR module.
In this embodiment, the SR module may be based on any SR model, such as the classical VDSR or SRGAN networks. During training, the sr image is feature-matched against the corresponding hr image: the sr image produced from the lr image by the SR module should be as consistent with the hr image as possible. The first loss function is therefore set so that its value decreases gradually toward convergence during training (the smaller the converged first loss function value, the higher the feature-matching degree between the sr image and the hr image, and the better the training effect).
It should be noted that the parameters of the SR module are the parameters of the classical network mentioned above, such as hyper-parameters and coefficients; adjusting them means changing their values, for example gradually decreasing or increasing them. If the first loss function value calculated after an adjustment is smaller than the previous one, the value is converging and the training direction is correct (iterative training is a standard technique and is not described further here).
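Reading the pixel-wise first loss as a mean squared error over the M x N pixels (an assumption; the patent's equation images are not reproduced in this text), a minimal sketch:

```python
import numpy as np

def first_loss(sr, hr):
    """Mean squared error over all M x N pixels of the sr and hr images."""
    M, N = sr.shape
    return np.sum((sr - hr) ** 2) / (M * N)

sr = np.zeros((4, 4))
hr = np.full((4, 4), 0.5)
val = first_loss(sr, hr)  # every pixel differs by 0.5, so the loss is 0.25
```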
In certain embodiments, the specific steps in step A3 comprise executing the following steps in a loop:
A31. inputting the sr image into the recognition module to be trained to obtain a first high-dimensional vector;
A32. inputting the corresponding hr image into the recognition module to be trained to obtain a second high-dimensional vector;
A33. calculating a second loss function value from the first high-dimensional vector and the second high-dimensional vector, the second loss function value being calculated by the second loss function formula:

$$L_2 = d_{\cos}(s1, h1)$$

wherein $L_2$ is the second loss function value, $s1$ is the first high-dimensional vector, $h1$ is the second high-dimensional vector, and $d_{\cos}(s1, h1)$ is the cosine distance between the first high-dimensional vector and the second high-dimensional vector;
A34. if the second loss function value converges into the second preset range, ending the loop; otherwise, adjusting the parameters of the recognition module (similar to adjusting the parameters of the SR module, not repeated here).
In this embodiment, the recognition module may be based on any recognition model, such as VGG or ResNet. During training, the sr image and the hr image pass through the same recognition module to obtain corresponding high-dimensional vectors (the first and second high-dimensional vectors), for example 256- or 512-dimensional vectors. The cosine distance between the first and second high-dimensional vectors is set as the second loss function value; as it decreases gradually during training, the feature vectors obtained from the sr image and the hr image through the same recognition module become closer and closer.
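Taking "cos distance" in its usual sense of one minus the cosine similarity (an assumption; the text does not give the exact formula), the second loss can be sketched as:

```python
import numpy as np

def second_loss(s1, h1):
    """Cosine distance between the two embeddings, taken here as
    1 - cosine similarity (the usual definition of "cos distance")."""
    cos_sim = np.dot(s1, h1) / (np.linalg.norm(s1) * np.linalg.norm(h1))
    return 1.0 - cos_sim

s1 = np.array([1.0, 0.0])
h1 = np.array([0.0, 1.0])
orth = second_loss(s1, h1)  # orthogonal embeddings: distance 1
same = second_loss(s1, s1)  # identical embeddings: distance 0
```

As the sr and hr embeddings align during training, this value falls toward zero, matching the convergence behaviour described above.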
In certain embodiments, the specific steps in step A4 include:
circularly executing the following steps:
A41. inputting the first high-dimensional vector into the full link network (i.e., fully connected network) to be trained, and calculating a third loss function value according to the first high-dimensional vector, wherein the third loss function value is obtained by calculating a third loss function, the third loss function being a softmax function or an ArcFace function;
A42. if the third loss function value converges to the third preset range, ending the loop, otherwise, adjusting the parameters of the full link network (similar to adjusting the parameters of the SR module, not described again here).
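For the softmax choice of third loss function, a minimal NumPy sketch of softmax cross-entropy computed on the full link network's output logits might look as follows (the ArcFace variant additionally adds an angular margin to the target-class logit and is omitted here; the function and variable names are illustrative assumptions):

```python
import numpy as np

def third_loss_softmax(logits, label):
    """Softmax cross-entropy of the full-link-network logits against an integer class label."""
    z = logits - logits.max()            # shift logits for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax probabilities
    return -np.log(probs[label])         # cross-entropy for the true class
```

With uniform logits over C classes the loss equals log(C); a confidently correct prediction drives it toward 0.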
In practical application, the overall loss function of the conversion model (including the first overall loss function and the second overall loss function) is a weighted sum of the first loss function, the second loss function and the third loss function. On their own, the first, second and third loss functions only reflect the accuracy of the SR module, the recognition module and the full link network after pre-training; they are independent of one another, are not organically combined, and cannot guarantee the overall accuracy of the conversion model. The present application therefore sets a weight for each of the first, second and third loss functions and combines the three through the overall loss function. In practice, the first loss function only ensures that the sr image is as close as possible to the corresponding hr image and is not the key to improving the image recognition rate, so its weighting coefficient is generally 0.1; the second and third loss functions ensure that the information reconstructed by the SR module is specifically useful for recognition, which is the key to improving the image recognition rate, so the weighting coefficient of the second loss function is generally 0.4 and that of the third loss function is generally 0.5. In this way the degree to which each part influences the reconstruction of the high-recognition-rate image is controlled, and the image reconstruction accuracy is further improved.
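With the weighting coefficients just described (0.1, 0.4, 0.5, stated in the text as typical values), the overall loss function is simply a weighted sum; a one-line sketch with assumed names:

```python
def overall_loss(l1, l2, l3, w1=0.1, w2=0.4, w3=0.5):
    """Weighted sum of the first, second, and third loss function values."""
    return w1 * l1 + w2 * l2 + w3 * l3
```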
It should be noted that the pre-training of the SR module, the recognition module and the full link network does not interfere with one another and only determines their respective initial parameters; pre-training can therefore be understood as coarse tuning. During the training of the conversion model, adjusting the parameters of the conversion model ultimately still means adjusting the parameters of the SR module, the recognition module and the full link network, but the adjustment is affected by the weights and is no longer performed independently and separately.
In one embodiment, the specific steps in step A5 include:
after the SR module, the recognition module and the full link network have all completed pre-training, freezing the recognition module that has completed pre-training and circularly executing the following steps:
A51. inputting training data into the conversion model to be trained and calculating a first overall loss function value, wherein the first overall loss function value is obtained by calculating a first overall loss function, and the first overall loss function is a weighted sum of the first loss function corresponding to the pre-trained SR module, the second loss function corresponding to the pre-trained recognition module, and the third loss function corresponding to the pre-trained full link network;
A52. if the first overall loss function value converges to the fourth preset range, ending the loop, otherwise, adjusting the parameters of the conversion model.
Freezing the recognition module that has completed pre-training means that its parameters are kept unchanged in the subsequent training process.
Performing overall training on the conversion model is equivalent to simultaneously training the pre-trained SR module, the pre-trained recognition module and the pre-trained full link network. Although each part has completed pre-training, which greatly improves training efficiency, the first overall loss function value can still be difficult to converge. In this embodiment, the recognition module that has completed pre-training is therefore frozen so that only the SR module and the full link network are trained, allowing the conversion model to converge more smoothly.
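Freezing can be sketched as skipping the frozen module's parameters during the update step. The dictionary-based formulation below is illustrative only (in a framework such as PyTorch the same effect is usually obtained by setting `requires_grad = False` on the recognition module's parameters); all names are assumptions:

```python
def gradient_step(params, grads, frozen_prefixes, lr=0.01):
    """Apply one gradient-descent step, leaving parameters of frozen modules unchanged."""
    updated = {}
    for name, value in params.items():
        if any(name.startswith(p) for p in frozen_prefixes):
            updated[name] = value                  # frozen: keep the pre-trained value
        else:
            updated[name] = value - lr * grads[name]
    return updated
```

A frozen recognition-module parameter passes through unchanged while the SR and full-link-network parameters keep moving.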
In another embodiment, the specific steps in step A5 include:
after the SR module, the recognition module and the full link network have all completed pre-training, different training strategy parameters are formulated for each of them, and the following steps are executed in a loop:
A53. inputting training data into the conversion model to be trained and calculating a second overall loss function value, wherein the second overall loss function value is obtained by calculating a second overall loss function, and the second overall loss function is a weighted sum of the first loss function corresponding to the pre-trained SR module, the second loss function corresponding to the pre-trained recognition module, and the third loss function corresponding to the pre-trained full link network;
A54. if the second overall loss function value converges to the fourth preset range, ending the loop, otherwise, adjusting the parameters of the conversion model.
In this embodiment, different training strategy parameters are formulated for the SR module that completes the pre-training, the recognition module that completes the pre-training, and the full link network that completes the pre-training, which also helps the conversion model to converge smoothly, for example, when a gradient descent method is adopted for training, different learning rates are set for the SR module, the recognition module, and the full link network.
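The per-module training strategy, e.g. a different learning rate per sub-module, can be sketched by routing each parameter to its module's rate (in PyTorch this corresponds to optimizer parameter groups; the naming convention below is an assumption for illustration):

```python
def grouped_gradient_step(params, grads, lr_by_module):
    """One gradient step where each module (name prefix before '.') has its own learning rate."""
    updated = {}
    for name, value in params.items():
        module = name.split(".")[0]  # e.g. "sr.conv1.weight" -> "sr"
        updated[name] = value - lr_by_module[module] * grads[name]
    return updated
```

With a larger rate on the SR module and smaller rates on the recognition module and full link network, each part moves at its own pace toward the shared optimum.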
Referring to fig. 2, fig. 2 is a diagram of an image recognition rate improving apparatus applied to an image processing system in image recognition according to some embodiments of the present application. The image recognition rate improving apparatus is integrated, in the form of a computer program, in a back-end control device of the image processing system, and includes:
an obtaining module 100, configured to obtain an image to be processed;
the zooming module 200 is configured to zoom an image to be processed to obtain a first image with a preset size;
the processing module 300 is configured to input the first image into a pre-trained conversion model to obtain a high-recognition-rate image whose recognition rate is higher than that of the first image; the conversion model comprises an SR module, a recognition module and a full link network; the first image is processed by the SR module, the recognition module and the full link network in sequence; there is no BN (batch normalization) layer in the SR module.
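The processing order stated above (first image, then SR module, then recognition module, then full link network) amounts to a simple composition of three stages; the sketch below treats each module as a callable (all names are placeholders, not the patent's implementation):

```python
def convert(first_image, sr_module, recognition_module, full_link_network):
    """Run the first image through the conversion model's three stages in sequence."""
    sr_image = sr_module(first_image)       # super-resolution reconstruction
    feature = recognition_module(sr_image)  # high-dimensional feature vector
    return full_link_network(feature)       # classification head output
```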
In some embodiments, the conversion model in the processing module 300 is trained by the following training steps:
A1. acquiring a plurality of groups of training data, wherein each group of training data comprises an lr image and a corresponding hr image; the lr images are obtained by down-sampling and then up-sampling the corresponding hr images, so that the size of the lr images is consistent with that of the hr images;
A2. according to the training data, separately pre-training the SR module to be trained to obtain the SR module that has completed pre-training;
A3. according to the training data, separately pre-training the recognition module to be trained to obtain the recognition module that has completed pre-training;
A4. separately pre-training the full link network to be trained to obtain the full link network that has completed pre-training;
A5. according to the training data, performing overall training on the conversion model that has completed pre-training to obtain the trained conversion model.
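Step A1's degradation, down-sampling the hr image and up-sampling it back so that the lr image keeps the hr image's pixel size, can be sketched with nearest-neighbour resampling (the scale factor and the interpolation choice are assumptions; the patent does not fix them):

```python
import numpy as np

def make_lr_image(hr, factor=4):
    """Down-sample `hr` by `factor`, then up-sample back to the original size."""
    small = hr[::factor, ::factor]  # down-sample (nearest neighbour)
    lr = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)  # up-sample
    return lr[: hr.shape[0], : hr.shape[1]]  # crop to the hr size
```

The returned lr image has exactly the hr image's shape but only the information that survives the round trip, which is what the SR module then learns to restore.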
In some embodiments, to train the conversion model in the processing module 300, the following steps are performed in a loop when the SR module to be trained is separately pre-trained according to the training data to obtain the SR module that has completed pre-training:
A21. inputting the lr image into the SR module to be trained to obtain an sr image; the size of the sr image is consistent with that of the lr image;
A22. calculating a first loss function value from the sr image and the hr image, the first loss function value being calculated by a first loss function formula:
$$L_{1}=\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(sr_{i,j}-hr_{i,j}\right)^{2}$$
wherein $L_{1}$ is the first loss function value, $M$ is the pixel length of the sr image, $N$ is the pixel width of the sr image, $sr_{i,j}$ is the pixel value of the pixel point located at the $i$-th horizontal and the $j$-th vertical position in the sr image, and $hr_{i,j}$ is the pixel value of the pixel point located at the $i$-th horizontal and the $j$-th vertical position in the hr image;
A23. if the first loss function value converges to the first preset range, ending the loop, otherwise, adjusting the parameters of the SR module.
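The first loss function compares the sr image with the hr image pixel by pixel over the M×N grid; it can be sketched in NumPy. The mean-squared-error form below is one common choice and an assumption here (the patent's exact formula is not fixed by the surrounding text), as are the function and variable names:

```python
import numpy as np

def first_loss(sr, hr):
    """Pixel-wise mean squared error between the sr image and the hr image (assumed form)."""
    m, n = sr.shape  # M is the pixel length, N the pixel width
    diff = sr.astype(float) - hr.astype(float)
    return float((diff ** 2).sum() / (m * n))
```

Identical images give a loss of 0; a constant pixel offset of 2 gives a loss of 4 under this squared form.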
In some embodiments, to train the conversion model in the processing module 300, the following steps are performed in a loop when the recognition module to be trained is separately pre-trained according to the training data to obtain the recognition module that has completed pre-training:
A31. inputting the sr image into a recognition module to be trained to obtain a first high-dimensional vector;
A32. inputting the corresponding hr image into a recognition module to be trained to obtain a second high-dimensional vector;
A33. calculating a second loss function value from the first high-dimensional vector and the second high-dimensional vector, the second loss function value calculated by a second loss function formula:
$$L_{2}=1-\frac{s1\cdot h1}{\lVert s1\rVert\,\lVert h1\rVert}$$
wherein $L_{2}$ is the second loss function value, $s1$ is the first high-dimensional vector, $h1$ is the second high-dimensional vector, and $1-\frac{s1\cdot h1}{\lVert s1\rVert\,\lVert h1\rVert}$ is the cos distance of the first high-dimensional vector and the second high-dimensional vector;
A34. if the second loss function value converges to the second preset range, ending the loop, otherwise, adjusting the parameters of the recognition module.
In some embodiments, to train the conversion model in the processing module 300, the following steps are performed in a loop when the full link network to be trained is separately pre-trained to obtain the full link network that has completed pre-training:
A41. inputting the first high-dimensional vector into the full link network to be trained, and calculating a third loss function value according to the first high-dimensional vector, wherein the third loss function value is obtained by calculating a third loss function, the third loss function being a softmax function or an ArcFace function;
A42. if the third loss function value converges to a third preset range, ending the loop, otherwise, adjusting the parameters of the full link network.
In some embodiments, to train the conversion model in the processing module 300, the conversion model that has completed pre-training is trained as a whole according to the training data to obtain the trained conversion model; after the SR module, the recognition module and the full link network have all completed pre-training, the recognition module that has completed pre-training is frozen and the following steps are performed in a loop:
A51. inputting training data into the conversion model to be trained and calculating a first overall loss function value, wherein the first overall loss function value is obtained by calculating a first overall loss function, and the first overall loss function is a weighted sum of the first loss function corresponding to the pre-trained SR module, the second loss function corresponding to the pre-trained recognition module, and the third loss function corresponding to the pre-trained full link network;
A52. if the first overall loss function value converges to the fourth preset range, ending the loop, otherwise, adjusting the parameters of the conversion model.
In some embodiments, to train the conversion model in the processing module 300, the conversion model that has completed pre-training is trained as a whole according to the training data to obtain the trained conversion model; after the SR module, the recognition module and the full link network have all completed pre-training, different training strategy parameters are formulated for each of them, and the following steps are performed in a loop:
A53. inputting training data into the conversion model to be trained and calculating a second overall loss function value, wherein the second overall loss function value is obtained by calculating a second overall loss function, and the second overall loss function is a weighted sum of the first loss function corresponding to the pre-trained SR module, the second loss function corresponding to the pre-trained recognition module, and the third loss function corresponding to the pre-trained full link network;
A54. if the second overall loss function value converges to the fourth preset range, ending the loop, otherwise, adjusting the parameters of the conversion model.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The present disclosure provides an electronic device comprising a processor 1301 and a memory 1302, which are interconnected and communicate with each other through a communication bus 1303 and/or other connection mechanisms (not shown). The memory 1302 stores a computer program executable by the processor 1301; when the computing device runs, the processor 1301 executes the computer program to perform the image recognition rate improving method in any optional implementation of the embodiment of the first aspect, so as to implement the following functions: acquiring an image to be processed; zooming the image to be processed to obtain a first image of a preset size; and inputting the first image into a pre-trained conversion model to obtain a high-recognition-rate image whose recognition rate is higher than that of the first image. The conversion model comprises an SR module, a recognition module and a full link network; the first image is processed by the SR module, the recognition module and the full link network in sequence; there is no BN layer in the SR module.
An embodiment of the present application provides a storage medium having a computer program stored thereon. When the computer program is executed by a processor, the image recognition rate improving method in any optional implementation of the embodiment of the first aspect is executed, so as to implement the following functions: acquiring an image to be processed; zooming the image to be processed to obtain a first image of a preset size; and inputting the first image into a pre-trained conversion model to obtain a high-recognition-rate image whose recognition rate is higher than that of the first image. The conversion model comprises an SR module, a recognition module and a full link network; the first image is processed by the SR module, the recognition module and the full link network in sequence; there is no BN layer in the SR module.
The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (9)

1. An image recognition rate improving method is applied to an image processing system in image recognition, and is characterized by comprising the following steps:
s1, acquiring an image to be processed;
s2, zooming the image to be processed to obtain a first image with a preset size;
s3, inputting the first image into a pre-trained conversion model to obtain a high-recognition-rate image with a recognition rate higher than that of the first image; the conversion model comprises an SR module, an identification module and a full link network; the first image is processed by the SR module, the identification module and the full link network in sequence; the SR module has no BN layer;
the conversion model is obtained by training through the following training steps:
A1. acquiring a plurality of groups of training data, wherein each group of training data comprises an lr image and a corresponding hr image; the lr images are obtained by down-sampling and then up-sampling the hr images, so that the size of the lr images is consistent with that of the hr images;
A2. according to the training data, separately pre-training the SR module to be trained to obtain the SR module that has completed pre-training;
A3. according to the training data, separately pre-training the identification module to be trained to obtain the identification module that has completed pre-training;
A4. separately pre-training the full link network to be trained to obtain the full link network that has completed pre-training;
A5. according to the training data, performing overall training on the conversion model that has completed pre-training to obtain the trained conversion model.
2. The method for improving image recognition rate according to claim 1, wherein the specific steps in step a2 include:
circularly executing the following steps:
A21. inputting the lr image into the SR module to be trained to obtain an SR image; the sr image is consistent with the lr image in size;
A22. calculating a first loss function value from the sr image and the hr image, the first loss function value calculated by a first loss function formula:
$$L_{1}=\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(sr_{i,j}-hr_{i,j}\right)^{2}$$
wherein $L_{1}$ is the first loss function value, $M$ is the pixel length of the sr image, $N$ is the pixel width of the sr image, $sr_{i,j}$ is the pixel value of the pixel point located at the $i$-th horizontal and the $j$-th vertical position in the sr image, and $hr_{i,j}$ is the pixel value of the pixel point located at the $i$-th horizontal and the $j$-th vertical position in the hr image;
A23. if the first loss function value converges to a first preset range, ending the loop, otherwise, adjusting the parameters of the SR module.
3. The method for improving image recognition rate according to claim 2, wherein the specific steps in step a3 include:
circularly executing the following steps:
A31. inputting the sr image into the identification module to be trained to obtain a first high-dimensional vector;
A32. inputting the corresponding hr image into the identification module to be trained to obtain a second high-dimensional vector;
A33. calculating a second loss function value from the first high-dimensional vector and the second high-dimensional vector, the second loss function value calculated by a second loss function formula:
$$L_{2}=1-\frac{s1\cdot h1}{\lVert s1\rVert\,\lVert h1\rVert}$$
wherein $L_{2}$ is the second loss function value, $s1$ is the first high-dimensional vector, $h1$ is the second high-dimensional vector, and $1-\frac{s1\cdot h1}{\lVert s1\rVert\,\lVert h1\rVert}$ is the cos distance of the first high-dimensional vector and the second high-dimensional vector;
A34. and if the second loss function value converges to a second preset range, ending the loop, otherwise, adjusting the parameters of the identification module.
4. The method for improving image recognition rate according to claim 3, wherein the specific steps in the step A4 include:
circularly executing the following steps:
A41. inputting the first high-dimensional vector into a full link network to be trained, and calculating a third loss function value according to the first high-dimensional vector, wherein the third loss function value is obtained by calculating a third loss function, and the third loss function is a softmax function or an ArcFace function;
A42. if the third loss function value converges to a third preset range, ending the loop, otherwise, adjusting the parameters of the full link network.
5. The method for improving image recognition rate according to claim 1, wherein the specific steps in step a5 include:
after the SR module completing the pre-training, the recognition module completing the pre-training and the full link network completing the pre-training are obtained, freezing the recognition module completing the pre-training and circularly executing the following steps:
A51. inputting the training data into a conversion model to be trained and calculating a first overall loss function value, wherein the first overall loss function value is obtained by calculating a first overall loss function, and the first overall loss function is a weighted sum function of a first loss function corresponding to the SR module which completes the pre-training, a second loss function corresponding to the recognition module which completes the pre-training and a third loss function corresponding to the full link network which completes the pre-training;
A52. if the first overall loss function value converges to a fourth preset range, ending the loop, otherwise, adjusting the parameters of the conversion model.
6. The method for improving image recognition rate according to claim 1, wherein the specific steps in step a5 include:
after the SR module completing the pre-training, the recognition module completing the pre-training and the full link network completing the pre-training are obtained, different training strategy parameters are formulated for the SR module completing the pre-training, the recognition module completing the pre-training and the full link network completing the pre-training, and the following steps are executed in a loop:
A53. inputting the training data into a conversion model to be trained and calculating a second overall loss function value, wherein the second overall loss function value is obtained by calculating a second overall loss function, and the second overall loss function is a weighted sum function of a first loss function corresponding to the SR module which completes the pre-training, a second loss function corresponding to the recognition module which completes the pre-training and a third loss function corresponding to the full link network which completes the pre-training;
A54. and if the second overall loss function value converges to a fourth preset range, ending the loop, otherwise, adjusting the parameters of the conversion model.
7. An image recognition rate improving apparatus applied to an image processing system in image recognition, the image recognition rate improving apparatus comprising:
the acquisition module is used for acquiring an image to be processed;
the zooming module is used for zooming the image to be processed to obtain a first image with a preset size;
the processing module is used for inputting the first image into a pre-trained conversion model to obtain a high-recognition-rate image whose recognition rate is higher than that of the first image; the conversion model comprises an SR module, an identification module and a full link network; the first image is processed by the SR module, the identification module and the full link network in sequence; the SR module has no BN layer;
the conversion model in the processing module is obtained by training through the following training steps:
A1. acquiring a plurality of groups of training data, wherein each group of training data comprises an lr image and a corresponding hr image; the lr images are obtained by down-sampling and then up-sampling the corresponding hr images, so that the lr images are consistent with the hr images in size;
A2. according to the training data, separately pre-training the SR module to be trained to obtain the SR module that has completed pre-training;
A3. according to the training data, separately pre-training the identification module to be trained to obtain the identification module that has completed pre-training;
A4. separately pre-training the full link network to be trained to obtain the full link network that has completed pre-training;
A5. according to the training data, performing overall training on the conversion model that has completed pre-training to obtain the trained conversion model.
8. An electronic device comprising a processor and a memory, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, perform the steps of the image recognition rate improving method according to any one of claims 1-6.
9. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the image recognition rate improving method according to any one of claims 1 to 6.
CN202210986278.5A 2022-08-17 2022-08-17 Image recognition rate improving method and device, electronic equipment and storage medium Active CN115063876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210986278.5A CN115063876B (en) 2022-08-17 2022-08-17 Image recognition rate improving method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115063876A true CN115063876A (en) 2022-09-16
CN115063876B CN115063876B (en) 2022-11-18

Family

ID=83207805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210986278.5A Active CN115063876B (en) 2022-08-17 2022-08-17 Image recognition rate improving method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115063876B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084119A (en) * 2019-03-26 2019-08-02 安徽艾睿思智能科技有限公司 Low-resolution face image recognition methods based on deep learning
CN111461134A (en) * 2020-05-18 2020-07-28 南京大学 Low-resolution license plate recognition method based on generation countermeasure network
CN111754401A (en) * 2020-05-29 2020-10-09 新加坡依图有限责任公司(私有) Decoder training method, high-definition face image generation device and computer equipment
CN113837942A (en) * 2021-09-26 2021-12-24 平安科技(深圳)有限公司 Super-resolution image generation method, device, equipment and storage medium based on SRGAN
CN114170608A (en) * 2021-12-01 2022-03-11 上海东普信息科技有限公司 Super-resolution text image recognition method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN115063876B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
Cao et al. Attention-aware face hallucination via deep reinforcement learning
Shamsolmoali et al. G-GANISR: Gradual generative adversarial network for image super resolution
Dosovitskiy et al. Generating images with perceptual similarity metrics based on deep networks
CN110414377B (en) Remote sensing image scene classification method based on scale attention network
Ren et al. Single image super-resolution via adaptive high-dimensional non-local total variation and adaptive geometric feature
Zhou et al. Scale adaptive image cropping for UAV object detection
Ren et al. Single image super-resolution using local geometric duality and non-local similarity
CN104969257B (en) Image processing equipment and image processing method
CN107067367A (en) Image super-resolution reconstruction processing method
Ye et al. CSformer: Bridging convolution and transformer for compressive sensing
Pérez-Pellitero et al. Photorealistic video super resolution
CN105046659A (en) Sparse representation-based single-lens computational imaging PSF estimation method
Yang et al. Dual-geometric neighbor embedding for image super resolution with sparse tensor
Liu et al. Multi-scale residual hierarchical dense networks for single image super-resolution
Dou et al. Medical image super-resolution via minimum error regression model selection using random forest
Zhao et al. Crnet: Unsupervised color retention network for blind motion deblurring
Chen et al. Learnable descent algorithm for nonsmooth nonconvex image reconstruction
Choi et al. Test-time adaptation for video frame interpolation via meta-learning
CN113393385B (en) Multi-scale fusion-based unsupervised rain removal method, system, device and medium
Moser et al. Diffusion Models, Image Super-Resolution And Everything: A Survey
Hossain et al. Anti-aliasing deep image classifiers using novel depth adaptive blurring and activation function
CN112132145B (en) Image classification method and system based on model extended convolutional neural network
Zhao et al. Unsupervised color retention network and new quantization metric for blind motion deblurring
CN115063876B (en) Image recognition rate improving method and device, electronic equipment and storage medium
Gan et al. Learned Two-step Iterative Shrinkage Thresholding Algorithm for Deep Compressive Sensing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant