WO2024045442A1 - Training method for image correction model, image correction method, device and storage medium - Google Patents


Info

Publication number: WO2024045442A1
Authority: WO (WIPO, PCT)
Application number: PCT/CN2022/142238
Prior art keywords: image, training, network, rotated, correction
Other languages: English (en), French (fr)
Inventors: 叶嘉权, 魏新明, 王孝宇, 肖嵘
Original Assignees: 青岛云天励飞科技有限公司, 深圳云天励飞技术股份有限公司
Application filed by 青岛云天励飞科技有限公司 and 深圳云天励飞技术股份有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/02 Affine transformations
    • G06T 3/60 Rotation of whole images or parts thereof
    • G06T 3/608 Rotation of whole images or parts thereof by skew deformation, e.g. two-pass or three-pass rotation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Definitions

  • the present application relates to the field of image correction, and in particular to an image correction model training method, image correction method, computer equipment and storage medium.
  • This application provides an image correction model training method, image correction method, computer equipment and storage medium, so as to reduce the training cost of the image correction model and improve the accuracy of the obtained image correction model.
  • this application provides a training method for an image correction model.
  • the method includes:
  • training data includes a training image and a rotated image corresponding to the training image
  • this application also provides an image correction method, which method includes:
  • the image to be corrected is input into a pre-trained image correction model to obtain a corrected image, wherein the pre-trained image correction model is trained using the image correction model training method as described in the first aspect.
  • the application also provides a computer device, the computer device including a memory and a processor; the memory is used to store a computer program; the processor is used to execute the computer program and, when executing the computer program, to implement the training method of the image correction model as described in the first aspect and/or the image correction method as described in the second aspect.
  • the present application also provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, it causes the processor to implement the method as described in the first aspect.
  • This application discloses a training method for an image correction model, an image correction method, a computer device and a storage medium.
  • the training image and the rotated image corresponding to the training image are used as the training data of the image correction model, which reduces the amount of annotated data used when training the image correction model; this reduces the dependence on annotated data, lowers the training cost of the image correction model, and also improves the accuracy of the trained image correction model.
  • the corrected image is obtained based on the affine transformation matrix and the sampling network, and then the preset correction network and the sampling network are iteratively trained based on the corrected image and the training image; after training is completed, the sampling network and the preset correction network jointly serve as the image correction model. The sampling network thus participates in the training of the image correction model as part of the model, and the corrected images are used to conduct unsupervised training of the image correction model, so that the accuracy of the trained image correction model can be guaranteed while the labeling cost is reduced.
  • Figure 1 is a schematic flow chart of a training method for an image correction model provided by an embodiment of the present application
  • Figure 2 is a schematic diagram of the steps for performing affine transformation on a rotated image provided by an embodiment of the present application
  • Figure 3 is a schematic diagram of the steps of image sampling provided by the embodiment of the present application.
  • Figure 4 is a schematic flow chart of an image correction method provided by an embodiment of the present application.
  • Figure 5 is a schematic block diagram of a training device for an image correction model provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural block diagram of a computer device provided by an embodiment of the present application.
  • Embodiments of the present application provide an image correction model training method, an image correction method, a computer device, and a storage medium.
  • the image correction model obtained using this training method can be used to correct collected face images and improve the accuracy of the corrected images, so as to improve the success rate and accuracy of subsequent face recognition.
  • FIG. 1 is a schematic flow chart of an image correction model training method provided by an embodiment of the present application.
  • the image correction model training method iteratively trains the image correction model by simulating training data, which can reduce training costs and improve the accuracy of the trained image correction model.
  • the training method of the image correction model specifically includes: steps S101 to step S105.
  • Step S101 Obtain training data, where the training data includes a training image and a rotated image corresponding to the training image.
  • the training data is used to train the image correction model.
  • the training data includes training images and rotated images corresponding to the training images. Among them, the rotated image is obtained by rotating the training image by a preset angle.
  • the method may further include the following steps: obtaining a training image and rotating the training image to obtain a rotated image corresponding to the training image.
  • the training image can be rotated by a certain angle in advance to obtain the rotated image.
  • the training image, the rotated image corresponding to the training image, and the rotation angle corresponding to the training image can be directly obtained.
  • multiple rotated images can be obtained by rotating different training images at different angles, thereby increasing the amount of training data.
  • the rotation angles used for different training images may be different or the same.
  • a large number of pictures of frontal faces can be screened out in business scenarios as training images.
  • the training images include images without obvious angle rotation; a large number of rotated images for training are then generated by artificially introducing random rotation angles.
  • the training image is rotated, and the rotated image obtained by the rotation participates in the training of the image correction model in place of manually labeled data containing key points.
  • this method of generating training data by simulating rotated images does not require manual annotation of the rotated images in advance; instead, the rotation angle can be controlled directly. On the one hand, it enriches the training data, reduces the dependence on human-labeled data in the training process, and reduces the cost of obtaining training data; on the other hand, it also reduces errors introduced during human labeling, thereby further improving the accuracy of the trained image correction model.
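The data-simulation step described above can be sketched as follows. This is an illustrative NumPy example only: the rotation routine, the ±30° angle range, and nearest-neighbor sampling are assumptions of this sketch, not details specified by the application.

```python
import numpy as np

def rotate_image(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate a 2-D image about its center by inverse mapping with
    nearest-neighbor sampling; out-of-bounds pixels are left at zero."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = np.deg2rad(angle_deg)
    cos_a, sin_a = np.cos(a), np.sin(a)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # For every output pixel, locate the source pixel in the input image.
    sx = cos_a * (xs - cx) + sin_a * (ys - cy) + cx
    sy = -sin_a * (xs - cx) + cos_a * (ys - cy) + cy
    sx, sy = np.rint(sx).astype(int), np.rint(sy).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out

def make_training_pairs(images, rng=None):
    """Pair each (roughly frontal) training image with a randomly rotated
    copy, recording the angle so it can supervise pre-training if needed."""
    if rng is None:
        rng = np.random.default_rng(0)
    pairs = []
    for img in images:
        angle = float(rng.uniform(-30.0, 30.0))  # assumed angle range
        pairs.append((img, rotate_image(img, angle), angle))
    return pairs
```

Because the angle is generated rather than annotated, each pair (training image, rotated image, angle) comes for free once frontal images have been screened out.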
  • Step S102 Input the rotated image into a preset correction network to obtain an affine transformation matrix corresponding to the rotated image.
  • the affine transformation matrix is the affine transformation matrix corresponding to the rotated image that is output by the preset correction network.
  • the affine transformation matrix can be a matrix with two rows and three columns, which includes the information required to correct the face image, including rotation, translation, scaling and other information.
  • the affine transformation matrix can be expressed as V = A_θ(U), where U is the input picture, V is the output picture, and A_θ is the affine transformation matrix.
  • the shape of the affine transformation matrix is 2×3. Therefore, the preset correction network can be regarded as a six-node regressor, and θ can be expressed as:
  • θ = [θ11, θ12, θ13; θ21, θ22, θ23]
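A small NumPy sketch of how such a 2×3 matrix acts on pixel coordinates (the function and variable names are illustrative, not from the application):

```python
import numpy as np

def apply_affine(theta: np.ndarray, xy: np.ndarray) -> np.ndarray:
    """Map (N, 2) pixel coordinates through a 2x3 affine matrix theta,
    i.e. compute theta @ [x, y, 1]^T for every coordinate pair."""
    homogeneous = np.hstack([xy, np.ones((xy.shape[0], 1))])  # (N, 3)
    return homogeneous @ theta.T                              # (N, 2)

# An identity matrix leaves coordinates unchanged; its six entries are
# exactly the values the six-node regressor would have to output for an
# image that needs no correction.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
```

The first two columns of θ encode rotation and scaling; the third column encodes translation, which is why the one matrix carries all the information needed to correct a face image.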
  • the preset correction network may be a pre-trained correction network trained on a small amount of labeled data. Using a small amount of labeled data to perform supervised pre-training on the correction network can improve the training efficiency of the image correction model.
  • the method may further include the following steps: obtaining sample data, where the sample data includes a sample image and key points corresponding to the sample image; inputting the sample image into a convolutional neural network to obtain an output affine matrix; determining a supervised affine matrix based on the key points corresponding to the sample image and the preset positioning points; and calculating the loss function value between the output affine matrix and the supervised affine matrix, and pre-training the convolutional neural network according to the loss function value to obtain a pre-trained correction network.
  • sample data refers to a sample image with annotation, including the sample image and the key points in the annotated sample image.
  • the sample data can be labeled data in the training data, or new data outside the training data. Then the sample image is input into the convolutional neural network to obtain the output of the convolutional neural network, that is, the output affine matrix.
  • the supervisory affine matrix is determined based on the key points corresponding to the sample image and the preset positioning points.
  • the preset positioning point refers to a preset point with a certain coordinate position, and each key point can correspond to a preset positioning point.
  • key points at the same position in different images can be corrected to a fixed position; for example, the position coordinates of the eyes and mouth in corrected facial images always fall at certain fixed points.
  • the mean absolute error (MAE) loss function can be used to iteratively train the convolutional neural network. For example, since both the output affine matrix and the supervisory affine matrix include six values, the values at corresponding positions in the two matrices can be subtracted and their absolute values calculated; the absolute values are then added and averaged to obtain the loss function value between the output affine matrix and the supervisory affine matrix.
  • the parameters of the convolutional neural network are continuously adjusted through the loss function value between the output affine matrix and the supervised affine matrix, so that the output affine matrix output by the convolutional neural network can be closer to the supervised affine matrix.
  • the loss value of the loss function of the convolutional neural network reaches the preset value, at this time the convolutional neural network converges, and the converged convolutional neural network can be used as a pre-trained correction network.
  • the convolutional neural network is supervised in the form of a matrix, which makes it possible to obtain the preset correction network quickly, thus improving the training speed of the image correction model.
  • the method may further include the following steps: obtaining training sample data, the training sample data including a rotated sample image corresponding to a training sample image and the rotation angle corresponding to the training sample image; inputting the rotated sample image into the convolutional neural network to obtain the output affine matrix corresponding to the rotated sample image; determining the supervisory affine matrix according to the rotated sample image and its rotation angle; and calculating the loss function value between the output affine matrix and the supervisory affine matrix, and pre-training the convolutional neural network according to the loss function value to obtain a pre-trained correction network.
  • the training sample data is used to pre-train the convolutional neural network.
  • the training sample data includes the rotation sample image corresponding to the training sample image and the rotation angle corresponding to the training sample image.
  • the rotated sample image is obtained by rotating the training sample image by a preset angle. For example, if the training sample image is rotated clockwise by 15° to obtain the rotated sample image, then the rotation angle corresponding to the training sample image is 15°.
  • the training sample data can be the same training data used for the iterative training of the preset correction network and sampling network, thereby reducing the number of samples required in the training process of the image correction model and hence the training cost of the image correction model.
  • the mean absolute error (MAE) loss function can be used to iteratively train the convolutional neural network.
  • the parameters of the convolutional neural network are continuously adjusted through the loss function value between the output affine matrix and the supervised affine matrix, so that the output affine matrix output by the convolutional neural network can be closer to the supervised affine matrix.
  • the loss value of the loss function of the convolutional neural network reaches the preset value, at this time the convolutional neural network converges, and the converged convolutional neural network can be used as a pre-trained correction network.
  • the convolutional neural network can be a CNN with a MobileNet network structure.
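Since the rotation angle of each rotated sample is known, a supervisory affine matrix can be derived from it directly. The exact construction is not specified in the application; one plausible choice, sketched below, is the inverse rotation about the origin of center-origin coordinates:

```python
import numpy as np

def supervision_matrix(angle_deg: float) -> np.ndarray:
    """Build a 2x3 supervisory affine matrix for a sample rotated by
    angle_deg: the inverse rotation (about the coordinate origin) that
    would map the rotated sample back to the upright sample."""
    a = np.deg2rad(-angle_deg)  # negative angle undoes the rotation
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0]])
```

Applying this matrix to a point that was rotated by the recorded angle returns the point to its original position, which is exactly the behavior the pre-trained correction network is supervised to reproduce.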
  • the method may further include the following steps:
  • performing image preprocessing on the rotated image, where image preprocessing includes resizing and/or image enhancement.
  • resizing means adjusting the rotated image to a fixed size, and image enhancement makes the image clearer.
  • image preprocessing can be performed on the rotated image, or on the training image before rotation.
  • if the training image is preprocessed before rotation, the rotated image obtained is already a preprocessed image, so it is not necessary to perform image preprocessing on the rotated image again.
  • Step S103 Perform affine transformation on the rotated image based on the affine transformation matrix, and input the obtained transformation data to a sampling network for image sampling to obtain a corrected image corresponding to the rotated image.
  • image sampling can be differentiable image sampling, which refers to interpolating and rounding the pixel position value obtained by affine transformation of each pixel point to obtain the corresponding actual sampling coordinates, and finally sampling the pixels in the rotated image according to the actual sampling coordinates; that is, for each pixel position in the corrected image, the pixel value at the corresponding position in the rotated image is found for filling.
  • FIG. 2 is a schematic diagram of the steps of performing affine transformation on a rotated image according to an embodiment of the present application.
  • the step of performing affine transformation on the rotated image based on the affine transformation matrix may include: step S1031, obtaining the pixel coordinates of each pixel point in the rotated image; step S1032. The pixel coordinates of each pixel point are mapped based on the affine transformation matrix to obtain the mapping coordinates of each pixel point.
  • Coordinate mapping is performed on all pixels in the rotated image in the above manner, and the mapping coordinates corresponding to each pixel can be obtained.
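Steps S1031 and S1032 amount to applying the 2×3 matrix to every pixel coordinate in the image. A minimal NumPy sketch (function name and array layout are illustrative):

```python
import numpy as np

def map_pixel_grid(theta: np.ndarray, h: int, w: int) -> np.ndarray:
    """Apply a 2x3 affine matrix to the pixel coordinates of every
    position in an h x w image, returning an (h, w, 2) array of
    mapping coordinates (x, y) for each pixel."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    return coords @ theta.T  # (h, w, 3) @ (3, 2) -> (h, w, 2)
```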
  • the step of inputting the obtained transformation data to a sampling network for image sampling and obtaining the corrected image corresponding to the rotated image may include:
  • when the mapping coordinates of a pixel point are not integers, the mapping coordinates are input to the sampling network for image sampling to obtain the corrected image corresponding to the rotated image; when the mapping coordinates of a pixel point are integers, the pixel point corresponding to the mapping coordinates is obtained directly in the rotated image according to the mapping coordinates and used for pixel filling to obtain the corrected image.
  • in most cases, the mapping coordinates calculated by the affine transformation matrix are floating point values rather than integer values; therefore, to obtain the corrected image, the non-integer mapping coordinates must first be interpolated and rounded before pixel filling can be performed.
  • for the mapping coordinates corresponding to each pixel point, it is determined whether they are integers. If the mapping coordinates are integers, they correspond directly to spatial positions in the rotated image, and the pixel points at those positions can be obtained for pixel filling. If the mapping coordinates are not integers, they cannot be directly matched to spatial positions in the rotated image; in this case, the mapping coordinates are input to the sampling network for image sampling, and after all pixel points are sampled and filled, the corrected image corresponding to the rotated image is obtained.
  • FIG. 3 is a schematic diagram of the steps of image sampling provided by an embodiment of the present application.
  • the step of inputting the mapping coordinates of the pixel points into the sampling network for image sampling and obtaining the corrected image corresponding to the rotated image may include:
  • Step S1033 Interpolate and round the mapping coordinates of the pixel points to obtain the sampling coordinates corresponding to the pixel points;
  • Step S1034 Obtain the pixel points corresponding to the sampling coordinates in the rotated image according to the sampling coordinates. Perform pixel filling to obtain a corrected image.
  • mapping coordinates of the pixel points are interpolated and rounded.
  • various methods can be used for interpolation and rounding, such as bilinear interpolation and nearest neighbor interpolation.
  • bilinear interpolation finds the four integer points closest to the desired coordinates and sums them according to distance weights: the closer the distance, the greater the weight.
  • the process of interpolation and rounding can be expressed as V_i = Σ_n Σ_m U_nm · f(x_i − m; Φx) · f(y_i − n; Φy), where f represents the sampling function and Φx and Φy represent the sampling function parameters.
  • after the sampling coordinates are obtained, the pixels at the corresponding positions in the rotated image are sampled according to the sampling coordinates and used for pixel filling; once all pixels in the rotated image have been traversed, the filled image is taken as the corrected image.
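The bilinear case described above can be sketched for a single float coordinate; the function name and the assumption that coordinates lie inside the image are choices of this example:

```python
import numpy as np

def bilinear_sample(img: np.ndarray, x: float, y: float) -> float:
    """Sample img at a float coordinate (x, y): the four nearest integer
    pixels are blended with weights that grow as the distance shrinks.
    Assumes 0 <= x <= w-1 and 0 <= y <= h-1."""
    h, w = img.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * img[y0, x0] + dx * img[y0, x1]
    bottom = (1 - dx) * img[y1, x0] + dx * img[y1, x1]
    return float((1 - dy) * top + dy * bottom)
```

Because the weights vary smoothly with (x, y), this sampling step is differentiable in the mapping coordinates, which is what lets gradients flow back through the sampling network during training.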
  • Step S104 Calculate the loss function value between the correction image and the training image, and iteratively train the preset correction network and the sampling network according to the loss function value, and when the training is completed, The preset correction network and the sampling network together serve as an image correction model.
  • the loss function is used to calculate the loss function value between the corrected image and the training image.
  • the loss function can be a mean squared error (MSE) loss function, and the preset correction network and sampling network are iteratively trained based on the calculated loss function value.
  • the sampling network since the sampling network is differentiable and satisfies the back propagation condition, it can be trained end-to-end together with the preset correction network.
  • the sampling network serves as a network layer of the image correction model and participates in the training of the preset correction network.
  • the preset correction network and the sampling network may be trained together, or the preset correction network may be iteratively trained using a sampling network with fixed parameters.
  • the loss function value between the corrected image and the training image is calculated based on the loss function, and the sampling parameters in the sampling network and the weight values of the network parameters in the preset correction network are adjusted based on the loss function value, so that the corrected image output by the image correction model can be closer to the training image, thereby improving the accuracy of image correction performed by the image correction model.
  • the weight values of the network parameters in the preset correction network can be adjusted first. After the weight values of the network parameters in the preset correction network are adjusted appropriately, the sampling parameters in the sampling network can be adjusted appropriately.
  • the loss function value between the corrected image and the training image is calculated based on the loss function, and the weight values of the network parameters in the preset correction network can be adjusted according to the calculated loss function value, so that the affine transformation matrix output by the preset correction network becomes more accurate.
  • the trained preset correction network and sampling network will be used as an image correction model to participate in image correction.
  • stochastic gradient descent method, gradient descent method, Newton method, quasi-Newton method, conjugate gradient method, etc. can be used to adjust the weight values of the parameters in the preset correction network.
  • when using the mean squared error loss function to calculate the loss function value between the corrected image and the training image, the distance between each pixel position in the corrected image and the training image can be calculated, the distances squared, and the squares summed and averaged to obtain the loss function value.
  • when the loss function value between the corrected image and the training image reaches the preset value or reaches a minimum, the training of the preset correction network can be considered complete.
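The per-pixel MSE computation described above, as a short NumPy sketch (names are illustrative):

```python
import numpy as np

def mse_loss(corrected: np.ndarray, training: np.ndarray) -> float:
    """Difference at every pixel position, squared, then averaged."""
    diff = corrected.astype(float) - training.astype(float)
    return float(np.mean(diff ** 2))
```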
  • using the corrected images and training images to iteratively train the preset correction network and sampling network can improve the accuracy of the affine transformation matrix output by the preset correction network, thereby comprehensively improving the correctness and accuracy of the corrected images output by the entire image correction model.
  • the training method of the image correction model uses the training image and the rotated image corresponding to the training image as the training data of the image correction model, which reduces the amount of labeled data needed when training the image correction model; this reduces the dependence on labeled data, lowers the training cost of the image correction model, and can also improve the accuracy of the trained image correction model.
  • FIG. 4 is a schematic diagram of the steps of an image correction method provided by an embodiment of the present application.
  • the image correction method includes step S201 and step S202.
  • Step S201 Obtain the image to be corrected.
  • before performing image correction, image preprocessing can be performed on the acquired image to obtain the image to be corrected.
  • image preprocessing includes resizing the image to a preset size to ensure the accuracy of the corrected image obtained by image correction.
  • Step S202 Input the image to be corrected into a pre-trained image correction model to obtain a corrected image, wherein the pre-trained image correction model is trained using the above-mentioned image correction model training method.
  • by inputting the image to be corrected into the pre-trained image correction model, the corrected image corresponding to the image to be corrected can be obtained.
  • the correction network in the pre-trained image correction model outputs the affine transformation matrix corresponding to the image to be corrected, and the affine transformation matrix is then used to perform affine transformation on the image to be corrected, obtaining transformation data that includes the mapping coordinates of each pixel in the image to be corrected after affine transformation.
  • the transformation data corresponding to the image to be corrected is input into the sampling network for image sampling, and the corrected image generated after image sampling is used as the output of the pre-trained image correction model, thereby obtaining the corrected image corresponding to the image to be corrected.
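The inference path just described (regress θ, map every pixel, sample) can be sketched end to end in NumPy. This is a hypothetical illustration: a real model would produce θ with the trained correction network, and here nearest-neighbor lookup stands in for the sampling network.

```python
import numpy as np

def correct_image(img: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Given an affine matrix theta (as a correction network would output
    for this image), map every corrected-image pixel back into the input
    image and fill it by nearest-neighbor lookup."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    mapped = coords @ theta.T                       # (h, w, 2) mapped (x, y)
    sx = np.rint(mapped[..., 0]).astype(int)
    sy = np.rint(mapped[..., 1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out
```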
  • the image correction method provided by the above embodiment uses a pre-trained image correction model to perform image correction on the image to be corrected, and can correct images with high accuracy, thereby improving the success rate and accuracy of subsequent face recognition or other tasks.
  • FIG. 5 is a schematic block diagram of an image correction model training device provided by an embodiment of the present application.
  • the training device of the image correction model includes: a data acquisition module 301, a matrix generation module 302, an image generation module 303 and an iterative training module 304. Among them:
  • the data acquisition module 301 is used to acquire training data, where the training data includes training images and rotated images corresponding to the training images.
  • the matrix generation module 302 is used to input the rotated image into a preset correction network to obtain an affine transformation matrix corresponding to the rotated image.
  • the image generation module 303 is used to perform affine transformation on the rotated image based on the affine transformation matrix, and input the obtained transformation data to a sampling network for image sampling to obtain a corrected image corresponding to the rotated image.
  • Iterative training module 304 is used to calculate the loss function value between the correction image and the training image, and iteratively train the preset correction network and the sampling network according to the loss function value, and perform When training is completed, the preset correction network and the sampling network are used together as an image correction model.
  • FIG. 6 is a schematic block diagram of an image correction device provided by an embodiment of the present application.
  • the image correction device includes: an image acquisition module 401 and an image correction module 402. Among them:
  • Image acquisition module 401 is used to acquire images to be corrected.
  • Image correction module 402 is used to input the image to be corrected into a pre-trained image correction model to obtain a corrected image, wherein the pre-trained image correction model is trained using the above-mentioned image correction model training method.
  • FIG. 7 is a schematic structural block diagram of a computer device provided by an embodiment of the present application.
  • the computer device may be a server or a terminal.
  • the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
  • Non-volatile storage media stores operating systems and computer programs.
  • the computer program includes program instructions. When executed, the program instructions can cause the processor to perform any image correction model training method.
  • the processor is used to provide computing and control capabilities to support the operation of the entire computer device.
  • the internal memory provides an environment for the execution of the computer program in the non-volatile storage medium.
  • when the computer program is executed by the processor, it can cause the processor to execute any of the image correction model training methods.
  • This network interface is used for network communication, such as sending assigned tasks, etc.
  • Those skilled in the art can understand that the structure shown in Figure 7 is only a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the computer equipment to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
  • the processor can be a central processing unit (Central Processing Unit, CPU), and the processor can also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general processor may be a microprocessor or the processor may be any conventional processor.
  • In one embodiment, the processor is configured to run a computer program stored in the memory to implement the following steps: acquiring training data, where the training data includes a training image and a rotated image corresponding to the training image; inputting the rotated image into a preset correction network to obtain an affine transformation matrix corresponding to the rotated image; performing an affine transformation on the rotated image based on the affine transformation matrix and inputting the resulting transformation data into a sampling network for image sampling to obtain a corrected image corresponding to the rotated image; and computing a loss function value between the corrected image and the training image, iteratively training the preset correction network and the sampling network according to the loss function value, and, when training is complete, taking the preset correction network and the sampling network together as the image correction model.
  • In one embodiment, the preset correction network is a pre-trained correction network, and the processor is further configured to implement: acquiring sample data, where the sample data includes a sample image and key points corresponding to the sample image; inputting the sample image into a convolutional neural network to obtain an output affine matrix; determining a supervision affine matrix based on the key points corresponding to the sample image and preset anchor points; and computing a loss function value between the output affine matrix and the supervision affine matrix and pre-training the convolutional neural network according to the loss function value to obtain the pre-trained correction network.
  • In one embodiment, the preset correction network is a pre-trained correction network, and the processor is further configured to implement: acquiring training sample data, where the training sample data includes a rotated sample image corresponding to a training sample image and a rotation angle corresponding to the training sample image; inputting the rotated sample image into a convolutional neural network to obtain an output affine matrix corresponding to the rotated sample image; determining a supervision affine matrix from the rotated sample image and its rotation angle; and computing a loss function value between the output affine matrix and the supervision affine matrix and pre-training the convolutional neural network according to the loss function value to obtain the pre-trained correction network.
  • In one embodiment, when implementing the affine transformation of the rotated image based on the affine transformation matrix, the processor is configured to implement: acquiring the pixel coordinates of each pixel in the rotated image; and mapping the pixel coordinates of each pixel based on the affine transformation matrix to obtain the mapped coordinates of each pixel.
  • In one embodiment, when implementing the input of the resulting transformation data into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image, the processor is configured to implement: when the mapped coordinates of a pixel are not integers, inputting the mapped coordinates of the pixel into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image; and when the mapped coordinates of a pixel are integers, acquiring the pixel corresponding to the mapped coordinates in the rotated image according to the mapped coordinates and performing pixel filling to obtain the corrected image.
  • In one embodiment, when implementing the input of the mapped coordinates of the pixels into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image, the processor is configured to implement: interpolating and rounding the mapped coordinates of a pixel to obtain the sampling coordinates corresponding to the pixel; and acquiring the pixel corresponding to the sampling coordinates in the rotated image according to the sampling coordinates and performing pixel filling to obtain the corrected image.
  • In one embodiment, before inputting the rotated image into the preset correction network to obtain the affine transformation matrix corresponding to the rotated image, the processor is configured to implement: performing image preprocessing on the rotated image, the image preprocessing including resizing and/or image enhancement.
  • In another embodiment, the processor is configured to run a computer program stored in the memory to implement the following steps: acquiring an image to be corrected; and inputting the image to be corrected into a pre-trained image correction model to obtain a corrected image, where the pre-trained image correction model is trained using the above training method for an image correction model.
  • Embodiments of the present application also provide a computer-readable storage medium storing a computer program; the computer program includes program instructions, and when a processor executes the program instructions, it implements any training method for an image correction model and/or any image correction method provided by the embodiments of the present application.
  • The computer-readable storage medium may be an internal storage unit of the computer device described in the previous embodiment, such as a hard disk or memory of the computer device, or an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A training method for an image correction model, an image correction method, a computer device, and a storage medium, relating to the field of image correction. The training method for the image correction model includes: acquiring training data, the training data including a training image and a rotated image corresponding to the training image (S101); inputting the rotated image into a preset correction network to obtain an affine transformation matrix (S102); performing an affine transformation on the rotated image based on the affine transformation matrix, and inputting the resulting transformation data into a sampling network for image sampling to obtain a corrected image corresponding to the rotated image (S103); computing a loss function value between the corrected image and the training image, iteratively training the preset correction network and the sampling network according to the loss function value, and, when training is complete, taking the preset correction network and the sampling network together as the image correction model (S104). The training method reduces the training cost of the image correction model.

Description

Training method for an image correction model, image correction method, device, and storage medium
This application claims priority to Chinese patent application No. 202211048861.8, filed with the China National Intellectual Property Administration on August 30, 2022 and entitled "Training method for an image correction model, image correction method, device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of image correction, and in particular to a training method for an image correction model, an image correction method, a computer device, and a storage medium.
Background
With continuous technological development, face recognition has been widely applied in society, for example in security. However, because the devices that capture faces are usually mounted in fixed positions while pedestrians' walking trajectories and body orientations are not fixed, the captured face images often exhibit some tilt or rotation. These tilted or rotated face images must therefore first be corrected to ensure the accuracy of face recognition. In the prior art, image correction is typically performed by annotating key points in face images and then training a neural network on these annotated images. Such training requires a large amount of labeled data, annotation is costly and easily introduces errors, and the accuracy of the finally trained neural network is consequently limited.
Summary
This application provides a training method for an image correction model, an image correction method, a computer device, and a storage medium, to reduce the training cost of the image correction model and improve the accuracy of the resulting model.
In a first aspect, this application provides a training method for an image correction model, the method including:
acquiring training data, the training data including a training image and a rotated image corresponding to the training image;
inputting the rotated image into a preset correction network to obtain an affine transformation matrix corresponding to the rotated image;
performing an affine transformation on the rotated image based on the affine transformation matrix, and inputting the resulting transformation data into a sampling network for image sampling to obtain a corrected image corresponding to the rotated image;
computing a loss function value between the corrected image and the training image, iteratively training the preset correction network and the sampling network according to the loss function value, and, when training is complete, taking the preset correction network and the sampling network together as the image correction model.
In a second aspect, this application further provides an image correction method, the method including:
acquiring an image to be corrected;
inputting the image to be corrected into a pre-trained image correction model to obtain a corrected image, where the pre-trained image correction model is trained using the training method for an image correction model described in the first aspect.
In a third aspect, this application further provides a computer device including a memory and a processor; the memory is configured to store a computer program; the processor is configured to execute the computer program and, when executing it, implement the training method for an image correction model of the first aspect and/or the image correction method of the second aspect.
In a fourth aspect, this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the training method for an image correction model of the first aspect and/or the image correction method of the second aspect.
This application discloses a training method for an image correction model, an image correction method, a computer device, and a storage medium. Using a training image and its corresponding rotated image as training data reduces the amount of labeled data needed when training the image correction model, thereby lowering the dependence on labeled data, reducing training cost, and improving the accuracy of the trained model. In addition, the corrected image is obtained from the affine transformation matrix and the sampling network, the preset correction network and the sampling network are iteratively trained from the corrected image and the training image, and after training the sampling network and the preset correction network together serve as the image correction model. Since the sampling network participates in training as part of the model, and the corrected image is used to train the model in an unsupervised manner, the accuracy of the trained image correction model is maintained while the annotation cost is reduced.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application; those of ordinary skill in the art may obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a training method for an image correction model provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the steps of performing an affine transformation on a rotated image provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the image sampling steps provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of an image correction method provided by an embodiment of this application;
FIG. 5 is a schematic block diagram of a training apparatus for an image correction model provided by an embodiment of this application;
FIG. 6 is a schematic structural block diagram of a computer device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The flowcharts shown in the drawings are merely illustrative; they need not include all contents and operations/steps, nor be executed in the described order. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual execution order may change according to the actual situation.
It should be understood that the terms used in this specification are for describing particular embodiments only and are not intended to limit this application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
It can also be understood that the specific embodiments of this application involve data related to faces and the like. When the above embodiments are applied to specific products or technologies, the user's permission or consent must be obtained, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Embodiments of this application provide a training method for an image correction model, an image correction method, a computer device, and a storage medium. The image correction model obtained with the training method can be used to correct captured face images and improve the accuracy of the resulting corrected images, so as to improve the success rate and accuracy of subsequent face recognition.
Some implementations of this application are described in detail below with reference to the drawings. The embodiments and the features in the embodiments described below may be combined with each other where no conflict arises.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a training method for an image correction model provided by an embodiment of this application. By simulating training data, the training method iteratively trains the image correction model, reducing training cost while improving the accuracy of the trained model.
As shown in FIG. 1, the training method specifically includes steps S101 to S104.
Step S101: acquire training data, the training data including a training image and a rotated image corresponding to the training image.
Training data is acquired for training the image correction model; it includes a training image and a rotated image corresponding to the training image, where the rotated image is obtained by rotating the training image by a preset angle.
In one embodiment, before step S101, the method may further include: acquiring a training image and rotating the training image to obtain the rotated image corresponding to the training image.
That is, the training image may be rotated by a certain angle in advance to obtain the rotated image; when acquiring the training data, the training image, its corresponding rotated image, and its rotation angle can then be obtained directly. In a specific implementation, multiple rotated images can be obtained by rotating a training image by different angles, increasing the amount of training data. Different training images may be rotated by different angles or by the same angle.
In a specific implementation, a large number of frontal-face pictures can be selected from the business scenario as training images, where the training images include images without obvious rotation; a large number of rotated images for training are then produced by artificially introducing random rotation angles.
If the training data used to train the image correction model consisted of annotated face images and corresponding key points, a large amount of labeled data would be needed for network training, making the acquisition of training data costly, and the accuracy of the image correction model would depend on the accuracy of the labeled data. In this application, the training image is instead rotated and the resulting rotated image participates in the training of the image correction model. Compared with acquiring manually annotated key-point data as training data, this simulated way of generating training data by producing rotated images requires no manual annotation of already-rotated images; the rotation angle can be controlled directly. On the one hand, this enriches the training data, reduces the dependence on manually annotated data during training, and lowers the acquisition cost of training data; on the other hand, it reduces errors introduced during manual annotation, further improving the accuracy of the trained image correction model.
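The rotated-training-data generation described above can be sketched as follows: take upright training images and rotate each by a random angle, keeping the (training image, rotated image, angle) triple. This is a minimal NumPy sketch using nearest-neighbour inverse mapping; the function names (`rotate_image`, `make_training_pairs`) and the ±45° angle range are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def rotate_image(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate an H x W (x C) image about its centre (nearest-neighbour)."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    # Inverse mapping: for each output pixel, find the source pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    xs_c, ys_c = xs - cx, ys - cy
    src_x = np.cos(t) * xs_c + np.sin(t) * ys_c + cx
    src_y = -np.sin(t) * xs_c + np.cos(t) * ys_c + cy
    sx = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    sy = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return img[sy, sx]

def make_training_pairs(images, rng=None):
    """Build (training image, rotated image, angle) triples."""
    if rng is None:
        rng = np.random.default_rng(0)
    pairs = []
    for img in images:
        angle = float(rng.uniform(-45.0, 45.0))  # random rotation angle
        pairs.append((img, rotate_image(img, angle), angle))
    return pairs
```

A real pipeline would typically use a library warp (e.g. an OpenCV affine warp) instead of this hand-rolled mapping; the sketch only illustrates the data-simulation idea.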
Step S102: input the rotated image into a preset correction network to obtain the affine transformation matrix corresponding to the rotated image.
The rotated image is input into the preset correction network, which outputs the affine transformation matrix corresponding to the rotated image. The affine transformation matrix may be a matrix with two rows and three columns containing the information needed to correct the face image, including rotation, translation, and scaling. The affine transformation matrix can be expressed as:
Θ = f_loc(U)
where U is the input picture, U ∈ R^(H×W×C), and Θ is the affine transformation matrix. Since the shape of the affine transformation matrix is 2×3, the preset correction network can be regarded as a regressor with six output nodes, and Θ can therefore be written as:
Θ = [θ11  θ12  θ13; θ21  θ22  θ23]
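As a concrete illustration of the six values in the 2×3 matrix described above, a matrix combining rotation, scaling, and translation can be assembled as follows. This is a hedged sketch: in the method of this application the matrix is regressed by the correction network, not constructed analytically, and the helper name `affine_theta` is illustrative.

```python
import numpy as np

def affine_theta(angle_deg=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Assemble a 2x3 affine matrix Theta encoding rotation, scale and
    translation -- the six values the correction network regresses."""
    t = np.deg2rad(angle_deg)
    return np.array([
        [scale * np.cos(t), -scale * np.sin(t), tx],
        [scale * np.sin(t),  scale * np.cos(t), ty],
    ])

# Applying Theta to a point in homogeneous coordinates (x, y, 1):
theta = affine_theta(angle_deg=90.0)
point = theta @ np.array([1.0, 0.0, 1.0])  # maps (1, 0) under a 90-degree rotation
```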
In addition, in one embodiment, the preset correction network may be a correction network pre-trained on a small amount of labeled data; supervised pre-training on a small amount of labeled data can improve the training efficiency of the image correction model.
In one implementation, the method may further include: acquiring sample data, the sample data including a sample image and key points corresponding to the sample image; inputting the sample image into a convolutional neural network to obtain an output affine matrix; determining a supervision affine matrix based on the key points corresponding to the sample image and preset anchor points; computing a loss function value between the output affine matrix and the supervision affine matrix, and pre-training the convolutional neural network according to the loss function value to obtain a pre-trained correction network.
Sample data here refers to annotated sample images, including the sample image and the annotated key points in it. In a specific implementation, the sample data may be labeled data within the training data or new data outside the training data. The sample image is input into the convolutional neural network, and the network's output is the output affine matrix.
The supervision affine matrix is then determined from the key points of the sample image and the preset anchor points, where a preset anchor point is a point set in advance with a fixed coordinate position; each key point can correspond to one preset anchor point. In a specific implementation, by setting preset anchor points, key points at the same location in different images are corrected to a fixed position; for example, the coordinates of the eyes and mouth in a face image always land on certain fixed points.
After the supervision affine matrix is obtained, the loss value between the output affine matrix and the supervision affine matrix is computed, and the convolutional neural network is iteratively trained according to the computed loss function value to obtain the pre-trained correction network.
In a specific implementation, a mean absolute error (MAE) loss function can be used for the iterative training. For example, since both the output affine matrix and the supervision affine matrix contain six values, the values at corresponding positions of the two matrices can be subtracted, the absolute values taken, then summed and averaged to obtain the loss function value between the two matrices.
The parameters of the convolutional neural network are continuously adjusted via the loss function value between the output and supervision affine matrices so that the network's output affine matrix approaches the supervision affine matrix, until the loss value reaches a preset value; at this point the network has converged and can be taken as the pre-trained correction network. Supervised training of the convolutional neural network in matrix form, using this loss, yields the preset correction network quickly and thus speeds up the training of the image correction model.
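The MAE computation over the six matrix entries described above can be sketched as follows; a minimal NumPy example, with the function name `mae_loss` as an illustrative assumption.

```python
import numpy as np

def mae_loss(pred_theta, target_theta):
    """Mean absolute error over the six entries of the 2x3 affine matrices:
    subtract corresponding entries, take absolute values, sum, and average."""
    pred = np.asarray(pred_theta, dtype=float)
    target = np.asarray(target_theta, dtype=float)
    return float(np.mean(np.abs(pred - target)))
```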
In another implementation, the method may further include: acquiring training sample data, the training sample data including a rotated sample image corresponding to a training sample image and a rotation angle corresponding to the training sample image; inputting the rotated sample image into a convolutional neural network to obtain an output affine matrix corresponding to the rotated sample image; determining a supervision affine matrix from the rotated sample image and its corresponding rotation angle; computing a loss function value between the output affine matrix and the supervision affine matrix, and pre-training the convolutional neural network according to the loss function value to obtain a pre-trained correction network.
The training sample data is used to pre-train the convolutional neural network and includes the rotated sample image corresponding to a training sample image and the corresponding rotation angle, where the rotated sample image is obtained by rotating the training sample image by a preset angle; for example, if the training sample image is rotated clockwise by 15° to obtain the rotated sample image, then the rotation angle corresponding to that training sample image is 15°. In a specific implementation, the training sample data may be the same training data used for iteratively training the preset correction network and the sampling network, which reduces the number of samples needed during the training of the image correction model and thus its training cost.
The rotated sample image is input into the convolutional neural network to obtain its output affine matrix. Since the rotated sample image is obtained by rotating the training sample image by a certain angle, the rotation angle of the rotated sample image is known; and since rotating a picture amounts to multiplying its pixel matrix by an affine matrix determined by the rotation angle, the supervision affine matrix corresponding to the rotated image can be computed directly from the rotated sample image and its rotation angle.
After the supervision affine matrix is obtained, the loss value between the output affine matrix and the supervision affine matrix is computed, and the convolutional neural network is iteratively trained accordingly to obtain the pre-trained correction network.
In a specific implementation, a mean absolute error (MAE) loss function can be used for this iterative training. The parameters of the convolutional neural network are continuously adjusted via the loss function value between the output and supervision affine matrices so that the network's output approaches the supervision matrix, until the loss reaches a preset value; the converged network is taken as the pre-trained correction network.
In a specific implementation, since the final image correction model is used to correct face images and is usually a preceding task of face recognition, the model should have as few parameters as possible and run fast, so as to improve the speed of face recognition. For example, the convolutional neural network may be a CNN with a MobileNet network structure.
In one embodiment, before the step of inputting the rotated image into the preset correction network to obtain the corresponding affine transformation matrix, the method may further include: performing image preprocessing on the rotated image, the image preprocessing including resizing and/or image enhancement.
Resizing adjusts the rotated image to a fixed size; image enhancement is performed to make the image clearer.
In a specific implementation, the preprocessing may be applied to the rotated image or to the training image; if the training image is preprocessed, the resulting rotated image is already a preprocessed image, and the rotated image itself need not be preprocessed again.
Step S103: perform an affine transformation on the rotated image based on the affine transformation matrix, and input the resulting transformation data into a sampling network for image sampling to obtain the corrected image corresponding to the rotated image.
The affine transformation of the rotated image based on the affine transformation matrix yields transformation data containing the mapped coordinates of each pixel of the rotated image after the affine transformation. The transformation data is then input into the sampling network for image sampling, giving the corrected image corresponding to the rotated image. The image sampling may be differentiable image sampling: the pixel position obtained for each pixel through the affine transformation is interpolated and rounded to obtain the corresponding actual sampling coordinates, and pixels are then sampled from the rotated image at the actual sampling coordinates; that is, for each pixel position of the corrected image, the pixel value at the corresponding position in the rotated image is found and filled in.
In one embodiment, refer to FIG. 2, a schematic diagram of the steps of performing an affine transformation on the rotated image. As shown in FIG. 2, the affine transformation of the rotated image based on the affine transformation matrix may include: step S1031, acquiring the pixel coordinates of each pixel in the rotated image; step S1032, mapping the pixel coordinates of each pixel based on the affine transformation matrix to obtain the mapped coordinates of each pixel.
First the pixel coordinates of a pixel in the rotated image are acquired; they can be written as (x_i, y_i). After the affine transformation matrix Θ is applied, the mapped coordinates corresponding to this pixel become (x_i′, y_i′). The affine transformation proceeds as:
(x_i′, y_i′)^T = Θ · (x_i, y_i, 1)^T = [θ11  θ12  θ13; θ21  θ22  θ23] · (x_i, y_i, 1)^T
Performing this coordinate mapping for all pixels in the rotated image yields the mapped coordinates corresponding to each pixel.
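The per-pixel coordinate mapping just described can be vectorised over the whole image; a minimal NumPy sketch, with `map_coordinates_affine` as an illustrative name.

```python
import numpy as np

def map_coordinates_affine(theta, h, w):
    """Map every pixel coordinate (x, y) of an h x w image through the
    2x3 affine matrix theta, returning the (generally non-integer)
    mapped coordinates as two h x w arrays."""
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous coordinates (x, y, 1), shape (3, h*w)
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)]).astype(float)
    mapped = np.asarray(theta, dtype=float) @ coords  # shape (2, h*w)
    return mapped[0].reshape(h, w), mapped[1].reshape(h, w)
```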
In one embodiment, the step of inputting the resulting transformation data into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image may include:
when the mapped coordinates of a pixel are not integers, inputting the mapped coordinates of the pixel into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image; when the mapped coordinates of a pixel are integers, acquiring the pixel corresponding to the mapped coordinates in the rotated image according to the mapped coordinates and performing pixel filling to obtain the corrected image.
Usually, the mapped coordinates computed for a pixel through the affine transformation matrix are floating-point rather than integer values; therefore, to obtain the corrected image, the non-integer mapped coordinates must first be interpolated and rounded before pixel filling can be performed.
In a specific implementation, after the mapped coordinates of each pixel are obtained, it is determined whether they are integers. If the mapped coordinates of a pixel are integers, they correspond exactly to certain spatial positions in the rotated image, and the pixel corresponding to those coordinates can be fetched from the rotated image directly for pixel filling. If the mapped coordinates are not integers, they cannot be directly matched to spatial positions in the rotated image; the mapped coordinates of the pixel are then input into the sampling network for image sampling, and after all pixels have been filled, the corrected image corresponding to the rotated image is obtained.
In one embodiment, refer to FIG. 3, a schematic diagram of the image sampling steps. As shown in FIG. 3, the step of inputting the mapped coordinates of the pixels into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image may include:
step S1033, interpolating and rounding the mapped coordinates of the pixel to obtain the sampling coordinates corresponding to the pixel; step S1034, acquiring the pixel corresponding to the sampling coordinates in the rotated image according to the sampling coordinates and performing pixel filling to obtain the corrected image.
The interpolation and rounding of the mapped coordinates can be done in several ways in a specific implementation, for example bilinear interpolation or nearest-neighbor interpolation. Bilinear interpolation finds the four integer points nearest to the target coordinates and sums them with distance-based weights: the closer the point, the larger the weight. The interpolation process can be written as:
V_i^c = Σ_{n=1}^{H} Σ_{m=1}^{W} U_{nm}^c · f(x_i′ − m; Φ_x) · f(y_i′ − n; Φ_y)
where f denotes the sampling function, Φ_x and Φ_y denote the sampling-function parameters, U_{nm}^c is the value of the rotated image at (n, m, c), and V_i^c denotes the value of the corrected image at (x_i, y_i) in channel c.
After the sampling coordinates are obtained, the pixel at the corresponding position in the rotated image is sampled according to the sampling coordinates, and the fetched pixel is used for pixel filling; after all pixels of the rotated image have been traversed, the image obtained by pixel filling is taken as the corrected image.
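The bilinear case of the sampling step above can be sketched as follows: for each mapped (generally non-integer) coordinate, the four nearest integer pixels of the rotated image are blended by distance weights. A minimal NumPy sketch; the function name `bilinear_sample` and edge handling by clipping are illustrative assumptions.

```python
import numpy as np

def bilinear_sample(img, src_x, src_y):
    """Fill output pixels from `img` at floating-point coordinates
    (src_x, src_y) by bilinear interpolation over the 4 nearest pixels."""
    img = np.asarray(img, dtype=float)
    src_x = np.asarray(src_x, dtype=float)
    src_y = np.asarray(src_y, dtype=float)
    h, w = img.shape[:2]
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    wx1, wy1 = src_x - x0, src_y - y0   # weight of the right/lower neighbour
    wx0, wy0 = 1.0 - wx1, 1.0 - wy1     # weight of the left/upper neighbour
    x0c, x1c = np.clip(x0, 0, w - 1), np.clip(x0 + 1, 0, w - 1)
    y0c, y1c = np.clip(y0, 0, h - 1), np.clip(y0 + 1, 0, h - 1)
    return (wy0 * (wx0 * img[y0c, x0c] + wx1 * img[y0c, x1c])
            + wy1 * (wx0 * img[y1c, x0c] + wx1 * img[y1c, x1c]))
```

Note that at integer coordinates the weights of the far neighbours vanish, so the pixel value is copied exactly, matching the integer case described earlier.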
Step S104: compute the loss function value between the corrected image and the training image, iteratively train the preset correction network and the sampling network according to the loss function value, and, when training is complete, take the preset correction network and the sampling network together as the image correction model.
After the corrected image is obtained, a loss function is used to compute the loss function value between the corrected image and the training image; in a specific implementation, the loss function may be a mean squared error (MSE) loss function. The preset correction network and the sampling network are iteratively trained according to the computed loss function value.
In a specific implementation, since the sampling network is differentiable and satisfies the back-propagation conditions, it can be trained end-to-end together with the preset correction network; that is, the sampling network participates in the training of the preset correction network as a network layer of the image correction model.
During the iterative training, the preset correction network and the sampling network may be trained jointly, or the preset correction network may be trained iteratively with the sampling network's parameters fixed.
When the preset correction network and the sampling network are trained jointly, the loss function value between the corrected image and the training image is computed, and the sampling parameters of the sampling network and the weights of the network parameters of the preset correction network are adjusted respectively based on this value, so that the corrected image output by the image correction model approaches the training image as closely as possible, improving the correction accuracy of the model. In a specific implementation, the weights of the preset correction network's parameters may be adjusted first, and the sampling parameters of the sampling network adjusted appropriately afterwards.
When the sampling network's parameters are fixed, the loss function value between the corrected image and the training image is computed, and the weights of the network parameters of the preset correction network are adjusted according to it, making the affine transformation matrix output by the preset correction network more accurate. After training, the trained preset correction network and the sampling network together serve as the image correction model and participate in image correction. In a specific implementation, the weights of the preset correction network's parameters may be adjusted using, for example, stochastic gradient descent, gradient descent, Newton's method, quasi-Newton methods, or the conjugate gradient method.
When the MSE loss function is used, the distance between each corresponding pixel position of the corrected image and the training image is taken and squared, then the squares are summed and averaged to obtain the loss function value between the corrected image and the training image. When this value reaches a preset value or a minimum, training of the preset correction network can be considered complete.
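The per-pixel MSE just described can be sketched as follows; a minimal NumPy example, with `mse_loss` as an illustrative name.

```python
import numpy as np

def mse_loss(corrected, training):
    """Mean squared error between corrected and training images:
    square the per-pixel differences, sum them, and average."""
    corrected = np.asarray(corrected, dtype=float)
    training = np.asarray(training, dtype=float)
    return float(np.mean((corrected - training) ** 2))
```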
By iteratively training the preset correction network and the sampling network with the corrected image and the training image, via the sampling network's output, the accuracy of the affine transformation matrix output by the preset correction network is improved, and thus the overall correctness and accuracy of the corrected images output by the entire image correction model.
The training method for an image correction model provided by the above embodiment uses a training image and its corresponding rotated image as training data, reducing the amount of labeled data needed when training the image correction model, thereby lowering the dependence on labeled data, reducing training cost, and improving the accuracy of the trained model. In addition, the corrected image is obtained from the affine transformation matrix and the sampling network; the preset correction network and the sampling network are then iteratively trained from the corrected image and the training image, and after training the sampling network and the preset correction network together serve as the image correction model. Since the sampling network participates in training as part of the model, and the corrected image is used to train the model in an unsupervised manner, the accuracy of the trained image correction model is maintained while the annotation cost is reduced.
Referring to FIG. 4, FIG. 4 is a schematic diagram of the steps of an image correction method provided by an embodiment of this application. As shown in FIG. 4, the image correction method includes steps S201 and S202.
Step S201: acquire the image to be corrected.
Before image correction, the acquired image may be preprocessed to obtain the image to be corrected. The preprocessing includes resizing the image to a preset size, to ensure the accuracy of the corrected image obtained by the correction.
Step S202: input the image to be corrected into a pre-trained image correction model to obtain a corrected image, where the pre-trained image correction model is trained using the above training method for an image correction model.
Inputting the image to be corrected into an image correction model trained with the foregoing training method yields the corrected image corresponding to the image to be corrected.
In a specific implementation, after the image to be corrected is input into the pre-trained image correction model, the correction network in the model outputs the affine transformation matrix corresponding to the image. The image to be corrected is then affine-transformed with this matrix to obtain its transformation data, which contains the mapped coordinates of each pixel of the image after the affine transformation. The transformation data is input into the sampling network for image sampling, and the corrected image generated by the sampling is taken as the output of the pre-trained image correction model, yielding the corrected image corresponding to the image to be corrected.
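The inference step just described — map every output pixel through the regressed matrix, then sample bilinearly — can be composed into one function. A hedged NumPy sketch, standing in for the trained sampling layer; `correct_image` is an illustrative name, and in the method of this application the matrix `theta` would come from the trained correction network rather than being supplied by hand.

```python
import numpy as np

def correct_image(img, theta):
    """Warp `img` with the 2x3 affine matrix `theta`: map each output
    pixel to a source location, then fill it by bilinear sampling."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)]).astype(float)
    src = np.asarray(theta, dtype=float) @ coords
    sx, sy = src[0].reshape(h, w), src[1].reshape(h, w)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    wx, wy = sx - x0, sy - y0
    x0c, x1c = np.clip(x0, 0, w - 1), np.clip(x0 + 1, 0, w - 1)
    y0c, y1c = np.clip(y0, 0, h - 1), np.clip(y0 + 1, 0, h - 1)
    return ((1 - wy) * ((1 - wx) * img[y0c, x0c] + wx * img[y0c, x1c])
            + wy * ((1 - wx) * img[y1c, x0c] + wx * img[y1c, x1c]))
```

With the identity matrix, the warp returns the input unchanged, which is a convenient sanity check for the composed mapping-plus-sampling pipeline.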
The image correction method provided by the above embodiment uses a pre-trained image correction model to correct images with high accuracy, so that the success rate and accuracy of subsequent face recognition or other operations can be improved.
Referring to FIG. 5, FIG. 5 is a schematic block diagram of a training apparatus for an image correction model provided by an embodiment of this application. As shown in FIG. 5, the training apparatus includes a data acquisition module 301, a matrix generation module 302, an image generation module 303, and an iterative training module 304, wherein:
the data acquisition module 301 is configured to acquire training data, the training data including a training image and a rotated image corresponding to the training image;
the matrix generation module 302 is configured to input the rotated image into a preset correction network to obtain the affine transformation matrix corresponding to the rotated image;
the image generation module 303 is configured to perform an affine transformation on the rotated image based on the affine transformation matrix and input the resulting transformation data into a sampling network for image sampling to obtain the corrected image corresponding to the rotated image;
the iterative training module 304 is configured to compute the loss function value between the corrected image and the training image, iteratively train the preset correction network and the sampling network according to the loss function value, and, when training is complete, take the preset correction network and the sampling network together as the image correction model.
Referring to FIG. 6, FIG. 6 is a schematic block diagram of an image correction apparatus provided by an embodiment of this application. As shown in FIG. 6, the image correction apparatus includes an image acquisition module 401 and an image correction module 402, wherein:
the image acquisition module 401 is configured to acquire an image to be corrected;
the image correction module 402 is configured to input the image to be corrected into a pre-trained image correction model to obtain a corrected image, where the pre-trained image correction model is trained using the above training method for an image correction model.
Referring to FIG. 7, FIG. 7 is a schematic structural block diagram of a computer device provided by an embodiment of this application. The computer device may be a server or a terminal.
As shown in FIG. 7, the computer device includes a processor, a memory, and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions which, when executed, cause the processor to perform any of the training methods for an image correction model.
The processor provides computing and control capabilities and supports the operation of the entire computer device.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any of the training methods for an image correction model.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in FIG. 7 is merely a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
It should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In one embodiment, the processor is configured to run a computer program stored in the memory to implement the following steps:
acquiring training data, the training data including a training image and a rotated image corresponding to the training image;
inputting the rotated image into a preset correction network to obtain an affine transformation matrix corresponding to the rotated image;
performing an affine transformation on the rotated image based on the affine transformation matrix, and inputting the resulting transformation data into a sampling network for image sampling to obtain a corrected image corresponding to the rotated image;
computing a loss function value between the corrected image and the training image, iteratively training the preset correction network and the sampling network according to the loss function value, and, when training is complete, taking the preset correction network and the sampling network together as the image correction model.
In one embodiment, the preset correction network is a pre-trained correction network; the processor is further configured to implement:
acquiring sample data, the sample data including a sample image and key points corresponding to the sample image;
inputting the sample image into a convolutional neural network to obtain an output affine matrix;
determining a supervision affine matrix based on the key points corresponding to the sample image and preset anchor points;
computing a loss function value between the output affine matrix and the supervision affine matrix, and pre-training the convolutional neural network according to the loss function value to obtain the pre-trained correction network.
In one embodiment, the preset correction network is a pre-trained correction network; the processor is further configured to implement:
acquiring training sample data, the training sample data including a rotated sample image corresponding to a training sample image and a rotation angle corresponding to the training sample image;
inputting the rotated sample image into a convolutional neural network to obtain an output affine matrix corresponding to the rotated sample image;
determining a supervision affine matrix from the rotated sample image and the rotation angle corresponding to the rotated sample image;
computing a loss function value between the output affine matrix and the supervision affine matrix, and pre-training the convolutional neural network according to the loss function value to obtain the pre-trained correction network.
In one embodiment, when implementing the affine transformation of the rotated image based on the affine transformation matrix, the processor is configured to implement:
acquiring the pixel coordinates of each pixel in the rotated image;
mapping the pixel coordinates of each pixel based on the affine transformation matrix to obtain the mapped coordinates of each pixel.
In one embodiment, when implementing the input of the resulting transformation data into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image, the processor is configured to implement:
when the mapped coordinates of a pixel are not integers, inputting the mapped coordinates of the pixel into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image;
when the mapped coordinates of a pixel are integers, acquiring the pixel corresponding to the mapped coordinates in the rotated image according to the mapped coordinates and performing pixel filling to obtain the corrected image.
In one embodiment, when implementing the input of the mapped coordinates of the pixels into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image, the processor is configured to implement:
interpolating and rounding the mapped coordinates of the pixel to obtain the sampling coordinates corresponding to the pixel;
acquiring the pixel corresponding to the sampling coordinates in the rotated image according to the sampling coordinates and performing pixel filling to obtain the corrected image.
In one embodiment, before inputting the rotated image into the preset correction network to obtain the affine transformation matrix corresponding to the rotated image, the processor is configured to implement:
performing image preprocessing on the rotated image, the image preprocessing including resizing and/or image enhancement.
In another embodiment, the processor is configured to run a computer program stored in the memory to implement the following steps:
acquiring an image to be corrected;
inputting the image to be corrected into a pre-trained image correction model to obtain a corrected image, where the pre-trained image correction model is trained using the above training method for an image correction model.
Embodiments of this application further provide a computer-readable storage medium storing a computer program; the computer program includes program instructions, and a processor executes the program instructions to implement any training method for an image correction model and/or any image correction method provided by the embodiments of this application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall all be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (10)

  1. A training method for an image correction model, characterized in that the method comprises:
    acquiring training data, the training data comprising a training image and a rotated image corresponding to the training image;
    inputting the rotated image into a preset correction network to obtain an affine transformation matrix corresponding to the rotated image;
    performing an affine transformation on the rotated image based on the affine transformation matrix, and inputting the resulting transformation data into a sampling network for image sampling to obtain a corrected image corresponding to the rotated image;
    computing a loss function value between the corrected image and the training image, iteratively training the preset correction network and the sampling network according to the loss function value, and, when training is complete, taking the preset correction network and the sampling network together as the image correction model.
  2. The training method for an image correction model according to claim 1, characterized in that the preset correction network is a pre-trained correction network; before inputting the rotated image into the preset correction network, the method further comprises:
    acquiring sample data, the sample data comprising a sample image and key points corresponding to the sample image;
    inputting the sample image into a convolutional neural network to obtain an output affine matrix;
    determining a supervision affine matrix based on the key points corresponding to the sample image and preset anchor points;
    computing a loss function value between the output affine matrix and the supervision affine matrix, and pre-training the convolutional neural network according to the loss function value to obtain the pre-trained correction network.
  3. The training method for an image correction model according to claim 1, characterized in that the preset correction network is a pre-trained correction network; before inputting the rotated image into the preset correction network, the method further comprises:
    acquiring training sample data, the training sample data comprising a rotated sample image corresponding to a training sample image and a rotation angle corresponding to the training sample image;
    inputting the rotated sample image into a convolutional neural network to obtain an output affine matrix corresponding to the rotated sample image;
    determining a supervision affine matrix from the rotated sample image and the rotation angle corresponding to the rotated sample image;
    computing a loss function value between the output affine matrix and the supervision affine matrix, and pre-training the convolutional neural network according to the loss function value to obtain the pre-trained correction network.
  4. The training method for an image correction model according to claim 1, characterized in that performing the affine transformation on the rotated image based on the affine transformation matrix comprises:
    acquiring the pixel coordinates of each pixel in the rotated image;
    mapping the pixel coordinates of each pixel based on the affine transformation matrix to obtain the mapped coordinates of each pixel.
  5. The training method for an image correction model according to claim 4, characterized in that inputting the resulting transformation data into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image comprises:
    when the mapped coordinates of a pixel are not integers, inputting the mapped coordinates of the pixel into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image;
    when the mapped coordinates of a pixel are integers, acquiring the pixel corresponding to the mapped coordinates in the rotated image according to the mapped coordinates and performing pixel filling to obtain the corrected image.
  6. The training method for an image correction model according to claim 5, characterized in that inputting the mapped coordinates of the pixels into the sampling network for image sampling to obtain the corrected image corresponding to the rotated image comprises:
    interpolating and rounding the mapped coordinates of the pixel to obtain the sampling coordinates corresponding to the pixel;
    acquiring the pixel corresponding to the sampling coordinates in the rotated image according to the sampling coordinates and performing pixel filling to obtain the corrected image.
  7. The training method for an image correction model according to any one of claims 1 to 6, characterized in that, before inputting the rotated image into the preset correction network to obtain the affine transformation matrix corresponding to the rotated image, the method comprises:
    performing image preprocessing on the rotated image, the image preprocessing including resizing and/or image enhancement.
  8. An image correction method, characterized in that the method comprises:
    acquiring an image to be corrected;
    inputting the image to be corrected into a pre-trained image correction model to obtain a corrected image, wherein the pre-trained image correction model is trained using the training method for an image correction model according to any one of claims 1 to 7.
  9. A computer device, characterized in that the computer device comprises a memory and a processor;
    the memory is configured to store a computer program;
    the processor is configured to execute the computer program and, when executing the computer program, implement the training method for an image correction model according to any one of claims 1 to 7 and/or the image correction method according to claim 8.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the training method for an image correction model according to any one of claims 1 to 7 and/or the image correction method according to claim 8.
PCT/CN2022/142238 2022-08-30 2022-12-27 Training method for image correction model, image correction method, device, and storage medium WO2024045442A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211048861.8A CN115423691A (zh) 2022-08-30 2022-08-30 Training method for image correction model, image correction method, device, and storage medium
CN202211048861.8 2022-08-30

Publications (1)

Publication Number Publication Date
WO2024045442A1 true WO2024045442A1 (zh) 2024-03-07

Family

ID=84200055

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142238 WO2024045442A1 (zh) 2022-08-30 2022-12-27 Training method for image correction model, image correction method, device, and storage medium

Country Status (2)

Country Link
CN (1) CN115423691A (zh)
WO (1) WO2024045442A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115423691A (zh) 2022-08-30 2022-12-02 青岛云天励飞科技有限公司 Training method for image correction model, image correction method, device, and storage medium
CN115861393B (zh) * 2023-02-16 2023-06-16 中国科学技术大学 Image matching method, spacecraft landing site positioning method, and related apparatus
CN116757964B (zh) * 2023-08-16 2023-11-03 山东省地质矿产勘查开发局第八地质大队(山东省第八地质矿产勘查院) Image correction method for geographic information display

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107749050A (zh) * 2017-09-30 2018-03-02 珠海市杰理科技股份有限公司 Fisheye image correction method, apparatus, and computer device
CN108399408A (zh) * 2018-03-06 2018-08-14 李子衿 Deformed character correction method based on a deep spatial transformer network
CN109993137A (zh) * 2019-04-09 2019-07-09 安徽大学 Fast face correction method based on a convolutional neural network
WO2021051593A1 (zh) * 2019-09-19 2021-03-25 平安科技(深圳)有限公司 Image processing method and apparatus, computer device, and storage medium
CN112651389A (zh) * 2021-01-20 2021-04-13 北京中科虹霸科技有限公司 Correction model training, correction, and recognition methods and apparatus for off-axis iris images
CN115423691A (zh) * 2022-08-30 2022-12-02 青岛云天励飞科技有限公司 Training method for image correction model, image correction method, device, and storage medium


Also Published As

Publication number Publication date
CN115423691A (zh) 2022-12-02

Similar Documents

Publication Publication Date Title
WO2024045442A1 (zh) Training method for image correction model, image correction method, device, and storage medium
WO2020199906A1 (zh) Facial key point detection method, apparatus, device, and storage medium
WO2022033048A1 (zh) Video frame interpolation method, model training method, and corresponding apparatus
CN111386550A (zh) Unsupervised learning of image depth and ego-motion prediction neural networks
JP7373554B2 (ja) Cross-domain image conversion
WO2020042720A1 (zh) Human-body three-dimensional model reconstruction method, apparatus, and storage medium
WO2020199693A1 (zh) Face recognition method, apparatus, and device under large pose
WO2021121108A1 (zh) Image super-resolution and model training methods and apparatus, electronic device, and medium
WO2021051593A1 (zh) Image processing method and apparatus, computer device, and storage medium
CN110163087B (zh) Face pose recognition method and system
WO2023019974A1 (zh) Document image correction method and apparatus, electronic device, and storage medium
WO2021164269A1 (zh) Attention-mechanism-based disparity map acquisition method and apparatus
WO2023035531A1 (zh) Text image super-resolution reconstruction method and related device
EP4239583A1 (en) Three-dimensional reconstruction method, apparatus and system, and storage medium
CN115578515B (zh) Training method for three-dimensional reconstruction model, and three-dimensional scene rendering method and apparatus
CN112700516A (zh) Deep-learning-based video rendering method and apparatus, computer device, and storage medium
WO2022252640A1 (zh) Image classification preprocessing and image classification methods, apparatus, device, and storage medium
WO2022016996A1 (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
WO2024021504A1 (zh) Face recognition model training method, recognition method, apparatus, device, and medium
WO2024037562A1 (zh) Three-dimensional reconstruction method and apparatus, and computer-readable storage medium
WO2024051591A1 (zh) Method, apparatus, electronic device, and storage medium for estimating video rotation
CN111062944B (zh) Network model training method and apparatus, and image segmentation method and apparatus
WO2023174063A1 (zh) Background replacement method and electronic device
CN115082636B (zh) Single-image three-dimensional reconstruction method and device based on a Gaussian mixture network
WO2023109086A1 (zh) Character recognition method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22957252

Country of ref document: EP

Kind code of ref document: A1