WO2023023960A1 - Image processing and neural network training method and apparatus - Google Patents

Image processing and neural network training method and apparatus

Info

Publication number
WO2023023960A1
Authority
WO
WIPO (PCT)
Prior art keywords
loss function
image
grid
neural network
projection model
Prior art date
Application number
PCT/CN2021/114403
Other languages
English (en)
French (fr)
Inventor
刘宝恩
李鑫超
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2021/114403 priority Critical patent/WO2023023960A1/zh
Publication of WO2023023960A1 publication Critical patent/WO2023023960A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/80Geometric correction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to the technical field of image processing, in particular to an image processing and neural network training method and device.
  • an embodiment of the present disclosure provides an image processing method, the method comprising: acquiring a target image, wherein the target image has a preset type of distortion; and acquiring a first image area including a target object in the target image, and transforming the first image area using a preset first projection model, so as to reduce the degree of the preset type of distortion of the target object in the first image area.
  • an embodiment of the present disclosure provides a neural network training method
  • the neural network is used to determine transformation parameters for a first image area, including a target object, in a target image after the area has been transformed by a preset first projection model; the transformation parameters are used to adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation;
  • the method includes: obtaining initial transformation parameters output by an initial neural network; determining a loss function of the initial neural network based on the initial transformation parameters; and training the initial neural network based on the loss function to obtain the neural network.
  • an embodiment of the present disclosure provides an image processing device, including a processor, and the processor is configured to perform the following steps: acquiring a target image, wherein the target image has a preset type of distortion; and acquiring a first image area including a target object in the target image, and transforming the first image area using a preset first projection model, so as to reduce the degree of the preset type of distortion of the target object in the first image area.
  • an embodiment of the present disclosure provides a neural network training device, including a processor. The neural network is used to determine transformation parameters for a first image area, including a target object, in a target image after the area has been transformed by a preset first projection model; the transformation parameters are used to adjust the transformed first image area so that its position and size are the same as before the transformation. The processor is configured to perform the following steps: obtaining initial transformation parameters output by an initial neural network; determining a loss function of the initial neural network based on the initial transformation parameters; and training the initial neural network based on the loss function to obtain the neural network.
  • an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method described in the first aspect is implemented.
  • the first image area including the target object in the target image is transformed through the preset first projection model, so that the degree of the preset type of distortion of the target object in the first image area can be effectively reduced, mitigating the distortion of the target object in the target image and making the target object look more natural and realistic.
  • Figure 1 is a schematic diagram of the perspective projection effect of some embodiments.
  • FIG. 2 is a comparison diagram of an image without a preset type of distortion and an image with a preset type of distortion according to some embodiments.
  • FIG. 3 is a flowchart of an image processing method according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a projection model used in an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a mask acquisition process according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a mask according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a mask grid according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a full-image grid according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a grid offset estimation network according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of a network structure according to an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of the overall flow of an embodiment of the present disclosure.
  • FIG. 12 is a flow chart of a neural network training method according to an embodiment of the present disclosure.
  • FIG. 13 is a block diagram of an image processing device/neural network training device according to an embodiment of the present disclosure.
  • first, second, third, etc. may be used in the present disclosure to describe various information, but the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining".
  • in the related art, a perspective projection model is typically used to de-distort the whole image, and the de-distorted image obeys the perspective projection model r = f·tan(θ), where f is the focal length of the image acquisition device and θ is the angle between the optical axis and the line segment formed by the imaging point and the optical center of the image acquisition device.
  • O is the optical center
  • A, B, and C are the points of the object in the physical space
  • A', B', and C' are the corresponding points of A, B, and C on the imaging plane
  • the straight line on which the line segment ab lies is the optical axis. Due to the perspective projection effect, the imaged width of the BC surface of the object changes noticeably with the viewing-angle position, and the larger the viewing angle, the larger the imaged width (the length of B'C'); that is, B'C' is stretched, resulting in distortion. This distortion caused by the stretching effect of perspective projection is called perspective distortion.
  • when the above-mentioned object is a human face, the face appears stretched in the image.
  • FIG. 2 is a comparison of the image before perspective projection (a) and the image after perspective projection (b). It can be seen that in image (a), before perspective projection, the face of person 202 has no perspective distortion; in image (b), after perspective projection, the face of person 202 has perspective distortion: the face is stretched and does not look natural or realistic.
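  • to make the stretching effect concrete, the following sketch (a minimal numeric illustration added here for clarity; the focal length value is arbitrary and not from the disclosure) computes the image-plane width occupied by a fixed 5-degree angular slice under the perspective model r = f·tan(θ):

```python
import math

f = 1000.0  # focal length in pixels (arbitrary illustrative value)

# Image-plane width covered by a 5-degree slice starting at angle theta,
# under the perspective model r = f * tan(theta).
for theta_deg in (0, 20, 40, 60):
    t0 = math.radians(theta_deg)
    t1 = math.radians(theta_deg + 5)
    print(f"slice at {theta_deg:2d} deg -> {f * (math.tan(t1) - math.tan(t0)):6.1f} px")

# Output: ~87.5 px at the axis, ~102.3 px at 20 deg, ~160.9 px at 40 deg,
# ~412.5 px at 60 deg: the same angular extent is imaged ever wider, which
# is the stretching of B'C' described above.
```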
  • an embodiment of the present disclosure provides an image processing method, see FIG. 3 and FIG. 11, the method includes:
  • Step 301 Acquire a target image; wherein, the target image has a preset type of distortion;
  • Step 302 Obtain a first image area including the target object in the target image, and transform the first image area through a preset first projection model, so as to reduce the degree of the preset type of distortion of the target object in the first image area.
  • the target image may be acquired by an image acquisition device such as a camera, and there is a preset type of distortion in the target image, specifically, at least the target object in the target image has a preset type of distortion.
  • the preset type of distortion may be the distortion caused by the outline of the target object being stretched. In some embodiments, the preset type of distortion may be called perspective distortion.
  • the target image is an image captured by a wide-angle lens.
  • transformation processing may be performed on the first image region including the target object, so as to reduce the degree of the preset type of distortion in the first image region.
  • the first image area may be an image area corresponding to a mask of the target object, may also be an image area corresponding to a bounding box of the target object, or may be another type of image area including the target object.
  • the target object is a human face; in other embodiments, the target object may also be other objects that do not include a prominent straight line outline. For the case where the target object is a human face, the human face may also include areas such as hair and/or neck in addition to the area where the facial features are located.
  • the first projection model may include, but is not limited to, a stereographic projection model. Since the stereographic projection model reduces the influence of changes in θ on r in the imaging model, it can reduce the degree of the preset type of distortion in the first image area.
  • the stereographic projection model of some embodiments is as follows: r = 2f·tan(θ/2)
  • Applying the stereographic projection model will cause straight lines in the original field of view to become curved, so new distortion appears.
  • Objects without prominent straight-line contours, such as faces, are less sensitive to straight lines becoming curved and more sensitive to stretching distortion, while objects with prominent straight-line contours are more sensitive to straight lines becoming curved and less sensitive to stretching distortion. Therefore, the stereographic projection model can be applied to transform only the first image area including the target object, while a second projection model (for example, a perspective projection model) is applied to the second image area other than the first image area, thereby reducing the degree of the preset type of distortion in the second image area.
  • as shown in FIG. 4, for an image including a human face, the stereographic projection model may be applied only to the face area (the area inside the dotted box in the figure), while the perspective projection model is applied to the background area (the area outside the dotted box in the figure).
  • in this way, acceptable imaging of both the foreground face and the background can be maintained at the same time, which not only reduces the perspective distortion of the face but also avoids obvious bending of the background.
  • the above-mentioned first projection model and second projection model can also adopt other types of projection models, as long as the degree of the preset type of distortion can be reduced, and the degree of the preset type of distortion after the image is transformed by the second projection model is higher than the degree of the preset type of distortion after the image is transformed by the first projection model.
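  • as a hedged illustration of why this re-projection reduces stretching (an illustrative sketch assuming ideal pinhole geometry; the focal length value is arbitrary), a radius in a perspective image can be re-mapped to its stereographic counterpart by inverting r = f·tan(θ) and re-projecting with r' = 2f·tan(θ/2):

```python
import numpy as np

def perspective_to_stereographic(r: np.ndarray, f: float) -> np.ndarray:
    """Map a radius r in a perspective image (r = f*tan(theta)) to the
    radius the same ray gets under stereographic projection
    (r' = 2f*tan(theta/2))."""
    theta = np.arctan(r / f)            # recover the ray angle from the perspective radius
    return 2.0 * f * np.tan(theta / 2.0)

f = 1000.0
print(perspective_to_stereographic(np.array([0.0, 500.0, 1000.0, 2000.0]), f))
# -> [0. 472.1 828.4 1236.1] (approx): large radii are pulled inward, which
# undoes the stretching that perspective projection applies to off-axis faces.
```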
  • the transformation will cause the size and/or position of the first image area to change; therefore, the transformed first image area may be adjusted so that the position and size of the transformed first image area are the same as before the transformation, and the second image area in the target image other than the first image area may then be adjusted so that the second image area fits the transformed first image area.
  • the adjustment may include at least one adjustment manner among scale transformation and translation transformation.
  • the scale transformation is used to enlarge or reduce the first image area, so that the size of the first image area after scale transformation is the same as that before the first projection model transformation process.
  • the translation transformation is used to change the position of the first image area, so that the position of the first image area after the translation transformation is the same as that of the first image area before being processed by the first projection model transformation.
  • the mask of the transformed first image area can be gridded to obtain a grid of the mask;
  • the grid of the mask is translated and scaled through pre-acquired transformation parameters. It is also possible to translate the grid of the second image area through pre-acquired grid offset parameters, so that the second image area fits the first image area and there is a natural transition between the first image area and the second image area, as sketched below.
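  • a minimal sketch of what the similarity adjustment does, under our simplifying assumption that one closed-form scale and translation per face restores the original extent (in the disclosure itself these parameters are estimated by the neural network described below, not computed in closed form):

```python
import numpy as np

def restore_position_and_size(grid_xy: np.ndarray, original_xy: np.ndarray) -> np.ndarray:
    """Apply a similarity transform [s, t] to the stereographically transformed
    mask grid so its extent and centre match the grid of the face area before
    the transformation. grid_xy, original_xy: (N, 2) arrays of vertices."""
    # scale chosen so the transformed grid spans the original extent
    s = (original_xy.max(0) - original_xy.min(0)) / (grid_xy.max(0) - grid_xy.min(0))
    s = float(s.mean())                              # isotropic scale for a similarity transform
    t = original_xy.mean(0) - s * grid_xy.mean(0)    # translation aligning the centroids
    return s * grid_xy + t
```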
  • the target image can be input into the face segmentation network for face segmentation, and the output of the network can be obtained after one forward propagation.
  • the target image may be an RGB image obtained by shooting with a large viewing angle.
  • the face segmentation network can output the mask of the face area, as shown in Figure 6.
  • the position and size of the face mask are the same as the face area (including hair and neck) in the target image.
  • Perform grid processing on the face mask to obtain the grid of the mask as shown in FIG. 7 . It is also possible to perform grid processing on the entire target image to obtain a grid of the entire target image, as shown in FIG. 8 .
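  • the gridding step might look as follows (an illustrative sketch; the cell size in pixels is our assumption and is not specified by the disclosure):

```python
import numpy as np

def make_grids(mask: np.ndarray, cell: int = 16):
    """Grid the whole image and keep the subset of vertices that fall
    inside the face mask (cf. FIG. 7 and FIG. 8)."""
    h, w = mask.shape
    ys, xs = np.meshgrid(np.arange(0, h, cell), np.arange(0, w, cell), indexing="ij")
    full_grid = np.stack([xs, ys], axis=-1)    # (H', W', 2) full-image grid
    inside = mask[ys, xs] > 0                  # vertices covered by the face mask
    mask_grid = full_grid[inside]              # (N, 2) mask grid vertices
    return full_grid, mask_grid
```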
  • the transformation parameters and grid offset parameters are obtained through a neural network.
  • the neural network can be pre-trained. Specifically, initial transformation parameters and initial grid offset parameters output by an initial neural network can be obtained; the loss function of the initial neural network is determined based on the initial transformation parameters and the initial grid offset parameters; and the initial neural network is trained based on the loss function to obtain the neural network.
  • the neural network may be a convolutional neural network (Convolutional Neural Networks, CNN).
  • the training process of the neural network can use gradient descent, and a well-designed loss function makes the CNN converge according to the rules of the loss function and learn the optimization target.
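  • a sketch of the implied self-supervised training loop (PyTorch and the Adam optimizer are our stand-ins for the unspecified gradient-descent setup; the loss function is detailed below):

```python
import torch

def train(model, loader, loss_fn, epochs: int = 10, lr: float = 1e-4):
    """Generic gradient-descent loop: the loss is computed from the network's
    own outputs (self-supervised), so no ground-truth grids are needed."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for mask_grid, image_grid in loader:
            offsets, shape_params = model(mask_grid)
            loss = loss_fn(offsets, shape_params, mask_grid, image_grid)
            opt.zero_grad()
            loss.backward()
            opt.step()
```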
  • the loss function is obtained based on at least one of the following: a first loss function Loss_proj used to constrain the first projection model; a second loss function Loss_line used to constrain grid lines; a third loss function Loss_reg used to constrain the distribution of grid offset values; and a fourth loss function Loss_edge used to constrain the grid boundary.
  • the first projection model is a stereographic projection model
  • the second projection model is a perspective projection model
  • the first loss function Loss_proj is specifically used to ensure that the face part is reprojected according to the stereographic projection, optimizing the perspective projection distortion of the face
  • the second loss function Loss_line is specifically used to keep the grid lines straight to prevent background distortion
  • the third loss function Loss_reg is specifically used to constrain the distribution of grid offset values to keep the grid as uniform as possible
  • the fourth loss function Loss_edge is specifically used to constrain the boundary and prevent the grid from shrinking excessively.
  • the loss function may be obtained by performing a weighted average on the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights.
  • the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
  • the loss function L is specifically as follows: L = α₁·Loss_proj + α₂·Loss_line + α₃·Loss_reg + α₄·Loss_edge
  • α₁, α₂, α₃, and α₄ are respectively the weight corresponding to the first loss function, the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
  • Loss_proj is specifically as follows: Loss_proj = Σ_{k∈K} ( Σ_{i∈F} ||v_i - (s_k·u_i + t_k)||² + λ(s_k) )
  • v_i is the position of the grid point conforming to the stereographic projection
  • u_i is the position of the grid point in the original perspective projection image (that is, the target image)
  • [s_k, t_k] are the transformation parameters, representing a similarity transformation
  • λ(s_k) is a regularization term, constraining s_k to be within a controllable range of scale change
  • K represents the space of face instances
  • F represents the space of grid points within the face area.
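  • a sketch of Loss_proj and the weighted total loss, following the definitions above (the quadratic form of λ(s_k) and the α values are illustrative assumptions, not fixed by the disclosure beyond the first weight being the largest):

```python
import torch

def loss_proj(v, u, s, t, lam: float = 0.1):
    """v, u: (K, F, 2) grid points under the stereographic / original
    perspective projection; s: (K,) scales and t: (K, 2) translations
    of the per-face similarity transform."""
    pred = s[:, None, None] * u + t[:, None, :]   # similarity transform of u_i
    data = ((v - pred) ** 2).sum()                # summed over K, F and x/y
    reg = lam * ((s - 1.0) ** 2).sum()            # lambda(s_k): keep scale change controllable
    return data + reg

def total_loss(l_proj, l_line, l_reg, l_edge, alphas=(4.0, 1.0, 0.5, 0.5)):
    # alphas[0], the Loss_proj weight, is deliberately the largest, as stated above
    a1, a2, a3, a4 = alphas
    return a1 * l_proj + a2 * l_line + a3 * l_reg + a4 * l_edge
```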
  • a CNN-based instance segmentation method is first used to segment the face (including hair and neck) and the background in the wide-angle photo and generate a face mask, and the original image and the mask are gridded to obtain the original-image grid and the mask grid; stereographic projection is then applied to the foreground face while the background keeps the perspective projection, and the loss function is designed in combination with the straight-line constraint, the regularization constraint, and the boundary constraint; the gridded face mask is then used as the CNN input, and the output is the offset relative to the original-image grid.
  • In the training phase, the CNN parameters are optimized based on the designed self-supervised loss function; finally, in the testing phase, the CNN output is added to the original-image grid to obtain the optimized grid, and the optimized face-distortion-corrected image is obtained according to the correspondence between the original-image grid and the new grid.
  • the network structure design of the embodiment of the present disclosure is shown in FIG. 9 and FIG. 10 .
  • the backbone network performs low-level feature transformation based on fully convolutional networks (FCN); the grid offset estimation branch (OffsetNet) estimates the grid offset parameters, and the transformation parameter estimation branch (ShapeNet) estimates the transformation parameters. Since the position and size of the face after adopting the stereographic projection model will differ from those in the original image, the present disclosure directly uses the transformation parameters as an estimated output that participates in the computation of the Loss_proj loss term. On the basis of the transformation parameters keeping the face foreground under the stereographic projection, a small scale transformation and translation transformation are performed to keep the overall loss function value as small as possible.
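  • a structural sketch of this two-branch design (layer widths and the encoding of the gridded mask as a two-channel coordinate map are our assumptions; the disclosure fixes only an FCN backbone with OffsetNet and ShapeNet branches):

```python
import torch
import torch.nn as nn

class DistortionCorrectionNet(nn.Module):
    """FCN-style backbone with two heads: OffsetNet predicts a per-vertex
    grid offset, ShapeNet predicts the similarity parameters [s, tx, ty]."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(                  # fully convolutional features
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.offset_net = nn.Conv2d(64, 2, 3, padding=1)   # OffsetNet: (dx, dy) per vertex
        self.shape_net = nn.Sequential(                    # ShapeNet: global [s, tx, ty]
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 3),
        )

    def forward(self, mask_grid):          # mask_grid: (B, 2, H', W') gridded face mask
        feat = self.backbone(mask_grid)
        return self.offset_net(feat), self.shape_net(feat)
```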
  • the face instance segmentation network and the grid offset estimation network are integrated into one network, and the CNN hardware acceleration is used to improve the overall computing efficiency.
  • the face instance segmentation network and the grid offset estimation network will be directly connected in series, so that the grid offset after face distortion correction can be obtained with only one forward calculation.
  • the existing face distortion correction algorithm is based on the traditional iterative optimization method, and the optimization efficiency is low. Compared with the method of iterative optimization, the network optimization method of the embodiment of the present disclosure can effectively improve the calculation efficiency.
  • after the transformed mask grid of the first image area is obtained, the mask grid can also be interpolated based on the pixel values of the pixels in the first image area, so that the mask grid is restored to a first image area that includes color information.
  • interpolation may likewise be performed on the transformed grid of the second image area based on the pixel values of the pixels in the second image area, so as to restore the grid of the second image area to a second image area that includes color information.
  • different projection models are used to perform projection transformation on the face foreground and the background; the grid offsets of the transformed foreground and background areas are then obtained, the offset values are added to the original-image grid coordinates to obtain the optimized grid, and the grid is then interpolated according to the pixel values of the original image to obtain the optimized image (that is, the output image).
  • CNN instance segmentation is first performed on the input image including the face, that is, the face instance is segmented by the CNN to obtain the segmentation mask, that is, the mask of the face part.
  • CNN grid offset estimation is then performed based on the segmentation mask and the gridded original image to obtain the grid offset.
  • the original-image grid is restored based on the grid offset to obtain the de-distorted grid of the face; image interpolation is then performed based on the input image, the grid obtained in the previous step, and the full-image grid to obtain the distortion-corrected output image.
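  • the final interpolation step can be realized with a standard bilinear re-sampler; the sketch below uses OpenCV's remap (our choice of library, not the disclosure's):

```python
import cv2
import numpy as np

def render_output(image: np.ndarray, map_x: np.ndarray, map_y: np.ndarray) -> np.ndarray:
    """image: H x W x 3 input photo. map_x/map_y give, for every output pixel,
    the source coordinate in the input image; they are obtained by adding the
    estimated offsets to the full-image grid and upsampling to pixel resolution."""
    return cv2.remap(image, map_x.astype(np.float32), map_y.astype(np.float32),
                     interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
```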
  • the embodiment of the present disclosure also provides a neural network training method; the neural network is used to determine transformation parameters for a first image area, including a target object, in a target image after the area is transformed by a preset first projection model, the transformation parameters being used to adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation; the method includes:
  • Step 1201 Obtain the initial transformation parameters output by the initial neural network
  • Step 1202 Determine the loss function of the initial neural network based on the initial transformation parameters
  • Step 1203 Train the initial neural network based on the loss function to obtain the neural network.
  • the first projection model is used to reduce the degree of a preset type of distortion of the target object in the first image region.
  • the loss function is obtained based on at least one of the following: a first loss function used to constrain the first projection model; a second loss function used to constrain grid lines; a third loss function used to constrain the distribution of grid offset values; a fourth loss function used to constrain the grid boundary.
  • the method further includes: performing a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
  • the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function .
  • the neural network is also used to determine the grid offset parameters of the second image area in the target image except the first image area after being transformed by the preset second projection model , the grid offset parameter is used to adjust the transformed second image region, so that the second image region is adapted to the transformed first image region.
  • the determining the loss function of the initial neural network based on the initial transformation parameters includes: determining the loss function of the initial neural network based on the initial transformation parameters and the initial grid offset parameters output by the initial neural network.
  • the initial neural network outputs the initial transformation parameters and initial grid offset parameters based on the mask of the first sample image region including the sample target object in the sample image and the grid of the sample image .
  • the target object includes a human face.
  • the sample image is an image captured by a wide-angle lens.
  • the first projection model is a perspective projection model
  • the second projection model is a stereographic projection model
  • An embodiment of the present disclosure also provides an image processing device, including a processor, and the processor is configured to perform the following steps:
  • the target image has a preset type of distortion
  • the processor is further configured to: use a preset second projection model to transform a second image area in the target image other than the first image area, so as to reduce the The degree of the distortion of the preset type in the second image area; wherein, the degree of the distortion of the preset type after the image is transformed by the second projection model is higher than that of the transformation by the first projection model The degree of distortion of the preset type after processing.
  • the first projection model is a stereographic projection model
  • the second projection model is a perspective projection model
  • the processor is further configured to: adjust the first image region after transformation processing, so that the position and size of the first image region after transformation processing are the same as those before transformation processing; Adjusting a second image area other than the first image area in the target image, so that the second image area adapts to the transformed first image area.
  • the processor is configured to: perform grid processing on the transformed mask of the first image region to obtain a grid of the mask;
  • the grid of the mask is translated and scale transformed;
  • the grid of the second image area is translated by the grid offset parameter acquired in advance.
  • the transformation parameters and grid offset parameters are obtained through a neural network.
  • the processor is further configured to: obtain initial transformation parameters and initial grid offset parameters output by the initial neural network; determine the initial neural network based on the initial transformation parameters and initial grid offset parameters A loss function; based on the loss function, the initial neural network is trained to obtain the neural network.
  • the loss function is obtained based on at least one of the following: a first loss function used to constrain the first projection model; a second loss function used to constrain grid lines; a third loss function used to constrain the distribution of grid offset values; a fourth loss function used to constrain the grid boundary.
  • the processor is further configured to perform a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
  • the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function .
  • the processor is configured to: use the first projection model to transform the mask of the first image region to obtain the transformed mask of the first image region;
  • the processed mask in the first image area is subjected to grid processing to obtain a mask grid; based on the pixel values of the pixels in the first image area, an interpolation process is performed on the mask grid.
  • the processor is configured to: use the second projection model to transform the grid of the second image region to obtain the transformed grid of the second image region; based on the The pixel values of the pixels in the second image area are calculated, and interpolation processing is performed on the transformed grid of the second image area.
  • the target object includes a human face.
  • the target image is an image captured by a wide-angle lens.
  • An embodiment of the present disclosure also provides a neural network training device, including a processor, and the neural network is used to determine the transformation of the first image region including the target object in the target image after the transformation process is performed by the preset first projection model parameters, the transformation parameters are used to adjust the first image region after transformation processing, so that the position and size of the first image region after transformation processing are the same as those before transformation processing; the processor is configured to execute The following steps:
  • the initial neural network is trained based on the loss function to obtain the neural network.
  • the loss function is obtained based on at least one of the following: a first loss function used to constrain the first projection model; a second loss function used to constrain grid lines; a third loss function used to constrain the distribution of grid offset values; a fourth loss function used to constrain the grid boundary.
  • the processor is further configured to perform a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
  • the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function .
  • the neural network is also used to determine the grid offset parameters of the second image area in the target image except the first image area after being transformed by the preset second projection model , the grid offset parameter is used to adjust the transformed second image region, so that the second image region is adapted to the transformed first image region.
  • the processor is configured to: determine a loss function of the initial neural network based on the initial transformation parameters and an initial grid offset parameter output by the initial neural network.
  • the initial neural network outputs the initial transformation parameters and initial grid offset parameters based on the mask of the first sample image region including the sample target object in the sample image and the grid of the sample image .
  • the target object includes a human face.
  • the sample image is an image captured by a wide-angle lens.
  • the first projection model is a perspective projection model
  • the second projection model is a stereographic projection model
  • FIG. 13 shows a schematic diagram of the hardware structure of a more specific image processing device and/or neural network training device provided by an embodiment of the present disclosure.
  • the device may include: a processor 1301, a memory 1302, an input/output interface 1303, a communication interface 1304, and a bus 1305.
  • the processor 1301 , the memory 1302 , the input/output interface 1303 and the communication interface 1304 are connected to each other within the device through the bus 1305 .
  • the processor 1301 may be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of this specification.
  • the memory 1302 can be implemented in the form of ROM (Read Only Memory, read-only memory), RAM (Random Access Memory, random access memory), static storage device, dynamic storage device, etc.
  • the memory 1302 can store operating systems and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 1302 and invoked by the processor 1301 for execution.
  • the input/output interface 1303 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, mouse, touch screen, microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 1304 is used to connect a communication module (not shown in the figure), so as to realize the communication interaction between the device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.), and can also realize communication through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 1305 includes a path for transferring information between the various components of the device (eg, processor 1301, memory 1302, input/output interface 1303, and communication interface 1304).
  • the above device only shows the processor 1301, the memory 1302, the input/output interface 1303, the communication interface 1304, and the bus 1305, in the specific implementation process, the device may also include other components.
  • the above-mentioned device may only include components necessary to implement the solutions of the embodiments of this specification, and does not necessarily include all the components shown in the figure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the steps performed by the second processing unit in the method described in any of the preceding embodiments are implemented.
  • Computer-readable media including both permanent and non-permanent, removable and non-removable media, can be implemented by any method or technology for storage of information.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridge, tape magnetic disk storage or other magnetic storage device or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media excludes transitory computer-readable media, such as modulated data signals and carrier waves.
  • a typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or any combination of these devices.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present disclosure provide an image processing and neural network training method and apparatus. A first image area including a target object in a target image is transformed by means of a preset first projection model, thereby reducing the degree of a preset type of distortion of the target object in the first image area, mitigating the distortion of the target object in the target image, and making the target object more natural and realistic.

Description

Image processing and neural network training method and apparatus
Technical Field
The present disclosure relates to the technical field of image processing, and in particular to an image processing and neural network training method and apparatus.
Background
A preset type of distortion often exists in images, making some objects in the images look insufficiently natural and realistic. Distortion correction therefore needs to be performed on the images. However, traditional image correction methods have poor correction results.
Summary
In a first aspect, an embodiment of the present disclosure provides an image processing method, the method comprising: acquiring a target image, wherein the target image has a preset type of distortion; and acquiring a first image area including a target object in the target image, and transforming the first image area by means of a preset first projection model, so as to reduce the degree of the preset type of distortion of the target object in the first image area.
In a second aspect, an embodiment of the present disclosure provides a neural network training method. The neural network is used to determine transformation parameters for a first image area, including a target object, in a target image after the first image area is transformed by a preset first projection model; the transformation parameters are used to adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation. The method comprises: acquiring initial transformation parameters output by an initial neural network; determining a loss function of the initial neural network based on the initial transformation parameters; and training the initial neural network based on the loss function to obtain the neural network.
In a third aspect, an embodiment of the present disclosure provides an image processing apparatus comprising a processor, the processor being configured to perform the following steps: acquiring a target image, wherein the target image has a preset type of distortion; and acquiring a first image area including a target object in the target image, and transforming the first image area by means of a preset first projection model, so as to reduce the degree of the preset type of distortion of the target object in the first image area.
In a fourth aspect, an embodiment of the present disclosure provides a neural network training apparatus comprising a processor. The neural network is used to determine transformation parameters for a first image area, including a target object, in a target image after the first image area is transformed by a preset first projection model; the transformation parameters are used to adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation. The processor is configured to perform the following steps: acquiring initial transformation parameters output by an initial neural network; determining a loss function of the initial neural network based on the initial transformation parameters; and training the initial neural network based on the loss function to obtain the neural network.
In a fifth aspect, an embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method described in the first aspect is implemented.
In the embodiments of the present disclosure, a first image area including a target object in a target image is transformed by means of a preset first projection model, which can effectively reduce the degree of a preset type of distortion of the target object in the first image area, mitigate the distortion of the target object in the target image, and make the target object more natural and realistic.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the perspective projection effect of some embodiments.
FIG. 2 is a comparison of an image without a preset type of distortion and an image with the preset type of distortion according to some embodiments.
FIG. 3 is a flowchart of an image processing method according to an embodiment of the present disclosure.
FIG. 4 is a schematic diagram of projection models used in an embodiment of the present disclosure.
FIG. 5 is a schematic diagram of a mask acquisition process according to an embodiment of the present disclosure.
FIG. 6 is a schematic diagram of a mask according to an embodiment of the present disclosure.
FIG. 7 is a schematic diagram of a mask grid according to an embodiment of the present disclosure.
FIG. 8 is a schematic diagram of a full-image grid according to an embodiment of the present disclosure.
FIG. 9 is a schematic diagram of a grid offset estimation network according to an embodiment of the present disclosure.
FIG. 10 is a schematic diagram of a network structure according to an embodiment of the present disclosure.
FIG. 11 is a schematic diagram of the overall flow of an embodiment of the present disclosure.
FIG. 12 is a flowchart of a neural network training method according to an embodiment of the present disclosure.
FIG. 13 is a block diagram of an image processing apparatus/neural network training apparatus according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments are described in detail here, examples of which are shown in the drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a", "said", and "the" used in the specification of the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, this information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining".
In some cases, for example in images captured through a wide-angle lens, a certain degree of distortion often exists. To mitigate the degree of distortion of the image, distortion correction is performed on the whole image. In the related art, a perspective projection model is usually used to de-distort the whole image, and the de-distorted image obeys the following perspective projection model:
r = f·tan(θ);
where f is the focal length of the image acquisition device, and θ is the angle between the optical axis and the line segment formed by the imaging point and the optical center of the image acquisition device.
Referring to FIG. 1, O is the optical center; A, B, and C are points of an object in physical space; A', B', and C' are the points corresponding to A, B, and C on the imaging plane; and the straight line on which the line segment ab lies is the optical axis. Due to the perspective projection effect, the imaged width of the BC surface of the object in the picture changes noticeably with the viewing-angle position, and the larger the viewing angle, the larger the imaged width (the length of B'C'); that is, B'C' is stretched, resulting in distortion. This distortion caused by the stretching effect of perspective projection is called perspective distortion. When the above object is a human face, the face appears stretched in the image. As shown in FIG. 2, which compares an image before perspective projection (a) with an image after perspective projection (b), it can be seen that in image (a), before perspective projection, the face of person 202 has no perspective distortion, while in image (b), after perspective projection, the face of person 202 has perspective distortion: the face is stretched and does not look natural and realistic.
Since traditional image distortion correction methods do not target the perspective distortion problem, perspective distortion in wide-angle images has not been well resolved in most products. With the popularity of consumer photography and videography products equipped with wide-angle lenses (for example, mobile phones and action cameras), distortion correction methods suitable for images captured with a large field of view (FoV) are in high demand. However, limited by current imaging methods, the image distortion problem under a large FoV is difficult to resolve completely, and the distortion correction methods in the related art have poor correction results.
On this basis, an embodiment of the present disclosure provides an image processing method. Referring to FIG. 3 and FIG. 11, the method includes:
Step 301: acquiring a target image, wherein the target image has a preset type of distortion;
Step 302: acquiring a first image area including a target object in the target image, and transforming the first image area by means of a preset first projection model, so as to reduce the degree of the preset type of distortion of the target object in the first image area.
In step 301, the target image may be captured by an image acquisition device such as a camera, and a preset type of distortion exists in the target image; specifically, at least the target object in the target image has the preset type of distortion. The preset type of distortion may be distortion caused by the contour of the target object being stretched. In some embodiments, the preset type of distortion may be called perspective distortion. In some embodiments, the target image is an image captured through a wide-angle lens.
In step 302, the first image area including the target object may be transformed so as to reduce the degree of the preset type of distortion in the first image area. The first image area may be the image area corresponding to a mask of the target object, the image area corresponding to a bounding box of the target object, or another type of image area including the target object. In some embodiments, the target object is a human face; in other embodiments, the target object may also be another object that does not include prominent straight-line contours. Where the target object is a human face, the face may include, in addition to the area where the facial features are located, areas such as hair and/or the neck.
The first projection model may include, but is not limited to, a stereographic projection model. Since the stereographic projection model can reduce the influence of changes in θ on r in the imaging model, it can reduce the degree of the preset type of distortion in the first image area. The stereographic projection model of some embodiments is as follows:
r = 2f·tan(θ/2)
Applying the stereographic projection model causes straight lines in the original field of view to become curved, and new distortion appears. Objects without prominent straight-line contours, such as human faces, are less sensitive to straight lines becoming curved and more sensitive to stretching distortion, while objects with prominent straight-line contours are more sensitive to straight lines becoming curved and less sensitive to stretching distortion. Therefore, the stereographic projection model may be applied to transform only the first image area including the target object, while a second projection model (for example, a perspective projection model) is applied to transform the second image area other than the first image area, thereby reducing the degree of the preset type of distortion in the second image area. As shown in FIG. 4, for an image including a human face, the stereographic projection model may be applied only to the face area (the area inside the dotted box in the figure), while the perspective projection model is applied to the background area (the area outside the dotted box in the figure). In this way, acceptable imaging of both the foreground face and the background can be maintained at the same time, which both reduces the perspective distortion of the face and avoids obvious bending of the background.
It should be noted that, besides the stereographic projection model and the perspective projection model, the above first projection model and second projection model may also adopt other types of projection models, as long as the degree of the preset type of distortion can be reduced and the degree of the preset type of distortion after the image is transformed by the second projection model is higher than the degree of the preset type of distortion after the image is transformed by the first projection model.
In some embodiments, the transformation will cause the size and/or position of the first image area to change; therefore, the transformed first image area may be adjusted so that the position and size of the transformed first image area are the same as before the transformation, and the second image area in the target image other than the first image area may then be adjusted so that the second image area fits the transformed first image area.
The adjustment may include at least one of a scale transformation and a translation transformation. The scale transformation is used to enlarge or reduce the first image area so that the size of the first image area after the scale transformation is the same as before the first projection model transformation. The translation transformation is used to change the position of the first image area so that the position of the first image area after the translation transformation is the same as before the first projection model transformation.
Referring to FIG. 5, FIG. 6, FIG. 7, and FIG. 8, the mask of the transformed first image area may be gridded to obtain a grid of the mask, and the grid of the mask is translated and scaled by means of pre-acquired transformation parameters. The grid of the second image area may also be translated by means of pre-acquired grid offset parameters, so that the second image area fits the first image area and there is a natural transition between the first image area and the second image area.
The solution of the embodiments of the present disclosure is described below taking a face image as an example. As shown in FIG. 5, the target image may be input into a face segmentation network for face segmentation, and the network output is obtained after one forward pass. The target image may be an RGB image captured with a large viewing angle. The face segmentation network may output a mask of the face area, as shown in FIG. 6. The position and size of the face mask are the same as the face area (including hair and neck) in the target image. The face mask is gridded to obtain the grid of the mask, as shown in FIG. 7. The whole target image may also be gridded to obtain a grid of the entire target image, as shown in FIG. 8.
In some embodiments, the transformation parameters and the grid offset parameters are obtained through a neural network. The neural network may be trained in advance. Specifically, initial transformation parameters and initial grid offset parameters output by an initial neural network may be acquired; a loss function of the initial neural network is determined based on the initial transformation parameters and the initial grid offset parameters; and the initial neural network is trained based on the loss function to obtain the neural network. The neural network may be a convolutional neural network (CNN). The training process of the neural network may use gradient descent, and a well-designed loss function makes the CNN converge according to the rules of the loss function and learn the optimization target.
The loss function is obtained based on at least one of the following: a first loss function Loss_proj used to constrain the first projection model; a second loss function Loss_line used to constrain grid lines; a third loss function Loss_reg used to constrain the distribution of grid offset values; and a fourth loss function Loss_edge used to constrain the grid boundary. Where the target image is a face image, the first projection model is a stereographic projection model, and the second projection model is a perspective projection model, the first loss function Loss_proj is specifically used to ensure that the face part is reprojected according to the stereographic projection, optimizing the perspective projection distortion of the face; the second loss function Loss_line is specifically used to keep the grid lines straight and prevent background distortion; the third loss function Loss_reg is specifically used to constrain the distribution of grid offset values so that the grid remains as uniform as possible; and the fourth loss function Loss_edge is specifically used to constrain the boundary and prevent the grid from shrinking excessively.
The first loss function, the second loss function, the third loss function, and the fourth loss function may be weighted and averaged based on preset weights to obtain the loss function, where the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function. The loss function L is specifically as follows:
L = α₁·Loss_proj + α₂·Loss_line + α₃·Loss_reg + α₄·Loss_edge
where α₁, α₂, α₃, and α₄ are respectively the weight corresponding to the first loss function, the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
In some embodiments, Loss_proj is specifically as follows:
Loss_proj = Σ_{k∈K} ( Σ_{i∈F} ||v_i - (s_k·u_i + t_k)||² + λ(s_k) )
where v_i is the position of a grid point conforming to the stereographic projection; u_i is the position of the grid point in the original perspective projection image (that is, the target image); [s_k, t_k] are the transformation parameters, representing a similarity transformation; λ(s_k) is a regularization term constraining s_k to a controllable range of scale change; K denotes the space of face instances; and F denotes the space of grid points within the face area.
An embodiment of the present disclosure first uses a CNN-based instance segmentation method to segment the face (including hair and neck) and the background in a wide-angle photo and generate a face mask, and grids them to obtain the original-image grid and the mask grid; stereographic projection is then applied to the foreground face while the background keeps the perspective projection, and the loss function is designed in combination with the straight-line constraint, the regularization constraint, and the boundary constraint; the gridded face mask is then used as the CNN input, and the output is the offset relative to the original-image grid. In the training phase, the CNN parameters are optimized based on the designed self-supervised loss function; finally, in the testing phase, the CNN output is added to the original-image grid to obtain the optimized grid, and the optimized face-distortion-corrected image is obtained according to the correspondence between the original-image grid and the new grid.
The network structure design of the embodiments of the present disclosure is shown in FIG. 9 and FIG. 10. The backbone network performs low-level feature transformation based on fully convolutional networks (FCN); the grid offset estimation branch (OffsetNet) estimates the grid offset parameters, and the transformation parameter estimation branch (ShapeNet) estimates the transformation parameters. Since the position and size of the face after adopting the stereographic projection model will differ from those in the original image, the present disclosure directly uses the transformation parameters as an estimated output that participates in the computation of the Loss_proj loss term. On the basis of the transformation parameters keeping the face foreground under the stereographic projection, a small scale transformation and translation transformation are performed so that the overall loss function value remains as small as possible. In the deployment phase, the face instance segmentation network and the grid offset estimation network are integrated into one network, and CNN hardware acceleration is used to improve the overall computing efficiency.
In the deployment phase, the face instance segmentation network and the grid offset estimation network are directly connected in series, so that the grid offset after face distortion correction can be obtained with only one forward computation. Existing face distortion correction algorithms are based on traditional iterative optimization methods and have low optimization efficiency; compared with iterative optimization, the network optimization method of the embodiments of the present disclosure can effectively improve computational efficiency.
After the transformed mask grid of the first image area is obtained, interpolation may also be performed on the mask grid based on the pixel values of the pixels in the first image area, so as to restore the mask grid to a first image area that includes color information.
In some embodiments, interpolation may also be performed on the transformed grid of the second image area based on the pixel values of the pixels in the second image area, so as to restore the grid of the second image area to a second image area that includes color information.
The embodiments of the present disclosure first apply different projection models to project the face foreground and the background, then obtain the grid offsets of the transformed face foreground and background areas, add the offset values to the original-image grid coordinates to obtain the optimized grid, and then interpolate the grid according to the pixel values of the original image to obtain the optimized image (that is, the output image); compared with traditional face distortion correction methods, this can effectively reduce the complexity of the processing pipeline during distortion correction.
Referring to FIG. 11, which is an overall flowchart of some embodiments of the present disclosure, CNN instance segmentation is first performed on an input image including a face, that is, the face instance is segmented by a CNN to obtain a segmentation mask, namely the mask of the face part. CNN grid offset estimation is then performed based on the segmentation mask and the original-image grid obtained by gridding the original image, yielding the grid offset; grid restoration is performed on the original-image grid based on the grid offset to obtain the de-distorted grid of the face; and image interpolation is then performed based on the input image, the grid obtained in the previous step, and the full-image grid, thereby obtaining the distortion-corrected output image.
As shown in FIG. 12, an embodiment of the present disclosure further provides a neural network training method. The neural network is used to determine transformation parameters for a first image area, including a target object, in a target image after the first image area is transformed by a preset first projection model; the transformation parameters are used to adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation. The method includes:
Step 1201: acquiring initial transformation parameters output by an initial neural network;
Step 1202: determining a loss function of the initial neural network based on the initial transformation parameters;
Step 1203: training the initial neural network based on the loss function to obtain the neural network.
The first projection model is used to reduce the degree of a preset type of distortion of the target object in the first image area. For the specific details of the transformation, refer to the foregoing embodiments of the image processing method, which are not repeated here.
In some embodiments, the loss function is obtained based on at least one of the following: a first loss function used to constrain the first projection model; a second loss function used to constrain grid lines; a third loss function used to constrain the distribution of grid offset values; and a fourth loss function used to constrain the grid boundary.
In some embodiments, the method further includes: performing a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
In some embodiments, the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
In some embodiments, the neural network is further used to determine grid offset parameters for a second image area in the target image other than the first image area after the second image area is transformed by a preset second projection model; the grid offset parameters are used to adjust the transformed second image area so that the second image area fits the transformed first image area.
In some embodiments, determining the loss function of the initial neural network based on the initial transformation parameters includes: determining the loss function of the initial neural network based on the initial transformation parameters and initial grid offset parameters output by the initial neural network.
In some embodiments, the initial neural network outputs the initial transformation parameters and the initial grid offset parameters based on a mask of a first sample image area, including a sample target object, in a sample image and a grid of the sample image.
In some embodiments, the target object includes a human face.
In some embodiments, the sample image is an image captured through a wide-angle lens.
In some embodiments, the first projection model is a perspective projection model, and the second projection model is a stereographic projection model.
An embodiment of the present disclosure further provides an image processing apparatus, including a processor, the processor being configured to perform the following steps:
acquiring a target image, wherein the target image has a preset type of distortion;
acquiring a first image area including a target object in the target image, and transforming the first image area by means of a preset first projection model, so as to reduce the degree of the preset type of distortion of the target object in the first image area.
In some embodiments, the processor is further configured to: transform, by means of a preset second projection model, a second image area in the target image other than the first image area, so as to reduce the degree of the preset type of distortion in the second image area; wherein the degree of the preset type of distortion after an image is transformed by the second projection model is higher than the degree of the preset type of distortion after the image is transformed by the first projection model.
In some embodiments, the first projection model is a stereographic projection model, and the second projection model is a perspective projection model.
In some embodiments, the processor is further configured to: adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation; and adjust the second image area in the target image other than the first image area so that the second image area fits the transformed first image area.
In some embodiments, the processor is configured to: grid the mask of the transformed first image area to obtain a grid of the mask; translate and scale the grid of the mask by means of pre-acquired transformation parameters; and translate the grid of the second image area by means of pre-acquired grid offset parameters.
In some embodiments, the transformation parameters and the grid offset parameters are obtained through a neural network.
In some embodiments, the processor is further configured to: acquire initial transformation parameters and initial grid offset parameters output by an initial neural network; determine a loss function of the initial neural network based on the initial transformation parameters and the initial grid offset parameters; and train the initial neural network based on the loss function to obtain the neural network.
In some embodiments, the loss function is obtained based on at least one of the following: a first loss function used to constrain the first projection model; a second loss function used to constrain grid lines; a third loss function used to constrain the distribution of grid offset values; and a fourth loss function used to constrain the grid boundary.
In some embodiments, the processor is further configured to: perform a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
In some embodiments, the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
In some embodiments, the processor is configured to: transform the mask of the first image area by means of the first projection model to obtain the transformed mask of the first image area; grid the transformed mask of the first image area to obtain a mask grid; and perform interpolation on the mask grid based on the pixel values of the pixels in the first image area.
In some embodiments, the processor is configured to: transform the grid of the second image area by means of the second projection model to obtain the transformed grid of the second image area; and perform interpolation on the transformed grid of the second image area based on the pixel values of the pixels in the second image area.
In some embodiments, the target object includes a human face.
In some embodiments, the target image is an image captured through a wide-angle lens.
An embodiment of the present disclosure further provides a neural network training apparatus, including a processor. The neural network is used to determine transformation parameters for a first image area, including a target object, in a target image after the first image area is transformed by a preset first projection model; the transformation parameters are used to adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation. The processor is configured to perform the following steps:
acquiring initial transformation parameters output by an initial neural network;
determining a loss function of the initial neural network based on the initial transformation parameters;
training the initial neural network based on the loss function to obtain the neural network.
In some embodiments, the loss function is obtained based on at least one of the following: a first loss function used to constrain the first projection model; a second loss function used to constrain grid lines; a third loss function used to constrain the distribution of grid offset values; and a fourth loss function used to constrain the grid boundary.
In some embodiments, the processor is further configured to: perform a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
In some embodiments, the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
In some embodiments, the neural network is further used to determine grid offset parameters for a second image area in the target image other than the first image area after the second image area is transformed by a preset second projection model; the grid offset parameters are used to adjust the transformed second image area so that the second image area fits the transformed first image area.
In some embodiments, the processor is configured to: determine the loss function of the initial neural network based on the initial transformation parameters and initial grid offset parameters output by the initial neural network.
In some embodiments, the initial neural network outputs the initial transformation parameters and the initial grid offset parameters based on a mask of a first sample image area, including a sample target object, in a sample image and a grid of the sample image.
In some embodiments, the target object includes a human face.
In some embodiments, the sample image is an image captured through a wide-angle lens.
In some embodiments, the first projection model is a perspective projection model, and the second projection model is a stereographic projection model.
FIG. 13 shows a more specific schematic diagram of the hardware structure of an image processing apparatus and/or neural network training apparatus provided by an embodiment of the present disclosure. The device may include: a processor 1301, a memory 1302, an input/output interface 1303, a communication interface 1304, and a bus 1305. The processor 1301, the memory 1302, the input/output interface 1303, and the communication interface 1304 are communicatively connected to each other within the device through the bus 1305.
The processor 1301 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of this specification.
The memory 1302 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1302 may store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 1302 and invoked and executed by the processor 1301.
The input/output interface 1303 is used to connect an input/output module to realize information input and output. The input/output module may be configured in the device as a component (not shown in the figure), or may be externally connected to the device to provide corresponding functions. The input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output device may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1304 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module may realize communication in a wired manner (for example, USB or a network cable) or in a wireless manner (for example, a mobile network, WIFI, or Bluetooth).
The bus 1305 includes a path for transferring information between the various components of the device (for example, the processor 1301, the memory 1302, the input/output interface 1303, and the communication interface 1304).
It should be noted that although the above device only shows the processor 1301, the memory 1302, the input/output interface 1303, the communication interface 1304, and the bus 1305, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may also include only the components necessary to implement the solutions of the embodiments of this specification, without necessarily including all the components shown in the figure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps performed by the second processing unit in the method described in any of the foregoing embodiments are implemented.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. Information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments, or in certain parts of the embodiments, of this specification.
The systems, apparatuses, modules, or units set forth in the above embodiments may be specifically implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer, which may specifically take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The various technical features in the above embodiments may be combined arbitrarily, as long as there is no conflict or contradiction between the combinations of features; due to space limitations, they are not described one by one, so any combination of the various technical features in the above implementations also falls within the scope of the present disclosure.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing what is disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed by the present disclosure. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
The above are merely preferred embodiments of the present disclosure and are not intended to limit the present disclosure; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included within the scope of protection of the present disclosure.

Claims (49)

  1. An image processing method, characterized in that the method comprises:
    acquiring a target image, wherein the target image has a preset type of distortion;
    acquiring a first image area including a target object in the target image, and transforming the first image area by means of a preset first projection model, so as to reduce the degree of the preset type of distortion of the target object in the first image area.
  2. The method according to claim 1, characterized in that the method further comprises:
    transforming, by means of a preset second projection model, a second image area in the target image other than the first image area, so as to reduce the degree of the preset type of distortion in the second image area;
    wherein the degree of the preset type of distortion after an image is transformed by the second projection model is higher than the degree of the preset type of distortion after the image is transformed by the first projection model.
  3. The method according to claim 2, characterized in that the first projection model is a stereographic projection model and the second projection model is a perspective projection model.
  4. The method according to any one of claims 1-3, characterized in that the method further comprises:
    adjusting the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation;
    adjusting a second image area in the target image other than the first image area so that the second image area fits the transformed first image area.
  5. The method according to claim 4, characterized in that adjusting the transformed first image area comprises:
    gridding a mask of the transformed first image area to obtain a grid of the mask;
    translating and scaling the grid of the mask by means of pre-acquired transformation parameters;
    and adjusting the second image area in the target image other than the first image area comprises:
    translating a grid of the second image area by means of pre-acquired grid offset parameters.
  6. The method according to claim 5, characterized in that the transformation parameters and the grid offset parameters are obtained through a neural network.
  7. The method according to claim 6, characterized in that the method further comprises:
    acquiring initial transformation parameters and initial grid offset parameters output by an initial neural network;
    determining a loss function of the initial neural network based on the initial transformation parameters and the initial grid offset parameters;
    training the initial neural network based on the loss function to obtain the neural network.
  8. The method according to claim 7, characterized in that the loss function is obtained based on at least one of the following:
    a first loss function used to constrain the first projection model;
    a second loss function used to constrain grid lines;
    a third loss function used to constrain the distribution of grid offset values;
    a fourth loss function used to constrain the grid boundary.
  9. The method according to claim 8, characterized in that the method further comprises:
    performing a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
  10. The method according to claim 9, characterized in that the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
  11. The method according to claim 1, characterized in that transforming the first image area by means of the preset first projection model comprises:
    transforming a mask of the first image area by means of the first projection model to obtain a transformed mask of the first image area;
    gridding the transformed mask of the first image area to obtain a mask grid;
    and the method further comprises:
    performing interpolation on the mask grid based on pixel values of pixels in the first image area.
  12. The method according to claim 2, characterized in that transforming, by means of the preset second projection model, the second image area in the target image other than the first image area comprises:
    transforming a grid of the second image area by means of the second projection model to obtain a transformed grid of the second image area;
    and the method further comprises:
    performing interpolation on the transformed grid of the second image area based on pixel values of pixels in the second image area.
  13. The method according to claim 1, characterized in that the target object comprises a human face.
  14. The method according to claim 1, characterized in that the target image is an image captured with a wide-angle lens.
  15. A neural network training method, characterized in that the neural network is used to determine transformation parameters for a first image area, including a target object, in a target image after the first image area is transformed by a preset first projection model, the transformation parameters being used to adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation; the method comprises:
    acquiring initial transformation parameters output by an initial neural network;
    determining a loss function of the initial neural network based on the initial transformation parameters;
    training the initial neural network based on the loss function to obtain the neural network.
  16. The method according to claim 15, characterized in that the loss function is obtained based on at least one of the following:
    a first loss function used to constrain the first projection model;
    a second loss function used to constrain grid lines;
    a third loss function used to constrain the distribution of grid offset values;
    a fourth loss function used to constrain the grid boundary.
  17. The method according to claim 16, characterized in that the method further comprises:
    performing a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
  18. The method according to claim 17, characterized in that the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
  19. The method according to claim 15, characterized in that the neural network is further used to determine grid offset parameters for a second image area in the target image other than the first image area after the second image area is transformed by a preset second projection model, the grid offset parameters being used to adjust the transformed second image area so that the second image area fits the transformed first image area.
  20. The method according to claim 19, characterized in that determining the loss function of the initial neural network based on the initial transformation parameters comprises:
    determining the loss function of the initial neural network based on the initial transformation parameters and initial grid offset parameters output by the initial neural network.
  21. The method according to claim 19, characterized in that the initial neural network outputs the initial transformation parameters and the initial grid offset parameters based on a mask of a first sample image area, including a sample target object, in a sample image and a grid of the sample image.
  22. The method according to any one of claims 15 to 21, characterized in that the target object comprises a human face.
  23. The method according to any one of claims 15 to 21, characterized in that the sample image is an image captured with a wide-angle lens.
  24. The method according to claim 19, characterized in that the first projection model is a perspective projection model and the second projection model is a stereographic projection model.
  25. An image processing apparatus, comprising a processor, characterized in that the processor is configured to perform the following steps:
    acquiring a target image, wherein the target image has a preset type of distortion;
    acquiring a first image area including a target object in the target image, and transforming the first image area by means of a preset first projection model, so as to reduce the degree of the preset type of distortion of the target object in the first image area.
  26. The apparatus according to claim 25, characterized in that the processor is further configured to:
    transform, by means of a preset second projection model, a second image area in the target image other than the first image area, so as to reduce the degree of the preset type of distortion in the second image area;
    wherein the degree of the preset type of distortion after an image is transformed by the second projection model is higher than the degree of the preset type of distortion after the image is transformed by the first projection model.
  27. The apparatus according to claim 26, characterized in that the first projection model is a stereographic projection model and the second projection model is a perspective projection model.
  28. The apparatus according to any one of claims 25-27, characterized in that the processor is further configured to:
    adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation;
    adjust a second image area in the target image other than the first image area so that the second image area fits the transformed first image area.
  29. The apparatus according to claim 28, characterized in that the processor is configured to:
    grid a mask of the transformed first image area to obtain a grid of the mask;
    translate and scale the grid of the mask by means of pre-acquired transformation parameters;
    translate a grid of the second image area by means of pre-acquired grid offset parameters.
  30. The apparatus according to claim 29, characterized in that the transformation parameters and the grid offset parameters are obtained through a neural network.
  31. The apparatus according to claim 30, characterized in that the processor is further configured to:
    acquire initial transformation parameters and initial grid offset parameters output by an initial neural network;
    determine a loss function of the initial neural network based on the initial transformation parameters and the initial grid offset parameters;
    train the initial neural network based on the loss function to obtain the neural network.
  32. The apparatus according to claim 31, characterized in that the loss function is obtained based on at least one of the following:
    a first loss function used to constrain the first projection model;
    a second loss function used to constrain grid lines;
    a third loss function used to constrain the distribution of grid offset values;
    a fourth loss function used to constrain the grid boundary.
  33. The apparatus according to claim 32, characterized in that the processor is further configured to:
    perform a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
  34. The apparatus according to claim 33, characterized in that the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
  35. The apparatus according to claim 25, characterized in that the processor is configured to:
    transform a mask of the first image area by means of the first projection model to obtain a transformed mask of the first image area;
    grid the transformed mask of the first image area to obtain a mask grid;
    perform interpolation on the mask grid based on pixel values of pixels in the first image area.
  36. The apparatus according to claim 26, characterized in that the processor is configured to:
    transform a grid of the second image area by means of the second projection model to obtain a transformed grid of the second image area;
    perform interpolation on the transformed grid of the second image area based on pixel values of pixels in the second image area.
  37. The apparatus according to claim 25, characterized in that the target object comprises a human face.
  38. The apparatus according to claim 25, characterized in that the target image is an image captured with a wide-angle lens.
  39. A neural network training apparatus, comprising a processor, characterized in that the neural network is used to determine transformation parameters for a first image area, including a target object, in a target image after the first image area is transformed by a preset first projection model, the transformation parameters being used to adjust the transformed first image area so that the position and size of the transformed first image area are the same as before the transformation; the processor is configured to perform the following steps:
    acquiring initial transformation parameters output by an initial neural network;
    determining a loss function of the initial neural network based on the initial transformation parameters;
    training the initial neural network based on the loss function to obtain the neural network.
  40. The apparatus according to claim 39, characterized in that the loss function is obtained based on at least one of the following:
    a first loss function used to constrain the first projection model;
    a second loss function used to constrain grid lines;
    a third loss function used to constrain the distribution of grid offset values;
    a fourth loss function used to constrain the grid boundary.
  41. The apparatus according to claim 40, characterized in that the processor is further configured to:
    perform a weighted average of the first loss function, the second loss function, the third loss function, and the fourth loss function based on preset weights to obtain the loss function.
  42. The apparatus according to claim 41, characterized in that the weight corresponding to the first loss function is greater than any one of the weight corresponding to the second loss function, the weight corresponding to the third loss function, and the weight corresponding to the fourth loss function.
  43. The apparatus according to claim 39, characterized in that the neural network is further used to determine grid offset parameters for a second image area in the target image other than the first image area after the second image area is transformed by a preset second projection model, the grid offset parameters being used to adjust the transformed second image area so that the second image area fits the transformed first image area.
  44. The apparatus according to claim 43, characterized in that the processor is configured to:
    determine the loss function of the initial neural network based on the initial transformation parameters and initial grid offset parameters output by the initial neural network.
  45. The apparatus according to claim 43, characterized in that the initial neural network outputs the initial transformation parameters and the initial grid offset parameters based on a mask of a first sample image area, including a sample target object, in a sample image and a grid of the sample image.
  46. The apparatus according to any one of claims 39 to 45, characterized in that the target object comprises a human face.
  47. The apparatus according to any one of claims 39 to 45, characterized in that the sample image is an image captured with a wide-angle lens.
  48. The apparatus according to claim 43, characterized in that the first projection model is a perspective projection model and the second projection model is a stereographic projection model.
  49. A computer-readable storage medium, characterized in that computer instructions are stored thereon, and when the instructions are executed by a processor, the method according to any one of claims 1 to 38 is implemented.
PCT/CN2021/114403 2021-08-24 2021-08-24 Image processing and neural network training method and apparatus WO2023023960A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/114403 WO2023023960A1 (zh) 2021-08-24 2021-08-24 Image processing and neural network training method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/114403 WO2023023960A1 (zh) 2021-08-24 2021-08-24 Image processing and neural network training method and apparatus

Publications (1)

Publication Number Publication Date
WO2023023960A1 (zh)

Family

ID=85321597

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114403 WO2023023960A1 (zh) 2021-08-24 2021-08-24 Image processing and neural network training method and apparatus

Country Status (1)

Country Link
WO (1) WO2023023960A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117456076A (zh) * 2023-10-30 2024-01-26 神力视界(深圳)文化科技有限公司 Material texture map generation method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008947A (zh) * 2019-12-09 2020-04-14 Oppo广东移动通信有限公司 Image processing method and apparatus, terminal device, and storage medium
CN111091507A (zh) * 2019-12-09 2020-05-01 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device, and storage medium
CN111105366A (zh) * 2019-12-09 2020-05-05 Oppo广东移动通信有限公司 Image processing method and apparatus, terminal device, and storage medium
CN112529784A (zh) * 2019-09-18 2021-03-19 华为技术有限公司 Image distortion correction method and apparatus
CN112686824A (zh) * 2020-12-30 2021-04-20 北京迈格威科技有限公司 Image correction method and apparatus, electronic device, and computer-readable medium


Similar Documents

Publication Publication Date Title
KR102227583B1 Deep learning-based camera calibration method and apparatus
CN108694705B Method for multi-frame image registration, fusion, and denoising
US10692197B2 Systems and techniques for automatic image haze removal across multiple video frames
JP5437311B2 Image correction method, image correction system, angle estimation method, and angle estimation device
CN109753971B Method and apparatus for correcting distorted text lines, and character recognition method and apparatus
US20190251675A1 Image processing method, image processing device and storage medium
CN107566688B Convolutional-neural-network-based video stabilization method and apparatus, and image alignment apparatus
WO2021012596A1 Image adjustment method and apparatus, storage medium, and device
CN107689035B Convolutional-neural-network-based homography matrix determination method and apparatus
WO2016065632A1 Image processing method and device
CN107564063B Convolutional-neural-network-based virtual object display method and apparatus
CN112686824A Image correction method and apparatus, electronic device, and computer-readable medium
CN113688907B Model training and video processing method, apparatus, device, and storage medium
CN114175091A Method for optimal body or face protection using adaptive dewarping based on a context segmentation layer
EP3886044B1 Robust surface registration based on parameterized perspective of image templates
CN111598777A Sky cloud image processing method, computer device, and readable storage medium
US20220360707A1 Photographing method, photographing device, storage medium and electronic device
CN113643414A Three-dimensional image generation method and apparatus, electronic device, and storage medium
CN114049268A Image correction method and apparatus, electronic device, and computer-readable storage medium
WO2023023960A1 Image processing and neural network training method and apparatus
CN107590790B Simple-lens edge-region deblurring method based on symmetric edge padding
CN113497886B Video processing method, terminal device, and computer-readable storage medium
TWI313136B (zh)
JP5590680B2 Video composition device, video composition method, and video composition program
CN114049250B ID-photo face pose correction method, apparatus, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21954501

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE