WO2024088111A1 - Image processing method, apparatus, device, medium and program product - Google Patents

Image processing method, apparatus, device, medium and program product

Info

Publication number
WO2024088111A1
WO2024088111A1 PCT/CN2023/124980 CN2023124980W
Authority
WO
WIPO (PCT)
Prior art keywords
image
dimensional
network
features
feature
Prior art date
Application number
PCT/CN2023/124980
Other languages
English (en)
French (fr)
Inventor
程紫阳
Original Assignee
北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Publication of WO2024088111A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0475 Generative networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G06N3/094 Adversarial learning
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to an image processing method, device, equipment, medium and program product.
  • Image processing technology is widely used in the beautification of portraits or pet images, which usually performs facial beautification based on key points, such as beautification of facial contours.
  • the existing facial contour beautification may have unstable and unnatural smoothing effects due to inaccurate facial key point detection, and cannot meet the user's requirements for smooth facial contours in images.
  • the present disclosure proposes an image processing method, device, equipment, storage medium and program product to solve the technical problem of poor facial contour smoothing effect in face images to a certain extent.
  • the present disclosure provides an image processing method, comprising:
  • the first network performs contour smoothing processing on the high-dimensional features and the low-dimensional features based on the target smoothness attribute to obtain high-dimensional correction features and low-dimensional correction features;
  • a target facial image is generated based on the high-dimensional correction features and the low-dimensional correction features.
  • an image processing device comprising:
  • An acquisition module used for acquiring an original facial image to be processed
  • the first network is used to process the original facial image to obtain high-dimensional features, low-dimensional features and a target smoothing property; the first network performs contour smoothing processing on the high-dimensional features and the low-dimensional features based on the target smoothing property to obtain high-dimensional correction features and low-dimensional correction features; and generates a target facial image based on the high-dimensional correction features and the low-dimensional correction features.
  • an electronic device, comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, and the programs include instructions for executing the method described in the first aspect or the second aspect.
  • a non-volatile computer-readable storage medium containing a computer program is provided.
  • the processors execute the method described in the first aspect or the second aspect.
  • a computer program product comprising computer program instructions, which, when executed on a computer, cause the computer to execute the method described in the first aspect.
  • In the image processing method, device, equipment, medium and program product provided by the present disclosure, the high-dimensional features and low-dimensional features of the original facial image are corrected based on the target smoothing attribute that the first network determines adaptively for the original facial image. This smooths the facial contour in the image without changing the features of other areas, so that the processed image is more natural, the image processing effect is improved, and the user's creative cost is reduced while the image is beautified.
  • FIG. 1 is a schematic diagram of an image processing architecture according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of the hardware structure of an exemplary electronic device according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of the image processing method according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of an image processing model according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic flow chart of an image processing method according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of an image processing device according to an embodiment of the present disclosure.
  • a prompt message is sent to the user to clearly prompt the user that the operation requested to be performed will require obtaining and using the user's personal information.
  • the user can autonomously choose whether to provide personal information to software or hardware such as an electronic device, application, server, or storage medium that performs the operation of the technical solution of the present disclosure according to the prompt message.
  • in response to receiving an active request from the user, the prompt information may be sent to the user in the form of a pop-up window, in which the prompt information may be presented in text form.
  • the pop-up window may also carry a selection control for the user to choose "agree" or "disagree" to provide personal information to the electronic device.
  • FIG. 1 shows a schematic diagram of an image processing architecture of an embodiment of the present disclosure.
  • the image processing architecture 100 may include a server 110, a terminal 120, and a network 130 that provides a communication link.
  • the server 110 and the terminal 120 may be connected via a wired or wireless network 130.
  • the server 110 may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, security services, and CDN.
  • the terminal 120 may be implemented in hardware or software.
  • when the terminal 120 is implemented in hardware, it may be any of various electronic devices having a display screen and supporting page display, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, etc.
  • when the terminal 120 is implemented in software, it may be installed in the electronic devices listed above; it may be implemented as multiple pieces of software or software modules (such as software or software modules used to provide distributed services), or as a single piece of software or software module, which is not specifically limited here.
  • the image processing method provided in the embodiments of the present disclosure can be executed by the terminal 120 or by the server 110. It should be understood that the numbers of terminals, networks and servers in FIG. 1 are only for illustration and are not intended to be limiting; any number of terminals, networks and servers may be provided as required.
  • FIG. 2 shows a schematic diagram of the hardware structure of an exemplary electronic device 200 provided by an embodiment of the present disclosure.
  • the electronic device 200 may include: a processor 202, a memory 204, a network module 206, a peripheral interface 208, and a bus 210.
  • the processor 202, the memory 204, the network module 206, and the peripheral interface 208 are connected to each other in communication within the electronic device 200 through the bus 210.
  • Processor 202 may be a central processing unit (CPU), a graphics processor (GPU), a neural network processor (NPU), a microcontroller (MCU), a programmable logic device, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more other integrated circuits.
  • processor 202 may be used to perform functions related to the technology described in the present disclosure.
  • processor 202 may also include multiple processors integrated into a single logical component. For example, as shown in FIG. 2, processor 202 may include multiple processors 202a, 202b, and 202c.
  • the memory 204 may be configured to store data (eg, instructions, computer code, etc.). As shown in FIG. 2 , the data stored in the memory 204 may include program instructions (eg, instructions for implementing the present disclosure). The processor 202 may also access the program instructions and data stored in the memory 204, and execute the program instructions to operate on the data to be processed.
  • the memory 204 may include a volatile storage device or a non-volatile storage device. In some embodiments, the memory 204 may include a random access memory (RAM), a read-only memory (ROM), an optical disk, a magnetic disk, a hard disk, a solid-state drive (SSD), a flash memory, a memory stick, etc.
  • the network module 206 can be configured to provide communication with other external devices to the electronic device 200 via a network.
  • the network can be any wired or wireless network capable of transmitting and receiving data.
  • the network can be a wired network, a local wireless network (e.g., Bluetooth, WiFi, near field communication (NFC) etc.), a cellular network, the Internet or a combination thereof. It is understood that the type of network is not limited to the above specific examples.
  • the network module 206 can include any number of network interface controllers (NICs), radio frequency modules, transceivers, modems, routers, gateways, adapters, cellular network chips, etc., in any combination.
  • the peripheral interface 208 can be configured to connect the electronic device 200 to one or more peripheral devices to achieve information input and output.
  • the peripheral devices can include input devices such as a keyboard, a mouse, a touch pad, a touch screen, a microphone, and various sensors, and output devices such as a display, a speaker, a vibrator, and an indicator light.
  • the bus 210 can be configured to transmit information between various components of the electronic device 200 (e.g., the processor 202, the memory 204, the network module 206, and the peripheral interface 208), and may be an internal bus (e.g., a processor-memory bus), an external bus (e.g., a USB port or PCI-E bus), etc.
  • the architecture of the electronic device 200 may also include other components necessary for normal operation.
  • the architecture of the electronic device 200 may also only include the components necessary for implementing the embodiments of the present disclosure, and does not necessarily include all the components shown in the figure.
  • Such image processing applications with beauty functions generally beautify images based on facial key points. For example, after detecting facial key points, the facial key points are adjusted to smooth the facial contour.
  • this method may have poor smoothing effect and be unstable due to inaccurate facial key point detection.
  • when smoothing the facial contour, this method may also change facial features, skin color and other characteristics, making the overall change of the smoothed facial image too large, so that it deviates far from the real face and cannot be restored. Therefore, how to improve the smoothing effect of facial contours in facial images has become a technical problem that urgently needs to be solved.
  • the embodiments of the present disclosure provide an image processing method, apparatus, device, storage medium and program product.
  • Based on the target smoothing property determined adaptively by the first network for the original facial image, the high-dimensional features and low-dimensional features of the original facial image are corrected to smooth the facial contour in the image without changing the features of other areas, so that the processed image is more natural, the image processing effect is improved, and the user's creative cost is reduced while the image is beautified.
  • the smoothness of the facial contour can be changed in one click based on the target smoothing property determined adaptively by the first network, so that an uneven contour in the image becomes smooth and fluent.
  • FIG. 3 shows a schematic diagram of an image processing method according to an embodiment of the present disclosure.
  • the model architecture of the image processing model 300 may adopt a generative adversarial network (GAN), including a generative network 310 and a discriminative network 320.
  • the image processing model 300 may be obtained by training the initial generative adversarial network using a first training sample and a preset supervision strategy.
  • the generative network 310 in the image processing model 300 may be used as a contour smoothing network 330 in practical applications to smooth the facial contour of the original image to be processed and obtain a target image after smoothing.
  • The first training sample may include at least one sample pair, each sample pair including an original training image A and a facial correction image B corresponding to the original training image A.
  • the original training image A may be feature corrected based on a preset contour smoothing standard to obtain a corresponding facial correction image B.
  • the first training sample may include a first number (e.g., 3,000) of such sample pairs.
  • the original training images A include facial images that meet the image quality requirements, for example, the resolution of the facial image is not less than 1024*1024 pixels.
  • the original training image A in the first training sample may be a human face image, and at this time, as many types of facial images as possible may be covered, for example, male and female faces, faces of various age groups (e.g., 20-80 years old), faces of various angles, etc., so as to ensure the richness of the training data and improve the accuracy of the model training.
  • the facial image of the original training image A is manually corrected for facial contour features, so that the facial contour of the original training image A becomes smooth, thereby obtaining the corresponding facial correction image B.
  • other facial features, such as skin color, facial structure and skin texture, remain unchanged.
  • the original training image A in the first training sample can also be an animal facial image, such as that of a cat or a dog.
  • the generative adversarial network can be trained based on the first training sample and the preset supervision strategy to obtain a contour smoothing network.
  • a preset supervision strategy can be set, and the image processing model 300 can be supervised for training using the first training sample, so as to obtain a generative network 310 that can adaptively match the input image with a suitable target smoothing attribute as a contour smoothing network 330.
  • the preset supervision strategy may include: setting the lighting condition parameters of the input image of the generative adversarial network to simulate the lighting condition of the input image.
  • the lighting condition parameters may be randomly set.
  • the input image input to the generative adversarial network may be an image in the first training sample, such as the original training image A and the corresponding facial correction image B.
  • simulating the lighting condition of the input image in the image processing model 300 based on the lighting condition parameters can increase the richness of the input data, so that the image processing model 300 can process more diverse input images during the training process, thereby improving the accuracy of the image processing model 300.
  • a gamma correction algorithm can be used to implement lighting simulation, that is, the lighting condition parameters can be set by setting the relevant parameters of the gamma correction algorithm.
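The gamma-based lighting simulation described above can be sketched as follows. This is an illustrative example, not the patent's implementation; the function names and the gamma range are assumptions.

```python
# Sketch: simulating lighting variation via gamma correction on a
# normalized grayscale image. gamma < 1 brightens (stronger apparent
# lighting); gamma > 1 darkens. Names here are illustrative.
import random

def gamma_adjust(pixels, gamma):
    """Apply gamma correction to pixel values in [0, 1]."""
    return [p ** gamma for p in pixels]

def random_lighting(pixels, low=0.5, high=2.0, rng=random):
    # A randomly sampled gamma varies the apparent lighting condition,
    # enriching the training inputs without altering facial geometry.
    gamma = rng.uniform(low, high)
    return gamma_adjust(pixels, gamma), gamma

image = [0.0, 0.25, 0.5, 1.0]
bright = gamma_adjust(image, 0.5)   # brightened: [0.0, 0.5, ~0.707, 1.0]
dark = gamma_adjust(image, 2.0)     # darkened:   [0.0, 0.0625, 0.25, 1.0]
```

Sampling a fresh gamma per training image is one simple way to set the "lighting condition parameters" randomly, as the paragraph above suggests.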
  • the preset supervision strategy may include:
  • the generation network generates a corresponding first image based on the original training image in the first training sample
  • a first discriminant network parameter of the discriminant network is adjusted based on the cross entropy loss function to maximize the cross entropy loss function, and a first generative network parameter of the generative network is adjusted based on the cross entropy loss function to minimize the cross entropy loss function.
  • the cross entropy loss function V(D, G) of the generation network 310 and the discriminant network 320 may include: the sum of a first expected function about a first logarithmic function and a second expected function about a second logarithmic function, wherein the first logarithmic function includes a logarithmic function of a first discrimination result about the original training image x, and the second logarithmic function includes a logarithmic function of a difference between a first preset value (e.g., 1) and a second discrimination result of the generation result for the original training image x.
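Assuming the standard generative-adversarial formulation, the cross-entropy loss described above can be written out as follows (a reconstruction from the verbal description, not quoted from the patent):

```latex
V(D, G) = \mathbb{E}_{x}\big[\log D(x)\big]
        + \mathbb{E}_{x}\big[\log\big(1 - D(G(x))\big)\big]
```

Here the first term is the first expected function of the first discrimination result D(x) for the original training image x, and the second term is the second expected function of the difference between the first preset value 1 and the second discrimination result D(G(x)) for the generation result; the discriminant network is adjusted to maximize V(D, G) while the generative network is adjusted to minimize it.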
  • the training process of the image processing model 300 may be that the generation network 310 and the discriminant network 320 are trained separately and alternately.
  • the generation network 310 may be fixed first and the discriminant network 320 may be trained to update the first discriminant network parameters of the discriminant network 320.
  • the training goal of the image processing model 300 is to maximize the cross entropy function V(D, G).
  • the discriminant network 320 fix the discriminant network 320 and train the generative network 310 to update the first generative network parameters of the generative network 310.
  • the training goal of the image processing model 300 is to minimize the cross entropy function V(D, G). Since the first discriminant network parameters of the discriminant network 320 do not change at this time, E_x[log D(x)] also does not change, so minimizing V(D, G) amounts to minimizing the second term, i.e., training the generative network 310 so that its outputs are scored as real.
  • the training of the discriminant network 320 with the generative network 310 fixed and the training of the generative network 310 with the discriminant network 320 fixed are repeated alternately until a Nash equilibrium is reached. In this way, the results generated by the generative network 310 can be made more realistic.
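The alternating objective can be illustrated numerically. This is a minimal sketch under the standard GAN formulation, not the patent's code; the discriminator outputs below are toy probabilities.

```python
# V(D, G) = mean(log D(x)) + mean(log(1 - D(G(x)))).
# D is updated to maximize this value; G is updated to minimize it.
import math

def cross_entropy_value(d_real, d_fake):
    """Empirical V(D, G) over toy discriminator outputs in (0, 1)."""
    term_real = sum(math.log(p) for p in d_real) / len(d_real)
    term_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return term_real + term_fake

# A well-trained discriminator (real ~ 1, fake ~ 0) pushes V toward 0.
v_good_d = cross_entropy_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])

# A generator that fully fools D (D outputs 0.5 everywhere) drives V to
# -2*log 2 ~ -1.386, the value at the Nash equilibrium.
v_equilibrium = cross_entropy_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

assert v_equilibrium < v_good_d
assert abs(v_equilibrium - (-2 * math.log(2))) < 1e-9
```

The alternation described above corresponds to taking gradient steps on this value with one network's parameters frozen at a time.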
  • the image processing model further includes a smooth attribute discriminator associated with the generation network
  • the preset supervision strategy may include:
  • the generation network generates a corresponding first image based on the original training image in the first training sample
  • a second generation network parameter of the generation network is adjusted based on the smoothness property loss function so that the first image matches the second smoothness property.
  • FIG. 4 shows a schematic diagram of an image processing model according to an embodiment of the present disclosure.
  • the image processing model 300 may also include a smooth attribute discriminator 340 associated with the generation network 310; the smooth attribute discriminator 340 can be set in the image processing model 300 to calculate the smooth attribute loss function of the generation network 310 and the smooth attribute discriminator 340.
  • the original training image A and the first smooth attribute R1 of the original training image, the facial correction image B and the second smooth attribute R2 of the facial correction image, and the first image A' in the first training sample can be input in pairs as input data of the smooth attribute discriminator 340, and the input data can be represented as an object-attribute pair (Image, attr), where Image represents the input image and attr represents the smooth attribute.
  • the smooth attribute discriminator 340 performs a matching judgment on the input data (Image, attr), and if Image and attr match, the judgment result is True, otherwise it is False.
  • the generation network 310 and the smooth attribute discriminator 340 are trained and updated separately and alternately.
  • the cross entropy function of the generation network 310 and the smooth attribute discriminator 340 can be used as a smooth attribute loss function V(Dattr, G), including: the sum of a third expected function about a third logarithmic function and a fourth expected function about a fourth logarithmic function, wherein the third logarithmic function includes a logarithmic function of the discrimination result for the original training image x and its attribute attr, and the fourth logarithmic function includes a logarithmic function of the difference between a first preset value (e.g., 1) and the discrimination result for the generation result of the original training image x and its attribute attr. The goal of updating the smooth attribute discriminator 340 is then to maximize the smooth attribute loss function V(Dattr, G).
  • the generation network 310 can adaptively match the smooth attribute suitable for the input image to generate a contour smoothing image suitable for the input image, so as to improve the contour smoothing effect when the generation network 310 is used as a contour smoothing network.
  • the preset supervision strategy may include:
  • the generation network generates a corresponding first image based on the original training image in the first training sample
  • the third generation network parameters of the generation network and the third discrimination network parameters of the discrimination network are adjusted based on the feature correction loss function to minimize the feature correction loss function.
  • calculating a feature correction loss function based on the first image and the face correction image in the first training sample further includes:
  • the feature correction loss function is obtained based on the sum of the first high-dimensional feature loss function and the first low-dimensional feature loss function.
  • high-dimensional semantic features can refer to features obtained based on deep networks in image processing models. Such high-dimensional semantic features can be close to the output layer and have the characteristics of low resolution, small feature map size, high abstraction, and more global information.
  • Low-dimensional texture features can refer to features obtained based on shallow networks in image processing models. Such low-dimensional texture features can be close to the input layer and have the characteristics of high resolution, large feature map size, more detailed information, and easy alignment with the original training image. By correcting these two features, the advantages of both can be combined, thereby improving the training effect of the image processing model and the effect of facial contour smoothing.
  • the generating network 310 generates the corresponding first image A' based on the original training image A.
  • the visual processor in the image processing model 300 can extract features from the first image A' to obtain the first high-dimensional semantic feature F1_A' and the first low-dimensional texture feature F2_A' of the first image; and extract features from the facial correction image in the first training sample to obtain the second high-dimensional semantic feature F3 and the second low-dimensional texture feature F4.
  • the feature correction loss function L_F can be calculated as follows: first high-dimensional feature loss function l1(F1_A', F3) + first low-dimensional feature loss function l1(F2_A', F4), where l1 is the mean absolute error function.
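The feature correction loss above can be sketched directly. The feature vectors here are toy lists; `l1` is the mean absolute error, as the text states, and the function names are illustrative.

```python
# L_F = l1(F1_A', F3) + l1(F2_A', F4), where l1 is mean absolute error:
# one term over high-dimensional semantic features, one over
# low-dimensional texture features.
def l1(a, b):
    """Mean absolute error between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def feature_correction_loss(f1_a_prime, f3, f2_a_prime, f4):
    return l1(f1_a_prime, f3) + l1(f2_a_prime, f4)

loss = feature_correction_loss(
    [0.2, 0.4], [0.2, 0.8],   # high-dimensional features: MAE = 0.2
    [1.0, 0.0], [0.5, 0.5],   # low-dimensional features:  MAE = 0.5
)
assert abs(loss - 0.7) < 1e-9
```

Minimizing this sum pulls both the global semantic features and the local texture features of the generated image A' toward those of the facial correction image B, combining the advantages of deep and shallow features as described above.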
  • first generation network parameter, the second generation network parameter and the third generation network parameter in the present disclosure can all represent model parameters of the generation network, which can be the same or different;
  • first discriminant network parameter, the second discriminant network parameter and the third discriminant network parameter can all represent model parameters of the discriminant network, which can be the same or different.
  • the preset supervision strategy may include:
  • a loss weight of the original training image is determined based on a first smoothness property of the original training image and a second smoothness property of the face-rectified image.
  • determining the loss weight of the original training image based on the first smoothness property of the original training image and the second smoothness property of the face-rectified image includes:
  • a loss weight of the original training image is determined based on the degree of smooth change, wherein the loss weight of the original training image is proportional to the degree of smooth change.
  • calculating the degree of smooth change of the original training image based on the first smooth attribute and the second smooth attribute may include:
  • the smooth change degree is obtained based on an absolute value function of the attribute difference.
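The loss-weight rule above can be sketched minimally. The proportionality constant `k` is an assumption for illustration; the disclosure only states that the weight is proportional to the degree of smooth change.

```python
# Degree of smooth change = |R1 - R2|, the absolute difference between the
# original image's and corrected image's smoothness attributes; the sample's
# loss weight is proportional to it.
def smooth_change_degree(r1, r2):
    return abs(r1 - r2)

def loss_weight(r1, r2, k=1.0):
    # Samples whose contour was corrected more contribute more to the loss.
    return k * smooth_change_degree(r1, r2)

assert abs(smooth_change_degree(0.2, 0.8) - 0.6) < 1e-9
assert loss_weight(0.9, 0.9) == 0.0  # unchanged contour -> zero weight
```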
  • the generative adversarial network is trained to obtain a trained image processing model.
  • the trained image processing model adaptively learns, during the training process, the network parameters for facial smoothing from the training samples, and can adaptively match a smoothing property suitable for the input image.
  • the generative network is used as a contour smoothing network for smoothing the facial contour of the image in actual application, and the facial contour of the original image input by the user is processed to make the facial contour smooth and fluent.
  • training a generative adversarial network based on the first training sample and a preset supervision strategy to obtain the contour smoothing network may further include:
  • the generative adversarial network is trained to obtain a preliminary image processing model
  • the preliminary image processing model is trained twice based on the second training sample and the preset supervision strategy to obtain the contour smoothing network.
  • the second training sample may be obtained based on the preliminary image processing model, specifically including:
  • the second training sample is obtained based on the second image and the corresponding third image.
  • the preliminary image processing model (including the preliminary generative network and the preliminary discriminant network) obtained by training based on the first training sample is not very stable.
  • the preliminary image processing model is trained twice using the second training sample and the preset supervision strategy to obtain a more stable image processing model, and the generative network in the more stable image processing model is used as a contour smoothing network, thereby improving the stability of facial contour processing in practical applications.
  • a large number of facial images (for example, from an open-source dataset of facial images) can be obtained and input into the initial generative network in the initial generative adversarial network to generate second images D, thereby obtaining a large dataset set_D; the dataset set_D can then be input into the preliminary image processing model trained on the first training sample to obtain the output third images E.
  • the second image D and the corresponding third image E form a new training data pair, which is used as the second training sample.
  • the preliminary image processing model is trained twice in combination with a preset supervision strategy to obtain a new image processing model.
  • the generative network in the new image processing model can be used as a contour smoothing network for practical applications (for example, the contour smoothing network 330 in FIG. 4).
  • FIG. 5 shows a schematic flow chart of an image processing method according to an embodiment of the present disclosure.
  • the image processing method 500 may include the following steps.
  • Step S510: obtain an original facial image to be processed;
  • Step S520: process the original facial image based on the first network to obtain high-dimensional features, low-dimensional features and a target smoothness property of the original facial image;
  • Step S530: the first network performs contour smoothing processing on the high-dimensional features and the low-dimensional features based on the target smoothness property to obtain high-dimensional correction features and low-dimensional correction features;
  • Step S540: generate a target facial image based on the high-dimensional correction features and the low-dimensional correction features.
  • the user hopes to smooth the facial contour in the original image imageA.
  • the original image imageA can be subjected to facial key point detection to obtain the facial key point P of the original image imageA.
  • the original image imageA is subjected to facial cropping based on the facial key point P to obtain the original facial image image_Face.
  • the original facial image image_Face is input into the trained first network (e.g., the contour smoothing network 330 in FIG. 3 ), and the first network extracts features of the original facial image image_Face to obtain the high-dimensional semantic features F1 and the low-dimensional texture features F2 of the original facial image image_Face.
  • the trained first network can adaptively determine the target smoothing properties that match the input data based on the input data
  • the high-dimensional semantic features F1 and the low-dimensional texture features F2 can be smoothed to obtain the high-dimensional semantic correction features F1’ and the low-dimensional texture correction features F2’.
  • the contour smoothing network then generates the target facial image image_Face’ after smoothing from the high-dimensional semantic correction features F1’ and the low-dimensional texture correction features F2’.
  • the target facial image image_Face’ can be further fused into the original image imageA to obtain the target image imageA’ with the facial contour smoothed.
  • the high-dimensional semantic features and low-dimensional texture features of the image are corrected based on the smoothness attribute adaptively determined by the contour smoothing network, so as to smooth the facial contour in the image without changing the features of other areas; the processed image is therefore more natural, the image processing effect is improved, and the user's creation cost is reduced while the image is beautified.
  • based on the adaptively determined smoothness attribute of the contour smoothing network, the smoothness of the facial contour can be changed in one click, so that uneven contours in the image become smooth.
  • users may not only need to beautify human portraits, but may also need to beautify animal images (such as pets such as cats and dogs).
  • the facial contours in human face images can be smoothed, but also the facial contours in animal images can be smoothed, so that the facial contours in the human or animal images are smoother and more natural.
  • the original facial image is processed based on the first network to obtain high-dimensional features and low-dimensional features of the original facial image, including:
  • the first network performs feature extraction on the original facial image to obtain the high-dimensional features (e.g., high-dimensional semantic features F1 of the original facial image image_Face) and the low-dimensional features (e.g., low-dimensional texture features F2 of the original facial image image_Face) of the original facial image; wherein the high-dimensional features are semantic features, and the low-dimensional features are texture features.
  • high-dimensional features may refer to high-dimensional semantic features
  • low-dimensional features may refer to low-dimensional texture features.
  • the first network may extract features from the original facial image image_Face to obtain the high-dimensional semantic features F1 and low-dimensional texture features F2 of the original facial image image_Face.
  • processing the original facial image based on the first network to obtain a target smoothness attribute of the original facial image includes:
  • the first network performs facial contour detection on the original facial image to obtain facial contour feature points of the original facial image, and determines the target smoothness attribute based on the facial contour feature points.
  • determining the target smoothness attribute based on the facial contour feature points includes:
  • An original smoothness attribute of the original facial image is determined based on the facial contour feature points, and the target smoothness attribute is determined based on the original smoothness attribute.
  • the trained first network can adaptively determine the target smoothness attribute of the input data according to the input data. For example, the target contour feature points of the input data can be determined first, and then the corresponding target smoothness attribute can be obtained based on the target contour feature points.
  • the original smoothness attribute can be determined from the original contour feature points of the input data, and then the target smoothness attribute can be determined based on the original smoothness attribute.
  • contour smoothing is performed on the high-dimensional feature and the low-dimensional feature to obtain a high-dimensional correction feature and a low-dimensional correction feature, including:
  • the loss function weights of the high-dimensional features and the low-dimensional features are set based on the target smoothness attribute to obtain the high-dimensional correction features and the low-dimensional correction features.
  • the generative adversarial network is trained based on the first training sample and the preset supervision strategy to obtain the first network (e.g., the contour smoothing network 330 in FIG. 3 );
  • the first training sample includes at least one sample pair, and the sample pair includes an original training image (such as the original training image A in Figure 3) and a corresponding facial correction image (such as the facial correction image B in Figure 3), and the facial correction image is obtained by performing facial contour smoothing processing based on the original training image.
  • the sample pair includes an original training image (such as the original training image A in Figure 3) and a corresponding facial correction image (such as the facial correction image B in Figure 3), and the facial correction image is obtained by performing facial contour smoothing processing based on the original training image.
  • the generative adversarial network includes a generative network (e.g., the generative network 310 in FIG. 3 ) and a discriminative network associated with the generative network (e.g., the discriminative network 320 in FIG. 3 ), and the preset supervision strategy includes:
  • the generation network generates a corresponding first image (e.g., a first image A') based on the original training image in the first training sample;
  • the first discriminative network parameters of the discriminative network are adjusted based on the cross-entropy loss function to maximize the cross-entropy loss function, and the first generative network parameters of the generative network are adjusted based on the cross-entropy loss function to minimize the cross-entropy loss function.
  • the generative adversarial network includes a generative network and a smooth attribute discriminator associated with the generative network (e.g., the smooth attribute discriminator 340 in FIG. 4 ), and the preset supervision strategy includes:
  • the generation network generates a corresponding first image based on the original training image in the first training sample
  • calculating a smoothness attribute loss function (e.g., V(Dattr, G)) of the generative network and the smoothness attribute discriminator based on the original training image, a first smoothness attribute (e.g., first smoothness attribute R1) of the original training image, the face correction image, a second smoothness attribute (e.g., second smoothness attribute R2) of the face correction image, and the first image;
  • the generative adversarial network includes a generative network and a smooth attribute discriminator associated with the generative network
  • the preset supervision strategy includes:
  • the generation network generates a corresponding first image based on the original training image in the first training sample
  • calculating a feature correction loss function (e.g., a feature correction loss function L_F) based on the first image and the face correction image in the first training sample;
  • a third generation network parameter of the generation network and a third discriminant network parameter of the discriminant network are adjusted to minimize the feature correction loss function.
  • calculating a feature correction loss function based on the first image and the face correction image in the first training sample further includes:
  • the feature correction loss function is obtained based on the sum of the first high-dimensional feature loss function and the second low-dimensional feature loss function.
  • the preset supervision strategy includes:
  • a loss weight of the original training image is determined based on a first smoothness property of the original training image and a second smoothness property of the face-rectified image.
  • determining the loss weight of the original training image based on a first smoothness property of the original training image and a second smoothness property of the face-rectified image includes:
  • calculating the degree of smoothness change of the original training image (e.g., abs(SB − SA)) based on the first smoothness attribute (e.g., first smoothness attribute SA) and the second smoothness attribute (e.g., second smoothness attribute SB);
  • a loss weight of the original training image is determined based on the degree of smoothness change, wherein the loss weight of the original training image is proportional to the degree of smoothness change.
  • calculating the degree of smoothness change of the original training image based on the first smoothness attribute and the second smoothness attribute includes:
  • the degree of smoothness change is obtained based on an absolute value function of the attribute difference.
  • training a generative adversarial network based on a first training sample and a preset supervision strategy to obtain the contour smoothing network further includes:
  • the generative adversarial network is trained to obtain a preliminary image processing model
  • the preliminary image processing model is trained for a second time to obtain the contour smoothing network (such as the contour smoothing network 330 in FIG. 3-4 );
  • the second training sample is obtained based on the preliminary image processing model, and specifically includes:
  • a training data set including a plurality of facial images, and input the facial images into a generative network in the generative adversarial network to obtain a second image (e.g., a second image D);
  • the second training sample is obtained based on the second image and the corresponding third image.
  • the preset supervision strategy includes: setting the lighting condition parameters of the input image of the generative adversarial network to simulate the lighting conditions of the input image.
  • the method of the embodiment of the present disclosure can be performed by a single device, such as a computer or a server.
  • the method of the present embodiment can also be applied in a distributed scenario and completed by multiple devices cooperating with each other.
  • one of the multiple devices can only perform one or more steps in the method of the embodiment of the present disclosure, and the multiple devices will interact with each other to complete the described method.
  • the present disclosure further provides an image processing device, referring to FIG6 , wherein the image processing device includes:
  • An acquisition module used for acquiring an original facial image to be processed
  • the first network is used to process the original facial image to obtain high-dimensional features, low-dimensional features, and a target smoothness attribute of the original facial image; the first network performs contour smoothing on the high-dimensional features and the low-dimensional features based on the target smoothness attribute to obtain high-dimensional correction features and low-dimensional correction features; and generates a target facial image based on the high-dimensional correction features and the low-dimensional correction features.
  • the above device is described by dividing it into various modules according to its functions.
  • the functions of each module can be implemented in the same or multiple software and/or hardware.
  • the device of the above embodiment is used to implement the corresponding image processing method in any of the above embodiments, and has the beneficial effects of the corresponding method embodiment, which will not be described in detail here.
  • the present disclosure also provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to enable the computer to execute the image processing method described in any of the above embodiments.
  • the computer-readable media of this embodiment include permanent and non-permanent, removable and non-removable media.
  • the medium may be implemented by any method or technology to store information.
  • Information may be computer-readable instructions, data structures, modules of programs or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • the computer instructions stored in the storage medium of the above embodiments are used to enable the computer to execute the image processing method described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.
  • the known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided figures.
  • devices may be shown in block diagram form to avoid obscuring the embodiments of the present disclosure; this also takes into account the fact that the implementation details of such block-diagram devices are highly dependent on the platform on which the embodiments of the present disclosure will be implemented (that is, these details should be well within the understanding of those skilled in the art).
  • other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides an image processing method, apparatus, device, storage medium, and program product. The method includes: obtaining an original facial image to be processed; processing the original facial image based on a first network to obtain high-dimensional features, low-dimensional features, and a target smoothness attribute of the original facial image; the first network performing contour smoothing on the high-dimensional features and the low-dimensional features based on the target smoothness attribute to obtain high-dimensional correction features and low-dimensional correction features; and generating a target facial image based on the high-dimensional correction features and the low-dimensional correction features.

Description

Image processing method, apparatus, device, medium, and program product
This application claims priority to Chinese patent application No. 202211339290.3, entitled "Image processing method, apparatus, device, medium, and program product", filed with the China National Intellectual Property Administration on October 28, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an image processing method, apparatus, device, medium, and program product.
Background
Image processing technology is widely used in scenarios of beautifying portraits or pet images, typically performing facial beautification — for example, facial contour beautification — based on key points. However, due to inaccurate facial key point detection and other causes, existing facial contour beautification may produce unstable and unnatural smoothing results that fail to meet users' requirements for smoothing facial contours in images.
Summary
The present disclosure proposes an image processing method, apparatus, device, storage medium, and program product to address, to some extent, the technical problem of poor facial contour smoothing in face images.
In a first aspect of the present disclosure, an image processing method is provided, including:
obtaining an original facial image to be processed;
processing the original facial image based on a first network to obtain high-dimensional features, low-dimensional features, and a target smoothness attribute of the original facial image;
the first network performing, based on the target smoothness attribute, contour smoothing on the high-dimensional features and the low-dimensional features to obtain high-dimensional correction features and low-dimensional correction features;
generating a target facial image based on the high-dimensional correction features and the low-dimensional correction features.
In a second aspect of the present disclosure, an image processing apparatus is provided, including:
an acquisition module configured to obtain an original facial image to be processed;
a first network configured to process the original facial image to obtain high-dimensional features, low-dimensional features, and a target smoothness attribute of the original facial image; perform, based on the target smoothness attribute, contour smoothing on the high-dimensional features and the low-dimensional features to obtain high-dimensional correction features and low-dimensional correction features; and generate a target facial image based on the high-dimensional correction features and the low-dimensional correction features.
In a third aspect of the present disclosure, an electronic device is provided, including one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, and the programs include instructions for performing the method according to the first or second aspect.
In a fourth aspect of the present disclosure, a non-volatile computer-readable storage medium containing a computer program is provided, which, when executed by one or more processors, causes the processors to perform the method of the first or second aspect.
In a fifth aspect of the present disclosure, a computer program product is provided, including computer program instructions which, when run on a computer, cause the computer to perform the method of the first aspect.
As can be seen from the above, the image processing method, apparatus, device, medium, and program product provided by the present disclosure correct the high-dimensional and low-dimensional features of the original facial image based on a target smoothness attribute that the first network adaptively determines for that image, thereby smoothing the facial contour in the image without changing the features of other regions, making the processed image more natural, improving the image processing effect, and reducing the user's creation cost while beautifying the image.
Brief Description of the Drawings
To describe the technical solutions of the present disclosure or the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an image processing architecture according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of the hardware structure of an exemplary electronic device according to an embodiment of the present disclosure.
FIG. 3 is a schematic diagram of the principle of an image processing method according to an embodiment of the present disclosure.
FIG. 4 is a schematic diagram of the principle of an image processing method according to an embodiment of the present disclosure.
FIG. 5 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure.
FIG. 6 is a schematic diagram of an image processing apparatus according to an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is further described in detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of the present disclosure shall have the ordinary meanings understood by those of ordinary skill in the art to which the present disclosure belongs. Words such as "first" and "second" used in the embodiments of the present disclosure do not denote any order, quantity, or importance, but are merely used to distinguish different components. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like merely indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
It can be understood that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user shall be informed in an appropriate manner, in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization shall be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly inform the user that the requested operation will require obtaining and using the user's personal information, so that the user can autonomously choose, according to the prompt information, whether to provide personal information to software or hardware such as an electronic device, application, server, or storage medium that performs the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent in the form of, for example, a pop-up window, in which the prompt information may be presented as text. The pop-up window may also carry selection controls for the user to choose "agree" or "disagree" to provide personal information to the electronic device.
It can be understood that the above process of notification and obtaining user authorization is merely illustrative and does not limit the implementation of the present disclosure; other implementations that satisfy relevant laws and regulations may also be applied to the implementation of the present disclosure.
FIG. 1 shows a schematic diagram of an image processing architecture according to an embodiment of the present disclosure. Referring to FIG. 1, the image processing architecture 100 may include a server 110, a terminal 120, and a network 130 providing a communication link. The server 110 and the terminal 120 may be connected via the wired or wireless network 130. The server 110 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, security services, and CDN.
The terminal 120 may be implemented in hardware or software. For example, when implemented in hardware, the terminal 120 may be any electronic device with a display screen that supports page display, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When implemented in software, the terminal 120 may be installed in the electronic devices listed above; it may be implemented as multiple software programs or software modules (for example, software or software modules for providing distributed services), or as a single software program or software module, which is not specifically limited here.
It should be noted that the image processing method provided in the embodiments of the present application may be performed by the terminal 120 or by the server 110. It should be understood that the numbers of terminals, networks, and servers in FIG. 1 are merely illustrative and are not intended to be limiting; there may be any number of terminals, networks, and servers according to implementation needs.
FIG. 2 shows a schematic diagram of the hardware structure of an exemplary electronic device 200 provided by an embodiment of the present disclosure. As shown in FIG. 2, the electronic device 200 may include a processor 202, a memory 204, a network module 206, a peripheral interface 208, and a bus 210. The processor 202, the memory 204, the network module 206, and the peripheral interface 208 communicate with each other within the electronic device 200 through the bus 210.
The processor 202 may be a central processing unit (CPU), an image processor, a neural network processor (NPU), a microcontroller (MCU), a programmable logic device, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more integrated circuits. The processor 202 may be used to perform functions related to the techniques described in the present disclosure. In some embodiments, the processor 202 may also include multiple processors integrated into a single logical component. For example, as shown in FIG. 2, the processor 202 may include multiple processors 202a, 202b, and 202c.
The memory 204 may be configured to store data (e.g., instructions, computer code, etc.). As shown in FIG. 2, the data stored in the memory 204 may include program instructions (e.g., program instructions for implementing the image processing method of the embodiments of the present disclosure) and data to be processed (e.g., the memory may store configuration files of other modules). The processor 202 may also access the program instructions and data stored in the memory 204 and execute the program instructions to operate on the data to be processed. The memory 204 may include volatile or non-volatile storage devices. In some embodiments, the memory 204 may include random access memory (RAM), read-only memory (ROM), optical discs, magnetic disks, hard disks, solid-state drives (SSD), flash memory, memory sticks, and the like.
The network module 206 may be configured to provide the electronic device 200 with communication with other external devices via a network. The network may be any wired or wireless network capable of transmitting and receiving data. For example, the network may be a wired network, a local wireless network (e.g., Bluetooth, WiFi, near-field communication (NFC), etc.), a cellular network, the Internet, or a combination thereof. It can be understood that the type of network is not limited to the above specific examples. In some embodiments, the network module 206 may include any combination of any number of network interface controllers (NICs), radio-frequency modules, transceivers, modems, routers, gateways, adapters, cellular network chips, and the like.
The peripheral interface 208 may be configured to connect the electronic device 200 with one or more peripheral devices to implement information input and output. For example, the peripheral devices may include input devices such as keyboards, mice, touchpads, touch screens, microphones, and various sensors, as well as output devices such as displays, speakers, vibrators, and indicator lights.
The bus 210 may be configured to transfer information between the components of the electronic device 200 (e.g., the processor 202, the memory 204, the network module 206, and the peripheral interface 208), such as an internal bus (e.g., a processor-memory bus) or an external bus (a USB port, a PCI-E bus).
It should be noted that although the architecture of the electronic device 200 above shows only the processor 202, the memory 204, the network module 206, the peripheral interface 208, and the bus 210, in specific implementations the architecture of the electronic device 200 may also include other components necessary for normal operation. Moreover, those skilled in the art can understand that the architecture of the electronic device 200 may include only the components necessary for implementing the solutions of the embodiments of the present disclosure, without necessarily including all the components shown in the figure.
To obtain better image results, people often process images with applications that provide beautification functions. Such image processing applications generally beautify images based on facial key points — for example, after detecting the facial key points, adjusting them to smooth the facial contour. However, due to inaccurate facial key point detection and other causes, this approach may yield poor and unstable smoothing; moreover, smoothing the facial contour may also alter features such as the facial features and skin tone, so that the smoothed facial image is changed too much overall and differs greatly from the real face, failing to meet users' requirements for facial contour smoothing. Therefore, how to improve the facial contour smoothing effect in facial images has become an urgent technical problem.
In view of this, embodiments of the present disclosure provide an image processing method, apparatus, device, storage medium, and program product. Based on a target smoothness attribute that the first network adaptively determines for the original facial image, the high-dimensional and low-dimensional features of the original facial image are corrected to smooth the facial contour in the image without changing the features of other regions, making the processed image more natural, improving the image processing effect, and reducing the user's creation cost while beautifying the image. In an image processing application, the smoothness of the facial contour can be changed with one click based on the target smoothness attribute adaptively determined by the first network, so that uneven contours in the image become smooth.
Referring to FIG. 3, FIG. 3 shows a schematic diagram of the principle of the image processing method according to an embodiment of the present disclosure. In FIG. 3, the model architecture of the image processing model 300 may adopt a generative adversarial network (GAN), including a generative network 310 and a discriminative network 320. An initial generative adversarial network may be trained with a first training sample and a preset supervision strategy to obtain the image processing model 300. The generative network 310 in the image processing model 300 may be used as the contour smoothing network 330 in practical applications, to smooth the facial contour of the original image to be processed and obtain the smoothed target image.
In some embodiments, the first training sample may include at least one sample pair, each sample pair including an original training image A and a face correction image B corresponding to the original training image A. In some embodiments, feature correction may be performed on the original training image A based on a preset contour smoothing standard to obtain the corresponding face correction image B. Specifically, a first number (e.g., 3000) of original training images A may first be obtained, where each original training image A contains a facial image meeting image quality requirements — for example, a resolution of no less than 1024×1024 pixels. The original training images A in the first training sample may be face images; in that case, they should cover as many types of faces as possible — for example, male and female faces, faces of various ages (e.g., 20-80 years old), and faces at various angles — so as to ensure the richness of the training data and improve the accuracy of model training. Then, according to a preset facial contour effect standard, the facial contour of the original training image A is manually corrected so that its facial contour becomes smooth, thereby obtaining the corresponding face correction image B. It should be understood that during manual facial contour correction, other facial features such as skin tone, facial features, and skin texture may be left untouched, to ensure that the trained image processing model only improves the smoothness of the facial contour and does not over-modify the face, which would affect the realism of the facial image and make the processing result unnatural. In addition, the original training images A in the first training sample may also be facial images of animals, such as cats or dogs.
In some embodiments, a generative adversarial network may be trained based on the first training sample and a preset supervision strategy to obtain the contour smoothing network. To ensure that the generative network 310 in the image processing model 300 can adaptively learn facial smoothing network parameters during training, a preset supervision strategy may be set, and the image processing model 300 may be trained in a supervised manner with the first training sample, so as to obtain a generative network 310 that can adaptively match an appropriate target smoothness attribute for the input image, which serves as the contour smoothing network 330.
In some embodiments, the preset supervision strategy may include: setting lighting condition parameters of the input image of the generative adversarial network to simulate the lighting conditions of the input image. Further, the lighting condition parameters may be set randomly. During training, the input image fed to the generative adversarial network may be an image in the first training sample, such as the original training image A and the corresponding face correction image B. Simulating the lighting conditions of the input image of the image processing model 300 based on lighting condition parameters increases the richness of the input data, so that the image processing model 300 can handle more diverse input images during training, thereby improving its accuracy. Specifically, for example, a gamma correction algorithm may be used to implement the lighting simulation; that is, the lighting condition parameters may be set by setting the relevant parameters of the gamma correction algorithm.
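The gamma-correction-based lighting simulation described above can be sketched as follows. This is a minimal illustration under assumptions: it works on normalized intensities in [0, 1] and draws gamma from an arbitrary range, since the disclosure does not specify the gamma correction parameters actually used.

```python
import random

def simulate_lighting(pixels, gamma=None):
    """Simulate a lighting condition by gamma-correcting an image.

    `pixels` is a nested list of normalized intensities in [0, 1].
    gamma < 1 brightens the image; gamma > 1 darkens it. When gamma is
    None, the lighting condition parameter is set randomly, as during
    training. The range (0.5, 2.0) is an illustrative assumption.
    """
    if gamma is None:
        gamma = random.uniform(0.5, 2.0)
    return [[p ** gamma for p in row] for row in pixels]

image = [[0.25, 0.5], [0.75, 1.0]]
darker = simulate_lighting(image, gamma=2.0)    # 0.5 -> 0.25
brighter = simulate_lighting(image, gamma=0.5)  # 0.25 -> 0.5
```

Applying this with a freshly drawn gamma to each training input yields the more diverse lighting conditions the strategy aims for.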
In some embodiments, the preset supervision strategy may include:
the generative network generating a corresponding first image based on the original training image in the first training sample;
calculating the cross-entropy loss function of the generative network and the discriminative network based on the first image and the original training image, respectively;
adjusting the first discriminative network parameters of the discriminative network based on the cross-entropy loss function to maximize the cross-entropy loss function, and adjusting the first generative network parameters of the generative network based on the cross-entropy loss function to minimize the cross-entropy loss function.
Specifically, the cross-entropy loss function V(D, G) of the generative network 310 and the discriminative network 320 may include the sum of a first expectation over a first logarithmic function and a second expectation over a second logarithmic function, i.e., V(D, G) = E_x[log D(x)] + E_x[log(1 − D(G(x)))], where the first logarithmic function is the logarithm of the first discrimination result for the original training image x, and the second logarithmic function is the logarithm of the difference between a first preset value (e.g., 1) and the second discrimination result for the output generated for the original training image x. The image processing model 300 may be trained by training the generative network 310 and the discriminative network 320 separately and alternately. For example, the generative network 310 may first be fixed while the discriminative network 320 is trained to update its first discriminative network parameters; at this point, the first discriminative network parameters of the discriminative network 320 are adjusted so that the discriminative network 320 outputs 1 when its input is the face correction image B (i.e., D(B) = 1) and outputs 0 when its input is the first image A' (i.e., D(A') = 0). The training objective of the image processing model 300 is then to maximize the cross-entropy function V(D, G).
The discriminative network 320 is then fixed and the generative network 310 is trained to update its first generative network parameters; at this point, the first generative network parameters of the generative network 310 are adjusted so that when the first image A' output by the generative network 310 is fed to the discriminative network 320, the discriminative network 320 outputs 1 (i.e., D(A') = 1). The training objective of the image processing model 300 is then to minimize the cross-entropy function V(D, G); since the first discriminative network parameters of the discriminative network 320 are fixed at this point, E_x[log D(x)] is unchanged, and minimizing V(D, G) amounts to minimizing E_x[log(1 − D(G(x)))].
Training the discriminative network 320 with the generative network 310 fixed and training the generative network 310 with the discriminative network 320 fixed are repeated alternately in this way until a Nash equilibrium is reached. This makes the results generated by the generative network 310 more realistic.
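The direction each alternating update pushes the objective can be checked numerically. The sketch below evaluates V(D, G) = log D(x) + log(1 − D(G(x))) for single samples using a logistic discriminator; it illustrates the standard GAN objective that the text describes, not code from the disclosure.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def v_of(d_real_logit, d_fake_logit):
    """V(D, G) = log D(x) + log(1 - D(G(x))) for one real/fake pair,
    with discriminator outputs parameterized by logits."""
    return math.log(sigmoid(d_real_logit)) + math.log(1.0 - sigmoid(d_fake_logit))

# Discriminator step (G fixed): pushing D(x) toward 1 and D(G(x))
# toward 0 increases V -- the discriminator maximizes it.
assert v_of(2.0, -2.0) > v_of(0.0, 0.0)

# Generator step (D fixed): pushing D(G(x)) toward 1 decreases V --
# which is exactly what minimizing V over generator parameters does.
assert v_of(0.0, 2.0) < v_of(0.0, 0.0)
```

Alternating these two steps until neither side can improve is the Nash-equilibrium training loop described above.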
In some embodiments, the image processing model further includes a smoothness attribute discriminator associated with the generative network, and the preset supervision strategy may include:
the generative network generating a corresponding first image based on the original training image in the first training sample;
calculating the smoothness attribute loss function of the generative network and the smoothness attribute discriminator based on the original training image, the first smoothness attribute of the original training image, the face correction image, the second smoothness attribute of the face correction image, and the first image;
adjusting the second generative network parameters of the generative network based on the smoothness attribute loss function so that the first image matches the second smoothness attribute.
Specifically, FIG. 4 shows a schematic diagram of the image processing model according to an embodiment of the present disclosure. In FIG. 4, the image processing model 300 further includes a smoothness attribute discriminator 340 associated with the generative network 310; the smoothness attribute discriminator 340 may be set in the image processing model 300 to calculate the smoothness attribute loss function of the generative network 310 and the smoothness attribute discriminator 340. The original training image A in the first training sample with the first smoothness attribute R1 of the original training image, the face correction image B with the second smoothness attribute R2 of the face correction image, and the first image A' may be fed in pairs to the smoothness attribute discriminator 340 as input data; the input data can be expressed as an image-attribute pair (Image, attr), where Image denotes the input image and attr denotes the smoothness attribute. The smoothness attribute discriminator 340 performs a matching judgment on the input data (Image, attr): if Image and attr match, the judgment result is True; otherwise it is False. For example, when Image is the original training image A, the output is True if attr is the value R1 corresponding to A, and False if attr is the value R2 corresponding to the face correction image B. During training, the generative network 310 and the smoothness attribute discriminator 340 are updated separately and alternately.
When updating the smoothness attribute discriminator 340, the generative network 310 is fixed, and the second discriminative network parameters of the smoothness attribute discriminator 340 are adjusted so that when its input is (A, R1) or (B, R2), the corresponding output is Dattr(A, R1) = Dattr(B, R2) = 1 (i.e., True), and when its input is (B, R1), (A, R2), or (A', R2), the corresponding output is Dattr(B, R1) = Dattr(A, R2) = Dattr(A', R2) = 0 (i.e., False). The cross-entropy function of the generative network 310 and the smoothness attribute discriminator 340 may be used as the smoothness attribute loss function V(Dattr, G), which includes the sum of a third expectation over a third logarithmic function and a fourth expectation over a fourth logarithmic function, where the third logarithmic function is the logarithm of the discrimination result for the original training image x and its attribute attr, and the fourth logarithmic function is the logarithm of the difference between a first preset value (e.g., 1) and the discrimination result for the output generated for x and its attribute attr. The objective when updating the smoothness attribute discriminator 340 is then to maximize the smoothness attribute loss function V(Dattr, G).
When updating the generative network 310, the smoothness attribute discriminator 340 is fixed, and the second generative network parameters of the generative network 310 are adjusted so that when the input of the smoothness attribute discriminator 340 is (A', R2), the corresponding output is Dattr(A', R2) = Dattr(G(A), R2) = 1 (i.e., True), that is, the first image A' matches the second smoothness attribute of the face correction image B, ultimately making the first image A' generated by the generative network 310 conform to the smoothness attribute of the face correction image B, so that the generative network 310 acquires the property of adaptively matching the smoothness attribute of its input image. In this way, for any input image, the generative network 310 can adaptively match a smoothness attribute suited to that image and generate a contour-smoothed image appropriate for it, thereby improving the contour smoothing effect when the generative network 310 is applied as the contour smoothing network.
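The target outputs of the smoothness attribute discriminator in the two alternating phases can be tabulated as below. The function itself is an illustrative assumption (the disclosure does not define such a helper); the identifiers A, B, A', R1, R2 follow the text.

```python
def attr_target(image_id, attr_id, phase):
    """Target output of the smoothness attribute discriminator Dattr
    for an (image, attribute) pair.

    phase is 'disc' (updating Dattr, generator fixed) or 'gen'
    (updating the generator, Dattr fixed).
    """
    if phase == 'disc':
        # True pairs: each training image with its own attribute.
        return 1 if (image_id, attr_id) in {('A', 'R1'), ('B', 'R2')} else 0
    # Generator update: the generated image A' should come to match
    # the smoothness attribute R2 of the face correction image B.
    return 1 if (image_id, attr_id) == ("A'", 'R2') else 0
```

Note the sign flip on ("A'", 'R2'): it is a False pair for the discriminator update but the True target for the generator update, which is what drives the generator toward the corrected smoothness attribute.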
In some embodiments, the preset supervision strategy may include:
the generative network generating a corresponding first image based on the original training image in the first training sample;
calculating a feature correction loss function based on the first image and the face correction image in the first training sample;
adjusting the third generative network parameters of the generative network and the third discriminative network parameters of the discriminative network based on the feature correction loss function, so as to minimize the feature correction loss function.
Further, in some embodiments, calculating the feature correction loss function based on the first image and the face correction image in the first training sample further includes:
performing feature extraction on the first image to obtain first high-dimensional semantic features and first low-dimensional texture features; and performing feature extraction on the face correction image in the first training sample to obtain second high-dimensional semantic features and second low-dimensional texture features;
calculating a first high-dimensional feature loss function based on the first high-dimensional semantic features and the second high-dimensional semantic features, and calculating a first low-dimensional feature loss function based on the first low-dimensional texture features and the second low-dimensional texture features;
obtaining the feature correction loss function as the sum of the first high-dimensional feature loss function and the first low-dimensional feature loss function.
Here, high-dimensional semantic features may refer to features obtained by the deep layers of the image processing model; they are close to the output layer and are characterized by low resolution, small feature map size, a high degree of abstraction, and more global information. Low-dimensional texture features may refer to features obtained by the shallow layers of the image processing model; they are close to the input layer and are characterized by higher resolution, large feature map size, more detailed information, and easy alignment with the original training image. Correcting both kinds of features combines their advantages, thereby improving the training effect of the image processing model and the facial contour smoothing effect.
Specifically, the generative network 310 generates the corresponding first image A' based on the original training image A. Feature extraction may be performed on the first image A' by a vision processor in the image processing model 300 to obtain the first high-dimensional semantic features F1_A' and the first low-dimensional texture features F2_A' of the first image; and feature extraction may be performed on the face correction image in the first training sample to obtain the second high-dimensional semantic features F3 and the second low-dimensional texture features F4. The feature correction loss function can then be computed as L_F = l1(F1_A', F3) + l1(F2_A', F4), where l1 is the mean absolute error function.
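The computation L_F = l1(F1_A', F3) + l1(F2_A', F4) can be sketched on flat feature vectors. This is a minimal illustration with toy values; the real features would be the tensors extracted by the network.

```python
def l1(a, b):
    """Mean absolute error between two equal-length feature vectors."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def feature_correction_loss(f1_gen, f2_gen, f3_ref, f4_ref):
    """L_F = l1(F1_A', F3) + l1(F2_A', F4): high-dimensional semantic
    loss plus low-dimensional texture loss."""
    return l1(f1_gen, f3_ref) + l1(f2_gen, f4_ref)

# Toy feature vectors (illustrative values only):
# l1 terms are (1 + 1)/2 = 1.0 and (0 + 1)/2 = 0.5, so L_F = 1.5.
loss = feature_correction_loss([0.0, 1.0], [0.5, 0.5], [1.0, 0.0], [0.5, 1.5])
```

Minimizing this sum pulls the generated image's deep semantics and shallow textures toward those of the face correction image simultaneously.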
It should be understood that the first, second, and third generative network parameters in the present disclosure may all denote model parameters of the generative network and may be the same or different; likewise, the first, second, and third discriminative network parameters may all denote model parameters of the discriminative network and may be the same or different.
In some embodiments, the preset supervision strategy may include:
determining a loss weight of the original training image based on the first smoothness attribute of the original training image and the second smoothness attribute of the face correction image.
Further, determining the loss weight of the original training image based on the first smoothness attribute of the original training image and the second smoothness attribute of the face correction image includes:
calculating the first smoothness attribute of the original training image and the second smoothness attribute of the face correction image based on a smoothness attribute algorithm;
calculating the degree of smoothness change of the original training image based on the first smoothness attribute and the second smoothness attribute;
determining the loss weight of the original training image based on the degree of smoothness change, wherein the loss weight of the original training image is proportional to the degree of smoothness change.
In some embodiments, calculating the degree of smoothness change of the original training image based on the first smoothness attribute and the second smoothness attribute may include:
calculating an attribute difference between the first smoothness attribute and the second smoothness attribute;
obtaining the degree of smoothness change based on an absolute value function of the attribute difference.
Specifically, the first smoothness attribute SA of the original training image A and the second smoothness attribute SB of the face correction image B may be calculated separately; the degree of smoothness change of the original training image A is then abs(SB − SA), where abs is the absolute value function. Since the degree of smoothness change reflects how much each training sample's facial contour needs to change, samples with heavily uneven faces should have larger loss function weights. Allocating loss weights in this way makes the model pay more attention to samples with a large degree of smoothness change while reducing over-correction of samples whose facial contours need only small changes, thereby ensuring the facial contour smoothing effect. Accordingly, the loss weight of the original training image A can be determined to be proportional to the degree of smoothness change.
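The per-sample weighting above reduces to two short functions. The proportionality constant `scale` is an assumption for illustration; the disclosure only states that the weight is proportional to the degree of smoothness change.

```python
def smoothness_change(s_a, s_b):
    """Degree of smoothness change of a training sample: abs(SB - SA)."""
    return abs(s_b - s_a)

def loss_weight(s_a, s_b, scale=1.0):
    """Loss weight of an original training image, proportional to its
    degree of smoothness change (`scale` is an assumed constant)."""
    return scale * smoothness_change(s_a, s_b)

# A heavily uneven contour (large change needed) gets a larger weight
# than one that is already nearly smooth.
assert loss_weight(0.2, 0.9) > loss_weight(0.8, 0.9)
```

In training, each sample's loss term would be multiplied by this weight before summation.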
Through one or more of the above preset supervision strategies, the generative adversarial network is trained to obtain the trained image processing model. During training, the trained image processing model adaptively learns the facial-smoothing network parameters of the training samples and can adaptively match a smoothness attribute suited to the input image. The generative network therein serves, in practical applications, as the contour smoothing network that performs facial contour smoothing on images, processing the facial contour of the user's original image so that it becomes smooth.
In some embodiments, training the generative adversarial network based on the first training sample and the preset supervision strategy to obtain the contour smoothing network may further include:
training the generative adversarial network based on the first training sample and the preset supervision strategy to obtain a preliminary image processing model;
performing secondary training on the preliminary image processing model based on a second training sample and the preset supervision strategy to obtain the contour smoothing network.
Further, in some embodiments, the second training sample may be obtained based on the preliminary image processing model, specifically including:
obtaining a training data set including a plurality of facial images, and inputting the facial images into the generative network of the generative adversarial network to obtain second images;
inputting the second images into the preliminary image processing model to obtain preliminarily smoothed third images corresponding to the second images;
obtaining the second training sample based on the second images and the corresponding third images.
Since the amount of data in the first training sample is small, the preliminary image processing model (including a preliminary generative network and a preliminary discriminative network) trained on it is not very stable. To increase the stability of the resulting image processing model, the preliminary image processing model is further used to process a large batch of data, yielding a large secondary training data set as the second training sample. The preliminary image processing model is then retrained with this second training sample and the preset supervision strategy to obtain a more stable image processing model, whose generative network serves as the contour smoothing network, thereby improving the stability of facial contour processing in practical applications.
Specifically, a large batch of facial images (e.g., an open-source data set containing facial images) may be obtained and fed into the initial generative network of the initial generative adversarial network to generate second images D, yielding a large data set set_D; this data set set_D is then fed into the preliminary image processing model trained on the first training sample to obtain output third images E. Each second image D and its corresponding third image E form a new training data pair, serving as the second training sample, with which, combined with the preset supervision strategy, the preliminary image processing model is retrained to obtain a new image processing model. The generative network of this new image processing model may be used as the contour smoothing network for practical applications (e.g., the contour smoothing network 330 in FIG. 4).
Referring to FIG. 5, FIG. 5 shows a schematic flowchart of the image processing method according to an embodiment of the present disclosure. In FIG. 5, the image processing method 500 may include the following steps.
Step S510: obtain an original facial image to be processed;
Step S520: process the original facial image based on a first network to obtain high-dimensional features, low-dimensional features, and a target smoothness attribute of the original facial image;
Step S530: the first network performs, based on the target smoothness attribute, contour smoothing on the high-dimensional features and the low-dimensional features to obtain high-dimensional correction features and low-dimensional correction features;
Step S540: generate a target facial image based on the high-dimensional correction features and the low-dimensional correction features.
Specifically, for an original image imageA to be processed, the user wishes to smooth the facial contour in the original image imageA. Facial key point detection may be performed on the original image imageA to obtain its facial key points P, and the original image imageA may be cropped based on the facial key points P to obtain the original facial image image_Face. The original facial image image_Face is fed into the trained first network (e.g., the contour smoothing network 330 in FIG. 3), which performs feature extraction on image_Face to obtain its high-dimensional semantic features F1 and low-dimensional texture features F2. Since the trained first network can adaptively determine, from the input data, the target smoothness attribute matching the input data, the high-dimensional semantic features F1 and the low-dimensional texture features F2 can each be smoothed to obtain high-dimensional semantic correction features F1' and low-dimensional texture correction features F2'. The contour smoothing network then generates the smoothed target facial image image_Face' from the high-dimensional semantic correction features F1' and the low-dimensional texture correction features F2'. The target facial image image_Face' may further be fused back into the original image imageA to obtain the target image imageA' with its facial contour smoothed. According to the image processing method of the embodiments of the present disclosure, the high-dimensional semantic features and low-dimensional texture features of the image are corrected based on the adaptively determined smoothness attribute of the contour smoothing network, so as to smooth the facial contour in the image without changing the features of other regions, making the processed image more natural, improving the image processing effect, and reducing the user's creation cost while beautifying the image. In an image processing application, the smoothness of the facial contour can be changed with one click based on the adaptively determined smoothness attribute of the contour smoothing network, so that uneven contours in the image become smooth.
In practical applications, users may need to beautify not only portraits but also animal images (e.g., pets such as cats and dogs). The method of the embodiments of the present disclosure can smooth facial contours not only in human face images but also in animal images, making the facial contours in human or animal images smoother and the whole image more natural.
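The end-to-end inference flow described above (detect key points P, crop image_Face, run the contour smoothing network, fuse the result back) can be sketched with injected placeholder components. All of the callables are assumptions for illustration; the disclosure does not specify their implementations.

```python
def smooth_facial_contour(image_a, detect_keypoints, crop_face,
                          first_network, fuse):
    """End-to-end flow: key point detection, face cropping, contour
    smoothing, and fusion back into the original image."""
    p = detect_keypoints(image_a)             # facial key points P
    image_face = crop_face(image_a, p)        # original facial image image_Face
    image_face_s = first_network(image_face)  # smoothed image_Face'
    return fuse(image_a, image_face_s, p)     # target image imageA'

# Wiring with trivial stand-ins just to show the data flow:
result = smooth_facial_contour(
    'imageA',
    lambda img: 'P',                       # key point detector (stub)
    lambda img, p: (img, p),               # cropper (stub)
    lambda face: ('smoothed', face),       # contour smoothing network (stub)
    lambda img, face_s, p: (img, face_s),  # fusion (stub)
)
# result == ('imageA', ('smoothed', ('imageA', 'P')))
```

Only the cropped face region passes through the network, which is why the features of other regions of the original image are left unchanged.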
In some embodiments, processing the original facial image based on the first network to obtain the high-dimensional features and the low-dimensional features of the original facial image includes:
the first network performing feature extraction on the original facial image to obtain the high-dimensional features (e.g., the high-dimensional semantic features F1 of the original facial image image_Face) and the low-dimensional features (e.g., the low-dimensional texture features F2 of the original facial image image_Face) of the original facial image; wherein the high-dimensional features are semantic features and the low-dimensional features are texture features.
Specifically, the high-dimensional features may refer to high-dimensional semantic features, and the low-dimensional features may refer to low-dimensional texture features. The first network may perform feature extraction on the original facial image image_Face to obtain the high-dimensional semantic features F1 and low-dimensional texture features F2 of the original facial image image_Face.
In some embodiments, processing the original facial image based on the first network to obtain the target smoothness attribute of the original facial image includes:
the first network performing facial contour detection on the original facial image to obtain facial contour feature points of the original facial image, and determining the target smoothness attribute based on the facial contour feature points.
In some embodiments, determining the target smoothness attribute based on the facial contour feature points includes:
determining target contour feature points based on the facial contour feature points, and determining the target smoothness attribute based on the target contour feature points; or,
determining an original smoothness attribute of the original facial image based on the facial contour feature points, and determining the target smoothness attribute based on the original smoothness attribute.
Specifically, the trained first network can adaptively determine the target smoothness attribute of the input data according to the input data. For example, the target contour feature points of the input data may first be determined, and the corresponding target smoothness attribute obtained from them; alternatively, the original smoothness attribute may first be determined from the original contour feature points of the input data, and the target smoothness attribute then determined based on the original smoothness attribute.
In some embodiments, performing contour smoothing on the high-dimensional features and the low-dimensional features based on the target smoothness attribute to obtain the high-dimensional correction features and the low-dimensional correction features includes:
setting loss function weights of the high-dimensional features and the low-dimensional features based on the target smoothness attribute to obtain the high-dimensional correction features and the low-dimensional correction features. In some embodiments, a generative adversarial network is trained based on a first training sample and a preset supervision strategy to obtain the first network (e.g., the contour smoothing network 330 in FIG. 3);
wherein the first training sample includes at least one sample pair, the sample pair including an original training image (e.g., the original training image A in FIG. 3) and a corresponding face correction image (e.g., the face correction image B in FIG. 3), the face correction image being obtained by performing facial contour smoothing based on the original training image.
In some embodiments, the generative adversarial network includes a generative network (e.g., the generative network 310 in FIG. 3) and a discriminative network associated with the generative network (e.g., the discriminative network 320 in FIG. 3), and the preset supervision strategy includes:
the generative network generating a corresponding first image (e.g., the first image A') based on the original training image in the first training sample;
calculating the cross-entropy loss function (e.g., V(D, G)) of the generative network and the discriminative network based on the first image and the original training image, respectively;
adjusting the first discriminative network parameters of the discriminative network based on the cross-entropy loss function to maximize the cross-entropy loss function (e.g., max_D V(D, G)), and adjusting the first generative network parameters of the generative network based on the cross-entropy loss function to minimize the cross-entropy loss function (e.g., min_G V(D, G)).
In some embodiments, the generative adversarial network includes a generative network and a smoothness attribute discriminator associated with the generative network (e.g., the smoothness attribute discriminator 340 in FIG. 4), and the preset supervision strategy includes:
the generative network generating a corresponding first image based on the original training image in the first training sample;
calculating the smoothness attribute loss function (e.g., V(Dattr, G)) of the generative network and the smoothness attribute discriminator based on the original training image, the first smoothness attribute of the original training image (e.g., the first smoothness attribute R1), the face correction image, the second smoothness attribute of the face correction image (e.g., the second smoothness attribute R2), and the first image;
adjusting the second generative network parameters of the generative network based on the smoothness attribute loss function so that the first image matches the second smoothness attribute (e.g., Dattr(A', R2) = Dattr(G(A), R2) = 1).
In some embodiments, the generative adversarial network includes a generative network and a smoothness attribute discriminator associated with the generative network, and the preset supervision strategy includes:
the generative network generating a corresponding first image based on the original training image in the first training sample;
calculating a feature correction loss function (e.g., the feature correction loss function L_F) based on the first image and the face correction image in the first training sample;
adjusting the third generative network parameters of the generative network and the third discriminative network parameters of the discriminative network based on the feature correction loss function, so as to minimize the feature correction loss function.
In some embodiments, calculating the feature correction loss function based on the first image and the face correction image in the first training sample further includes:
performing feature extraction on the first image to obtain first high-dimensional semantic features and first low-dimensional texture features; and performing feature extraction on the face correction image in the first training sample to obtain second high-dimensional semantic features and second low-dimensional texture features;
calculating a first high-dimensional feature loss function based on the first high-dimensional semantic features and the second high-dimensional semantic features, and calculating a first low-dimensional feature loss function based on the first low-dimensional texture features and the second low-dimensional texture features;
obtaining the feature correction loss function as the sum of the first high-dimensional feature loss function and the first low-dimensional feature loss function.
In some embodiments, the preset supervision strategy includes:
determining a loss weight of the original training image based on the first smoothness attribute of the original training image and the second smoothness attribute of the face correction image.
In some embodiments, determining the loss weight of the original training image based on the first smoothness attribute of the original training image and the second smoothness attribute of the face correction image includes:
calculating the first smoothness attribute of the original training image and the second smoothness attribute of the face correction image based on a smoothness attribute algorithm;
calculating the degree of smoothness change of the original training image (e.g., degree of smoothness change = abs(SB − SA)) based on the first smoothness attribute (e.g., the first smoothness attribute SA) and the second smoothness attribute (e.g., the second smoothness attribute SB);
determining the loss weight of the original training image based on the degree of smoothness change, wherein the loss weight of the original training image is proportional to the degree of smoothness change.
In some embodiments, calculating the degree of smoothness change of the original training image based on the first smoothness attribute and the second smoothness attribute includes:
calculating an attribute difference between the first smoothness attribute and the second smoothness attribute;
obtaining the degree of smoothness change based on an absolute value function of the attribute difference.
In some embodiments, training the generative adversarial network based on the first training sample and the preset supervision strategy to obtain the contour smoothing network further includes:
training the generative adversarial network based on the first training sample and the preset supervision strategy to obtain a preliminary image processing model;
performing secondary training on the preliminary image processing model based on a second training sample and the preset supervision strategy to obtain the contour smoothing network (e.g., the contour smoothing network 330 in FIGS. 3-4);
wherein the second training sample is obtained based on the preliminary image processing model, specifically including:
obtaining a training data set including a plurality of facial images, and inputting the facial images into the generative network of the generative adversarial network to obtain second images (e.g., the second image D);
inputting the second images into the preliminary image processing model to obtain preliminarily smoothed third images corresponding to the second images (e.g., the third image E);
obtaining the second training sample based on the second images and the corresponding third images.
In some embodiments, the preset supervision strategy includes: setting lighting condition parameters of the input image of the generative adversarial network to simulate the lighting conditions of the input image.
It should be noted that the method of the embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of this embodiment may also be applied in a distributed scenario and completed by multiple devices cooperating with each other. In such a distributed scenario, one of the multiple devices may perform only one or more of the steps of the method of the embodiments of the present disclosure, and the multiple devices interact with each other to complete the method.
It should be noted that some embodiments of the present disclosure have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the above embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular or sequential order shown to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Based on the same technical concept, corresponding to the method of any of the above embodiments, the present disclosure further provides an image processing apparatus. Referring to FIG. 6, the image processing apparatus includes:
an acquisition module configured to obtain an original facial image to be processed;
a first network configured to process the original facial image to obtain high-dimensional features, low-dimensional features, and a target smoothness attribute of the original facial image; perform, based on the target smoothness attribute, contour smoothing on the high-dimensional features and the low-dimensional features to obtain high-dimensional correction features and low-dimensional correction features; and generate a target facial image based on the high-dimensional correction features and the low-dimensional correction features.
For convenience of description, the above apparatus is described with its functions divided into various modules. Of course, when implementing the present disclosure, the functions of the modules may be implemented in the same one or more pieces of software and/or hardware.
The apparatus of the above embodiment is used to implement the corresponding image processing method in any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiment, which will not be repeated here.
Based on the same technical concept, corresponding to the method of any of the above embodiments, the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the image processing method according to any of the above embodiments.
The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The computer instructions stored in the storage medium of the above embodiment are used to cause the computer to perform the image processing method according to any of the above embodiments and have the beneficial effects of the corresponding method embodiments, which will not be repeated here.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples. Under the spirit of the present disclosure, the technical features in the above embodiments or in different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, to simplify the description and discussion, and so as not to obscure the embodiments of the present disclosure, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, devices may be shown in block diagram form to avoid obscuring the embodiments of the present disclosure, which also takes into account the fact that the details of the implementation of such block-diagram devices are highly dependent on the platform on which the embodiments of the present disclosure are to be implemented (i.e., these details should be well within the understanding of those skilled in the art). Where specific details (e.g., circuits) are set forth to describe exemplary embodiments of the present disclosure, it will be apparent to those skilled in the art that the embodiments of the present disclosure may be practiced without, or with variations of, these specific details. Accordingly, these descriptions should be regarded as illustrative rather than restrictive.
Although the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the present disclosure are intended to cover all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the embodiments of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (11)

  1. 一种图像处理方法,包括:
    获取待处理的原始面部图像;
    基于第一网络对所述原始面部图像进行处理,得到所述原始面部图像的高维特征、低维特征和目标平滑属性;
    基于所述目标平滑属性,所述第一网络对所述高维特征和所述低维特征进行轮廓平滑处理得到高维矫正特征和低维矫正特征;
    基于所述高维矫正特征和所述低维矫正特征生成目标面部图像。
  2. 根据权利要求1所述的方法,其中,基于第一网络对所述原始面部图像进行处理,得到所述原始面部图像的高维特征和低维特征,包括:
    所述第一网络对所述原始面部图像进行特征提取,得到所述原始面部图像的所述高维特征和所述低维特征;其中,所述高维特征为语义特征,所述低维特征为纹理特征。
  3. 根据权利要求1所述的方法,其中,基于第一网络对所述原始面部图像进行处理,得到所述原始面部图像的目标平滑属性,包括:
    所述第一网络对所述原始面部图像进行面部轮廓检测,得到所述原始面部图像的面部轮廓特征点,并基于所述面部轮廓特征点确定所述目标平滑属性。
  4. 根据权利要求3所述的方法,其中,基于所述面部轮廓特征点确定所述目标平滑属性,包括:
    基于所述面部轮廓特征点确定目标轮廓特征点,并基于所述目标轮廓特征点确定所述目标平滑属性;或,
    基于所述面部轮廓特征点确定所述原始面部图像的原始平滑属性,并基于所述原始平滑属性确定所述目标平滑属性。
  5. 根据权利要求1所述的方法,其中,基于所述目标平滑属性,对所述高维特征和所述低维特征进行轮廓平滑处理得到高维矫正特征和低维矫正特征,包括:
    基于所述目标平滑属性设定所述高维特征和所述低维特征的损失函数权 重,得到所述高维矫正特征和低维矫正特征。
  6. The method according to claim 1, wherein the first network comprises a generative network and a smoothing attribute discriminator associated with the generative network, and the method further comprises:
    generating, by the generative network, a corresponding first image based on an original training image in a first training sample;
    computing a smoothing attribute loss function of the generative network and the smoothing attribute discriminator based on the original training image, a first smoothing attribute of the original training image, a facial correction image, a second smoothing attribute of the facial correction image, and the first image; and
    adjusting first generative network parameters of the generative network and first discriminative network parameters of the smoothing attribute discriminator based on the smoothing attribute loss function, so that the first image matches the second smoothing attribute;
    wherein the first training sample comprises at least one sample pair, the sample pair comprises the original training image and the corresponding facial correction image, and the facial correction image is obtained by performing facial contour smoothing processing on the original training image.
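The smoothing attribute loss in claim 6 is left unspecified; a standard binary cross-entropy GAN formulation is one plausible instantiation. In this sketch the discriminator is a fixed stub (a real smoothing attribute discriminator would be a trained network), and the loss shape is an assumption, not the application's actual formula:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_score(image, attribute):
    """Placeholder probe: scores how well an image matches a given
    scalar smoothing attribute. Illustrative only."""
    return sigmoid(image.mean() - attribute)

def smoothing_attribute_loss(generated, corrected, first_attr, second_attr):
    """Hedged BCE-style GAN loss: the discriminator should accept the
    facial correction image (second attribute) and reject the generated
    first image; the generator should make its output match the second
    smoothing attribute."""
    eps = 1e-8
    d_loss = (-np.log(discriminator_score(corrected, second_attr) + eps)
              - np.log(1 - discriminator_score(generated, first_attr) + eps))
    g_loss = -np.log(discriminator_score(generated, second_attr) + eps)
    return d_loss, g_loss

generated = np.full((4, 4), 0.4)   # toy "first image"
corrected = np.full((4, 4), 0.9)   # toy "facial correction image"
d_loss, g_loss = smoothing_attribute_loss(generated, corrected, 0.2, 0.8)
assert d_loss > 0 and g_loss > 0
```

In an actual training loop, `d_loss` and `g_loss` would drive alternating parameter updates of the discriminator and the generative network, per the claim's parameter-adjustment step.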
  7. The method according to claim 6, wherein the first network further comprises a discriminative network associated with the generative network, and the method further comprises:
    generating, by the generative network, the corresponding first image based on the original training image in the first training sample;
    performing feature extraction on the first image to obtain first high-dimensional semantic features and first low-dimensional texture features, and performing feature extraction on the facial correction image in the first training sample to obtain second high-dimensional semantic features and second low-dimensional texture features;
    computing a first high-dimensional feature loss function based on the first high-dimensional semantic features and the second high-dimensional semantic features, and computing a first low-dimensional feature loss function based on the first low-dimensional texture features and the second low-dimensional texture features;
    obtaining a feature correction loss function based on the sum of the first high-dimensional feature loss function and the first low-dimensional feature loss function; and
    adjusting second generative network parameters of the generative network and second discriminative network parameters of the discriminative network based on the feature correction loss function, so as to minimize the feature correction loss function.
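Claim 7 only requires a per-level feature loss whose sum is minimized; the distance metric is unspecified. The following sketch assumes a mean L1 distance between matching feature levels of the generated image and the facial correction image:

```python
import numpy as np

def l1_distance(a, b):
    """Mean absolute difference (an assumed choice of feature loss)."""
    return np.abs(a - b).mean()

def feature_correction_loss(high_gen, low_gen, high_ref, low_ref):
    """Sum of the high-dimensional (semantic) and low-dimensional
    (texture) feature losses, as in claim 7."""
    high_loss = l1_distance(high_gen, high_ref)  # first high-dim feature loss
    low_loss = l1_distance(low_gen, low_ref)     # first low-dim feature loss
    return high_loss + low_loss

# Toy feature vectors: high-dim features of both images differ by 1.0
# on average; low-dim features differ by 0.5 on average.
high_gen, high_ref = np.ones(16), np.zeros(16)
low_gen, low_ref = np.zeros(64), np.full(64, 0.5)
loss = feature_correction_loss(high_gen, low_gen, high_ref, low_ref)
# 1.0 + 0.5 = 1.5
assert abs(loss - 1.5) < 1e-9
```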
  8. An image processing apparatus, comprising:
    an obtaining module configured to obtain an original facial image to be processed; and
    a first network configured to process the original facial image to obtain high-dimensional features, low-dimensional features, and a target smoothing attribute of the original facial image; wherein the first network performs, based on the target smoothing attribute, contour smoothing processing on the high-dimensional features and the low-dimensional features to obtain high-dimensional corrected features and low-dimensional corrected features, and generates a target facial image based on the high-dimensional corrected features and the low-dimensional corrected features.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 7.
  10. A non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the method according to any one of claims 1 to 7.
  11. A computer program product, comprising computer program instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 7.
PCT/CN2023/124980 2022-10-28 2023-10-17 Image processing method, apparatus, device, medium, and program product WO2024088111A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211339290.3 2022-10-28
CN202211339290.3A CN115641276A (zh) 2022-10-28 2022-10-28 Image processing method, apparatus, device, medium, and program product

Publications (1)

Publication Number Publication Date
WO2024088111A1 true WO2024088111A1 (zh) 2024-05-02

Family

ID=84947699

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/124980 WO2024088111A1 (zh) 2022-10-28 2023-10-17 Image processing method, apparatus, device, medium, and program product

Country Status (2)

Country Link
CN (1) CN115641276A (zh)
WO (1) WO2024088111A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115641276A (zh) * 2022-10-28 2023-01-24 Beijing Zitiao Network Technology Co., Ltd. Image processing method, apparatus, device, medium, and program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070484A (zh) * 2019-04-02 2019-07-30 Tencent Technology (Shenzhen) Co., Ltd. Image processing and image beautification method, apparatus, and storage medium
CN112862712A (zh) * 2021-02-01 2021-05-28 Guangzhou Fangtu Technology Co., Ltd. Beautification processing method, system, storage medium, and terminal device
CN114092354A (zh) * 2021-11-25 2022-02-25 Agricultural Bank of China, Sichuan Branch Face image inpainting method based on a generative adversarial network
WO2022166897A1 (zh) * 2021-02-07 2022-08-11 Beijing Zitiao Network Technology Co., Ltd. Face shape adjustment image generation method, model training method, apparatus, and device
CN115641276A (zh) * 2022-10-28 2023-01-24 Beijing Zitiao Network Technology Co., Ltd. Image processing method, apparatus, device, medium, and program product

Also Published As

Publication number Publication date
CN115641276A (zh) 2023-01-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23881690

Country of ref document: EP

Kind code of ref document: A1