WO2021147938A1 - Systems and methods for image processing - Google Patents

Systems and methods for image processing

Info

Publication number
WO2021147938A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
trained
quality
sample
Prior art date
Application number
PCT/CN2021/073018
Other languages
French (fr)
Inventor
Ningning Zhao
Tianming Zhang
Yuanhao GUO
Xiubao Zhang
Haifeng Shen
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010073524.9A (CN111860091A)
Priority claimed from CN202010176670.4A (CN111860093B)
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd.
Publication of WO2021147938A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/993 Evaluation of the quality of the acquired pattern

Definitions

  • the present disclosure generally relates to image processing technology, and in particular, to systems and methods for processing facial images.
  • a facial image of a subject may be captured by an image capturing device (e.g., a camera) .
  • a face recognition operation may be performed on the facial image for identifying or verifying the subject’s identity.
  • the facial image may have a poor quality, which may increase the difficulty and reduce the accuracy of face recognition.
  • the facial image may be captured from an undesired shooting angle, or have a blur and/or an occlusion on the face of the subject. Therefore, it is desirable to provide improved systems and methods for processing facial images, thereby improving the efficiency and the accuracy of face recognition.
  • An aspect of the present disclosure relates to a method for image processing.
  • the method may be implemented on a computing device having at least one storage device storing a set of instructions, and at least one processor in communication with the at least one storage device.
  • the method may include obtaining a first facial image; determining, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image; determining, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition; obtaining a correction condition of the first facial image; obtaining a trained correction model, wherein the trained correction model includes a trained first generator; and in response to determining that the first facial image satisfies the quality condition, generating a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
  • the one or more quality features of the first facial image may include a norm
  • the trained image quality determination model may be trained according to a model training process including obtaining a preliminary model; obtaining a first training sample set including a plurality of first sample images.
  • Each of the plurality of first sample images may have one or more label values of the one or more quality features of the first sample image, and the label value of the norm may be determined based on a trained image analysis model.
  • the model training process may further include obtaining the trained image quality determination model by training the preliminary model using the first training sample set.
  • At least one of the trained image quality determination model or the trained image analysis model may include a trained convolutional neural network model.
  • the trained image quality determination model and the trained image analysis model may share one or more convolutional layers.
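As a non-authoritative illustration of the shared-convolutional-layer idea above, the following PyTorch sketch defines a common backbone reused by a quality-feature head and an image analysis head; the class names, layer sizes, and the use of an embedding norm are assumptions for demonstration, not the architecture disclosed here.

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Convolutional layers shared by the quality determination and image analysis models."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.features(x).flatten(1)  # (N, 64) feature vector

class QualityModel(nn.Module):
    """Predicts values of quality features (e.g., blurring degree, occlusion, norm)."""
    def __init__(self, backbone, num_features=4):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(64, num_features)

    def forward(self, x):
        return self.head(self.backbone(x))

class AnalysisModel(nn.Module):
    """Image analysis model used here to produce a label value of the norm."""
    def __init__(self, backbone, embed_dim=128):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(64, embed_dim)

    def forward(self, x):
        emb = self.head(self.backbone(x))
        return emb.norm(dim=1)  # norm of the feature embedding

backbone = SharedBackbone()             # the shared convolutional layers
quality_model = QualityModel(backbone)
analysis_model = AnalysisModel(backbone)
```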
  • the determining, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition may include determining a weight coefficient of each of the one or more quality features; determining, based on the one or more weight coefficients and the one or more values of the one or more quality features, a quality evaluation value of the first facial image; and determining whether the first facial image satisfies the quality condition based on the quality evaluation value.
  • the determining the weight coefficient of each of the one or more quality features may include obtaining a test sample set including a plurality of second sample images; determining a preliminary weight coefficient of each of the one or more quality features; for each of the plurality of second sample images, determining, based on the trained quality determination model, a sample value of each of one or more quality features of the second sample image; and for each of the one or more quality features, determining, based on the preliminary weight coefficient of the quality feature and a Bayes algorithm, an optimized weight coefficient of the quality feature.
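A minimal sketch of the quality check built from these pieces: feature values (as produced by the trained image quality determination model) are combined with weight coefficients into a quality evaluation value and compared with a threshold. The feature names, weights, and threshold below are placeholder assumptions, and the Bayes-based optimization of the weights is only referenced, not reproduced.

```python
from typing import Dict

def quality_evaluation_value(values: Dict[str, float],
                             weights: Dict[str, float]) -> float:
    """Weighted sum of the quality feature values (weights assumed normalized)."""
    return sum(weights[name] * values[name] for name in values)

def satisfies_quality_condition(values: Dict[str, float],
                                weights: Dict[str, float],
                                threshold: float = 0.6) -> bool:
    """Quality condition: the evaluation value must reach a (hypothetical) threshold."""
    return quality_evaluation_value(values, weights) >= threshold

# Example: values would come from the trained image quality determination model;
# weights would be the (Bayes-)optimized weight coefficients.
values = {"sharpness": 0.8, "brightness": 0.7, "frontalness": 0.9, "norm": 0.75}
weights = {"sharpness": 0.3, "brightness": 0.2, "frontalness": 0.3, "norm": 0.2}
print(satisfies_quality_condition(values, weights))  # True
```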
  • the one or more quality features of the first facial image may include a blurring degree of the first facial image, a proportion of a target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target subject in the first facial image, a posture of the target subject in the first facial image, or the like, or any combination thereof.
  • the trained first generator may be a sub-model of a trained conditional Generative Adversarial network (C-GAN) model
  • the trained C-GAN model may further include a trained second generator, a trained first discriminator, and a trained second discriminator.
  • the trained correction model may be generated according to a model training process including obtaining a second training sample set including a plurality of image pairs.
  • Each of the plurality of image pairs may include a third sample image and a fourth sample image of a same sample face, the third sample image may satisfy a first correction condition, and the fourth sample image may satisfy a second correction condition.
  • the model training process may further include training, based on the second training sample set, a second preliminary model by optimizing a loss function.
  • the second preliminary model may include a first generator, a second generator, a first discriminator, and a second discriminator.
  • the loss function may include a first loss function relating to the first generator, a second loss function relating to the first discriminator, a third loss function relating to the second generator, and a fourth loss function relating to the second discriminator.
  • the training a second preliminary model by optimizing a loss function includes an iterative operation including one or more iterations, and at least one of the one or more iterations may include obtaining an updated second preliminary model generated in a previous iteration.
  • the updated second preliminary model may include an updated first generator, an updated second generator, an updated first discriminator, and an updated second discriminator.
  • the at least one of the one or more iterations may further include for each of the plurality of image pairs, generating, based on the third sample image of the image pair and the second correction condition, a first predicted image using the updated first generator; generating, based on the first predicted image and the fourth sample image of the image pair, a first discrimination result using the updated first discriminator; generating, based on the first predicted image and the first correction condition of the image pair, a second predicted image using the updated second generator; and generating, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated second discriminator.
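A simplified sketch of the forward pass described in this iteration, assuming the generators and discriminators are callables that take an image together with a correction condition or a reference image; the function and argument names are illustrative, not taken from the disclosure.

```python
def forward_pass(g1, d1, g2, d2, third_image, fourth_image, cond1, cond2):
    """Run one image pair through the two-path (loop) structure of the model."""
    # First path: generate an image intended to satisfy the second correction condition.
    first_predicted = g1(third_image, cond2)
    first_discrimination = d1(first_predicted, fourth_image)

    # Second (feedback) path: map the prediction back toward the first correction condition.
    second_predicted = g2(first_predicted, cond1)
    second_discrimination = d2(second_predicted, third_image)

    return first_predicted, first_discrimination, second_predicted, second_discrimination
```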
  • the at least one of the one or more iterations may further include determining a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs; and evaluating the updated second preliminary model based on the value of the loss function.
  • the determining a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs may include determining, based on the third sample image and the first predicted image of each of the plurality of image pairs, a first loss value of the first loss function; determining, based on the first discrimination result of each of the plurality of image pairs, a second loss value of the second loss function; determining, based on the third sample image and the second predicted image of each of the plurality of image pairs, a third loss value of the third loss function; determining, based on the second discrimination result of each of the plurality of image pairs, a fourth loss value of the fourth loss function; and determining the value of the loss function based on the first loss value, the second loss value, the third loss value, and the fourth loss value using a weighted sum algorithm.
  • a weight coefficient corresponding to the fourth loss value may be larger than an average of the weight coefficients corresponding to the first loss value, the second loss value, and the third loss value.
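For illustration, the weighted-sum combination of the four loss values might look like the following sketch, where the example weights are assumptions chosen only so that the fourth weight exceeds the average of the other three.

```python
def total_loss(l1: float, l2: float, l3: float, l4: float,
               w=(1.0, 1.0, 1.0, 2.0)) -> float:
    """Weighted sum of the four loss values; w[3] exceeds the mean of w[0:3]."""
    assert w[3] > sum(w[:3]) / 3.0
    return w[0] * l1 + w[1] * l2 + w[2] * l3 + w[3] * l4
```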
  • the first generator and the second generator may be the same model.
  • the first discriminator and the second discriminator may be the same model.
  • the first facial image may include a human face
  • the correction condition of the first facial image may relate to an orientation of the human face
  • An aspect of the present disclosure relates to a system for image processing.
  • the system may include at least one storage medium including a set of instructions and at least one processor in communication with the at least one storage medium.
  • the at least one processor may be directed to cause the system to obtain a first facial image; determine, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image; determine, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition; obtain a correction condition of the first facial image; obtain a trained correction model, wherein the trained correction model includes a trained first generator; and in response to determining that the first facial image satisfies the quality condition, generate a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
  • the one or more quality features of the first facial image may include a norm
  • the at least one processor may be directed to cause the system to obtain a preliminary model; obtain a first training sample set including a plurality of first sample images.
  • Each of the plurality of first sample images may have one or more label values of the one or more quality features of the first sample image, and the label value of the norm may be determined based on a trained image analysis model.
  • the at least one processor may be directed to cause the system further to obtain the trained image quality determination model by training the preliminary model using the first training sample set.
  • At least one of the trained image quality determination model or the trained image analysis model may include a trained convolutional neural network model.
  • the trained image quality determination model and the trained image analysis model may share one or more convolutional layers.
  • the at least one processor may be directed to cause the system to determine a weight coefficient of each of the one or more quality features; determine, based on the one or more weight coefficients and the one or more values of the one or more quality features, a quality evaluation value of the first facial image; and determine whether the quality evaluation value of the first facial image satisfies the quality condition.
  • the at least one processor may be directed to cause the system to obtain a test sample set including a plurality of second sample images; determine a preliminary weight coefficient of each of the one or more quality features; for each of the plurality of second sample images, determine, based on the trained quality determination model, a sample value of each of one or more quality features of the second sample image; and for each of the one or more quality features, determine, based on the preliminary weight coefficient of the quality feature and a Bayes algorithm, an optimized weight coefficient of the quality feature.
  • the one or more quality features of the first facial image may include a blurring degree of the first facial image, a proportion of a target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target subject in the first facial image, a posture of the target subject in the first facial image, or the like, or any combination thereof.
  • the trained first generator may be a sub-model of a trained conditional Generative Adversarial network (C-GAN) model
  • the trained C-GAN model may further include a trained second generator, a trained first discriminator, and a trained second discriminator.
  • the at least one processor may be directed to cause the system to obtain a second training sample set including a plurality of image pairs.
  • Each of the plurality of image pairs may include a third sample image and a fourth sample image of a same sample face, the third sample image satisfies a first correction condition, and the fourth sample image satisfies a second correction condition.
  • the at least one processor may be directed to cause the system to train, based on the second training sample set, a second preliminary model by optimizing a loss function.
  • the second preliminary model may include a first generator, a second generator, a first discriminator, and a second discriminator
  • the loss function may include a first loss function relating to the first generator, a second loss function relating to the first discriminator, a third loss function relating to the second generator, and a fourth loss function relating to the second discriminator.
  • training a second preliminary model by optimizing a loss function may include an iterative operation including one or more iterations. And in at least one of the one or more iterations, the at least one processor may be directed to cause the system to obtain an updated second preliminary model generated in a previous iteration.
  • the updated second preliminary model may include an updated first generator, an updated second generator, an updated first discriminator, and an updated second discriminator.
  • the at least one processor may be directed to cause the system further to, for each of the plurality of image pairs, generate, based on the third sample image of the image pair and the second correction condition, a first predicted image using the updated first generator; generate, based on the first predicted image and the fourth sample image of the image pair, a first discrimination result using the updated first discriminator; generate, based on the first predicted image and the first correction condition of the image pair, a second predicted image using the updated second generator; and generate, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated second discriminator.
  • the at least one processor may be directed to cause the system further to determine a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs; and evaluate the updated second preliminary model based on the value of the loss function.
  • the at least one processor may be directed to cause the system to determine, based on the third sample image and the first predicted image of each of the plurality of image pairs, a first loss value of the first loss function; determine, based on the first discrimination result of each of the plurality of image pairs, a second loss value of the second loss function; determine, based on the third sample image and the second predicted image of each of the plurality of image pairs, a third loss value of the third loss function; determine, based on the second discrimination result of each of the plurality of image pairs, a fourth loss value of the fourth loss function; and determine the value of the loss function based on the first loss value, the second loss value, the third loss value, and the fourth loss value using a weighted sum algorithm.
  • a weight coefficient of the fourth loss value may be larger than an average of the weighting coefficients of the first loss value, the second loss value, and the third loss value.
  • the first generator and the second generator may be the same model.
  • the first discriminator and the second discriminator may be the same model.
  • the first facial image may include a human face
  • the correction condition of the first facial image may relate to an orientation of the human face
  • a still further aspect of the present disclosure relates to a non-transitory computer readable medium including executable instructions that, when executed by at least one processor, direct the at least one processor to perform a method.
  • the method may include obtaining a first facial image; determining, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image; determining, based on the one or more values of the one or more quality features, whether the first image satisfies a quality condition; obtaining a correction condition of the first image; obtaining a trained correction model, wherein the trained correction model includes a trained first generator; and in response to determining that the first image satisfies the quality condition, generating a second facial image that satisfies the correction condition by correcting the first image based on the trained correction model and the correction condition.
  • FIG. 1A is a schematic diagram illustrating an exemplary image processing system according to some embodiments of the present disclosure
  • FIG. 1 B is a schematic diagram illustrating an exemplary car-hailing service system according to some embodiments of the present disclosure.
  • FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary computing device according to some embodiments of the present disclosure
  • FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary terminal device according to some embodiments of the present disclosure
  • FIG. 4A is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure.
  • FIG. 4B is a block diagram illustrating another exemplary processing device according to some embodiments of the present disclosure.
  • FIG. 5 is a flowchart illustrating an exemplary process for generating a second facial image satisfying a correction condition according to some embodiments of the present disclosure
  • FIG. 6 is a flowchart illustrating an exemplary process for obtaining a trained image quality determination model according to some embodiments of the present disclosure
  • FIG. 7 is a flowchart illustrating an exemplary process for determining a weight coefficient according to some embodiments of the present disclosure
  • FIG. 8 is a flowchart illustrating an exemplary process for obtaining a trained correction model according to some embodiments of the present disclosure
  • FIG. 9 is a schematic diagram illustrating an exemplary conditional generative adversarial network model according to some embodiments of the present disclosure.
  • FIG. 10 is a schematic diagram illustrating an exemplary current second iteration in an iterative operation for training a second preliminary model according to some embodiments of the present disclosure.
  • FIG. 11 is a flowchart illustrating an exemplary process for determining a value of a loss function according to some embodiments of the present disclosure.
  • The terms “system,” “engine,” “unit,” “module,” and/or “block” used herein are one method to distinguish different components, elements, parts, sections, or assemblies of different levels in ascending order. However, the terms may be replaced by other expressions if they achieve the same purpose.
  • The terms “module,” “unit,” or “block” used herein refer to logic embodied in hardware or firmware, or to a collection of software instructions.
  • a module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device.
  • a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts.
  • Software modules/units/blocks configured for execution on computing devices (e.g., processor 210 illustrated in FIG. 2) may be provided on a computer readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution).
  • Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device.
  • Software instructions may be embedded in firmware, such as an EPROM.
  • hardware modules (or units or blocks) may be included in connected logic components, such as gates and flip-flops, and/or can be included in programmable units, such as programmable gate arrays or processors.
  • modules (or units or blocks) or computing device functionality described herein may be implemented as software modules (or units or blocks) , but may be represented in hardware or firmware.
  • the modules (or units or blocks) described herein refer to logical modules (or units or blocks) that may be combined with other modules (or units or blocks) or divided into sub-modules (or sub-units or sub-blocks) despite their physical organization or storage.
  • the flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments of the present disclosure. It is to be expressly understood that the operations of the flowcharts may not be implemented in order. Conversely, the operations may be implemented in an inverted order, or simultaneously. Moreover, one or more other operations may be added to the flowcharts, and one or more operations may be removed from the flowcharts.
  • Facial images of a subject often need to be captured and analyzed for, for example, identity verification.
  • a driver’s image may be captured to verify the driver’s identity.
  • a face recognition model may be used to recognize a facial image of the driver acquired by an acquisition device installed on a vehicle of the driver.
  • the facial image may be of poor quality.
  • the facial image may be captured with an undesired shooting angle, and the human face in the facial image may not be a front face.
  • a movement of the driver and/or the acquisition device may result in a blurry facial image or a facial image that only includes part of the driver’s face, which may increase the difficulty and reduce the accuracy of face recognition. Therefore, it is desired to provide systems and methods for evaluating the image quality of a facial image and/or generating a facial image with a desired image quality.
  • the system may obtain a first facial image.
  • the system may determine a value of each of one or more quality features of the first facial image based on a trained image quality determination model. Further, the system may determine, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition.
  • the system may obtain a correction condition (e.g., a correction angle) of the first facial image.
  • the system may further obtain a trained correction model.
  • the trained correction model may include a trained first generator (e.g., a sub-model of a trained conditional Generative Adversarial network (C-GAN) model) .
  • the system may further generate a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
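Under the assumptions that the trained quality model returns a dictionary of feature values and that the trained correction model accepts an image plus a correction condition, the workflow described above could be sketched as follows (all names and the threshold are hypothetical).

```python
def process_facial_image(first_image, quality_model, correction_model,
                         weights, correction_condition, threshold=0.6):
    """Evaluate the quality of a facial image and, if acceptable, correct it."""
    feature_values = quality_model(first_image)      # dict: quality feature -> value
    score = sum(weights[name] * feature_values[name] for name in feature_values)
    if score < threshold:                            # quality condition not satisfied
        return None                                  # e.g., request a new capture
    # The trained first generator of the C-GAN produces the corrected (second) image.
    return correction_model(first_image, correction_condition)
```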
  • a norm of a facial image may be used to evaluate the image quality of the facial image.
  • the norm may be determined based on complex or deep quality features that can reflect the image quality of the facial image, which are usually undetectable by humans or traditional quality evaluation approaches.
  • the image quality of the facial image may be evaluated using the norm in deeper or more complex dimensions, which may improve the accuracy of an image quality evaluation of the facial image.
  • a specific preliminary model (e.g., a conditional Generative Adversarial network (C-GAN) model) with a loop structure is provided in the present disclosure, and may be trained to generate the trained correction model.
  • the preliminary model may include a first generator and a first discriminator forming a first path, and a second generator and a second discriminator forming a second path.
  • the second path may be used as a feedback path of the first path.
  • in a training process of the preliminary model, the preliminary model may be updated based on information extracted by both the first path and the second path, which may improve the training accuracy of the preliminary model.
  • a trained first generator trained from the first generator may be used as the trained correction model to generate a predicted image that satisfies a certain condition (e.g., a condition relating to an orientation of a human face) based on an input image, which may improve the accuracy of the predicted image.
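As one possible illustration of applying the trained first generator, a correction condition such as a target face orientation could be encoded as a small condition tensor passed alongside the input image; the encoding and the generator interface below are assumptions for demonstration only.

```python
import torch

def correct_image(trained_first_generator, image: torch.Tensor,
                  target_yaw_deg: float) -> torch.Tensor:
    """Generate a facial image satisfying the correction condition (here, a target yaw angle)."""
    # Encode the correction condition as a 1-element condition tensor scaled to [-1, 1].
    condition = torch.tensor([[target_yaw_deg / 90.0]], dtype=torch.float32)
    with torch.no_grad():                  # inference only, no gradient tracking
        corrected = trained_first_generator(image, condition)
    return corrected
```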
  • training data of the preliminary model does not need to be labeled, which may improve the efficiency of model training by reducing, for example, the preparation time, the processing time, the computational complexity and/or cost.
  • FIG. 1A is a schematic diagram illustrating an exemplary image processing system according to some embodiments of the present disclosure.
  • an image processing system 100A may include a server 110, a network 120, an acquisition device 130, a user device 140, and a storage device 150.
  • the server 110 may be a single server or a server group.
  • the server group may be centralized or distributed (e.g., the server 110 may be a distributed system) .
  • the server 110 may be local or remote.
  • the server 110 may access information and/or data stored in the acquisition device 130, the user device 140, and/or the storage device 150 via the network 120.
  • the server 110 may be directly connected to the acquisition device 130, the user device 140, and/or the storage device 150 to access stored information and/or data.
  • the server 110 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
  • the server 110 may be implemented on a computing device 200 including one or more components illustrated in FIG. 2 of the present disclosure.
  • the server 110 may include a processing device 112.
  • the processing device 112 may process information and/or data relating to image processing to perform one or more functions described in the present disclosure.
  • the processing device 112 may generate one or more trained models (e.g., a trained image quality determination model, a trained correction model, a trained image analysis model, a trained conditional Generative Adversarial network (C-GAN) model, etc. ) that can be used in facial image processing.
  • the processing device 112 may apply the trained model (s) in determining image quality of a facial image and/or generating a facial image that satisfies a certain condition.
  • the trained model (s) may be generated by a processing device, while the application of the trained model (s) may be performed on a different processing device.
  • the trained model (s) may be generated by a processing device of a system different from the image processing system 100A or a server different from the processing device 112 on which the application of the trained model (s) is performed.
  • the trained model (s) may be generated by a first system of a vendor who provides and/or maintains such a model (s) , while facial image analysis using the trained model (s) may be performed on a second system of a client of the vendor.
  • the application of the trained model (s) may be performed online in response to a request for image processing.
  • the trained model (s) may be generated offline.
  • the trained model (s) may be generated and/or updated (or maintained) by, e.g., a vendor of the trained model (s) or the manufacturer of the acquisition device 130.
  • the manufacturer or the vendor may load the trained model (s) into the image processing system 100A or a portion thereof (e.g., the processing device 112 and/or the acquisition device 130) before or during the installation of the acquisition device 130 and/or the processing device 112, and maintain or update the trained model (s) from time to time (periodically or not) .
  • the maintenance or update may be achieved by installing a program stored on a storage device (e.g., a compact disc, a USB drive, etc.).
  • the program may include a new model or a portion of a model that substitutes or supplements a corresponding portion of the model.
  • the processing device 112 may include one or more processing devices (e.g., single-core processing device (s) or multi-core processor (s) ) .
  • the processing device 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction-set computer (RISC) , a microprocessor, or the like, or any combination thereof.
  • the server 110 may be unnecessary and all or part of the functions of the server 110 may be implemented by other components (e.g., the acquisition device 130, the user device 140) of the image processing system 100A.
  • the processing device 112 may be integrated into the acquisition device 130 or the user device 140 and the functions of the processing device 112 may be implemented by the acquisition device 130 or the user device 140.
  • the network 120 may facilitate exchange of information and/or data for the image processing system 100A.
  • one or more components (e.g., the server 110, the acquisition device 130, the user device 140, the storage device 150) of the image processing system 100A may transmit information and/or data to other component (s) of the image processing system 100A via the network 120.
  • the server 110 may obtain an image to be processed from the acquisition device 130 via the network 120.
  • the server 110 may obtain a trained image quality determination model from the storage device 150 via the network 120.
  • the server 110 may transmit a processed image (e.g., a second facial image that satisfies a correction condition) to the user device 140 via the network 120.
  • the network 120 may be any type of wired or wireless network, or a combination thereof.
  • the network 120 may be and/or include a public network (e.g., the Internet), a private network (e.g., a local area network (LAN), a wide area network (WAN)), a wired network (e.g., an Ethernet network), a wireless network (e.g., an 802.11 network, a Wi-Fi network), a cellular network (e.g., a Long Term Evolution (LTE) network), a frame relay network, a virtual private network (VPN), a satellite network, a telephone network, routers, hubs, switches, server computers, or the like, or any combination thereof.
  • the network 120 may include a cable network, a wireline network, a fiber-optic network, a telecommunications network, an intranet, a wireless local area network (WLAN), a metropolitan area network (MAN), a public telephone switched network (PSTN), a Bluetooth™ network, a ZigBee™ network, a near field communication (NFC) network, or the like, or any combination thereof.
  • the network 120 may include one or more network access points (e.g., the access points 120-1, 120-2).
  • the network 120 may include wired and/or wireless network access points such as base stations and/or internet exchange points through which one or more components of the image processing system 100A may be connected to the network 120 to exchange data and/or information.
  • the acquisition device 130 may be configured to acquire an image (the “image” herein refers to a single image, a frame of a video, or a video stream) .
  • the acquisition device 130 may include a camera 130-1, a video recorder 130-2, an image sensor 130-3, etc.
  • the camera 130-1 may include a gun camera, a dome camera, an integrated camera, a monocular camera, a binocular camera, a multi-view camera, or the like, or any combination thereof.
  • the video recorder 130-2 may include a PC Digital Video Recorder (DVR) , an embedded DVR, or the like, or any combination thereof.
  • the image sensor 130-3 may include an infrared sensor, a visible sensor, a Charge Coupled Device (CCD) , a Complementary Metal Oxide Semiconductor (CMOS) , or the like, or any combination thereof.
  • the image acquired by the acquisition device 130 may be a two-dimensional image, a three-dimensional image, a four-dimensional image, etc.
  • the acquisition device 130 may include a plurality of components each of which can acquire an image.
  • the acquisition device 130 may include a plurality of sub-cameras that can capture images or videos simultaneously.
  • the acquisition device 130 may transmit the acquired image to one or more components (e.g., the server 110, the user device 140, the storage device 150) of the image processing system 100A via the network 120.
  • the user device 140 may enable user interaction between the image processing system 100A and a user.
  • the user device 140 may be configured to receive information and/or data from the server 110, the acquisition device 130, and/or the storage device 150 via the network 120.
  • the user device 140 may receive a processed image from the server 110.
  • the user device 140 may process information and/or data received from the server 110, the acquisition device 130, and/or the storage device 150 via the network 120.
  • the user device 140 may process the processed image received from the server 110 for further display.
  • the user device 140 may provide a user interface via which a user may view information and/or input data and/or instructions to the image processing system 100A. For example, the user may view the processed image via the user interface. As another example, the user may input an instruction associated with an image processing parameter (e.g., a correction condition) via the user interface.
  • the user device 140 may include a mobile device 140-1, a tablet computer 140-2, a laptop computer 140-3, or the like, or any combination thereof.
  • the mobile device 140-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof.
  • the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof.
  • the wearable device may include a smart bracelet, smart footgear, a pair of smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof.
  • the smart mobile device may include a smartphone, a personal digital assistant (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include a virtual reality helmet, a virtual reality glass, a virtual reality patch, an augmented reality helmet, an augmented reality glass, an augmented reality patch, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include a Google Glass™, an Oculus Rift, a HoloLens, a Gear VR, etc.
  • the acquisition device 130 and/or the server 110 may be remotely operated through the user device 140.
  • the acquisition device 130 and/or the processing device 112 may be operated through the user device 140 via a wireless connection.
  • the user device 140 may receive information and/or instructions inputted by a user, and send the received information and/or instructions to the acquisition device 130 and/or the processing device 112 via the network 120. In some embodiments, the user device 140 may receive data and/or information from the processing device 112. In some embodiments, the user device 140 may be part of the processing device 112. In some embodiments, the user device 140 may be omitted.
  • the storage device 150 may be configured to store data and/or instructions.
  • the data and/or instructions may be obtained from, for example, the server 110, the acquisition device 130, the user device 140, and/or any other component of the image processing system 100A.
  • the storage device 150 may store data and/or instructions that the server 110 may execute or use to perform exemplary methods described in the present disclosure.
  • the storage device 150 may store images (e.g., first facial image) acquired by the acquisition device 130.
  • the instruction may be written in languages such as C/C++, Java, Shell, Python, or the like, or any combination thereof.
  • the storage device 150 may include a mass storage device, a removable storage device, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or any combination thereof.
  • Exemplary mass storage devices may include a magnetic disk, an optical disk, a solid-state drive, etc.
  • Exemplary removable storage devices may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc.
  • Exemplary volatile read-and-write memory may include a random access memory (RAM) .
  • Exemplary RAM may include a dynamic RAM (DRAM), a double data rate synchronous dynamic RAM (DDR SDRAM), a static RAM (SRAM), a thyristor RAM (T-RAM), a zero-capacitor RAM (Z-RAM), etc.
  • Exemplary ROM may include a mask ROM (MROM) , a programmable ROM (PROM) , an erasable programmable ROM (EPROM) , an electrically erasable programmable ROM (EEPROM) , a compact disk ROM (CD-ROM) , and a digital versatile disk ROM, etc.
  • the storage device 150 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
  • the storage device 150 may be connected to the network 120 to communicate with one or more components (e.g., the server 110, the acquisition device 130, the user device 140) of the image processing system 100A.
  • One or more components of the image processing system 100A may access the data or instructions stored in the storage device 150 via the network 120.
  • the storage device 150 may be directly connected to or communicate with one or more components (e.g., the server 110, the acquisition device 130, the user device 140) of the image processing system 100A.
  • the storage device 150 may be part of other components of the image processing system 100A, such as the server 110, the acquisition device 130, or the user device 140.
  • the storage device 150 may communicate with one or more components (e.g., the server 110, the acquisition device 130, the user device 140) of the image processing system 100A via a communication bus.
  • the storage device 150 and the server 110 may be implemented as two independent devices.
  • the storage device 150 and the server 110 may communicate with each other via the communication bus.
  • Exemplary communication buses may include an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, or any combination thereof.
  • the communication bus may include an address bus, a data bus, a control bus, or the like, or any combination thereof.
  • the storage device 150 and the server 110 may be integrated on a chip. In such cases, the storage device 150 and the server 110 may communicate with each other via an internal interface.
  • the image processing system 100A may be used to process images of users of an Online to Offline service platform.
  • the Online to Offline service platform may be an online transportation service platform for transportation services such as car-hailing, chauffeur services, delivery vehicles, carpool, bus service, driver hiring, shuttle services, or the like, or any combination thereof.
  • Images of a user (e.g., a driver) of the online transportation service platform may be acquired by an acquisition device.
  • the images may be processed by the image processing system 100A for further analysis (e.g., for identifying or verifying the user’s identity).
  • FIG. 1 B is a schematic diagram illustrating an exemplary car-hailing service system 100B according to some embodiments of the present disclosure.
  • an acquisition device 160 may be provided on a car used in a car-hailing service. Images of a driver 170 of the car may be acquired by the acquisition device 160. The images may be transmitted to a processing device (not shown in FIG. 1 B) of the car-hailing service system 100B for further processing.
  • the processing device may perform a face recognition operation on the images so as to identify or verify the identity of the driver 170.
  • the image processing system 100A may include one or more additional components and/or one or more components of the image processing system 100A described above may be omitted. Additionally or alternatively, two or more components of the image processing system 100A may be integrated into a single component. A component of the image processing system 100A may be implemented on two or more sub-components.
  • FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary computing device according to some embodiments of the present disclosure.
  • the processing device 112 may be implemented on the computing device 200.
  • the computing device 200 may include a processor 210, a storage 220, an input/output (I/O) 230, and a communication port 240.
  • the processor 210 may execute computer instructions (program code) and perform functions of, for example, the processing device 112 or the user device 140 in accordance with techniques described herein.
  • the computer instructions may include routines, programs, objects, components, signals, data structures, procedures, modules, and functions, which perform particular functions described herein.
  • the processor 210 may execute computer instructions to evaluate the quality of a facial image and/or generate a facial image satisfying a correction condition.
  • the processor 210 may include a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physics processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device (PLD), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof.
  • the computing device 200 in the present disclosure may also include multiple processors, and thus operations of a method that are performed by one processor as described in the present disclosure may also be jointly or separately performed by the multiple processors.
  • for example, if in the present disclosure the processor of the computing device 200 executes both operations A and B, it should be understood that operations A and B may also be performed by two different processors jointly or separately in the computing device 200 (e.g., a first processor executes operation A and a second processor executes operation B, or the first and second processors jointly execute operations A and B).
  • the storage 220 may store data/information obtained from the acquisition device 130, the user device 140, the storage device 150, or any other component of the image processing system 100A.
  • the storage 220 may store facial images captured by the acquisition device 130.
  • the storage 220 may include a mass storage device, a removable storage device, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or any combination thereof.
  • the storage 220 may store one or more programs and/or instructions to perform exemplary methods described in the present disclosure.
  • the storage 220 may store a program for the processing device 112 for generating a second facial image by correcting a first facial image based on a trained correction model and a correction condition.
  • the I/O 230 may input or output signals, data, or information. In some embodiments, the I/O 230 may enable user interaction with the processing device 112. In some embodiments, the I/O 230 may include an input device and an output device. Exemplary input devices may include a keyboard, a mouse, a touch screen, a microphone, a trackball, or the like, or a combination thereof. Exemplary output devices may include a display device, a loudspeaker, a printer, a projector, or the like, or a combination thereof.
  • Exemplary display devices may include a liquid crystal display (LCD) , a light-emitting diode (LED) -based display, a flat panel display, a curved screen, a television device, a cathode ray tube (CRT) , or the like, or a combination thereof.
  • a user may input parameters (e.g., a correction condition) needed for the operation of the image processing system 100A.
  • the I/O 230 may also display images obtained from the acquisition device 130, the storage device 150, and/or the storage 220.
  • a user may input a request for viewing an image (e.g., a first facial image, a second facial image) stored in the storage device 150 via the I/O 230 (e.g., an input device) .
  • the processing device 112 may retrieve the image for display based on the request.
  • the image may be displayed via the I/O 230 (e.g., an output device) .
  • the communication port 240 may be connected to a network (e.g., the network 120) to facilitate data communications.
  • the communication port 240 may establish connections between the processing device 112 and the acquisition device 130, the user device 140, the storage device 150, or any other component of the image processing system 100A.
  • the connection may be a wired connection, a wireless connection, or a combination of both that enables data transmission and reception.
  • the wired connection may include an electrical cable, an optical cable, a telephone wire, or the like, or any combination thereof.
  • the wireless connection may include Bluetooth, Wi-Fi, WiMax, WLAN, ZigBee, mobile network (e.g., 3G, 4G, 5G, etc. ) , or the like, or a combination thereof.
  • the communication port 240 may be a standardized communication port, such as RS232, RS485, etc. In some embodiments, the communication port 240 may be a specially designed communication port. For example, the communication port 240 may be designed in accordance with the digital imaging and communications in medicine (DICOM) protocol.
  • FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary terminal device according to some embodiments of the present disclosure.
  • the user device 140 may be implemented on the terminal device 300 shown in FIG. 3.
  • the terminal device 300 may include a communication platform 310, a display 320, a graphic processing unit (GPU) 330, a central processing unit (CPU) 340, an I/O 350, a memory 360, and a storage 390.
  • any other suitable component, including but not limited to a system bus or a controller (not shown), may also be included in the terminal device 300.
  • an operating system 370 (e.g., iOS™, Android™, Windows Phone™) and one or more applications (Apps) 380 may be loaded into the memory 360 from the storage 390 in order to be executed by the CPU 340.
  • the applications 380 may include a browser or any other suitable mobile apps for receiving and rendering information relating to the image processing system 100A.
  • User interactions may be achieved via the I/O 350 and provided to the processing device 112 and/or other components of the image processing system 100A via the network 120.
  • FIGs. 4A and 4B are block diagrams illustrating exemplary processing devices 112A and 112B according to some embodiments of the present disclosure.
  • the processing devices 112A and 112B may be exemplary embodiments of the processing device 112 as described in connection with FIG. 1A.
  • the processing device 112A may be configured to apply one or more trained models in image processing.
  • the processing device 112A may apply a trained image quality determination model in evaluating the quality of an image of a subject, apply a trained image analysis model in determining a label value of a norm, and/or apply a trained correction model in generating a facial image satisfying a correction condition.
  • the processing device 112B may be configured to generate the one or more trained models, such as the trained image quality determination model, the trained image analysis model, and/or the trained correction model by model training.
  • the processing devices 112A and 112B may be respectively implemented on a processing unit (e.g., a processor 210 illustrated in FIG. 2 or a CPU 340 as illustrated in FIG. 3) .
  • the processing device 112A may be implemented on a CPU 340 of a terminal device, and the processing device 112B may be implemented on a computing device 200.
  • the processing devices 112A and 112B may be implemented on a same computing device 200 or a same CPU 340.
  • the processing device 112A may include an obtaining module 410, a determination module 420, and a generation module 430.
  • the obtaining module 410 may be configured to obtain information relating to the image processing system 100A.
  • the obtaining module 410 may obtain a first facial image (e.g., a facial image of a passenger, a driver) .
  • the obtaining module 410 may obtain the first facial image from an acquisition device, a storage device, a user device, etc.
  • the obtaining module 410 may be further configured to obtain a correction condition of the first facial image.
  • the correction condition may be used to adjust one or more quality features of the first facial image so as to improve the image quality of the first facial image.
  • the correction condition of the first facial image may relate to an orientation of the human face.
  • the obtaining module 410 may be configured to obtain a trained correction model.
  • the trained correction model refers to a model (e.g., a machine learning model) or an algorithm configured to generate a second facial image that satisfies the correction condition based on the first facial image and the correction condition.
  • the trained correction model may include a trained first generator in a trained conditional generative adversarial network (C-GAN) model.
  • the obtaining module 410 may obtain the trained correction model from the storage device.
  • the determination module 420 may be configured to determine a value of each of one or more quality features of the first facial image based on a trained image quality determination model.
  • the quality feature of the first facial image may reflect the image quality of the first facial image.
  • Exemplary quality features of the first facial image may include a blurring degree of the first facial image, a proportion of the target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target subject in the first facial image, a posture of the target subject in the first facial image, a norm, or the like, or any combination thereof. More descriptions regarding the determination of the value of the quality feature (s) may be found elsewhere in the present disclosure. See, e.g., FIG. 5 and relevant descriptions thereof.
  • the determination module 420 may be further configured to determine whether the first facial image satisfies a quality condition based on the one or more values of the one or more quality features. For example, the determination module 420 may compare the value of a quality feature with a threshold (or a range) corresponding to the quality feature. As another example, the determination module 420 may determine a weight coefficient of each of the one or more quality features.
  • the determination module 420 may obtain a test sample set including a plurality of second sample images, determine a preliminary weight coefficient of each of the one or more quality features, determine a sample value of each of one or more quality features of the second sample image for each of the plurality of second sample images, and for each of the one or more quality features, determine an optimized weight coefficient of the quality feature based on the preliminary weight coefficient of the quality feature and a Bayes algorithm. Then the determination module 420 may determine a quality evaluation value of the first facial image based on the value (s) and weight coefficient (s) of the quality feature (s) .
  • the generation module 430 may be configured to generate a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition. For example, the generation module 430 may generate a model input based on the first facial image and the correction condition, and input the model input into the trained correction model.
  • the processing device 112B may include an obtaining module 440 and a model generation module 450.
  • the obtaining module 440 may be configured to obtain training data, such as a preliminary model and/or training samples.
  • the model generation module 450 may be configured to generate one or more trained models by model training.
  • the processing device 112B may be configured to generate a quality determination model by model training.
  • the obtaining module 440 may be configured to obtain a preliminary model.
  • the preliminary model may be any type of model (e.g., a convolutional neural network (CNN) model) to be trained as the trained image quality determination model.
  • the preliminary model may include one or more model parameters.
  • the obtaining module 440 may obtain a first training sample set.
  • the first training sample set may include a plurality of first sample images.
  • Each first sample image may relate to a sample subject (e.g., a sample human face) and have one or more label values of the one or more quality features of the first sample image.
  • the model generation module 450 may be configured to obtain or generate the trained image quality determination model by training the preliminary model using the first training sample set. For example, the model generation module 450 may iteratively update the model parameter (s) of the preliminary model in one or more iterations, and terminate the one or more iterations when a termination condition is satisfied in the current iteration. Then the model generation module 450 may designate the updated preliminary model (or a portion thereof) as the trained image quality determination model.
  • the processing device 112B may be configured to generate a trained correction model by model training.
  • the obtaining module 440 may be configured to obtain a second training sample set including a plurality of image pairs. Each image pair may include a third sample image and a fourth sample image of a same sample face. The third sample image may satisfy a first correction condition, and the fourth sample image may satisfy a second correction condition.
  • the model generation module 450 may be configured to train a second preliminary model by optimizing a loss function F2 based on the second training sample set.
  • the model generation module 450 may train the second preliminary model by performing an iterative operation including one or more second iterations. For example, the model generation module 450 may obtain an updated second preliminary model generated in a previous second iteration.
  • the updated second preliminary model may include an updated first generator, an updated second generator, an updated first discriminator, and an updated second discriminator.
  • the model generation module 450 may generate a first predicted image using the updated first generator based on the third sample image of the image pair and the second correction condition, generate a first discrimination result using the updated first discriminator based on the first predicted image and the fourth sample image of the image pair, generate a second predicted image using the updated second generator based on the first predicted image and the first correction condition of the image pair, and generate a second discrimination result using the updated second discriminator based on the second predicted image and the third sample image of the image pair. Furthermore, the model generation module 450 may determine a value of the loss function F2 based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs.
  • the model generation module 450 may evaluate the updated second preliminary model based on the value of the loss function F2. For example, the model generation module 450 may determine whether the updated second preliminary model satisfies an evaluation condition. In response to determining that the updated second preliminary model satisfies the evaluation condition, the model generation module 450 may determine that the iterative operation can be terminated. The model generation module 450 may further designate the trained first generator or the trained second generator as the trained correction model.
  • the processing device 112A and/or the processing device 112B may share two or more of the modules, and any one of the modules may be divided into two or more units. For instance, the processing devices 112A and 112B may share a same obtaining module; that is, the obtaining module 410 and the obtaining module 440 are a same module. In some embodiments, the processing device 112A and/or the processing device 112B may include one or more additional modules, such as a storage module (not shown) for storing data. In some embodiments, the processing device 112A and the processing device 112B may be integrated into one processing device 112. In some embodiments, the training of different models may be performed by different processing devices 112B.
  • FIG. 5 is a flowchart illustrating an exemplary process for generating a second facial image satisfying a correction condition according to some embodiments of the present disclosure.
  • the process 500 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application) , and invoked and/or executed by the processing device 112A (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4A) .
  • the processing device 112A may obtain a first facial image.
  • the first facial image refers to an image including at least a portion of the face of a target subject (e.g., a passenger, a driver) .
  • the first facial image may be a two-dimensional image, a three-dimensional image, a four-dimensional image (e.g., a series of 3D images over time) , and/or any related image data (e.g., projection data) , etc.
  • the first facial image may be a color image (e.g., an RGB image) or a grey image.
  • the first facial image may be acquired and used for, for example, an identity verification in car-hailing, a financial verification, an access control verification, or the like, or any combination thereof.
  • the first facial image may be acquired by an acquisition device (e.g., the acquisition device 130 illustrated in FIG. 1) .
  • the processing device 112A may obtain the first facial image from the acquisition device directly.
  • the first facial image acquired by the acquisition device may be stored in a storage device (e.g., the storage device 150, an external storage device) .
  • the processing device 112A may obtain the first facial image from the storage device.
  • a user may input the first facial image via a user device (e.g., the user device 140) .
  • the processing device 112A may obtain the first facial image from the user device.
  • the processing device 112A may determine, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image.
  • a quality feature of the first facial image may reflect the image quality of the first facial image.
  • Exemplary quality features of the first facial image may include a blurring degree of the first facial image, a proportion of the target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target subject in the first facial image, a posture of the target subject in the first facial image, a norm, or the like, or any combination thereof.
  • the blurring degree of the first facial image refers to a value indicating whether the target subject (e.g., a human face) is in a movement state and/or whether the focus of the acquisition device on the target subject is accurate in the first facial image.
  • the blurring degree of the first facial image may be negatively correlated with the image quality. The higher the blurring degree is, the lower the image quality may be.
  • the proportion of the target subject that is occluded in the first facial image refers to a ratio of an occluded area of the target subject (or the face of the target subject) to a total area of the target subject (or the face of the target subject) in the first facial image.
  • the proportion of the target subject that is occluded in the first facial image may be negatively correlated with the image quality. The higher the proportion of the target subject that is occluded in the first facial image is, the lower the image quality may be.
  • the first facial image may have a desired image quality if its brightness is in a certain brightness range.
  • the shooting angle of the first facial image refers to an angle (e.g., an absolute value of an angle) of the acquisition device relative to the face of the target subject (e.g., a human face) when the first facial image is captured.
  • the shooting angle may include a yaw angle and a pitch angle.
  • the pitch angle refers to an angle between an optical axis of the acquisition device and a horizontal (or transverse) plane of the face of the target subject (e.g., a horizontal plane passing through the central point of the target subject’s face) .
  • the yaw angle refers to an angle between the optical axis and median (or sagittal) plane of the face of the target subject (e.g., a median plane passing through the central point of the target subject’s face) .
  • if the acquisition device directly faces the front of the target subject’s face when the first facial image is captured, both the yaw angle and the pitch angle may be deemed as zero.
  • the shooting angle of the first facial image may be negatively correlated with the image quality. For example, in a face recognition process, the smaller the shooting angle is, the more accurate the quality features of the first facial image may be, and the higher the image quality may be.
  • ideally, both the yaw angle and the pitch angle may be set as 0°.
  • the completeness of the target subject refers to a degree to which the face of the target subject is completely presented in the first facial image.
  • the completeness of the target subject may be positively correlated with the image quality. The higher the completeness of the target subject in the first facial image is, the higher the image quality may be.
  • the posture of the target subject may include facial expressions and/or body postures of the target subject.
  • Exemplary postures may include closing eyes, opening mouth, raising eyebrows, facing the acquisition device, tilting head, bowing head, or the like, or any combination thereof.
  • the first facial image may be deemed as having a low image quality if the target subject holds some specific postures (e.g., closes eyes, lowers his/her head) .
  • the norm refers to a norm of one or more complex or deep quality features that can reflect the image quality of the first facial image.
  • the complex or deep feature (s) may be determined based on a trained image analysis model (e.g., a trained facial recognition model) . More descriptions regarding the norm may be found elsewhere in the present disclosure. See, e.g., FIG. 6 and relevant descriptions thereof.
  • the norm may be determined as a distance from an origin in a feature space to a point representing the complex or deep quality feature (s) of the first facial image in the feature space.
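  • as a minimal illustration of this distance-from-origin interpretation (assuming the complex or deep quality features have already been extracted as a numeric vector; the feature values below are made up), the norm can be computed as an L2 norm:

```python
import numpy as np

# Hypothetical deep quality features of a facial image, e.g., the output of the
# feature extraction layer (s) of a trained image analysis model (values are made up).
deep_features = np.array([0.12, -0.87, 1.45, 0.03, -0.29])

# The norm quality feature is the Euclidean (L2) distance from the origin of the
# feature space to the point representing the image's deep quality features.
norm_value = np.linalg.norm(deep_features)
print(f"norm quality feature: {norm_value:.4f}")
```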
  • the value of a quality feature may be represented by, for example, a score, a rating, a level, or the like.
  • the trained image quality determination model may be a machine learning model or algorithm configured to output value (s) of one or more quality features of an image based on its input.
  • the trained image quality determination model may be or include a convolutional neural network (CNN) model, such as a V-net model, a U-net model, an AlexNet model, an Oxford Visual Geometry Group (VGG) model, a ResNet model, or the like, or any combination thereof.
  • the trained image quality determination model may be trained according to a model training process as described in connection with FIG. 6.
  • different image quality determination models may be used for determining the values of different quality features.
  • each quality feature may correspond to a specific image quality determination model configured to determine a value of the quality feature.
  • a certain image quality determination model may be used to determine the values of two or more quality features.
  • the processing device 112A may determine, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition.
  • the processing device 112A may compare the value of at least one of the one or more quality features with a threshold (or a range) corresponding to the at least one quality feature.
  • the quality condition may include that the completeness of the target subject in the first facial image is larger than or equal to a completeness threshold.
  • the processing device 112A may compare the completeness of the target subject in the first facial image with the completeness threshold. In response to determining that the completeness of the target subject in the first facial image is larger than or equal to the completeness threshold, the processing device 112A may determine that the first facial image satisfies the quality condition.
  • the quality condition may include that the blurring degree of the first facial image is less than a blurring threshold and the brightness of the first facial image is within a brightness range.
  • the processing device 112A may compare the blurring degree of the first facial image with the blurring threshold, and compare the brightness of the first facial image with the brightness range. In response to determining that the blurring degree of the first facial image is less than the blurring threshold and the brightness of the first facial image is within the brightness range, the processing device 112A may determine that the first facial image satisfies the quality condition.
  • the processing device 112A may obtain or determine a weight coefficient of each of the one or more quality features. Further, the processing device 112A may determine, based on the one or more weight coefficients and the one or more values of the one or more quality features, a quality evaluation value of the first facial image. For example, the processing device 112A may determine the quality evaluation value of the first facial image based on the one or more weight coefficients and the one or more values of the one or more quality features using a weighted average algorithm. Merely by way of example, the quality evaluation value of the first facial image may be a weighted average value of the one or more values of the one or more quality features.
  • the processing device 112A may determine whether the first facial image satisfies the quality condition based on the quality evaluation value.
  • the quality condition may be that the quality evaluation value of the first facial image is larger than or equal to a quality evaluation threshold.
  • in response to determining that the quality evaluation value of the first facial image is larger than or equal to the quality evaluation threshold, the processing device 112A may determine that the first facial image satisfies the quality condition.
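  • a minimal sketch of the weighted-average quality evaluation described above follows; the feature values (expressed here so that higher means better quality), the weight coefficients, and the quality evaluation threshold are illustrative assumptions rather than values from the present disclosure:

```python
# Illustrative quality feature values of a first facial image (as produced by the
# trained image quality determination model) and assumed weight coefficients.
feature_values = {"blurring": 0.80, "occlusion": 0.90, "brightness": 0.70, "norm": 0.85}
weight_coefficients = {"blurring": 0.3, "occlusion": 0.4, "brightness": 0.1, "norm": 0.2}

# Quality evaluation value as a weighted average of the quality feature values.
quality_evaluation_value = sum(
    weight_coefficients[f] * feature_values[f] for f in feature_values
) / sum(weight_coefficients.values())

QUALITY_EVALUATION_THRESHOLD = 0.75   # assumed quality evaluation threshold
satisfies_quality_condition = quality_evaluation_value >= QUALITY_EVALUATION_THRESHOLD
print(quality_evaluation_value, satisfies_quality_condition)
```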
  • different quality features may have different importance in image quality evaluation.
  • the weight coefficient of a quality feature may reflect an importance of the quality feature. For example, if the proportion of the target subject that is occluded in the first facial image has a greater degree of influence on the success rate of face recognition than the brightness of the first facial image, the weight coefficient of the proportion of the target subject that is occluded in the first facial image may be larger than the weight coefficient of the brightness of the first facial image.
  • the weight coefficient of a quality feature may be determined according to a default setting of the image processing system 100A, set manually by a user of the image processing system 100A, or determined by the processing device 112A.
  • the processing device 112A may determine the weight coefficient of each of the one or more quality features using a machine learning algorithm.
  • Exemplary machine learning algorithms may include a Bayes algorithm, a support vector machine (SVM) algorithm, a grid search algorithm, a deep neural network (DNN) algorithm, or the like, or any combination thereof.
  • the processing device 112A may determine the weight coefficient of each quality feature based on a test sample set and a preliminary weight coefficient of each quality feature. More descriptions regarding the determination of the weight coefficient of each quality feature using the Bayes algorithm may be found elsewhere in the present disclosure (e.g., FIG. 6 and the descriptions thereof) .
  • the processing device 112A may obtain one or more SVM classifiers according to a classifier training process (e.g., a training process performed based on a plurality of training images each having a classification label) .
  • Each of the one or more SVM classifiers may correspond to a quality feature.
  • the processing device 112A may determine a classification result of each training image by processing the training image using the one or more SVM classifiers.
  • for each quality feature, the processing device 112A may determine a count of correct classifications based on the classification results and the classification labels of the training images.
  • the processing device 112A may determine the weight coefficient of each quality feature by normalizing the count of correct classifications of the quality feature across the one or more quality features.
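  • a short sketch of this normalization step follows; the per-feature counts of correct classifications are made-up numbers, and the SVM training itself is not shown:

```python
# Assumed counts of correct classifications achieved by each quality feature's
# SVM classifier on the same training images (numbers are illustrative only).
correct_counts = {"blurring": 420, "occlusion": 510, "brightness": 300, "shooting_angle": 370}

# Normalize the counts so that the resulting weight coefficients sum to 1.
total = sum(correct_counts.values())
weight_coefficients = {feature: count / total for feature, count in correct_counts.items()}
print(weight_coefficients)   # e.g., occlusion receives the largest weight
```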
  • the processing device 112A may determine the weight coefficient of each quality feature by performing an exhaustive search in a weight coefficient set of the quality feature.
  • the weight coefficient set may include possible weight coefficients of the quality feature, and the processing device 112A may select an optimum weight coefficient of the quality feature from the weight coefficient set.
  • the weight coefficient set may include possible combinations each of which include weight coefficients of multiple quality features, and the processing device 112A may select an optimum combination from the possible combinations.
  • the one or more values of the one or more quality features of the first facial image may be input into a trained DNN network model (e.g., a multi-layer perceptron (MLP) model) .
  • the trained DNN network model may be used to determine, based on the one or more values of the one or more quality features, whether the first facial image and a reference facial image correspond to a same target subject.
  • the trained DNN network model may output a similarity score between the first facial image and the reference facial image.
  • a layer (e.g., an output layer) of the trained DNN network model may include one or more parameters corresponding to the one or more values of the one or more quality features.
  • the processing device 112A may determine the one or more parameters as the weight coefficient (s) corresponding to the one or more quality features.
  • the processing device 112A may obtain a correction condition of the first facial image.
  • a correction condition may be used to adjust one or more quality features of the first facial image so that a corrected image (e.g., the second facial image as described in connection with operation 560) may satisfy a certain condition.
  • the first facial image may include a human face of the target subject.
  • the correction condition of the first facial image may relate to an orientation of the human face.
  • the orientation of the human face may be represented by a three-dimensional coordinate system constructed based on a facial image of the human face.
  • the three-dimensional coordinate system may include a yaw axis, a pitch axis, and a roll axis.
  • the facial image may be captured by the acquisition device when the human face faces the acquisition device.
  • the three-dimensional coordinate system may be constructed with a point (e.g., a centroid of the human face) on the facial image as the origin, a vertical line as the yaw axis, a horizontal line as the pitch axis, and a line perpendicular to the yaw-pitch plane as the roll axis.
  • the orientation of the human face may be represented by angles along the axes (e.g., a yaw angle along the yaw axis, a pitch angle along the pitch axis) .
  • the orientation of the human face may correspond to the shooting angle of the first facial image.
  • the yaw angle and the pitch angle that describe the orientation of the human face may be the same as the yaw angle and the pitch angle of the shooting angle, respectively.
  • the correction condition may include that the human face faces the acquisition device.
  • the correction condition may be that the yaw angle and/or the pitch angle corresponding to the orientation of the human face are approximately equal to 0°.
  • the correction condition may be a correction angle of the yaw angle and/or the pitch angle corresponding to the orientation. For example, if the yaw angle of the orientation of the human face is 15°, the correction condition of the first facial image may be that the human face rotates -15° along the yaw axis.
  • exemplary correction conditions may also include that a brightness of the first facial image is within a certain brightness range, that a blurring degree of the first facial image is less than a certain blurring threshold, that a proportion of the target subject that is occluded in the first facial image is less than an occlusion threshold, that a completeness of the target subject in the first facial image is larger than a completeness threshold, or the like, or any combination thereof.
  • the processing device 112A may generate a vector representing the correction condition by performing a coding operation on the correction condition using a coding algorithm.
  • Exemplary coding algorithms may include a one-hot coding algorithm, an embedding coding algorithm, or the like, or any combination thereof.
  • the processing device 112A may perform a coding operation on the correction condition using the one-hot coding algorithm so as to convert the correction condition into a binary vector representing the correction condition.
  • the binary vector may be used as an input of a trained correction model as described in connection with operation 550.
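  • the following sketch shows one possible one-hot coding of a correction condition, assuming the condition is a correction angle drawn from a small candidate set; the candidate angles are an assumption for illustration (an embedding coding algorithm could be used instead):

```python
import numpy as np

# Assumed discrete set of candidate yaw-correction angles (in degrees).
CANDIDATE_ANGLES = [-30, -15, 0, 15, 30]

def one_hot_correction_condition(correction_angle: int) -> np.ndarray:
    """Convert a correction angle into a binary (one-hot) vector."""
    vector = np.zeros(len(CANDIDATE_ANGLES), dtype=np.float32)
    vector[CANDIDATE_ANGLES.index(correction_angle)] = 1.0
    return vector

print(one_hot_correction_condition(-15))   # -> [0. 1. 0. 0. 0.]
```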
  • the processing device 112A may obtain a trained correction model.
  • the trained correction model refers to a model (e.g., a machine learning model) or an algorithm configured to generate a second facial image that satisfies the correction condition based on the first facial image and the correction condition.
  • the trained correction model may include a trained facial angle correction model configured to generate a frontal face image based on a non-frontal face image and a correction angle.
  • the trained correction model may include a trained brightness correction model configured to generate a facial image with a desired brightness based on a facial image with an undesired brightness and a correction brightness.
  • the trained correction model may include a trained first generator.
  • the trained first generator may be a sub-model of a trained conditional generative adversarial network (C-GAN) model.
  • the trained C-GAN model may further include a trained second generator, a trained first discriminator, and a trained second discriminator.
  • the trained correction model may be generated according to a model training process as described in connection with FIG. 8.
  • the trained correction model may be previously generated by a computing device (e.g., the processing device 112B, a processing device of a vendor of the trained correction model) , and stored in a memory or a storage device (e.g., the storage device 150) disclosed elsewhere in the present disclosure.
  • the processing device 112A may obtain the trained correction model from the storage device.
  • in response to determining that the first facial image satisfies the quality condition, the processing device 112A (e.g., the generation module 430) may generate a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
  • the processing device 112A may generate a model input based on the first facial image and the correction condition, and input the model input into the trained correction model.
  • the trained correction model may output the second facial image by processing the model input.
  • the model input may include the first facial image and the vector (e.g., a binary vector) representing the correction condition.
  • the trained correction model may be configured to perform operations including, for example, feature extraction, encoding, decoding, or the like, or any combination thereof, on the model input.
  • the trained correction model (e.g., the trained first generator) may include an encoding component (or referred to as an encoder) , a decoding component (or referred to as a decoder) , and a transformation component.
  • the encoding component may be configured to generate one or more first feature maps based on the first facial image.
  • the encoding component may be configured to obtain feature information (e.g., contour feature information, texture feature information, color feature information) of the first facial image.
  • the encoding component may be configured to generate the one or more first feature maps based on the feature information.
  • the transformation component may be configured to generate a correction map by transforming the correction condition.
  • the correction condition may be converted into a vector (e.g., a one-hot vector) .
  • the transformation component may be configured to convert the vector into a code with a fixed length (e.g., using a fully connected layer) . Further, the transformation component may be configured to resize the code to generate a second feature map with a fixed size. The transformation component may then be configured to generate the correction map based on the second feature map with the fixed size (e.g., using a transposed convolution layer and a cascade of one or more convolutional layers) .
  • the correction map may have a same size as the one or more feature maps generated by the encoding component.
  • the correction map and the first feature map (s) may be concatenated (or combined) into a concatenated map, and the concatenated map may be inputted into the decoding component.
  • the decoding component may be further configured to generate the second facial image that satisfies the correction condition by decoding the concatenated map.
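  • a hedged PyTorch sketch of such an encoder/transformation/decoder structure is given below; the layer types, channel counts, image size, and conditioning vector length are assumptions for illustration and do not reproduce the actual trained first generator. The correction map is deliberately shaped to match the encoder's first feature maps so the two can be concatenated along the channel dimension before decoding:

```python
import torch
import torch.nn as nn

class CorrectionGenerator(nn.Module):
    """Sketch of a correction model: encoder -> condition transform -> decoder."""

    def __init__(self, cond_dim: int = 5, feat_ch: int = 64, code_len: int = 256):
        super().__init__()
        # Encoding component: produces first feature maps from the input face image.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Transformation component: condition vector -> fixed-length code -> correction map.
        self.fc = nn.Linear(cond_dim, code_len)                 # fixed-length code
        self.to_map = nn.Sequential(                            # resize the code to a map
            nn.ConvTranspose2d(code_len, feat_ch, 4, stride=4),  # 1x1 -> 4x4
            nn.ConvTranspose2d(feat_ch, feat_ch, 4, stride=4),   # 4x4 -> 16x16
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        # Decoding component: decodes the concatenated map into the corrected image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch * 2, feat_ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat_ch, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, image: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        feature_maps = self.encoder(image)                      # (B, C, H/4, W/4)
        code = self.fc(condition).unsqueeze(-1).unsqueeze(-1)   # (B, code_len, 1, 1)
        correction_map = self.to_map(code)                      # same size as feature_maps
        concatenated = torch.cat([feature_maps, correction_map], dim=1)
        return self.decoder(concatenated)

# Usage with a 64x64 RGB face image and a one-hot correction condition.
gen = CorrectionGenerator()
img = torch.randn(1, 3, 64, 64)
cond = torch.tensor([[0.0, 1.0, 0.0, 0.0, 0.0]])
corrected = gen(img, cond)   # -> (1, 3, 64, 64)
```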
  • one or more operations in the process 500 may be omitted.
  • operation 520 and operation 530 may be omitted.
  • the processing device 112A may perform a correction operation on the first facial image directly without determining whether the first facial image satisfies the quality condition.
  • operations 540-560 may be omitted.
  • the processing device 112A may perform one or more other operations on the first facial image based on a determination result relating to the quality of the first facial image. For instance, in response to determining that the first facial image satisfies the quality condition, the processing device 112A may perform a face recognition operation on the first facial image to determine identity information of the target subject.
  • one or more optional operations may be added to the process 500.
  • a storing operation may be added in the process 500.
  • in the storing operation, information and/or data (e.g., the first facial image, the trained correction model, the second facial image) may be stored in a storage device (e.g., the storage device 150) .
  • a recognition operation and/or a verification operation may be added in the process 500.
  • the target subject may be a driver who provides a car-hailing service.
  • the processing device 112A may perform a face recognition operation on the second facial image to determine identity information of the driver. Additionally or alternatively, the processing device 112A may perform a verification operation on the identity information of the driver.
  • registered identity information (e.g., facial image information, license information, ID card information, driving age information, etc. ) of the driver may be stored in a storage device (e.g., the storage device 150) .
  • the processing device 112A may obtain the registered identity information from the storage device. Further, the processing device 112A may determine a verification result by comparing the identity information of the driver with the registered identity information.
  • FIG. 6 is a flowchart illustrating an exemplary process for obtaining a trained image quality determination model according to some embodiments of the present disclosure.
  • the process 600 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application) , and invoked and/or executed by the processing device 112B (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4B) .
  • the trained image quality determination model described in connection with operation 520 in FIG. 5 may be obtained according to the process 600.
  • the process 600 may be performed by another device or system other than the image processing system 100A, e.g., a device or system of a vendor of the trained image quality determination model.
  • the implementation of the process 600 by the processing device 112B is described as an example.
  • the processing device 112B may obtain a preliminary model.
  • the preliminary model may be any type of model to be trained as the trained image quality determination model.
  • the preliminary model may include a convolutional neural network (CNN) model.
  • the preliminary model may include one or more model parameters.
  • Exemplary model parameters may include the number (or count) of layers, the number (or count) of nodes, a loss function F1, or the like, or any combination thereof.
  • Before training, the model parameter (s) may have their respective initial values. The value (s) of the model parameter (s) may be updated in the training process of the preliminary model.
  • the processing device 112B may obtain a first training sample set.
  • the first training sample set may include a plurality of first sample images.
  • Each of the plurality of first sample images may relate to a sample subject (e.g., a sample human face) and have one or more label values of the one or more quality features of the first sample image.
  • the one or more quality features of a first sample image may include a blurring degree of the first sample image, a proportion of a sample subject that is occluded in the first sample image, a brightness of the first sample image, a shooting angle of the first sample image, a completeness of the sample subject in the first sample image, a posture of the sample subject in the first sample image, a norm, or the like, or any combination thereof.
  • the label value of a quality feature of a first sample image may be determined manually by a user (e.g., an operator of the image processing system 100A) .
  • the label value of the blurring degree of the first sample image may be labeled manually.
  • the label value of the quality feature may be determined by the processing device 112B.
  • the label value of the quality feature of the first sample image may be determined based on a machine learning model relating to the quality feature.
  • the label value of the proportion of the sample subject that is occluded in the first sample image may be determined by a trained occlusion proportion determination model.
  • the label value of the posture of the sample subject in the first sample image may be obtained based on a trained posture recognition model.
  • the quality feature of the first sample image may include a norm of the first sample image.
  • the label value of the norm of the first sample image may be determined based on a trained image analysis model.
  • the trained image analysis model may include a trained facial recognition model, such as a trained CNN model.
  • the trained CNN model may include feature extraction layer (s) (e.g., convolutional layer (s) and/or pooling layer (s) ) .
  • the feature extraction layer (s) may be configured to extract one or more complex or deep features of an image for facial recognition. Since the facial recognition result is associated with the quality of the image, the complex or deep features extracted by the feature extraction layer (s) may be regarded as complex or deep quality features that can reflect the image quality of the image.
  • the first sample image may be inputted into the feature extraction layer (s) of the trained CNN model, and the feature extraction layer may be configured to output the values of the complex or deep feature (s) of the first sample image. Then, a norm of the values of the complex or deep feature (s) may be determined as the label value of the norm of the first sample image.
  • the complex or deep quality feature (s) , which are undetectable by humans or traditional quality evaluation approaches, may be extracted and used to determine the label value of the norm of the first sample image.
  • the preliminary model may be trained to perform image quality evaluation from deeper and more complex dimensions, which may improve the accuracy of the image quality determination model trained from the preliminary model.
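  • a sketch of how such a norm label value might be computed is shown below; a torchvision ResNet-18 backbone with its classification layer removed stands in for the trained facial recognition model, no learned weights are loaded, and the function name norm_label and the 224x224 dummy input are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# Stand-in for the trained image analysis (facial recognition) model; in practice
# the trained model of the present disclosure and its learned weights would be used.
backbone = models.resnet18(weights=None)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier
feature_extractor.eval()

def norm_label(sample_image: torch.Tensor) -> float:
    """Label value of the 'norm' quality feature of one first sample image."""
    with torch.no_grad():
        deep_features = feature_extractor(sample_image).flatten(1)  # (1, 512)
    return deep_features.norm(p=2, dim=1).item()   # L2 norm of the deep features

label = norm_label(torch.randn(1, 3, 224, 224))    # dummy 224x224 RGB sample image
```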
  • the processing device 112B may obtain or generate the trained image quality determination model by training the preliminary model using the first training sample set.
  • the training of the preliminary model may include one or more iterations to iteratively update the model parameter (s) of the preliminary model.
  • an exemplary current iteration of the iteration (s) is described below.
  • the current iteration may be performed based on at least a portion of the first sample images.
  • a same set or different sets of sample images may be used in different iterations in training the preliminary model.
  • the sample images used in the current iteration are referred to as target first sample images.
  • an updated preliminary model generated in a previous iteration may be evaluated.
  • the processing device 112B may determine one or more predicted values of the one or more quality features of the target first sample image using the updated preliminary model. The processing device 112B may then determine a value of the loss function F1 of the updated preliminary model based on the predicted value (s) and the label value (s) of each target training sample.
  • the loss function F1 may be used to evaluate the accuracy and reliability of the updated preliminary model; for example, the smaller the value of the loss function F1 is, the more reliable the updated preliminary model may be.
  • Exemplary loss functions F1 may include an L1 loss function, a focal loss function, a log loss function, a cross-entropy loss function, a Dice loss function, an L2 loss function, a mean bias error (MBE) function, a mean square error (MSE) function, etc.
  • the processing device 112B may further update the value (s) of the model parameter (s) of the updated preliminary model to be used in a next iteration based on the value of the loss function F1 according to, for example, a backpropagation algorithm.
  • the one or more iterations may be terminated when a termination condition is satisfied in the current iteration.
  • the termination condition may include, for example, that the value of the loss function F1 obtained in the current iteration is less than a loss threshold, that a difference between the value of the loss function F1 in a previous iteration and the value of the loss function F1 in the current iteration is less than a predetermined threshold, that a certain count of iterations has been performed, etc. If the termination condition is satisfied in the current iteration, the processing device 112B may designate the updated preliminary model (or a portion thereof) as the trained image quality determination model.
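  • the iterative training described above can be sketched as follows; the toy CNN, the mean square error choice for the loss function F1, the optimizer, and the termination thresholds are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Toy preliminary model: a small CNN that predicts values of N quality features.
NUM_FEATURES = 4
preliminary_model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, NUM_FEATURES),
)
optimizer = torch.optim.Adam(preliminary_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                       # one possible choice of the loss function F1

LOSS_THRESHOLD, MAX_ITERATIONS = 1e-3, 100   # assumed termination conditions
prev_loss = None

for iteration in range(MAX_ITERATIONS):
    # Target first sample images and their label values (dummy data here; in
    # practice a batch drawn from the first training sample set).
    images = torch.randn(8, 3, 64, 64)
    label_values = torch.rand(8, NUM_FEATURES)

    predicted_values = preliminary_model(images)
    loss = loss_fn(predicted_values, label_values)   # value of F1 in this iteration

    optimizer.zero_grad()
    loss.backward()                                  # backpropagation
    optimizer.step()                                 # update the model parameter(s)

    # Terminate when F1 is small enough or stops changing noticeably.
    if loss.item() < LOSS_THRESHOLD:
        break
    if prev_loss is not None and abs(prev_loss - loss.item()) < 1e-6:
        break
    prev_loss = loss.item()

trained_image_quality_determination_model = preliminary_model
```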
  • the preliminary model may include a preliminary GAN model (e.g., a C-GAN model) including one or more generators and one or more discriminators.
  • the processing device 112B may generate a trained model by training the preliminary GAN model, wherein the trained model may include one or more trained generators trained from the generator (s) and one or more trained discriminators trained from the discriminator (s) .
  • the processing device 112B may designate one of the trained generator (s) as the trained image quality determination model.
  • the trained image quality determination model may be stored in a memory or a storage device (e.g., the storage device 150) disclosed elsewhere in the present disclosure.
  • the processing device 112A may obtain the trained image quality determination model from the memory or the storage device.
  • the trained image quality determination model may include a trained CNN model.
  • the trained image quality determination model and the trained image analysis model may share one or more convolutional layers.
  • the processing device 112B may further test the trained image quality determination model using a set of testing images. Additionally or alternatively, the processing device 112B may update the trained image quality determination model periodically or irregularly based on one or more newly-generated training samples (e.g., new images generated in a car-hailing service) .
  • FIG. 7 is a flowchart illustrating an exemplary process for determining a weight coefficient according to some embodiments of the present disclosure.
  • the process 700 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application) , and invoked and/or executed by the processing device 112A (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4A) .
  • the process 700 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order of the operations of the process 700 as illustrated in FIG. 7 and described below is not intended to be limiting. In some embodiments, one or more operations of the process 700 may be performed to achieve at least part of operation 530 as described in connection with FIG. 5.
  • the processing device 112A may obtain a test sample set.
  • the test sample set may include a plurality of second sample images.
  • each of the plurality of second sample images may have an evaluation label indicating whether the second sample image is suitable for facial analysis.
  • the evaluation label may be a number in the range of 0 to 1. The larger the number of the evaluation label is, the more suitable the second sample image may be for facial analysis.
  • the evaluation label may be determined or confirmed by a user manually.
  • the processing device 112A may determine a preliminary weight coefficient of each of the one or more quality features.
  • the weight coefficient of a quality feature may indicate an importance of the quality feature in image quality evaluation.
  • the preliminary weight coefficient of the quality feature refers to an initial value of the weight coefficient of the quality feature.
  • the preliminary weight coefficient may be determined according to a default setting of the image processing system 100A, set manually by a user of the image processing system 100A, or determined by the processing device 112A according to an actual need. For example, the processing device 112A may randomly assign a preliminary weight coefficient to each quality feature.
  • the preliminary weight coefficients of different quality features may be the same or different.
  • the preliminary weight coefficient of each of the one or more quality features may be determined as 0.1.
  • a sum of the preliminary weight coefficient (s) of the one or more quality features may be equal to 1.
  • the processing device 112A may determine a sample value of each of one or more quality features of the second sample image.
  • the processing device 112A may determine a sample value of each quality feature of a second sample image based on a trained quality determination model.
  • operation 730 may be performed in a similar manner as operation 520 as described in connection with FIG. 5, and the descriptions thereof are not repeated here.
  • the processing device 112A may determine, based on the preliminary weight coefficient of the quality feature and a Bayes algorithm, an optimized weight coefficient of the quality feature.
  • the processing device 112A may determine a quality prediction value of the second sample image based on the preliminary weight coefficient and the sample value of each quality feature.
  • the determination of the quality prediction value may be performed in a similar manner as that of the quality evaluation value as described in connection with operation 530.
  • the processing device 112A may determine a weighted average value of the sample value (s) of the one or more quality features based on the preliminary weight coefficient (s) .
  • the weighted average value may be determined as the quality prediction value of the second sample image.
  • the processing device 112A may determine an accuracy of the preliminary weight coefficient (s) of the quality feature (s) .
  • the accuracy of the preliminary weight coefficient (s) may be measured by a pass rate of the plurality of second sample images. For example, for each second sample image, the processing device 112A may compare the quality prediction value with the evaluation label of the second sample image. If the difference between the quality prediction value and the evaluation label of a second sample image is less than a threshold, the second sample image may be determined to be passed.
  • the processing device 112A may select one or more second sample image (s) that can be recognized by a trained face recognition model.
  • the trained face recognition model may be similar to the (trained) image analysis model elsewhere (e.g., in FIG. 6 or FIG. 7) in the present disclosure. If a second sample image can be recognized by the trained face recognition model, the second sample image may be determined to be passed. Then the processing device 112A may determine a ratio of the count of the passed second sample image (s) to the total count of the second sample images as the pass rate of the plurality of second sample images.
  • the processing device 112A may then update the preliminary weight coefficient (s) of the one or more quality features based on the pass rate of the second sample images. For example, the processing device 112A may update the preliminary weight coefficient (s) of the one or more quality features based on the pass rate using a gradient descent algorithm. In some embodiments, the preliminary weight coefficient (s) may be iteratively updated until a specific condition is satisfied in a certain iteration.
  • the specific condition may include, for example, that the pass rate of the second sample images in the current iteration is larger than a pass rate threshold, that a difference between the quality prediction value and the evaluation label of each second sample image obtained in a current iteration is less than a loss threshold, that a certain count of iterations has been performed, etc. If the specific condition is satisfied in the current iteration, the processing device 112A may determine the updated weight coefficient (s) in the current iteration as the optimized weight coefficient (s) of the one or more quality features.
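  • the sketch below illustrates the pass-rate-driven optimization of the weight coefficients; because the pass rate is not differentiable, a simple accept-if-better random perturbation is used here as a stand-in for the gradient-based (or Bayes-based) update described above, and all sample values, evaluation labels, and thresholds are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy test sample set: sample values of 4 quality features per second sample
# image, plus an evaluation label in [0, 1] (all values are made up).
sample_values = rng.random((200, 4))
evaluation_labels = rng.random(200)

def pass_rate(weights: np.ndarray, diff_threshold: float = 0.2) -> float:
    """Fraction of second sample images whose quality prediction value is close
    enough to the evaluation label (the 'passed' images)."""
    predictions = sample_values @ weights / weights.sum()   # weighted average
    return float(np.mean(np.abs(predictions - evaluation_labels) < diff_threshold))

# Preliminary weight coefficients (e.g., equal weights summing to 1).
weights = np.full(4, 0.25)
best_rate = pass_rate(weights)

PASS_RATE_THRESHOLD, MAX_ITERATIONS = 0.9, 500
for _ in range(MAX_ITERATIONS):
    if best_rate >= PASS_RATE_THRESHOLD:
        break
    # Random perturbation as a simple stand-in for the gradient-based update.
    candidate = np.clip(weights + rng.normal(scale=0.05, size=4), 1e-6, None)
    candidate /= candidate.sum()
    candidate_rate = pass_rate(candidate)
    if candidate_rate > best_rate:                 # keep the better weights
        weights, best_rate = candidate, candidate_rate

optimized_weight_coefficients = weights
```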
  • the processing device 112A may determine the optimized weight coefficient (s) based on other algorithm (s) (e.g., the SVM algorithm, the grid search algorithm, and/or the DNN algorithm as described in connection with FIG. 5) .
  • FIG. 8 is a flowchart illustrating an exemplary process for obtaining a trained correction model according to some embodiments of the present disclosure.
  • the processing device 112B may be implemented on, for example, the computing device 200 shown in FIG. 2.
  • the process 800 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application) , and invoked and/or executed by the processing device 112B (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4B) .
  • process 800 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order of the operations of the process 800 as illustrated in FIG. 8 and described below is not intended to be limiting.
  • the trained correction model described in connection with operation 550 and/or operation 560 in FIG. 5 may be obtained according to the process 800.
  • the process 800 may be performed by another device or system other than the image processing system 100A, e.g., a device or system of a vendor of the trained correction model.
  • the implementation of the process 800 by the processing device 112B is described as an example.
  • the processing device 112B may obtain a second training sample set.
  • the second training sample set may include a plurality of image pairs. Each of the plurality of image pairs may include a third sample image and a fourth sample image of a same sample face.
  • the third sample image may satisfy a first correction condition, and the fourth sample image may satisfy a second correction condition.
  • the first correction condition and/or the second correction condition may relate to an orientation of the sample face.
  • the first correction condition may be that the side of the sample face is illustrated in the image, and the second correction condition may be that the front of the sample face is illustrated in the image.
  • the second correction condition may be the same as the correction condition as described in connection with FIG. 5.
  • the processing device 112B may train, based on the second training sample set, a second preliminary model by optimizing a loss function F2.
  • the second preliminary model may be any type of model to be trained as the correction model.
  • the second preliminary model may include a convolutional neural network (CNN) model.
  • the second preliminary model may include one or more second model parameters.
  • Exemplary second model parameters may include the number (or count) of layers, the number (or count) of nodes, the loss function F2, or the like, or any combination thereof.
  • the second model parameter (s) may have their respective initial values.
  • the value (s) of the second model parameter (s) may be updated in the training process of the second preliminary model.
  • the second preliminary model may include or be a C-GAN model.
  • FIG. 9 illustrates a schematic diagram of an exemplary C-GAN model 900 according to some embodiments of the present disclosure.
  • the C-GAN model 900 may include one or more preliminary generators (e.g., a first generator, and a second generator illustrated in FIG. 9) and/or one or more preliminary discriminators (e.g., a first discriminator, and a second discriminator illustrated in FIG. 9) .
  • the preliminary generator may be configured to process an initial image of each target training sample and a correction condition to output a predicted image that satisfies the correction condition, and the preliminary discriminator may be configured to generate a discrimination result between the predicted image (i.e., fake data) and a sample image (i.e., true data) of each target training sample.
  • the preliminary generator may be trained to generate a predicted image similar enough to the sample image that the preliminary discriminator determines the predicted image is not synthesized.
  • the preliminary discriminator may be trained to improve its ability to distinguish the preliminary generator’s fake data from the true data.
  • the first discriminator may be configured to determine whether an image is a real image or a fake image.
  • the real image refers to an image acquired by an acquisition device.
  • the fake image refers to an image generated by a generator (e.g., the first generator) .
  • the first discriminator may be further configured to determine whether two images correspond to a same subject. For example, the first discriminator may determine a similarity between the two images by calculating a distance (e.g., a Euclidean distance, a cosine distance, etc. ) between the two images. Further, the first discriminator may determine whether the two images correspond to the same target subject based on the distance.
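  • a minimal sketch of such a distance-based same-subject check is shown below, assuming the two images have already been mapped to feature embeddings; the embedding size and the distance threshold are illustrative assumptions:

```python
import numpy as np

def same_subject(embedding_a: np.ndarray, embedding_b: np.ndarray,
                 threshold: float = 0.5) -> bool:
    """Decide whether two facial images correspond to a same subject by comparing
    the cosine distance between their feature embeddings with a threshold."""
    cosine_similarity = embedding_a @ embedding_b / (
        np.linalg.norm(embedding_a) * np.linalg.norm(embedding_b))
    cosine_distance = 1.0 - cosine_similarity
    return cosine_distance < threshold

a, b = np.random.rand(128), np.random.rand(128)   # dummy 128-d face embeddings
print(same_subject(a, b))
```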
  • the first generator may be configured to generate a first predicted image based on the third sample image of an image pair and the second correction condition (e.g., a correction angle) .
  • the first generator may include an encoding component (or referred to as an encoder) , a decoding component (or referred to as a decoder) , and a transformation component.
  • the encoding component may be configured to generate one or more first feature maps based on the third sample image.
  • the transformation component may be configured to generate a correction map by transforming the second correction condition.
  • the correction map and the first feature map (s) may be concatenated (or combined) into a concatenated map, and the concatenated map may be inputted into the decoding component.
  • the decoding component may then be configured to generate the first predicted image by decoding the concatenated map.
  • the first discriminator may be configured to generate a first discrimination result between the first predicted image generated by the first generator and the fourth sample image of each image pair.
  • the discrimination result may indicate which one of the first predicted image and the fourth sample image of each pair is true data.
  • the first discriminator may be configured to determine whether the first predicted image generated by the first generator and the third sample image of each image pair correspond to a same subject.
  • the second generator may be configured to generate a second predicted image based on the first predicted image and the first correction condition.
  • the first correction condition may relate to the second correction condition.
  • the first correction condition may be opposite to the second correction condition.
  • for example, if the second correction condition includes a correction angle (e.g., a yaw angle) of 15°, the first correction condition may include a correction angle of -15°.
  • the first generator and the second generator may be a same model.
  • the first generator and the second generator may be a same CNN model. In such cases, the first generator and the second generator may share the same model parameters.
  • a loss function (e.g., the first loss function) of the first generator may be the same as a loss function (e.g., the third loss function) of the second generator.
  • the second discriminator may be configured to generate a second discrimination result between the second predicted image and the third sample image of each image pair.
  • the second discrimination result may be similar to the first discrimination result as aforementioned.
  • the first discriminator and the second discriminator may be a same model.
  • the first discriminator and the second discriminator may be a same classification model (e.g., a bidirectional encoder representations from transformers (BERT) model, a neural network model, a Fasttext model, etc. ) .
  • the first discriminator and the second discriminator may share same model parameters.
  • a loss function (e.g., the second loss function) of the first discriminator may be the same as a loss function (e.g., the fourth loss function) of the second discriminator.
  • the first generator and the first discriminator may form a first path of the C-GAN model.
  • the second generator and the second discriminator may form a second path of the C-GAN model.
  • An image generated by the first generator in the first path may be used as part of the input of the second generator so as to obtain an inverse solution of the output of the first generator.
  • the second path may be used as a feedback path of the first path.
  • a plurality of image features may be obtained through the first path and the second path, which may improve the accuracy of a predicted image generated by the first generator.
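  • continuing the ConditionalGenerator sketch above, the two paths can be written as two chained calls, with the second path feeding the output of the first path back toward the original view. The condition values and tensor shapes below are assumptions only.

```python
# Sketch of the two paths of the C-GAN model: the first path predicts a
# corrected image under the second correction condition, and the second path
# maps that prediction back under the opposite (first) correction condition,
# acting as a feedback path. Placeholder data; shapes are assumptions.
import torch

g1 = ConditionalGenerator()        # first generator (first path)
g2 = ConditionalGenerator()        # second generator (second path); may be the same model as g1

cond_2 = torch.tensor([[15.0]])    # second correction condition, e.g., a yaw angle of 15 degrees
cond_1 = -cond_2                   # first correction condition, the opposite angle

x_s = torch.randn(1, 3, 64, 64)    # third sample image (e.g., a side face), placeholder tensor
pred_1 = g1(x_s, cond_2)           # first predicted image (e.g., a front face)
pred_2 = g2(pred_1, cond_1)        # second predicted image, ideally reconstructing x_s
```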
  • the loss function of the C-GAN model 900 may include one or more of a first loss function relating to the first generator, a second loss function relating to the first discriminator, a third loss function relating to the second generator, and a fourth loss function relating to the second discriminator.
  • the training of the second preliminary model may include an iterative operation.
  • the second model parameter (s) of the second preliminary model may be iteratively updated in the iterative operation.
  • the iterative operation may include one or more second iterations.
  • FIG. 10 illustrates a schematic diagram of an exemplary current second iteration in an iterative operation for training a second preliminary model according to some embodiments of the present disclosure.
  • the processing device 112B may obtain an updated second preliminary model generated in a previous second iteration.
  • the updated second preliminary model may include an updated first generator, an updated second generator, an updated first discriminator, and an updated second discriminator.
  • the processing device 112B may generate, based on the third sample image of the image pair and the second correction condition, a first predicted image using the updated first generator.
  • the processing device 112B may input the third sample image of the image pair and the second correction condition into the updated first generator.
  • the updated first generator may output the first predicted image that satisfies the second correction condition.
  • the second correction condition may be that the front face is illustrated in the image.
  • a third sample image of a side face of a driver and the second correction condition may be inputted to the updated first generator to obtain a first predicted image illustrating a front face of the driver.
  • operation 1020 may be performed in a similar manner as operation 560 as described in connection with FIG. 5, and the descriptions thereof are not repeated here.
  • the processing device 112B may generate, based on the first predicted image and the fourth sample image of the image pair, a first discrimination result using the updated first discriminator.
  • the first discrimination result may indicate whether the first predicted image of the image pair is a fake image or a real image.
  • the first discrimination result may indicate whether the first predicted image and the fourth sample image of the image pair correspond to a same subject.
  • the updated first discriminator may determine a similarity between the first predicted image and the fourth sample image, and compare the similarity with a similarity threshold. In response to determining that the similarity is larger than or equal to the similarity threshold, the updated first discriminator may generate a first discrimination result indicating that the first predicted image and the fourth sample image correspond to a same subject.
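  • a minimal sketch of this similarity check is shown below; the embedding network and the threshold value are assumptions, not parameters fixed by the disclosure.

```python
# Sketch of the same-subject check: embed both images, compute a cosine
# similarity, and compare it with a similarity threshold. The embedding
# function and the threshold of 0.5 are illustrative assumptions.
import torch
import torch.nn.functional as F

def same_subject(embed, first_predicted_image, fourth_sample_image, threshold=0.5):
    feat_a = embed(first_predicted_image)      # feature vector(s), shape (batch, dim)
    feat_b = embed(fourth_sample_image)
    similarity = F.cosine_similarity(feat_a, feat_b, dim=1)
    # discrimination result: same subject if the similarity clears the threshold
    return similarity >= threshold
```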
  • the processing device 112B may generate, based on the first predicted image and the first correction condition of the image pair, a second predicted image using the updated second generator.
  • the processing device 112B may input the first predicted image of the image pair and the first correction condition into the updated second generator.
  • the updated second generator may output the second predicted image that satisfies the first correction condition.
  • the first correction condition may be that the side face is illustrated in the image.
  • a first predicted image of a front face of a driver and the first correction condition may be inputted to the updated second generator to obtain a second predicted image illustrating a side face of the driver.
  • operation 1040 may be performed in a similar manner as operation 560 as described in connection with FIG. 5, and the descriptions thereof are not repeated here.
  • the processing device 112B may generate, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated second discriminator.
  • the second discrimination result may indicate whether the second predicted image of the image pair is a fake image or a real image.
  • the second discrimination result may indicate whether the second predicted image and the third sample image of the image pair correspond to a same subject.
  • the processing device 112B may determine a value of the loss function F2 based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs.
  • the loss function F2 may include one or more of a first loss function relating to the (updated) first generator, a second loss function relating to the (updated) first discriminator, a third loss function relating to the (updated) second generator, and a fourth loss function relating to the (updated) second discriminator.
  • the processing device 112B may determine the value of the loss function F2 based on one or more of a first loss value of the first loss function, a second loss value of the second loss function, a third loss value of the third loss function, and a fourth loss value of the fourth loss function. More descriptions regarding the determination of the value of the loss function F2 may be found elsewhere in the present disclosure (e.g., FIG. 11 and the descriptions thereof) .
  • the processing device 112B may evaluate the updated second preliminary model based on the value of the loss function F2.
  • the processing device 112B may determine whether the updated second preliminary model satisfies an evaluation condition.
  • the evaluation condition may include, for example, that the value of the loss function F2 obtained in the current second iteration is less than a second loss threshold, that a difference between the value of the loss function F2 in a previous second iteration and the value of the loss function F2 in the current second iteration is less than a second predetermined threshold, that a certain count of iterations has been performed, or the like, or any combination thereof.
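  • a minimal sketch of such an evaluation condition is given below; the concrete threshold values are assumptions.

```python
# Sketch of the evaluation condition: stop when the loss is small enough,
# barely changes between second iterations, or a maximum iteration count is
# reached. The threshold values are illustrative assumptions.
def satisfies_evaluation_condition(loss, prev_loss, iteration,
                                   loss_threshold=0.01,
                                   delta_threshold=1e-4,
                                   max_iterations=10000):
    if loss < loss_threshold:
        return True
    if prev_loss is not None and abs(prev_loss - loss) < delta_threshold:
        return True
    return iteration >= max_iterations
```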
  • the processing device 112B may determine that the iterative operation can be terminated.
  • the processing device 112B may further designate the trained first generator or the trained second generator as the trained correction model.
  • the processing device 112B may determine to perform a next second iteration until the updated second preliminary model satisfies the evaluation condition. For example, the processing device 112B may update the value (s) of the second model parameter (s) of the updated second preliminary model to be used in a next second iteration based on the value of the loss function F2 according to, for example, a backpropagation algorithm.
  • the updated first generator and the updated second generator may be the same model.
  • the processing device 112B may generate, based on the first predicted image and the first correction condition of the image pair, the second predicted image using the updated first generator.
  • the updated first discriminator and the updated second discriminator may be the same model.
  • the processing device 112B may generate, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated first discriminator.
  • FIG. 11 is a flowchart illustrating an exemplary process for determining a value of a loss function F2 according to some embodiments of the present disclosure.
  • the processing device 112B may be implemented in, for example, the computing device 200 shown in FIG. 2.
  • the process 1100 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application) , and invoked and/or executed by the processing device 112B (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4B) .
  • process 1100 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of the process 1100 are illustrated in FIG. 11 and described below is not intended to be limiting.
  • the loss function F2 may be of any type of loss function, such as a mean square error loss function, a cross entropy loss function, an exponential loss function, etc.
  • the cross entropy loss function may be taken as an example in the present disclosure.
  • the processing device 112B may determine, based on the third sample image and the first predicted image of each of the plurality of image pairs, a first loss value of the first loss function.
  • the first generator may be trained to generate a first predicted image of an image pair as close to the fourth sample image of the image pair (i.e., a real image) as possible to fool the first discriminator (e.g., make the first discriminator determine that the first predicted image is a real image) .
  • the first loss function may include a first component for training the first generator to fool the first discriminator.
  • the first component may be represented as formula (1) as below:
  • the discrimination result regarding the first predicted image may include a discrimination value, which may be equal to 1 if the first discriminator determines that the first predicted image is a real image or 0 if the first discriminator determines that the first predicted image is a fake image.
  • the first loss function may be optimized by maximizing the first component.
  • the first generator may be trained to generate a first predicted image as close to the original third sample image of the image pair as possible, so as to make the first discriminator determine that the first predicted image and the third sample image correspond to a same subject.
  • the first loss function may include a second component for training the first generator to generate the first predicted image as close to the original third sample image as possible.
  • the second component may be represented as formula (2) as below:
  • D (x_s) denotes a discrimination result regarding the third sample image
  • the discrimination result regarding the third sample image may include a discrimination value, which may be equal to 1 if the first discriminator determines that the third sample image is a real image or 0 if the first discriminator determines that the third sample image is a fake image.
  • the first loss function may be optimized by minimizing the second component.
  • the second component may be determined as a distance between the third sample image and the first predicted image in a feature space. For example, the distance may be a Euclidean distance between the third sample image and the first predicted image in the Euclidean space.
  • the first loss function may include both the first component and the second component, and be represented as formula (3) as below:
  • the discrimination result regarding the first predicted image and the discrimination result regarding the third sample image (i.e., D (x_s) ) may be determined using the updated first discriminator or the updated second discriminator.
  • the first loss value of the first loss function may be determined according to the formula (3) based on the discrimination results regarding the first predicted image and the third sample image.
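  • the formulas referenced as (1) through (3) are not reproduced in this text. For reference only, a standard form consistent with the description above may be written as follows, where G denotes the first generator, D the first discriminator, x_s the third sample image, c the second correction condition, φ a feature mapping used to measure the distance in a feature space, and λ a balancing coefficient; this is an assumed reconstruction, not the formulas actually claimed:

\[
\mathcal{L}_{G,1} = \log D\big(G(x_s, c)\big), \qquad
\mathcal{L}_{G,2} = \big\lVert \phi(x_s) - \phi\big(G(x_s, c)\big) \big\rVert_2, \qquad
\mathcal{L}_{G} = \mathcal{L}_{G,2} - \lambda\,\mathcal{L}_{G,1}
\]

  • here the first component \(\mathcal{L}_{G,1}\) is maximized, the second component \(\mathcal{L}_{G,2}\) is minimized, and the combined first loss \(\mathcal{L}_{G}\) is minimized, consistent with the statements above.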
  • the processing device 112B may determine, based on the first discrimination result of each of the plurality of image pairs, a second loss value of the second loss function.
  • the second loss function may be used to improve the ability of the first discriminator to distinguish the first predicted image of an image pair from the fourth sample image of the image pair.
  • the second loss function may be represented as formula (4) as below:
  • the second loss function may be optimized by maximizing the probability of determining that the fourth sample image is a real image and maximizing the probability of determining that the first predicted image is a fake image.
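  • formula (4) is likewise not reproduced in this text. As an assumption, a standard discriminator loss consistent with the description above would be

\[
\mathcal{L}_{D} = \log D(x_t) + \log\big(1 - D\big(G(x_s, c)\big)\big),
\]

  • where x_t denotes the fourth sample image; the second loss function is then optimized by maximizing \(\mathcal{L}_{D}\), consistent with the statement above.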
  • the processing device 112B may determine, based on the third sample image and the second predicted image of each of the plurality of image pairs, a third loss value of the third loss function.
  • the first generator and the second generator may be a same model.
  • the third loss value of the third loss function may be determined in a similar manner as the first loss value of the first loss function.
  • the second generator may be trained to generate a second predicted image of an image pair as close to the third sample image of the image pair (i.e., a real image) as possible to fool the second discriminator (e.g., make the second discriminator determine that the second predicted image is a real image) .
  • the third loss function may include a third component for training the second generator to fool the second discriminator.
  • the third component may be represented as formula (5) as below:
  • D′ denotes the second discriminator
  • G′ denotes the second generator
  • the third component of the third loss function corresponds to a probability that the second discriminator determines that the second predicted image is a real image in the second discrimination result.
  • the discrimination result regarding the second predicted image may include a discrimination value, which may be equal to 1 if the second discriminator determines that the second predicted image is a real image or 0 if the second discriminator determines that the second predicted image is a fake image.
  • the third loss function may be optimized by maximizing the third component.
  • the second predicted image may be regarded as a reconstructed image corresponding to the third sample image.
  • the second generator may be trained to generate a second predicted image as close to the third sample image of the image pair as possible to make the second discriminator determine that the second predicted image and the third sample image correspond to a same subject.
  • the third loss function may include a fourth component for training the second generator to generate the second predicted image as close to the original third sample image as possible.
  • the fourth component may be represented as formula (6) as below:
  • the third loss function may be optimized by minimizing the fourth component.
  • the fourth component may be determined as a distance between the third sample image and the second predicted image in a feature space.
  • the distance may be a Euclidean distance between the third sample image and the second predicted image in the Euclidean space.
  • the third loss function may include both the third component and the fourth component, and be represented as formula (7) as below:
  • the discrimination result regarding the second predicted image and the discrimination result regarding the third sample image (i.e., D′ (x_s) ) may be determined using the updated second discriminator.
  • the third loss value of the third loss function may be determined according to the formula (7) based on the discrimination results regarding the second predicted image and the third sample image.
  • the processing device 112B (e.g., the model generation module 450) may determine, based on the second discrimination result of each of the plurality of image pairs, a fourth loss value of the fourth loss function.
  • the fourth loss function may be used to improve the ability of the second discriminator to distinguish the second predicted image of an image pair from the third sample image of the image pair.
  • the fourth loss function may be represented as formula (8) as below:
  • the fourth loss function may be optimized by maximizing the probability of determining that the third sample image is a real image and maximizing the probability of determining that the second predicted image is a fake image.
  • the processing device 112B may determine the value of the loss function F2 based on the first loss value, the second loss value, the third loss value, and the fourth loss value using a weighted sum algorithm.
  • the processing device 112B may determine the value of the loss function F2 using a weighted sum algorithm. For example, the processing device 112B may determine a weight coefficient for each of the first loss value, the second loss value, the third loss value, and the fourth loss value. Further, the processing device 112B may determine a weighted sum of the first loss value, the second loss value, the third loss value, and the fourth loss value. The processing device 112B may further designate the weighted sum as the value of the loss function F2.
  • the weight coefficient corresponding to each of the first loss value, the second loss value, the third loss value, and the fourth loss value may be equal to 1.
  • the value of the loss function F2 may be represented as formula (9) below:
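  • formula (9) is not reproduced in this text; under the weighted-sum description above it may be written, as an assumption, as

\[
F_2 = w_1 L_1 + w_2 L_2 + w_3 L_3 + w_4 L_4,
\]

  • where L_1 through L_4 denote the first through fourth loss values and w_1 through w_4 the corresponding weight coefficients, with w_1 = w_2 = w_3 = w_4 = 1 in the equal-weight case described above.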
  • the weight coefficients of two or more of the first loss value, the second loss value, the third loss value, and the fourth loss value may be different.
  • the weight coefficient of the fourth loss value may be larger than an average of the weighting coefficients of the first loss value, the second loss value, and the third loss value.
  • the fourth loss value may have a relatively larger impact on the value of the loss function F2 than other loss values.
  • the large impact of the fourth loss value on the value of the loss function F2 may in turn drive the second generator, the first discriminator, and the first generator to improve their performance. Accordingly, a first generator with relatively high performance may be obtained through the training process, thereby optimizing the loss function F2 more effectively.
  • the first discriminator and the second discriminator may be a same model.
  • the first discriminator and the second discriminator may share the same loss function. That is, the fourth loss function illustrated in formula (8) may be omitted.
  • the value of the loss function F2 may be represented as formula (10) as below:
  • the processing device 112B may further update the second model parameters of the updated second preliminary model based on the value of the loss function F2 according to, for example, a stochastic gradient descent backpropagation algorithm. In some embodiments, the processing device 112B may update each of the first generator, the second generator, the first discriminator, the second discriminator of the updated second preliminary model based on the value of the loss function F2 in the current second iteration.
  • the first generator of the updated second preliminary model may be updated based on the first loss function; the second generator of the updated second preliminary model may be updated based on the third loss function; the first discriminator of the updated second preliminary model may be updated based on the second loss function; and the second discriminator of the updated second preliminary model may be updated based on the fourth loss function.
  • the processing device 112B may update part of the first generator, the second generator, the first discriminator, and the second discriminator of the updated second preliminary model in the current second iteration. For example, in the current second iteration, the processing device 112B may merely update model parameters of the first generator and/or the second generator based on the value of the loss function F2 (or a combination of the first loss value and the third loss value) . In a next second iteration, the processing device 112B may merely update model parameters of the first discriminator and/or the second discriminator based on the value of the loss function F2 (or a combination of the second loss value and the fourth loss value) determined in the next second iteration.
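  • the alternating update scheme described above can be sketched as follows; the toy generator and discriminator modules, the loss forms, and the data are placeholders rather than the claimed architecture, and the correction conditions are omitted for brevity.

```python
# Sketch of alternating updates: one second iteration steps only the
# generators (first and third loss values), the next steps only the
# discriminators (second and fourth loss values). Everything below is a
# simplified placeholder.
import torch
import torch.nn as nn

g1 = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())     # stands in for the first generator
g2 = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())     # stands in for the second generator
d1 = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())  # stands in for the first discriminator
d2 = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())  # stands in for the second discriminator

opt_g = torch.optim.Adam(list(g1.parameters()) + list(g2.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(list(d1.parameters()) + list(d2.parameters()), lr=2e-4)

for iteration in range(4):                       # a few toy second iterations
    x_s = torch.randn(1, 3, 64, 64)              # third sample image (placeholder)
    x_t = torch.randn(1, 3, 64, 64)              # fourth sample image (placeholder)
    pred_1 = g1(x_s)                             # first predicted image
    pred_2 = g2(pred_1)                          # second predicted image
    if iteration % 2 == 0:
        # generator step: fool the discriminators and reconstruct x_s
        loss = (-torch.log(d1(pred_1) + 1e-8).mean()
                - torch.log(d2(pred_2) + 1e-8).mean()
                + (pred_2 - x_s).abs().mean())
        opt_g.zero_grad(); loss.backward(); opt_g.step()
    else:
        # discriminator step: separate real sample images from predicted images
        loss = (-torch.log(d1(x_t) + 1e-8).mean()
                - torch.log(1 - d1(pred_1.detach()) + 1e-8).mean()
                - torch.log(d2(x_s) + 1e-8).mean()
                - torch.log(1 - d2(pred_2.detach()) + 1e-8).mean())
        opt_d.zero_grad(); loss.backward(); opt_d.step()
```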
  • a formula described above may be modified according to an actual need.
  • the formula may include one or more additional coefficients, and/or one or more coefficients described above may be omitted.
  • aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc. ) , or in an implementation combining software and hardware, which may all generally be referred to herein as a “unit, ” “module, ” or “system. ” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer-readable program code embodied thereon.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electromagnetic, optical, or the like, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present disclosure may be written in a combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB. NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS) .

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Systems and methods for image processing. The methods may include obtaining a first facial image (510); determining, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image (520); determining, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition (530); obtaining a correction condition of the first facial image (540); obtaining a trained correction model, wherein the trained correction model includes a trained first generator (550); and in response to determining that the first facial image satisfies the quality condition, generating a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition (560).

Description

SYSTEMS AND METHODS FOR IMAGE PROCESSING
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 202010073524.9 filed on January 22, 2020, and Chinese Patent Application No. 202010176670.4 filed on March 13, 2020, the contents of each of which are incorporated herein by reference in their entirety.
TECHNICAL FIELD
The present disclosure generally relates to image processing technology, and in particular, to systems and methods for processing facial images.
BACKGROUND
With the development of science and technology, the face recognition technology has been widely used in various scenarios including, for example, car-hailing, finance, access control, or the like. Commonly, a facial image of a subject may be captured by an image capturing device (e.g., a camera) . Further, a face recognition operation may be performed on the facial image for identifying or verifying the subject’s identity. However, in some situations, the facial image may have a poor quality, which may increase the difficulty and reduce the accuracy of face recognition. For example, the facial image may be captured from an undesired shooting angle, or have a blur and/or an occlusion on the face of the subject. Therefore, it is desirable to provide improved systems and methods for processing facial images, thereby improving the efficiency and the accuracy of face recognition.
SUMMARY
An aspect of the present disclosure relates to a method for image processing. The method may be implemented on a computing device having at least one storage device storing a set of instructions, and at least one processor in communication with the at least one storage device. The method may include obtaining a first facial image; determining, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image; determining, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition; obtaining a correction condition of the first facial image; obtaining a trained correction model, wherein the trained correction model includes a trained first generator; and in response to determining that the first facial image satisfies the quality condition, generating a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
In some embodiments, the one or more quality features of the first facial image may include a norm, and the trained image quality determination model may be trained according to a model training process including obtaining a preliminary model; obtaining a first training sample set including a plurality of first sample images. Each of the plurality of first sample images may have one or more label values of the one or more quality features of the first sample image, and the label value of the norm may be determined based on a trained image analysis model. And the model training process may further include obtaining the trained image quality determination model by training the preliminary model using the first training sample set.
In some embodiments, at least one of the trained image quality determination model or the trained image analysis model may include a trained convolutional neural network model.
In some embodiments, the trained image quality determination model and the trained image analysis model may share one or more convolutional layers.
In some embodiments, the determining, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition may include determining a weight coefficient of each of the one or more quality features; determining, based on the one or more weight coefficients and the one or more values of the one or more quality features, a quality evaluation value of the first facial image; and determining whether the first facial image satisfies the quality condition based on the quality evaluation value.
In some embodiments, the determining the weight coefficient of each of the one or more quality features may include obtaining a test sample set including a plurality of second sample images; determining a preliminary weight coefficient of each of the one or more quality features; for each of the plurality of second sample images, determining, based on the trained quality determination model, a sample value of each of one or more quality features of the second sample image; and for each of the one or more quality features, determining, based on the preliminary weight coefficient of the quality feature and a Bayes algorithm, an optimized weight coefficient of the quality feature.
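As a rough, non-limiting illustration of the weighted quality evaluation described in the preceding paragraphs (the Bayes-based optimization of the weight coefficients is omitted), a quality evaluation value may be computed as a weighted sum of the quality feature values and compared against a threshold; the feature names, weight values, and threshold in the sketch below are assumptions only.

```python
# Sketch of the quality evaluation value as a weighted sum of quality feature
# values. Feature names, weights, and the threshold are illustrative
# assumptions, not values fixed by the disclosure.
def quality_evaluation_value(feature_values, weight_coefficients):
    return sum(weight_coefficients[name] * value
               for name, value in feature_values.items())

features = {"blurring_degree": 0.9, "occlusion": 0.8, "brightness": 0.7, "norm": 0.85}
weights = {"blurring_degree": 0.3, "occlusion": 0.3, "brightness": 0.2, "norm": 0.2}
satisfies_quality_condition = quality_evaluation_value(features, weights) >= 0.75
```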
In some embodiments, the one or more quality features of the first facial image may include a blurring degree of the first facial image, a proportion of a target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target subject in the first facial image, a posture of the target subject in the first facial image, or the like, or any combination thereof.
In some embodiments, the trained first generator may be a sub-model of a trained conditional Generative Adversarial network (C-GAN) model, the trained C-GAN model may further include a trained second generator, a trained first discriminator, and a trained second discriminator. And the trained correction model may be generated according to a model training process including obtaining a second training sample set including a plurality of image pairs. Each of the plurality of image pairs may include a third sample image and a fourth sample image of a same sample face, the third sample image may satisfy a first correction condition, and the fourth sample image may satisfy a second correction condition. And the model training process may further include training, based on the second training sample set, a second preliminary model by optimizing a loss function. The second preliminary model may include a first generator, a second generator, a first discriminator, and a second discriminator. And the loss function may include a first loss function relating to the first generator, a second loss function relating to the first discriminator, a third loss function relating to the second generator, and a fourth loss function relating to the second discriminator.
In some embodiments, the training a second preliminary model by optimizing a loss function includes an iterative operation including one or more iterations, and at least one of the one or more iterations may include obtaining an updated second preliminary model generated in a previous iteration. The updated second preliminary model may include an updated first generator, an updated second generator, an updated first discriminator, and an updated second discriminator. The at least one of the one or more iterations may further include for each of the plurality of image pairs, generating, based on the third sample image of the image pair and the second correction condition, a first predicted image using the updated first generator; generating, based on the first predicted image and the fourth sample image of the image pair, a first discrimination result using the updated first discriminator; generating, based on the first predicted image and the first correction  condition of the image pair, a second predicted image using the updated second generator; and generating, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated second discriminator. The at least one of the one or more iterations may further include determining a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs; and evaluating the updated second preliminary model based on the value of the loss function.
In some embodiments, the determining a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs may include determining, based on the third sample image and the first predicted image of each of the plurality of image pairs, a first loss value of the first loss function; determining, based on the first discrimination result of each of the plurality of image pairs, a second loss value of the second loss function; determining, based on the third sample image and the second predicted image of each of the plurality of image pairs, a third loss value of the third loss function; determining, based on the second discrimination result of each of the plurality of image pairs, a fourth loss value of the fourth loss function; and determining the value of the loss function based on the first loss value, the second loss value, the third loss value, and the fourth loss value using a weighted sum algorithm.
In some embodiments, in the weighted sum algorithm, a weight coefficient corresponding to the fourth loss value may be larger than an average of the weighting coefficients corresponding to the first loss value, the second loss value, and the third loss value.
In some embodiments, the first generator and the second generator may be the same model.
In some embodiments, the first discriminator and the second discriminator may be the same model.
In some embodiments, the first facial image may include a human face, and the correction condition of the first facial image may relate to an orientation of the human face.
An aspect of the present disclosure relates to a system for image processing. The system may include at least one storage medium including a set of instructions and at least one processor in communication with the at least one storage medium. When executing the set of instructions, the at least one processor may be directed to cause the system to obtain a first facial image; determine, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image; determine, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition; obtain a correction condition of the first facial image; obtain a trained correction model, wherein the trained correction model includes a trained first generator; and in response to determining that the first facial image satisfies the quality condition, generate a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
In some embodiments, the one or more quality features of the first facial image may include a norm, and to obtain the trained image quality determination model, the at least one processor may be directed to cause the system to obtain a preliminary model; obtain a first training sample set including a plurality of first sample images. Each of the plurality of first sample images may have one or more label values of the one or more quality features of the first sample image, and the label value of the norm may be determined based on a trained image analysis model. And to obtain the  trained image quality determination model, the at least one processor may be directed to cause the system further to obtain the trained image quality determination model by training the preliminary model using the first training sample set.
In some embodiments, at least one of the trained image quality determination model or the trained image analysis model may include a trained convolutional neural network model.
In some embodiments, the trained image quality determination model and the trained image analysis model may share one or more convolutional layers.
In some embodiments, to determine, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition, the at least one processor may be directed to cause the system to determine a weight coefficient of each of the one or more quality features; determine, based on the one or more weight coefficients and the one or more values of the one or more quality features, a quality evaluation value of the first facial image; and determine whether the quality evaluation value of the first facial image satisfies the quality condition.
In some embodiments, to determine the weight coefficient of each of the one or more quality features, the at least one processor may be directed to cause the system to obtain a test sample set including a plurality of second sample images; determine a preliminary weight coefficient of each of the one or more quality features; for each of the plurality of second sample images, determine, based on the trained quality determination model, a sample value of each of one or more quality features of the second sample image; and for each of the one or more quality features, determine, based on the preliminary weight coefficient of the quality feature and a Bayes algorithm, an optimized weight coefficient of the quality feature.
In some embodiments, the one or more quality features of the first facial image may include a blurring degree of the first facial image, a proportion of a target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target subject in the first facial image, a posture of the target subject in the first facial image, or the like, or any combination thereof.
In some embodiments, the trained first generator may be a sub-model of a trained conditional Generative Adversarial network (C-GAN) model, the trained C-GAN model may further include a trained second generator, a trained first discriminator, and a trained second discriminator. And to obtain the trained correction model, the at least one processor may be directed to cause the system to obtain a second training sample set including a plurality of image pairs. Each of the plurality of image pairs may include a third sample image and a fourth sample image of a same sample face, the third sample image satisfies a first correction condition, and the fourth sample image satisfies a second correction condition. And to obtain the trained correction model, the at least one processor may be directed to cause the system to train, based on the second training sample set, a second preliminary model by optimizing a loss function. The second preliminary model may include a first generator, a second generator, a first discriminator, and a second discriminator, and the loss function may include a first loss function relating to the first generator, a second loss function relating to the first discriminator, a third loss function relating to the second generator, and a fourth loss function relating to the second discriminator.
In some embodiments, training a second preliminary model by optimizing a loss function may include an iterative operation including one or more iterations. And in at least one of the one or more iterations, the at least one processor may be directed to cause the system to obtain an updated second preliminary model generated in a previous iteration. The updated second preliminary model may include an updated first generator, an updated second generator, an updated  first discriminator, and an updated second discriminator. The at least one processor may be directed to cause the system further to, for each of the plurality of image pairs, generate, based on the third sample image of the image pair and the second correction condition, a first predicted image using the updated first generator; generate, based on the first predicted image and the fourth sample image of the image pair, a first discrimination result using the updated first discriminator; generate, based on the first predicted image and the first correction condition of the image pair, a second predicted image using the updated second generator; and generate, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated second discriminator. The at least one processor may be directed to cause the system further to determine a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs; and evaluate the updated second preliminary model based on the value of the loss function.
In some embodiments, to determine a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs, the at least one processor may be directed to cause the system to determine, based on the third sample image and the first predicted image of each of the plurality of image pairs, a first loss value of the first loss function; determine, based on the first discrimination result of each of the plurality of image pairs, a second loss value of the second loss function; determine, based on the third sample image and the second predicted image of each of the plurality of image pairs, a third loss value of the third loss function; determine, based on the second discrimination result of each of the plurality of image pairs, a fourth loss value of the fourth loss function; and determine the value of the loss function based on the first loss value, the second loss value, the third loss value, and the fourth loss value using a weighted sum algorithm.
In some embodiments, in the weighted sum algorithm, a weight coefficient of the fourth loss value may be larger than an average of the weighting coefficients of the first loss value, the second loss value, and the third loss value.
In some embodiments, the first generator and the second generator may be the same model.
In some embodiments, the first discriminator and the second discriminator may be the same model.
In some embodiments, the first facial image may include a human face, and the correction condition of the first facial image may relate to an orientation of the human face.
A still further aspect of the present disclosure relates to a non-transitory computer readable medium including executable instructions that, when executed by at least one processor, direct the at least one processor to perform a method. The method may include obtaining a first facial image; determining, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image; determining, based on the one or more values of the one or more quality features, whether the first image satisfies a quality condition; obtaining a correction condition of the first image; obtaining a trained correction model, wherein the trained correction model includes a trained first generator; and in response to determining that the first image satisfies the quality condition, generating a second facial image that satisfies the correction condition by correcting the first image based on the trained correction model and the correction condition.
Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying  drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities, and combinations set forth in the detailed examples discussed below.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
FIG. 1A is a schematic diagram illustrating an exemplary image processing system according to some embodiments of the present disclosure;
FIG. 1B is a schematic diagram illustrating an exemplary car-hailing service system according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary computing device according to some embodiments of the present disclosure;
FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary terminal device according to some embodiments of the present disclosure;
FIG. 4A is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure;
FIG. 4B is a block diagram illustrating another exemplary processing device according to some embodiments of the present disclosure;
FIG. 5 is a flowchart illustrating an exemplary process for generating a second facial image satisfying a correction condition according to some embodiments of the present disclosure;
FIG. 6 is a flowchart illustrating an exemplary process for obtaining a trained image quality determination model according to some embodiments of the present disclosure;
FIG. 7 is a flowchart illustrating an exemplary process for determining a weight coefficient according to some embodiments of the present disclosure;
FIG. 8 is a flowchart illustrating an exemplary process for obtaining a trained correction model according to some embodiments of the present disclosure;
FIG. 9 is a schematic diagram illustrating an exemplary conditional generative adversarial network model according to some embodiments of the present disclosure;
FIG. 10 is a schematic diagram illustrating an exemplary current second iteration in an iterative operation for training a second preliminary model according to some embodiments of the present disclosure; and
FIG. 11 is a flowchart illustrating an exemplary process for determining a value of a loss function according to some embodiments of the present disclosure.
DETAILED DESCRIPTION
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well-known methods, procedures, systems, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure. Various modifications to the disclosed embodiments  will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but to be accorded the widest scope consistent with the claims.
It will be understood that the terms “system, ” “engine, ” “unit, ” “module, ” and/or “block” used herein are one method to distinguish different components, elements, parts, sections, or assemblies of different levels in ascending order. However, the terms may be replaced by other expressions if they achieve the same purpose.
Generally, the words “module, ” “unit, ” or “block” used herein, refer to logic embodied in hardware or firmware, or to a collection of software instructions. A module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device. In some embodiments, a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules/units/blocks configured for execution on computing devices (e.g., processor 210 illustrated in FIG. 2) may be provided on a computer readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution) . Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules (or units or blocks) may be included in connected logic components, such as gates and flip-flops, and/or can be included in programmable units, such as programmable gate arrays or processors. The modules (or units or blocks) or computing device functionality described herein may be implemented as software modules (or units or blocks) , but may be represented in hardware or firmware. In general, the modules (or units or blocks) described herein refer to logical modules (or units or blocks) that may be combined with other modules (or units or blocks) or divided into sub-modules (or sub-units or sub-blocks) despite their physical organization or storage.
It will be understood that when a unit, an engine, a module, or a block is referred to as being “on, ” “connected to, ” or “coupled to” another unit, engine, module, or block, it may be directly on, connected or coupled to, or communicate with the other unit, engine, module, or block, or an intervening unit, engine, module, or block may be present, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purposes of describing particular examples and embodiments only and is not intended to be limiting. As used herein, the singular forms “a, ” “an, ” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “include” and/or “comprise, ” when used in this disclosure, specify the presence of integers, devices, behaviors, stated features, steps, elements, operations, and/or components, but do not exclude the presence or addition of one or more other integers, devices, behaviors, features, steps, elements, operations, components, and/or groups thereof.
In addition, it should be understood that in the description of the present disclosure, the terms “first” , “second” , or the like, are only used for the purpose of differentiation, and cannot be  interpreted as indicating or implying relative importance, nor can be understood as indicating or implying the order.
The flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments of the present disclosure. It is to be expressly understood that the operations of the flowcharts need not be implemented in order. Conversely, the operations may be implemented in an inverted order, or simultaneously. Moreover, one or more other operations may be added to the flowcharts. One or more operations may be removed from the flowcharts.
Facial images of a subject often need to be captured and analyzed for, for example, identity verification. For example, in order to improve the safety of a car-hailing platform, a driver’s image may be captured to verify the driver’s identity. For instance, a face recognition model may be used to recognize a facial image of the driver acquired by an acquisition device installed on a vehicle of the driver. However, the facial image may be of poor quality. For example, the facial image may be captured from an undesired shooting angle, and the human face in the facial image may not be a front face. In addition, a movement of the driver and/or the acquisition device may result in a blurry facial image or a facial image that only includes part of the driver’s face, which may increase the difficulty and reduce the accuracy of face recognition. Therefore, it is desired to provide systems and methods for evaluating the image quality of a facial image and/or generating a facial image with a desired image quality.
An aspect of the present disclosure relates to systems and methods for image processing. The system may obtain a first facial image. The system may determine a value of each of one or more quality features of the first facial image based on a trained image quality determination model. Further, the system may determine, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition. The system may obtain a correction condition (e.g., a correction angle) of the first facial image. The system may further obtain a trained correction model. In some embodiments, the trained correction model may include a trained first generator (e.g., a sub-model of a trained conditional Generative Adversarial network (C-GAN) model) . In response to determining that the first facial image satisfies the quality condition, the system may further generate a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
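As a high-level sketch of this workflow (not the claimed implementation), the quality check and the conditional correction can be chained as follows; the form of the model outputs and the threshold are assumptions only.

```python
# Sketch of the overall workflow: evaluate quality features of the first
# facial image and, if the quality condition is satisfied, correct the image
# toward the correction condition using the trained first generator. The
# model interfaces and the threshold are illustrative assumptions.
def process_facial_image(first_facial_image, correction_condition,
                         quality_model, correction_model, quality_threshold=0.75):
    feature_values = quality_model(first_facial_image)          # dict of quality feature values
    score = sum(feature_values.values()) / len(feature_values)  # simplified quality evaluation value
    if score < quality_threshold:
        return None                                             # quality condition not satisfied
    return correction_model(first_facial_image, correction_condition)  # second facial image
```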
According to some embodiments of the present disclosure, a norm of a facial image (e.g., a first facial image) may be used to evaluate the image quality of the facial image. The norm may be determined based on complex or deep quality features that can reflect the image quality of the facial image, which are usually undetectable by humans or by traditional quality evaluation approaches. In such cases, the image quality of the facial image may be evaluated using the norm in deeper or more complex dimensions, which may improve the accuracy of an image quality evaluation of the facial image.
Moreover, in some embodiments, a specific preliminary model (e.g., a conditional Generative Adversarial network (C-GAN) model) with a loop structure is provided in the present disclosure and may be trained to generate the trained correction model. The preliminary model may include a first generator and a first discriminator forming a first path, and a second generator and a second discriminator forming a second path. The second path may be used as a feedback path of the first path. In such cases, in a training process of the preliminary model, the preliminary model may be updated based on information extracted by both the first path and the second path, which may improve the training accuracy of the preliminary model. A trained first generator obtained from the first generator may be used as the trained correction model to generate a predicted image that satisfies a certain condition (e.g., a condition relating to an orientation of a human face) based on an input image, which may improve the accuracy of the predicted image. In addition, in certain embodiments, training data of the preliminary model does not need to be labeled, which may improve the efficiency of model training by reducing, for example, the preparation time, the processing time, the computational complexity and/or cost.
FIG. 1A is a schematic diagram illustrating an exemplary image processing system according to some embodiments of the present disclosure. As shown, an image processing system 100A may include a server 110, a network 120, an acquisition device 130, a user device 140, and a storage device 150.
The server 110 may be a single server or a server group. The server group may be centralized or distributed (e.g., the server 110 may be a distributed system) . In some embodiments, the server 110 may be local or remote. For example, the server 110 may access information and/or data stored in the acquisition device 130, the user device 140, and/or the storage device 150 via the network 120. As another example, the server 110 may be directly connected to the acquisition device 130, the user device 140, and/or the storage device 150 to access stored information and/or data. In some embodiments, the server 110 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof. In some embodiments, the server 110 may be implemented on a computing device 200 including one or more components illustrated in FIG. 2 of the present disclosure. In some embodiments, the server 110 may include a processing device 112. The processing device 112 may process information and/or data relating to image processing to perform one or more functions described in the present disclosure. For example, the processing device 112 may generate one or more trained models (e.g., a trained image quality determination model, a trained correction model, a trained image analysis model, a trained conditional Generative Adversarial network (C-GAN) model, etc. ) that can be used in facial image processing. As another example, the processing device 112 may apply the trained model (s) in determining image quality of a facial image and/or generating a facial image that satisfies a certain condition. In some embodiments, the trained model (s) may be generated by a processing device, while the application of the trained model (s) may be performed on a different processing device. In some embodiments, the trained model (s) may be generated by a processing device of a system different from the image processing system 100A or a server different from the processing device 112 on which the application of the trained model (s) is performed. For instance, the trained model (s) may be generated by a first system of a vendor who provides and/or maintains such a model (s) , while facial image analysis using the trained model (s) may be performed on a second system of a client of the vendor. In some embodiments, the application of the trained model (s) may be performed online in response to a request for image processing. In some embodiments, the trained model (s) may be generated offline.
In some embodiments, the trained model (s) may be generated and/or updated (or maintained) by, e.g., a vendor of the trained model (s) or the manufacturer of the acquisition device 130. For instance, the manufacturer or the vendor may load the trained model (s) into the image processing system 100A or a portion thereof (e.g., the processing device 112 and/or the acquisition device 130) before or during the installation of the acquisition device 130 and/or the processing device 112, and maintain or update the trained model (s) from time to time (periodically or not) . The maintenance or update may be achieved by installing a program stored on a storage device (e.g., a compact disc, a USB drive, etc. ) or retrieved from an external source (e.g., a server maintained by the manufacturer or vendor) via the network 120. The program may include a new model (s) or a portion of a model that substitutes for or supplements a corresponding portion of the model.
In some embodiments, the processing device 112 may include one or more processing devices (e.g., single-core processing device (s) or multi-core processor (s) ) . Merely by way of example, the processing device 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction-set computer (RISC) , a microprocessor, or the like, or any combination thereof.
In some embodiments, the server 110 may be unnecessary and all or part of the functions of the server 110 may be implemented by other components (e.g., the acquisition device 130, the user device 140) of the image processing system 100A. For example, the processing device 112 may be integrated into the acquisition device 130 or the user device 140 and the functions of the processing device 112 may be implemented by the acquisition device 130 or the user device 140.
The network 120 may facilitate exchange of information and/or data for the image processing system 100A. In some embodiments, one or more components (e.g., the server 110, the acquisition device 130, the user device 140, the storage device 150) of the image processing system 100A may transmit information and/or data to other component (s) of the image processing system 100A via the network 120. For example, the server 110 may obtain an image to be processed from the acquisition device 130 via the network 120. As another example, the server 110 may obtain a trained image quality determination model from the storage device 150 via the network 120. As a further example, the server 110 may transmit a processed image (e.g., a second facial image that satisfies a correction condition) to the user device 140 via the network 120. In some embodiments, the network 120 may be any type of wired or wireless network, or a combination thereof. The network 120 may be and/or include a public network (e.g., the Internet) , a private network (e.g., a local area network (LAN) , a wide area network (WAN) , etc. ) , a wired network (e.g., an Ethernet network) , a wireless network (e.g., an 802.11 network, a Wi-Fi network, etc. ) , a cellular network (e.g., a Long Term Evolution (LTE) network) , a frame relay network, a virtual private network ( “VPN” ) , a satellite network, a telephone network, routers, hubs, switches, server computers, and/or any combination thereof. Merely by way of example, the network 120 may include a cable network, a wireline network, a fiber-optic network, a telecommunications network, an intranet, a wireless local area network (WLAN) , a metropolitan area network (MAN) , a public telephone switched network (PSTN) , a Bluetooth TM network, a ZigBee TM network, a near field communication (NFC) network, or the like, or any combination thereof. In some embodiments, the network 120 may include one or more network access points (e.g., the access points 120-1, 120-2) . For example, the network 120 may include wired and/or wireless network access points such as base stations and/or internet exchange points through which one or more components of the image processing system 100A may be connected to the network 120 to exchange data and/or information.
The acquisition device 130 may be configured to acquire an image (the “image” herein refers to a single image, a frame of a video, or a video stream) . In some embodiments, the acquisition device 130 may include a camera 130-1, a video recorder 130-2, an image sensor 130-3,  etc. The camera 130-1 may include a gun camera, a dome camera, an integrated camera, a monocular camera, a binocular camera, a multi-view camera, or the like, or any combination thereof. The video recorder 130-2 may include a PC Digital Video Recorder (DVR) , an embedded DVR, or the like, or any combination thereof. The image sensor 130-3 may include an infrared sensor, a visible sensor, a Charge Coupled Device (CCD) , a Complementary Metal Oxide Semiconductor (CMOS) , or the like, or any combination thereof. The image acquired by the acquisition device 130 may be a two-dimensional image, a three-dimensional image, a four-dimensional image, etc. In some embodiments, the acquisition device 130 may include a plurality of components each of which can acquire an image. For example, the acquisition device 130 may include a plurality of sub-cameras that can capture images or videos simultaneously. In some embodiments, the acquisition device 130 may transmit the acquired image to one or more components (e.g., the server 110, the user device 140, the storage device 150) of the image processing system 100A via the network 120.
The user device 140 may enable user interaction between the image processing system 100A and a user. Merely by way of example, the user device 140 may be configured to receive information and/or data from the server 110, the acquisition device 130, and/or the storage device 150 via the network 120. For example, the user device 140 may receive a processed image from the server 110. In some embodiments, the user device 140 may process information and/or data received from the server 110, the acquisition device 130, and/or the storage device 150 via the network 120. For example, the user device 140 may process the processed image received from the server 110 for further display.
In some embodiments, the user device 140 may provide a user interface via which a user may view information and/or input data and/or instructions to the image processing system 100A. For example, the user may view the processed image via the user interface. As another example, the user may input an instruction associated with an image processing parameter (e.g., a correction condition) via the user interface. In some embodiments, the user device 140 may include a mobile device 140-1, a tablet computer 140-2, a laptop computer 140-3, or the like, or any combination thereof. In some embodiments, the mobile device 140-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, smart footgear, a pair of smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, a virtual reality glass, a virtual reality patch, an augmented reality helmet, an augmented reality glass, an augmented reality patch, or the like, or any combination thereof. For example, the virtual reality device and/or the augmented reality device may include a Google TM Glass, an Oculus Rift, a Hololens, a Gear VR, etc. In some embodiments, the acquisition device 130 and/or the server 110 may be remotely operated through the user device 140. In some embodiments, the acquisition device 130 and/or the processing device 112 may be operated through the user device 140 via a wireless connection. In some embodiments, the user device 140 may receive information and/or instructions inputted by a user, and send the received information and/or instructions to the  acquisition device 130 and/or the processing device 112 via the network 120. In some embodiments, the user device 140 may receive data and/or information from the processing device 112. In some embodiments, the user device 140 may be part of the processing device 112. In some embodiments, the user device 140 may be omitted.
The storage device 150 may be configured to store data and/or instructions. The data and/or instructions may be obtained from, for example, the server 110, the acquisition device 130, the user device 140, and/or any other component of the image processing system 100A. In some embodiments, the storage device 150 may store data and/or instructions that the server 110 may execute or use to perform exemplary methods described in the present disclosure. For example, the storage device 150 may store images (e.g., first facial image) acquired by the acquisition device 130. In some embodiments, the instruction may be written in languages such as C/C++, Java, Shell, Python, or the like, or any combination thereof.
In some embodiments, the storage device 150 may include a mass storage device, a removable storage device, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or any combination thereof. Exemplary mass storage devices may include a magnetic disk, an optical disk, a solid-state drive, etc. Exemplary removable storage devices may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc. Exemplary volatile read-and-write memory may include a random access memory (RAM) . Exemplary RAM may include a dynamic RAM (DRAM) , a double data rate synchronous dynamic RAM (DDR SDRAM) , a static RAM (SRAM) , a thyristor RAM (T-RAM) , and a zero-capacitor RAM (Z-RAM) , etc. Exemplary ROM may include a mask ROM (MROM) , a programmable ROM (PROM) , an erasable programmable ROM (EPROM) , an electrically erasable programmable ROM (EEPROM) , a compact disk ROM (CD-ROM) , and a digital versatile disk ROM, etc. In some embodiments, the storage device 150 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
In some embodiments, the storage device 150 may be connected to the network 120 to communicate with one or more components (e.g., the server 110, the acquisition device 130, the user device 140) of the image processing system 100A. One or more components of the image processing system 100A may access the data or instructions stored in the storage device 150 via the network 120. In some embodiments, the storage device 150 may be directly connected to or communicate with one or more components (e.g., the server 110, the acquisition device 130, the user device 140) of the image processing system 100A. In some embodiments, the storage device 150 may be part of other components of the image processing system 100A, such as the server 110, the acquisition device 130, or the user device 140.
In some embodiments, the storage device 150 may communicate with one or more components (e.g., the server 110, the acquisition device 130, the user device 140) of the image processing system 100A via a communication bus. Taking the communication between the storage device 150 and the server 110 as an example, the storage device 150 and the server 110 may be implemented as two independent devices. The storage device 150 and the server 110 may communicate with each other via the communication bus. Exemplary communication buses may include an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, or any combination thereof. In some embodiments, the communication bus may include an address bus, a data bus, a control bus, or the like, or any combination thereof. In some embodiments, the storage device 150 and the  server 110 may be integrated on a chip. In such cases, the storage device 150 and the server 110 may communicate with each other via an internal interface.
In some embodiments, the image processing system 100A may be used to process images of users of an Online to Offline service platform. For example, the Online to Offline service platform may be an online transportation service platform for transportation services such as car-hailing, chauffeur services, delivery services, carpool, bus service, driver hiring, shuttle services, or the like, or any combination thereof. Images of a user (e.g., a driver) of the online transportation service platform may be acquired by an acquisition device. The images may be processed by the image processing system 100A for further analysis (e.g., for identifying or verifying the user’s identity) .
FIG. 1B is a schematic diagram illustrating an exemplary car-hailing service system 100B according to some embodiments of the present disclosure. As illustrated in FIG. 1B, an acquisition device 160 may be provided on a car used in a car-hailing service. Images of a driver 170 of the car may be acquired by the acquisition device 160. The images may be transmitted to a processing device (not shown in FIG. 1B) of the car-hailing service system 100B for further processing. For example, the processing device may perform a face recognition operation on the images so as to identify or verify the identity of the driver 170.
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the image processing system 100A may include one or more additional components and/or one or more components of the image processing system 100A described above may be omitted. Additionally or alternatively, two or more components of the image processing system 100A may be integrated into a single component. A component of the image processing system 100A may be implemented on two or more sub-components.
FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary computing device according to some embodiments of the present disclosure. In some embodiments, the processing device 112 may be implemented on the computing device 200. As illustrated in FIG. 2, the computing device 200 may include a processor 210, a storage 220, an input/output (I/O) 230, and a communication port 240.
The processor 210 may execute computer instructions (program code) and perform functions of, for example, the processing device 112 or the user device 140 in accordance with techniques described herein. The computer instructions may include routines, programs, objects, components, signals, data structures, procedures, modules, and functions, which perform particular functions described herein. For example, the processor 210 may execute computer instructions to evaluate the quality of a facial image and/or generate a facial image satisfying a correction condition.
In some embodiments, the processor 210 may include a microcontroller, a microprocessor, a reduced instruction set computer (RISC) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a central processing unit (CPU) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a microcontroller unit, a digital signal processor (DSP) , a field programmable gate array (FPGA) , an advanced RISC machine (ARM) , a programmable logic device (PLD) , any circuit or processor capable of executing one or more functions, or the like, or any combinations thereof.
Merely for illustration purposes, only one processor is described in the computing device 200. However, it should be noted that the computing device 200 in the present disclosure may also include multiple processors, and thus operations of a method that are performed by one processor as described in the present disclosure may also be jointly or separately performed by the multiple processors. For example, if in the present disclosure the processor of the computing device 200 executes both operations A and B, it should be understood that operations A and B may also be performed by two different processors jointly or separately in the computing device 200 (e.g., a first processor executes operation A and a second processor executes operation B, or the first and second processors jointly execute operations A and B) .
The storage 220 may store data/information obtained from the acquisition device 130, the user device 140, the storage device 150, or any other component of the image processing system 100A. For example, the storage 220 may store facial images captured by the acquisition device 130. In some embodiments, the storage 220 may include a mass storage device, a removable storage device, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or any combination thereof. In some embodiments, the storage 220 may store one or more programs and/or instructions to perform exemplary methods described in the present disclosure. For example, the storage 220 may store a program for the processing device 112 for generating a second facial image by correcting a first facial image based on a trained correction model and a correction condition.
The I/O 230 may input or output signals, data, or information. In some embodiments, the I/O 230 may enable user interaction with the processing device 112. In some embodiments, the I/O 230 may include an input device and an output device. Exemplary input devices may include a keyboard, a mouse, a touch screen, a microphone, a trackball, or the like, or a combination thereof. Exemplary output devices may include a display device, a loudspeaker, a printer, a projector, or the like, or a combination thereof. Exemplary display devices may include a liquid crystal display (LCD) , a light-emitting diode (LED) -based display, a flat panel display, a curved screen, a television device, a cathode ray tube (CRT) , or the like, or a combination thereof.
Merely by way of example, a user may input parameters (e.g., a correction condition) needed for the operation of the image processing system 100A. The I/O 230 may also display images obtained from the acquisition device 130, the storage device 150, and/or the storage 220. For example, a user may input a request for viewing an image (e.g., a first facial image, a second facial image) stored in the storage device 150 via the I/O 230 (e.g., an input device) . The processing device 112 may retrieve the image for display based on the request. The image may be displayed via the I/O 230 (e.g., an output device) .
The communication port 240 may be connected to a network (e.g., the network 120) to facilitate data communications. The communication port 240 may establish connections between the processing device 112 and the acquisition device 130, the user device 140, the storage device 150, or any other component of the image processing system 100A. The connection may be a wired connection, a wireless connection, or a combination of both that enables data transmission and reception. The wired connection may include an electrical cable, an optical cable, a telephone wire, or the like, or any combination thereof. The wireless connection may include Bluetooth, Wi-Fi, WiMax, WLAN, ZigBee, mobile network (e.g., 3G, 4G, 5G, etc. ) , or the like, or a combination thereof. In some embodiments, the communication port 240 may be a standardized communication port, such as RS232, RS485, etc. In some embodiments, the communication port 240 may be a  specially designed communication port. For example, the communication port 240 may be designed in accordance with the digital imaging and communications in medicine (DICOM) protocol.
FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of an exemplary terminal device according to some embodiments of the present disclosure. In some embodiments, the user device 140 may be implemented on the terminal device 300 shown in FIG. 3.
As illustrated in FIG. 3, the terminal device 300 may include a communication platform 310, a display 320, a graphic processing unit (GPU) 330, a central processing unit (CPU) 340, an I/O 350, a memory 360, and a storage 390. In some embodiments, any other suitable component, including but not limited to a system bus or a controller (not shown) , may also be included in the terminal device 300.
In some embodiments, an operating system 370 (e.g., iOS TM, Android TM, Windows Phone TM) and one or more applications (Apps) 380 may be loaded into the memory 360 from the storage 390 in order to be executed by the CPU 340. The applications 380 may include a browser or any other suitable mobile apps for receiving and rendering information relating to the image processing system 100A. User interactions may be achieved via the I/O 350 and provided to the processing device 112 and/or other components of the image processing system 100A via the network 120.
FIGs. 4A and 4B are block diagrams illustrating exemplary processing devices 112A and 112B according to some embodiments of the present disclosure. The processing devices 112A and 112B may be exemplary embodiments of the processing device 112 as described in connection with FIG. 1A. In some embodiments, the processing device 112A may be configured to apply one or more trained models in image processing. For example, the processing device 112A may apply a trained image quality determination model in evaluating the quality of an image of a subject, apply a trained image analysis model in determining a label value of a norm, and/or apply a trained correction model in generating a facial image satisfying a correction condition. The processing device 112B may be configured to generate the one or more trained models, such as the trained image quality determination model, the trained image analysis model, and/or the trained correction model by model training.
In some embodiments, the processing devices 112A and 112B may be respectively implemented on a processing unit (e.g., a processor 210 illustrated in FIG. 2 or a CPU 340 as illustrated in FIG. 3) . Merely by way of example, the processing device 112A may be implemented on a CPU 340 of a terminal device, and the processing device 112B may be implemented on a computing device 200. Alternatively, the processing devices 112A and 112B may be implemented on a same computing device 200 or a same CPU 340.
As shown in FIG. 4A, the processing device 112A may include an obtaining module 410, a determination module 420, and a generation module 430.
The obtaining module 410 may be configured to obtain information relating to the image processing system 100A. Merely by way of example, the obtaining module 410 may obtain a first facial image (e.g., a facial image of a passenger or a driver) . For example, the obtaining module 410 may obtain the first facial image from an acquisition device, a storage device, a user device, etc.
In some embodiments, the obtaining module 410 may be further configured to obtain a correction condition of the first facial image. The correction condition may be used to adjust one or more quality features of the first facial image so as to improve the image quality of the first facial  image. For example, the correction condition of the first facial image may relate to an orientation of the human face. Further, the obtaining module 410 may be configured to obtain a trained correction model. The trained correction model refers to a model (e.g., a machine learning model) or an algorithm configured to generate a second facial image that satisfies the correction condition based on the first facial image and the correction condition. For example, the trained correction model may include a trained first generator in a trained conditional generative adversarial network (C-GAN) model. In some embodiments, the obtaining module 410 may obtain the trained correction model from the storage device.
The determination module 420 may be configured to determine a value of each of one or more quality features of the first facial image based on a trained image quality determination model. The quality feature of the first facial image may reflect the image quality of the first facial image. Exemplary quality features of the first facial image may include a blurring degree of the first facial image, a proportion of the target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target subject in the first facial image, a posture of the target subject in the first facial image, a norm, or the like, or any combination thereof. More descriptions regarding the determination of the value of the quality feature (s) may be found elsewhere in the present disclosure. See, e.g., FIG. 5 and relevant descriptions thereof.
In some embodiments, the determination module 420 may be further configured to determine whether the first facial image satisfies a quality condition based on the one or more values of the one or more quality features. For example, the determination module 420 may compare the value of a quality feature with a threshold (or a range) corresponding to the quality feature. As another example, the determination module 420 may determine a weight coefficient of each of the one or more quality features. For instance, the determination module 420 may obtain a test sample set including a plurality of second sample images, determine a preliminary weight coefficient of each of the one or more quality features, determine a sample value of each of one or more quality features of the second sample image for each of the plurality of second sample images, and for each of the one or more quality features, determine an optimized weight coefficient of the quality feature based on the preliminary weight coefficient of the quality feature and a Bayes algorithm. Then the determination module 420 may determine a quality evaluation value of the first facial image based on the value (s) and weight coefficient (s) of the quality feature (s) .
The generation module 430 may be configured to generate a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition. For example, the generation module 430 may generate a model input based on the first facial image and the correction condition, and input the model input into the trained correction model.
As shown in FIG. 4B, the processing device 112B may include an obtaining module 440 and a model generation module 450.
The obtaining module 440 may be configured to obtain training data, such as a preliminary model and/or training samples. The model generation module 450 may be configured to generate one or more trained models by model training. In some embodiments, the processing device 112B may be configured to generate a quality determination model by model training. For example, the obtaining module 440 may be configured to obtain a preliminary model. The preliminary model may be any type of model (e.g., a convolutional neural network (CNN) model) to be trained as the trained image quality determination model. The preliminary model may include one or more model parameters. Further, the obtaining module 440 may obtain a first training sample set. The first training sample set may include a plurality of first sample images. Each first sample image may relate to a sample subject (e.g., a sample human face) and have one or more label values of the one or more quality features of the first sample image. Furthermore, the model generation module 450 may be configured to obtain or generate the trained image quality determination model by training the preliminary model using the first training sample set. For example, the model generation module 450 may iteratively update the model parameter (s) of the preliminary model in one or more iterations, and terminate the one or more iterations when a termination condition is satisfied in a current iteration. Then the model generation module 450 may designate the updated preliminary model (or a portion thereof) as the trained image quality determination model.
In some embodiments, the processing device 112B may be configured to generate a trained correction model by model training. For example, the obtaining module 440 may be configured to obtain a second training sample set including a plurality of image pairs. Each image pair may include a third sample image and a fourth sample image of a same sample face. The third sample image may satisfy a first correction condition, and the fourth sample image may satisfy a second correction condition.
The model generation module 450 may be configured to train a second preliminary model by optimizing a loss function F2 based on the second training sample set. In some embodiments, the model generation module 450 may train the second preliminary model by performing an iterative operation including one or more second iterations. For example, the model generation module 450 may obtain an updated second preliminary model generated in a previous second iteration. The updated second preliminary model may include an updated first generator, an updated second generator, an updated first discriminator, and an updated second discriminator. Further, the model generation module 450 may generate a first predicted image using the updated first generator based on the third sample image of the image pair and the second correction condition, generate a first discrimination result using the updated first discriminator based on the first predicted image and the fourth sample image of the image pair, generate a second predicted image using the updated second generator based on the first predicted image and the first correction condition of the image pair, and generate a second discrimination result using the updated second discriminator based on the second predicted image and the third sample image of the image pair. Furthermore, the model generation module 450 may determine a value of the loss function F2 based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs. Moreover, the model generation module 450 may evaluate the updated second preliminary model based on the value of the loss function F2. For example, the model generation module 450 may determine whether the updated second preliminary model satisfies an evaluation condition. In response to determining that the updated second preliminary model satisfies the evaluation condition, the model generation module 450 may determine that the iterative operation can be terminated. The model generation module 450 may further designate the trained first generator or the trained second generator as the trained correction model.
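The exact form of the loss function F2 is not reproduced in this passage. As a hedged illustration, the sketch below assumes a CycleGAN-style combination of adversarial terms (computed from the two discrimination results) and reconstruction terms (comparing the first predicted image with the fourth sample image, and the second predicted image with the third sample image) ; the loss choices and weights are assumptions made for the example, not the disclosed definition of F2.

```python
# Hypothetical form of the loss value for one image pair; the actual F2 is not
# spelled out here, so this follows a common CycleGAN-style construction.
import torch
import torch.nn.functional as F

def loss_f2_for_pair(pred1, disc1, pred2, disc2, sample3, sample4,
                     adv_weight=1.0, rec_weight=10.0):
    """pred1 / pred2: predicted images from the first and second paths.
    disc1 / disc2: discriminator scores (logits) for the two predictions.
    sample3 / sample4: the third and fourth sample images of the pair."""
    # Adversarial terms: each generator tries to make its prediction look real.
    adv = F.binary_cross_entropy_with_logits(disc1, torch.ones_like(disc1)) + \
          F.binary_cross_entropy_with_logits(disc2, torch.ones_like(disc2))
    # Reconstruction terms: pred1 should approach sample4, and pred2 should
    # approach sample3 (the loop closes back to the original image).
    rec = F.l1_loss(pred1, sample4) + F.l1_loss(pred2, sample3)
    return adv_weight * adv + rec_weight * rec

# Dummy tensors standing in for one image pair and the model outputs.
pred1, pred2 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
sample3, sample4 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
disc1, disc2 = torch.randn(1, 1), torch.randn(1, 1)
value = loss_f2_for_pair(pred1, disc1, pred2, disc2, sample3, sample4)
```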
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
In some embodiments, the processing device 112A and/or the processing device 112B may share two or more of the modules, and any one of the modules may be divided into two or more units. For instance, the  processing devices  112A and 112B may share a same obtaining module; that is, the obtaining module 410 and the obtaining module 440 are a same module. In some embodiments, the processing device 112A and/or the processing device 112B may include one or more additional modules, such as a storage module (not shown) for storing data. In some embodiments, the processing device 112A and the processing device 112B may be integrated into one processing device 112. In some embodiments, the training of different models may be performed by different processing devices 112B.
FIG. 5 is a flowchart illustrating an exemplary process for generating a second facial image satisfying a correction condition according to some embodiments of the present disclosure. In some embodiments, at least part of process 500 may be performed by the processing device 112A (implemented in, for example, the computing device 200 shown in FIG. 2) . For example, the process 500 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application) , and invoked and/or executed by the processing device 112A (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4A) . The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 500 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order of the operations of the process 500 as illustrated in FIG. 5 and described below is not intended to be limiting.
In 510, the processing device 112A (e.g., the obtaining module 410) may obtain a first facial image.
The first facial image refers to an image including at least a portion of the face of a target subject (e.g., a passenger, a driver) . The first facial image may be a two-dimensional image, a three-dimensional image, a four-dimensional image (e.g., a series of 3D images over time) , and/or any related image data (e.g., projection data) , etc. The first facial image may be a color image (e.g., an RGB image) or a grey image. In some embodiments, the first facial image may be acquired and used for, for example, an identity verification in car-hailing, a financial verification, an access control verification, or the like, or any combination thereof.
In some embodiments, the first facial image may be acquired by an acquisition device (e.g., the acquisition device 130 illustrated in FIG. 1) . The processing device 112A may obtain the first facial image from the acquisition device directly. Alternatively, the first facial image acquired by the acquisition device may be stored in a storage device (e.g., the storage device 150, an external storage device) . The processing device 112A may obtain the first facial image from the storage device. Alternatively, a user may input the first facial image via a user device (e.g., the user device 140) . The processing device 112A may obtain the first facial image from the user device.
In 520, the processing device 112A (e.g., the determination module 420) may determine, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image.
A quality feature of the first facial image may reflect the image quality of the first facial image. Exemplary quality features of the first facial image may include a blurring degree of the first facial image, a proportion of the target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target  subject in the first facial image, a posture of the target subject in the first facial image, a norm, or the like, or any combination thereof.
As used herein, the blurring degree of the first facial image refers to a value indicating whether the target subject (e.g., a human face) is in a movement state and/or whether the focus of the acquisition device on the target subject is accurate in the first facial image. The blurring degree of the first facial image may be negatively correlated with the image quality. The higher the blurring degree is, the lower the image quality may be.
The proportion of the target subject that is occluded in the first facial image refers to a ratio of an occluded area of the target subject (or the face of the target subject) to a total area of the target subject (or the face of the target subject) in the first facial image. The proportion of the target subject that is occluded in the first facial image may be negatively correlated with the image quality. The higher the proportion of the target subject that is occluded in the first facial image is, the lower the image quality may be.
The first facial image may have a desired image quality if its brightness is in a certain brightness range.
The shooting angle of the first facial image refers to an angle (e.g., an absolute value of an angle) of the acquisition device relative to the face of the target subject (e.g., a human face) when the first facial image is captured. For example, the shooting angle may include a yaw angle and a pitch angle. The pitch angle refers to an angle between an optical axis of the acquisition device and a horizontal (or transverse) plane of the face of the target subject (e.g., a horizontal plane passing through the central point of the target subject’s face) . The yaw angle refers to an angle between the optical axis and the median (or sagittal) plane of the face of the target subject (e.g., a median plane passing through the central point of the target subject’s face) .
In some embodiments, when the acquisition device is facing the human face (i.e., the optical axis of the acquisition device is aligned with an orientation of the human face) , both the yaw angle and the pitch angle may be deemed as zero. In some embodiments, the shooting angle of the first facial image may be negatively correlated with the image quality. For example, in a face recognition process, the smaller the shooting angle is, the more accurate the quality features of the first facial image may be, and the higher the image quality may be. In some embodiments, a desired shooting angle may be one in which both the yaw angle and the pitch angle are 0°.
The completeness of the target subject refers to the completeness of the face of the target subject. The completeness of the target subject may be positively correlated with the image quality. The higher the completeness of the target subject in the first facial image is, the higher the image quality may be.
The posture of the target subject may include facial expressions and/or body postures of the target subject. Exemplary postures may include closing eyes, opening mouth, raising eyebrows, facing the acquisition device, tilting head, bowing head, or the like, or any combination thereof. The first facial image may be deemed as having a low image quality if the target subject holds some specific postures (e.g., closes eyes, lowers his/her head) .
The norm refers to a norm of one or more complex or deep quality features that can reflect the image quality of the first facial image. For example, the complex or deep feature (s) may be determined based on a trained image analysis model (e.g., a trained facial recognition model) . More descriptions regarding the norm may be found elsewhere in the present disclosure. See, e.g., FIG. 6 and relevant descriptions thereof. In some embodiments, the norm may be determined as a  distance from an origin in a feature space to a point representing the complex or deep quality feature (s) of the first facial image in the feature space.
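As a simple numerical illustration of this definition, the norm may be computed as the Euclidean (L2) length of the deep feature vector, i.e., the distance from the origin of the feature space to the point representing the facial image. The feature values below are invented for the example.

```python
import numpy as np

# Deep quality features of a first facial image, e.g., produced by a trained
# image analysis model (the values here are made up for illustration).
deep_features = np.array([0.12, -0.80, 0.45, 0.03, 1.10])

# The norm is the Euclidean (L2) distance from the origin of the feature space.
norm = np.linalg.norm(deep_features)
```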
The value of a quality feature may be represented by, for example, a score, a rating, a level, or the like. In some embodiments, the trained image quality determination model may be a machine learning model or algorithm configured to output value (s) of one or more quality features of an image based on its input. For example, the trained image quality determination model may be or include a convolutional neural network (CNN) model, such as a V-net model, a U-net model, an AlexNet model, an Oxford Visual Geometry Group (VGG) model, a ResNet model, or the like, or any combination thereof.
In some embodiments, the trained image quality determination model may be trained according to a model training process as described in connection with FIG. 6. In some embodiments, different image quality determination models may be used for determining the values of different quality features. For example, each quality feature may correspond to a specific image quality determination model configured to determine a value of the quality feature. As another example, a certain image quality determination model may be used to determine the values of two or more quality features.
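A minimal sketch of such a model is given below, assuming a single small convolutional network that outputs one value per quality feature. The feature list, layer sizes, and input resolution are assumptions made for the example, since the passage leaves the concrete architecture (e.g., VGG, ResNet) open.

```python
# Sketch of a single quality determination model that predicts several quality
# feature values at once; the real model architecture is not fixed by the text.
import torch
import torch.nn as nn

QUALITY_FEATURES = ["blur", "occlusion", "brightness", "shooting_angle", "norm"]

quality_model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, len(QUALITY_FEATURES)),   # one value (e.g., a score) per feature
)

first_facial_image = torch.rand(1, 3, 112, 112)          # dummy input image
values = quality_model(first_facial_image).squeeze(0)
feature_values = dict(zip(QUALITY_FEATURES, values.tolist()))
```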
In 530, the processing device 112A (e.g., the determination module 420) may determine, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition.
In some embodiments, to determine whether the first facial image satisfies the quality condition, the processing device 112A may compare the value of at least one of the one or more quality features with a threshold (or a range) corresponding to the at least one quality feature. For example, the quality condition may include that the completeness of the target subject in the first facial image is larger than or equal to a completeness threshold. The processing device 112A may compare the completeness of the target subject in the first facial image with the completeness threshold. In response to determining that the completeness of the target subject in the first facial image is larger than or equal to the completeness threshold, the processing device 112A may determine that the first facial image satisfies the quality condition. As another example, the quality condition may include that the blurring degree of the first facial image is less than a blurring threshold and the brightness of the first facial image is within a brightness range. The processing device 112A may compare the blurring degree of the first facial image with the blurring threshold, and compare the brightness of the first facial image with the brightness range. In response to determining that the blurring degree of the first facial image is less than the blurring threshold and the brightness of the first facial image is within the brightness range, the processing device 112A may determine that the first facial image satisfies the quality condition.
In some embodiments, to determine whether the first facial image satisfies the quality condition, the processing device 112A may obtain or determine a weight coefficient of each of the one or more quality features. Further, the processing device 112A may determine, based on the one or more weight coefficients and the one or more values of the one or more quality features, a quality evaluation value of the first facial image. For example, the processing device 112A may determine the quality evaluation value of the first facial image based on the one or more weight coefficients and the one or more values of the one or more quality features using a weighted average algorithm. Merely by way of example, the quality evaluation value of the first facial image may be a weighted average value of the one or more values of the one or more quality features. Further, the processing device 112A may determine whether the first facial image satisfies the  quality condition based on the quality evaluation value. For example, the quality condition may be that the quality evaluation value of the first facial image is larger than or equal to a quality evaluation threshold. In response to determining that the quality evaluation value is larger than or equal to the quality evaluation threshold, the processing device 112A may determine that the first facial image satisfies the quality condition.
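The weighted-average computation described above can be illustrated as follows; the feature values, weight coefficients, and quality evaluation threshold are invented for the example.

```python
# Illustrative values and weights (not from the disclosure): the quality
# evaluation value is a weighted average of the quality feature values.
feature_values = {"blur": 0.9, "occlusion": 0.8, "brightness": 0.7, "angle": 0.6}
weights = {"blur": 0.3, "occlusion": 0.4, "brightness": 0.1, "angle": 0.2}

quality_evaluation_value = sum(
    weights[name] * value for name, value in feature_values.items()
) / sum(weights.values())

QUALITY_THRESHOLD = 0.75   # example quality evaluation threshold
satisfies_quality_condition = quality_evaluation_value >= QUALITY_THRESHOLD
```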
In some embodiments, different quality features may have different importance in image quality evaluation. The weight coefficient of a quality feature may reflect an importance of the quality feature. For example, if the proportion of the target subject that is occluded in the first facial image has a greater degree of influence on the success rate of face recognition than the brightness of the first facial image, the weight coefficient of the proportion of the target subject that is occluded in the first facial image may be larger than the weight coefficient of the brightness of the first facial image.
The weight coefficient of a quality feature may be determined according to a default setting of the image processing system 100A, set manually by a user of the image processing system 100A, or determined by the processing device 112A. In some embodiments, the processing device 112A may determine the weight coefficient of each of the one or more quality features using a machine learning algorithm. Exemplary machine learning algorithms may include a Bayes algorithm, a support vector machines (SVM) algorithm, a grid search algorithm, a deep neural network (DNN) algorithm, or the like, or any combination thereof.
Merely by way of example, according to the Bayes algorithm, the processing device 112A may determine the weight coefficient of each quality feature based on a test sample set and a preliminary weight coefficient of each quality feature. More descriptions regarding the determination of the weight coefficient of each quality feature using the Bayes algorithm may be found elsewhere in the present disclosure (e.g., FIG. 6 and the descriptions thereof) .
As another example, according to the SVM algorithm, the processing device 112A may obtain one or more SVM classifiers according to a classifier training process (e.g., a training process performed based on a plurality of training images each having a classification label) . Each of the one or more SVM classifiers may correspond to a quality feature. Further, the processing device 112A may determine a classification result of each training image by processing the training image using the one or more SVM classifiers. Furthermore, for each of the one or more quality features, the processing device 112A may determine a count of correct classifications. Moreover, the processing device 112A may determine the weight coefficient of each quality feature by normalizing the counts of correct classifications corresponding to the quality feature.
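A sketch of this SVM-based weighting is given below using scikit-learn; the synthetic data, the per-feature classifier setup, and the labels are assumptions made for the example rather than the disclosed training procedure.

```python
# Sketch: train one SVM classifier per quality feature, count its correct
# classifications, and normalize the counts into weight coefficients.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
features = ["blur", "occlusion", "brightness"]
X = rng.normal(size=(200, len(features)))      # per-image quality feature values (synthetic)
y = (X.sum(axis=1) > 0).astype(int)            # dummy classification labels

correct_counts = {}
for i, name in enumerate(features):
    clf = SVC().fit(X[:, [i]], y)              # one classifier per quality feature
    correct_counts[name] = int((clf.predict(X[:, [i]]) == y).sum())

total = sum(correct_counts.values())
weights = {name: count / total for name, count in correct_counts.items()}
```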
As yet another example, according to the grid search algorithm, the processing device 112A may determine the weight coefficient of each quality feature by performing an exhaustive search in a weight coefficient set of the quality feature. Merely by way of example, the weight coefficient set may include possible weight coefficients of the quality feature, and the processing device 112A may select an optimum weight coefficient of the quality feature from the weight coefficient set. Alternatively, the weight coefficient set may include possible combinations each of which include weight coefficients of multiple quality features, and the processing device 112A may select an optimum combination from the possible combinations.
As still another example, according to the DNN algorithm, the one or more values of the one or more quality features of the first facial image may be input into a trained DNN network model (e.g., a multi-layer perceptron (MLP) model) . The trained DNN network model may be used to determine, based on the one or more values of the one or more quality features, whether the first  facial image and a reference facial image correspond to a same target subject. For example, the trained DNN network model may output a similarity score between the first facial image and the reference facial image. Further, a layer (e.g., an output layer) of the trained DNN network model may include one or more parameters corresponding to the one or more values of the one or more quality features. The processing device 112A may determine the one or more parameters as the weight coefficient (s) corresponding to the one or more quality features.
In 540, the processing device 112A (e.g., the obtaining module 410) may obtain a correction condition of the first facial image.
A correction condition may be used to adjust one or more quality features of the first facial image so that a corrected image (e.g., the second facial image as described in connection with operation 560) may satisfy a certain condition.
In some embodiments, the first facial image may include a human face of the target subject. The correction condition of the first facial image may relate to an orientation of the human face. For illustration purposes, the orientation of the human face may be represented by a three-dimensional coordinate system constructed based on a facial image of the human face. The three-dimensional coordinate system may include a yaw axis, a pitch axis, and a roll axis. For example, the facial image may be captured by the acquisition device when the human face faces the acquisition device. The three-dimensional coordinate system may be constructed with a point (e.g., a centroid of the human face) on the facial image as the origin, a vertical line as the yaw axis, a horizontal line as the pitch axis, and a line perpendicular to the yaw-pitch plane as the roll axis. Then the orientation of the human face may be represented by angles along the axes (e.g., a yaw angle along the yaw axis, a pitch angle along the pitch axis) . In some embodiments, the orientation of the human face may correspond to the shooting angle of the first facial image. For example, the yaw angle and the pitch angle that describe the orientation of the human face may be the same as the yaw angle and the pitch angle of the shooting angle, respectively.
In some embodiments, the correction condition may include that the human face faces the acquisition device. For example, the correction condition may be that the yaw angle and/or the pitch angle corresponding to the orientation of the human face are approximately equal to 0°. As another example, the correction condition may be a correction angle of the yaw angle and/or the pitch angle corresponding to the orientation. For example, if the yaw angle of the orientation of the human face is 15°, the correction condition of the first facial image may be that the human face rotates -15° along the yaw axis.
Other exemplary correction conditions may include, for example, a brightness of the first facial image is within a certain brightness range, a blurring degree of the first facial image is less than a certain blurring threshold, a proportion of the target subject that is occluded in the first facial image is less than an occlusion threshold, a completeness of the target subject in the first facial image is larger than a completeness threshold, or the like, or any combination thereof.
In some embodiments, the processing device 112A may generate a vector representing the correction condition by performing a coding operation on the correction condition using a coding algorithm. Exemplary coding algorithms may include a one-hot coding algorithm, an embedding coding algorithm, or the like, or any combination thereof. Merely by way of example, the processing device 112A may perform a coding operation on the correction condition using the one-hot coding algorithm so as to convert the correction condition into a binary vector representing the correction condition. The binary vector may be used as an input of a trained correction model as described in connection with operation 550.
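For instance, a one-hot coding of a correction angle may look like the following; the set of candidate correction angles is an assumption made for the example.

```python
# Illustrative one-hot coding of a correction condition; the candidate
# correction angles below are assumed bins, not values from the disclosure.
import numpy as np

candidate_yaw_corrections = [-30, -15, 0, 15, 30]   # degrees
correction_condition = -15                          # e.g., rotate -15° along the yaw axis

one_hot = np.zeros(len(candidate_yaw_corrections), dtype=np.float32)
one_hot[candidate_yaw_corrections.index(correction_condition)] = 1.0
# one_hot is now [0., 1., 0., 0., 0.], a binary vector representing the condition.
```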
In 550, the processing device 112A (e.g., the obtaining module 410) may obtain a trained correction model.
The trained correction model refers to a model (e.g., a machine learning model) or an algorithm configured to generate a second facial image that satisfies the correction condition based on the first facial image and the correction condition. Merely by way of example, the trained correction model may include a trained facial angle correction model configured to generate a frontal face image based on a non-frontal face image and a correction angle. As another example, the trained correction model may include a trained brightness correction model configured to generate a facial image with a desired brightness based on a facial image with an undesired brightness and a correction brightness.
In some embodiments, the trained correction model may include a trained first generator. For example, the trained first generator may be a sub-model of a trained conditional generative adversarial network (C-GAN) model. The trained C-GAN model may further include a trained second generator, a trained first discriminator, and a trained second discriminator. In some embodiments, the trained correction model may be generated according to a model training process as described in connection with FIG. 8.
In some embodiments, the trained correction model may be previously generated by a computing device (e.g., the processing device 112B, a processing device of a vendor of the trained correction model) , and stored in a memory or a storage device (e.g., the storage device 150) disclosed elsewhere in the present disclosure. The processing device 112A may obtain the trained correction model from the storage device.
In 560, in response to determining that the first facial image satisfies the quality condition, the processing device 112A (e.g., the generation module 430) may generate a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
For example, the processing device 112A may generate a model input based on the first facial image and the correction condition, and input the model input into the trained correction model. The trained correction model may output the second facial image based on processing the model input. Merely by way of example, the model input may include the first facial image and the vector (e.g., a binary vector) representing the correction condition.
In some embodiments, to generate the second facial image that satisfies the correction condition, the trained correction model may be configured to perform operations including, for example, feature extraction, encoding, decoding, or the like, or any combination thereof, on the model input. Merely by way of example, the trained correction model (e.g., the trained first generator) may include an encoding component (or referred to as an encoder) , a decoding component (or referred to as a decoder) , and a transformation component. The encoding component may be configured to generate one or more first feature maps based on the first facial image. Specifically, the encoding component may be configured to obtain feature information (e.g., contour feature information, texture feature information, color feature information) of the first facial image. Further, the encoding component may be configured to generate the one or more first feature maps based on the feature information.
The transformation component may be configured to generate a correction map by transforming the correction condition. For example, as described in connection with operation 540, the correction condition may be converted into a vector (e.g., a one-hot vector) . The transformation component may be configured to convert the vector into a code with a fixed length (e.g., using a fully  connected layer) . Further, the transformation component may be configured to resize the code to generate a second feature map with a fixed size. The transformation component may then be configured to generate the correction map based on the second feature map with the fixed size (e.g., using a transposed convolution layer and a cascade of one or more convolutional layers) . In some embodiments, the correction map may have a same size as the one or more feature maps generated by the encoding component.
The correction map and the first feature map (s) may be concatenated (or combined) into a concatenated map, and the concatenated map may be inputted into the decoding component. The decoding component may be further configured to generate the second facial image that satisfies the correction condition by decoding the concatenated map.
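Merely for illustration, the following sketch (in PyTorch) shows one possible arrangement of the encoding component, the transformation component, and the decoding component described above; the layer counts, channel sizes, image resolution, and the 5-dimensional condition vector are assumptions chosen for the example rather than the architecture of the trained correction model.

import torch
import torch.nn as nn

class CorrectionGenerator(nn.Module):
    """Illustrative encoder/transformation/decoder generator (assumed sizes)."""

    def __init__(self, cond_dim: int = 5):
        super().__init__()
        # Encoding component: image -> first feature maps (64 channels, 32 x 32).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 128 -> 64
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 64 -> 32
            nn.ReLU(inplace=True),
        )
        # Transformation component: condition vector -> fixed-length code,
        # resized to a small map, then upsampled into a correction map that
        # matches the spatial size of the first feature maps.
        self.fc = nn.Linear(cond_dim, 8 * 8 * 8)
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(8, 16, kernel_size=4, stride=4),     # 8 -> 32
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoding component: concatenated map -> corrected facial image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64 + 16, 32, kernel_size=4, stride=2, padding=1),  # 32 -> 64
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),        # 64 -> 128
            nn.Tanh(),
        )

    def forward(self, image: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(image)                          # first feature maps
        code = self.fc(condition).view(-1, 8, 8, 8)          # fixed-size second feature map
        correction_map = self.upsample(code)                 # same H x W as feats
        merged = torch.cat([feats, correction_map], dim=1)   # concatenated map
        return self.decoder(merged)                          # second facial image

# Usage: a batch of one 3 x 128 x 128 face and a one-hot correction vector.
g = CorrectionGenerator()
out = g(torch.randn(1, 3, 128, 128), torch.tensor([[0., 1., 0., 0., 0.]]))
print(out.shape)  # torch.Size([1, 3, 128, 128])

In this arrangement, the transformation component expands the condition vector with a fully connected layer, reshapes it into a small feature map, and upsamples it so that the correction map can be concatenated channel-wise with the first feature maps before decoding.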
It should be noted that the above description of the process 500 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
In some embodiments, one or more operations in the process 500 may be omitted. For example, operation 520 and operation 530 may be omitted. The processing device 112A may perform a correction operation on the first facial image directly without determining whether the first facial image satisfies the quality condition. As another example, operations 540-560 may be omitted. The processing device 112A may perform one or more other operations on the first facial image based on a determination result relating to the quality of the first facial image. For instance, in response to determining that the first facial image satisfies the quality condition, the processing device 112A may perform a face recognition operation on the first facial image to determine identity information of the target subject.
In some embodiments, one or more optional operations may be added to the process 500. For example, a storing operation may be added in the process 500. In the storing operation, information and/or data (e.g., the first facial image, the trained correction model, the second facial image) may be stored in a storage device (e.g., the storage device 150) disclosed elsewhere in the present disclosure. As another example, a recognition operation and/or a verification operation may be added in the process 500. Merely by way of example, the target subject may be a driver who provides a car-hailing service. The processing device 112A may perform a face recognition operation on the second facial image to determine identity information of the driver. Additionally or alternatively, the processing device 112A may perform a verification operation on the identity information of the driver. Specifically, registered identity information (e.g., facial image information, license information, ID card information, driving age information, etc. ) of the driver may be stored in a storage device (e.g., the storage device 150) . The processing device 112A may obtain the registered identity information from the storage device. Further, the processing device 112A may determine a verification result by comparing the identity information of the driver with the registered identity information.
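Merely for illustration, a sketch of such a verification operation is given below; it assumes that face embeddings have already been extracted from the second facial image and from the registered facial image information, and the similarity threshold is a placeholder value rather than one specified by the present disclosure.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two face embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_driver(query_embedding: np.ndarray,
                  registered_embedding: np.ndarray,
                  threshold: float = 0.6) -> bool:
    """Return True if the two embeddings are deemed to belong to the same driver.

    The threshold value is an assumption chosen for illustration only.
    """
    return cosine_similarity(query_embedding, registered_embedding) >= threshold

# Example with random 128-dimensional embeddings (stand-ins for real features).
rng = np.random.default_rng(0)
q, r = rng.normal(size=128), rng.normal(size=128)
print(verify_driver(q, r))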
FIG. 6 is a flowchart illustrating an exemplary process for obtaining a trained image quality determination model according to some embodiments of the present disclosure. In some embodiments, at least part of process 600 may be performed by the processing device 112B (implemented in, for example, the computing device 200 shown in FIG. 2). For example, the process 600 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application), and invoked and/or executed by the processing device 112B (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4B). The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 600 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of the process 600 are performed, as illustrated in FIG. 6 and described below, is not intended to be limiting.
In some embodiments, the trained image quality determination model described in connection with operation 520 in FIG. 5 may be obtained according to the process 600. In some embodiments, the process 600 may be performed by another device or system other than the image processing system 100A, e.g., a device or system of a vendor of the trained image quality determination model. For illustration purposes, the implementation of the process 600 by the processing device 112B is described as an example.
In 610, the processing device 112B (e.g., the obtaining module 440) may obtain a preliminary model.
The preliminary model may be any type of model to be trained as the trained image quality determination model. For example, the preliminary model may include a convolutional neural network (CNN) model. The preliminary model may include one or more model parameters. Exemplary model parameters may include the number (or count) of layers, the number (or count) of nodes, a loss function F1, or the like, or any combination thereof. Before training, the model parameter (s) may have their respective initial values. The value (s) of the model parameter (s) may be updated in the training process of the preliminary model.
In 620, the processing device 112B (e.g., the obtaining module 440) may obtain a first training sample set.
The first training sample set may include a plurality of first sample images. Each of the plurality of first sample images may relate to a sample subject (e.g., a sample human face) and have one or more label values of the one or more quality features of the first sample image. For example, the one or more quality features of a first sample image may include a blurring degree of the first sample image, a proportion of a sample subject that is occluded in the first sample image, a brightness of the first sample image, a shooting angle of the first sample image, a completeness of the sample subject in the first sample image, a posture of the sample subject in the first sample image, a norm, or the like, or any combination thereof.
In some embodiments, the label value of a quality feature of a first sample image may be determined manually by a user (e.g., an operator of the image processing system 100A). Merely by way of example, the label value of the blurring degree of the first sample image may be labeled manually. Alternatively, the label value of the quality feature may be determined by the processing device 112B. Merely by way of example, the label value of the quality feature of the first sample image may be determined based on a machine learning model relating to the quality feature. For example, the label value of the proportion of the sample subject that is occluded in the first sample image may be determined by a trained occlusion proportion determination model. As another example, the label value of the posture of the sample subject in the first sample image may be obtained based on a trained posture recognition model.
In some embodiments, the quality feature of the first sample image may include a norm of the first sample image. The label value of the norm of the first sample image may be determined based on a trained image analysis model. In some embodiments, the trained image analysis model may include a trained facial recognition model, such as a trained CNN model. For example, the  trained CNN model may include feature extraction layer (s) (e.g., convolutional layer (s) and/or pooling layer (s) ) . The feature extraction layer (s) may be configured to extract one or more complex or deep features of an image for facial recognition. Since the facial recognition result is associated with the quality of the image, the complex or deep features extracted by the feature extraction layer (s) may be regarded as complex or deep quality features that can reflect the image quality of the image.
To determine the label value of the norm of the first sample image, the first sample image may be inputted into the feature extraction layer (s) of the trained CNN model, and the feature extraction layer (s) may be configured to output the values of the complex or deep feature (s) of the first sample image. Then, a norm of the values of the complex or deep feature (s) may be determined as the label value of the norm of the first sample image. By using the feature extraction layer (s) , the complex or deep quality feature (s) , which are undetectable by humans or traditional quality evaluation approaches, may be extracted and used to determine the label value of the norm of the first sample image. In such cases, the preliminary model may be trained to perform image quality evaluation from deeper and more complex dimensions, which may improve the accuracy of the image quality determination model trained from the preliminary model.
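Merely for illustration, the sketch below shows one way the norm label could be computed, assuming a generic pretrained CNN backbone (here a ResNet-18 with its classification head removed) as a stand-in for the feature extraction layer (s) of the trained image analysis model.

import torch
import torchvision.models as models

# Assumed stand-in for the feature extraction layer(s) of a trained facial
# recognition model: a ResNet-18 backbone with its final classification layer removed.
backbone = models.resnet18(weights=None)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

def norm_label(sample_image: torch.Tensor) -> float:
    """Label value of the 'norm' quality feature: L2 norm of the deep features."""
    with torch.no_grad():
        features = feature_extractor(sample_image).flatten(start_dim=1)
    return features.norm(p=2, dim=1).item()

# Example with a random 3 x 224 x 224 image standing in for a first sample image.
print(norm_label(torch.randn(1, 3, 224, 224)))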
In 630, the processing device 112B (e.g., the model generation module 450) may obtain or generate the trained image quality determination model by training the preliminary model using the first training sample set.
In some embodiments, the training of the preliminary model may include one or more iterations to iteratively update the model parameter (s) of the preliminary model. For illustration purposes, an exemplary current iteration of the iteration (s) is described in the following description. The current iteration may be performed based on at least a portion of the first sample images. In some embodiments, a same set or different sets of sample images may be used in different iterations in training the preliminary model. For brevity, the sample images used in the current iteration are referred to as target first sample images. In the current iteration, an updated preliminary model generated in a previous iteration may be evaluated.
For example, for each target first sample image, the processing device 112B may determine one or more predicted values of the one or more quality features of the target first sample image using the updated preliminary model. The processing device 112B may then determine a value of the loss function F1 of the updated preliminary model based on the predicted value (s) and the label value (s) of each target first sample image.
The loss function F1 may be used to evaluate the accuracy and reliability of the updated preliminary model; for example, the smaller the value of the loss function F1, the more reliable the updated preliminary model. Exemplary loss functions F1 may include an L1 loss function, a focal loss function, a log loss function, a cross-entropy loss function, a Dice loss function, an L2 loss function, a mean bias error (MBE) function, a mean square error (MSE) function, etc. The processing device 112B may further update the value (s) of the model parameter (s) of the updated preliminary model to be used in a next iteration based on the value of the loss function F1 according to, for example, a backpropagation algorithm.
In some embodiments, the one or more iterations may be terminated when a termination condition is satisfied in the current iteration. The termination condition may include, for example, that the value of the loss function F1 obtained in the current iteration is less than a loss threshold, that a difference between the value of the loss function F1 in a previous iteration and the value of the loss function F1 in the current iteration is less than a predetermined threshold, that a certain count of iterations has been performed, etc. If the termination condition is satisfied in the current iteration, the processing device 112B may designate the updated preliminary model (or a portion thereof) as the trained image quality determination model.
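Merely for illustration, the following sketch outlines one possible form of this iterative training, assuming a small CNN regressor for the quality features, a mean square error choice for the loss function F1, and a simple loss-threshold termination condition; none of these choices is mandated by the present disclosure.

import torch
import torch.nn as nn

# Assumed preliminary model: a tiny CNN that regresses k quality features.
k = 4
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, k),
)
loss_f1 = nn.MSELoss()                      # one possible choice for F1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_threshold, max_iterations = 1e-3, 100  # termination condition (assumed)

# Random stand-ins for target first sample images and their label values.
images = torch.randn(8, 3, 64, 64)
labels = torch.rand(8, k)

for iteration in range(max_iterations):
    predicted = model(images)               # predicted values of the quality features
    loss = loss_f1(predicted, labels)       # value of the loss function F1
    optimizer.zero_grad()
    loss.backward()                         # backpropagation
    optimizer.step()                        # update the model parameter values
    if loss.item() < loss_threshold:        # termination condition satisfied
        break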
In some embodiments, the preliminary model may include a preliminary GAN model (e.g., a C-GAN model) including one or more generators and one or more discriminators. The processing device 112B may generate a trained model by training the preliminary GAN model, wherein the trained model may include one or more trained generators trained from the generator (s) and one or more trained discriminators trained from the discriminator (s) . The processing device 112B may designate one of the trained generator (s) as the trained image quality determination model.
In some embodiments, the trained image quality determination model may be stored in a memory or a storage device (e.g., the storage device 150) disclosed elsewhere in the present disclosure. The processing device 112A may obtain the trained image quality determination model from the memory or the storage device. In some embodiments, the trained image quality determination model may include a trained CNN model. Optionally, the trained image quality determination model and the trained image analysis model may share one or more convolutional layers.
It should be noted that the above description of the process 600 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, after the trained image quality determination model is generated, the processing device 112B may further test the trained image quality determination model using a set of testing images. Additionally or alternatively, the processing device 112B may update the trained image quality determination model periodically or irregularly based on one or more newly-generated training samples (e.g., new images generated in a car-hailing service) .
FIG. 7 is a flowchart illustrating an exemplary process for determining a weight coefficient according to some embodiments of the present disclosure. In some embodiments, at least part of process 700 may be performed by the processing device 112A (implemented in, for example, the computing device 200 shown in FIG. 2). For example, the process 700 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application), and invoked and/or executed by the processing device 112A (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4A). The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 700 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of the process 700 are performed, as illustrated in FIG. 7 and described below, is not intended to be limiting. In some embodiments, one or more operations of the process 700 may be performed to achieve at least part of operation 530 as described in connection with FIG. 5.
In 710, the processing device 112A (e.g., the determination module 420) may obtain a test sample set.
The test sample set may include a plurality of second sample images. In some embodiments, each of the plurality of second sample images may have an evaluation label indicating whether the second sample image is suitable for facial analysis. Merely by way of example, the evaluation label may be a number in the range of 0 to 1. The larger the evaluation label is, the more suitable the second sample image may be for facial analysis. In some embodiments, the evaluation label may be determined or confirmed by a user manually.
In 720, the processing device 112A (e.g., the determination module 420) may determine a preliminary weight coefficient of each of the one or more quality features.
As described in connection with 530, the weight coefficient of a quality feature may indicate an importance of the quality feature in image quality evaluation. The preliminary weight coefficient of the quality feature refers to an initial value of the weight coefficient of the quality feature. The preliminary weight coefficient may be set according to a default setting of the image processing system 100A, set manually by a user of the image processing system 100A, or determined by the processing device 112A according to an actual need. For example, the processing device 112A may randomly assign a preliminary weight coefficient to each quality feature. In some embodiments, the preliminary weight coefficients of different quality features may be the same or different. Merely by way of example, the preliminary weight coefficient of each of the one or more quality features may be determined as 0.1. In some embodiments, a sum of the preliminary weight coefficient (s) of the one or more quality features may be equal to 1.
In 730, for each of the plurality of second sample images, the processing device 112A (e.g., the determination module 420) may determine a sample value of each of one or more quality features of the second sample image.
For example, the processing device 112A may determine a sample value of each quality feature of a second sample image based on a trained quality determination model. In some embodiments, operation 730 may be performed in a similar manner as operation 520 as described in connection with FIG. 5, and the descriptions thereof are not repeated here.
In 740, for each of the one or more quality features, the processing device 112A (e.g., the determination module 420) may determine, based on the preliminary weight coefficient of the quality feature and a Bayes algorithm, an optimized weight coefficient of the quality feature.
In some embodiments, for a second sample image, the processing device 112A may determine a quality prediction value of the second sample image based on the preliminary weight coefficient and the sample value of each quality feature. The determination of the quality prediction value may be performed in a similar manner as that of the quality evaluation value as described in connection with operation 530. For example, the processing device 112A may determine a weighted average value of the sample value (s) of the one or more quality features based on the preliminary weight coefficient (s) . The weighted average value may be determined as the quality prediction value of the second sample image.
Further, the processing device 112A may determine an accuracy of the preliminary weight coefficient (s) of the quality feature (s) . Merely by way of example, the accuracy of the preliminary weight coefficient (s) may be measured by a pass rate of the plurality of second sample images. For example, for each second sample image, the processing device 112A may compare the quality prediction value with the evaluation label of the second sample image. If the difference between the quality prediction value and the evaluation label of a second sample image is less than a threshold, the second sample image may be determined to be passed. As another example, the processing device 112A may select one or more second sample image (s) that can be recognized by a trained face recognition model. As used herein, the trained face recognition model may be similar to the (trained) image analysis model elsewhere (e.g., in FIG. 6 or FIG. 7) in the present disclosure. If a second sample image can be recognized by the trained face recognition model, the second  sample image may be determined to be passed. Then the processing device 112A may determine a ratio of the count of the passed second sample image (s) to the total count of the second sample images as the pass rate of the plurality of second sample images.
The processing device 112A may then update the preliminary weight coefficient (s) of the one or more quality features based on the pass rate of the second sample images. For example, the processing device 112A may update the preliminary weight coefficient (s) of the one or more quality features based on the pass rate using a gradient descent algorithm. In some embodiments, the preliminary weight coefficient (s) may be iteratively updated until a specific condition is satisfied in a certain iteration. The specific condition may include, for example, that the pass rate of the second sample images in the current iteration is larger than a pass rate threshold, that a difference between the quality prediction value and the evaluation label of each second sample image obtained in a current iteration is less than a loss threshold, that a certain count of iterations has been performed, etc. If the specific condition is satisfied in the current iteration, the processing device 112A may determine the updated weight coefficient (s) in the current iteration as the optimized weight coefficient (s) of the one or more quality features.
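Merely for illustration, the sketch below implements a simple pass-rate-driven search over the weight coefficients; it uses random perturbations instead of the Bayes or gradient descent updates described above, and the sample values, evaluation labels, and thresholds are placeholders chosen for the example.

import numpy as np

rng = np.random.default_rng(42)

# Random stand-ins: sample values of 4 quality features for 50 second sample
# images, plus their (hypothetical) 0-1 evaluation labels.
sample_values = rng.random((50, 4))
evaluation_labels = rng.random(50)

def pass_rate(weights: np.ndarray, diff_threshold: float = 0.2) -> float:
    """Fraction of second sample images whose quality prediction value
    (weighted average of the sample values) is close to the evaluation label."""
    predictions = sample_values @ weights
    return float(np.mean(np.abs(predictions - evaluation_labels) < diff_threshold))

# Preliminary weight coefficients (equal, summing to 1).
weights = np.full(4, 0.25)
best_rate = pass_rate(weights)

for _ in range(200):                       # iterative update (assumed scheme)
    candidate = weights + rng.normal(scale=0.05, size=4)
    candidate = np.clip(candidate, 0, None)
    candidate /= candidate.sum()           # keep the coefficients summing to 1
    rate = pass_rate(candidate)
    if rate > best_rate:                   # keep updates that raise the pass rate
        weights, best_rate = candidate, rate
    if best_rate > 0.9:                    # pass rate threshold (assumed)
        break

print(weights, best_rate)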
It should be noted that the above description of the process 700 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, in operation 740, the processing device 112A may determine the optimized weight coefficient (s) based on other algorithm (s) (e.g., the SVM algorithm, the grid search algorithm, and/or the DNN algorithm as described in connection with FIG. 5) .
FIG. 8 is a flowchart illustrating an exemplary process for obtaining a trained correction model according to some embodiments of the present disclosure. In some embodiments, at least part of process 800 may be performed by the processing device 112B (implemented in, for example, the computing device 200 shown in FIG. 2). For example, the process 800 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application), and invoked and/or executed by the processing device 112B (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4B). The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 800 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of the process 800 are performed, as illustrated in FIG. 8 and described below, is not intended to be limiting.
In some embodiments, the trained correction model described in connection with operation 550 and/or operation 560 in FIG. 5 may be obtained according to the process 800. In some embodiments, the process 800 may be performed by another device or system other than the image processing system 100A, e.g., a device or system of a vendor of the trained correction model. For illustration purposes, the implementation of the process 800 by the processing device 112B is described as an example.
In 810, the processing device 112B (e.g., the obtaining module 440) may obtain a second training sample set.
The second training sample set may include a plurality of image pairs. Each of the plurality of image pairs may include a third sample image and a fourth sample image of a same  sample face. The third sample image may satisfy a first correction condition, and the fourth sample image may satisfy a second correction condition. In some embodiments, the first correction condition and/or the second correction condition may relate to an orientation of the sample face. For example, the first correction condition may be that the side of the sample face is illustrated in the image, and the second correction condition may be that the front of the sample face is illustrated in the image. In some embodiments, the second correction condition may be the same as the correction condition as described in connection with FIG. 5.
In 820, the processing device 112B (e.g., the model generation module 450) may train, based on the second training sample set, a second preliminary model by optimizing a loss function F2.
The second preliminary model may be any type of model to be trained as the correction model. For example, the second preliminary model may include a convolutional neural network (CNN) model. The second preliminary model may include one or more second model parameters. Exemplary second model parameters may include the number (or count) of layers, the number (or count) of nodes, the loss function F2, or the like, or any combination thereof. Before training, the second model parameter (s) may have their respective initial values. The value (s) of the second model parameter (s) may be updated in the training process of the second preliminary model.
In some embodiments, the second preliminary model may include or be a C-GAN model. For illustration purposes, FIG. 9 illustrates a schematic diagram of an exemplary C-GAN model 900 according to some embodiments of the present disclosure. The C-GAN model 900 may include one or more preliminary generators (e.g., a first generator, and a second generator illustrated in FIG. 9) and/or one or more preliminary discriminators (e.g., a first discriminator, and a second discriminator illustrated in FIG. 9) .
During the training of the C-GAN model 900, the preliminary generator may be configured to process an initial image of each target training sample and a correction condition to output a predicted image that satisfies the correction condition, and the preliminary discriminator may be configured to generate a discrimination result between the predicted image (i.e., fake data) and a sample image (i.e., true data) of each target training sample. The preliminary generator may be trained to generate the predicted image similar to the sample image to make the preliminary discriminator determine that the predicted image is not synthesized. The preliminary discriminator may be trained to improve its ability to distinguish the preliminary generator’s fake data from the true data. For example, the first discriminator may be configured to determine whether an image is a real image or a fake image. As used herein, the real image refers to an image acquired by an acquisition device. The fake image refers to an image generated by a generator (e.g., the first generator) . In some embodiments, the first discriminator may be further configured to determine whether two images correspond to a same subject. For example, the first discriminator may determine a similarity between the two images by calculating a distance (e.g., a Euclidean distance, a cosine distance, etc. ) between the two images. Further, the first discriminator may determine whether the two images correspond to the same target subject based on the distance.
For example, the first generator may be configured to generate a first predicted image based on the third sample image of an image pair and the second correction condition (e.g., a correction angle) . For example, the first generator may include an encoding component (or referred to as an encoder) , a decoding component (or referred to as a decoder) , and a transformation component. The encoding component may be configured to generate one or more first feature maps based on the third sample image. The transformation component may be configured to  generate a correction map by transforming the second correction condition. The correction map and the first feature map (s) may be concatenated (or combined) into a concatenated map, and the concatenated map may be inputted into the decoding component. The decoding component may then be configured to generate the first predicted image by decoding the concatenated map.
The first discriminator may be configured to generate a first discrimination result between the first predicted image generated by the first generator and the fourth sample image of each image pair. For example, the discrimination result may indicate which one of the first predicted image and the fourth sample image of each pair is true data. Additionally or alternatively, the first discriminator may be configured to determine whether the first predicted image generated by the first generator and the third sample image of each image pair correspond to a same subject.
The second generator may be configured to generate a second predicted image based on the first predicted image and the first correction condition. In some embodiments, the first correction condition may relate to the second correction condition. For example, the first correction condition may be opposite to the second correction condition. For illustration purposes, if the second correction condition includes a correction angle (e.g., a yaw angle) of 15°, the first correction condition may include a correction angle of -15°. In some embodiments, the first generator and the second generator may be a same model. For example, the first generator and the second generator may be a same CNN model. In such cases, the first generator and the second generator may share same model parameters. A loss function (e.g., the first loss function) of the first generator may be the same as a loss function (e.g., the third loss function) of the second generator.
The second discriminator may be configured to generate a second discrimination result between the second predicted image and the third sample image of each image pair. The second discrimination result may be similar to the first discrimination result as aforementioned. In some embodiments, the first discriminator and the second discriminator may be a same model. For example, the first discriminator and the second discriminator may be a same classification model (e.g., a bidirectional encoder representations from transformers (BERT) model, a neural network model, a Fasttext model, etc. ) . In such cases, the first discriminator and the second discriminator may share same model parameters. A loss function (e.g., the second loss function) of the first discriminator may be the same as a loss function (e.g., the fourth loss function) of the second discriminator.
In some embodiments, the first generator and the first discriminator may form a first path of the C-GAN model. The second generator and the second discriminator may form a second path of the C-GAN model. An image generated by the first generator in the first path may be used as part of the input of the second generator so as to obtain an inverse solution of the output of the first generator. Then the second path may be used as a feedback path of the first path. In such cases, in a training process of the C-GAN model, a plurality of image features may be obtained through the first path and the second path, which may improve the accuracy of a predicted image generated by the first generator.
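Merely for illustration, the sketch below traces one image pair through the two paths using toy generators and discriminators; the network definitions and the way the correction angle is appended as an extra channel are assumptions for the example, not the claimed C-GAN architecture.

import torch
import torch.nn as nn

# Toy stand-ins for the C-GAN components; the real generators condition on a
# correction angle, which is appended here as an extra input channel.
def make_generator():
    return nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, 3, 3, padding=1), nn.Tanh())

def make_discriminator():
    return nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(8, 1), nn.Sigmoid())

g1, g2 = make_generator(), make_generator()          # first and second generators
d1, d2 = make_discriminator(), make_discriminator()  # first and second discriminators

def with_condition(image, angle_deg):
    """Append a normalized correction angle as a constant extra channel."""
    cond = torch.full_like(image[:, :1], angle_deg / 90.0)
    return torch.cat([image, cond], dim=1)

# One image pair: third sample image (side face) and fourth sample image (front face).
x_side = torch.randn(1, 3, 64, 64)
x_front = torch.randn(1, 3, 64, 64)
second_correction, first_correction = -15.0, 15.0     # opposite angles (assumed)

# First path: correct the side face, then judge it against the real front face.
x_hat = g1(with_condition(x_side, second_correction))   # first predicted image
d1_fake, d1_real = d1(x_hat), d1(x_front)               # first discrimination result

# Second path: rotate the prediction back and judge it against the side face.
x_tilde = g2(with_condition(x_hat, first_correction))   # second predicted image
d2_fake, d2_real = d2(x_tilde), d2(x_side)               # second discrimination result

print(x_hat.shape, x_tilde.shape, d1_fake.item(), d2_fake.item())

If the first generator and the second generator (or the first discriminator and the second discriminator) are a same model as described above, g1 and g2 (or d1 and d2) could simply be the same module instance so that the model parameters are shared.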
In some embodiments, the loss function of the C-GAN model 900 may include one or more of a first loss function relating to the first generator, a second loss function relating to the first discriminator, a third loss function relating to the second generator, and a fourth loss function relating to the second discriminator. In some embodiments, the training of the second preliminary model may include an iterative operation. The second model parameter (s) of the second preliminary model may be iteratively updated in the iterative operation. The iterative operation may include one or more second iterations.
For illustration purposes, FIG. 10 illustrates a schematic diagram of an exemplary current second iteration in an iterative operation for training a second preliminary model according to some embodiments of the present disclosure.
In 1010, the processing device 112B (e.g., the model generation module 450) may obtain an updated second preliminary model generated in a previous second iteration.
For example, the updated second preliminary model may include an updated first generator, an updated second generator, an updated first discriminator, and an updated second discriminator.
In 1020, for each of the plurality of image pairs, the processing device 112B (e.g., the model generation module 450) may generate, based on the third sample image of the image pair and the second correction condition, a first predicted image using the updated first generator.
For example, for an image pair, the processing device 112B may input the third sample image of the image pair and the second correction condition into the updated first generator. The updated first generator may output the first predicted image that satisfies the second correction condition. Merely by way of example, the second correction condition may be that the front face is illustrated in the image. A third sample image of a side face of a driver and the second correction condition may be inputted to the updated first generator to obtain a first predicted image illustrating a front face of the driver. In some embodiments, operation 1020 may be performed in a similar manner as operation 560 as described in connection with FIG. 5, and the descriptions thereof are not repeated here.
In 1030, for each of the plurality of image pairs, the processing device 112B (e.g., the model generation module 450) may generate, based on the first predicted image and the fourth sample image of the image pair, a first discrimination result using the updated first discriminator.
For example, for an image pair, the first discrimination result may indicate whether the first predicted image of the image pair is a fake image or a real image. As another example, the first discrimination result may indicate whether the first predicted image and the fourth sample image of the image pair correspond to a same subject. Merely by way of example, the updated first discriminator may determine a similarity between the first predicted image and the fourth sample image, and compare the similarity with a similarity threshold. In response to determining that the similarity is larger than or equal to the similarity threshold, the updated first discriminator may generate a first discrimination result indicating that the first predicted image and the fourth sample image correspond to a same subject.
In 1040, for each of the plurality of image pairs, the processing device 112B (e.g., the model generation module 450) may generate, based on the first predicted image and the first correction condition of the image pair, a second predicted image using the updated second generator.
For example, for an image pair, the processing device 112B may input the first predicted image of the image pair and the first correction condition into the updated second generator. The updated second generator may output the second predicted image that satisfies the first correction condition. Merely by way of example, the first correction condition may be that the side face is illustrated in the image. A first predicted image of a front face of a driver and the first correction condition may be inputted to the updated second generator to obtain a second predicted image illustrating a side face of the driver. In some embodiments, operation 1040 may be performed in a similar manner as operation 560 as described in connection with FIG. 5, and the descriptions thereof are not repeated here.
In 1050, for each of the plurality of image pairs, the processing device 112B (e.g., the model generation module 450) may generate, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated second discriminator.
For example, for an image pair, the second discrimination result may indicate whether the second predicted image of the image pair is a fake image or a real image. As another example, the second discrimination result may indicate whether the second predicted image and the third sample image of the image pair correspond to a same subject.
In 1060, the processing device 112B (e.g., the model generation module 450) may determine a value of the loss function F2 based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs.
In some embodiments, as described in connection with FIG. 8, the loss function F2 may include one or more of a first loss function relating to the (updated) first generator, a second loss function relating to the (updated) first discriminator, a third loss function relating to the (updated) second generator, and a fourth loss function relating to the (updated) second discriminator. The processing device 112B may determine the value of the loss function F2 based on one or more of a first loss value of the first loss function, a second loss value of the second loss function, a third loss value of the third loss function, and a fourth loss value of the fourth loss function. More descriptions regarding the determination of the value of the loss function F2 may be found elsewhere in the present disclosure (e.g., FIG. 11 and the descriptions thereof).
In 1070, the processing device 112B (e.g., the model generation module 450) may evaluate the updated second preliminary model based on the value of the loss function F2.
Merely by way of example, the processing device 112B may determine whether the updated second preliminary model satisfies an evaluation condition. The evaluation condition may include, for example, that the value of the loss function F2 obtained in the current second iteration is less than a second loss threshold, that a difference between the value of the loss function F2 in a previous second iteration and the value of the loss function F2 in the current second iteration is less than a second predetermined threshold, that a certain count of iterations has been performed, or the like, or any combination thereof.
In some embodiments, in response to determining that the updated second preliminary model satisfies the evaluation condition, the processing device 112B may determine that the iterative operation can be terminated. The processing device 112B may further designate the trained first generator or the trained second generator as the trained correction model. In response to determining that the updated second preliminary model does not satisfy the evaluation condition, the processing device 112B may determine to perform a next second iteration until the updated second preliminary model satisfies the evaluation condition. For example, the processing device 112B may update the value (s) of the second model parameter (s) of the updated second preliminary model to be used in a next second iteration based on the value of the loss function F2 according to, for example, a backpropagation algorithm.
It should be noted that the above description of the process 1000 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, the updated first generator and the updated  second generator may be the same model. In such cases, in operation 1040, the processing device 112B may generate, based on the first predicted image and the first correction condition of the image pair, the second predicted image using the updated first generator. Additionally or alternatively, the updated first discriminator and the updated second discriminator may be the same model. In such cases, in operation 1050, the processing device 112B may generate, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated first discriminator.
FIG. 11 is a flowchart illustrating an exemplary process for determining a value of a loss function F2 according to some embodiments of the present disclosure. In some embodiments, at least part of process 1100 may be performed by the processing device 112B (implemented in, for example, the computing device 200 shown in FIG. 2). For example, the process 1100 may be stored in a storage device (e.g., the storage device 150, the storage 220, the storage 390) in the form of instructions (e.g., an application), and invoked and/or executed by the processing device 112B (e.g., the processor 210 illustrated in FIG. 2, the CPU 340 illustrated in FIG. 3, one or more modules illustrated in FIG. 4B). The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 1100 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of the process 1100 are performed, as illustrated in FIG. 11 and described below, is not intended to be limiting.
In some embodiments, the loss function F2 may be of any type of loss function, such as a mean square error loss function, a cross entropy loss function, an exponential loss function, etc. For illustration purposes, the cross entropy loss function may be taken as an example in the present disclosure.
In 1110, the processing device 112B (e.g., the model generation module 450) may determine, based on the third sample image and the first predicted image of each of the plurality of image pairs, a first loss value of the first loss function.
In some embodiments, as described in connection with FIG. 9, the first generator may be trained to generate a first predicted image of an image pair as close to the fourth sample image of the image pair (i.e., a real image) as possible to fool the first discriminator (e.g., make the first discriminator determine that the first predicted image is a real image). The first loss function may include a first component for training the first generator to fool the first discriminator. The first component may be represented as formula (1) as below:

L_{G,1} = \log D(G(x_s, \gamma))      (1)

where D denotes the first discriminator, G denotes the first generator, \gamma denotes an initial orientation of the sample face in the third sample image, x_s denotes the third sample image, \hat{x} = G(x_s, \gamma) denotes the first predicted image, D(\hat{x}) denotes a discrimination result regarding the first predicted image, i.e., a probability that the first discriminator determines that the first predicted image is a real image in the first discrimination result, and L_{G,1} represents the first component of the first loss function. In some embodiments, the discrimination result regarding the first predicted image may include a discrimination value, which may be equal to 1 if the first discriminator determines that the first predicted image is a real image or 0 if the first discriminator determines that the first predicted image is a fake image. During the training of the first generator, the first loss function may be optimized by maximizing the first component.
Additionally or alternatively, the first generator may be trained to generate a first predicted image as close to the original third sample image of the image pair as possible, to make the first discriminator determine that the first predicted image and the third sample image correspond to a same subject. The first loss function may include a second component for training the first generator to generate the first predicted image as close to the original third sample image as possible. In some embodiments, the second component may be represented as formula (2) as below:

L_{G,2} = \| D(x_s) - D(\hat{x}) \|      (2)

where x_s denotes the third sample image, D(x_s) denotes a discrimination result regarding the third sample image, and \| D(x_s) - D(\hat{x}) \| denotes a distance between the discrimination result regarding the third sample image and the discrimination result regarding the first predicted image in a Lagrangian space.
In some embodiments, the discrimination result regarding the third sample image may include a discrimination value, which may be equal to 1 if the first discriminator determines that the third sample image is a real image or 0 if the first discriminator determines that the third sample image is a fake image. During the training of the first generator, the first loss function may be optimized by minimizing the second component. In some embodiments, the second component may be determined as a distance between the third sample image and the first predicted image in a feature space. For example, the distance may be a Euclidean distance between the third sample image and the first predicted image in the Euclidean space.
In some embodiments, the first loss function may include both the first component and the second component, and be represented as formula (3) as below:
L_G = \log D(G(x_s, \gamma)) - \| D(x_s) - D(\hat{x}) \|      (3)

In the current second iteration, the discrimination result regarding the first predicted image (i.e., D(\hat{x})) and the discrimination result regarding the third sample image (i.e., D(x_s)) may be determined using the updated first discriminator or the updated second discriminator. The first loss value of the first loss function may be determined according to the formula (3) based on the discrimination results regarding the first predicted image and the third sample image.
In 1120, the processing device 112B (e.g., the model generation module 450) may determine, based on the first discrimination result of each of the plurality of image pairs, a second loss value of the second loss function.
The second loss function may be used to improve the ability of the first discriminator to distinguish the first predicted image of an image pair from the fourth sample image of the image pair. For example, the second loss function may be represented as formula (4) as below:
L_D = \log D(x_t) + \log(1 - D(\hat{x}))      (4)

where x_t denotes the fourth sample image, D(x_t) denotes a probability that the first discriminator determines that the fourth sample image is a real image in the first discrimination result, and 1 - D(\hat{x}) denotes a probability that the first discriminator determines that the first predicted image is a fake image in the first discrimination result. During the training of the first discriminator, the second loss function may be optimized by maximizing the probability of determining that the fourth sample image is a real image and maximizing the probability of determining that the first predicted image is a fake image.
In 1130, the processing device 112B (e.g., the model generation module 450) may determine, based on the third sample image and the second predicted image of each of the plurality of image pairs, a third loss value of the third loss function.
In some embodiments, as described in connection with FIG. 9, the first generator and the second generator may be a same model. In such cases, the third loss value of the third loss function may be determined in a similar manner as the first loss value of the first loss function. In some embodiments, the second generator may be trained to generate a second predicted image of an image pair as close to the third sample image of the image pair (i.e., a real image) as possible to fool the second discriminator (e.g., make the second discriminator determine that the second predicted image is a real image). The third loss function may include a third component for training the second generator to fool the second discriminator. The third component may be represented as formula (5) as below:

L_{G',1} = \log D'(G'(\hat{x}, \gamma'))      (5)

where D' denotes the second discriminator, G' denotes the second generator, \gamma' denotes the first correction condition, \tilde{x} = G'(\hat{x}, \gamma') denotes the second predicted image, D'(\tilde{x}) denotes a discrimination result regarding the second predicted image, i.e., a probability that the second discriminator determines that the second predicted image is a real image in the second discrimination result, and L_{G',1} denotes the third component of the third loss function. In some embodiments, the discrimination result regarding the second predicted image may include a discrimination value, which may be equal to 1 if the second discriminator determines that the second predicted image is a real image or 0 if the second discriminator determines that the second predicted image is a fake image. During the training of the second generator, the third loss function may be optimized by maximizing the third component.
In some embodiments, the second predicted image may be regarded as a reconstructed image corresponding to the third sample image. The second generator may be trained to generate a second predicted image as close to the third sample image of the image pair as possible to make the second discriminator determine that the second predicted image and the third sample image correspond to a same subject. The third loss function may include a fourth component for training the second generator to generate the second predicted image as close to the original third sample image as possible. In some embodiments, the fourth component may be represented as formula (6) as below:
L_{G',2} = \| D'(x_s) - D'(\tilde{x}) \|      (6)

where x_s denotes the third sample image, D'(x_s) denotes a discrimination result regarding the third sample image, and \| D'(x_s) - D'(\tilde{x}) \| denotes a distance between the discrimination result regarding the third sample image and the discrimination result regarding the second predicted image in the Lagrangian space. During the training of the second generator, the third loss function may be optimized by minimizing the fourth component. In some embodiments, the fourth component may be determined as a distance between the third sample image and the second predicted image in a feature space. For example, the distance may be a Euclidean distance between the third sample image and the second predicted image in the Euclidean space.
In some embodiments, the third loss function may include both the third component and the fourth component, and be represented as formula (7) as below:
L_{G'} = \log D'(G'(\hat{x}, \gamma')) - \| D'(x_s) - D'(\tilde{x}) \|      (7)

In the current second iteration, the discrimination result regarding the second predicted image (i.e., D'(\tilde{x})) and the discrimination result regarding the third sample image (i.e., D'(x_s)) may be determined using the updated first discriminator or the updated second discriminator. The third loss value of the third loss function may be determined according to the formula (7) based on the discrimination results regarding the second predicted image and the third sample image.
In 1140, the processing device 112B (e.g., the model generation module 450) may determine, based on the second discrimination result of each of the plurality of image pairs, a fourth loss value of the fourth loss function.
The fourth loss function may be used to improve the ability of the second discriminator to distinguish the second predicted image of an image pair from the third sample image of the image pair. For example, the fourth loss function may be represented as formula (8) as below:
L_{D'} = \log D'(x_s) + \log(1 - D'(\tilde{x}))      (8)

where D'(x_s) denotes a probability that the second discriminator determines that the third sample image is a real image in the second discrimination result, and 1 - D'(\tilde{x}) denotes a probability that the second discriminator determines that the second predicted image is a fake image in the second discrimination result. During the training of the second discriminator, the fourth loss function may be optimized by maximizing the probability of determining that the third sample image is a real image and maximizing the probability of determining that the second predicted image is a fake image.
In 1150, the processing device 112B (e.g., the model generation module 450) may determine the value of the loss function F2 based on the first loss value, the second loss value, the third loss value, and the fourth loss value using a weighted sum algorithm.
In some embodiments, the processing device 112B may determine the value of the loss function F2 using a weighted sum algorithm. For example, the processing device 112B may determine a weight coefficient for each of the first loss value, the second loss value, the third loss value, and the fourth loss value. Further, the processing device 112B may determine a weighted sum of the first loss value, the second loss value, the third loss value, and the fourth loss value. The processing device 112B may further designate the weighted sum as the value of the loss function F2.
In some embodiments, the weight coefficient corresponding to each of the first loss value, the second loss value, the third loss value, and the fourth loss value may be equal to 1. In such cases, the value of the loss function F2 may be represented as formula (9) below:

F2 = L_G + L_D + L_{G'} + L_{D'}      (9)

where L_G denotes the first loss value, L_D denotes the second loss value, L_{G'} denotes the third loss value, and L_{D'} denotes the fourth loss value.
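Merely for illustration, the short sketch below evaluates the loss values for a single set of scalar discriminator outputs; the numeric values and the log-based forms follow the expressions given above and are assumptions chosen for the example rather than a specific implementation of the claimed training process.

import math

def generator_loss(d_fake: float, d_real_feat: float, d_fake_feat: float) -> float:
    """First/third loss: maximize log D(fake), minimize the feature distance.
    Written here as a quantity to be minimized (hence the negated log term)."""
    return -math.log(d_fake) + abs(d_real_feat - d_fake_feat)

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Second/fourth loss: maximize log D(real) + log(1 - D(fake)).
    Written here as a quantity to be minimized."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

# Arbitrary example outputs of the first and second discriminators.
l_g  = generator_loss(d_fake=0.4, d_real_feat=0.9, d_fake_feat=0.4)   # first loss value
l_d  = discriminator_loss(d_real=0.9, d_fake=0.4)                     # second loss value
l_g2 = generator_loss(d_fake=0.3, d_real_feat=0.9, d_fake_feat=0.3)   # third loss value
l_d2 = discriminator_loss(d_real=0.9, d_fake=0.3)                     # fourth loss value

f2 = l_g + l_d + l_g2 + l_d2   # weighted sum with all weight coefficients equal to 1
print(f2)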
In some embodiments, the weight coefficients of two or more of the first loss value, the second loss value, the third loss value, and the fourth loss value may be different. For example, the weight coefficient of the fourth loss value may be larger than an average of the weight coefficients of the first loss value, the second loss value, and the third loss value. In such cases, the fourth loss value may have a relatively larger impact on the value of the loss function F2 than the other loss values. The large impact of the fourth loss value on the value of the loss function F2 may in turn drive the second generator, the first discriminator, and the first generator to improve their performance. Accordingly, a first generator with a relatively high performance may be obtained from the training process, thereby optimizing the loss function F2 more effectively.
In some embodiments, as described in connection with FIG. 9, the first discriminator and the second discriminator may be a same model. The first discriminator and the second discriminator may share the same loss function. That is, the fourth loss function illustrated in formula (8) may be omitted. In such cases, the value of the loss function F2 may be represented as formula (10) below:
L_F2 = L_G + L_D + L_G′ .     (10)
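Merely by way of example, a minimal Python sketch of combining the four loss values into the value of the loss function F2 is given below; the function name and the weight coefficients w1 through w4 are illustrative assumptions. Using the default weights reproduces formula (9), and setting w4 to 0 corresponds to the shared-discriminator case of formula (10).

    def loss_f2_value(l_g, l_d, l_g2, l_d2, w1=1.0, w2=1.0, w3=1.0, w4=1.0):
        # Weighted sum of the first, second, third, and fourth loss values;
        # with all weights equal to 1 this reduces to formula (9).
        return w1 * l_g + w2 * l_d + w3 * l_g2 + w4 * l_d2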
In some embodiments, in the current second iteration, the processing device 112B may further update the second model parameters of the updated second preliminary model based on the value of the loss function F2 according to, for example, a stochastic gradient descent backpropagation algorithm. In some embodiments, the processing device 112B may update each of the first generator, the second generator, the first discriminator, and the second discriminator of the updated second preliminary model based on the value of the loss function F2 in the current second iteration. Merely by way of example, the first generator of the updated second preliminary model may be updated based on the first loss function; the second generator of the updated second preliminary model may be updated based on the third loss function; the first discriminator of the updated second preliminary model may be updated based on the second loss function; and the second discriminator of the updated second preliminary model may be updated based on the fourth loss function.
Alternatively, the processing device 112B may update only some of the first generator, the second generator, the first discriminator, and the second discriminator of the updated second preliminary model in the current second iteration. For example, in the current second iteration, the processing device 112B may merely update model parameters of the first generator and/or the second generator based on the value of the loss function F2 (or a combination of the first loss value and the third loss value). In a next second iteration, the processing device 112B may merely update model parameters of the first discriminator and/or the second discriminator based on the value of the loss function F2 (or a combination of the second loss value and the fourth loss value) determined in the next second iteration.
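Merely by way of example, a minimal Python sketch of such an alternating update scheme is given below, assuming the loss values are autograd scalars and the optimizers are framework optimizers (e.g., torch.optim); the function name and the even/odd scheduling are illustrative assumptions rather than the only way to alternate the updates.

    def alternate_update(iteration, gen_optimizer, disc_optimizer, l_g, l_d, l_g2, l_d2):
        if iteration % 2 == 0:
            # Update only the generators from the first and third loss values.
            gen_loss = l_g + l_g2
            gen_optimizer.zero_grad()
            gen_loss.backward()
            gen_optimizer.step()
        else:
            # Update only the discriminators from the second and fourth loss values.
            # These terms are maximized in the description above, so their negative
            # is minimized here with a descent optimizer.
            disc_loss = -(l_d + l_d2)
            disc_optimizer.zero_grad()
            disc_loss.backward()
            disc_optimizer.step()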
It should be noted that the above description of the process 1100 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, a formula described above may be modified according to an actual need. For example, the formula may include one or more additional coefficients, and/or one or more coefficients described above may be omitted.
Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of this disclosure.
Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment, ” “an embodiment, ” and/or “some embodiments” mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment” or “one embodiment” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.
Further, it will be appreciated by one skilled in the art that aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or context including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented as entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of software and hardware implementation that may all generally be referred to herein as a "unit," "module," or "system." Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer-readable program code embodied thereon.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electromagnetic, optical, or the like, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in a combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB. NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS) .
Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations thereof, are not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software only solution, e.g., an installation on an existing server or mobile device.
Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the various embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, claimed subject matter may lie in less than all features of a single foregoing disclosed embodiment.

Claims (29)

  1. A method for image processing, implemented on a computing device having at least one storage device storing a set of instructions, and at least one processor in communication with the at least one storage device, the method comprising:
    obtaining a first facial image;
    determining, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image;
    determining, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition;
    obtaining a correction condition of the first facial image;
    obtaining a trained correction model, wherein the trained correction model includes a trained first generator; and
    in response to determining that the first facial image satisfies the quality condition, generating a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
  2. The method of claim 1, wherein the one or more quality features of the first facial image include a norm, and the trained image quality determination model is trained according to a model training process including:
    obtaining a preliminary model;
    obtaining a first training sample set including a plurality of first sample images, wherein each of the plurality of first sample images has one or more label values of the one or more quality features of the first sample image, and the label value of the norm is determined based on a trained image analysis model; and
    obtaining the trained image quality determination model by training the preliminary model using the first training sample set.
  3. The method of claim 2, wherein at least one of the trained image quality determination model or the trained image analysis model includes a trained convolutional neural network model.
  4. The method of claim 2, wherein the trained image quality determination model and the trained image analysis model share one or more convolutional layers.
  5. The method of claim 1, wherein the determining, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition includes:
    determining a weight coefficient of each of the one or more quality features;
    determining, based on the one or more weight coefficients and the one or more values of the one or more quality features, a quality evaluation value of the first facial image; and
    determining whether the first facial image satisfies the quality condition based on the quality evaluation value.
  6. The method of claim 5, wherein the determining the weight coefficient of each of the one or more quality features includes:
    obtaining a test sample set including a plurality of second sample images;
    determining a preliminary weight coefficient of each of the one or more quality features;
    for each of the plurality of second sample images, determining, based on the trained quality determination model, a sample value of each of one or more quality features of the second sample image; and
    for each of the one or more quality features, determining, based on the preliminary weight coefficient of the quality feature and a Bayes algorithm, an optimized weight coefficient of the quality feature.
  7. The method of claim 1, wherein the one or more quality features of the first facial image include at least one of: a blurring degree of the first facial image, a proportion of a target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target subject in the first facial image, or a posture of the target subject in the first facial image.
  8. The method of claim 1, wherein
    the trained first generator is a sub-model of a trained conditional Generative Adversarial network (C-GAN) model, the trained C-GAN model further includes a trained second generator, a trained first discriminator, and a trained second discriminator, and
    the trained correction model is generated according to a model training process including:
    obtaining a second training sample set including a plurality of image pairs, wherein each of the plurality of image pairs includes a third sample image and a fourth sample image of a same sample face, the third sample image satisfies a first correction condition, and the fourth sample image satisfies a second correction condition; and
    training, based on the second training sample set, a second preliminary model by optimizing a loss function, wherein
    the second preliminary model includes a first generator, a second generator, a first discriminator, and a second discriminator, and
    the loss function includes a first loss function relating to the first generator, a second loss function relating to the first discriminator, a third loss function relating to the second generator, and a fourth loss function relating to the second discriminator.
  9. The method of claim 8, wherein the training a second preliminary model by optimizing a loss function includes an iterative operation including one or more iterations, and at least one of the one or more iterations includes:
    obtaining an updated second preliminary model generated in a previous iteration, wherein the updated second preliminary model includes an updated first generator, an updated second generator, an updated first discriminator, and an updated second discriminator;
    for each of the plurality of image pairs,
    generating, based on the third sample image of the image pair and the second correction condition, a first predicted image using the updated first generator;
    generating, based on the first predicted image and the fourth sample image of the image pair, a first discrimination result using the updated first discriminator;
    generating, based on the first predicted image and the first correction condition of the image pair, a second predicted image using the updated second generator; and
    generating, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated second discriminator;
    determining a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs; and
    evaluating the updated second preliminary model based on the value of the loss function.
  10. The method of claim 9, wherein the determining a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs includes:
    determining, based on the third sample image and the first predicted image of each of the plurality of image pairs, a first loss value of the first loss function;
    determining, based on the first discrimination result of each of the plurality of image pairs, a second loss value of the second loss function;
    determining, based on the third sample image and the second predicted image of each of the plurality of image pairs, a third loss value of the third loss function;
    determining, based on the second discrimination result of each of the plurality of image pairs, a fourth loss value of the fourth loss function; and
    determining the value of the loss function based on the first loss value, the second loss value, the third loss value, and the fourth loss value using a weighted sum algorithm.
  11. The method of claim 10, wherein in the weighted sum algorithm, a weight coefficient corresponding to the fourth loss value is larger than an average of the weighting coefficients corresponding to the first loss value, the second loss value, and the third loss value.
  12. The method of claim 8, wherein the first generator and the second generator are the same model.
  13. The method of claim 8, wherein the first discriminator and the second discriminator are the same model.
  14. The method of claim 1, wherein the first facial image includes a human face, and the correction condition of the first facial image relates to an orientation of the human face.
  15. A system for image processing, comprising:
    at least one storage medium including a set of instructions; and
    at least one processor in communication with the at least one storage medium, wherein when executing the set of instructions, the at least one processor is directed to cause the system to:
    obtain a first facial image;
    determine, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image;
    determine, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition;
    obtain a correction condition of the first facial image;
    obtain a trained correction model, wherein the trained correction model includes a trained first generator; and
    in response to determining that the first facial image satisfies the quality condition, generate a  second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
  16. The system of claim 15, wherein the one or more quality features of the first facial image include a norm, and to obtain the trained image quality determination model, the at least one processor is directed to cause the system to:
    obtain a preliminary model;
    obtain a first training sample set including a plurality of first sample images, wherein each of the plurality of first sample images has one or more label values of the one or more quality features of the first sample image, and the label value of the norm is determined based on a trained image analysis model; and
    obtain the trained image quality determination model by training the preliminary model using the first training sample set.
  17. The system of claim 16, wherein at least one of the trained image quality determination model or the trained image analysis model includes a trained convolutional neural network model.
  18. The system of claim 16, wherein the trained image quality determination model and the trained image analysis model share one or more convolutional layers.
  19. The system of claim 15, wherein to determine, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition, the at least one processor is directed to cause the system to:
    determine a weight coefficient of each of the one or more quality features;
    determine, based on the one or more weight coefficients and the one or more values of the one or more quality features, a quality evaluation value of the first facial image; and
    determine whether the quality evaluation value of the first facial image satisfies the quality condition.
  20. The system of claim 19, wherein to determine the weight coefficient of each of the one or more quality features, the at least one processor is directed to cause the system to:
    obtain a test sample set including a plurality of second sample images;
    determine a preliminary weight coefficient of each of the one or more quality features;
    for each of the plurality of second sample images, determine, based on the trained quality determination model, a sample value of each of one or more quality features of the second sample image; and
    for each of the one or more quality features, determine, based on the preliminary weight coefficient of the quality feature and a Bayes algorithm, an optimized weight coefficient of the quality feature.
  21. The system of claim 15, wherein the one or more quality features of the first facial image include at least one of: a blurring degree of the first facial image, a proportion of a target subject that is occluded in the first facial image, a brightness of the first facial image, a shooting angle of the first facial image, a completeness of the target subject in the first facial image, or a posture of the target subject in the first facial image.
  22. The system of claim 15, wherein the trained first generator is a sub-model of a trained conditional Generative Adversarial network (C-GAN) model, the trained C-GAN model further includes a trained second generator, a trained first discriminator, and a trained second discriminator, and
    to obtain the trained correction model, the at least one processor is directed to cause the system to:
    obtain a second training sample set including a plurality of image pairs, wherein each of the plurality of image pairs includes a third sample image and a fourth sample image of a same sample face, the third sample image satisfies a first correction condition, and the fourth sample image satisfies a second correction condition; and
    train, based on the second training sample set, a second preliminary model by optimizing a loss function, wherein
    the second preliminary model includes a first generator, a second generator, a first discriminator, and a second discriminator, and
    the loss function includes a first loss function relating to the first generator, a second loss function relating to the first discriminator, a third loss function relating to the second generator, and a fourth loss function relating to the second discriminator.
  23. The system of claim 22, wherein training a second preliminary model by optimizing a loss function includes an iterative operation including one or more iterations, and in at least one of the one or more iterations, the at least one processor is directed to cause the system to:
    obtain an updated second preliminary model generated in a previous iteration, wherein the updated second preliminary model includes an updated first generator, an updated second generator, an updated first discriminator, and an updated second discriminator;
    for each of the plurality of image pairs,
    generate, based on the third sample image of the image pair and the second correction condition, a first predicted image using the updated first generator;
    generate, based on the first predicted image and the fourth sample image of the image pair, a first discrimination result using the updated first discriminator;
    generate, based on the first predicted image and the first correction condition of the image pair, a second predicted image using the updated second generator; and
    generate, based on the second predicted image and the third sample image of the image pair, a second discrimination result using the updated second discriminator;
    determine a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs; and
    evaluate the updated second preliminary model based on the value of the loss function.
  24. The system of claim 23, wherein to determine a value of the loss function based on the first predicted image, the first discrimination result, the second predicted image, and the second discrimination result of each of the plurality of image pairs, the at least one processor is directed to cause the system to:
    determine, based on the third sample image and the first predicted image of each of the plurality of image pairs, a first loss value of the first loss function;
    determine, based on the first discrimination result of each of the plurality of image pairs, a second loss value of the second loss function;
    determine, based on the third sample image and the second predicted image of each of the plurality of image pairs, a third loss value of the third loss function;
    determine, based on the second discrimination result of each of the plurality of image pairs, a fourth loss value of the fourth loss function; and
    determine the value of the loss function based on the first loss value, the second loss value, the third loss value, and the fourth loss value using a weighted sum algorithm.
  25. The system of claim 24, wherein in the weighted sum algorithm, a weight coefficient of the fourth loss value is larger than an average of the weighting coefficients of the first loss value, the second loss value, and the third loss value.
  26. The system of claim 22, wherein the first generator and the second generator are the same model.
  27. The system of claim 22, wherein the first discriminator and the second discriminator are the same model.
  28. The system of claim 15, wherein the first facial image includes a human face, and the correction condition of the first facial image relates to an orientation of the human face.
  29. A non-transitory computer readable medium, comprising executable instructions that, when executed by at least one processor, direct the at least one processor to perform a method, the method comprising:
    obtaining a first facial image;
    determining, based on a trained image quality determination model, a value of each of one or more quality features of the first facial image;
    determining, based on the one or more values of the one or more quality features, whether the first facial image satisfies a quality condition;
    obtaining a correction condition of the first facial image;
    obtaining a trained correction model, wherein the trained correction model includes a trained first generator; and
    in response to determining that the first facial image satisfies the quality condition, generating a second facial image that satisfies the correction condition by correcting the first facial image based on the trained correction model and the correction condition.
PCT/CN2021/073018 2020-01-22 2021-01-21 Systems and methods for image processing WO2021147938A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010073524.9 2020-01-22
CN202010073524.9A CN111860091A (en) 2020-01-22 2020-01-22 Face image evaluation method and system, server and computer readable storage medium
CN202010176670.4 2020-03-13
CN202010176670.4A CN111860093B (en) 2020-03-13 2020-03-13 Image processing method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2021147938A1 true WO2021147938A1 (en) 2021-07-29

Family

ID=76993065

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073018 WO2021147938A1 (en) 2020-01-22 2021-01-21 Systems and methods for image processing

Country Status (1)

Country Link
WO (1) WO2021147938A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335480A1 (en) * 2015-05-15 2016-11-17 Toshiba Tec Kabushiki Kaisha Video Based Facial Recognition for Customer Verification at Touchless Checkout
CN105631439A (en) * 2016-02-18 2016-06-01 北京旷视科技有限公司 Human face image collection method and device
CN108446675A (en) * 2018-04-28 2018-08-24 北京京东金融科技控股有限公司 Face-image recognition methods, device electronic equipment and computer-readable medium
CN110321965A (en) * 2019-07-10 2019-10-11 腾讯科技(深圳)有限公司 The method and device that the training method of object weight identification model, object identify again
CN111860091A (en) * 2020-01-22 2020-10-30 北京嘀嘀无限科技发展有限公司 Face image evaluation method and system, server and computer readable storage medium
CN111860093A (en) * 2020-03-13 2020-10-30 北京嘀嘀无限科技发展有限公司 Image processing method, device, equipment and computer readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299204A (en) * 2021-12-22 2022-04-08 深圳市海清视讯科技有限公司 Three-dimensional cartoon character model generation method and device
CN115512116A (en) * 2022-11-01 2022-12-23 北京安德医智科技有限公司 Image segmentation model optimization method and device, electronic equipment and readable storage medium
CN115512116B (en) * 2022-11-01 2023-06-30 北京安德医智科技有限公司 Image segmentation model optimization method and device, electronic equipment and readable storage medium
CN117132606A (en) * 2023-10-24 2023-11-28 四川大学 Segmentation method for lung lesion image
CN117132606B (en) * 2023-10-24 2024-01-09 四川大学 Segmentation method for lung lesion image

Similar Documents

Publication Publication Date Title
WO2021147938A1 (en) Systems and methods for image processing
US11113840B2 (en) Systems and methods for detecting objects in images
US11176393B2 (en) Living body recognition method, storage medium, and computer device
US11232286B2 (en) Method and apparatus for generating face rotation image
WO2022001509A1 (en) Image optimisation method and apparatus, computer storage medium, and electronic device
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
US20220335583A1 (en) Image processing method, apparatus, and system
WO2022105197A1 (en) Systems and methods for image detection
US11615643B2 (en) Methods, systems, and media for evaluating images
US12026600B2 (en) Systems and methods for target region evaluation and feature point evaluation
WO2021017282A1 (en) Systems and methods for image clustering
CN108876707B (en) Bird's-eye view image generation and neural network training method, device, storage medium and equipment
WO2022247406A1 (en) Systems and methods for determining key frame images of video data
US11804032B2 (en) Method and system for face detection
AU2021393601B2 (en) Systems and methods for object recognition
US11605220B2 (en) Systems and methods for video surveillance
WO2024051593A1 (en) Systems and methods for image processing
WO2020133027A1 (en) Systems and methods for image fusion
CN115880530A (en) Detection method and system for resisting attack
CN112070022A (en) Face image recognition method and device, electronic equipment and computer readable medium
US10997722B2 (en) Systems and methods for identifying a body motion
WO2021078133A1 (en) Systems and methods for image processing
US20230176649A1 (en) Device and method with gaze estimating
CN118170306A (en) Method and device for displaying virtual keyboard, electronic equipment and storage medium
CN115909511A (en) Living body detection method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21743829

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21743829

Country of ref document: EP

Kind code of ref document: A1