CN117135330A - Model training and image processing method and device and electronic equipment - Google Patents

Model training and image processing method and device and electronic equipment

Info

Publication number
CN117135330A
Authority
CN
China
Prior art keywords
data
feature map
white balance
network
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311098540.3A
Other languages
Chinese (zh)
Inventor
陈洁茹
郭晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202311098540.3A
Publication of CN117135330A
Legal status: Pending

Classifications

    • H04N 9/73 Colour balance circuits, e.g. white balance circuits or colour temperature control
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/56 Extraction of image or video features relating to colour
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/169 Holistic features and representations, i.e. based on the facial image taken as a whole
    • H04N 23/88 Camera processing pipelines; Components thereof for processing colour signals for colour balance, e.g. white-balance circuits or colour temperature control
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a model training and image processing method and device and electronic equipment, and belongs to the technical field of data processing. The model training method comprises the following steps: acquiring a first training data set and a second training data set; training the first network model based on the first training data set to obtain a second network model, wherein the second network model is used for outputting a skin color feature map; training the third network model based on the second training data set to obtain a fourth network model, wherein the fourth network model is used for outputting a first global feature map; obtaining a second global feature map based on the skin color feature map and the first global feature map, wherein the second global feature map is used for obtaining first white balance gain data; and obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data, wherein the second white balance gain data is obtained based on a skin color feature map and a non-neural network white balance algorithm.

Description

Model training and image processing method and device and electronic equipment
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a model training and image processing method, device and electronic equipment.
Background
In the image signal processing (Image Signal Processing, ISP) pipeline, the automatic white balance (Auto White Balance, AWB) module is an important module for correcting image color: it corrects the color of an object under different illuminants to the color perceived by the human eye.
In related technical schemes, automatic white balance algorithms are mainly divided into traditional automatic white balance algorithms and neural-network-based automatic white balance algorithms.
For the traditional automatic white balance algorithm, a final white balance value is calculated according to where the statistical data (stats) points fall in the original image file and some manually set hyper-parameter constraints. During this calculation, in order to preserve the accuracy of the automatic white balance value in certain scenes, the hyper-parameter settings sacrifice some parameters of less important scenes, and the final result is judged only from where the points fall, so the image feature information is not fully utilized and the finally output white balance value is biased.
For neural-network-based automatic white balance algorithms, the consistency and stability of the output results are relatively poor.
Disclosure of Invention
The embodiment of the application aims to provide a model training and image processing method, device and electronic equipment, which can obtain accurate and consistent white balance data.
In a first aspect, an embodiment of the present application provides a model training method, including: acquiring a first training data set and a second training data set; the first training data set comprises face image data, initial color temperature data and initial skin color data, the second training data set comprises original image data and initial white balance gain data, and the face image data is obtained based on the original image data; training the first network model based on the first training data set to obtain a second network model, wherein the second network model is used for outputting a skin color feature map; training the third network model based on the second training data set to obtain a fourth network model, wherein the fourth network model is used for outputting a first global feature map; obtaining a second global feature map based on the skin color feature map and the first global feature map, wherein the second global feature map is used for obtaining first white balance gain data; and obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data, wherein the second white balance gain data is obtained based on a skin color feature map and a non-neural network white balance algorithm.
In a second aspect, an embodiment of the present application provides an image processing method, including: acquiring an image to be processed; carrying out human image segmentation on the image to be processed to obtain a first image containing human images; processing the first image by adopting a second network model to obtain a skin color feature map; processing the image to be processed by adopting a fourth network model to obtain a first global feature map; obtaining a second global feature map according to the skin color feature map and the first global feature map; obtaining first white balance gain data based on the second global feature map, and obtaining second white balance gain data based on the skin color feature map and a non-neural network white balance algorithm; obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data; and processing the image to be processed based on the third white balance gain data to obtain a second image.
In a third aspect, an embodiment of the present application provides a model training apparatus, including: the first acquisition module is used for acquiring a first training data set and a second training data set; the first training data set comprises face image data, initial color temperature data and initial skin color data, the second training data set comprises original image data and initial white balance gain data, and the face image data is obtained based on the original image data; the first training module is used for training the first network model based on the first training data set to obtain a second network model, and the second network model is used for outputting a skin color feature map; the second training module is used for training the third network model based on the second training data set to obtain a fourth network model, and the fourth network model is used for outputting the first global feature map; the first fusion module is used for obtaining a second global feature map based on the skin color feature map and the first global feature map, and the second global feature map is used for obtaining first white balance gain data; the first processing module is used for obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data, wherein the second white balance gain data is obtained based on a skin color feature map and a non-neural network white balance algorithm.
In a fourth aspect, an embodiment of the present application provides an image processing apparatus including: the second acquisition module is used for acquiring the image to be processed; the segmentation module is used for carrying out human image segmentation on the image to be processed to obtain a first image containing the human image; the second processing module is used for processing the first image by adopting a second network model to obtain a skin color feature map; the third processing module is used for processing the image to be processed by adopting a fourth network model to obtain a first global feature map; the second fusion module is used for obtaining a second global feature map according to the skin color feature map and the first global feature map; the fourth processing module is used for obtaining first white balance gain data based on the second global feature map and obtaining second white balance gain data based on the skin color feature map and a non-neural network white balance algorithm; the fourth processing module is further configured to obtain third white balance gain data based on the first white balance gain data and the second white balance gain data; and the adjusting module is used for processing the image to be processed based on the third white balance gain data to obtain a second image.
In a fifth aspect, embodiments of the present application provide an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, the program or instructions implementing the steps of the method as in the first or second aspect when executed by the processor.
In a sixth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the method as in the first or second aspects.
In a seventh aspect, embodiments of the present application provide a chip comprising a processor and a communication interface, the communication interface being coupled to the processor, the processor being configured to execute programs or instructions to implement a method as in the first or second aspect.
In an eighth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement a method as in the first or second aspects.
In the embodiment of the application, the first network model and the third network model are trained by using the obtained first training data set and the obtained second training data set, so as to obtain a second network model and a fourth network model, wherein the second network model can output a skin color feature map, and the fourth network model can output a first global feature map.
In the above embodiment, the second white balance gain data obtained based on the skin tone feature map and the non-neural network white balance algorithm can be combined with the first white balance gain data obtained above to obtain the third white balance gain data, so as to process the image according to the third white balance gain data.
Compared with the related technical scheme, the embodiment of the application can perform data processing according to the determined third white balance gain data, and solves the problems of color cast and insufficient utilization of image characteristic information existing in the process of processing the image by adopting the white balance gain data calculated by the non-neural network white balance algorithm. Meanwhile, the problems of poor consistency and poor stability of output results of an automatic white balance algorithm based on a neural network in the related technical scheme are also solved.
Specifically, in the embodiment of the application, the skin color feature map can be used as a parameter of the non-neural network white balance algorithm, so that the color cast of the second white balance gain data output by the non-neural network white balance algorithm is reduced, the color accuracy of the adjusted image is improved, and meanwhile, the image feature information can be fully utilized.
Further, the determination of the third white balance gain data refers to the second white balance gain data, and therefore, the first white balance gain data can be corrected using the second white balance gain data, so that the consistency and stability thereof can be improved.
Compared with the related technical scheme, the third white balance gain data determined by the embodiment of the application is more accurate and has better consistency. Under the condition that the first image is processed by adopting the third white balance gain data, the color processing effect of the image can be improved, so that the processed second image is more in line with the true color seen by human eyes.
Drawings
FIG. 1 is a flow chart of a model training method in an embodiment of the application;
FIG. 2 is a schematic diagram of a skin tone feature map determination process in an embodiment of the application;
FIG. 3 is a flow chart of processing portrait image data according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the decomposition process in an embodiment of the application;
FIG. 5 is a schematic diagram of downsampling in an embodiment of the application;
FIG. 6 is a schematic diagram of feature extraction in an embodiment of the application;
FIG. 7 is a schematic diagram of a fusion operation in an embodiment of the application;
FIG. 8 is a schematic diagram of outputting a second global signature in an embodiment of the application;
FIG. 9 is a schematic diagram of the overall framework in an embodiment of the application;
FIG. 10 is a schematic block diagram of a model training apparatus in an embodiment of the application;
FIG. 11 is a flow chart of an image processing method in an embodiment of the application;
fig. 12 is a schematic block diagram of an image processing apparatus in an embodiment of the present application;
FIG. 13 is a schematic block diagram of an electronic device in an embodiment of the application;
fig. 14 is a schematic diagram of a hardware structure of an electronic device in an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings, in which some, but not all, embodiments of the application are shown. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application fall within the scope of protection of the present application.
The terms "first," "second," and the like in the description of the present application, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type, and are not limited to the number of objects, such as the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
The model training and image processing method, device and electronic equipment provided by the embodiment of the application are described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
In some of these embodiments, as shown in fig. 1, a model training method is proposed, including:
step 102, a first training data set and a second training data set are acquired.
The first training data set comprises face image data, initial color temperature data and initial skin color data, the second training data set comprises original image data and initial white balance gain data, and the face image data is obtained based on the original image data.
In the embodiment of the application, the face image data is obtained by dividing the original image data.
And 104, training the first network model based on the first training data set to obtain a second network model, wherein the second network model is used for outputting the skin color feature map.
And step 106, training the third network model based on the second training data set to obtain a fourth network model, wherein the fourth network model is used for outputting the first global feature map.
Step 108, obtaining a second global feature map based on the skin color feature map and the first global feature map, wherein the second global feature map is used for obtaining first white balance gain data.
Step 110, obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data.
The second white balance gain data is obtained based on the skin color feature map and a non-neural network white balance algorithm.
In the embodiment of the application, the first network model and the third network model are trained by using the obtained first training data set and the obtained second training data set, so as to obtain a second network model and a fourth network model, wherein the second network model can output a skin color feature map, and the fourth network model can output a first global feature map.
In the above embodiment, the second white balance gain data obtained based on the skin tone feature map and the non-neural network white balance algorithm can be combined with the first white balance gain data obtained above to obtain the third white balance gain data, so as to process the image according to the third white balance gain data.
Compared with the related technical scheme, the embodiment of the application can perform data processing according to the determined third white balance gain data, and solves the problems of color cast and insufficient utilization of image characteristic information existing in the process of processing the image by adopting the white balance gain data calculated by the non-neural network white balance algorithm. Meanwhile, the problems of poor consistency and poor stability of output results of an automatic white balance algorithm based on a neural network in the related technical scheme are also solved.
Specifically, in the above embodiment, the skin color feature map can be used as a parameter of the non-neural network white balance algorithm, so as to reduce the color cast of the second white balance gain data output by the non-neural network white balance algorithm, thereby improving the color accuracy after the image adjustment, and simultaneously, fully utilizing the image feature information.
Further, the determination of the third white balance gain data refers to the second white balance gain data, and therefore, the first white balance gain data can be corrected using the second white balance gain data, so that the consistency and stability thereof can be improved.
In some embodiments of the present application, the non-neural network white balance algorithm calculates final white balance gain data based on the location of statistical data (stats) drop points in the original image file and some manually set hyper-parameter constraints.
The non-neural network white balance algorithm predicts the scene light source based on the gray world assumption: it gathers statistics over the image to obtain the R, G and B values of each pixel point, and sets the final R gain to G/R and the B gain to G/B.
The gray world assumption is that the average color of a natural scene is close to gray, where G is the average of the G channel over the entire image, B is the average of the B channel over the entire image, and R is the average of the R channel over the entire image.
In some non-neural network white balance algorithms, the basic algorithm is improved: misleading color points (for example, non-gray black points) are removed according to manual experience from the RGB point information gathered over the whole image, and the white balance is then calculated with an advanced gray world algorithm. The advanced gray world algorithm does not simply count the whole image information; it also sets a gray zone, and only the points falling in the gray zone participate in the final calculation.
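For illustration only, a minimal sketch of the basic gray-world gain computation described above might look as follows, assuming the input is an RGB image stored as a NumPy array (the function and array names are assumptions, not part of the claimed method):

```python
import numpy as np

def gray_world_gains(rgb):
    """Estimate white balance gains under the gray-world assumption.

    rgb: H x W x 3 array with channels in R, G, B order.
    Returns (r_gain, b_gain) so that R * r_gain and B * b_gain match G on average.
    """
    r_mean = rgb[..., 0].mean()
    g_mean = rgb[..., 1].mean()
    b_mean = rgb[..., 2].mean()
    return g_mean / r_mean, g_mean / b_mean
```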
In some embodiments of the present application, an original portrait self-timer raw image, a target white balance value corresponding to the original portrait self-timer raw image, and a portrait self-timer jpg image are obtained in advance.
The portrait self-timer raw image is divided by a portrait segmentation algorithm into rectangular raw images containing only the face, so that head images of different sizes correspond to portrait-segmented raw images of different sizes; the color temperature information of each picture is collected with a spectrometer and recorded in one-to-one correspondence.
From the portrait self-timer jpg image, the r, g and b pixel values of the face skin color are measured with an image processing tool, and the values (r/g, b/g) are calculated as the skin color target values, where the image processing tool can be image processing software such as Photoshop. In the embodiment of the application, the r, g and b pixel values of the face skin color can be the average values of the pixels of the face area.
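As an illustrative sketch of the skin color target computation described above (the face mask and array names are assumptions):

```python
import numpy as np

def skin_color_target(jpg_rgb, face_mask):
    """Average the face-region pixels and return (r/g, b/g) as the skin color target.

    jpg_rgb: H x W x 3 RGB image; face_mask: H x W boolean mask of face pixels.
    """
    face_pixels = jpg_rgb[face_mask]       # N x 3 array of face-area pixels
    r, g, b = face_pixels.mean(axis=0)     # mean r, g, b of the face area
    return r / g, b / g
```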
At the same time, 5 common color temperatures and 8 mainstream skin colors are selected. The 5 common color temperatures include 7k, 6k, 5k, 4k and 3k. The 8 mainstream skin colors include, for example, white, pale, dark brown, pale black, dark black and dark complexions; the types of skin colors can be selected as needed, and, for example, 3 skin colors can be selected for training.
Pictures of the 8 mainstream skin colors are obtained under the 5 color temperatures, giving 40 groups of pictures in total, and the correlated colour temperature (CCT), the skin color pixel values and the 40 groups of skin color pictures form data pairs.
The portrait-segmented raw image, the color temperature and the skin color target value form the first group of data pairs, namely the training samples in the first training data set.
The whole portrait self-timer raw image and the corresponding white balance value form the second group of data pairs, namely the training samples in the second training data set.
In some embodiments of the present application, the first network model includes a first sub-network and a second sub-network, and training the first network model based on the first training data set to obtain the second network model includes: inputting the face image data into the first sub-network to obtain a skin color feature map; processing the skin color feature map with the second sub-network to obtain first color temperature data and first skin color data; and updating parameters of the first sub-network and the second sub-network according to the initial color temperature data and the corresponding first color temperature data, and the initial skin color data and the corresponding first skin color data, to obtain the second network model.
In this embodiment, when the first network model is trained, the first network model is decomposed to obtain the first subnetwork and the second subnetwork contained therein.
The first sub-network takes the face image data as input and the skin color feature map as output, while the second sub-network takes the skin color feature map as input and the first color temperature data and first skin color data as output. On this basis, training the first network model is in fact training the first sub-network and the second sub-network, with the two sub-networks connected to each other; the face image data, the initial color temperature data and the corresponding first color temperature data, and the initial skin color data and the corresponding first skin color data in the first training data set serve as the training samples, thereby obtaining the second network model.
Specifically, after the face image data is input into the first sub-network, the first sub-network processes it to obtain the skin color feature map corresponding to the face image data, and this feature map is input into the second sub-network, which processes it to obtain the first color temperature data and the first skin color data. The initial color temperature data and the first color temperature data are then compared to obtain a first difference, the initial skin color data and the first skin color data are compared to obtain a second difference, and the parameters of the first sub-network and the second sub-network are updated according to the two differences, thereby training the model.
In the above embodiment, with continuous training of the first sub-network and the second sub-network, after the difference between the output first color temperature data and the first skin color data and the corresponding initial color temperature data and initial skin color data is smaller than a preset value, the training of the first network model is considered to be finished, and the second network model is obtained.
In the above embodiment, the first sub-network and the second sub-network are trained together, so that the parameters between the first sub-network and the second sub-network can be mutually adjusted while the training times and the training cost are reduced. For example, in the case where the first parameter in the first sub-network needs to be increased, the first parameter in the first sub-network and the second parameter of the second sub-network are increased synchronously, in which case the degree to which the first parameter needs to be increased can be reduced, thereby improving the accuracy of the model.
Specifically, back propagation is used to calculate the gradient of each parameter in the network, and stochastic gradient descent is used to update the parameters of the first sub-network and the second sub-network until the target loss function value of the first network model converges; the network parameters are then saved, and the training process of the whole network is finished.
Specifically, the target loss function of the first network model is an angular error function, expressed as follows:
Loss = arccos( (L L_i^T) / (‖L‖ × ‖L_i‖) )
where L represents the target [cct, r, b] (the target color temperature and skin color pixel values), L_i represents the predicted [cct_i, r_i, b_i], i is an integer 1, 2, 3, …, and L^T is the transpose of L.
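A minimal PyTorch-style sketch of the angular error loss and a joint update of the two sub-networks is given below for illustration; the names net1, net2 and to_cct_rb (the mapping from network outputs to a predicted [cct, r, b] vector) are assumptions, not prescribed by the application:

```python
import torch

def angular_error(pred, target, eps=1e-7):
    """Angle between predicted and target vectors (e.g., [cct, r, b])."""
    cos = torch.sum(pred * target, dim=-1) / (
        pred.norm(dim=-1) * target.norm(dim=-1) + eps)
    return torch.acos(cos.clamp(-1 + eps, 1 - eps)).mean()

def train_step(net1, net2, to_cct_rb, optimizer, face_img, target):
    """One joint update of the first and second sub-networks (sketch)."""
    optimizer.zero_grad()
    skin_map = net1(face_img)             # skin color feature map
    pred = to_cct_rb(net2(skin_map))      # predicted [cct, r, b]
    loss = angular_error(pred, target)
    loss.backward()                       # back propagation
    optimizer.step()                      # stochastic gradient descent update
    return loss.item()
```

Here `optimizer` would typically be built over the parameters of both sub-networks, e.g. `torch.optim.SGD(list(net1.parameters()) + list(net2.parameters()), lr=...)`, so that the two sub-networks are trained together as described above.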
In some embodiments of the present application, inputting face image data to a first subnetwork to obtain a skin tone feature map includes: inputting the face image data to an average pooling layer in a first sub-network to obtain first characteristic data; inputting the first characteristic data into a first convolution layer in a first sub-network to obtain second characteristic data; inputting the second characteristic data into a first activation function layer in a first sub-network to obtain third characteristic data; inputting the third characteristic data into a second convolution layer in the first subnetwork to obtain fourth characteristic data; inputting the fourth characteristic data into a second activation function layer in the first subnetwork to obtain fifth characteristic data; and inputting the fifth characteristic data into a third convolution layer in the first subnetwork to obtain a skin color characteristic diagram.
In this embodiment, as shown in fig. 2, the first sub-network mainly includes an average pooling layer, a first convolution layer, a first activation function layer, a second convolution layer, a second activation function layer and a third convolution layer. The average pooling layer is arranged to average the skin color information and perform noise reduction, which facilitates better extraction of the feature information.
In some embodiments of the application, the average pooling layer has a size of 5×5 and a step size of 1, and outputs feature data a1 with a size of 20×20×4, where a1 is the first feature data.
In the above embodiment, the first convolution layer and the first activation function layer are arranged to extract the feature information of the data, and the second convolution layer, the second activation function layer and the third convolution layer are arranged so that the feature information of the data is better extracted through the convolution layers.
In some embodiments of the application, the first convolution layer has a size of 1×1×4, a step size of 1, and a convolution kernel number of 30.
In some embodiments of the application, the third feature data is expressed as follows:
y = f(x)
where y is the third feature data output by the first activation function layer, x is the second feature data, and f is the first activation function.
In some embodiments of the present application, the convolution kernel size of the second convolution layer is 5×5×30, the step size is 1, the convolution kernel number is 60, and the size of the fifth feature data corresponding to the output is 16×16×60.
In some embodiments of the present application, the third convolution layer has a convolution kernel size of 7×7×60, a step size of 1, a convolution kernel number of 60, and a resulting skin tone feature map size of 10×10×60.
In the above embodiment, the first convolution layer, the first activation function layer, the second convolution layer, the second activation function layer and the third convolution layer make the structure of the first sub-network similar to that of VGGNet (a neural network structure), but with fewer network layers and with a skin color feature layer and a final classification output layer added.
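Based on the layer sizes given above (24×24×4 input, 5×5 average pooling, then 1×1, 5×5 and 7×7 convolutions), the first sub-network could be sketched in PyTorch as follows; the choice of ReLU as the activation function is an assumption, since the application does not specify it:

```python
import torch.nn as nn

# Sketch of the first sub-network: 24x24x4 face data -> 10x10x60 skin color feature map.
first_subnetwork = nn.Sequential(
    nn.AvgPool2d(kernel_size=5, stride=1),        # 24x24x4 -> 20x20x4 (first feature data)
    nn.Conv2d(4, 30, kernel_size=1, stride=1),    # first convolution layer, 30 kernels
    nn.ReLU(),                                    # first activation function layer (assumed ReLU)
    nn.Conv2d(30, 60, kernel_size=5, stride=1),   # second convolution layer -> 16x16x60
    nn.ReLU(),                                    # second activation function layer
    nn.Conv2d(60, 60, kernel_size=7, stride=1),   # third convolution layer -> 10x10x60 skin color feature map
)
```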
In some embodiments of the present application, before inputting the face image data to the average pooling layer, the method further includes: preprocessing the face image data, wherein the preprocessing comprises the following steps: and adjusting the size of the face image data to be the target size. Specifically, the target size may be selected according to practical needs, such as selecting a size of 24×24×4.
As shown in fig. 3 and fig. 4, the face image data is taken as an original image file, and the preprocessing process includes:
Taking a segmented raw image with a size of 64×64 as an example, the face image data is an original raw image, so the corresponding r, gb, gr and b parts of the raw image are separated to obtain 4 planes with a size of 32×32. A skin color detection algorithm is then applied to the 32×32×4 data: the data of the skin color part is kept unchanged, and the data of the non-skin color part is set to 0.
The specific algorithm is as follows. On the 32×32 picture, each length-width position corresponds to 4 data values, namely r, gb, gr and b. Each point of the 32×32 picture is traversed, the 4 channel values form a data pair, and each data pair is multiplied by the white balance value of the previous frame to obtain a first predicted data value: the r channel is multiplied by the r gain, the b channel is multiplied by the b gain, and the gr and gb channels are both multiplied by the g gain. The pixel is then converted from RGB space to YCbCr space: Y = 0.257×R + 0.564×G + 0.098×B + 16; Cb = -0.148×R - 0.291×G + 0.439×B + 128; Cr = 0.439×R - 0.368×G - 0.071×B + 128. If the corresponding point falls within the specific threshold range 77 < Cb < 127 and 133 < Cr < 173, it is recorded as a valid skin color point.
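For illustration, the per-pixel skin color point test described above might be sketched as follows, assuming the four channel values have already been multiplied by the previous frame's white balance gains; averaging Gr and Gb to obtain G is an assumption:

```python
def is_skin_point(r, gr, gb, b):
    """Return True if the (white-balanced) pixel falls in the skin color region of YCbCr."""
    g = (gr + gb) / 2.0                                # assumed: average Gr and Gb as G
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return 77 < cb < 127 and 133 < cr < 173
```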
As shown in fig. 5, the portion with pixel 0 is removed by the multi-scale downsampling method.
Specifically, each row is traversed first: if all data in a row are 0, the row is deleted. Each column is then traversed, and all-zero columns are deleted, yielding intermediate data of inconsistent sizes. The data are then compared with the required size of 24×24×4: if the size is larger than 24×24, a corresponding average downsampling operation is performed so that the final data size is 24×24; if the size is smaller than 24×24, zero padding is performed around the edges.
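A rough NumPy sketch of this zero-row/column removal and resizing to 24×24 is shown below; the exact average-downsampling and padding scheme is an assumption, not specified by the application:

```python
import numpy as np

def crop_and_resize_to_24(data, target=24):
    """Remove all-zero rows/columns, then zero-pad and/or average-downsample to target x target.

    data: H x W x 4 array (r, gb, gr, b planes) with non-skin pixels set to 0.
    """
    rows = ~np.all(data == 0, axis=(1, 2))    # keep rows that are not entirely zero
    cols = ~np.all(data == 0, axis=(0, 2))    # keep columns that are not entirely zero
    data = data[rows][:, cols]
    # zero-pad any dimension smaller than the target
    pad_h = max(target - data.shape[0], 0)
    pad_w = max(target - data.shape[1], 0)
    data = np.pad(data, ((pad_h // 2, pad_h - pad_h // 2),
                         (pad_w // 2, pad_w - pad_w // 2), (0, 0)))
    # average-downsample any dimension larger than the target (assumed block-average scheme)
    h, w, c = data.shape
    out = np.zeros((target, target, c), dtype=float)
    ys = np.linspace(0, h, target + 1).astype(int)
    xs = np.linspace(0, w, target + 1).astype(int)
    for i in range(target):
        for j in range(target):
            out[i, j] = data[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean(axis=(0, 1))
    return out
```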
In some embodiments of the present application, processing the skin color feature map using the second sub-network to obtain first color temperature data and first skin color data includes: inputting the skin color feature map to a fourth convolution layer of the second sub-network to obtain sixth feature data; inputting the sixth feature data into a third activation function layer of the second sub-network to obtain seventh feature data; inputting the seventh feature data to a maximum pooling layer of the second sub-network to obtain eighth feature data; after the eighth feature data is input to the first fully connected layer of the second sub-network, inputting the output result of the first fully connected layer to the second fully connected layer of the second sub-network to obtain ninth feature data; and determining the first color temperature data and the first skin color data based on the ninth feature data.
In this embodiment, as shown in fig. 6, by providing the fourth convolution layer to re-extract the feature information from the skin tone feature map, in the process, features more suitable for extracting skin tone and color temperature information can be learned, so as to improve the accuracy of the obtained setting parameters.
In the above embodiment, the maximum pooling layer is set, so that the feature data is subjected to dimension reduction processing by using the maximum pooling layer, and network overfitting is prevented.
In the above embodiment, the first fully connected layer and the second fully connected layer act as classifiers in the feature convolution network: the learned distributed features are mapped to the sample label space so that the eighth feature data is identified and classified.
By setting the first fully connected layer and the second fully connected layer, the number of values in the output ninth feature data corresponds to the selection of parameters in this embodiment, so that the output ninth feature data is adapted to the current usage scenario.
Specifically, the convolution kernel size in the fourth convolution layer is 1×1×60, the step size is 1, and the convolution kernel number is 40.
In some embodiments of the application, the size of the largest pooling layer is 6 x 6, with a step size of 1. The output size is 5×5×40, that is, the size of the eighth feature data is 5×5×40.
In some embodiments of the present application, the output of the first fully-connected layer has a size of 1×1×80.
In some embodiments of the application, the size of the ninth feature data is 1×1×40.
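Based on the layer sizes above, the second sub-network could be sketched in PyTorch as follows; as before, the choice of ReLU for the activation layer is an assumption:

```python
import torch.nn as nn

# Sketch of the second sub-network: 10x10x60 skin color feature map -> 40 output values.
second_subnetwork = nn.Sequential(
    nn.Conv2d(60, 40, kernel_size=1, stride=1),   # fourth convolution layer -> 10x10x40
    nn.ReLU(),                                    # third activation function layer (assumed ReLU)
    nn.MaxPool2d(kernel_size=6, stride=1),        # max pooling layer -> 5x5x40 (eighth feature data)
    nn.Flatten(),
    nn.Linear(5 * 5 * 40, 80),                    # first fully connected layer -> 1x1x80
    nn.Linear(80, 40),                            # second fully connected layer -> 1x1x40 (ninth feature data)
)
```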
The size of the ninth feature data is 1×1×40, i.e., the number of values is 40, corresponding one-to-one to the 40 groups of pictures above, that is, to the 8 mainstream skin colors at the 5 color temperatures; each value in the ninth feature data represents the probability that the corresponding combination of mainstream skin color and color temperature occurs.
Specifically, the ninth feature data contains 40 values, which can be understood as 40 distinct weights corresponding to the preset skin color values (i.e., the 40 groups of pictures); the first of the 40 weights is the weight corresponding to the first skin color and color temperature combination. The corresponding skin color and color temperature values are set in sequence, 40 weights are obtained, and normalization is performed so that the 40 values lie between 0 and 1.
Specifically, the first numerical value in the output result of the second full-connection layer is 0, which represents the color temperature 7k, and the probability of the first main stream skin color is 0; a second value of 0 represents the color temperature 7k in the current environment, and the probability of the second main stream skin color is 0; until a fourth value of 0.1, representing a color temperature of 7k, the probability of the fourth mainstream skin tone is 0.1; a fifth value of 0.9 representing a color temperature of 7k, the probability of fifth mainstream skin tone being 0.9; i.e. the skin color high probability of the current portrait is the fifth main stream skin color and the current ambient color temperature high probability is 7k.
In order to obtain the skin color and color temperature information more accurately, the color temperatures and skin colors of the groups with non-zero weights are interpolated; the interpolation weights are the values finally obtained by the fully connected layer, and the color temperature and the r and b values of the skin color are finally output.
For example, when the 4th and 5th weights have values, the final color temperature and skin color pixel value are obtained by interpolating the color temperatures and skin color pixel values of the 4th and 5th groups with the corresponding weights, and the color temperature and the r and b values of the skin color are finally output.
The color temperatures are classified into 5 classes (7k, 6k, 5k, 4k, 3k), where the m-th class represents the m-th color temperature, and the skin colors are divided into 8 classes, where the n-th class represents the n-th skin color. The second fully connected layer outputs the matching probability between the current environment and the 40 environments; the larger the value, the larger the probability of that environment. If the 4th and 5th outputs of the fully connected layer have values and the others do not, the probability that the current environment and skin color lie between those two is larger.
In order to further obtain the skin color value and the color temperature value, interpolation operation is required.
Specifically, the final color temperature value is CCTf = 0.1×X4 + 0.9×X5, and the skin color value is SKINf = 0.1×Y4 + 0.9×Y5.
Each of the 40 environments has a corresponding pair [Xm, Yn].
Specifically, X corresponds to a color temperature; for example, X1 corresponds to 7k and X2 corresponds to 6k. Y corresponds to a skin color; Y1 corresponds to [r1, b1, g1], i.e., the r, b and g values of the first skin color in RGB space. When interpolating skin colors, matrix multiplication and addition operations are used.
In some embodiments of the present application, a weighted interpolation operation is performed for r and a weighted interpolation operation is performed for b.
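A small sketch of the weighted interpolation described above, assuming `weights` holds the 40 normalized outputs of the second fully connected layer and `cct_table`/`skin_table` hold the [Xm] and [Yn] values of the 40 preset environments (names are assumptions):

```python
import numpy as np

def interpolate_cct_and_skin(weights, cct_table, skin_table):
    """Interpolate the final color temperature and skin color (r, b) values.

    weights:    length-40 array of normalized probabilities.
    cct_table:  length-40 array, color temperature of each environment (Xm).
    skin_table: 40 x 2 array, (r, b) skin color values of each environment (Yn).
    """
    cct_final = float(np.dot(weights, cct_table))   # e.g. 0.1 * X4 + 0.9 * X5
    skin_final = weights @ skin_table               # weighted interpolation of r and b separately
    return cct_final, skin_final
```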
In some embodiments of the present application, as shown in fig. 7, training the third network model based on the second training data set to obtain a fourth network model includes: inputting the original image data into a decomposition layer in a third network model to obtain decomposed data; sampling the decomposed data to obtain tenth characteristic data; inputting tenth characteristic data into an average pooling layer in a third network model to obtain eleventh characteristic data; performing convolution and activation processing on eleventh feature data to obtain a first global feature map; performing fusion operation on the first global feature map and the skin color feature map to obtain a second global feature map; inputting the second global feature map to a pooling layer in a third network model to obtain twelfth feature data; after the twelfth characteristic data is input to a third full-connection layer in a third network model, the output result of the third full-connection layer is input to a fourth full-connection layer in the third network model, and first white balance gain data are obtained; and updating parameters of the third network model based on the initial white balance gain data and the first white balance gain data to obtain a fourth network model.
In this embodiment, the decomposition layer is provided so that the original image data is decomposed by the decomposition layer, thereby obtaining data of uniform size, namely the decomposed data.
The decomposed data is sampled to adjust the size of the output data so as to be processed at the average pooling layer.
In some embodiments of the present application, in the case where the size of the image to be processed is 1024×1024×1, the size of the decomposed data after processing by the decomposition layer is 512×512×4.
The size of the tenth characteristic data obtained by sampling the decomposed data is 256×256×4.
In the above embodiment, the average pooling layer size is 2×2, and the step size is 2. Feature data having an output size of 128×128×4, that is, eleventh feature data, is obtained.
In the process of convolving and activating the eleventh feature data, the first convolution layer and activation function layer have a size of 2×2×4, a step size of 2 and 10 convolution kernels, and the resulting feature data has a size of 64×64×10.
The second convolution layer and activation function layer have a size of 2×2×10, a step size of 2 and 30 convolution kernels, and the resulting feature data has a size of 32×32×30.
The third convolution layer has a size of 2×2×30, a step size of 2 and 60 convolution kernels, and produces 16×16×60 data, i.e., the first global feature map.
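Using the sizes above (the 1024×1024 raw input is decomposed into 512×512×4 and downsampled to 256×256×4 before this stage), the global feature branch of the third network model could be sketched as follows; the activation choice is an assumption, and the fusion with the skin color feature map (shown after this list of steps) is handled separately:

```python
import torch.nn as nn

# Sketch of the global feature branch: 256x256x4 -> 16x16x60 first global feature map.
global_branch = nn.Sequential(
    nn.AvgPool2d(kernel_size=2, stride=2),        # 256x256x4 -> 128x128x4 (eleventh feature data)
    nn.Conv2d(4, 10, kernel_size=2, stride=2),    # -> 64x64x10
    nn.ReLU(),                                    # assumed ReLU activation
    nn.Conv2d(10, 30, kernel_size=2, stride=2),   # -> 32x32x30
    nn.ReLU(),
    nn.Conv2d(30, 60, kernel_size=2, stride=2),   # -> 16x16x60 first global feature map
)
```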
In some embodiments of the application, the second global feature map has a size of 16×16×120.
In some embodiments of the present application, the skin tone feature map may be extracted using the first sub-network, so that the second network model and the fourth network model may share the first sub-network, thereby improving the data utilization rate while saving time spent on model training.
The target loss function of the third network model is an angular error function, expressed as follows:
Loss = arccos( (L L_i^T) / (‖L‖ × ‖L_i‖) )
where L represents the white balance gain data [r, b], L^T is the transpose of L, L_i represents the predicted [ri, bi], and i is an integer 1, 2, 3, ….
In some embodiments of the present application, as shown in fig. 8, performing a fusion operation on the first global feature map and the skin color feature map to obtain a second global feature map, including: zero padding is carried out on the skin color feature map, and thirteenth feature data are obtained; adding the first global feature map and the thirteenth feature data corresponding channel to obtain fourteenth feature data; and splicing the fourteenth feature data with the first global feature map to obtain a second global feature map.
In this embodiment, the thirteenth feature data has a size of 16×16×60.
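An illustrative PyTorch sketch of the fusion operation described above: the 10×10×60 skin color feature map is zero-padded to 16×16×60, added channel-wise to the first global feature map, and the result is concatenated with the first global feature map along the channel dimension:

```python
import torch
import torch.nn.functional as F

def fuse_feature_maps(skin_map, global_map):
    """skin_map: N x 60 x 10 x 10, global_map: N x 60 x 16 x 16 -> N x 120 x 16 x 16."""
    padded = F.pad(skin_map, (3, 3, 3, 3))            # thirteenth feature data, 16x16x60
    summed = global_map + padded                      # fourteenth feature data (channel-wise addition)
    return torch.cat([summed, global_map], dim=1)     # second global feature map, 16x16x120
```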
In some embodiments of the present application, as shown in fig. 9, the first network model includes Network1 + Network2 + SkinMap1.
As shown in fig. 9, the structure of the third network model includes Network3 + Map2 + SkinMap1 + FC.
Network1 is the first sub-network, Network2 is the second sub-network, Network3 is the network in the third network model, SkinMap1 is the skin color feature map, Map2 is the first global feature map, FC denotes the third and fourth fully connected layers, and CCT and skin_i denote the color temperature and the r and b values of the face skin color.
According to the model training method provided by the embodiment of the application, the execution subject can be a model training device. In the embodiment of the application, a model training device is taken as an example to execute a model training method, and the model training device provided by the embodiment of the application is described.
In some of these embodiments, as shown in fig. 10, there is provided a model training apparatus 1000 comprising: a first obtaining module 1002, configured to obtain a first training data set and a second training data set; the first training data set comprises face image data, initial color temperature data and initial skin color data, the second training data set comprises original image data and initial white balance gain data, and the face image data is obtained based on the original image data; the first training module 1004 is configured to train the first network model based on the first training data set to obtain a second network model, where the second network model is used to output a skin color feature map; a second training module 1006, configured to train the third network model based on the second training data set to obtain a fourth network model, where the fourth network model is configured to output the first global feature map; the first fusion module 1008 is configured to obtain a second global feature map based on the skin color feature map and the first global feature map, where the second global feature map is used to obtain first white balance gain data; the first processing module 1010 is configured to obtain third white balance gain data based on the first white balance gain data and second white balance gain data, where the second white balance gain data is obtained based on a skin color feature map and a non-neural network white balance algorithm.
In the embodiment of the application, the first network model and the third network model are trained by using the obtained first training data set and the obtained second training data set, so as to obtain a second network model and a fourth network model, wherein the second network model can output a skin color feature map, and the fourth network model can output a first global feature map.
In the above embodiment, the second white balance gain data obtained based on the skin tone feature map and the non-neural network white balance algorithm can be combined with the first white balance gain data obtained above to obtain the third white balance gain data, so as to process the image according to the third white balance gain data.
Compared with the related technical scheme, the embodiment of the application can perform data processing according to the determined third white balance gain data, solves the problems of color cast and insufficient utilization of image characteristic information existing in the process of processing the image by adopting the white balance gain data calculated by the non-neural network white balance algorithm, and simultaneously solves the problems of poor consistency and poor stability of an output result of an automatic white balance algorithm based on a neural network in the related technical scheme.
Specifically, in the above embodiment, the skin color feature map can be used as a parameter of the non-neural network white balance algorithm, so as to reduce the color cast of the second white balance gain data output by the non-neural network white balance algorithm, thereby improving the color accuracy after the image adjustment, and simultaneously, fully utilizing the image feature information.
Further, the determination of the third white balance gain data refers to the second white balance gain data, and therefore, the first white balance gain data can be corrected using the second white balance gain data, so that the consistency and stability thereof can be improved.
In some embodiments of the present application, the first network model includes a first sub-network and a second sub-network, and the first training module 1004 is configured to input face image data to the first sub-network to obtain a skin color feature map; processing the skin color feature map by adopting a second sub-network to obtain first skin color temperature data and first skin color data; and updating parameters of the first subnetwork and the second subnetwork according to the initial color temperature data, the first color temperature data corresponding to the initial color temperature data, the initial skin color data and the first skin color data corresponding to the initial skin color data to obtain a second network model.
In some embodiments of the present application, a first training module 1004 is configured to input face image data to an averaging pooling layer in a first sub-network to obtain first feature data; inputting the first characteristic data into a first convolution layer in a first sub-network to obtain second characteristic data; inputting the second characteristic data into a first activation function layer in a first sub-network to obtain third characteristic data; inputting the data in the third characteristic data into a second convolution layer in the first subnetwork to obtain fourth characteristic data; inputting the fourth characteristic data into a second activation function layer in the first subnetwork to obtain fifth characteristic data; and inputting the fifth characteristic data into a third convolution layer in the first subnetwork to obtain a skin color characteristic diagram.
In some embodiments of the present application, the first training module 1004 is configured to input the skin color feature map to a fourth convolution layer of the second sub-network to obtain sixth feature data; inputting the sixth feature data into a third activation function layer of the second sub-network to obtain seventh feature data; inputting the seventh feature data to a maximum pooling layer of the second sub-network to obtain eighth feature data; after the eighth feature data is input to the first full-connection layer of the second sub-network, inputting the output result of the first full-connection layer to the second full-connection layer of the second sub-network to obtain ninth feature data; and determining the first color temperature data and the first skin color data based on the ninth feature data.
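A matching sketch of the second sub-network and of a single hypothetical parameter update is given below. The layer sizes, the split of the ninth feature data into a scalar color temperature and an R/G/B skin color triplet, and the use of a mean squared error loss are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondSubNetwork(nn.Module):
    """Sketch of the second sub-network: a fourth convolution layer, a third
    activation function layer, a maximum pooling layer and two full-connection
    layers whose output (ninth feature data) is split into first color
    temperature data and first skin color data."""

    def __init__(self, in_channels=8, feat_hw=32, hidden=64):
        super().__init__()
        self.conv4 = nn.Conv2d(in_channels, in_channels, 3, padding=1)   # sixth feature data
        self.act3 = nn.ReLU()                                            # seventh feature data
        self.max_pool = nn.MaxPool2d(2)                                  # eighth feature data
        self.fc1 = nn.Linear(in_channels * (feat_hw // 2) ** 2, hidden)  # first full-connection layer
        self.fc2 = nn.Linear(hidden, 4)                                  # ninth feature data

    def forward(self, skin_feature_map):
        x = self.act3(self.conv4(skin_feature_map))
        x = torch.flatten(self.max_pool(x), start_dim=1)
        ninth = self.fc2(self.fc1(x))
        return ninth[:, :1], ninth[:, 1:]  # first color temperature data, first skin color data

# Hypothetical single parameter update against the initial labels.
net = SecondSubNetwork()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
skin_map = torch.rand(4, 8, 32, 32)                     # skin color feature map from the first sub-network
gt_temp, gt_skin = torch.rand(4, 1), torch.rand(4, 3)   # initial color temperature / skin color data
pred_temp, pred_skin = net(skin_map)
loss = F.mse_loss(pred_temp, gt_temp) + F.mse_loss(pred_skin, gt_skin)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```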
In some embodiments of the present application, the second training module 1006 is configured to input the original image data to a decomposition layer in the third network model to obtain decomposed data; sampling the decomposed data to obtain tenth feature data; inputting the tenth feature data into an average pooling layer in the third network model to obtain eleventh feature data; performing convolution and activation processing on the eleventh feature data to obtain the first global feature map; performing a fusion operation on the first global feature map and the skin color feature map to obtain the second global feature map; inputting the second global feature map to a pooling layer in the third network model to obtain twelfth feature data; after the twelfth feature data is input to a third full-connection layer in the third network model, inputting the output result of the third full-connection layer to a fourth full-connection layer in the third network model to obtain the first white balance gain data; and updating parameters of the third network model based on the initial white balance gain data and the first white balance gain data to obtain a fourth network model.
In some embodiments of the present application, the second training module 1006 is configured to perform zero padding processing on the skin color feature map to obtain thirteenth feature data; adding corresponding channels of the first global feature map and the thirteenth feature data to obtain fourteenth feature data; and splicing the fourteenth feature data with the first global feature map to obtain the second global feature map.
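The third network model and the fusion operation can likewise be sketched as follows. The decomposition layer is modeled as a strided convolution and the sampling step as bilinear down-sampling; these choices, the channel counts and the mean squared error loss are assumptions, and only the order of operations follows the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse(first_global, skin_map):
    """Fusion as described above: zero-pad the skin color feature map to the
    spatial size of the first global feature map (thirteenth feature data),
    add the two maps channel by channel (fourteenth feature data), then splice
    the sum with the first global feature map along the channel axis (second
    global feature map). Equal channel counts are assumed."""
    _, _, gh, gw = first_global.shape
    _, _, sh, sw = skin_map.shape
    thirteenth = F.pad(skin_map, (0, gw - sw, 0, gh - sh))   # zero padding
    fourteenth = first_global + thirteenth                   # channel-wise addition
    return torch.cat([fourteenth, first_global], dim=1)      # splicing

class ThirdNetworkModel(nn.Module):
    def __init__(self, in_channels=3, mid_channels=16):
        super().__init__()
        self.decompose = nn.Conv2d(in_channels, mid_channels, 3, stride=2, padding=1)
        self.avg_pool = nn.AvgPool2d(2)
        self.conv = nn.Conv2d(mid_channels, mid_channels, 3, padding=1)
        self.act = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)         # pooling layer
        self.fc3 = nn.Linear(mid_channels * 2, 16)  # third full-connection layer
        self.fc4 = nn.Linear(16, 3)                 # fourth full-connection layer

    def forward(self, raw_image, skin_map):
        decomposed = self.decompose(raw_image)                                # decomposed data
        tenth = F.interpolate(decomposed, scale_factor=0.5, mode="bilinear")  # sampling
        eleventh = self.avg_pool(tenth)
        first_global = self.act(self.conv(eleventh))                          # first global feature map
        second_global = fuse(first_global, skin_map)                          # second global feature map
        twelfth = torch.flatten(self.pool(second_global), 1)
        return self.fc4(self.fc3(twelfth))                                    # first white balance gain data

model = ThirdNetworkModel()
raw, skin = torch.rand(1, 3, 128, 128), torch.rand(1, 16, 8, 8)
gain = model(raw, skin)
loss = F.mse_loss(gain, torch.rand(1, 3))  # compare with the initial white balance gain data
loss.backward()                            # update parameters of the third network model
print(gain.shape)  # torch.Size([1, 3])
```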
In some of these embodiments, as shown in fig. 11, there is provided an image processing method including:
step 1102, obtaining an image to be processed;
step 1104, carrying out human image segmentation on the image to be processed to obtain a first image containing human images;
step 1106, processing the first image by adopting a second network model to obtain a skin color feature map;
step 1108, processing the image to be processed by adopting a fourth network model to obtain a first global feature map;
step 1110, obtaining a second global feature map according to the skin color feature map and the first global feature map;
step 1112, obtaining first white balance gain data based on the second global feature map, and obtaining second white balance gain data based on the skin color feature map and the non-neural network white balance algorithm;
step 1114, obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data;
step 1116, processing the image to be processed based on the third white balance gain data to obtain a second image.
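For orientation, the whole inference path of steps 1102 to 1116 is summarized in the heavily simplified sketch below. Every callable passed in is a stand-in for the corresponding component (segmentation, second network model, fourth network model, non-neural-network white balance, gain blending), and the fusion and gain computations shown are placeholders rather than the actual operations of the embodiment.

```python
import numpy as np

def process_image(image, segment, second_model, fourth_model,
                  non_nn_white_balance, blend):
    """Illustrative end-to-end flow of steps 1102-1116; every argument is a
    stand-in callable, not the real component."""
    first_image = segment(image)                          # step 1104: portrait region
    skin_map = second_model(first_image)                  # step 1106: skin color feature map
    first_global = fourth_model(image)                    # step 1108: first global feature map
    second_global = np.concatenate([first_global, skin_map], axis=-1)  # step 1110 (simplified fusion)
    first_gain = second_global.reshape(-1, second_global.shape[-1]).mean(axis=0)[:3]  # step 1112 (stand-in head)
    second_gain = non_nn_white_balance(image, skin_map)   # step 1112: non-NN algorithm with skin prior
    third_gain = blend(first_gain, second_gain)           # step 1114
    return image * third_gain                             # step 1116: second image

# Toy stand-ins, purely to make the sketch runnable.
img = np.random.rand(32, 32, 3)
out = process_image(
    img,
    segment=lambda x: x,                                  # pretend the whole frame is the portrait
    second_model=lambda x: np.random.rand(32, 32, 3),
    fourth_model=lambda x: np.random.rand(32, 32, 3),
    non_nn_white_balance=lambda x, s: np.array([1.8, 1.0, 1.6]),
    blend=lambda a, b: 0.5 * a + 0.5 * b,
)
print(out.shape)  # (32, 32, 3)
```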
In the embodiment of the application, compared with related technical solutions, the determined third white balance gain data is more accurate and more consistent. When the image to be processed is processed using the third white balance gain data, the color processing effect of the image can be improved, so that the processed second image better matches the true colors seen by the human eye.
The image processing method provided by the embodiment of the application may be executed by an image processing apparatus. In the embodiments of the present application, the image processing apparatus is described by taking the image processing apparatus executing the image processing method as an example.
In some of these embodiments, as shown in fig. 12, there is provided an image processing apparatus 1200 including: a second acquiring module 1202, configured to acquire an image to be processed; the segmentation module 1204 is used for carrying out human image segmentation on the image to be processed to obtain a first image containing human images; a second processing module 1206, configured to process the first image with a second network model to obtain a skin color feature map; a third processing module 1208, configured to process the image to be processed by using a fourth network model, to obtain a first global feature map; a second fusion module 1210, configured to obtain a second global feature map according to the skin color feature map and the first global feature map; a fourth processing module 1212, configured to obtain first white balance gain data based on the second global feature map, and obtain second white balance gain data based on the skin color feature map and the non-neural network white balance algorithm; the fourth processing module 1212 is further configured to obtain third white balance gain data based on the first white balance gain data and the second white balance gain data; an adjusting module 1214, configured to process the image to be processed based on the third white balance gain data, so as to obtain a second image.
In the embodiment of the application, compared with related technical solutions, the determined third white balance gain data is more accurate and more consistent. When the image to be processed is processed using the third white balance gain data, the color processing effect of the image can be improved, so that the processed second image better matches the true colors seen by the human eye.
The model training apparatus 1000 and the image processing apparatus 1200 in the embodiments of the present application may be electronic devices, or may be components in electronic devices, such as integrated circuits or chips. The electronic device may be a terminal, or may be a device other than a terminal. By way of example, the electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a mobile internet device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), and may also be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine or a self-service machine, etc.; the embodiments of the present application are not specifically limited thereto.
The model training apparatus and the image processing apparatus in the embodiments of the present application may be devices with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The model training apparatus 1000 provided in the embodiment of the present application can implement each process implemented by the embodiment of the model training method in fig. 1, and the image processing apparatus 1200 can implement each process implemented by the embodiment of the image processing method in fig. 11, and can achieve the same technical effects, so that repetition is avoided, and no description is repeated here.
As shown in fig. 13, the embodiment of the present application further provides an electronic device 1300, which includes a processor 1302 and a memory 1304, where the memory 1304 stores a program or instructions that can be executed on the processor 1302, and the program or instructions implement the steps of the foregoing embodiment of the model training method or the image processing method when executed by the processor 1302, and achieve the same technical effects, so that repetition is avoided, and no further description is given here.
The electronic device in the embodiments of the present application includes mobile electronic devices and non-mobile electronic devices.
Fig. 14 is a schematic hardware structure of an electronic device implementing an embodiment of the present application.
As shown in fig. 14, the electronic device 1400 includes, but is not limited to: radio frequency unit 1401, network module 1402, audio output unit 1403, input unit 1404, sensor 1405, display unit 1406, user input unit 1407, interface unit 1408, memory 1409, and processor 1410.
Those skilled in the art will appreciate that the electronic device 1400 may also include a power source (for example, a battery) for powering the various components. The power source may be logically connected to the processor 1410 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system. The electronic device structure shown in fig. 14 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different arrangement of components, which is not described in detail here.
In some of these embodiments, processor 1410 is configured to: acquiring a first training data set and a second training data set; the first training data set comprises face image data, initial color temperature data and initial skin color data, the second training data set comprises original image data and initial white balance gain data, and the face image data is obtained based on the original image data; training the first network model based on the first training data set to obtain a second network model, wherein the second network model is used for outputting a skin color feature map; training the third network model based on the second training data set to obtain a fourth network model, wherein the fourth network model is used for outputting a first global feature map; obtaining a second global feature map based on the skin color feature map and the first global feature map, wherein the second global feature map is used for obtaining first white balance gain data; and obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data, wherein the second white balance gain data is obtained based on a skin color feature map and a non-neural network white balance algorithm.
In some of these embodiments, the first network model includes a first sub-network and a second sub-network, and the processor 1410 is configured to: inputting the face image data into the first sub-network to obtain a skin color feature map; processing the skin color feature map by adopting the second sub-network to obtain first color temperature data and first skin color data; and updating parameters of the first sub-network and the second sub-network according to the initial color temperature data, the first color temperature data corresponding to the initial color temperature data, the initial skin color data and the first skin color data corresponding to the initial skin color data to obtain a second network model.
In some of these embodiments, processor 1410 is configured to: inputting the face image data to an average pooling layer in the first sub-network to obtain first feature data; inputting the first feature data into a first convolution layer in the first sub-network to obtain second feature data; inputting the second feature data into a first activation function layer in the first sub-network to obtain third feature data; inputting the third feature data into a second convolution layer in the first sub-network to obtain fourth feature data; inputting the fourth feature data into a second activation function layer in the first sub-network to obtain fifth feature data; and inputting the fifth feature data into a third convolution layer in the first sub-network to obtain the skin color feature map.
In some of these embodiments, processor 1410 is configured to: inputting the skin color feature map to a fourth convolution layer of the second sub-network to obtain sixth feature data; inputting the sixth feature data into a third activation function layer of the second sub-network to obtain seventh feature data; inputting the seventh feature data to a maximum pooling layer of the second sub-network to obtain eighth feature data; after the eighth feature data is input to the first full-connection layer of the second sub-network, inputting the output result of the first full-connection layer to the second full-connection layer of the second sub-network to obtain ninth feature data; and determining the first color temperature data and the first skin color data based on the ninth feature data.
In some of these embodiments, processor 1410 is configured to: inputting the original image data into a decomposition layer in the third network model to obtain decomposed data; sampling the decomposed data to obtain tenth feature data; inputting the tenth feature data into an average pooling layer in the third network model to obtain eleventh feature data; performing convolution and activation processing on the eleventh feature data to obtain the first global feature map; performing a fusion operation on the first global feature map and the skin color feature map to obtain the second global feature map; inputting the second global feature map to a pooling layer in the third network model to obtain twelfth feature data; after the twelfth feature data is input to a third full-connection layer in the third network model, inputting the output result of the third full-connection layer to a fourth full-connection layer in the third network model to obtain the first white balance gain data; and updating parameters of the third network model based on the initial white balance gain data and the first white balance gain data to obtain a fourth network model.
In some of these embodiments, processor 1410 is configured to: performing zero padding processing on the skin color feature map to obtain thirteenth feature data; adding corresponding channels of the first global feature map and the thirteenth feature data to obtain fourteenth feature data; and splicing the fourteenth feature data with the first global feature map to obtain the second global feature map.
In some of these embodiments, processor 1410 is configured to: acquiring an image to be processed; carrying out human image segmentation on the image to be processed to obtain a first image containing human images; processing the first image by adopting a second network model to obtain a skin color feature map; processing the image to be processed by adopting a fourth network model to obtain a first global feature map; obtaining a second global feature map according to the skin color feature map and the first global feature map; obtaining first white balance gain data based on the second global feature map, and obtaining second white balance gain data based on the skin color feature map and a non-neural network white balance algorithm; obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data; and processing the image to be processed based on the third white balance gain data to obtain a second image.
In the embodiment of the application, the first network model and the third network model are trained by using the obtained first training data set and the obtained second training data set, so as to obtain a second network model and a fourth network model, wherein the second network model can output a skin color feature map, and the fourth network model can output a first global feature map.
In the above embodiment, the second white balance gain data, obtained based on the skin color feature map and the non-neural network white balance algorithm, can be combined with the first white balance gain data obtained above to obtain the third white balance gain data, so that the image can be processed according to the third white balance gain data.
Compared with related technical solutions, the embodiment of the application can perform data processing according to the determined third white balance gain data. This solves the problems of color cast and insufficient utilization of image feature information that arise when an image is processed with white balance gain data calculated by a non-neural network white balance algorithm alone, and at the same time solves the problems of poor consistency and poor stability of the output of a neural-network-based automatic white balance algorithm in the related technical solutions.
Specifically, in the above embodiment, the skin color feature map can be used as a parameter of the non-neural network white balance algorithm, so as to reduce the color cast of the second white balance gain data output by the non-neural network white balance algorithm, thereby improving the color accuracy after the image adjustment, and simultaneously, fully utilizing the image feature information.
Further, the third white balance gain data is determined with reference to the second white balance gain data; the first white balance gain data can therefore be corrected using the second white balance gain data, which improves the consistency and stability of the final result.
Compared with related technical solutions, the third white balance gain data determined by the embodiment of the application is more accurate and more consistent. When the image to be processed is processed using the third white balance gain data, the color processing effect of the image can be improved, so that the processed second image better matches the true colors seen by the human eye.
It should be appreciated that in embodiments of the present application, the input unit 1404 may include a graphics processor (Graphics Processing Unit, GPU) 14041 and a microphone 14042, with the graphics processor 14041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 1406 may include a display panel 14061, and the display panel 14061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1407 includes at least one of a touch panel 14071 and other input devices 14072. The touch panel 14071 is also referred to as a touch screen. The touch panel 14071 may include two parts, a touch detection device and a touch controller. Other input devices 14072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
Memory 1409 may be used to store software programs as well as various data. The memory 1409 may mainly include a first memory area storing programs or instructions and a second memory area storing data, where the first memory area may store an operating system, and application programs or instructions (such as a sound playing function, an image playing function, etc.) required for at least one function. Further, the memory 1409 may include volatile memory or nonvolatile memory, or the memory 1409 may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM), or a direct Rambus RAM (DRRAM). Memory 1409 in the embodiments of the application includes, but is not limited to, these and any other suitable types of memory.
Processor 1410 may include one or more processing units; optionally, the processor 1410 integrates an application processor that primarily processes operations involving an operating system, user interface, application programs, and the like, and a modem processor that primarily processes wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into the processor 1410.
The embodiment of the application also provides a readable storage medium, and the readable storage medium stores a program or an instruction, which when executed by a processor, implements each process of the above model training method or the image processing method embodiment, and can achieve the same technical effect, so that repetition is avoided, and no further description is provided herein.
The processor is the processor in the electronic device in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The embodiment of the application further provides a chip, the chip comprises a processor and a communication interface, the communication interface is coupled with the processor, the processor is used for running programs or instructions, the processes of the model training method or the image processing method embodiment can be realized, the same technical effects can be achieved, and the repetition is avoided, and the description is omitted here.
It should be understood that the chip referred to in the embodiments of the present application may also be referred to as a system-on-chip, a system chip, a chip system, or a system-on-a-chip, etc.
Embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the respective processes of the above model training method or the image processing method embodiment, and achieve the same technical effects, and are not repeated herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware, but in many cases the former is the preferred implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a computer software product stored on a storage medium (e.g., ROM/RAM, magnetic disk, optical disk), comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the various embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (10)

1. A method of model training, comprising:
acquiring a first training data set and a second training data set; the first training data set comprises face image data, initial color temperature data and initial skin color data, the second training data set comprises original image data and initial white balance gain data, and the face image data is obtained based on the original image data;
training the first network model based on the first training data set to obtain a second network model, wherein the second network model is used for outputting a skin color feature map;
training the third network model based on the second training data set to obtain a fourth network model, wherein the fourth network model is used for outputting a first global feature map;
obtaining a second global feature map based on the skin color feature map and the first global feature map, wherein the second global feature map is used for obtaining first white balance gain data;
and obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data, wherein the second white balance gain data is obtained based on the skin color feature map and a non-neural network white balance algorithm.
2. The model training method of claim 1, wherein the first network model comprises a first sub-network and a second sub-network, the training the first network model based on the first training data set to obtain the second network model comprises:
inputting the face image data to the first sub-network to obtain a skin color feature map;
processing the skin color feature map by adopting the second sub-network to obtain first color temperature data and first skin color data;
and updating parameters of the first sub-network and the second sub-network according to the initial color temperature data, the first color temperature data corresponding to the initial color temperature data, the initial skin color data and the first skin color data corresponding to the initial skin color data to obtain the second network model.
3. The model training method of claim 2, wherein the inputting the face image data into the first sub-network to obtain a skin color feature map comprises:
inputting the face image data to an average pooling layer in the first sub-network to obtain first feature data;
inputting the first feature data into a first convolution layer in the first sub-network to obtain second feature data;
inputting the second feature data into a first activation function layer in the first sub-network to obtain third feature data;
inputting the third feature data into a second convolution layer in the first sub-network to obtain fourth feature data;
inputting the fourth feature data into a second activation function layer in the first sub-network to obtain fifth feature data;
and inputting the fifth feature data to a third convolution layer in the first sub-network to obtain the skin color feature map.
4. The model training method of claim 2, wherein the processing the skin color feature map by adopting the second sub-network to obtain first color temperature data and first skin color data comprises:
inputting the skin color feature map to a fourth convolution layer of the second sub-network to obtain sixth feature data;
inputting the sixth feature data to a third activation function layer of the second sub-network to obtain seventh feature data;
inputting the seventh feature data to a maximum pooling layer of the second sub-network to obtain eighth feature data;
after the eighth feature data are input to a first full-connection layer of the second sub-network, inputting an output result of the first full-connection layer to a second full-connection layer of the second sub-network to obtain ninth feature data;
and determining the first color temperature data and the first skin color data based on the ninth feature data.
5. The model training method according to claim 2, wherein the training the third network model based on the second training data set to obtain a fourth network model comprises:
inputting the original image data into a decomposition layer in the third network model to obtain decomposed data;
sampling the decomposed data to obtain tenth feature data;
inputting the tenth feature data to an average pooling layer in the third network model to obtain eleventh feature data;
performing convolution and activation processing on the eleventh feature data to obtain the first global feature map;
performing a fusion operation on the first global feature map and the skin color feature map to obtain the second global feature map;
inputting the second global feature map to a pooling layer in the third network model to obtain twelfth feature data;
after the twelfth feature data is input to a third full-connection layer in the third network model, inputting an output result of the third full-connection layer to a fourth full-connection layer in the third network model to obtain the first white balance gain data;
and updating parameters of the third network model based on the initial white balance gain data and the first white balance gain data to obtain the fourth network model.
6. The model training method of claim 5, wherein the performing a fusion operation on the first global feature map and the skin color feature map to obtain the second global feature map comprises:
performing zero padding processing on the skin color feature map to obtain thirteenth feature data;
adding corresponding channels of the first global feature map and the thirteenth feature data to obtain fourteenth feature data;
and splicing the fourteenth feature data with the first global feature map to obtain the second global feature map.
7. An image processing method, comprising:
acquiring an image to be processed;
carrying out human image segmentation on the image to be processed to obtain a first image containing human images;
processing the first image by adopting a second network model to obtain a skin color feature map;
processing the image to be processed by adopting a fourth network model to obtain a first global feature map;
obtaining a second global feature map according to the skin color feature map and the first global feature map;
obtaining first white balance gain data based on the second global feature map, and obtaining second white balance gain data based on the skin color feature map and a non-neural network white balance algorithm;
obtaining third white balance gain data based on the first white balance gain data and the second white balance gain data;
and processing the image to be processed based on the third white balance gain data to obtain a second image.
8. A model training device, comprising:
the first acquisition module is used for acquiring a first training data set and a second training data set; the first training data set comprises face image data, initial color temperature data and initial skin color data, the second training data set comprises original image data and initial white balance gain data, and the face image data is obtained based on the original image data;
the first training module is used for training the first network model based on the first training data set to obtain a second network model, and the second network model is used for outputting a skin color feature map;
the second training module is used for training the third network model based on the second training data set to obtain a fourth network model, and the fourth network model is used for outputting a first global feature map;
the first fusion module is used for obtaining a second global feature map based on the skin color feature map and the first global feature map, and the second global feature map is used for obtaining first white balance gain data;
the first processing module is configured to obtain third white balance gain data based on the first white balance gain data and second white balance gain data, where the second white balance gain data is obtained based on the skin color feature map and a non-neural network white balance algorithm.
9. An image processing apparatus, comprising:
the second acquisition module is used for acquiring the image to be processed;
the segmentation module is used for carrying out human image segmentation on the image to be processed to obtain a first image containing human images;
the second processing module is used for processing the first image by adopting a second network model to obtain a skin color feature map;
the third processing module is used for processing the image to be processed by adopting a fourth network model to obtain a first global feature map;
the second fusion module is used for obtaining a second global feature map according to the skin color feature map and the first global feature map;
the fourth processing module is used for obtaining first white balance gain data based on the second global feature map and obtaining second white balance gain data based on the skin color feature map and a non-neural network white balance algorithm;
the fourth processing module is further configured to obtain third white balance gain data based on the first white balance gain data and the second white balance gain data;
and the adjusting module is used for processing the image to be processed based on the third white balance gain data to obtain a second image.
10. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method of any one of claims 1 to 7.
CN202311098540.3A 2023-08-29 2023-08-29 Model training and image processing method and device and electronic equipment Pending CN117135330A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311098540.3A CN117135330A (en) 2023-08-29 2023-08-29 Model training and image processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN117135330A true CN117135330A (en) 2023-11-28

Family

ID=88862435



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination