CN114663320A - Image processing method, data set expansion method, storage medium, and electronic device

Publication number: CN114663320A
Application number: CN202011527321.9A (application filed by Alibaba Group Holding Ltd)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 肖育豪, 杨凤海
Current assignee: Alibaba Group Holding Ltd
Legal status: Pending
Prior art keywords: image, white balance, fusion, target, processing

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T5/92 Dynamic range modification of images or parts thereof based on global image properties
    • H04N5/265 Mixing
    • H04N5/272 Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • H04N7/15 Conference systems
    • H04N9/73 Colour balance circuits, e.g. white balance circuits or colour temperature control
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10024 Color image
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging


Abstract

The application discloses an image processing method and device, an image data set expansion method and device, an image segmentation model training method, an image processing method for video conferences, a method for processing video data in live video, a computer storage medium, and an electronic device. The image processing method comprises the following steps: acquiring a first image comprising a target image element from a first data set; acquiring a second image from a second data set; performing white balance processing on the first image and the second image to obtain a first white balance image corresponding to the first image and a second white balance image corresponding to the second image; and performing image fusion processing on the first white balance image and the second white balance image to generate a target fusion image in which the target image element is fused into the second image. The method can improve the efficiency of image fusion processing and the image quality of the target fusion image, avoiding visible borders and color discrepancies in the fused image.

Description

Image processing method, data set expansion method, storage medium, and electronic device
Technical Field
The application relates to the field of computer image processing application, in particular to an image processing method and device. The application also relates to an expansion method and device of the image data set, a training method of the image segmentation model, a processing method of a background image of a video conference, a processing method of video data in live video, a computer storage medium and electronic equipment.
Background
With the continuous development of artificial intelligence technology, deep learning is widely applied in various fields. Deep learning can be understood as a branch of machine learning: artificial intelligence algorithms based on deep neural networks and large amounts of data. To achieve a good learning effect, however, a deep learning model must be trained on a large amount of sample data, so the amount of sample data becomes one of the factors influencing the performance of a neural network model.
In the prior art, neural network models applied to image processing perform poorly due to the lack of image data sets or the low quality of synthesized images.
Disclosure of Invention
The application provides an image processing method, which aims to solve the problem of poor image fusion quality in the prior art.
The application provides an image processing method, comprising the following steps:
acquiring a first image comprising a target image element from a first data set;
acquiring a second image from a second data set;
performing white balance processing on the first image and the second image to obtain a first white balance image corresponding to the first image and a second white balance image corresponding to the second image;
and performing image fusion processing on the first white balance image and the second white balance image to generate a target fusion image in which the target image elements are fused into the second image.
In some embodiments, the white balance processing the first image and the second image to obtain a first white balance image corresponding to the first image and a second white balance image corresponding to the second image includes:
determining the gray level mean value of the first image according to the RGB three-channel mean value of the first image;
determining the gray level mean value of the second image according to the RGB three-channel mean value of the second image;
determining an RGB three-channel gain coefficient of the first image according to the gray average value of the first image;
determining an RGB three-channel gain coefficient of the second image according to the gray average value of the second image;
adjusting the pixels of the first image according to the original pixel RGB three-channel value of the first image and the RGB three-channel gain coefficient of the first image to obtain a first white balance image;
and adjusting the pixels of the second image according to the RGB three-channel value of the original pixels of the second image and the RGB three-channel gain coefficient of the second image to obtain the second white balance image.
In some embodiments, the adjusting the pixels of the first image according to the three RGB channel values of the pixels of the first image and the three RGB channel gain coefficients of the first image to obtain the first white balance image includes:
obtaining RGB three-channel pixel values according to the product of the RGB three-channel pixel values of the first image and the RGB three-channel gain coefficient of the first image;
adjusting an original pixel RGB three-channel value of the first image according to the RGB three-channel pixel value;
determining the adjusted image as the first white balance image.
In some embodiments, the adjusting the pixels of the second image according to the three RGB channel values of the original pixels of the second image and the three RGB channel gain coefficients of the second image to obtain the second white balance image includes:
obtaining RGB three-channel pixel values according to the product of the RGB three-channel pixel values of the second image and the RGB three-channel gain coefficients of the second image;
adjusting an original pixel RGB three-channel value of the second image according to the RGB three-channel pixel value;
determining the adjusted image as the second white balance image.
In some embodiments, the image fusion processing the first white balance image and the second white balance image to generate a target fusion image in which the target image element is fused into the second image includes:
acquiring a mask image of the first image;
and performing image fusion processing on the first white balance image, the second white balance image and the mask image to generate a target fusion image in which the target image elements are fused into the second image.
In some embodiments, the performing image fusion processing on the first white balance image, the second white balance image, and the mask image to generate a target fusion image in which the target image element is fused into the second image includes:
and performing image fusion processing on the first white balance image, the second white balance image and the mask image in a Laplacian pyramid fusion mode to generate a target fusion image in which the target image elements are fused into the second image.
In some embodiments, the performing image fusion processing on the first white balance image, the second white balance image, and the mask image by using a laplacian pyramid fusion method to generate a target fusion image in which the target image elements are fused into the second image includes:
respectively constructing a Gaussian pyramid for the first white balance image, the second white balance image and the mask image to obtain a first Gaussian pyramid of the first white balance image, a second Gaussian pyramid of the second white balance image and a third Gaussian pyramid of the mask image;
respectively constructing a first Laplacian pyramid corresponding to the first Gaussian pyramid, a second Laplacian pyramid corresponding to the second Gaussian pyramid and a third Laplacian pyramid corresponding to the third Gaussian pyramid according to the first Gaussian pyramid, the second Gaussian pyramid and the third Gaussian pyramid;
respectively fusing corresponding image layers in the first Laplacian pyramid, the second Laplacian pyramid and the third Laplacian pyramid to generate a new Laplacian image pyramid;
and reconstructing the new Laplacian image pyramid to generate the target fusion image.
In some embodiments, the reconstructing the new pyramid of laplacian images to generate the target fusion image comprises:
sequentially performing up-sampling on the top-level image of the new Laplacian image pyramid to generate a sampling image;
and determining the sampling image as the target fusion image.
In some embodiments, further comprising:
performing data enhancement processing on the first white balance image and the second white balance image to obtain a first enhanced image and a second enhanced image;
the generating a target fusion image in which the target image element is fused into the second image by performing image fusion processing on the first white balance image and the second white balance image includes:
and performing image fusion processing on the first enhanced image of the first white balance image and the second enhanced image of the second white balance image to generate a target fusion image in which the target image element is fused into the second image.
The present application also provides an image processing apparatus including:
a first acquisition unit configured to acquire a first image including a target image element from a first data set;
a second acquisition unit for acquiring a second image from a second data set;
a first processing unit configured to perform white balance processing on the first image and the second image to obtain a first white balance image corresponding to the first image and a second white balance image corresponding to the second image;
and a second processing unit configured to perform image fusion processing on the first white balance image and the second white balance image, and generate a target fusion image in which the target image element is fused into the second image.
The present application further provides a method for expanding an image data set, comprising:
acquiring a target fusion image generated according to the image processing method;
the image dataset is augmented according to the target fusion image.
In some embodiments, said augmenting the image dataset according to the target fusion image comprises:
and taking the target fusion image as training data of an image segmentation model, and storing the training data into the image data set.
The present application also provides an expansion device of an image data set, comprising:
an acquisition unit configured to acquire a target fusion image generated according to the image processing method;
and the expansion unit is used for expanding the image data set according to the target fusion image.
The application also provides a training method of the image segmentation model, which comprises the following steps:
acquiring image data according to the image data set expanded in the image data set expansion method;
inputting the image data serving as a training parameter of an image segmentation model into the image segmentation model for training to obtain a trained image segmentation model; the trained image segmentation model can segment the image data input into the image segmentation model into a foreground image or a background image.
The application also provides an image processing method of the video conference, which comprises the following steps:
acquiring a video conference image in a video conference;
inputting the video conference image into an image segmentation model for learning, and identifying a human body image in the video conference image; the image segmentation model is a model trained by using image data in an image data set obtained by adopting the image data set expansion method as training data;
and blurring or replacing the image outside the human body image area range.
The application also provides a method for processing video data in live video, which comprises the following steps:
acquiring a live broadcast picture image in video live broadcast;
inputting the live broadcast picture image into an image segmentation model for learning, and identifying a human body image in the live broadcast picture image; the image segmentation model is a model trained by using image data in an image data set obtained by adopting the image data set expansion method as training data;
and adding preset information to the image outside the human body image area range.
The application also provides a computer storage medium for storing the data generated by the network platform and a program for processing the data generated by the network platform;
the program, when read and executed, performs the steps of the image processing method as described above; alternatively, the step of performing the method of augmenting an image data set as described above; alternatively, the step of performing the training method of the image segmentation model as described above; or, the steps of the image processing method of the video conference as described above are performed; or, the steps of the processing method of video data in the video live broadcast are executed.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a program for processing network platform generated data, which when read and executed by the processor performs the steps of the image processing method as described above; a step of performing the method of augmenting an image data set as described above; alternatively, the step of performing the training method of the image segmentation model as described above; or, the steps of the image processing method of the video conference as described above are performed; or, the steps of the processing method of video data in the video live broadcast are executed.
Compared with the prior art, the method has the following advantages:
the application provides an image processing method capable of acquiring a first image and a second image and a mask image of the first image from different data sets, wherein the first image can be an image including a target image element. And then the first image and the second image are subjected to white balance processing to reduce the difference of image boundaries and color tones caused by different shooting environments in post-fusion. And performing image fusion processing on the first white balance image and the second white balance image to generate a target fusion image in which the target image elements are fused into the second image, so that the efficiency of image fusion processing and the image quality of the target fusion image can be improved, and the defects of the border of the fusion image and the color difference of the fusion image are avoided.
The application provides an image data set expansion method, which can acquire a first image, a second image and a mask image of the first image from different data sets, wherein the first image can be an image comprising target image elements. And then the first image and the second image are subjected to white balance processing to reduce the difference of image boundaries and color tones caused by different shooting environments in post-fusion. And performing image fusion processing on the first white balance image and the second white balance image to generate a target fusion image in which the target image elements are fused into the second image, so that the efficiency of the image fusion processing and the image quality of the target fusion image can be improved. Because the target fusion image is generated by the images of the data sets with different sources, rich, high-quality and real training data can be provided for the image segmentation model to expand the image data set so as to meet the requirement of the image segmentation model training and meet the performance requirement of the model on image segmentation after the image segmentation model training.
Drawings
FIG. 1 is a flow chart of an embodiment of an image processing method provided by the present application;
FIG. 2 is a schematic structural diagram of an embodiment of an image processing apparatus provided in the present application;
FIG. 3 is a flow chart of an embodiment of a method for augmenting an image data set provided herein;
FIG. 4 is a schematic diagram illustrating an embodiment of an image data set expansion method provided in the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for expanding an image data set provided in the present application;
FIG. 6 is a flowchart of an embodiment of a training method for an image segmentation model provided in the present application;
FIG. 7 is a flowchart of an embodiment of an image processing method for a video conference provided by the present application;
fig. 8 is a flowchart of an embodiment of a method for processing video data in a live video provided in the present application;
fig. 9 is a schematic structural diagram of an embodiment of an electronic device provided in the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The application can, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to limit the application. Terms used in this application and in the appended claims, such as "a", "an", "first", and "second", are not intended to limit number or order, but rather to distinguish one type of information from another.
In combination with the above background, the concept of the technical solution provided by the present application derives from the observation that, in the prior art, the development of image segmentation models is hindered both in application and in technology because image data are lacking when such models are trained. To solve this problem, the technical idea of the application is to synthesize image data from different data sets and use the synthesized image data to expand the data set of the image segmentation model, thereby enriching the data volume of that data set. For an image segmentation model, or any neural network model, image quality is also an important index affecting model performance; therefore, when image data from two data sets are fused, the quality of the fused image must be considered. The application accordingly provides an image data set expansion method that guarantees the image quality of the expanded data set while expanding it, avoiding the performance degradation that low-quality images would cause by interfering with model training.
Based on the above, first, an embodiment of an image processing method provided in the present application will be described below, please refer to fig. 1 and fig. 2, where fig. 1 is a flowchart of an embodiment of an image processing method provided in the present application. The embodiment of the image processing method comprises the following steps:
step S101: acquiring a first image comprising a target image element from a first data set;
the purpose of said step S101 is to acquire a first image from a first data set.
The first data set in step S101 may be an existing data set in a deep learning task. The first image may be an image including a target image element, and the target image element may be determined according to the image requirements of the image segmentation model. For example, a portrait segmentation model requires images including portrait elements, so the target image element may be a portrait (all or part of a human body); that is, the target image element may be understood as a human body image. For a model that segments animals, buildings, and the like, the target image element may correspondingly be an animal image, a building image, and so on. Of course, the target image element may also be partial information or all information of an image element.
In the present embodiment, a portrait image is taken as an example for explanation.
From an image point of view, the target image element may also be understood as the target foreground image; when the target is a portrait, this is the portrait information, and thus the first image may be understood as a foreground image including the portrait information.
Step S102: acquiring a second image from a second data set;
the purpose of step S102 is to acquire a second image.
In this embodiment, the second image may be acquired from the second data set. The second data set is a different image data set than the first data set, and the second image is understood to be a background image if it is a foreground image including a portrait based on the first image.
The first data set in step S101 and the second data set in step S102 may be classification or recognition data sets, such as COCO, ImageNet, PASCAL VOC, LabelMe, SUN, Caltech, and Corel5k; face data sets, such as LFW (Labeled Faces in the Wild, a database of face images captured in unconstrained settings) and the VGG Face dataset; pedestrian detection data sets; and the like.
In this embodiment, the obtaining of the first image may obtain an image with a target image element being a portrait through the COCO data set; the second image may also be an image taken from the above dataset, or from another dataset. From an image processing perspective, the target image element may be understood to be a portrait, and the first image may be understood to be a first image in which the portrait is a foreground image. Accordingly, the second image may be a background image.
Step S103: performing white balance processing on the first image and the second image to obtain a first white balance image corresponding to the first image and a second white balance image corresponding to the second image;
the purpose of step S103 is to preprocess the first image and the second image: image data, whether from the same data set or from different data sets, may have inconsistent color tones, so the first image and the second image need color correction to avoid neural network performance problems caused by tone differences.
In the present embodiment, image color correction is mainly achieved by white balance processing. The specific implementation process may include:
step S103-1: determining the gray level mean value of the first image according to the RGB three-channel mean value of the first image;
step S103-2: determining the gray level mean value of the second image according to the RGB three-channel mean value of the second image;
step S103-3: determining an RGB three-channel gain coefficient of the first image according to the gray average value of the first image;
step S103-4: determining an RGB three-channel gain coefficient of the second image according to the gray average value of the second image;
step S103-5: adjusting the pixels of the first image according to the original pixel RGB three-channel value of the first image and the RGB three-channel gain coefficient of the first image to obtain a first white balance image;
step S103-6: and adjusting the pixels of the second image according to the RGB three-channel value of the original pixels of the second image and the RGB three-channel gain coefficient of the second image to obtain the second white balance image.
Wherein the RGB three-channel mean value of the first image in the step S103-1 is
Figure BDA0002851007640000084
And the mean value of the grayscales of the first image and the second image in step S103-2 may adopt the following formula:
Figure BDA0002851007640000081
wherein the content of the first and second substances,
Figure BDA0002851007640000082
is the gray average value of RGB three channels.
The RGB three-channel gain coefficients in step S103-3 and step S103-4 may adopt the following formulas:

Kr = Kmean / Rmean

Kg = Kmean / Gmean

Kb = Kmean / Bmean

wherein Kr is the gain coefficient of the R channel, Kg is the gain coefficient of the G channel, and Kb is the gain coefficient of the B channel. Different images may be distinguished with different subscripts.
The specific implementation process of step S103-5 may include:
step S103-51: obtaining RGB three-channel pixel values according to the product of the RGB three-channel pixel values of the first image and the RGB three-channel gain coefficient of the first image;
step S103-52: adjusting an original pixel RGB three-channel value of the first image according to the RGB three-channel pixel value;
step S103-53: determining the adjusted image as the first white balance image.
The specific implementation process of step S103-6 may include:
step S103-61: obtaining RGB three-channel pixel values according to the product of the RGB three-channel pixel values of the second image and the RGB three-channel gain coefficients of the second image;
step S103-62: adjusting an original pixel RGB three-channel value of the second image according to the RGB three-channel pixel value;
step S103-63: determining the adjusted image as the second white balance image.
Namely: the RGB three-channel values of the original pixels of the first image and the second image can be adjusted by the Von Kries diagonal model, for example using the following formulas:

P(R′) = P(R) × Kr

P(G′) = P(G) × Kg

P(B′) = P(B) × Kb
White balance processing is thereby performed on the first image and the second image to obtain the first white balance image and the second white balance image.
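As a concrete illustration, the gray-world white balance and Von Kries adjustment described above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming an 8-bit image whose channels are ordered R, G, B; the function name is illustrative and not taken from the application:

```python
import numpy as np

def gray_world_white_balance(img: np.ndarray) -> np.ndarray:
    """Gray-world white balance for an H x W x 3 uint8 image in RGB order."""
    img_f = img.astype(np.float64)
    # Per-channel means and their grayscale mean: Kmean = (Rmean + Gmean + Bmean) / 3
    r_mean, g_mean, b_mean = (img_f[..., c].mean() for c in range(3))
    k_mean = (r_mean + g_mean + b_mean) / 3.0
    # Von Kries diagonal gains: Kr = Kmean / Rmean, Kg = Kmean / Gmean, Kb = Kmean / Bmean
    gains = np.array([k_mean / r_mean, k_mean / g_mean, k_mean / b_mean])
    # P(C') = P(C) * Kc for each channel, clipped back to the 8-bit range
    return np.clip(img_f * gains, 0, 255).astype(np.uint8)
```

The same function would be applied independently to the first image and the second image, yielding the first and second white balance images.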
Step S104: performing image fusion processing on the first white balance image and the second white balance image to generate a target fusion image in which the target image elements are fused into the second image;
the purpose in step S104 is to generate a target fusion image that can fuse the target image element into the second image. That is, the foreground portrait image in the first image is fused to the background image provided by the second image.
An image fusion technique may be employed in step S104 to generate the target fusion image. To achieve a better fusion effect, the technique should avoid the low fused-image quality caused by boundary differences in the fusion area while also saving computation time.
In this embodiment, the specific implementation of step S104 may include:
step S104-1: acquiring a mask image of the first image;
the Mask image (Mask) is an image formed by blocking a target image element in the first image, and includes: the mask image is a binary image consisting of 0 and 1. When a mask is applied in a certain function, the 1-value area is processed, and the masked 0-value area is not included in the calculation. In this embodiment, the mask image may be the same size as the first image, with the target area pixel value being 1 and the non-target area pixel value being 0 in the mask image, so as to extract the pixel values of the target image elements from the first image according to the mask image. Thus, after the first image is acquired, the mask image is obtained by occlusion of the target image elements in the first image. The timing of acquiring the mask image is not particularly limited, for example: the mask image may be acquired when the first image is acquired, that is, the first image is subjected to mask processing so as to acquire the mask image, or the mask image may be acquired when the target fusion image is generated.
Step S104-2: and performing image fusion processing on the first white balance image, the second white balance image and the mask image in a Laplacian pyramid fusion mode to generate a target fusion image in which the target image elements are fused into the second image.
As shown in fig. 4, the specific implementation process of the step S104-2 may include:
step S104-21: and respectively constructing a Gaussian pyramid for the first white balance image, the second white balance image and the mask image to obtain a first Gaussian pyramid of the first white balance image, a second Gaussian pyramid of the second white balance image and a third Gaussian pyramid of the mask image. The gaussian pyramid is the most basic image pyramid.
The specific implementation of step S104-21 may be that the first white balance image, the second white balance image, and the mask image are each subjected to down-sampling convolution operations to obtain their respective Gaussian pyramids. The process of obtaining the first Gaussian pyramid is described below, taking the first white balance image as an example:
The first white balance image is taken as the bottom layer image G0 of the Gaussian pyramid (the 0th layer). The bottom layer image is convolved with a Gaussian kernel (n × n), and the convolved image is then down-sampled (even rows and columns removed) to obtain the adjacent upper layer image G1. Taking image G1 as input, the convolution and down-sampling operations applied to G0 are repeated to obtain the next layer image G2 above G1, and this iteration is repeated several times to form a pyramid-shaped image data structure, namely the first Gaussian pyramid. This is expressed by the following equation:

Gi = Down(Gi-1);

wherein Down is the down-sampling function and Gi represents the Gaussian image of the i-th layer. Based on the above, down-sampling can be implemented by discarding the even rows and even columns of the image, so that the length and width of the image are each halved and its area is reduced to a quarter.
Similarly, the second white balance image may also be downsampled to obtain a second gaussian pyramid, and the mask image may be downsampled to obtain a third gaussian pyramid.
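As an illustration of this iteration, a minimal sketch using OpenCV, whose pyrDown performs the Gaussian convolution and removes the even rows and columns in a single call; the function name is an illustrative assumption:

```python
import cv2

def build_gaussian_pyramid(image, levels: int):
    """Return [G0, G1, ..., G_levels]; G0 is the input image."""
    pyramid = [image]
    for _ in range(levels):
        image = cv2.pyrDown(image)  # Gaussian smoothing + drop even rows/columns
        pyramid.append(image)
    return pyramid
```

The same function would be called once per input (first white balance image, second white balance image, mask image) to obtain the three Gaussian pyramids.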
Step S104-22: respectively constructing a first Laplacian pyramid corresponding to the first Gaussian pyramid, a second Laplacian pyramid corresponding to the second Gaussian pyramid and a third Laplacian pyramid corresponding to the third Gaussian pyramid according to the first Gaussian pyramid, the second Gaussian pyramid and the third Gaussian pyramid;
the Laplacian pyramid in step S104-22 can be understood as a pyramid of residual images. Namely: during construction of the Gaussian pyramid, operations such as convolution and down-sampling lose part of the high-frequency detail information of the image, and the images describing this high-frequency detail information form the Laplacian pyramid.
The specific implementation of step S104-22 may be to subtract, from each layer of the Gaussian pyramid, the predicted image obtained by up-sampling and Gaussian-convolving the adjacent upper (coarser) layer, thereby obtaining a series of difference images that constitute the Laplacian pyramid. In other words, layer 3 of the Gaussian pyramid is first up-sampled (image enlargement, also called image interpolation) to obtain an image A of the same size as layer 2; image A is then smoothed by Gaussian convolution to obtain a blurred predicted image A′; and the image formed by the difference between layer 2 and A′ is the difference image (residual image), i.e. the Laplacian image. That is, each level of the Laplacian pyramid records the difference between the corresponding Gaussian image and the up-sampled version of the next (coarser) Gaussian image. Namely, the following equation:
Li = Gi - Up(Down(Gi));

wherein Li represents the Laplacian pyramid image of the i-th layer, Up represents up-sampling, Gi represents the Gaussian image of the i-th layer, and Down(Gi) can be understood as Gi+1, i.e. the Gaussian image of the (i+1)-th layer.
The formula can also be expressed as: Li = Gi - Up(Gi+1);
In general, the top-level image of the Laplacian pyramid is taken to be the same as the top-level image of the Gaussian pyramid.
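A minimal sketch of this construction, assuming the Gaussian pyramid was built as in the previous sketch from float32 images (so that negative residuals are preserved); names are illustrative:

```python
import cv2

def build_laplacian_pyramid(gauss):
    """Li = Gi - Up(Gi+1); the top level is kept equal to the top Gaussian level."""
    laplacian = []
    for i in range(len(gauss) - 1):
        size = (gauss[i].shape[1], gauss[i].shape[0])      # cv2 sizes are (width, height)
        predicted = cv2.pyrUp(gauss[i + 1], dstsize=size)  # upsample + Gaussian smoothing
        laplacian.append(gauss[i] - predicted)             # residual (difference) image
    laplacian.append(gauss[-1])                            # top level kept as-is
    return laplacian
```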
Step S104-23: respectively fusing corresponding image layers in the first Laplacian pyramid, the second Laplacian pyramid and the third Laplacian pyramid to generate a new Laplacian image pyramid;
the specific implementation process of step S104-23 may be to add the image of the first laplacian pyramid and the image of the second laplacian pyramid according to a third laplacian pyramid, that is, the laplacian pyramid of the mask image, where the mask image is used to determine the fusion portion, and in this embodiment, the portrait portion in the first image is the mask portion. The result of the addition is a new laplacian image pyramid.
Step S104-24: and reconstructing the new Laplacian image pyramid to generate the target fusion image.
The reconstruction in steps S104-24 aims to construct a final target fusion image according to the new laplacian pyramid, and the construction process may include:
step S104-241: sequentially performing up-sampling on the top-level image of the new Laplacian image pyramid to generate a sampling image;
step S104-242: and determining the sampling image as the target fusion image.
The above describes the implementation of step S104; since the Gaussian pyramid and the Laplacian pyramid belong to the prior art, the description above is relatively general.
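To make steps S104-23 and S104-24 concrete, a minimal sketch follows, assuming float32 images in [0, 1] and a single-channel float mask pyramid. Note that the common formulation weights each level with the Gaussian pyramid of the mask rather than a Laplacian pyramid of it; the sketch follows that common variant, and all names are illustrative:

```python
import cv2
import numpy as np

def blend_and_reconstruct(lap_fg, lap_bg, mask_pyramid):
    """Fuse two Laplacian pyramids level by level, then collapse the result."""
    # Weighted sum per level: mask selects the foreground, (1 - mask) the background
    blended = [m[..., None] * f + (1.0 - m[..., None]) * b
               for f, b, m in zip(lap_fg, lap_bg, mask_pyramid)]
    # Reconstruction: start from the top level, upsample and add the next residual
    image = blended[-1]
    for level in reversed(blended[:-1]):
        size = (level.shape[1], level.shape[0])
        image = cv2.pyrUp(image, dstsize=size) + level
    return np.clip(image, 0.0, 1.0)   # the target fusion image
```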
In order to provide more image data for expanding the image data set, in this embodiment data enhancement processing may also be performed on the first white balance image and the second white balance image. The purpose of enhancing the image data is to generate more image data, which improves generalization capability during image fusion. Common image data enhancement methods include elastic deformation, image blur, image rotation, adding noise, and so on; a sketch of several of these follows the next step. Accordingly, the specific implementation of step S104 may include:
step S104-31: and performing image fusion processing on the first enhanced image of the first white balance image and the second enhanced image of the second white balance image to generate a target fusion image in which the target image elements are fused into the second image.
Based on the steps S104-31, the fusion processing can be carried out according to the image after the data enhancement processing when the image fusion processing is carried out, so that more target fusion images can be obtained.
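A minimal sketch of several of the enhancement methods listed above (horizontal flip, blur, rotation, additive noise; elastic deformation is omitted); all parameter values are illustrative assumptions:

```python
import cv2
import numpy as np

def augment(img, seed=None):
    """Return simple variants of an image for data enhancement."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    variants = [cv2.flip(img, 1),                      # horizontal flip
                cv2.GaussianBlur(img, (5, 5), 0)]      # image blur
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0)
    variants.append(cv2.warpAffine(img, rot, (w, h)))  # rotate by 10 degrees
    noisy = img.astype(np.float32) + rng.normal(0, 5, img.shape)
    variants.append(np.clip(noisy, 0, 255).astype(img.dtype))  # additive Gaussian noise
    return variants
```

Each variant can then be passed through the same fusion pipeline to yield additional target fusion images.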
The foregoing describes an embodiment of the image processing method provided in the present application. According to this embodiment, images with two different backgrounds can be fused, namely: the target image element in the first image is fused into the second image. This reduces the boundary difference of the fusion area and the color difference between the fused image and the background image, and thus improves both the quality of the fused image and the efficiency of the image fusion processing.
In this embodiment, different image data are obtained from different data sets, white balance processing is performed on the two sets of image data to reduce the environmental difference between them, and image fusion is then performed through an image pyramid to generate a target fusion image. The target fusion image thereby avoids problems such as low quality, obvious boundaries, and large color differences that would otherwise arise when target image elements are fused into a background image captured under different conditions.
With reference to fig. 2, since the embodiment of the apparatus is substantially similar to the embodiment of the method, the description is relatively simple, and related points can be found in the partial description of the embodiment of the method. The device embodiments described below are merely illustrative.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an embodiment of an image processing apparatus provided in the present application; the apparatus embodiment includes:
a first acquisition unit 201 for acquiring a first image including a target image element from a first data set;
the specific implementation process of the first obtaining unit 201 may refer to the specific content of the step S101, and is not repeated here.
A second acquisition unit 202 for acquiring a second image from a second data set;
for a specific implementation process of the second obtaining unit 202, reference may be made to step S102, and details are not repeated here.
A first processing unit 203, configured to perform white balance processing on the first image and the second image to obtain a first white balance image corresponding to the first image and a second white balance image corresponding to the second image;
the first processing unit includes: the device comprises a first determining subunit, a second determining subunit, a third determining subunit, a fourth determining subunit, a first obtaining subunit and a second obtaining subunit.
The first determining subunit is configured to determine a grayscale mean of the first image according to an RGB three-channel mean of the first image;
the second determining subunit is configured to determine a grayscale mean value of the second image according to an RGB three-channel mean value of the second image;
the third determining subunit is configured to determine, according to the average grayscale value of the first image in the first determining subunit, an RGB three-channel gain coefficient of the first image;
the fourth determining subunit is configured to determine, according to the grayscale mean of the second image from the second determining subunit, an RGB three-channel gain coefficient of the second image;
the first obtaining subunit is configured to adjust the pixel of the first image according to the RGB triple-channel value of the original pixel of the first image and the RGB triple-channel gain coefficient of the first image in the third determining subunit, so as to obtain the first white balance image. The first obtaining subunit includes: a calculating subunit, an adjusting subunit and a determining subunit; the calculating subunit is configured to obtain an RGB three-channel pixel value according to a product of the RGB three-channel pixel value of the first image and the RGB three-channel gain coefficient of the first image; the adjusting subunit is configured to adjust an original pixel RGB three-channel value of the first image according to the RGB three-channel pixel value obtained in the calculating subunit; the determining subunit is configured to determine the image adjusted in the adjusting subunit as the first white balance image.
The second obtaining subunit is configured to adjust the pixels of the second image according to the RGB triple-channel value of the original pixel of the second image and the RGB triple-channel gain coefficient of the second image in the fourth determining subunit, so as to obtain the second white balance image. The second obtaining subunit includes: a calculating subunit, an adjusting subunit and a determining subunit; the calculating subunit is configured to obtain RGB three-channel pixel values according to a product of the RGB three-channel pixel values of the second image and the RGB three-channel gain coefficients of the second image; the adjusting subunit is used for adjusting the original pixel RGB three-channel value of the second image according to the RGB three-channel pixel value obtained in the calculating subunit; the determining subunit is configured to determine the image adjusted in the adjusting subunit as the second white balance image.
For detailed technical contents related to the specific implementation process of the first processing unit 203, reference may be made to step S103, and details are not repeated here.
A second processing unit 204, configured to perform image fusion processing on the first white balance image and the second white balance image, and generate a target fusion image in which the target image element is fused into the second image;
the second processing unit 204 may specifically include: an acquisition subunit and a processing subunit;
the acquiring subunit is configured to acquire a mask image of the first image; for details, refer to the content of step S104-1, and will not be repeated here.
The processing subunit is configured to perform image fusion processing on the first white balance image, the second white balance image, and the mask image in a laplacian pyramid fusion manner, so as to generate a target fusion image in which the target image elements are fused into the second image. The method specifically comprises the following steps: a first building subunit, a second building subunit, a fusion subunit and a reconstruction subunit.
The first constructing subunit is configured to respectively construct a gaussian pyramid for the first white balance image, the second white balance image, and the mask image, and obtain a first gaussian pyramid for the first white balance image, a second gaussian pyramid for the second white balance image, and a third gaussian pyramid for the mask image;
the second constructing subunit is configured to respectively construct, according to the first Gaussian pyramid, the second Gaussian pyramid, and the third Gaussian pyramid, a first Laplacian pyramid corresponding to the first Gaussian pyramid, a second Laplacian pyramid corresponding to the second Gaussian pyramid, and a third Laplacian pyramid corresponding to the third Gaussian pyramid;
the fusion subunit is configured to fuse corresponding image layers in the first laplacian pyramid, the second laplacian pyramid, and the third laplacian pyramid, respectively, to generate a new laplacian image pyramid;
and the reconstruction subunit is configured to reconstruct the new laplacian image pyramid to generate the target fusion image.
The reconstruction subunit may include: an upsampling subunit and a determining subunit.
The upsampling subunit is configured to start sequentially upsampling the top-level image of the new laplacian image pyramid to generate a sampled image;
the determining subunit is configured to determine the sampling image as the target fusion image.
In order to provide more image data to expand the image data set, in this embodiment, the method may further include: and the enhancing unit is used for performing data enhancement processing on the first white balance image and the second white balance image to obtain a first enhanced image and a second enhanced image.
The second processing unit may specifically perform image fusion processing on the first enhanced image of the first white balance image, the second enhanced image of the second white balance image, and the mask image in the enhancing unit, and generate a target fusion image in which the target image element is fused to the second image.
For detailed technical contents related to the specific implementation process of the second processing unit 204, reference may be made to step S104, and details are not repeated here.
The method and apparatus above provide a large amount of training sample data for training an image segmentation model, so as to increase the amount of image data in the image data set and improve the training performance of the portrait segmentation model. In combination with the above, the present application further provides an image data set expansion method; please refer to fig. 3 and fig. 4. Fig. 3 is a flowchart of an embodiment of the image data set expansion method provided in the present application; fig. 4 is a schematic structural diagram of an embodiment of the image data set expansion method provided by the present application.
As shown in fig. 3, an embodiment of a method for expanding an image data set of the present application includes:
step S301: acquiring a target fusion image generated according to the image processing method;
for the specific implementation process of step S301, reference may be made to steps S101 to S104, and details are not repeated here.
Step S302: the image dataset is augmented according to the target fusion image.
The specific implementation process of step S302 may be to store the target fusion image as training data of an image segmentation model into the image dataset.
From the above, in the present embodiment, the portrait is adopted as the first image, and therefore, the image segmentation model in step S302 may be a portrait segmentation model.
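As an illustration, the expansion of step S302 might store each target fusion image together with its mask as a new training pair; the directory layout and names below are assumptions, not specified by the application:

```python
from pathlib import Path
import cv2

def add_training_pair(fused_image, mask, dataset_dir, sample_id):
    """Store a fused image and its segmentation mask in the expanded data set."""
    root = Path(dataset_dir)
    (root / "images").mkdir(parents=True, exist_ok=True)
    (root / "masks").mkdir(parents=True, exist_ok=True)
    cv2.imwrite(str(root / "images" / f"{sample_id}.png"), fused_image)
    cv2.imwrite(str(root / "masks" / f"{sample_id}.png"), mask)
```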
The image segmentation model can be applied to scenarios such as video conferences, live video, and portrait recognition. According to this embodiment of the image data set expansion method, different image data can be obtained from different data sets, white balance processing is performed on the two sets of image data to reduce the environmental difference between them, and image fusion is then performed through an image pyramid to generate a target fusion image, which is provided as expansion data for the image data set required by the image segmentation model. Because the target fusion image avoids problems such as low quality, obvious boundaries, and large color differences that arise when target image elements are fused into a background image captured under different conditions, the poor model performance that low-quality fused images would cause during training is avoided, and the segmentation efficiency and quality of the image segmentation model are improved.
The above is a detailed description of an embodiment of the image data set expansion method provided in the present application. Corresponding to that method embodiment, the present application also provides an embodiment of an image data set expansion apparatus; please refer to fig. 5. Since the apparatus embodiment is basically similar to the method embodiment, the description is relatively simple, and related points can be found in the partial description of the method embodiment. The apparatus embodiments described below are merely illustrative.
As shown in fig. 5, fig. 5 is a schematic structural diagram of an embodiment of an expansion apparatus for an image data set provided in the present application, where the embodiment includes:
an acquiring unit 501, configured to acquire a target fusion image generated according to the image processing method; the specific content of the obtaining unit 501 refers to the specific description of step S101 to step S104, and is not repeated here.
An expansion unit 502 for expanding the image data set according to the target fusion image.
The expansion unit 502 may specifically include a storage subunit configured to store the target fusion image into the image data set as training data for an image segmentation model. For the specific implementation of the expansion unit 502, reference may be made to the content of step S302 described above.
The above is a description of an embodiment of an image data set expansion device provided in the present application, and for understanding of the embodiment of the device, reference may be made to the description of the corresponding method embodiment, and repeated descriptions are omitted here.
In combination with the above, the present application further provides a training method of an image segmentation model, as shown in fig. 6, fig. 6 is a flowchart of an embodiment of the training method of the image segmentation model provided in the present application, where the embodiment of the training method includes:
step S601: acquiring image data according to the image data set expanded in the provided image data set expansion method;
the purpose of step S601 is to obtain image data from the image data set. The image data set contains target fusion images obtained with the image processing method of steps S101 to S104 and added to the data set according to steps S301 to S302; the expanded image data set holds a larger amount of image data than before expansion and can therefore provide a large number of training samples for the image segmentation model.
Step S602: inputting the image data serving as a training parameter of an image segmentation model into the image segmentation model for training to obtain a trained image segmentation model; the trained image segmentation model can segment the image data to be segmented input into the image segmentation model into a foreground image or a background image.
In this embodiment, the foreground image may be a human body image, that is, the image segmentation model can segment the image data to be segmented into a human body image or a background image, or a human body image and a background image.
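The application does not specify a model architecture or training framework; purely as an illustration, a generic training loop over image/mask pairs from the expanded data set might look as follows (PyTorch is an assumption):

```python
import torch
from torch import nn

def train_segmentation_model(model: nn.Module, loader, epochs=10, lr=1e-3):
    """Train a binary foreground/background segmentation model."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()  # per-pixel foreground probability
    model.train()
    for _ in range(epochs):
        for images, masks in loader:    # pairs from the expanded image data set
            optimizer.zero_grad()
            loss = criterion(model(images), masks.float())
            loss.backward()
            optimizer.step()
    return model
```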
With reference to the above, the present application further provides an image processing method for a video conference, as shown in fig. 7, an embodiment of the image processing method for a video conference includes:
step S701: acquiring a video conference image in a video conference;
step S702: inputting the video conference images into an image segmentation model for learning, and identifying human body images in the video conference images; wherein, the image segmentation model is a model obtained by training by using the image data in the image data set obtained by the provided image data set expansion method as training data;
step S703: and blurring or replacing the image outside the human body image area range.
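As an illustration of step S703, background blurring with a segmentation mask can be sketched as follows; frame and mask sizes are assumed to match, and all names are illustrative:

```python
import cv2
import numpy as np

def blur_background(frame, person_mask, ksize=(31, 31)):
    """Blur every pixel of a video frame outside the segmented human region."""
    blurred = cv2.GaussianBlur(frame, ksize, 0)
    keep = (person_mask > 0).astype(frame.dtype)[..., None]  # 1 inside the person
    return frame * keep + blurred * (1 - keep)
```

Replacing the background instead of blurring it amounts to substituting any other image of the same size for the blurred frame.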
With reference to the above, the present application further provides a method for processing video data in live video, where as shown in fig. 8, an embodiment of the method for processing video data in live video includes:
step S801: acquiring a live broadcast picture image in video live broadcast;
step S802: inputting the live broadcast picture image into an image segmentation model for learning, and identifying a human body image in the live broadcast picture image; wherein, the image segmentation model is a model trained by using the image data in the image data set obtained by the provided image data set expansion method as training data;
step S803: adding preset information to the image outside the human body image area. The preset information can be displayed in a designated area of the live broadcast picture, or in an area outside the human body image area of the picture. The preset information can change as the human body image area changes; that is, the image segmentation model can learn and recognize the input live broadcast picture image in real time and determine the area occupied by the human body image in real time.
Based on the above, the present application further provides a computer storage medium for storing data generated by a network platform and a program for processing the data generated by the network platform;
when the program is read and executed, it performs the steps of the image processing method provided in the present application; or it performs the steps of the data set expansion method provided in the present application; or it performs the steps of the training method of the image segmentation model provided in the present application; or it performs the steps of the image processing method for the video conference provided in the present application; or it performs the steps of the method for processing video data in live video provided in the present application.
As shown in fig. 9, which is a schematic structural diagram of an embodiment of an electronic device provided in the present application, the embodiment of the electronic device includes: a processor 901 and a memory 902;
the memory 902 is used for storing a program for processing data generated by the network platform; when the program is read and executed by the processor 901, it performs the steps of the image processing method provided in the present application; or it performs the steps of the method of augmenting an image data set provided in the present application; or it performs the steps of the training method of the image segmentation model provided in the present application; or it performs the steps of the image processing method for the video conference provided in the present application; or it performs the steps of the method for processing video data in live video provided in the present application.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
Although the present application has been described with reference to preferred embodiments, these are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application should be determined by the claims that follow.

Claims (18)

1. An image processing method, comprising:
acquiring a first image comprising a target image element from a first data set;
acquiring a second image from a second data set;
performing white balance processing on the first image and the second image to obtain a first white balance image corresponding to the first image and a second white balance image corresponding to the second image;
and performing image fusion processing on the first white balance image and the second white balance image to generate a target fusion image in which the target image elements are fused into the second image.
2. The image processing method according to claim 1, wherein the performing white balance processing on the first image and the second image to obtain a first white balance image corresponding to the first image and a second white balance image corresponding to the second image comprises:
determining the gray level mean value of the first image according to the RGB three-channel mean value of the first image;
determining the gray level mean value of the second image according to the RGB three-channel mean value of the second image;
determining an RGB three-channel gain coefficient of the first image according to the gray level mean value of the first image;
determining an RGB three-channel gain coefficient of the second image according to the gray level mean value of the second image;
adjusting the pixels of the first image according to the original pixel RGB three-channel value of the first image and the RGB three-channel gain coefficient of the first image to obtain a first white balance image;
and adjusting the pixels of the second image according to the RGB three-channel value of the original pixels of the second image and the RGB three-channel gain coefficient of the second image to obtain the second white balance image.
3. The image processing method according to claim 2, wherein the adjusting the pixels of the first image according to the original pixel RGB three-channel values of the first image and the RGB three-channel gain coefficients of the first image to obtain the first white balance image comprises:
obtaining new RGB three-channel pixel values according to the product of the original pixel RGB three-channel values of the first image and the RGB three-channel gain coefficients of the first image;
adjusting the original pixel RGB three-channel values of the first image according to the new RGB three-channel pixel values;
determining the adjusted image as the first white balance image.
4. The image processing method according to claim 2, wherein the adjusting the pixels of the second image according to the original pixel RGB three-channel values of the second image and the RGB three-channel gain coefficients of the second image to obtain the second white balance image comprises:
obtaining new RGB three-channel pixel values according to the product of the original pixel RGB three-channel values of the second image and the RGB three-channel gain coefficients of the second image;
adjusting the original pixel RGB three-channel values of the second image according to the new RGB three-channel pixel values;
determining the adjusted image as the second white balance image.
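For illustration, a minimal NumPy sketch of the gray-world style white balance recited in claims 2 to 4; the function name is illustrative, and the final clipping to [0, 255] is an implementation detail the claims do not recite:

```python
import numpy as np

def gray_world_white_balance(image: np.ndarray) -> np.ndarray:
    """image: (H, W, 3) uint8 RGB; returns the white-balanced image.

    The gray level mean is the mean of the RGB three-channel means, and each
    channel's gain coefficient is gray_mean / channel_mean."""
    img = image.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)     # RGB three-channel mean values
    gray_mean = channel_means.mean()                    # gray level mean of the image
    gains = gray_mean / channel_means                   # RGB three-channel gain coefficients
    balanced = img * gains                              # product of pixel values and gains
    return np.clip(balanced, 0, 255).astype(np.uint8)   # back to the valid pixel range
```

Applying the same function to the first image and the second image independently, as claim 2 does, pulls both toward a neutral cast before fusion.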
5. The image processing method according to claim 1, wherein the generating a target fusion image in which the target image element is fused to the second image by performing image fusion processing on the first white balance image and the second white balance image, comprises:
acquiring a mask image of the first image;
and performing image fusion processing on the first white balance image, the second white balance image and the mask image to generate a target fusion image in which the target image elements are fused into the second image.
6. The image processing method according to claim 5, wherein performing image fusion processing on the first white balance image, the second white balance image, and the mask image to generate a target fusion image in which the target image element is fused to the second image, comprises:
and performing image fusion processing on the first white balance image, the second white balance image and the mask image in a Laplacian pyramid fusion mode to generate a target fusion image in which the target image elements are fused into the second image.
7. The image processing method according to claim 6, wherein the generating a target fusion image in which the target image elements are fused to the second image by performing image fusion processing on the first white balance image, the second white balance image, and the mask image by using a laplacian pyramid fusion method includes:
respectively constructing a Gaussian pyramid for the first white balance image, the second white balance image and the mask image to obtain a first Gaussian pyramid of the first white balance image, a second Gaussian pyramid of the second white balance image and a third Gaussian pyramid of the mask image;
respectively constructing a first Laplacian pyramid corresponding to the first Gaussian pyramid, a second Laplacian pyramid corresponding to the second Gaussian pyramid and a third Laplacian pyramid corresponding to the third Gaussian pyramid according to the first Gaussian pyramid, the second Gaussian pyramid and the third Gaussian pyramid;
respectively fusing corresponding image layers in the first Laplacian pyramid, the second Laplacian pyramid and the third Laplacian pyramid to generate a new Laplacian image pyramid;
and reconstructing the new Laplacian image pyramid to generate the target fusion image.
8. The image processing method of claim 7, wherein the reconstructing the new Laplacian image pyramid to generate the target fusion image comprises:
sequentially performing up-sampling from the top-level image of the new Laplacian image pyramid to generate a sampled image;
and determining the sampled image as the target fusion image.
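A sketch of the fusion of claims 6 to 8, with one stated deviation: claim 7 recites a third Laplacian pyramid for the mask, while this sketch follows the common Burt-Adelson formulation and uses the mask's Gaussian pyramid as per-level blending weights. OpenCV and NumPy are assumed, inputs are float32 with sides divisible by 2**levels, and the mask uses 1.0 to mark the target image element:

```python
import cv2
import numpy as np

def laplacian_pyramid_blend(fg: np.ndarray, bg: np.ndarray,
                            mask: np.ndarray, levels: int = 5) -> np.ndarray:
    """Fuse fg into bg under mask (1.0 = keep fg) by pyramid blending.

    All three inputs: (H, W, 3) float32, with H and W divisible by 2**levels
    so that pyrDown/pyrUp sizes line up without explicit resizing."""
    # 1) Gaussian pyramids for the two white-balanced images and the mask.
    gp_fg, gp_bg, gp_mask = [fg], [bg], [mask]
    for _ in range(levels):
        gp_fg.append(cv2.pyrDown(gp_fg[-1]))
        gp_bg.append(cv2.pyrDown(gp_bg[-1]))
        gp_mask.append(cv2.pyrDown(gp_mask[-1]))

    # 2) Laplacian pyramids: each level minus the up-sampled next level,
    #    with the smallest Gaussian level kept as the pyramid top.
    lp_fg = [gp_fg[i] - cv2.pyrUp(gp_fg[i + 1]) for i in range(levels)] + [gp_fg[levels]]
    lp_bg = [gp_bg[i] - cv2.pyrUp(gp_bg[i + 1]) for i in range(levels)] + [gp_bg[levels]]

    # 3) Fuse corresponding layers, weighted by the mask's Gaussian pyramid.
    fused = [gp_mask[i] * lp_fg[i] + (1.0 - gp_mask[i]) * lp_bg[i]
             for i in range(levels + 1)]

    # 4) Reconstruct: up-sample from the top layer, adding each fused layer.
    out = fused[levels]
    for i in range(levels - 1, -1, -1):
        out = cv2.pyrUp(out) + fused[i]
    return np.clip(out, 0.0, 255.0)   # float result; cast to uint8 by the caller
```

A binary mask of the target image element (the mask image of claim 5), converted to float32 and replicated to three channels, plays the role of the blending weight here.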
9. The image processing method according to claim 1, further comprising:
performing data enhancement processing on the first white balance image and the second white balance image to obtain a first enhanced image and a second enhanced image;
the generating a target fusion image in which the target image element is fused into the second image by performing image fusion processing on the first white balance image and the second white balance image includes:
and performing image fusion processing on the first enhanced image of the first white balance image and the second enhanced image of the second white balance image to generate a target fusion image in which the target image element is fused into the second image.
10. An image processing apparatus characterized by comprising:
a first acquisition unit configured to acquire a first image including a target image element from a first data set;
a second acquisition unit for acquiring a second image from a second data set;
a first processing unit configured to perform white balance processing on the first image and the second image to obtain a first white balance image corresponding to the first image and a second white balance image corresponding to the second image;
and a second processing unit configured to perform image fusion processing on the first white balance image and the second white balance image, and generate a target fusion image in which the target image element is fused into the second image.
11. A method of augmenting an image data set, comprising:
acquiring a target fusion image generated by the image processing method according to any one of claims 1 to 9;
expanding the image data set according to the target fusion image.
12. The method of augmenting an image data set according to claim 11, wherein the expanding the image data set according to the target fusion image comprises:
and taking the target fusion image as training data of an image segmentation model, and storing the training data into the image data set.
13. An apparatus for augmenting an image data set, comprising:
an acquisition unit configured to acquire a target fusion image generated by the image processing method according to any one of claims 1 to 9;
and the expansion unit is used for expanding the image data set according to the target fusion image.
14. A training method of an image segmentation model is characterized by comprising the following steps:
acquiring image data from the image data set expanded by the method of augmenting an image data set according to any one of claims 11 to 12;
inputting the image data serving as a training parameter of an image segmentation model into the image segmentation model for training to obtain a trained image segmentation model; the trained image segmentation model can segment the image data input into the image segmentation model into a foreground image or a background image.
15. An image processing method for a video conference, comprising:
acquiring a video conference image in a video conference;
inputting the video conference images into an image segmentation model for learning, and identifying human body images in the video conference images; wherein the image segmentation model is a model trained by using, as training data, image data in the image data set obtained by the method of augmenting an image data set according to any one of claims 11 to 12;
and blurring or replacing the image outside the human body image area range.
16. A method for processing video data in live video is characterized by comprising the following steps:
acquiring a live broadcast picture image in video live broadcast;
inputting the live broadcast picture image into an image segmentation model for learning, and identifying a human body image in the live broadcast picture image; wherein the image segmentation model is a model trained by using, as training data, image data in the image data set obtained by the method of augmenting an image data set according to any one of claims 11 to 12;
and adding preset information to the image outside the human body image area range.
17. A computer storage medium for storing network platform generated data and a program for processing the network platform generated data;
the program, when read and executed, performs the steps of the image processing method according to any one of claims 1 to 9; or performs the steps of the method of augmenting an image data set according to any one of claims 11 to 12; or performs the steps of the training method of an image segmentation model according to claim 14; or performs the steps of the image processing method for a video conference according to claim 15; or performs the steps of the method for processing video data in live video according to claim 16.
18. An electronic device, comprising:
a processor;
a memory for storing a program for processing network platform generated data, wherein the program, when read and executed by the processor, performs the steps of the image processing method according to any one of claims 1 to 9; or performs the steps of the method of augmenting an image data set according to any one of claims 11 to 12; or performs the steps of the training method of an image segmentation model according to claim 14; or performs the steps of the image processing method for a video conference according to claim 15; or performs the steps of the method for processing video data in live video according to claim 16.
CN202011527321.9A 2020-12-22 2020-12-22 Image processing method, data set expansion method, storage medium, and electronic device Pending CN114663320A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011527321.9A CN114663320A (en) 2020-12-22 2020-12-22 Image processing method, data set expansion method, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN114663320A true CN114663320A (en) 2022-06-24

Family

ID=82024158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011527321.9A Pending CN114663320A (en) 2020-12-22 2020-12-22 Image processing method, data set expansion method, storage medium, and electronic device

Country Status (1)

Country Link
CN (1) CN114663320A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543267A (en) * 2023-07-04 2023-08-04 宁德时代新能源科技股份有限公司 Image set processing method, image segmentation device, image set processing apparatus, image segmentation device, and storage medium
CN116543267B (en) * 2023-07-04 2023-10-13 宁德时代新能源科技股份有限公司 Image set processing method, image segmentation device, image set processing apparatus, image segmentation device, and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination