WO2022188886A1

WO2022188886A1 - Image matting model training method and apparatus, and image matting method and apparatus

Info

Publication number: WO2022188886A1
Application number: PCT/CN2022/080531
Authority: WO
Inventors: 王闯闯; 钱贝贝; 杨飞宇; 胡正
Original assignee: 奥比中光科技集团股份有限公司
Priority date: 2021-03-11
Filing date: 2022-03-13
Publication date: 2022-09-15
Also published as: CN113052868A; CN113052868B

Abstract

The present invention is applicable to the technical field of image processing, and provides an image matting model training method and apparatus, and an image matting method and apparatus. The image matting model training method comprises: obtaining a training sample set, an initial teacher model, and an initial student model; training the initial teacher model by means of the training sample set to obtain a target teacher model and a first transparency mask output by the target teacher model; separately migrating first weight parameters in the target teacher model into sub-networks in the initial student model to obtain a transition student model; and training the transition student model according to the first transparency mask and the training sample set to obtain an image matting model. Because an image matting model has first weight parameters in a target teacher model and continuously learns a first transparency mask output by the target teacher model, the volume of the model and the processing duration are efficiently reduced when ensuring the processing accuracy.

Description

A kind of method and device for matting model training and image matting

This application claims the priority of the Chinese patent application filed on March 11, 2021, with the application number of 202110264893.0, and the invention titled "A method and device for patterning model training and image patterning", the entire contents of which are Incorporated herein by reference.

technical field

The present application belongs to the technical field of image processing, and in particular, relates to a method and device for matting model training and image matting.

Background technique

In the field of image processing, foreground matting is a common processing method. Among them, foreground matting refers to extracting the region of interest (foreground) in the image, obtaining a fine transparency mask, and using the transparency mask to extract the matting object from the image or video, so as to apply the matting object to photo editing, The movie is being recreated.

In traditional matting technology, a matting model is often used to obtain a transparency mask, and then a matting object is extracted from an image or video according to the transparency mask. However, in order to further improve the processing accuracy of the matting model, the traditional matting model is often large in size, resulting in a long processing time and cannot be applied to the application scenario of real-time matting.

SUMMARY OF THE INVENTION

In view of this, the embodiments of the present application provide a method for matting model training, a method for image matting, an apparatus for training a matting model, an apparatus for image matting, a first terminal device, a first The second terminal device and the computer-readable storage medium can solve the technical problem that the traditional matting model is often large in size, resulting in a long processing time and cannot be applied to the application scenario of real-time matting.

A first aspect of the embodiments of the present application provides a method for training a matting model, the method comprising:

Obtain a training sample set, an initial teacher model and an initial student model; wherein, the network structure complexity of the initial student model is lower than the network structure complexity of the initial teacher model; each training sample includes an input sample and an output sample; The input sample includes an image to be cutout, a background image, and a depth image of the image to be cutout, and the output sample includes a standard transparency mask corresponding to the image to be cutout;

Through the training sample set, the initial teacher model is trained to obtain a target teacher model and a first transparency mask output by the target teacher model;

respectively migrating the first weight parameter in the target teacher model to each sub-network in the initial student model to obtain a transitional student model;

According to the first transparency mask and the training sample set, the transitional student model is trained to obtain a matting model.

A second aspect of the embodiments of the present application provides a method for image matting, the method comprising:

Obtain the image to be cutout, the background image, and the depth image corresponding to the image to be cutout; wherein, the image to be cutout and the background image are images collected at the same viewing position, and the image to be cutout includes a matting object, the background image does not include the matting object;

Inputting the image to be cutout, the background image and the depth image into a pretrained cutout model, the target transparency mask output by the cutout model is obtained; the cutout model is trained by the transition student model Obtained, the transition student model is obtained by migrating the first weight parameter of the target teacher model to the initial student model; the network structure complexity of the cutout model is lower than the network structure complexity of the target teacher model;

According to the target transparency mask, the cutout image corresponding to the cutout object in the image to be cutout is intercepted.

A third aspect of the embodiments of the present application provides an apparatus for training a cutout model, the apparatus comprising:

The first obtaining unit is used to obtain a training sample set, an initial teacher model and an initial student model; wherein, the network structure complexity of the initial student model is lower than the network structure complexity of the initial teacher model; in each training sample Including input samples and output samples; the input samples include an image to be cut out, a background image and a depth image of the image to be cut out, and the output sample includes a standard transparency mask corresponding to the image to be cut out;

a first training unit, configured to train the initial teacher model to obtain a target teacher model and a first transparency mask output by the target teacher model through the training sample set;

a migration unit, configured to respectively migrate the first weight parameter in the target teacher model to each sub-network in the initial student model to obtain a transitional student model;

The second training unit is configured to train the transitional student model to obtain the matting model according to the first transparency mask and the training sample set.

A fourth aspect of the embodiments of the present application provides an apparatus for training a cutout model, the apparatus comprising:

The second acquiring unit is configured to acquire the image to be cut out, the background image and the depth image corresponding to the image to be cut out; wherein, the image to be cut out and the background image are images collected in the same viewing position, so The image to be cutout includes a cutout object, and the background image does not include the cutout object;

a processing unit, configured to input the image to be cutout, the background image and the depth image into a pretrained cutout model to obtain a target transparency mask output by the cutout model; the cutout model Obtained from the training of the transitional student model, the transitional student model is obtained by migrating the first weight parameter of the target teacher model to the initial student model; the network structure complexity of the cutout model is lower than the network structure complexity of the target teacher model ;

An intercepting unit, configured to intercept a cutout image corresponding to the cutout object in the to-be-cutout image according to the target transparency mask.

A fifth aspect of the embodiments of the present application provides a first terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer The program implements the steps of the method described in the first aspect above.

A sixth aspect of the embodiments of the present application provides a second terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer The program implements the steps of the method described in the second aspect above.

A seventh aspect of the embodiments of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the first aspect or the second aspect described above is implemented. steps of the method.

Compared with the prior art, the embodiments of the present application have the following beneficial effects: the present application obtains the target teacher model by training the initial teacher model. Since the network structure of the target teacher model has high processing accuracy, the first weight parameter in the target teacher model is transferred to the initial student model to obtain the transition student model. And according to the first transparency mask and the training sample set output by the target teacher model, the transition student model is trained to obtain the matting model. Since the cutout model not only has the first weight parameter in the target teacher model, but also continuously learns the first transparency mask output by the target teacher model, the cutout model has a processing accuracy similar to that of the target teacher model, and the network of the cutout model The structure is relatively simple, so on the premise of ensuring the processing accuracy, the volume of the model and the processing time are subtly reduced.

Description of drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings that are used in the description of the embodiments or related technologies. Obviously, the drawings in the following description are only some of the drawings in the present application. In the embodiments, for those of ordinary skill in the art, other drawings can also be obtained according to these drawings without any creative effort.

1 shows a schematic flowchart of a method for training a matting model provided by the present application;

Figure 2 shows a schematic diagram of a student model and a teacher model;

FIG. 3 shows a specific schematic flow chart of step 103 in a method for matting model training provided by the present application;

FIG. 4 shows a specific schematic flow chart of step 103 in a method for matting model training provided by the present application;

FIG. 5 shows a specific schematic flowchart of step 1043 in a method for training a matting model provided by the present application;

6 shows a specific schematic flow chart of step A4 in a method for matting model training provided by the present application;

FIG. 7 shows a schematic flowchart of a method for image matting provided by the present application;

FIG. 8 shows a schematic diagram of a device for training a cutout model provided by the present application;

FIG. 9 shows a schematic diagram of a device for image matting provided by the present application;

FIG. 10 is a schematic diagram of a first terminal device according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of a second terminal device according to an embodiment of the present invention.

Detailed ways

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

Please refer to FIG. 1. FIG. 1 shows a schematic flowchart of a method for training a matting model provided by the present application. As shown in FIG. 1 , the calibration method is applied to the first terminal device, and the calibration method includes the following steps:

Step 101: Obtain a training sample set, an initial teacher model and an initial student model; wherein, the network structure complexity of the initial student model is lower than the network structure complexity of the initial teacher model; each training sample includes input samples and An output sample; the input sample includes an image to be cutout, a background image, and a depth image of the image to be cutout, and the output sample includes a standard transparency mask corresponding to the image to be cutout.

The training sample set includes different training samples, and each training sample includes an image to be cut, a background image, a depth image of the image to be cut, and a standard transparency mask corresponding to the image to be cut. The image to be cutout and the background image are images collected at the same framing position, and the difference between the two is that the image to be cutout includes a cutout object, and the background image does not include a cutout object (that is, the image to be cutout includes complete the foreground and background, only the background is included in the background image).

The training sample set is used to train the initial teacher model as well as the initial student model. Among them, the initial teacher model and the initial student model are used to obtain transparency masks. The initial teacher model is a model with high network structure complexity, which can extract rich feature information, and then obtain a high-precision transparency mask. Preferably, the initial teacher model can adopt a network structure such as a Resnet152 network.

Illustratively, taking the Resnet152 network as an example, Resnet152 is a highly complex super network model with 152 convolutional layers. When the number of convolutional layers of the model is more, the extracted features are richer, and the feature integrity is high, and then a high-precision transparency mask can be obtained. However, Resnet152 has a slow training speed, requires high-performance computing, and is only suitable for running in high-performance and high-memory devices. And the operation time is long, which cannot meet the needs of real-time matting. Based on the above considerations, the present application uses a student model with a simple network structure to learn the output results of the teacher model, so as to replace the high-complexity teacher model with a low-complexity student model under the premise of ensuring the processing effect. That is, the teacher model is only used in the training phase, and the student model is used to process the images in the application phase.

It is worth noting that this implementation is the process steps in the training phase, and the process steps in the application phase refer to steps 701 to 703 in the embodiment shown in FIG. 7 .

Step 102: Train the initial teacher model through the training sample set to obtain a target teacher model and a first transparency mask output by the target teacher model.

Each training sample in the training sample set performs the following process: input the image to be cutout, the background image, and the depth image of the image to be cutout into the initial teacher model, and obtain the initial transparency template output by the initial teacher model. Calculate the loss function based on the initial transparency mask and the standard transparency mask. Update the network parameters in the initial teacher model according to the loss function.

When the training of all the training samples in the training sample set is completed or the preset number of training times is reached, or the model convergence condition is reached, the target teacher model and the first transparency mask output by the target teacher model are obtained.

Step 103: Migrate the first weight parameter in the target teacher model to each sub-network in the initial student model to obtain a transitional student model.

Please refer to FIG. 2, which shows a schematic diagram of the student model and the teacher model. The dashed box M represents the teacher model, the dashed box N represents the student model, the box I represents the image to be matted, the box S represents the depth image of the image to be matted, and the box B represents the background image. After the initial teacher model is trained on the image to be cutout, the depth image and the background image, the target teacher model and the transparency mask α output by the target teacher model are obtained. The first weight parameters in the target teacher model are respectively transferred to each sub-network of the initial student model N (ie the Stage ₁ module to the Stage _n module) to obtain the transitional student model N. The network architectures adopted by different stage modules include, but are not limited to, a combination of one or more network architectures such as RefineNet network architecture or MobileNet network architecture.

When step 103 is performed, the first weight parameter can be directly transferred to the initial student model. However, since the first weight parameter in the target teacher model is floating-point data, the amount of calculation of floating-point data is relatively large. In order to optimize the calculation amount, in step 103, the following steps can also be performed:

As an optional embodiment of the present application, step 103 includes the following steps 1031 to 1032. Referring to FIG. 3 , FIG. 3 shows a specific schematic flowchart of step 103 in a method for training a matting model provided by the present application.

Step 1031: Quantize the floating-point first weight parameter into integer data to obtain a second weight parameter.

The quantization process for the first weight parameter is as follows: obtain the original file model of the target teacher model (eg TensorFlow, pyrorch or onnx model), and convert the original file model into intermediate files ".json format" and ".data format". Quantify the data in the intermediate file, and obtain the quantized ".quant format" file. The ".quant format" file includes the quantized integer weights of each layer of the target teacher model.

Wherein, the quantization method can adopt the existing quantization method or the following optional embodiments:

As an optional embodiment of the present application, each first weight parameter is sequentially substituted into the first formula group to obtain a second weight parameter corresponding to each first weight parameter.

The first formula group is as follows:

Among them, A represents the first quantization parameter, wherein the first quantization parameter refers to the corresponding scalable minimum scale factor between the floating-point first weight parameter and the integer first weight parameter, and J _max represents the The maximum weight parameter in (the maximum weight parameter is floating-point data), J _min represents the minimum weight parameter among all the first weight parameters (the minimum weight parameter is floating-point data), and α represents the preset integer data range in (The preset integer data range refers to the upper and lower limits of integer data, for example: 0-255, which can be preset according to different calculation precision requirements),

and

Represents rounding to the nearest integer, B represents the first preset integer value, the first preset integer value refers to the first integer value corresponding to the first floating-point weight parameter being zero, and N represents the each The first weight parameter, C represents the second weight parameter corresponding to each of the first weight parameters.

Step 1032: Migrate the second weight parameter to each sub-network in the initial student model to obtain a transitional student model.

As an embodiment of the present application, since the second weight parameter is integer data, the result output by the matting model is also integer data. However, since some protocols or hardware (eg, Open Pluggable Specification, Ops, Open Pluggable Specification) do not support integer data, a mapping relationship between floating-point data and integer data can be established in advance. To convert integer data to floating-point data when getting the output of the cutout model.

Step 104: Train the transitional student model to obtain the matting model according to the first transparency mask and the training sample set.

There are two ways to train the transitional student model:

Method ①: Input the image to be cutout, the background image, and the depth image of the image to be cutout in each training sample in the training sample set into the transition student model, and obtain the transition transparency template output by the transition student model. The loss function is calculated from the transition transparency mask and the first transparency mask. Update the network parameters in the transitional student model according to the loss function. When the training of all the training samples in the training sample set is completed or the preset number of training times is reached, or the model convergence condition is reached, the matting model is obtained.

Manner ②: As an optional embodiment of the present application, step 104 includes the following steps 1041 to 1044. Referring to FIG. 4 , FIG. 4 shows a specific schematic flowchart of step 103 in a method for training a matting model provided by the present application.

Step 1041: Quantize the floating-point transparency of each pixel in the first transparency mask into integer data to obtain a second transparency mask.

The quantization method can adopt the existing quantization method or the following optional embodiments:

As an optional embodiment of the present application, the floating-point transparency of each pixel in the first transparency mask is substituted into the second formula group to obtain the second transparency mask.

The second formula group is as follows:

Among them, D represents the second quantization parameter, wherein the second quantization parameter refers to the minimum scale factor corresponding to the floating point transparency and the integer transparency, and K _max represents the maximum transparency in the first transparency mask (the maximum transparency is floating-point data), K _min represents the minimum transparency in the first transparency mask (the minimum transparency is floating-point data), α represents the maximum value in the preset integer data range,

and

Indicates rounding to the nearest integer, E represents the second preset integer transparency, the second preset integer transparency refers to the second integer value corresponding to the floating point transparency of zero, and M represents the floating point value of each pixel. Point type transparency, F represents the integer type transparency corresponding to the floating point type transparency of each pixel.

Step 1042: Input the training samples into the transitional student model to obtain a third transparency mask output by the transitional student model.

Step 1043: Adjust a third weight parameter in the transition student model according to the second transparency mask and the third transparency mask.

In step 1043, the loss function between the second transparency mask and the third transparency mask can be directly calculated, and the third weight parameter in the transition student model can be adjusted according to the loss function.

Step 1043 can also be implemented by the following optional embodiments:

As an optional embodiment of the present application, step 1043 includes the following steps A1 to A4. Please refer to FIG. 5. FIG. 5 shows a specific schematic flowchart of step 1043 in a method for training a matting model provided by the present application.

Step A1: Calculate the first loss function through the first formula.

The first formula is as follows:

Among them, H represents the preset length of the composite image, M represents the preset width of the composite image, a _i,j represents the first transparency of the pixel in the ith row and the jth column in the second transparency mask,

Represents the second transparency of the pixel at row i and column j in the third transparency mask.

Step A2: Calculate the second loss function through the second formula.

The second formula is as follows:

Among them, μ represents the first average transparency value of each pixel in the second transparency mask, μ ² represents the square of the first transparency average value, μ ^* represents the second average transparency value of each pixel in the third transparency mask, and μ ^*2 represents the first transparency value. The square of the mean value of the two transparency, σ represents the first transparency variance of each pixel in the second transparency mask, σ2 represents the square of the first transparency variance, σ ^* represents the ^second transparency variance of each pixel in the third transparency mask, σ ^*2 represents the square of the second transparency variance, c ₁ represents the first constant, and c ₂ represents the second constant.

Step A3: Calculate the third loss function through the third formula.

The third formula is as follows:

Among them, γ represents the third constant, θ _i,j represents the index of the difficult pixel in the third transparency mask, and the difficult pixel refers to the pixel that cannot be processed by the transition student model, and the index is as follows:

Wherein, mn represents the range of m×n pixels adjacent to the difficult pixel point, and A _i,j represents the adjacent pixel point of the unprocessable pixel point.

Step A4: Adjust the third weight parameter in the transitional student model according to the first loss function, the second loss function and the third loss function.

In step A4, the first loss function, the second loss function, and the third loss function can be directly used as a joint loss function, and the third weight parameter in the transitional student model can be adjusted.

Step A4 can also be implemented by the following optional embodiments:

As an optional embodiment of the present application, step A4 includes the following steps A41 to A42. Referring to FIG. 6, FIG. 6 shows a specific schematic flowchart of step A4 in a method for training a matting model provided by the present application.

Step A41: Multiply the first loss function, the second loss function, and the third loss function by their corresponding preset weights to obtain a joint loss function.

Step A42: Adjust the third weight parameter in the transitional student model according to the joint loss function.

Step 1044, the steps of inputting the training samples into the transition student model to obtain the third transparency mask output by the transition student model and subsequent steps are sequentially performed on each of the training samples in the training sample set, to obtain a Graphical model.

In this embodiment, the target teacher model is obtained by training the initial teacher model. Since the network structure of the target teacher model has high processing accuracy, the first weight parameter in the target teacher model is transferred to the initial student model to obtain the transition student model. And according to the first transparency mask and the training sample set output by the target teacher model, the transition student model is trained to obtain the matting model. Since the cutout model not only has the first weight parameter in the target teacher model, but also continuously learns the first transparency mask output by the target teacher model, the cutout model has a processing accuracy similar to that of the target teacher model, and the network of the cutout model The structure is relatively simple, so on the premise of ensuring the processing accuracy, the volume of the model and the processing time are subtly reduced.

Referring to FIG. 7 , FIG. 7 shows a schematic flowchart of a method for applying the above-mentioned matting model to image matting provided by the present application. As shown in FIG. 7 , the method is applied to the second terminal device, and the method includes the following steps:

Step 701: Obtain an image to be cutout, a background image, and a depth image corresponding to the image to be cutout; wherein, the image to be cutout and the background image are images collected at the same viewing position, and the image to be cutout is an image collected at the same viewing position. The image includes a matting object, and the background image does not include the matting object.

Step 702: Input the image to be cutout, the background image and the depth image into a pretrained cutout model to obtain a target transparency mask output by the cutout model; The transitional student model is obtained by training the student model, and the transitional student model is obtained by migrating the first weight parameter of the target teacher model to the initial student model; the network structure complexity of the matting model is lower than that of the target teacher model.

During the processing of the model, some pixels may not be able to obtain depth data. In order to improve the recall rate of the foreground, the application simultaneously uses the image to be cutout, the background image and the depth image as the input data of the cutout model to accurately Extract depth features, and then obtain high-accuracy target transparency masks.

Step 703 , according to the target transparency mask, intercept the cutout image corresponding to the cutout object in the image to be cutout.

After the matting image is obtained, the matting image and the image to be synthesized are synthesized to obtain a target synthesized image. The synthesis process is shown in the following formula:

I=αF+(1-α)B

Among them, α represents the target transparency mask, I represents the target composite image, F represents the image to be matted, and B represents the image to be composited.

In this embodiment, since the transitional student model adopts the first weight parameter of the target teacher model, and trains the first weight parameter to obtain the matting model. Among them, the network structure complexity of the matting model is lower than that of the target teacher model. Therefore, the matting model improves the image processing efficiency on the premise of ensuring the processing accuracy.

As shown in FIG. 8, the present application provides an apparatus 8 for training a cutout model. Please refer to FIG. 8. FIG. 8 shows a schematic diagram of an apparatus for training a cutout model provided by the present application. As shown in FIG. A graphical model training device, including:

The first obtaining unit 81 is used to obtain a training sample set, an initial teacher model and an initial student model; wherein, the network structure complexity of the initial student model is lower than the network structure complexity of the initial teacher model;

The first training unit 82 is used to train the initial teacher model to obtain a target teacher model and a first transparency mask output by the target teacher model through the training sample set; each training sample includes an input sample and an output sample ; The input sample includes an image to be cutout, a background image and a depth image of the image to be cutout, and the output sample includes a standard transparency mask corresponding to the image to be cutout;

The migration unit 83 is used to respectively migrate the first weight parameter in the target teacher model to each sub-network in the initial student model to obtain a transitional student model;

The second training unit 84 is configured to train the transitional student model to obtain a matting model according to the first transparency mask and the training sample set.

Provided in the present application is an apparatus for training a cutout model, which obtains a target teacher model by training an initial teacher model. Since the network structure of the target teacher model has high processing accuracy, the first weight parameter in the target teacher model is transferred to the initial student model to obtain the transition student model. And according to the first transparency mask and the training sample set output by the target teacher model, the transition student model is trained to obtain the matting model. Since the cutout model not only has the first weight parameter in the target teacher model, but also continuously learns the first transparency mask output by the target teacher model, the cutout model has a processing accuracy similar to that of the target teacher model, and the network of the cutout model The structure is relatively simple, so on the premise of ensuring the processing accuracy, the volume of the model and the processing time are subtly reduced.

As shown in FIG. 9, the present application provides an image matting apparatus 9. Please refer to FIG. 9. FIG. 9 shows a schematic diagram of an image matting apparatus provided by the present application. As shown in FIG. 9, an image matting devices, including:

The second acquiring unit 91 is configured to acquire an image to be cut out, a background image, and a depth image corresponding to the image to be cut out; wherein, the image to be cut out and the background image are images collected at the same viewing position, The image to be cutout includes a cutout object, and the background image does not include the cutout object;

The processing unit 92 is configured to input the image to be cutout, the background image and the depth image into a pretrained cutout model to obtain a target transparency mask output by the cutout model; the cutout The model is obtained by training the transitional student model, and the transitional student model is obtained by migrating the first weight parameter of the target teacher model to the initial student model; the network structure complexity of the cutout model is lower than that of the target teacher model. Spend;

The intercepting unit 93 is configured to intercept the cutout image corresponding to the cutout object in the image to be cutout according to the target transparency mask.

In the device for training a cutout model provided by the present application, the transitional student model adopts the first weight parameter of the target teacher model, and the cutout model is obtained by training the first weight parameter. Among them, the network structure complexity of the matting model is lower than that of the target teacher model. Therefore, the matting model improves the image processing efficiency on the premise of ensuring the processing accuracy.

FIG. 10 is a schematic diagram of a first terminal device according to an embodiment of the present invention. As shown in FIG. 10, a first terminal device 100 in this embodiment includes: a processor 1001, a memory 1002, and a computer program 1003 stored in the memory 1002 and running on the processor 1001, such as a A program for matting model training. When the processor 1001 executes the computer program 1003 , the steps in each of the foregoing method embodiments for training a cutout model are implemented, for example, steps 101 to 104 shown in FIG. 1 . Alternatively, when the processor 1001 executes the computer program 134, the functions of the units in the foregoing apparatus embodiments, such as the functions of units 81 to 84 shown in FIG. 8 , are implemented.

Exemplarily, the computer program 1003 may be divided into one or more units, and the one or more units are stored in the memory 1002 and executed by the processor 1001 to complete the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 1003 in the first terminal device 100 . For example, the computer program 1003 can be divided into units with the following specific functions:

a first obtaining unit, used for obtaining a training sample set, an initial teacher model and an initial student model; wherein, the network structure complexity of the initial student model is lower than the network structure complexity of the initial teacher model;

The first terminal device includes but is not limited to a processor 1001 and a memory 1002 . Those skilled in the art can understand that FIG. 10 is only an example of a first terminal device 100, and does not constitute a limitation to a first terminal device 100, and may include more or less components than those shown in the figure, or combinations thereof Some components, or different components, for example, the roaming control device may also include input and output devices, network access devices, buses, and the like.

The processor 1001 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processors, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 1002 may be an internal storage unit of the first terminal device 100, such as a hard disk or a memory of the first terminal device 100. The memory 1002 may also be an external storage device of the first terminal device 100, such as a plug-in hard disk equipped on the first terminal device 100, a smart memory card (Smart Media Card, SMC), Secure Digital (SD) card, flash memory card (Flash Card), etc. Further, the memory 1002 may also include both an internal storage unit of the first terminal device 100 and an external storage device. The memory 1002 is used for storing the computer program and other programs and data required by the one kind of roaming control device. The memory 1002 may also be used to temporarily store data that has been output or will be output.

FIG. 11 is a schematic diagram of a second terminal device according to an embodiment of the present invention. As shown in FIG. 11, a second terminal device 11 in this embodiment includes: a processor 111, a memory 112, and a computer program 113 stored in the memory 112 and executable on the processor 111, such as a A program for image matting. When the processor 111 executes the computer program 113 , the steps in each of the above-mentioned embodiments of the image matting method are implemented, for example, steps 701 to 703 shown in FIG. 7 . Alternatively, when the processor 111 executes the computer program 134, the functions of the units in the foregoing device embodiments, for example, the functions of the units 91 to 93 shown in FIG. 9 , are implemented.

Exemplarily, the computer program 113 may be divided into one or more units, and the one or more units are stored in the memory 112 and executed by the processor 111 to complete the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 113 in the second terminal device 11 . For example, the computer program 113 can be divided into units with specific functions as follows:

The second terminal device includes but is not limited to the processor 111 and the memory 112 . Those skilled in the art can understand that FIG. 11 is only an example of a second terminal device 11 , and does not constitute a limitation to a second terminal device 11 , and may include more or less components than those shown in the figure, or combinations thereof Some components, or different components, for example, the roaming control device may also include input and output devices, network access devices, buses, and the like.

The processor 111 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), Off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 112 may be an internal storage unit of the second terminal device 11 , such as a hard disk or a memory of the second terminal device 11 . The memory 112 may also be an external storage device of the second terminal device 11, such as a plug-in hard disk equipped on the second terminal device 11, a smart memory card (Smart Media Card, SMC), Secure Digital (SD) card, flash memory card (Flash Card), etc. Further, the memory 112 may also include both an internal storage unit of the second terminal device 11 and an external storage device. The memory 112 is used for storing the computer program and other programs and data required by the one kind of roaming control device. The memory 112 may also be used to temporarily store data that has been output or will be output.

It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

It should be noted that the information exchange, execution process and other contents between the above-mentioned devices/units are based on the same concept as the method embodiments of the present application. For specific functions and technical effects, please refer to the method embodiments section. It is not repeated here.

Those skilled in the art can clearly understand that, for the convenience and simplicity of description, only the division of the above-mentioned functional units and modules is used as an example. Module completion, that is, dividing the internal structure of the device into different functional units or modules to complete all or part of the functions described above. Each functional unit and module in the embodiment may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated in one unit, and the above-mentioned integrated units may adopt hardware. It can also be realized in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing from each other, and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above-mentioned system, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the foregoing method embodiments can be implemented.

The embodiments of the present application provide a computer program product, when the computer program product runs on a mobile terminal, the steps in the foregoing method embodiments can be implemented when the mobile terminal executes the computer program product.

The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the present application realizes all or part of the processes in the methods of the above embodiments, which can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a computer-readable storage medium. When executed by a processor, the steps of each of the above method embodiments can be implemented. Wherein, the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/living detection device, recording medium, computer memory, read-only memory (ROM), random access Memory (Random Access Memory, RAM), electrical carrier signal, telecommunication signal and software distribution medium. For example, U disk, mobile hard disk, disk or CD, etc. In some jurisdictions, under legislation and patent practice, computer readable media may not be electrical carrier signals and telecommunications signals.

In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods, such as multiple units. Or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units.

It is to be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described feature, integer, step, operation, element and/or component, but does not exclude one or more other The presence or addition of features, integers, steps, operations, elements, components and/or sets thereof.

It will also be understood that, as used in this specification and the appended claims, the term "and/or" refers to and including any and all possible combinations of one or more of the associated listed items.

As used in the specification of this application and the appended claims, the term "if" may be contextually interpreted as "when" or "once" or "in response to determining" or "in response to monitoring of ". Similarly, the phrases "if it is determined" or "if the [described condition or event] is monitored" can be interpreted, depending on the context, to mean "once it is determined" or "in response to the determination" or "once the [described condition or event] is monitored. ]" or "in response to the detection of the [described condition or event]".

In addition, in the description of the specification of the present application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and should not be construed as indicating or implying relative importance.

References in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in other embodiments," etc. in various places in this specification are not necessarily All refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise. The terms "including", "including", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, but not to limit them; although the present application has been described in detail with reference to the above-mentioned embodiments, those of ordinary skill in the art should understand that: it can still be used for the above-mentioned implementations. The technical solutions described in the examples are modified, or some technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions in the embodiments of the application, and should be included in the within the scope of protection of this application.

Claims

A method for matting model training, wherein the method comprises:

Obtain a training sample set, an initial teacher model and an initial student model; wherein, the network structure complexity of the initial student model is lower than the network structure complexity of the initial teacher model; each training sample includes an input sample and an output sample; The input sample includes an image to be cutout, a background image, and a depth image of the image to be cutout, and the output sample includes a standard transparency mask corresponding to the image to be cutout;

Through the training sample set, the initial teacher model is trained to obtain a target teacher model and a first transparency mask output by the target teacher model;

respectively migrating the first weight parameter in the target teacher model to each sub-network in the initial student model to obtain a transitional student model;

The matting model is obtained by training the transitional student model according to the first transparency mask and the training sample set.
The method according to claim 1, wherein, respectively migrating the first weight parameters in the target teacher model to each sub-network in the initial student model to obtain a transitional student model, comprising:

Quantize the floating-point first weight parameter into integer data to obtain the second weight parameter;

The second weight parameter is transferred to each sub-network in the initial student model to obtain a transitional student model.
The method according to claim 2, characterized in that, quantizing the floating-point first weight parameter into integer data to obtain the second weight parameter, comprising:

Substitute each first weight parameter into the first formula group in turn to obtain the second weight parameter corresponding to each first weight parameter;

The first formula group is as follows:

Among them, A represents the first quantization parameter, J max represents the maximum weight parameter among all the first weight parameters, J min represents the minimum weight parameter among all the first weight parameters, and α represents the maximum weight parameter in the preset integer data range value,
and
Represents rounding to the nearest integer, B represents the first preset integer value, the first preset integer value refers to the first integer value corresponding to the first floating-point weight parameter being zero, and N represents the each The first weight parameter, C represents the second weight parameter corresponding to each of the first weight parameters.
The method according to claim 1, wherein, according to the first transparency mask and the training sample set, training the transitional student model to obtain the matting model, comprising:

Quantifying the floating-point transparency of each pixel in the first transparency mask into integer data to obtain a second transparency mask;

Input the training sample into the transition student model to obtain the third transparency mask output by the transition student model;

adjusting the third weight parameter in the transition student model according to the second transparency mask and the third transparency mask;

The steps of inputting the training samples into the transitional student model to obtain the third transparency mask output by the transitional student model and subsequent steps are performed successively for each training sample in the training sample set, and the matting is obtained. Model.
The method according to claim 4, characterized in that, quantizing the floating-point transparency of each pixel in the first transparency mask into integer data to obtain the second transparency mask, comprising:

Substitute the floating-point transparency of each pixel in the first transparency mask into the second formula group to obtain a second transparency mask;

The second formula group is as follows:

Among them, D represents the second quantization parameter, K max represents the maximum transparency in the first transparency mask, K min represents the minimum transparency in the first transparency mask, α represents the maximum value in the preset integer data range,
and
Indicates rounding, E represents the second preset integer transparency, the second preset integer transparency refers to the second integer value corresponding to the floating point transparency of zero, and M represents the floating point of each pixel. Point type transparency, F represents the integer type transparency corresponding to the floating point type transparency of each pixel.
The method according to claim 4, wherein the adjusting the third weight parameter in the transition student model according to the second transparency mask and the third transparency mask comprises:

Calculate the first loss function through the first formula;

The first formula is as follows:

Among them, H represents the preset length of the composite image, M represents the preset width of the composite image, a i,j represents the first transparency of the pixel in the ith row and the jth column in the second transparency mask,
represents the second transparency of the pixel in row i and column j in the third transparency mask;

Calculate the second loss function by the second formula;

The second formula is as follows:

Among them, μ represents the first average transparency value of each pixel in the second transparency mask, μ 2 represents the square of the first transparency average value, μ * represents the second average transparency value of each pixel in the third transparency mask, and μ *2 represents the first transparency value. The square of the mean value of the two transparency, σ represents the first transparency variance of each pixel in the second transparency mask, σ2 represents the square of the first transparency variance, σ * represents the second transparency variance of each pixel in the third transparency mask, σ *2 represents the square of the second transparency variance, c 1 represents the first constant, and c 2 represents the second constant;

Calculate the third loss function by the third formula;

The third formula is as follows:

Among them, γ represents the third constant, θ i,j represents the index of the difficult pixel in the third transparency mask, and the difficult pixel refers to the pixel that cannot be processed by the transition student model, and the index is as follows:

Wherein, mn represents the range of m×n pixels adjacent to the difficult pixel point, and A i,j represents the adjacent pixel point of the unprocessable pixel point;

A third weight parameter in the transitional student model is adjusted according to the first loss function, the second loss function and the third loss function.
The method according to claim 6, wherein, adjusting the third weight parameter in the transitional student model according to the first loss function, the second loss function and the third loss function, comprising: :

Multiply the first loss function, the second loss function and the third loss function by their respective preset weights to obtain a joint loss function;

A third weight parameter in the transitional student model is adjusted according to the joint loss function.
A method for image matting, characterized in that the method comprises:

Obtain the image to be cutout, the background image, and the depth image corresponding to the image to be cutout; wherein, the image to be cutout and the background image are images collected at the same viewing position, and the image to be cutout includes a matting object, the background image does not include the matting object;

Inputting the image to be cutout, the background image and the depth image into a pretrained cutout model, the target transparency mask output by the cutout model is obtained; the cutout model is trained by the transition student model Obtained, the transition student model is obtained by migrating the first weight parameter of the target teacher model to the initial student model; the network structure complexity of the cutout model is lower than the network structure complexity of the target teacher model;

According to the target transparency mask, the cutout image corresponding to the cutout object in the image to be cutout is intercepted.
A device for matting model training, wherein the device comprises:

The first obtaining unit is used to obtain a training sample set, an initial teacher model and an initial student model; wherein, the network structure complexity of the initial student model is lower than the network structure complexity of the initial teacher model; in each training sample Including input samples and output samples; the input samples include an image to be cut out, a background image and a depth image of the image to be cut out, and the output sample includes a standard transparency mask corresponding to the image to be cut out;

a first training unit, configured to train the initial teacher model to obtain a target teacher model and a first transparency mask output by the target teacher model through the training sample set;

a migration unit, configured to respectively migrate the first weight parameter in the target teacher model to each sub-network in the initial student model to obtain a transitional student model;

The second training unit is configured to train the transitional student model to obtain the matting model according to the first transparency mask and the training sample set.
A device for image matting, characterized in that the device comprises:

The second acquiring unit is configured to acquire the image to be cut out, the background image and the depth image corresponding to the image to be cut out; wherein, the image to be cut out and the background image are images collected in the same viewing position, so The image to be cutout includes a cutout object, and the background image does not include the cutout object;

a processing unit, configured to input the image to be cutout, the background image and the depth image into a pretrained cutout model to obtain a target transparency mask output by the cutout model; the cutout model Obtained from the training of the transitional student model, the transitional student model is obtained by migrating the first weight parameter of the target teacher model to the initial student model; the network structure complexity of the cutout model is lower than the network structure complexity of the target teacher model ;

An intercepting unit, configured to intercept a cutout image corresponding to the cutout object in the to-be-cutout image according to the target transparency mask.