CN115147434A - Image processing method, device, terminal equipment and computer readable storage medium - Google Patents
Image processing method, device, terminal equipment and computer readable storage medium
- Publication number
- CN115147434A (application number CN202110342064.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- portrait
- matting
- training
- segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/11 — Image analysis; Segmentation; Edge detection; Region-based segmentation
- G06T7/155 — Segmentation; Edge detection involving morphological operators
- G06T7/194 — Segmentation; Edge detection involving foreground-background segmentation
- G06T2207/20081 — Indexing scheme for image analysis or image enhancement; Special algorithmic details; Training; Learning
- G06T2207/20084 — Special algorithmic details; Artificial neural networks [ANN]
- G06T2207/30196 — Subject of image; Context of image processing; Human being; Person
(All codes fall under G — Physics; G06 — Computing; Calculating or Counting; G06T — Image data processing or generation, in general.)
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The application is applicable to the technical field of image processing, and provides an image processing method, an image processing apparatus, a terminal device and a computer-readable storage medium. The method includes the following steps: acquiring an image to be matted; and inputting the image to be matted into a trained portrait matting model for processing, and outputting a portrait matting result corresponding to the image to be matted. The method and the device can effectively improve the accuracy of portrait matting.
Description
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, a terminal device, and a computer-readable storage medium.
Background
Portrait matting refers to the process of separating the portrait region from the background region in an original picture, that is, blurring the background region other than the portrait. With the rapid development of artificial intelligence and people's ever-increasing aesthetic requirements, portrait matting has become more and more important in picture-editing fields such as picture beautification, picture background replacement and picture retouching. At present, portrait matting is usually realized by inputting a ternary diagram (trimap) of the picture into a matting model, but a trimap-based matting model cannot accurately determine the portrait region, which lowers the accuracy of portrait matting and degrades the user experience.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, terminal equipment and a computer readable storage medium, which can effectively improve the accuracy of portrait matting.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring an image to be matted;
and inputting the image to be matted into a trained portrait matting model for processing, and outputting a portrait matting result corresponding to the image to be matted.
The portrait matting model includes a portrait segmentation module and a portrait regression module, both of which are obtained by training on a training image set, where the training image set includes an original image together with a first portrait mask image and a second portrait mask image derived from the original image.
It can be seen that training the portrait matting model with the original image and with the first and second portrait mask images derived from it guides the model to learn the region of interest and to perform portrait matting without a trimap being supplied as input, thereby improving the accuracy of portrait matting.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the acquisition module is used for acquiring an image to be matted;
and the matting module is used for inputting the image to be matted into the trained portrait matting model for processing, and outputting a portrait matting result corresponding to the image to be matted.
In a third aspect, an embodiment of the present application provides a terminal device, where the terminal device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method of any one of the above first aspects when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and when executed by a processor, the computer program implements the method of any one of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on a terminal device, causes the terminal device to perform the method of any one of the above first aspects.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the application, and that other drawings can be derived from them by those skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of an image processing method provided in an embodiment of the present application;
FIG. 2 is a network structure diagram of a portrait matting model provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a training process of a portrait matting model according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a portrait mask process according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of a portrait segmentation provided in an embodiment of the present application;
FIG. 6 is a schematic flow chart of portrait matting provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless otherwise specifically stated.
Referring to fig. 1, fig. 1 shows a schematic flowchart of an image processing method provided by an embodiment of the present application. The order of the steps in the flowchart may be changed, and some steps may be omitted, according to different requirements. As shown in fig. 1, the image processing method may include:
S101, the terminal device obtains an image to be matted.
In the embodiment of the present application, the image to be matted is an image in which the portrait region needs to be separated from the background region.
S102, the terminal device inputs the image to be matted into the trained portrait matting model for processing, and outputs a portrait matting result corresponding to the image to be matted.
Specifically, the portrait matting model includes a portrait segmentation module and a portrait regression module, which are obtained by training a training image set, where the training image set includes an original image, and a first portrait mask image and a second portrait mask image obtained from the original image.
In the embodiment of the application, the portrait segmentation module is used to segment the portrait in the image to obtain the portrait region, and the portrait regression module is used to perform portrait matting on the portrait region output by the portrait segmentation module to obtain the matting region. Specifically, fig. 2 shows the network structure of the portrait matting model provided in an embodiment of the present application, in which output1 is the portrait segmentation result and output2 is the portrait matting result.
Further, before the image to be matted is input into the trained portrait matting model for processing, the portrait matting model needs to be trained to ensure its portrait recognition accuracy.
Referring to fig. 3, fig. 3 shows a schematic flowchart of training the portrait matting model according to an embodiment of the present application. As shown in fig. 3, before the image to be matted is obtained, the method further includes:
S301, the terminal device acquires an original image and performs portrait mask processing on the original image to obtain a first portrait mask image;
S302, the terminal device processes the first portrait mask image to obtain a second portrait mask image;
S303, the terminal device inputs the original image into an untrained portrait matting model, and segments the original image through the portrait segmentation module to obtain a portrait segmentation result;
S304, the terminal device acquires a first feature map obtained by the portrait segmentation module, and processes the original image and the first feature map by using the portrait regression module to obtain a training matting result;
S305, the terminal device calculates the training loss of the untrained portrait matting model according to the portrait segmentation result, the training matting result, the first portrait mask image and the second portrait mask image;
S306, the terminal device judges whether the training loss meets a preset condition;
S307, when the training loss does not meet the preset condition, the terminal device adjusts the model parameters of the untrained portrait matting model and returns to execute the step of inputting the original image into the untrained portrait matting model and the subsequent steps;
and S308, when the training loss meets the preset condition, the terminal device obtains the trained portrait matting model.
Referring to fig. 4, fig. 4 is a schematic flow chart illustrating a portrait mask process according to an embodiment of the present disclosure. As shown in fig. 4, the performing, by the terminal device, a portrait mask process on the original image to obtain a first portrait mask image may include:
s401, carrying out portrait segmentation processing on an original image by the terminal equipment to obtain a segmented image;
s402, the terminal equipment determines a ternary region of the segmented image;
and S403, carrying out image matting processing on the ternary region by the terminal equipment to obtain a first portrait mask image.
In an alternative embodiment, the portrait segmentation processing of the original image may be implemented with a currently known image segmentation model, such as the DeepLabV3+ model.
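As an illustration of this segmentation step, the following sketch uses an off-the-shelf DeepLabV3 model from torchvision to obtain a binary person mask; the specific backbone (ResNet-50), the pretrained weights and the VOC class index are assumptions for illustration, not choices specified by this application.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Sketch of the person-segmentation step with an off-the-shelf torchvision
# DeepLabV3 model (assumption: ResNet-50 backbone with default pretrained weights).
model = deeplabv3_resnet50(weights="DEFAULT").eval()

image = torch.rand(1, 3, 512, 512)               # stand-in for the original image, N x C x H x W
with torch.no_grad():
    logits = model(image)["out"]                 # (1, 21, 512, 512): 21 VOC classes
person_mask = (logits.argmax(dim=1) == 15)       # class 15 is "person" in the VOC label map
```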
In an alternative embodiment, the terminal device determining the ternary region of the segmented image may include: performing an erosion operation on the segmented image to obtain a foreground region of the segmented image; performing a dilation operation on the segmented image to obtain a background region of the segmented image; determining a transition region of the segmented image according to the foreground region and the background region; and processing (e.g., combining) the foreground region, the background region and the transition region to obtain the ternary region of the segmented image. The foreground region refers to the portrait region of the segmented image, the background region refers to the non-portrait region, and the transition region refers to the uncertain region, that is, the region that cannot yet be determined to be portrait or non-portrait. It should be noted that the erosion and dilation operations are mature techniques and are not described further here.
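A minimal sketch of building the ternary region (trimap) by erosion and dilation, assuming OpenCV; the kernel size and the 0/128/255 encoding are illustrative choices, not values given in this application.

```python
import cv2
import numpy as np

def make_trimap(mask: np.ndarray, kernel_size: int = 15) -> np.ndarray:
    """Build a ternary region from a binary person mask (255 = person, 0 = background)."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    foreground = cv2.erode(mask, kernel)     # erosion keeps only pixels that are certainly person
    background = cv2.dilate(mask, kernel)    # pixels outside the dilated mask are certainly background
    trimap = np.full_like(mask, 128)         # 128 marks the uncertain transition region
    trimap[foreground == 255] = 255
    trimap[background == 0] = 0
    return trimap

mask = np.zeros((512, 512), np.uint8)
cv2.circle(mask, (256, 256), 100, 255, -1)   # toy "person" region
trimap = make_trimap(mask)
```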
In an alternative embodiment, the matting processing of the ternary region can be implemented with a currently known matting model, such as Deep Image Matting. Further, another embodiment of the present application further includes: deleting, from the obtained first portrait mask images, those in which the portrait is not fine, that is, deleting first portrait mask images containing non-portrait regions, so as to ensure the supervised-learning effect of subsequent model training and improve the portrait recognition accuracy of the model.
Because the embodiment of the application builds the training data through portrait mask processing, the reliability of the subsequent training data is ensured, the early-stage data processing required of the model is reduced, the speed and accuracy of subsequent model training are improved, and the generalization of the trained model is enhanced.
In an optional embodiment of the present application, the processing of the first portrait mask image by the terminal device to obtain the second portrait mask image includes: the terminal device scales the first portrait mask image to obtain a scaled image; and the terminal device performs binarization processing on the scaled image to obtain the second portrait mask image.
The scaling means resizing the first portrait mask image by a certain ratio, for example scaling it to 1/16 of its original size to obtain the scaled image. In the embodiment of the application, the scaled image can be binarized according to a preset pixel threshold, that is, pixels of the scaled image whose value is greater than or equal to the pixel threshold are set to 0, and pixels whose value is less than the pixel threshold are set to 255. Here, the pixel threshold may be set to 15; setting a relatively small pixel threshold increases the recall rate of subsequent model training.
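A minimal sketch of this scaling and binarization step, assuming OpenCV and NumPy; it follows the 1/16 scale and the threshold of 15 mentioned above, and reproduces the 0/255 mapping direction exactly as stated in the text.

```python
import cv2
import numpy as np

def second_portrait_mask(first_mask: np.ndarray, scale: int = 16, threshold: int = 15) -> np.ndarray:
    """Scale the first portrait mask to 1/scale of its size and binarize it at `threshold`."""
    h, w = first_mask.shape[:2]
    scaled = cv2.resize(first_mask, (w // scale, h // scale), interpolation=cv2.INTER_LINEAR)
    # Mapping direction as described in the text: values >= threshold become 0, others 255.
    return np.where(scaled >= threshold, 0, 255).astype(np.uint8)

first_mask = np.zeros((512, 512), np.uint8)
cv2.circle(first_mask, (256, 256), 120, 255, -1)
second_mask = second_portrait_mask(first_mask)   # 32 x 32 binarized mask
```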
Referring to fig. 5, fig. 5 shows a schematic flowchart of portrait segmentation provided by an embodiment of the present application. As shown in fig. 5, the terminal device performing segmentation processing on the original image through the portrait segmentation module to obtain the portrait segmentation result may include:
s501, the terminal device utilizes a portrait segmentation module to extract features of an original image to obtain a first feature map;
s502, the terminal device conducts convolution operation on the first feature map to obtain a portrait segmentation result.
In an alternative embodiment, as shown in fig. 2, the portrait segmentation module may include a first segmentation unit, a second segmentation unit, a third segmentation unit, a fourth segmentation unit, a fifth segmentation unit and a sixth segmentation unit, which are connected in sequence. The first and second segmentation units may each include four sequentially connected groups of a convolution layer (conv), a batch normalization layer (bn) and an activation layer (e.g., one based on a rectified linear unit, ReLU). The third segmentation unit may include six sequentially connected groups of a convolution layer, a batch normalization layer and a ReLU activation layer, and the fourth segmentation unit may include three such groups. The fifth segmentation unit may include an atrous spatial pyramid pooling (ASPP) layer. The sixth segmentation unit may include a convolution layer.
Specifically, the terminal device may process the original image sequentially with the first, second, third and fourth segmentation units, where each segmentation unit may first perform convolution feature extraction with each group's convolution layer to obtain an initial feature map, then normalize the initial feature map with the batch normalization layer to obtain a standard feature map, and finally process the standard feature map with the ReLU activation layer to obtain an initial first feature map. Subsequently, the terminal device may perform multi-scale feature fusion on the initial first feature map with the ASPP layer in the fifth segmentation unit to obtain the final first feature map. Finally, the terminal device can perform convolution processing on the final first feature map through the convolution layer in the sixth segmentation unit to obtain the portrait segmentation result. Convolution, normalization, activation and feature fusion are mature parts of model training and are not described further here.
For example, refer to the network structure of the portrait segmentation module 201 shown in fig. 2. The portrait segmentation module 201 receives an original image of size w × h × 3, where w is the width, h is the height and 3 is the number of channels (the notation below is analogous). After the original image is processed by the first segmentation unit, a first feature map of size w/2 × h/2 × 64 is obtained and input to the second segmentation unit. After processing by the second segmentation unit, a first feature map of size w/4 × h/4 × 128 is obtained and input to the third segmentation unit. After processing by the third segmentation unit, a first feature map of size w/8 × h/8 × 256 is obtained and input to the fourth segmentation unit. After processing by the fourth segmentation unit, a first feature map of size w/16 × h/16 × 512 is obtained and input to the fifth segmentation unit. After processing by the fifth segmentation unit, a first feature map of size w/16 × h/16 × 256 is obtained and input to the sixth segmentation unit. The sixth segmentation unit performs a convolution operation on this feature map and finally outputs a single-channel portrait segmentation result of size w/16 × h/16 × 1.
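The following PyTorch sketch mirrors the channel and stride plan of the example above (w/2 × 64, w/4 × 128, w/8 × 256, w/16 × 512, then 256 and a single-channel head); the number of groups per unit and the simplified stand-in for the ASPP layer are assumptions for illustration, not the exact network of fig. 2.

```python
import torch
from torch import nn

def conv_bn_relu(cin, cout, stride=1, dilation=1):
    # One (convolution, batch normalization, ReLU) group as used in the segmentation units.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=dilation, dilation=dilation, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class SegmentationModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Units 1-4: a strided first group, then the remaining groups described in the text.
        self.unit1 = nn.Sequential(conv_bn_relu(3, 64, stride=2), *[conv_bn_relu(64, 64) for _ in range(3)])
        self.unit2 = nn.Sequential(conv_bn_relu(64, 128, stride=2), *[conv_bn_relu(128, 128) for _ in range(3)])
        self.unit3 = nn.Sequential(conv_bn_relu(128, 256, stride=2), *[conv_bn_relu(256, 256) for _ in range(5)])
        self.unit4 = nn.Sequential(conv_bn_relu(256, 512, stride=2), *[conv_bn_relu(512, 512) for _ in range(2)])
        # Unit 5: stand-in for the ASPP layer, parallel dilated convolutions fused by a further group.
        self.aspp = nn.ModuleList([conv_bn_relu(512, 64, dilation=d) for d in (1, 6, 12, 18)])
        self.aspp_fuse = conv_bn_relu(4 * 64, 256)
        # Unit 6: a single convolution producing the one-channel segmentation map.
        self.head = nn.Conv2d(256, 1, 1)

    def forward(self, x):
        f1 = self.unit1(x)                                                   # w/2  x h/2  x 64
        f2 = self.unit2(f1)                                                  # w/4  x h/4  x 128
        f3 = self.unit3(f2)                                                  # w/8  x h/8  x 256
        f4 = self.unit4(f3)                                                  # w/16 x h/16 x 512
        f5 = self.aspp_fuse(torch.cat([branch(f4) for branch in self.aspp], dim=1))  # w/16 x h/16 x 256
        return self.head(f5), (f1, f2, f3, f5)   # output1 plus the skip features used by the regression branch

seg_logits, skips = SegmentationModule()(torch.rand(1, 3, 64, 64))           # seg_logits: (1, 1, 4, 4)
```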
Referring to fig. 6, fig. 6 shows a schematic flow chart of a portrait cutout provided by an embodiment of the present application. As shown in fig. 6, the processing, by the terminal device, the original image and the first feature map by using the portrait regression module to obtain a training matting result may include:
s601, the terminal device utilizes a portrait regression module to perform upsampling on the first feature map to obtain a second feature map, and the second feature map is spliced with the original image to obtain a spliced image;
s602, fusing the spliced images by the terminal equipment to obtain fused images;
and S603, the terminal equipment obtains a training matting result according to the fusion image.
Upsampling refers to resampling a feature map to a specified resolution. For example, after an original image of size (416, 416, 3) passes through a series of convolution and pooling operations, a feature map of size (13, 13, 16) is obtained; in order to relate this feature map to the corresponding original image, it needs to be brought back to the 416 × 416 resolution, and this operation is called upsampling. Further, in order to make better use of the feature semantic information of the second feature map, the embodiment of the application splices the second feature map with the original image to obtain a spliced image. In an alternative embodiment, the upsampling may be implemented by a currently known linear interpolation algorithm, and the splicing may be implemented by a currently known stitching algorithm, such as the SURF (Speeded-Up Robust Features) algorithm.
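A toy illustration of the upsampling step, assuming PyTorch's bilinear interpolation; the 13 × 13 × 16 feature map and the 416 × 416 target size follow the example above, while the interpolation mode is an assumption.

```python
import torch
import torch.nn.functional as F

feature = torch.rand(1, 16, 13, 13)                       # low-resolution feature map
upsampled = F.interpolate(feature, size=(416, 416),
                          mode="bilinear", align_corners=False)
print(upsampled.shape)                                    # torch.Size([1, 16, 416, 416])
```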
In an alternative embodiment, the fusing operation may include: convolution, batch normalization, and activation operations (e.g., relu activation). The operations of convolution, batch normalization and activation may refer to the feature extraction of the original image, and are not described herein.
As shown in fig. 2, the terminal device may obtain the training matting result by performing convolution, activation (e.g., activation using a sigmoid function) and other processes on the fused image.
For example, referring to the network structure of the portrait regression module 202 shown in fig. 2, the portrait regression module 202 may include a first regression unit, a second regression unit, a third regression unit, a fourth regression unit, a fifth regression unit and a sixth regression unit, which are connected in sequence. The first regression unit may include an upsampling layer, and the second, third and fourth regression units may each include a convolution layer, a batch normalization layer, a ReLU activation layer and an upsampling layer. The fifth regression unit may include four sequentially connected sets of a convolution layer, a batch normalization layer and a ReLU activation layer. The sixth regression unit may include a convolution layer and a sigmoid activation layer. The first regression unit may be connected to the fifth segmentation unit in the portrait segmentation module to acquire the first feature map output by the fifth segmentation unit. The second regression unit may be connected to the third segmentation unit in the portrait segmentation module to acquire the first feature map output by the third segmentation unit. The third regression unit may be connected to the second segmentation unit in the portrait segmentation module to acquire the first feature map output by the second segmentation unit. The fourth regression unit may be connected to the first segmentation unit in the portrait segmentation module to acquire the first feature map output by the first segmentation unit. The fifth regression unit may be used to obtain the original image.
Specifically, the first regression unit of the portrait regression module 202 may acquire the w/16 × h/16 × 256 first feature map output by the fifth segmentation unit, upsample it to obtain a w/8 × h/8 × 256 intermediate feature map, and input it to the second regression unit. The second regression unit additionally acquires the w/8 × h/8 × 256 first feature map output by the third segmentation unit, splices it with the w/8 × h/8 × 256 intermediate feature map, and then performs convolution, batch normalization, activation and upsampling to obtain a w/4 × h/4 × 384 intermediate feature map, which is input to the third regression unit. The third regression unit additionally acquires the w/4 × h/4 × 128 first feature map output by the second segmentation unit, splices it with the w/4 × h/4 × 384 intermediate feature map, and then performs convolution, batch normalization, activation and upsampling to obtain a w/2 × h/2 × 128 intermediate feature map, which is input to the fourth regression unit. The fourth regression unit additionally acquires the w/2 × h/2 × 64 first feature map output by the first segmentation unit, splices it with the w/2 × h/2 × 128 intermediate feature map, and then performs convolution, batch normalization, activation and upsampling to obtain a w × h × 67 second feature map, which is input to the fifth regression unit. The fifth regression unit splices the original image with the second feature map to obtain a spliced image, fuses the spliced image through convolution, batch normalization, ReLU activation and the like to obtain a w × h × 64 fused image, and inputs it to the sixth regression unit. The sixth regression unit processes the fused image through convolution and sigmoid activation to obtain a w × h × 1 training matting result. Connecting the portrait segmentation module to the portrait regression module in this way fuses high-level and shallow information in the feature maps and improves the matting effect of portrait matting.
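The following PyTorch sketch follows the channel plan of this walkthrough (256, 384, 128, 67, 64, then 1); the layer hyperparameters and the bilinear upsampling are assumptions for illustration, not the exact network of fig. 2.

```python
import torch
from torch import nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1, bias=False),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class RegressionModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.fuse8 = conv_bn_relu(256 + 256, 384)   # 2nd unit: upsampled f5 (256) + f3 (256) -> 384
        self.fuse4 = conv_bn_relu(384 + 128, 128)   # 3rd unit: 384 + f2 (128) -> 128
        self.fuse2 = conv_bn_relu(128 + 64, 67)     # 4th unit: 128 + f1 (64) -> 67
        self.fuse1 = nn.Sequential(*[conv_bn_relu(70 if i == 0 else 64, 64) for i in range(4)])  # 67 + image (3) -> 64
        self.head = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Sigmoid())         # 6th unit: alpha matte in [0, 1]

    @staticmethod
    def up(x):
        return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, image, f1, f2, f3, f5):
        x = self.up(f5)                                   # 1st unit: w/8 x h/8 x 256
        x = self.up(self.fuse8(torch.cat([x, f3], 1)))    # w/4 x h/4 x 384
        x = self.up(self.fuse4(torch.cat([x, f2], 1)))    # w/2 x h/2 x 128
        x = self.up(self.fuse2(torch.cat([x, f1], 1)))    # w   x h   x 67  (the "second feature map")
        x = self.fuse1(torch.cat([x, image], 1))          # fused image, w x h x 64
        return self.head(x)                               # output2: w x h x 1

# Shape check with dummy tensors matching the resolutions above (w = h = 64 here).
img = torch.rand(1, 3, 64, 64)
f1, f2 = torch.rand(1, 64, 32, 32), torch.rand(1, 128, 16, 16)
f3, f5 = torch.rand(1, 256, 8, 8), torch.rand(1, 256, 4, 4)
alpha = RegressionModule()(img, f1, f2, f3, f5)           # -> (1, 1, 64, 64)
```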
In an alternative embodiment of the present application, calculating the training loss of the untrained portrait matting model from the portrait segmentation result, the training matting result, the first portrait mask image and the second portrait mask image may include: calculating a first training loss of the portrait segmentation module according to the portrait segmentation result and the second portrait mask image; calculating a second training loss of the portrait regression module according to the training matting result and the first portrait mask image; and calculating the training loss of the untrained portrait matting model according to the first training loss and the second training loss.
In an optional embodiment, the terminal device calculates a first training loss of the portrait segmentation module according to the portrait segmentation result and the second portrait mask image, and the method comprises the following steps:
the terminal equipment inputs the portrait segmentation result and the second portrait mask image into a training loss formula for calculation to obtain a first training loss of the portrait segmentation module;
wherein the training loss formula may be:
LC = m_g · log(m_p) + (1 − m_g) · log(1 − m_p)
where LC represents the first training loss of the portrait segmentation module, m_g represents the pixel value of the g-th pixel of the second portrait mask image, and m_p represents the pixel value of the p-th pixel of the portrait segmentation result.
In an alternative embodiment, calculating the second training loss of the portrait regression module based on the training matting result and the first portrait mask image may comprise: calculating the second training loss of the portrait regression module according to the following formula:
L1 = |α_p − α_g|
where L1 represents the second training loss of the portrait regression module, α_g represents the first portrait mask image, and α_p represents the training matting result.
In an alternative embodiment, calculating the training loss of the untrained portrait matting model based on the first training loss and the second training loss may comprise: adding the first training loss and the second training loss to obtain the training loss L of the untrained portrait matting model, that is, L = L1 + LC.
Further, in order to better identify the portrait region of the original image, after the training loss of the untrained portrait matting model is calculated from the first training loss and the second training loss, the embodiment of the application may further reduce the weight of the first training loss of the portrait segmentation module so that learning focuses mainly on the portrait regression module. For example, if the weight of the first training loss is set to 0.1, the training loss of the untrained portrait matting model becomes L = L1 + 0.1 × LC, where L1 represents the second training loss of the portrait regression module and LC represents the first training loss of the portrait segmentation module.
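A minimal sketch of the combined loss L = L1 + 0.1 × LC in PyTorch; the sign convention of the cross-entropy term, the clamping epsilon and the averaging over pixels are assumptions added so the expression is numerically usable.

```python
import torch

def matting_loss(seg_pred, seg_target, alpha_pred, alpha_target, seg_weight=0.1):
    """Combined loss L = L1 + seg_weight * LC.

    seg_pred:     output1 as probabilities (e.g. after a sigmoid), at w/16 x h/16
    seg_target:   second portrait mask scaled to [0, 1], same resolution as seg_pred
    alpha_pred:   output2 (training matting result), at w x h
    alpha_target: first portrait mask scaled to [0, 1], at w x h
    """
    eps = 1e-6
    p = seg_pred.clamp(eps, 1 - eps)
    lc = -(seg_target * torch.log(p) + (1 - seg_target) * torch.log(1 - p)).mean()  # cross-entropy term LC
    l1 = (alpha_pred - alpha_target).abs().mean()                                   # regression term L1
    return l1 + seg_weight * lc
```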
In an alternative embodiment of the present application, the preset condition is that the training loss is less than a preset loss threshold. That is, when the training loss is smaller than the preset loss threshold, the training loss satisfies the preset condition; when the training loss is greater than or equal to the loss threshold, the training loss does not satisfy the preset condition. The preset loss threshold may be set according to the actual scenario. It can be understood that the parameter adjustment of the portrait matting model can be implemented with a currently known stochastic gradient descent algorithm, which is not further described here.
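A minimal training-loop sketch corresponding to steps S301 to S308, assuming the modules and loss sketched above, a data loader yielding (original image, first mask, second mask) triples at matching resolutions, and stochastic gradient descent; the learning rate, momentum and loss threshold are illustrative values, not values from this application.

```python
import torch
from torch import optim

def train(seg, reg, loader, loss_threshold=0.05, max_epochs=100):
    """Adjust the model parameters with SGD until the training loss meets the preset condition."""
    optimizer = optim.SGD(list(seg.parameters()) + list(reg.parameters()), lr=1e-3, momentum=0.9)
    loss = None
    for epoch in range(max_epochs):
        for original, first_mask, second_mask in loader:
            seg_logits, (f1, f2, f3, f5) = seg(original)          # portrait segmentation result + skip features
            alpha = reg(original, f1, f2, f3, f5)                 # training matting result
            loss = matting_loss(torch.sigmoid(seg_logits), second_mask, alpha, first_mask)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss is not None and loss.item() < loss_threshold:     # preset condition met: stop training
            break
    return seg, reg
```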
Through the above training of the portrait matting model, the portrait segmentation module and the portrait regression module are combined effectively: on the basis of portrait classification, the regression algorithm guides the model to learn the portrait region of interest and to recognize the portrait information in the transition region, so that the accuracy of portrait matting can be improved.
Referring to fig. 7, fig. 7 is a schematic structural diagram illustrating an image processing apparatus according to an embodiment of the present disclosure.
The image processing apparatus provided by the embodiment of the application can be installed in a terminal device. Depending on the implemented functionality, the image processing apparatus may comprise an acquisition module 701 and a matting module 702. The modules in the embodiments of the present application, which may also be referred to as units, refer to a series of computer program segments that can be executed by a processor of a terminal device and can perform a fixed function, and are stored in a memory of the terminal device.
In the embodiments of the present application, the functions of the respective modules/units are as follows:
an obtaining module 701, configured to obtain an image to be matted;
and a matting module 702, configured to input the image to be matted into the trained portrait matting model for processing, and to output a portrait matting result corresponding to the image to be matted.
In detail, in the embodiment of the present application, each module in the image processing apparatus adopts the same technical means as the image processing method in fig. 1 to 6 when in use, and can produce the same technical effect, and details are not repeated here.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present disclosure. As shown in fig. 8, the terminal device 8 of this embodiment may include: at least one processor 80, a memory 81, and a computer program 82 stored in the memory 81 and executable on the at least one processor 80, the steps in any of the various image processing method embodiments described above being implemented when the computer program 82 is executed by the processor 80.
The terminal device 8 may be a desktop computer, a notebook, a palmtop computer, a cloud server or another computing device. The terminal device 8 may include, but is not limited to, the processor 80 and the memory 81. Those skilled in the art will appreciate that fig. 8 is merely an example of the terminal device 8 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or use different components, such as an input/output device, a network access device and the like.
The processor 80 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The storage 81 may in some embodiments be an internal storage unit of the terminal device, such as a hard disk or a memory of the terminal device. The memory 81 may be an external storage device of the terminal device 8 in other embodiments, such as a plug-in hard disk provided on the terminal device 8, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 81 may also include both an internal storage unit of the terminal device and an external storage device. The memory 81 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as a program code of the computer program 82. The memory 81 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps in the foregoing method embodiments may be implemented.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include at least: any entity or device capable of carrying the computer program code to the apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In certain jurisdictions, the computer-readable storage medium may not be an electrical carrier signal or a telecommunication signal in accordance with legislation and patent practice.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative; for instance, the division into modules or units is only a division by logical function, and there may be other division manners in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, or as indirect coupling or communication connection between devices or units, and may be in electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included within the scope of the present application.
Claims (12)
1. An image processing method, characterized by comprising:
acquiring an image to be matted;
and inputting the image to be matted into a trained portrait matting model for processing, and outputting a portrait matting result corresponding to the image to be matted.
2. The method of claim 1, wherein the portrait matting model comprises a portrait segmentation module and a portrait regression module, the portrait segmentation module and the portrait regression module being trained from a training image set, the training image set comprising an original image and a first portrait mask image and a second portrait mask image derived from the original image.
3. The method of claim 2, wherein before the acquiring an image to be matted, the method further comprises:
acquiring the original image, and performing portrait mask processing on the original image to obtain a first portrait mask image;
processing the first portrait mask image to obtain a second portrait mask image;
inputting the original image into an untrained portrait matting model, and segmenting the original image through the portrait segmentation module to obtain a portrait segmentation result;
acquiring a first feature map obtained by the portrait segmentation module;
processing the original image and the first feature map by using the portrait regression module to obtain a training matting result;
calculating a training loss of the untrained portrait matting model according to the portrait segmentation result, the training matting result, the first portrait mask image and the second portrait mask image;
when the training loss does not meet a preset condition, adjusting model parameters of the untrained portrait matting model, and returning to execute the step of inputting the original image into the untrained portrait matting model and the subsequent steps;
and when the training loss meets the preset condition, obtaining the trained portrait matting model.
4. The method of claim 3, wherein the subjecting the original image to the portrait mask processing to obtain the first portrait mask image comprises:
performing portrait segmentation processing on the original image to obtain a segmented image;
and determining a ternary region of the segmented image, and performing matting processing on the ternary region to obtain the first portrait mask image.
5. The method of claim 4, wherein determining the ternary regions of the segmented image comprises:
carrying out an erosion operation on the segmented image to obtain a foreground region of the segmented image;
performing expansion operation on the segmented image to obtain a background area of the segmented image;
determining a transition region of the segmented image according to the foreground region and the background region;
and processing the foreground region, the background region and the transition region to obtain a ternary region of the segmentation image.
6. The method of claim 3, wherein said processing the first portrait mask image to obtain the second portrait mask image comprises:
scaling the first portrait mask image to obtain a scaled image;
and performing binarization processing on the scaled image to obtain the second portrait mask image.
7. The method according to any one of claims 3 to 6, wherein the calculating a training loss of the untrained portrait matting model according to the portrait segmentation result, the training matting result, the first portrait mask image and the second portrait mask image comprises:
calculating a first training loss of the portrait segmentation module according to the portrait segmentation result and the second portrait mask image;
calculating a second training loss of the portrait regression module according to the training matte result and the first portrait mask image;
calculating a training loss of the untrained portrait matting model according to the first training loss and the second training loss.
8. The method of claim 7, wherein calculating a first training loss for the portrait segmentation module based on the portrait segmentation result and the second portrait mask image comprises:
inputting the portrait segmentation result and the second portrait mask image into a training loss formula for calculation to obtain a first training loss of the portrait segmentation module;
wherein the training loss formula is:
LC = m_g · log(m_p) + (1 − m_g) · log(1 − m_p)
where LC represents the first training loss of the portrait segmentation module, m_g represents the pixel value of the g-th pixel of the second portrait mask image, and m_p represents the pixel value of the p-th pixel of the portrait segmentation result.
9. The method of claim 8, wherein the processing the original image and the first feature map by using the portrait regression module to obtain a training matting result comprises:
utilizing the portrait regression module to perform upsampling on the first feature map to obtain a second feature map;
splicing the second characteristic graph and the original image to obtain a spliced image;
fusing the spliced images to obtain fused images;
and obtaining a training matting result according to the fused image.
10. An image processing apparatus characterized by comprising:
an acquisition module, used for acquiring an image to be matted;
and a matting module, used for inputting the image to be matted into a trained portrait matting model for processing and outputting a portrait matting result corresponding to the image to be matted.
11. A terminal device, characterized in that the terminal device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method according to any of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110342064.XA CN115147434A (en) | 2021-03-30 | 2021-03-30 | Image processing method, device, terminal equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115147434A (en) | 2022-10-04 |
Family
ID=83404654
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110342064.XA Pending CN115147434A (en) | 2021-03-30 | 2021-03-30 | Image processing method, device, terminal equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115147434A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116167922A (en) * | 2023-04-24 | 2023-05-26 | 广州趣丸网络科技有限公司 | Matting method and device, storage medium and computer equipment |
CN116167922B (en) * | 2023-04-24 | 2023-07-18 | 广州趣丸网络科技有限公司 | Matting method and device, storage medium and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||