WO2024007135A1 - Image processing method and apparatus, terminal device, electronic device, and storage medium - Google Patents

Image processing method and apparatus, terminal device, electronic device, and storage medium

Info

Publication number
WO2024007135A1
WO2024007135A1 · PCT/CN2022/103760 · CN2022103760W
Authority
WO
WIPO (PCT)
Prior art keywords
image
segmented
model
segmentation model
segmentation
Prior art date
Application number
PCT/CN2022/103760
Other languages
English (en)
French (fr)
Inventor
王利鸣
葛运航
王晓涛
雷磊
Original Assignee
北京小米移动软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京小米移动软件有限公司 filed Critical 北京小米移动软件有限公司
Priority to PCT/CN2022/103760 priority Critical patent/WO2024007135A1/zh
Priority to CN202280004251.XA priority patent/CN117651972A/zh
Publication of WO2024007135A1 publication Critical patent/WO2024007135A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular to an image processing method and apparatus, a terminal device, an electronic device, and a storage medium.
  • an image segmentation result can be obtained by performing image segmentation on the preprocessed original image with a trained model.
  • the model outputs a fusion result obtained by merging the hard segmentation decoding result and the soft segmentation decoding result.
  • the quality of image segmentation results is mediocre because the images are only classified and transparency effects are lacking.
  • the present disclosure provides an image processing method and apparatus, a terminal device, an electronic device, and a storage medium, to at least solve the problem in the related art that the quality of image segmentation results is mediocre because images are only classified and transparency effects are lacking.
  • an image processing method is provided, including: acquiring an image to be segmented; inputting the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model; and performing matting processing on the image to be segmented according to the segmented image to obtain a target image, the target image being an image of the area where a target body is located in the image to be segmented; wherein the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model, and the synchronization model is trained based on at least one second dimensional information of the target body.
  • inputting the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model includes: after the image to be segmented is input into the segmentation model, the first feature extraction network of the segmentation model generates output information, the output information of the first feature extraction network is input to the first branch network and the second branch network of the segmentation model, and the segmented image is obtained based on the output information of the first branch network and the output information of the second branch network; wherein the first branch network corresponds to at least one first dimensional information of the target body, and the second branch network corresponds to at least one second dimensional information of the target body.
  • obtaining the segmented image based on the output information of the first branch network and the output information of the second branch network includes: the segmentation model inputs the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network into the fusion network of the segmentation model, and the fusion network of the segmentation model outputs the segmented image.
  • the fusion network of the segmentation model includes multiple convolutional layers.
  • the synchronization model includes a second feature extraction network and a third branch network corresponding to at least one second dimensional information of the target body; in the process of training the segmentation model, after the second training sample is input into the synchronization model, it is input into the third branch network through the second feature extraction network; the second feature extraction network has the same structure and the same model parameters as the first feature extraction network, and/or the second branch network has the same structure and the same model parameters as the third branch network; wherein the second training sample is an image marked with at least one second dimensional information of the target body.
  • in the process of training the segmentation model, the model parameters of the segmentation model are adjusted based on a first loss function determined by the segmentation model and a second loss function determined by the synchronization model; wherein the first loss function is determined based on the first training sample input to the segmentation model, the second loss function is determined based on the second training sample input to the synchronization model, and the first training sample is an image marked with at least one first dimensional information of the target body.
  • adjusting the model parameters of the segmentation model based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model includes: calculating the sum of the first loss function and the second loss function to obtain a total loss function; calculating gradient parameters according to the total loss function; and adjusting the model parameters of the segmentation model according to the gradient parameters.
  • adjusting the model parameters of the segmentation model based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model includes: calculating gradient parameters according to the first loss function and the second loss function; and adjusting the model parameters of the segmentation model according to the gradient parameters.
  • the image processing method further includes: acquiring a background image; and fusing the target image with the background image to obtain a fused image.
  • an image processing device is provided, including: a first acquisition module configured to acquire an image to be segmented; an input module configured to input the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model; and a processing module configured to perform matting processing on the image to be segmented according to the segmented image to obtain a target image, the target image being an image of the area where a target body is located in the image to be segmented; wherein the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model, and the synchronization model is trained based on at least one second dimensional information of the target body.
  • the input module is further configured to: after the image to be segmented is input into the segmentation model, generate output information by the first feature extraction network of the segmentation model, input the output information of the first feature extraction network to the first branch network and the second branch network of the segmentation model, and obtain the segmented image based on the output information of the first branch network and the output information of the second branch network; wherein the first branch network corresponds to at least one first dimensional information of the target body, and the second branch network corresponds to at least one second dimensional information of the target body.
  • the input module is further configured to: input, by the segmentation model, the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network into the fusion network of the segmentation model, and output the segmented image by the fusion network of the segmentation model.
  • the fusion network of the segmentation model includes multiple convolutional layers.
  • the synchronization model includes a second feature extraction network and a third branch network corresponding to at least one second dimensional information of the target body; in the process of training the segmentation model, after the second training sample is input into the synchronization model, it is input into the third branch network through the second feature extraction network; the second feature extraction network has the same structure and the same model parameters as the first feature extraction network, and/or the second branch network has the same structure and the same model parameters as the third branch network; wherein the second training sample is an image marked with at least one second dimensional information of the target body.
  • in the process of training the segmentation model, the model parameters of the segmentation model are adjusted based on a first loss function determined by the segmentation model and a second loss function determined by the synchronization model; wherein the first loss function is determined based on the first training sample input to the segmentation model, the second loss function is determined based on the second training sample input to the synchronization model, and the first training sample is an image marked with at least one first dimensional information of the target body.
  • adjusting the model parameters of the segmentation model based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model includes: calculating the sum of the first loss function and the second loss function to obtain a total loss function; calculating gradient parameters according to the total loss function; and adjusting the model parameters of the segmentation model according to the gradient parameters.
  • the at least one first dimensional information of the target body includes: the transparency of each pixel in the image of the area where the target body is located; the at least one second dimensional information of the target body includes: among the pixels of the image to be segmented, the pixels belonging to the image of the area where the target body is located.
  • the image processing device further includes: a second acquisition module configured to acquire a background image; and a fusion module configured to fuse the target image with the background image to obtain a fused image.
  • a terminal device including: the image processing apparatus described in the second aspect of the embodiment of the present disclosure.
  • an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the instructions to implement the method described in the first aspect of the embodiments of the present disclosure.
  • a computer-readable storage medium is provided; when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the method described in the first aspect of the embodiments of the present disclosure.
  • the segmented image is obtained by inputting the image to be segmented into the segmentation model, and the target image is obtained by matting the image to be segmented based on the segmented image; since the segmented image provides the transparency information of each pixel in the image, the quality of the image segmentation results is improved.
  • Figure 1 is a flow chart of an image processing method according to an exemplary embodiment
  • Figure 2 is a schematic diagram of the image to be segmented
  • Figure 3 is a schematic diagram of the segmented image
  • Figure 4 is a schematic diagram of the target image
  • Figure 5 is a flow chart of another image processing method according to an exemplary embodiment
  • Figure 6 is a schematic structural diagram of the segmentation model
  • Figure 7 is another structural diagram of the segmentation model
  • Figure 8 is a schematic structural diagram of the synchronization model
  • Figure 9 is a schematic structural diagram of the image processing model
  • Figure 10 is a schematic diagram of the background image
  • Figure 11 is a schematic diagram of the fused image
  • Figure 12 is a block diagram of an image processing device according to an exemplary embodiment
  • Figure 13 is a block diagram of a terminal device according to an exemplary embodiment
  • Figure 14 is a block diagram of an electronic device according to an exemplary embodiment.
  • Figure 1 is a flow chart of an image processing method according to an exemplary embodiment. As shown in Figure 1, the image processing method according to the embodiment of the present disclosure may include the following steps.
  • the execution subject of the image processing method in the embodiment of the present disclosure may be a user terminal device.
  • User terminal devices may specifically include but are not limited to mobile phones, tablet computers, notebooks, desktop computers, vehicle-mounted terminals, smart home appliances, etc.
  • the image processing method according to the embodiment of the present disclosure can be executed by the image processing device according to the embodiment of the present disclosure.
  • the image processing device according to the embodiment of the present disclosure can be configured in any user terminal device to execute the image processing method according to the embodiment of the present disclosure.
  • the image to be segmented is the original image obtained on the computing platform and waiting for image segmentation.
  • the image to be segmented contains the imaging of an object; as shown in Figure 2, it is an image to be segmented containing a horse. The image to be segmented is acquired for subsequent processing.
  • the computing platform can be a mobile phone, a computer, a system on a chip (SoC), etc. This disclosure takes a mobile phone as an example for description, but does not limit the type of computing platform; for example, an object can be photographed with a mobile phone to obtain the corresponding image to be segmented.
  • S102: input the image to be segmented into the segmentation model, and obtain the segmented image output by the segmentation model.
  • the segmentation model is a pre-trained model for segmenting the image to be segmented.
  • the segmented image is the image obtained after the image to be segmented has been segmented by the segmentation model.
  • the image to be segmented obtained in step S101 is input into the segmentation model, and the segmented image output by the segmentation model can be obtained. For example, as mentioned in the above example, if the image to be segmented shown in Figure 2 is input into the segmentation model, the corresponding segmented image can be obtained, as shown in Figure 3.
  • S103: perform matting processing on the image to be segmented based on the segmented image to obtain the target image.
  • the target body is an object waiting to be segmented in the image to be segmented.
  • the horse in Figure 2 is the object corresponding to the target body
  • the target image is the image of the area where the target body is located in the image to be segmented; for example, the image of the area where the horse is located in Figure 2 is the image corresponding to the target image.
  • based on the segmented image obtained in step S102, the image to be segmented obtained in step S101 is cut out to obtain the target image, as shown in Figure 4.
  • the segmentation model is trained based on at least one first dimensional information of the target volume
  • the synchronization model is trained based on at least one second dimensional information of the target volume.
  • at least one first dimensional information of the target body includes: the transparency of each pixel in the image of the area where the target body is located
  • at least one second dimensional information of the target body includes: among the pixels of the image to be segmented, the pixels belonging to the image of the area where the target body is located.
  • the pixel-level segmentation processing of an image includes semantic hard segmentation processing and transparency-based soft segmentation processing: hard segmentation yields a pixel classification result of whether each pixel in the corresponding image belongs to a certain class of object, while soft segmentation yields a transparency classification result of the degree to which each pixel in the object edge part of the corresponding image belongs to a certain class of object.
  • the image processing method provided by the embodiment of the present disclosure acquires an image to be segmented, inputs the image to be segmented into a segmentation model, obtains a segmented image output by the segmentation model, and performs matting processing on the image to be segmented based on the segmented image to obtain a target image.
  • the present disclosure obtains a segmented image by inputting the image to be segmented into a segmentation model, and performs matting processing on the image to be segmented based on the segmented image to obtain a target image; since the segmented image provides transparency information for each pixel in the image, the quality of the image segmentation results is improved.
  • FIG. 5 is a flow chart of another image processing method according to an exemplary embodiment. As shown in FIG. 5 , the image processing method according to the embodiment of the present disclosure may include the following steps.
  • step S501 in this embodiment is the same as step S101 in the above embodiment, and will not be described again here.
  • Step S102 in the above embodiment, "input the image to be segmented into the segmentation model and obtain the segmented image output by the segmentation model", may specifically include the following step S502.
  • S502: after the image to be segmented is input into the segmentation model, the first feature extraction network of the segmentation model generates output information, the output information of the first feature extraction network is input to the first branch network and the second branch network of the segmentation model, and the segmented image is obtained based on the output information of the first branch network and the output information of the second branch network.
  • segmentation processing includes semantic hard segmentation processing and transparency soft segmentation processing.
  • the hard segmentation processing can obtain the pixel classification result of whether each pixel in the corresponding image belongs to a certain type of object.
  • the soft segmentation processing can obtain a transparency classification result of the degree to which each pixel in the object edge part of the corresponding image belongs to a certain class of object.
  • the first feature extraction network refers to a network that extracts features of the image to be segmented.
  • the first feature extraction network of the segmentation model generates output information; the output information of the first feature extraction network is input to the first branch network and the second branch network of the segmentation model, and the segmented image is obtained based on the output information of the first branch network and the output information of the second branch network.
  • the first branch network corresponds to at least one first dimensional information of the target body, that is, to the transparency of each pixel in the image of the area where the target body is located, i.e. to the soft segmentation result.
  • the second branch network corresponds to at least one second dimensional information of the target body, that is, to the pixels of the image to be segmented that belong to the image of the area where the target body is located, i.e. to the hard segmentation result.
  • the first feature extraction network may be an encoder
  • the first branch network may be a soft segmentation decoder
  • the second branch network may be a hard segmentation decoder.
  • the present disclosure does not limit the specific method of obtaining the segmented image based on the output information of the first branch network and the output information of the second branch network, and it can be set according to the actual situation.
  • the segmentation model inputs the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network into the fusion network of the segmentation model.
  • the fusion network outputs the segmented image, as shown in Figure 7.
  • the fusion network of the segmentation model includes multiple convolutional layers; that is, the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network are subjected to convolution processing to obtain the fusion result, which is the corresponding segmented image.
  • the corresponding segmented image can be obtained through a set of convolutions.
  • the fusion network can be a fusion decoder.
  • the segmentation model is trained based on at least one first dimensional information of the target volume
  • the synchronization model is trained based on at least one second dimensional information of the target volume.
  • the synchronization model is further described, as shown in Figure 8, which is a schematic structural diagram of the synchronization model.
  • the synchronization model includes a second feature extraction network and a third branch network corresponding to at least one second dimensional information of the target body. That is, the third branch network corresponds to the pixels of the image belonging to the area where the target body is located among the pixels of the image to be segmented, that is, it corresponds to the hard segmentation result.
  • the second training sample is an image marked with at least one second dimension information of the target body.
  • after the second training sample is input to the synchronization model, it is input to the third branch network through the second feature extraction network.
  • the second feature extraction network has the same structure and the same model parameters as the first feature extraction network, and/or the second branch network has the same structure and the same model parameters as the third branch network.
  • the model parameters of the segmentation model are adjusted based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model.
  • the first loss function is determined based on the first training sample input to the segmentation model
  • the second loss function is determined based on the second training sample input to the synchronization model
  • the first training sample is an image marked with at least one first dimensional information of the target body.
  • the sum of the first loss function and the second loss function is calculated to obtain the total loss function
  • the gradient parameters are calculated according to the total loss function
  • the model parameters of the segmentation model are adjusted according to the gradient parameters. It should be noted that the calculated gradient parameters are simultaneously transmitted to the segmentation model and the synchronization model to adjust the parameters of the segmentation model and the synchronization model.
  • Figure 9 is a schematic diagram of an image processing model.
  • the image processing model consists of an upper model and a lower model, corresponding to two tasks, Task 1 and Task 2. Only the lower model is used when performing image processing, while both the upper and lower models are used when training the model.
  • the upper model consists of an encoder (Backbone) and a hard segmentation decoder (Segmentation Head).
  • the lower model consists of an encoder (Backbone), a hard segmentation decoder (Segmentation Head), a soft segmentation decoder (Matting Head), and a fusion decoder (Fusion Head). The upper model is trained on hard segmentation annotations (Segmentation Data) and the lower model on soft segmentation annotations (Matting Data).
  • in the upper model, the encoder outputs the encoding result of the hard segmentation data and feeds it into the hard segmentation decoder.
  • in the lower model, the encoder outputs the encoding result of the soft segmentation data and feeds it into the hard segmentation decoder and the soft segmentation decoder, which output the corresponding hard segmentation result and soft segmentation result respectively; the hard segmentation result, the soft segmentation result, and the encoding result are then fed together into the fusion decoder, which outputs the fusion result.
  • the encoders and hard segmentation decoders of the upper and lower models require parameter synchronization (Grad&Param Sync) during training, that is, the gradient parameters computed across the two models are used to adjust both models.
  • step S503 in this embodiment is the same as step S103 in the above embodiment, and will not be described again here.
  • the background image is the image used as the background. As shown in Figure 10, the background image is acquired for subsequent processing.
  • the target image obtained in step S503 and the background image obtained in step S504 are fused to obtain a fused image, as shown in Figure 11.
  • the image processing method provided by the embodiment of the present disclosure obtains an image to be segmented; after the image to be segmented is input into a segmentation model, the first feature extraction network of the segmentation model generates output information, the output information of the first feature extraction network is input to the first branch network and the second branch network of the segmentation model, and the segmented image is obtained based on the output information of the first branch network and the output information of the second branch network; matting processing is performed on the image to be segmented based on the segmented image to obtain the target image; a background image is obtained; and the target image and the background image are fused to obtain a fused image.
  • the present disclosure obtains a segmented image by inputting the image to be segmented into a segmentation model, and performs matting processing on the image to be segmented based on the segmented image to obtain a target image; since the segmented image provides transparency information for each pixel in the image, the quality of the image segmentation results is improved.
  • image fusion processing is achieved by fusing the target image with the background image; and because the trained segmentation model incorporates the transparency information of the image to be segmented, the image segmentation effect of the segmentation model is improved, further improving the quality of the image segmentation results.
  • FIG. 12 is a block diagram of an image processing device according to an exemplary embodiment.
  • the image processing device 1200 of the embodiment of the present disclosure includes: a first acquisition module 1201, an input module 1202 and a processing module 1203.
  • the first acquisition module 1201 is configured to acquire the image to be segmented.
  • the input module 1202 is configured to input the image to be segmented into the segmentation model to obtain a segmented image output by the segmentation model.
  • the processing module 1203 is configured to perform matting processing on the image to be segmented based on the segmented image to obtain a target image;
  • the target image is an image of the area where the target body is located in the image to be segmented;
  • the segmentation model is trained based on at least one first dimensional information of the target body, and the synchronization model is trained based on at least one second dimensional information of the target body.
  • the input module 1202 is further configured to: after inputting the image to be segmented into the segmentation model, generate output information by the first feature extraction network of the segmentation model, input the output information of the first feature extraction network to the first branch network and the second branch network of the segmentation model, and obtain the segmented image based on the output information of the first branch network and the output information of the second branch network; wherein the first branch network corresponds to at least one first dimensional information of the target body, and the second branch network corresponds to at least one second dimensional information of the target body.
  • the input module 1202 is further configured to: input, by the segmentation model, the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network into the fusion network of the segmentation model, and output the segmented image by the fusion network of the segmentation model.
  • the fusion network of the segmentation model includes multiple convolutional layers.
  • the synchronization model includes a second feature extraction network and a third branch network corresponding to at least one second dimensional information of the target body; in the process of training the segmentation model, after the second training sample is input to the synchronization model, it is input to the third branch network through the second feature extraction network; the second feature extraction network has the same structure and the same model parameters as the first feature extraction network, and/or the second branch network and the third branch network have the same structure and the same model parameters; wherein the second training sample is an image marked with at least one second dimensional information of the target body.
  • in the process of training the segmentation model, the model parameters of the segmentation model are adjusted based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model; wherein the first loss function is determined based on the first training sample input to the segmentation model; the second loss function is determined based on the second training sample input to the synchronization model; and the first training sample is an image marked with at least one first dimensional information of the target body.
  • adjusting the model parameters of the segmentation model based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model includes: calculating the sum of the first loss function and the second loss function to obtain a total loss function; calculating gradient parameters based on the total loss function; and adjusting the model parameters of the segmentation model based on the gradient parameters.
  • the at least one first dimensional information of the target body includes: the transparency of each pixel in the image of the area where the target body is located; the at least one second dimensional information of the target body includes: among the pixels of the image to be segmented, the pixels belonging to the image of the area where the target body is located.
  • the image processing device 1200 further includes: a second acquisition module configured to acquire a background image; and a fusion module configured to fuse the target image and the background image to obtain a fused image.
  • the image processing device acquires an image to be segmented; after the image to be segmented is input into the segmentation model, the first feature extraction network of the segmentation model generates output information, the output information of the first feature extraction network is input to the first branch network and the second branch network of the segmentation model, and the segmented image is obtained based on the output information of the first branch network and the output information of the second branch network; matting processing is performed on the image to be segmented based on the segmented image to obtain the target image; a background image is obtained; and the target image and the background image are fused to obtain a fused image.
  • the present disclosure obtains a segmented image by inputting the image to be segmented into a segmentation model, and performs matting processing on the image to be segmented based on the segmented image to obtain a target image; since the segmented image provides transparency information for each pixel in the image, the quality of the image segmentation results is improved.
  • image fusion processing is achieved by fusing the target image with the background image; and because the trained segmentation model incorporates the transparency information of the image to be segmented, the image segmentation effect of the segmentation model is improved, further improving the quality of the image segmentation results.
  • Figure 13 is a block diagram of a terminal device 1300 according to an exemplary embodiment.
  • the terminal device 1300 of the embodiment of the present disclosure includes: the image processing device 1200 of the above embodiment.
  • the terminal device provided by the embodiment of the present disclosure, with the image processing device described above, can acquire the image to be segmented, input the image to be segmented into the segmentation model to obtain the segmented image output by the segmentation model, and perform matting processing on the image to be segmented according to the segmented image to obtain the target image.
  • the present disclosure obtains a segmented image by inputting the image to be segmented into a segmentation model, and performs matting processing on the image to be segmented based on the segmented image to obtain a target image; since the segmented image provides transparency information for each pixel in the image, the quality of the image segmentation results is improved.
  • FIG. 14 is a block diagram of an electronic device 1400 according to an exemplary embodiment.
  • the above-mentioned electronic device 1400 includes:
  • a memory 1401, a processor 1402, and a bus 1403 connecting different components (including the memory 1401 and the processor 1402).
  • the memory 1401 stores a computer program.
  • when the processor 1402 executes the program, the image processing method described above in the embodiments of the present disclosure is implemented.
  • Bus 1403 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures.
  • by way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MAC) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Electronic device 1400 typically includes a variety of electronic device-readable media. These media can be any available media that can be accessed by electronic device 1400, including volatile and nonvolatile media, removable and non-removable media.
  • Memory 1401 may also include computer system readable media in the form of volatile memory, such as random access memory (RAM) 1404 and/or cache memory 1405.
  • Electronic device 1400 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 1406 may be used to read and write to non-removable, non-volatile magnetic media (not shown in Figure 14, commonly referred to as a "hard drive").
  • a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (e.g., a CD-ROM, DVD-ROM, or other optical media), may be provided.
  • each drive may be connected to bus 1403 through one or more data media interfaces.
  • Memory 1401 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of embodiments of the present disclosure.
  • Program modules 1407 generally carry out the functions and/or methods of the embodiments described in this disclosure.
  • Electronic device 1400 may also communicate with one or more external devices 1409 (e.g., a keyboard, a pointing device, a display 1410, etc.), with one or more devices that enable a user to interact with the electronic device 1400, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 1400 to communicate with one or more other computing devices. This communication may occur through an input/output (I/O) interface 1412.
  • the electronic device 1400 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 1413.
  • network adapter 1413 communicates with other modules of electronic device 1400 through bus 1403.
  • the processor 1402 executes various functional applications and data processing by running programs stored in the memory 1401 .
  • the electronic device can execute the image processing method as described above: obtain the image to be segmented, input the image to be segmented into the segmentation model, obtain the segmented image output by the segmentation model, and perform matting processing on the image to be segmented based on the segmented image to obtain the target image.
  • the present disclosure obtains a segmented image by inputting the image to be segmented into a segmentation model, and performs matting processing on the image to be segmented based on the segmented image to obtain a target image; since the segmented image provides transparency information for each pixel in the image, the quality of the image segmentation results is improved.
  • the present disclosure also proposes a computer-readable storage medium.
  • when the instructions in the computer-readable storage medium are executed by the processor of the electronic device, the electronic device is enabled to perform the image processing method as described above.
  • the computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image processing method and apparatus, a terminal device, an electronic device, and a storage medium, belonging to the field of image processing technology. The method includes: acquiring an image to be segmented; inputting the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model; and performing matting processing on the image to be segmented according to the segmented image to obtain a target image, the target image being an image of the area where a target body is located in the image to be segmented; wherein the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model, and the synchronization model is trained based on at least one second dimensional information of the target body. The present disclosure obtains the segmented image by inputting the image to be segmented into the segmentation model, and performs matting processing on the image to be segmented according to the segmented image to obtain the target image; since the segmented image provides the transparency information of each pixel in the image, the quality of the image segmentation results is improved.

Description

Image processing method and apparatus, terminal device, electronic device, and storage medium
Technical Field
The present disclosure relates to the field of image processing technology, and in particular to an image processing method and apparatus, a terminal device, an electronic device, and a storage medium.
Background Art
At present, an image segmentation result can be obtained by performing image segmentation on a preprocessed original image with a trained model, where the model outputs a fusion result obtained by merging a hard segmentation decoding result and a soft segmentation decoding result. However, because the images are only classified and transparency effects are lacking, the quality of the image segmentation results is mediocre.
Summary of the Invention
The present disclosure provides an image processing method and apparatus, a terminal device, an electronic device, and a storage medium, to at least solve the problem in the related art that the quality of image segmentation results is mediocre because images are only classified and transparency effects are lacking.
The technical solution of the present disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, an image processing method is provided, including: acquiring an image to be segmented; inputting the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model; and performing matting processing on the image to be segmented according to the segmented image to obtain a target image, the target image being an image of the area where a target body is located in the image to be segmented; wherein the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model, and the synchronization model is trained based on at least one second dimensional information of the target body.
In an embodiment of the present disclosure, inputting the image to be segmented into the segmentation model to obtain the segmented image output by the segmentation model includes: after the image to be segmented is input into the segmentation model, generating output information by a first feature extraction network of the segmentation model, inputting the output information of the first feature extraction network into a first branch network and a second branch network of the segmentation model, and obtaining the segmented image based on the output information of the first branch network and the output information of the second branch network; wherein the first branch network corresponds to at least one first dimensional information of the target body, and the second branch network corresponds to at least one second dimensional information of the target body.
In an embodiment of the present disclosure, obtaining the segmented image based on the output information of the first branch network and the output information of the second branch network includes: inputting, by the segmentation model, the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network into a fusion network of the segmentation model, and outputting the segmented image by the fusion network of the segmentation model.
In an embodiment of the present disclosure, the fusion network of the segmentation model includes a plurality of convolutional layers.
In an embodiment of the present disclosure, the synchronization model includes a second feature extraction network and a third branch network corresponding to at least one second dimensional information of the target body; in the process of training the segmentation model, after a second training sample is input into the synchronization model, it is input into the third branch network via the second feature extraction network; the second feature extraction network has the same structure and the same model parameters as the first feature extraction network, and/or the second branch network has the same structure and the same model parameters as the third branch network; wherein the second training sample is an image marked with at least one second dimensional information of the target body.
In an embodiment of the present disclosure, in the process of training the segmentation model, model parameters of the segmentation model are adjusted based on a first loss function determined by the segmentation model and a second loss function determined by the synchronization model; wherein the first loss function is determined based on a first training sample input into the segmentation model; the second loss function is determined based on the second training sample input into the synchronization model; and the first training sample is an image marked with at least one first dimensional information of the target body.
In an embodiment of the present disclosure, adjusting the model parameters of the segmentation model based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model includes: calculating the sum of the first loss function and the second loss function to obtain a total loss function; calculating gradient parameters according to the total loss function; and adjusting the model parameters of the segmentation model according to the gradient parameters.
In an embodiment of the present disclosure, adjusting the model parameters of the segmentation model based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model includes: calculating gradient parameters according to the first loss function and the second loss function; and adjusting the model parameters of the segmentation model according to the gradient parameters.
In an embodiment of the present disclosure, the image processing method further includes: acquiring a background image; and fusing the target image with the background image to obtain a fused image.
According to a second aspect of the embodiments of the present disclosure, an image processing apparatus is provided, including: a first acquisition module configured to acquire an image to be segmented; an input module configured to input the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model; and a processing module configured to perform matting processing on the image to be segmented according to the segmented image to obtain a target image, the target image being an image of the area where a target body is located in the image to be segmented; wherein the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model, and the synchronization model is trained based on at least one second dimensional information of the target body.
In an embodiment of the present disclosure, the input module is further configured to: after the image to be segmented is input into the segmentation model, generate output information by the first feature extraction network of the segmentation model, input the output information of the first feature extraction network into the first branch network and the second branch network of the segmentation model, and obtain the segmented image based on the output information of the first branch network and the output information of the second branch network; wherein the first branch network corresponds to at least one first dimensional information of the target body, and the second branch network corresponds to at least one second dimensional information of the target body.
In an embodiment of the present disclosure, the input module is further configured to: input, by the segmentation model, the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network into the fusion network of the segmentation model, and output the segmented image by the fusion network of the segmentation model.
In an embodiment of the present disclosure, the fusion network of the segmentation model includes a plurality of convolutional layers.
In an embodiment of the present disclosure, the synchronization model includes a second feature extraction network and a third branch network corresponding to at least one second dimensional information of the target body; in the process of training the segmentation model, after the second training sample is input into the synchronization model, it is input into the third branch network via the second feature extraction network; the second feature extraction network has the same structure and the same model parameters as the first feature extraction network, and/or the second branch network has the same structure and the same model parameters as the third branch network; wherein the second training sample is an image marked with at least one second dimensional information of the target body.
In an embodiment of the present disclosure, in the process of training the segmentation model, the model parameters of the segmentation model are adjusted based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model; wherein the first loss function is determined based on the first training sample input into the segmentation model; the second loss function is determined based on the second training sample input into the synchronization model; and the first training sample is an image marked with at least one first dimensional information of the target body.
In an embodiment of the present disclosure, adjusting the model parameters of the segmentation model based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model includes: calculating the sum of the first loss function and the second loss function to obtain a total loss function; calculating gradient parameters according to the total loss function; and adjusting the model parameters of the segmentation model according to the gradient parameters.
In an embodiment of the present disclosure, the at least one first dimensional information of the target body includes: the transparency of each pixel in the image of the area where the target body is located; and the at least one second dimensional information of the target body includes: among the pixels of the image to be segmented, the pixels belonging to the image of the area where the target body is located.
In an embodiment of the present disclosure, the image processing apparatus further includes: a second acquisition module configured to acquire a background image; and a fusion module configured to fuse the target image with the background image to obtain a fused image.
According to a third aspect of the embodiments of the present disclosure, a terminal device is provided, including the image processing apparatus described in the second aspect of the embodiments of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the instructions to implement the method described in the first aspect of the embodiments of the present disclosure.
According to a fifth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided; when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the method described in the first aspect of the embodiments of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects: the segmented image is obtained by inputting the image to be segmented into the segmentation model, and the target image is obtained by performing matting processing on the image to be segmented according to the segmented image; since the segmented image provides the transparency information of each pixel in the image, the quality of the image segmentation results is improved.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure, and do not constitute an improper limitation of the present disclosure.
Figure 1 is a flow chart of an image processing method according to an exemplary embodiment;
Figure 2 is a schematic diagram of an image to be segmented;
Figure 3 is a schematic diagram of a segmented image;
Figure 4 is a schematic diagram of a target image;
Figure 5 is a flow chart of another image processing method according to an exemplary embodiment;
Figure 6 is a schematic structural diagram of a segmentation model;
Figure 7 is another schematic structural diagram of a segmentation model;
Figure 8 is a schematic structural diagram of a synchronization model;
Figure 9 is a schematic structural diagram of an image processing model;
Figure 10 is a schematic diagram of a background image;
Figure 11 is a schematic diagram of a fused image;
Figure 12 is a block diagram of an image processing apparatus according to an exemplary embodiment;
Figure 13 is a block diagram of a terminal device according to an exemplary embodiment;
Figure 14 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description of the Embodiments
In order to enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the specification and claims of the present disclosure and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present disclosure described here can be implemented in an order other than those illustrated or described here. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and devices consistent with some aspects of the present disclosure as detailed in the appended claims.
Figure 1 is a flow chart of an image processing method according to an exemplary embodiment. As shown in Figure 1, the image processing method of the embodiment of the present disclosure may include the following steps.
S101: acquire an image to be segmented.
It should be noted that the execution subject of the image processing method of the embodiment of the present disclosure may be a user terminal device. User terminal devices may specifically include, but are not limited to, mobile phones, tablet computers, laptops, desktop computers, vehicle-mounted terminals, smart home appliances, and the like. The image processing method of the embodiment of the present disclosure may be executed by the image processing apparatus of the embodiment of the present disclosure, and the image processing apparatus of the embodiment of the present disclosure may be configured in any user terminal device to execute the image processing method of the embodiment of the present disclosure.
In the embodiments of the present disclosure, the image to be segmented is the original image obtained on a computing platform that is waiting for image segmentation, where the image to be segmented contains the imaging of an object; Figure 2 shows an image to be segmented containing a horse. The image to be segmented is acquired for subsequent processing. It should be noted that the computing platform may be a mobile phone, a computer, a system on a chip (SoC), or the like; the present disclosure takes a mobile phone as an example for description, which does not limit the type of the computing platform. For example, an object can be photographed with a mobile phone to obtain the corresponding image to be segmented.
S102: input the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model.
In the embodiments of the present disclosure, the segmentation model is a pre-trained model that segments the image to be segmented, and the segmented image is the image obtained after the image to be segmented has been processed by the segmentation model. Inputting the image to be segmented acquired in step S101 into the segmentation model yields the segmented image output by the segmentation model. For example, as in the above example, inputting the image to be segmented shown in Figure 2 into the segmentation model yields the corresponding segmented image, as shown in Figure 3.
S103: perform matting processing on the image to be segmented according to the segmented image to obtain a target image.
In the embodiments of the present disclosure, the target body is the object waiting to be segmented in the image to be segmented; for example, the horse in Figure 2 is the object corresponding to the target body. The target image is the image of the area where the target body is located in the image to be segmented; for example, the image of the area where the horse is located in Figure 2 is the image corresponding to the target image. According to the segmented image obtained in step S102, matting processing is performed on the image to be segmented acquired in step S101 to obtain the target image, as shown in Figure 4.
It should be noted that the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model, and the synchronization model is trained based on at least one second dimensional information of the target body. The at least one first dimensional information of the target body includes the transparency of each pixel in the image of the area where the target body is located, and the at least one second dimensional information of the target body includes, among the pixels of the image to be segmented, the pixels belonging to the image of the area where the target body is located. Pixel-level segmentation processing of an image includes semantic hard segmentation processing and transparency-based soft segmentation processing: hard segmentation yields a pixel classification result of whether each pixel in the corresponding image belongs to a certain class of object, while soft segmentation yields a transparency classification result of the degree to which each pixel in the object edge part of the corresponding image belongs to a certain class of object.
The image processing method provided by the embodiments of the present disclosure acquires an image to be segmented, inputs the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model, and performs matting processing on the image to be segmented according to the segmented image to obtain a target image. The present disclosure obtains the segmented image by inputting the image to be segmented into the segmentation model, and performs matting processing on the image to be segmented according to the segmented image to obtain the target image; since the segmented image provides the transparency information of each pixel in the image, the quality of the image segmentation results is improved.
Figure 5 is a flow chart of another image processing method according to an exemplary embodiment. As shown in Figure 5, the image processing method of the embodiment of the present disclosure may include the following steps.
S501: acquire an image to be segmented.
Specifically, step S501 in this embodiment is the same as step S101 in the above embodiment, and will not be described again here.
Step S102 in the above embodiment, "input the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model", may specifically include the following step S502.
S502: after the image to be segmented is input into the segmentation model, the first feature extraction network of the segmentation model generates output information, the output information of the first feature extraction network is input into the first branch network and the second branch network of the segmentation model, and the segmented image is obtained based on the output information of the first branch network and the output information of the second branch network.
As can be understood by those skilled in the art, segmentation processing includes semantic hard segmentation processing and transparency-based soft segmentation processing: hard segmentation yields a pixel classification result of whether each pixel in the corresponding image belongs to a certain class of object, while soft segmentation yields a transparency classification result of the degree to which each pixel in the object edge part of the corresponding image belongs to a certain class of object.
In the embodiments of the present disclosure, the first feature extraction network is a network that extracts features of the image to be segmented. As shown in Figure 6, after the image to be segmented acquired in step S501 is input into the segmentation model, the first feature extraction network of the segmentation model generates output information, the output information of the first feature extraction network is input into the first branch network and the second branch network of the segmentation model, and the segmented image is obtained based on the output information of the first branch network and the output information of the second branch network. The first branch network corresponds to at least one first dimensional information of the target body, that is, to the transparency of each pixel in the image of the area where the target body is located, i.e. to the soft segmentation result; the second branch network corresponds to at least one second dimensional information of the target body, that is, to the pixels of the image to be segmented that belong to the image of the area where the target body is located, i.e. to the hard segmentation result. Optionally, the first feature extraction network may be an encoder, the first branch network may be a soft segmentation decoder, and correspondingly, the second branch network may be a hard segmentation decoder.
It should be noted that the present disclosure does not limit the specific way of obtaining the segmented image based on the output information of the first branch network and the output information of the second branch network, which can be set according to the actual situation.
As a possible implementation, the segmentation model inputs the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network into the fusion network of the segmentation model, and the fusion network of the segmentation model outputs the segmented image, as shown in Figure 7. The fusion network of the segmentation model includes a plurality of convolutional layers; that is, the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network are subjected to convolution processing to obtain the fusion result, which is the corresponding segmented image. Optionally, the corresponding segmented image can be obtained through a set of convolutions. It should be noted that the fusion network may be a fusion decoder.
The above embodiment mentions that the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model, and that the synchronization model is trained based on at least one second dimensional information of the target body. Here, the synchronization model is described further. Figure 8 is a schematic structural diagram of the synchronization model. The synchronization model includes a second feature extraction network and a third branch network corresponding to at least one second dimensional information of the target body; that is, the third branch network corresponds to the pixels of the image to be segmented that belong to the image of the area where the target body is located, i.e. to the hard segmentation result. In the process of training the segmentation model, the second training sample is an image marked with at least one second dimensional information of the target body; after the second training sample is input into the synchronization model, it is input into the third branch network via the second feature extraction network. The second feature extraction network has the same structure and the same model parameters as the first feature extraction network, and/or the second branch network has the same structure and the same model parameters as the third branch network.
Continuing with the training process of the segmentation model: in the process of training the segmentation model, the model parameters of the segmentation model are adjusted based on a first loss function determined by the segmentation model and a second loss function determined by the synchronization model; the first loss function is determined based on the first training sample input into the segmentation model, the second loss function is determined based on the second training sample input into the synchronization model, and the first training sample is an image marked with at least one first dimensional information of the target body.
As a possible implementation, the sum of the first loss function and the second loss function is calculated to obtain a total loss function, gradient parameters are calculated according to the total loss function, and the model parameters of the segmentation model are adjusted according to the gradient parameters. It should be noted that the calculated gradient parameters are transmitted simultaneously to the segmentation model and the synchronization model, so that the parameters of both models are adjusted.
For example, Figure 9 is a schematic diagram of an image processing model. As shown in Figure 9, the image processing model consists of an upper model and a lower model, corresponding to two tasks, Task 1 and Task 2. Only the lower model is used when performing image processing, while both the upper model and the lower model are used when training the model. The upper model consists of an encoder (Backbone) and a hard segmentation decoder (Segmentation Head); the lower model consists of an encoder (Backbone), a hard segmentation decoder (Segmentation Head), a soft segmentation decoder (Matting Head), and a fusion decoder (Fusion Head). The upper model is trained on hard segmentation annotations (Segmentation Data) and the lower model on soft segmentation annotations (Matting Data). In the upper model, the encoder outputs the encoding result of the hard segmentation data and feeds it into the hard segmentation decoder. In the lower model, the encoder outputs the encoding result of the soft segmentation data and feeds it into both the hard segmentation decoder and the soft segmentation decoder, which output the corresponding hard segmentation result and soft segmentation result respectively; the hard segmentation result, the soft segmentation result, and the encoding result are then fed together into the fusion decoder, which outputs the fusion result. The encoders and hard segmentation decoders of the upper and lower models require parameter synchronization (Grad&Param Sync) during training, that is, the gradient parameters computed across the two models are used to adjust the models. Since the two models have the same initial parameters, the gradient parameters computed from the two models' loss functions are also the same, so the parameters of the two models remain identical throughout training. It is easy to see that the image processing model trained in this way incorporates the information of the hard segmentation result and can achieve a better segmentation effect.
S503: perform matting processing on the image to be segmented according to the segmented image to obtain a target image.
Specifically, step S503 in this embodiment is the same as step S103 in the above embodiment, and will not be described again here.
S504: acquire a background image.
In the embodiments of the present disclosure, the background image is the image used as the background; as shown in Figure 10, the background image is acquired for subsequent processing.
S505: fuse the target image with the background image to obtain a fused image.
In the embodiments of the present disclosure, the target image obtained in step S503 is fused with the background image acquired in step S504 to obtain the fused image, as shown in Figure 11.
The image processing method provided by the embodiments of the present disclosure acquires an image to be segmented; after the image to be segmented is input into the segmentation model, the first feature extraction network of the segmentation model generates output information, the output information of the first feature extraction network is input into the first branch network and the second branch network of the segmentation model, and the segmented image is obtained based on the output information of the first branch network and the output information of the second branch network; matting processing is performed on the image to be segmented according to the segmented image to obtain the target image; a background image is acquired; and the target image is fused with the background image to obtain a fused image. The present disclosure obtains the segmented image by inputting the image to be segmented into the segmentation model, and performs matting processing on the image to be segmented according to the segmented image to obtain the target image; since the segmented image provides the transparency information of each pixel in the image, the quality of the image segmentation results is improved. At the same time, by fusing the target image with the background image, image fusion processing is achieved; and because the trained segmentation model incorporates the transparency information of the image to be segmented, the image segmentation effect of the segmentation model is improved, further improving the quality of the image segmentation results.
Figure 12 is a block diagram of an image processing apparatus according to an exemplary embodiment. As shown in Figure 12, the image processing apparatus 1200 of the embodiment of the present disclosure includes: a first acquisition module 1201, an input module 1202, and a processing module 1203.
The first acquisition module 1201 is configured to acquire an image to be segmented.
The input module 1202 is configured to input the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model.
The processing module 1203 is configured to perform matting processing on the image to be segmented according to the segmented image to obtain a target image; the target image is an image of the area where a target body is located in the image to be segmented;
wherein the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model, and the synchronization model is trained based on at least one second dimensional information of the target body.
In an embodiment of the present disclosure, the input module 1202 is further configured to: after the image to be segmented is input into the segmentation model, generate output information by the first feature extraction network of the segmentation model, input the output information of the first feature extraction network into the first branch network and the second branch network of the segmentation model, and obtain the segmented image based on the output information of the first branch network and the output information of the second branch network; wherein the first branch network corresponds to at least one first dimensional information of the target body, and the second branch network corresponds to at least one second dimensional information of the target body.
In an embodiment of the present disclosure, the input module 1202 is further configured to: input, by the segmentation model, the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network into the fusion network of the segmentation model, and output the segmented image by the fusion network of the segmentation model.
In an embodiment of the present disclosure, the fusion network of the segmentation model includes a plurality of convolutional layers.
In an embodiment of the present disclosure, the synchronization model includes a second feature extraction network and a third branch network corresponding to at least one second dimensional information of the target body; in the process of training the segmentation model, after the second training sample is input into the synchronization model, it is input into the third branch network via the second feature extraction network; the second feature extraction network has the same structure and the same model parameters as the first feature extraction network, and/or the second branch network has the same structure and the same model parameters as the third branch network; wherein the second training sample is an image marked with at least one second dimensional information of the target body.
In an embodiment of the present disclosure, in the process of training the segmentation model, the model parameters of the segmentation model are adjusted based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model; wherein the first loss function is determined based on the first training sample input into the segmentation model; the second loss function is determined based on the second training sample input into the synchronization model; and the first training sample is an image marked with at least one first dimensional information of the target body.
In an embodiment of the present disclosure, adjusting the model parameters of the segmentation model based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model includes: calculating the sum of the first loss function and the second loss function to obtain a total loss function; calculating gradient parameters according to the total loss function; and adjusting the model parameters of the segmentation model according to the gradient parameters.
In an embodiment of the present disclosure, the at least one first dimensional information of the target body includes: the transparency of each pixel in the image of the area where the target body is located; and the at least one second dimensional information of the target body includes: among the pixels of the image to be segmented, the pixels belonging to the image of the area where the target body is located.
In an embodiment of the present disclosure, the image processing apparatus 1200 further includes: a second acquisition module configured to acquire a background image; and a fusion module configured to fuse the target image with the background image to obtain a fused image.
Regarding the apparatus in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments of the related method, and will not be elaborated here.
The image processing apparatus provided by the embodiments of the present disclosure acquires an image to be segmented; after the image to be segmented is input into the segmentation model, the first feature extraction network of the segmentation model generates output information, the output information of the first feature extraction network is input into the first branch network and the second branch network of the segmentation model, and the segmented image is obtained based on the output information of the first branch network and the output information of the second branch network; matting processing is performed on the image to be segmented according to the segmented image to obtain the target image; a background image is acquired; and the target image is fused with the background image to obtain a fused image. The present disclosure obtains the segmented image by inputting the image to be segmented into the segmentation model, and performs matting processing on the image to be segmented according to the segmented image to obtain the target image; since the segmented image provides the transparency information of each pixel in the image, the quality of the image segmentation results is improved. At the same time, by fusing the target image with the background image, image fusion processing is achieved; and because the trained segmentation model incorporates the transparency information of the image to be segmented, the image segmentation effect of the segmentation model is improved, further improving the quality of the image segmentation results.
Figure 13 is a block diagram of a terminal device 1300 according to an exemplary embodiment.
As shown in Figure 13, the terminal device 1300 of the embodiment of the present disclosure includes the image processing apparatus 1200 of the above embodiment.
The terminal device provided by the embodiments of the present disclosure, with the image processing apparatus described above, can acquire an image to be segmented, input the image to be segmented into the segmentation model to obtain the segmented image output by the segmentation model, and perform matting processing on the image to be segmented according to the segmented image to obtain the target image. The present disclosure obtains the segmented image by inputting the image to be segmented into the segmentation model, and performs matting processing on the image to be segmented according to the segmented image to obtain the target image; since the segmented image provides the transparency information of each pixel in the image, the quality of the image segmentation results is improved.
Figure 14 is a block diagram of an electronic device 1400 according to an exemplary embodiment.
As shown in Figure 14, the above electronic device 1400 includes:
a memory 1401, a processor 1402, and a bus 1403 connecting different components (including the memory 1401 and the processor 1402); the memory 1401 stores a computer program, and when the processor 1402 executes the program, the image processing method described above in the embodiments of the present disclosure is implemented.
The bus 1403 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MAC) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The electronic device 1400 typically includes a variety of electronic-device-readable media. These media may be any available media that can be accessed by the electronic device 1400, including volatile and non-volatile media, and removable and non-removable media.
The memory 1401 may also include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 1404 and/or a cache memory 1405. The electronic device 1400 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 1406 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Figure 14, commonly referred to as a "hard drive"). Although not shown in Figure 14, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 1403 through one or more data media interfaces. The memory 1401 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present disclosure.
A program/utility 1408 having a set (at least one) of program modules 1407 may be stored, for example, in the memory 1401. Such program modules 1407 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 1407 generally carry out the functions and/or methods of the embodiments described in the present disclosure.
The electronic device 1400 may also communicate with one or more external devices 1409 (e.g., a keyboard, a pointing device, a display 1410, etc.), with one or more devices that enable a user to interact with the electronic device 1400, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 1400 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 1412. Moreover, the electronic device 1400 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 1413. As shown in Figure 14, the network adapter 1413 communicates with other modules of the electronic device 1400 through the bus 1403. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 1400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 1402 executes various functional applications and data processing by running programs stored in the memory 1401.
It should be noted that, for the implementation process and technical principles of the electronic device of this embodiment, reference is made to the foregoing explanation of the image processing apparatus of the embodiments of the present disclosure, which will not be repeated here.
The electronic device provided by the embodiments of the present disclosure can execute the image processing method as described above: acquire an image to be segmented, input the image to be segmented into the segmentation model to obtain the segmented image output by the segmentation model, and perform matting processing on the image to be segmented according to the segmented image to obtain the target image. The present disclosure obtains the segmented image by inputting the image to be segmented into the segmentation model, and performs matting processing on the image to be segmented according to the segmented image to obtain the target image; since the segmented image provides the transparency information of each pixel in the image, the quality of the image segmentation results is improved.
To implement the above embodiments, the present disclosure further provides a computer-readable storage medium.
When the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the image processing method as described above. Optionally, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed here. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed by the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

  1. An image processing method, characterized by comprising:
    acquiring an image to be segmented;
    inputting the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model; and
    performing matting processing on the image to be segmented according to the segmented image to obtain a target image; the target image being an image of the area where a target body is located in the image to be segmented;
    wherein the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model; and the synchronization model is trained based on at least one second dimensional information of the target body.
  2. The image processing method according to claim 1, characterized in that inputting the image to be segmented into the segmentation model to obtain the segmented image output by the segmentation model comprises:
    after the image to be segmented is input into the segmentation model, generating output information by a first feature extraction network of the segmentation model, inputting the output information of the first feature extraction network into a first branch network and a second branch network of the segmentation model, and obtaining the segmented image based on the output information of the first branch network and the output information of the second branch network;
    wherein the first branch network corresponds to at least one first dimensional information of the target body, and the second branch network corresponds to at least one second dimensional information of the target body.
  3. The image processing method according to claim 2, characterized in that obtaining the segmented image based on the output information of the first branch network and the output information of the second branch network comprises:
    inputting, by the segmentation model, the output information of the first branch network, the output information of the second branch network, and the output information of the first feature extraction network into a fusion network of the segmentation model, and outputting the segmented image by the fusion network of the segmentation model.
  4. The image processing method according to claim 3, characterized in that the fusion network of the segmentation model comprises a plurality of convolutional layers.
  5. The image processing method according to claim 2, characterized in that the synchronization model comprises a second feature extraction network and a third branch network corresponding to at least one second dimensional information of the target body; in the process of training the segmentation model, after a second training sample is input into the synchronization model, it is input into the third branch network via the second feature extraction network;
    the second feature extraction network has the same structure and the same model parameters as the first feature extraction network, and/or the second branch network has the same structure and the same model parameters as the third branch network;
    wherein the second training sample is an image marked with at least one second dimensional information of the target body.
  6. The image processing method according to claim 5, characterized in that, in the process of training the segmentation model, model parameters of the segmentation model are adjusted based on a first loss function determined by the segmentation model and a second loss function determined by the synchronization model;
    wherein the first loss function is determined based on a first training sample input into the segmentation model; the second loss function is determined based on the second training sample input into the synchronization model; and the first training sample is an image marked with at least one first dimensional information of the target body.
  7. The image processing method according to claim 6, characterized in that adjusting the model parameters of the segmentation model based on the first loss function determined by the segmentation model and the second loss function determined by the synchronization model comprises:
    calculating the sum of the first loss function and the second loss function to obtain a total loss function;
    calculating gradient parameters according to the total loss function; and
    adjusting the model parameters of the segmentation model according to the gradient parameters.
  8. The image processing method according to any one of claims 1-7, characterized in that
    the at least one first dimensional information of the target body comprises: the transparency of each pixel in the image of the area where the target body is located; and
    the at least one second dimensional information of the target body comprises: among the pixels of the image to be segmented, the pixels belonging to the image of the area where the target body is located.
  9. The image processing method according to claim 1, characterized by further comprising:
    acquiring a background image; and
    fusing the target image with the background image to obtain a fused image.
  10. An image processing apparatus, characterized by comprising:
    a first acquisition module configured to acquire an image to be segmented;
    an input module configured to input the image to be segmented into a segmentation model to obtain a segmented image output by the segmentation model; and
    a processing module configured to perform matting processing on the image to be segmented according to the segmented image to obtain a target image; the target image being an image of the area where a target body is located in the image to be segmented;
    wherein the segmentation model is trained based on at least one first dimensional information of the target body together with a synchronization model; and the synchronization model is trained based on at least one second dimensional information of the target body.
  11. A terminal device, characterized by comprising the image processing apparatus according to claim 10.
  12. An electronic device, characterized by comprising:
    a processor; and
    a memory for storing executable instructions of the processor;
    wherein the processor is configured to execute the instructions to implement the method according to any one of claims 1-9.
  13. A computer-readable storage medium, characterized in that, when instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method according to any one of claims 1-9.
PCT/CN2022/103760 2022-07-04 2022-07-04 Image processing method and apparatus, terminal device, electronic device, and storage medium WO2024007135A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/103760 WO2024007135A1 (zh) 2022-07-04 2022-07-04 Image processing method and apparatus, terminal device, electronic device, and storage medium
CN202280004251.XA CN117651972A (zh) 2022-07-04 2022-07-04 Image processing method and apparatus, terminal device, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103760 WO2024007135A1 (zh) 2022-07-04 2022-07-04 Image processing method and apparatus, terminal device, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024007135A1 (zh)

Family

ID=89454730

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/103760 WO2024007135A1 (zh) 2022-07-04 2022-07-04 Image processing method and apparatus, terminal device, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN117651972A (zh)
WO (1) WO2024007135A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109377509A (zh) * 2018-09-26 2019-02-22 深圳前海达闼云端智能科技有限公司 图像语义分割标注的方法、装置、存储介质和设备
CN113052755A (zh) * 2019-12-27 2021-06-29 杭州深绘智能科技有限公司 一种基于深度学习的高分辨率图像智能化抠图方法
US20210248788A1 (en) * 2020-02-07 2021-08-12 Casio Computer Co., Ltd. Virtual and real composite image data generation method, virtual and real images compositing system, trained model generation method, virtual and real composite image data generation device
CN113570614A (zh) * 2021-01-18 2021-10-29 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质
CN113744280A (zh) * 2021-07-20 2021-12-03 北京旷视科技有限公司 图像处理方法、装置、设备及介质
US20220044365A1 (en) * 2020-08-07 2022-02-10 Adobe Inc. Automatically generating a trimap segmentation for a digital image by utilizing a trimap generation neural network
CN114187317A (zh) * 2021-12-10 2022-03-15 北京百度网讯科技有限公司 图像抠图的方法、装置、电子设备以及存储介质
CN114299088A (zh) * 2021-12-27 2022-04-08 北京达佳互联信息技术有限公司 图像处理方法及装置


Also Published As

Publication number Publication date
CN117651972A (zh) 2024-03-05

Similar Documents

Publication Publication Date Title
CN110503703B (zh) Method and apparatus for generating an image
JP7110502B2 (ja) Video background subtraction using depth
CN109461167B (zh) Image processing model training method, matting method, apparatus, medium, and terminal
US10983596B2 (en) Gesture recognition method, device, electronic device, and storage medium
CN110189336B (zh) Image generation method, system, server, and storage medium
CN109344755B (zh) Video action recognition method, apparatus, device, and storage medium
CN109300179B (zh) Animation production method, apparatus, terminal, and medium
CN110516598B (zh) Method and apparatus for generating an image
WO2022227218A1 (zh) Drug name recognition method and apparatus, computer device, and storage medium
WO2022089267A1 (zh) Sample data acquisition method, image segmentation method, apparatus, device, and medium
CN114266860B (zh) Three-dimensional face model building method and apparatus, electronic device, and storage medium
CN110349161B (zh) Image segmentation method and apparatus, electronic device, and storage medium
CN113780326A (zh) Image processing method and apparatus, storage medium, and electronic device
JP2023526899A (ja) Method, device, medium, and program product for generating an image inpainting model
WO2020087434A1 (zh) Face image sharpness evaluation method and apparatus
CN113361535A (zh) Image segmentation model training, image segmentation method, and related apparatus
CN116310315A (zh) Matting method and apparatus, electronic device, and storage medium
CN111815748B (zh) Animation processing method and apparatus, storage medium, and electronic device
WO2024179322A1 (zh) Image processing method and apparatus, device, and storage medium
CN112714337A (zh) Video processing method and apparatus, electronic device, and storage medium
WO2024007135A1 (zh) Image processing method and apparatus, terminal device, electronic device, and storage medium
CN111914850B (zh) Picture feature extraction method and apparatus, server, and medium
WO2023109086A1 (zh) Character recognition method and apparatus, device, and storage medium
CN109857244B (zh) Gesture recognition method and apparatus, terminal device, storage medium, and VR glasses
WO2024055194A1 (zh) Virtual object generation method, codec training method, and apparatus therefor

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202280004251.X

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22949725

Country of ref document: EP

Kind code of ref document: A1