CN116612146B - Image processing method, device, electronic equipment and computer storage medium

Info

Publication number
CN116612146B
CN116612146B
Authority
CN
China
Prior art keywords
image
segmentation result
gray
area
target
Legal status
Active
Application number
CN202310844760.XA
Other languages
Chinese (zh)
Other versions
CN116612146A
Inventor
黄子旋
游薪渝
考月英
于景铭
吕江靖
贾荣飞
吕承飞
Current Assignee
Taobao China Software Co Ltd
Original Assignee
Taobao China Software Co Ltd
Application filed by Taobao China Software Co Ltd
Priority to CN202310844760.XA
Publication of CN116612146A
Application granted
Publication of CN116612146B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, an apparatus, an electronic device, and a computer storage medium. In the image processing method, after an original image and a silhouette image are obtained, image segmentation is performed at a first image resolution to obtain a first segmentation result and a second segmentation result. Image interpolation processing is then applied to the first and second segmentation results to obtain a third segmentation result, corresponding to the first segmentation result at the original image resolution, and a fourth segmentation result, corresponding to the second segmentation result at the original image resolution, where the original image resolution is greater than the first image resolution. In effect, image segmentation is performed under a low image resolution condition, and the low-resolution segmentation results are then converted into high-resolution segmentation results by image interpolation processing, from which a target segmentation result is obtained, realizing pixel-level image segmentation. This segmentation approach places no restriction on the categories of objects in the image, which improves the applicability of image segmentation.

Description

Image processing method, device, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a computer storage medium.
Background
Image segmentation is the technique and process of dividing an image into a number of specific regions with distinctive properties and extracting the objects of interest. It is a key step on the way from image processing to image analysis. Objects in an image (which may be regarded as the foreground) can be identified by image segmentation techniques for use in subsequent high-quality three-dimensional object reconstruction algorithms.
Existing image segmentation schemes mainly segment the image at low image resolution, so the image cannot be segmented at the pixel level, and the categories of objects that can be segmented are generally restricted, which means the object reconstruction effect cannot be guaranteed. How to realize pixel-level image segmentation and improve the practicability of image segmentation is therefore a technical problem that currently needs to be solved.
Disclosure of Invention
The application provides an image processing method that realizes pixel-level image segmentation and improves the practicability of image segmentation, together with an image processing apparatus, an electronic device, and a computer storage medium corresponding to the method.
The application provides an image processing method, which comprises the following steps:
obtaining an original image for representing an object, a silhouette image for representing the object; the silhouette image is an image corresponding to the original image;
obtaining a first image of the original image at a first image resolution and a second image of the silhouette image at the first image resolution;
respectively carrying out image segmentation on the first image and the second image to obtain a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image; the first segmentation result is used for indicating that each pixel point in the first image belongs to a foreground or a background, and the second segmentation result is used for indicating that each pixel point in the second image belongs to the foreground or the background;
respectively carrying out image interpolation processing on the first segmentation result and the second segmentation result based on the original image resolution of the original image and the silhouette image, and obtaining a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result; the original image resolution is greater than the first image resolution;
And obtaining a target segmentation result according to the third segmentation result and the fourth segmentation result.
Optionally, the performing image segmentation on the first image and the second image to obtain a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image includes:
and respectively taking the first image and the second image as input data of a trained image segmentation result prediction model, and inputting the input data into the trained image segmentation result prediction model to obtain the first segmentation result and the second segmentation result.
Optionally, the obtaining the target segmentation result according to the third segmentation result and the fourth segmentation result includes:
performing aggregation processing on the third segmentation result and the fourth segmentation result to obtain an aggregated segmentation result;
and taking the aggregated segmentation result as input data of a trained image segmentation result optimization model, and inputting the input data into the trained image segmentation result optimization model to obtain the target segmentation result.
Optionally, the method further comprises: and optimizing the target segmentation result to obtain an optimized target segmentation result.
Optionally, the optimizing the target segmentation result to obtain an optimized target segmentation result includes:
determining a first target background area comprising a hollowed-out area in the silhouette image based on gray value distribution of a gray level image corresponding to the silhouette image;
and optimizing the target segmentation result according to the first target background area to obtain an optimized target segmentation result.
Optionally, the determining, based on the gray value distribution of the gray map corresponding to the silhouette image, a first target background area including a hollowed-out area in the silhouette image includes:
determining a first histogram for representing gray value distribution of each pixel point in the gray map according to the gray value of each pixel point in the gray map; the first histogram is used for representing the corresponding relation between each gray value corresponding to the pixel point in the gray map and the number of the pixel points of each gray value;
and determining a first target background area comprising a hollowed-out area in the silhouette image according to the first histogram.
Optionally, the method further comprises:
according to the first target background area, setting a predicted value of a pixel point corresponding to the first target background area in the silhouette image as a first gray level value, and setting a predicted value of a pixel point corresponding to the first target foreground area as a second gray level value, so as to obtain a first binary image;
the optimizing the target segmentation result according to the first target background area to obtain an optimized target segmentation result includes:
setting, in the target segmentation result, the gray value of each pixel point whose gray value in the first binary image is the first gray value to the second gray value, to obtain the optimized target segmentation result, wherein the target segmentation result is a second binary image in which the predicted values of the pixel points corresponding to a first initial foreground region are set to the first gray value and the predicted values of the pixel points corresponding to a first initial background region are set to the second gray value, the first initial background region is a background region that does not include the hollowed-out area, the first initial foreground region is a foreground region that includes the hollowed-out area, and the first target foreground region is a foreground region that does not include the hollowed-out area.
Optionally, the determining, according to the first histogram, a first target background area including a hollowed-out area in the silhouette image includes:
determining a preset gray value interval;
determining a gray upper limit value and a gray lower limit value corresponding to a first target background area in the first histogram based on the gray value interval;
And determining a first target background area comprising a hollowed-out area in the silhouette image according to the gray level upper limit value and the gray level lower limit value.
Optionally, the determining, based on the gray value interval, a gray upper limit value and a gray lower limit value corresponding to the first target background area in the first histogram includes:
traversing downward in the first histogram from the upper limit value of the gray value interval, in order of decreasing gray value, and if the gray gradient value corresponding to a first currently traversed gray value is greater than a preset first gradient threshold, determining the first currently traversed gray value as the gray upper limit value of the first target background area;
traversing downward in the first histogram from the lower limit value of the gray value interval, in order of decreasing gray value, and if the gray gradient value corresponding to a second currently traversed gray value is smaller than a preset second gradient threshold and the number of pixel points corresponding to the second currently traversed gray value is smaller than a preset pixel-count threshold, determining the second currently traversed gray value as the gray lower limit value of the first target background area;
wherein the gray gradient value represents the ratio of the change in the number of pixel points to the corresponding change in gray value.
Optionally, the optimizing the target segmentation result to obtain an optimized target segmentation result includes:
determining a target reflection area in a second initial foreground area according to gray value distribution of pixel points of the second initial foreground area in a gray image corresponding to the silhouette image, wherein the second initial foreground area is an area confirmed as a foreground in the target segmentation result;
and optimizing the target segmentation result according to the target reflection region to obtain an optimized target segmentation result.
Optionally, the determining, according to the gray value distribution of the pixel points of the second initial foreground area in the gray map corresponding to the silhouette image, the target reflection area in the second initial foreground area includes:
determining a gray level map corresponding to the silhouette image;
determining a second histogram for representing gray value distribution of pixels of a second initial foreground region in the gray map according to the gray map and the target segmentation result; the second histogram is used for representing the corresponding relation between each gray value corresponding to the pixel point of the second initial foreground area in the gray map and the number of the pixel points of each gray value;
And determining a target reflection area in the second initial foreground area according to the second histogram.
Optionally, the determining, according to the second histogram, a target reflection area in the second initial foreground area includes:
determining an initial reflection area in the second initial foreground area according to the second histogram;
judging whether a non-reflection area exists in the initial reflection area;
and if so, removing the non-reflection area in the initial reflection area, and determining the target reflection area.
Optionally, the target segmentation result is a second binary image in which a predicted value of a pixel point corresponding to the second initial foreground region is set to a first gray value, and a predicted value of a pixel point corresponding to the second initial background region is set to a second gray value;
the method further comprises: setting, according to the initial reflection area, the predicted values of the pixel points corresponding to the initial reflection area in the silhouette image to a first gray value, and the predicted values of the pixel points corresponding to the area other than the initial reflection area to a second gray value, to obtain a third binary image that binarizes the predicted values of the pixel points of the silhouette image;
obtaining, according to the second binary image and the third binary image, a fourth binary image that binarizes the predicted values of the pixel points of the silhouette image, wherein the predicted values of the pixel points corresponding to the non-reflection region of the object in the fourth binary image are set to the first gray value, and the predicted values of the pixel points corresponding to the area other than the non-reflection region of the object are set to the second gray value;
the determining whether a non-reflection area exists in the initial reflection area includes:
and judging whether a non-reflection area exists in the initial reflection area according to the third binary image, the fourth binary image and a preset non-reflection area detection strategy.
Optionally, the method further comprises: removing the residual edge contours and specified connected regions in the third binary image to obtain a fifth binary image;
the judging whether a non-reflection area exists in the initial reflection area according to the third binary image, the fourth binary image and the preset non-reflection area detection strategy comprises:
judging whether a non-reflection area exists in the initial reflection area based on a first bounding box in the fifth binary image, a second bounding box in the fourth binary image and a preset non-reflection area detection strategy, wherein the first bounding box marks the area formed by the pixel points whose gray value is the first gray value in the fifth binary image, and the second bounding box marks the area formed by the pixel points whose gray value is the first gray value in the fourth binary image.
The present application provides an image processing apparatus including:
a first obtaining unit configured to obtain an original image representing an object, a silhouette image representing the object; the silhouette image is an image corresponding to the original image;
a second obtaining unit configured to obtain a first image of the original image at a first image resolution and a second image of the silhouette image at the first image resolution;
the segmentation unit is used for respectively carrying out image segmentation on the first image and the second image to obtain a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image; the first segmentation result is used for indicating that each pixel point in the first image belongs to a foreground or a background, and the second segmentation result is used for indicating that each pixel point in the second image belongs to the foreground or the background;
the interpolation processing unit is used for respectively carrying out image interpolation processing on the first segmentation result and the second segmentation result based on the original image resolution of the original image and the silhouette image, to obtain a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result; the original image resolution is greater than the first image resolution;
And the target segmentation result obtaining unit is used for obtaining a target segmentation result according to the third segmentation result and the fourth segmentation result.
The present application provides an electronic device including:
a processor;
and a memory for storing a computer program which is executed by the processor to perform the image processing method.
The present application provides a computer storage medium storing a computer program to be executed by a processor to perform the above-described image processing method.
Compared with the prior art, the embodiment of the application has the following advantages:
the application provides an image processing method, which comprises the following steps: obtaining an original image for representing an object, a silhouette image for representing the object; the silhouette image is an image corresponding to the original image; obtaining a first image of an original image under a first image resolution and a second image of a silhouette image under the first image resolution; respectively carrying out image segmentation on the first image and the second image to obtain a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image; the first segmentation result is used for indicating that each pixel point in the first image belongs to the foreground or the background, and the second segmentation result is used for indicating that each pixel point in the second image belongs to the foreground or the background; respectively carrying out image interpolation processing on the first segmentation result and the second segmentation result based on the original image resolution of the original image and the silhouette image to obtain a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result; the original image resolution is greater than the first image resolution; and obtaining a target segmentation result according to the third segmentation result and the fourth segmentation result. In the image processing method, after an original image and a silhouette image are obtained, image segmentation is carried out under a first image resolution, then image interpolation processing is directly carried out on the first segmentation result and the second segmentation result after the first segmentation result and the second segmentation result are obtained, a fourth segmentation result is obtained, corresponding to a third segmentation result and a second segmentation result, of the first segmentation result under the original image resolution, is obtained, the original image resolution is larger than the first image resolution, the image segmentation is carried out under a condition of low image resolution, then the image segmentation result under the low image resolution is converted into an image segmentation result under a high image resolution in a mode of image interpolation processing, then a target segmentation result is obtained on the basis of the image segmentation result under the high image resolution, the image segmentation effect is further improved, and image segmentation at a pixel level is achieved through the high image resolution; meanwhile, the image segmentation mode adopted by the application has no category restriction on objects in the image, thereby improving the applicability of image segmentation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below are only some embodiments of the present application, and that a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a schematic view of an image processing method according to the present application.
Fig. 2 is a flowchart of an image processing method according to a first embodiment of the present application.
Fig. 3 is a schematic diagram of an original image and a silhouette image in the present embodiment.
Fig. 4 is a schematic diagram of a process for segmenting an image using a two-stage model.
Fig. 5 is a schematic view of a first segmentation scene.
Fig. 6 is a schematic diagram of a second segmentation scenario.
Fig. 7 is a schematic diagram of a process of removing a hollowed-out area in a foreground of a target segmentation result.
Fig. 8 is a schematic diagram of a first histogram.
Fig. 9 is a schematic diagram of a result of removing a hollowed-out area in a foreground of a target segmentation result.
Fig. 10 is a schematic diagram of a process of removing a reflection area from the foreground of the target segmentation result.
Fig. 11 is a schematic diagram of the result of removing the reflection area from the foreground of the target segmentation result.
Fig. 12 is a schematic diagram of a third binary image and a fourth binary image according to the present embodiment.
Fig. 13 is a schematic diagram of a fifth binary image according to the present embodiment.
Fig. 14 is a schematic diagram of a first bounding box in a fifth binary image and a second bounding box in a fourth binary image according to the present embodiment.
Fig. 15 is a schematic view of an image processing apparatus according to a second embodiment of the present application.
Fig. 16 is a schematic diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application may be embodied in many other forms than those herein described, and those skilled in the art will readily appreciate that the present application may be similarly embodied without departing from the spirit or essential characteristics thereof, and therefore the present application is not limited to the specific embodiments disclosed below.
The application provides an image processing method, an image processing device, an electronic device and a computer storage medium. The image processing method, apparatus, electronic device, and computer storage medium are described below by way of specific embodiments, respectively.
The image processing method of the application can be applied to scenarios in which an image is segmented to identify the object in the image. For example, when the object in the image is a hat, the outline of the hat can be determined by segmenting the image; when the object in the image is a slipper, the outline of the slipper can be determined in the same way.
In the application, the object is photographed with self-designed 720-degree shooting equipment to obtain an original image representing the object and a silhouette image representing the object. The original image may refer to an image of the object shot under natural light or normal lighting conditions, and the silhouette image to an image of the object shot under silhouette lighting (the principle of capturing a silhouette image with a silhouette lamp is to make the object region darker, closer to black, and the other regions brighter, which facilitates segmenting the object in the image). An original image and its corresponding silhouette image differ only in the color of the light from the light source; the shooting angle and the placement of the object are identical.
In the application, the 720-degree shooting equipment is a set of photographic apparatus that may consist of a transparent turntable (such as a glass turntable), several camera arrays, a lighting system, and a control device; it can photograph the top and the bottom of an object at the same time. During shooting, the object is placed on the transparent turntable, and as the turntable rotates through one revolution the cameras can capture images of the object at every rotation angle. As an example, with the transparent turntable placed horizontally, images covering one full revolution in the horizontal plane can be captured; meanwhile, a ring is arranged in the vertical direction, and multiple cameras are mounted on the ring at multiple angles and positions, that is: a camera can be mounted at any position on the ring, and, combined with the rotation of the turntable, all surfaces of the object can be photographed. This achieves 360-degree coverage of the object in both the horizontal and the vertical direction, and hence 720-degree shooting. The shooting angle of each camera is determined by which slot on the ring it is mounted in, and different shooting angles are chosen for different objects; for example, when the photographed object is a shoe, the camera's viewing angle needs to be raised slightly so that the inner cavity of the shoe can be photographed. In the application, the number of cameras can be set as required, as long as the cameras arranged on the vertical ring can cover a full circle of angles of the object in the vertical direction.
According to the application, images from multiple viewing angles can be captured simultaneously under the control of the control device, which makes shooting more efficient; the object does not need to be moved manually during shooting, which improves the modeling of easily deformed objects.
In the application, after an original image and a corresponding silhouette image are obtained, in order to improve the image segmentation efficiency, a first image of the original image under a first image resolution and a second image of the silhouette image under the first image resolution are obtained first, then the first image and the second image are subjected to image segmentation, and a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image are obtained; in order to achieve the pixel-level segmentation effect under the original image resolution, after a first segmentation result and a second segmentation result are obtained, performing image interpolation processing on the first segmentation result and the second segmentation result to obtain a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result; finally, a target segmentation result is obtained according to the third segmentation result and the fourth segmentation result.
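For orientation, the following is a minimal sketch of the pipeline just described, assuming OpenCV and two callables, predict and refine, that stand in for the trained prediction model and the trained optimization model (their names and interfaces are illustrative assumptions, not part of the application):

```python
import cv2

def segment_pipeline(original, silhouette, predict, refine, low=512):
    """Segment at low resolution, interpolate both results back to the
    original resolution, aggregate them, then refine the aggregate."""
    h, w = original.shape[:2]
    first = cv2.resize(original, (low, low), interpolation=cv2.INTER_AREA)
    second = cv2.resize(silhouette, (low, low), interpolation=cv2.INTER_AREA)
    r1, r2 = predict(first), predict(second)        # first/second results
    third = cv2.resize(r1, (w, h), interpolation=cv2.INTER_LINEAR)
    fourth = cv2.resize(r2, (w, h), interpolation=cv2.INTER_LINEAR)
    aggregated = ((third.astype("float32") + fourth) / 2).astype("uint8")
    return refine(silhouette, aggregated)           # target segmentation result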
After image interpolation processing is performed on the first and second segmentation results to obtain the third and fourth segmentation results, the edges of the interpolated results are repaired by a matting network (a refinement segmentation network, an example of the trained image segmentation result optimization model): based on the silhouette image, the network re-judges whether the edge jaggies produced by interpolation belong to the foreground or the background and removes them, so that pixel-level segmentation is realized at high image resolution and a high-precision segmentation effect at high resolution is achieved. The matting network works as follows. It takes two kinds of input: the silhouette image at the original image resolution, and the interpolated segmentation results (i.e., the third segmentation result and the fourth segmentation result). The third and fourth segmentation results can each be divided into three parts according to the values of their pixel points (which lie between 0 and 255): a foreground region, a background region, and an uncertain region. The foreground and background regions of the interpolated results can then be used to determine which parts of the silhouette image are foreground, which are background, and which are uncertain; whether each uncertain pixel belongs to the foreground or the background is judged from the colors of the foreground and background regions of the silhouette image, thereby optimizing away the jaggies in the segmentation result.
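As a hedged illustration of the three-way split described above, the following sketch builds such a trimap from an interpolated 0-255 result; the two cut-off values are illustrative assumptions, not values from the application:

```python
import numpy as np

def make_trimap(result: np.ndarray, lo: int = 30, hi: int = 225) -> np.ndarray:
    """Split an interpolated 0-255 segmentation result into background (0),
    foreground (255), and an uncertain band (128) for the matting network."""
    trimap = np.full(result.shape, 128, dtype=np.uint8)  # uncertain by default
    trimap[result <= lo] = 0                              # confident background
    trimap[result >= hi] = 255                            # confident foreground
    return trimap
```

The matting network then only has to decide, from the foreground and background colors of the silhouette image, which side each uncertain pixel belongs to.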
In order to facilitate understanding of the above image processing method, please refer to fig. 1, which is a schematic view of a scenario of the image processing method of the present application.
In this scenario, the image processing method is performed at a server by way of example. The server is a computing device that provides services such as data processing and storage for user terminals; in general, it may be a single server or a server cluster. A user terminal is typically an electronic device that a user operates directly.
In the present application, referring specifically to fig. 1, the following process is performed by the server: firstly, obtaining an original image and a corresponding silhouette image; then, a first image of the original image under the first image resolution and a second image of the silhouette image under the first image resolution are obtained, image segmentation is carried out on the first image and the second image, and a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image are obtained; then, performing image interpolation processing on the first segmentation result and the second segmentation result to obtain a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result; finally, a target segmentation result is obtained according to the third segmentation result and the fourth segmentation result.
When the terminal needs to obtain the image segmentation result, the terminal sends a request message for requesting to obtain the image segmentation result to the server, and after the server obtains the request message, the server obtains a target segmentation result according to the image processing method based on the request message and provides the target segmentation result to the terminal.
Fig. 1 above is a schematic diagram of one application scenario of the image processing method of the present application; it is provided merely to facilitate understanding of the method and does not limit it. Other application scenarios of the image processing method are not described again in the embodiments of the present application.
First embodiment
A first embodiment of the present application provides an image processing method, which is described below with reference to fig. 2 to 10. For applicable scenarios of the image processing method and for some of the examples in this embodiment, refer to the scenario embodiment described above.
Fig. 2 is a flowchart of an image processing method according to a first embodiment of the present application.
The image processing method of the embodiment of the application comprises the following steps.
Step S201: obtaining an original image for representing an object, a silhouette image for representing the object; the silhouette image is an image corresponding to the original image.
In the present embodiment, as the first step, an original image representing the object and a silhouette image representing the object are obtained before any further image processing.
Specifically, for ease of understanding the original image and the silhouette image in this embodiment, please refer to fig. 3, a schematic diagram of both. The left column in fig. 3 corresponds to the original image and the right column to the silhouette image. That the silhouette image is an image corresponding to the original image actually means: the original image and its corresponding silhouette image share an identical shooting angle and an identical placement of the object, and differ only in the color of the light emitted by the lighting system during shooting.
Step S202: a first image of the original image at a first image resolution and a second image of the silhouette image at the first image resolution are obtained.
In this embodiment, to improve robustness with respect to the silhouette image, so that a good segmentation result can still be obtained even when the silhouette image is poorly shot, a deep learning model is used to predict the object segmentation results of the original image and the silhouette image; the object segmentation result is used to separate the object region from the image. The value of each pixel point in the object segmentation result lies between 0 and 255, and the result can be converted into a binary image with a preset threshold: for example, pixel points whose value (the predicted value of the pixel point) is greater than or equal to 128 are taken as foreground and their gray values set to 1, while pixel points whose value is less than 128 are taken as background and their gray values set to 0, yielding a binary image. In short, the object segmentation result can subsequently be displayed as a binary image: pixels of the object region are represented by one gray value (a first gray value, e.g., 1, or 255), and pixels of the non-object region by another gray value (a second gray value, e.g., 0).
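A minimal sketch of this thresholding, assuming NumPy (the function name and gray values are illustrative):

```python
import numpy as np

def binarize_prediction(pred: np.ndarray, threshold: int = 128,
                        fg: int = 255, bg: int = 0) -> np.ndarray:
    """Turn a 0-255 per-pixel prediction map into a binary image: values at
    or above the preset threshold become foreground, the rest background."""
    return np.where(pred >= threshold, fg, bg).astype(np.uint8)
```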
In order to reduce video memory usage during model training, this embodiment uses a two-stage model (a trained image segmentation result prediction model and a trained image segmentation result optimization model) to segment the image. An initial image segmentation result prediction model is first trained at the low image resolution of 512×512 (512×512 is an example of the first image resolution), which reduces the video memory footprint and time cost of training. Likewise, when the trained image segmentation result prediction model is used to predict the segmentation results of the original image and the silhouette image, prediction must also first be performed on low-resolution images. It is therefore necessary to obtain the first image of the original image at the first image resolution and the second image of the silhouette image at the first image resolution, so that the segmentation results can be predicted under the low image resolution condition.
In this embodiment, the image is in fact segmented at a low image resolution, and the segmentation result at the original image resolution is then obtained by image interpolation processing, which helps improve image segmentation efficiency. At the same time, compared with existing image segmentation approaches, this approach is more widely applicable and places no restriction on object categories.
In existing image segmentation approaches, models trained on different data sets differ in the object categories they can handle; for example, a model trained on a data set covering 80 categories (people, vehicles, and so on) cannot segment objects of categories outside that data set. Existing approaches are based on semantic segmentation models: assuming 80 categories, each pixel point of the final predicted segmentation result has 80 channels, each channel carrying a value that represents one category, and the pixel is assigned to the category whose channel has the largest value. For example, if channel 1 is person and channel 2 is car, and channel 1 has the largest value in the prediction, the pixel point is classified as person. Such approaches cannot predict objects beyond those 80 categories. In contrast, the segmentation result output by the present application has only one channel; foreground and background are distinguished by a preset threshold, object categories in the image are not classified, and the image to be segmented is not restricted by category.
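The contrast can be made concrete with a small sketch (random arrays stand in for real model outputs; the shapes and the 80-category figure follow the example above):

```python
import numpy as np

# Multi-class semantic segmentation: 80 channels per pixel, one per category;
# a pixel is assigned to the category whose channel has the largest value.
semantic_logits = np.random.rand(80, 64, 64)        # (categories, H, W)
class_map = semantic_logits.argmax(axis=0)          # per-pixel category id

# Class-agnostic segmentation as used here: a single 0-255 channel split
# into foreground/background by a preset threshold, with no category at all.
single_channel = np.random.randint(0, 256, (64, 64))
foreground_mask = single_channel >= 128
```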
Step S203: and respectively carrying out image segmentation on the first image and the second image to obtain a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image.
In this embodiment, the first segmentation result is used to indicate that each pixel point in the first image belongs to a foreground or a background, and the second segmentation result is used to indicate that each pixel point in the second image belongs to a foreground or a background. As an example, the first segmentation result and the second segmentation result are subsequently represented by using binary images (i.e., in order to distinguish between the foreground and the background in the images, pixels corresponding to the foreground and the background are distinguished and represented by using different gray values).
In this embodiment, after the first image and the second image are obtained and the trained image segmentation result prediction model is available, one embodiment of respectively performing image segmentation on the first image and the second image to obtain a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image is as follows:
and respectively taking the first image and the second image as input data of a trained image segmentation result prediction model, and inputting the input data into the trained image segmentation result prediction model to obtain a first segmentation result and a second segmentation result.
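A hedged sketch of this first stage, assuming OpenCV and treating the trained prediction model as a callable returning a 0-255 map (that interface is an assumption, not something the application specifies):

```python
import cv2
import numpy as np

def predict_low_res(image: np.ndarray, model, size: int = 512) -> np.ndarray:
    """Resize the input to the low training resolution (512x512 in the
    embodiment) and run the trained segmentation result prediction model."""
    low = cv2.resize(image, (size, size), interpolation=cv2.INTER_AREA)
    return model(low)   # per-pixel foreground score, 0-255, at 512x512
```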
In particular, for easy understanding of the above-mentioned process of segmenting an image using a two-stage model, please refer to fig. 4, which is a schematic diagram of the process of segmenting an image using a two-stage model. In fig. 4, first, a trained image segmentation result prediction model is used to segment a first image and a second image, so as to obtain a first segmentation result and a second segmentation result; after that, step S204 is performed.
Step S204: and respectively carrying out image interpolation processing on the first segmentation result and the second segmentation result based on the original image resolution of the original image and the silhouette image, and obtaining a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result.
In this embodiment, the original image resolution is greater than the first image resolution.
In order to realize the segmentation of the original image and the silhouette image under the high image resolution condition, after the first segmentation result and the second segmentation result are obtained, image interpolation processing is performed on them to obtain a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result. In fact, the third segmentation result corresponds to the segmentation result of the original image at the original image resolution, and the fourth segmentation result corresponds to the segmentation result of the silhouette image at the original image resolution.
As an example, after the first segmentation result and the second segmentation result are obtained, they are interpolated to the high image resolution of 6000×4000 (6000×4000 may be an example of the original image resolution) to obtain the third segmentation result and the fourth segmentation result. Image interpolation generates a high-resolution image from a low-resolution image in order to recover information lost in the image. The main image interpolation methods include nearest neighbor interpolation, bilinear interpolation, biquadratic interpolation, bicubic interpolation, and other higher-order methods; any of them may be used in this embodiment. Image interpolation in effect enlarges the segmentation result at low image resolution up to the high image resolution to obtain a high-resolution segmentation result, but this process causes jagged edges to appear in the corresponding image. Interpolation is a way of enlarging an image: for example, if the original is only 512 pixels across and is enlarged to 6000, many pixel points are added, and the values at the added pixel points (the predicted values there) are unknown and must be estimated from the values of the known pixel points.
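For instance, with OpenCV the interpolation step might look like the following (bilinear is used here, but any of the listed methods could be substituted):

```python
import cv2
import numpy as np

def upscale_result(low_res: np.ndarray, width: int = 6000,
                   height: int = 4000) -> np.ndarray:
    """Interpolate a low-resolution segmentation result up to the original
    image resolution; 6000x4000 is the example given in the text."""
    return cv2.resize(low_res, (width, height),
                      interpolation=cv2.INTER_LINEAR)
```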
After the third segmentation result and the fourth segmentation result are obtained, since the segmentation is performed using two images (the original image and the silhouette image), in order to facilitate the representation of the segmentation result, an aggregation process (which may be regarded as a fusion process) is required for the third segmentation result and the fourth segmentation result, and the target segmentation result is subsequently obtained.
Step S205: and obtaining a target segmentation result according to the third segmentation result and the fourth segmentation result.
Specifically, obtaining the target segmentation result according to the third segmentation result and the fourth segmentation result may refer to: first, performing aggregation processing on the third segmentation result and the fourth segmentation result to obtain an aggregated segmentation result; then, taking the aggregated segmentation result as input data of the trained image segmentation result optimization model and inputting it into that model to obtain the target segmentation result.
Performing aggregation processing on the third segmentation result and the fourth segmentation result to obtain an aggregated segmentation result may refer to averaging the two results. Specifically, the value of each pixel point in the third and fourth segmentation results lies between two values (for example, between 0 and 255), and the pixel points in the two results correspond to each other; therefore, when aggregating, for a given pixel point, its value in the third segmentation result and its value in the fourth segmentation result can be averaged to obtain the aggregate value of that pixel point. Applying this to all pixel points gives the aggregated segmentation result. The pixel values, which lie between 0 and 255, are prediction result values (i.e., predicted values); they amount to a score output by the model for how likely each pixel point belongs to the foreground, with higher values indicating a pixel more likely to be foreground.
Of course, during aggregation, for a given pixel point the maximum or minimum of its values in the third and fourth segmentation results may be taken as its aggregate value instead. Whether to use the average, the maximum, or the minimum is determined according to the position of the camera that captured the original image and the silhouette image. The average, maximum, and minimum listed above are all merely examples of the aggregation processing; this embodiment does not restrict the aggregation method.
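A sketch of the per-pixel aggregation, covering the three variants mentioned (which one applies depends on the camera position):

```python
import numpy as np

def aggregate(third: np.ndarray, fourth: np.ndarray,
              mode: str = "mean") -> np.ndarray:
    """Combine the two interpolated segmentation results pixel by pixel."""
    if mode == "mean":
        return ((third.astype(np.float32) + fourth) / 2).astype(np.uint8)
    if mode == "max":
        return np.maximum(third, fourth)
    return np.minimum(third, fourth)
```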
In this embodiment, since the image interpolation processing described above may cause jagged edges in the image corresponding to the aggregated segmentation result, the aggregated segmentation result can be used as input data of the trained image segmentation result optimization model to remove the jagged edges and obtain a jag-free target segmentation result.
As an example, the aggregated segmentation result may be input to a matting network (an example of the trained image segmentation result optimization model) for repair, removing the jagged edges in the image corresponding to the aggregated segmentation result and thereby achieving a high-resolution segmentation effect. The jagged edges are caused by enlarging a low-resolution image to a high-resolution one (i.e., by the image interpolation process), and edge repair restores them. Specifically, the matting network re-evaluates the values of the pixel points in the edge region of the aggregated segmentation result to decide whether each such pixel point belongs to the foreground or the background (the edge region may be an uncertain region, not yet determined to be foreground or background), thereby eliminating the jagged edges.
The matting network divides the aggregated segmentation result into three parts, namely foreground, background, and an uncertain region; it can judge whether the uncertain region belongs to the foreground or the background from the foreground and background information, and thereby repair the jagged edges in the image corresponding to the aggregated segmentation result.
Some objects may have a hollowed-out area. When the hollowed-out area occupies only a small part of the original image or the silhouette image, it may be mistakenly treated as part of the object and assigned to the foreground during image segmentation, when in fact it should be part of the background. Specifically, please refer to fig. 5, a schematic diagram of the first segmentation scenario: the bags in the original image and the silhouette image of the first row each include a hollowed-out area, yet the corresponding object segmentation result has no hollowed-out area; the bags in the original image and the silhouette image of the third row likewise include hollowed-out areas, and the corresponding target segmentation result again has none.
In addition, at some shooting angles the captured original image and silhouette image may also include a reflection region of the object, and this reflection region may be mistakenly treated as part of the object region and assigned to the foreground during segmentation, when in fact it should be part of the background. Specifically, please refer to fig. 6, a schematic diagram of the second segmentation scenario: the silhouette image includes the reflection region, and the corresponding target segmentation result includes the reflection region as part of the object. In the schematic of the target segmentation result, the white part represents the object (i.e., the foreground), the gray value of the pixel points in the object region being a first gray value (e.g., 1); the black part represents the background, the gray value of the pixel points in the background region being a second gray value (e.g., 0). It will be understood that the binary image in the target segmentation result of fig. 6 is likewise obtained from the previously predicted value of each pixel point in the silhouette image: pixel points whose predicted value is greater than or equal to a preset threshold are taken as foreground with gray value set to 1, and pixel points whose predicted value is below the threshold are taken as background with gray value set to 0, yielding the binary image in the target segmentation result.
Therefore, after the target segmentation result is obtained, the method further includes: optimizing the target segmentation result to obtain an optimized target segmentation result. The hollowed-out area and the reflection area in the foreground of the target segmentation result are removed by this optimization.
When the hollowed-out area in the foreground of the target segmentation result needs to be removed, optimizing the target segmentation result to obtain an optimized target segmentation result can proceed as follows: first, determining a first target background area including the hollowed-out area in the silhouette image based on the gray value distribution of the gray map corresponding to the silhouette image; then, optimizing the target segmentation result according to the first target background area to obtain the optimized target segmentation result.
In this embodiment, the pixel points of the original image and of the silhouette image carry values (each in the range 0-255) for the three RGB color channels (Red, Green, Blue); the corresponding gray map is the single-channel image converted from those three color channels, and in the gray map each pixel point also takes a value in the range 0-255.
More specifically, determining the first target background area including the hollowed-out area in the silhouette image based on the gray value distribution of the gray map corresponding to the silhouette image may refer to: first, determining a first histogram representing the gray value distribution of the pixel points in the gray map from the gray values of those pixel points, the first histogram representing the correspondence between each gray value occurring in the gray map and the number of pixel points having that gray value; and then determining, from the first histogram, the first target background area including the hollowed-out area in the silhouette image.
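A minimal sketch of building the gray map and the first histogram with OpenCV and NumPy (the file path is illustrative):

```python
import cv2
import numpy as np

silhouette = cv2.imread("silhouette.png")              # illustrative path
gray = cv2.cvtColor(silhouette, cv2.COLOR_BGR2GRAY)    # single channel, 0-255
hist = np.bincount(gray.ravel(), minlength=256)        # pixel count per gray value
```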
This embodiment further includes: setting, according to the first target background area, the predicted values of the pixel points corresponding to the first target background area in the silhouette image to a first gray value, and the predicted values of the pixel points corresponding to the first target foreground area to a second gray value, to obtain a first binary image. In this embodiment, as an example, the first gray value is 1 and the second gray value is 0.
After the first binary image is obtained, one way of optimizing the target segmentation result according to the first target background area to obtain the optimized target segmentation result is: in the target segmentation result, setting the gray value of every pixel point whose gray value in the first binary image is the first gray value to the second gray value, thereby obtaining the optimized target segmentation result. Here the target segmentation result is a second binary image in which the predicted values of the pixel points corresponding to a first initial foreground region are set to the first gray value and the predicted values of the pixel points corresponding to a first initial background region are set to the second gray value, where the first initial background region is a background region that does not include the hollowed-out area, the first initial foreground region is a foreground region that includes the hollowed-out area, and the first target foreground region is a foreground region that does not include the hollowed-out area.
To facilitate understanding of the above process of removing the hollowed-out area from the foreground of the target segmentation result, please refer to fig. 7, a schematic diagram of that process. First, the silhouette image is converted into a gray map; then, the distribution of the gray values of the pixel points in the gray map is established from the first histogram statistics. In the silhouette image, the background area is brighter than the object area, so the gray values of the background area's pixel points in the first histogram are larger than those of the object area; the color of the hollowed-out area is always consistent with the color of the background, so the range of the entire background (including the hollowed-out area) can be determined from the gray value interval, which places the hollowed-out area in the background area. Third, the predicted values of the pixel points in the silhouette image are binarized according to the first target background area to obtain the first binary image. Finally, the gray values of some pixel points in the target segmentation result are re-binarized according to the first binary image, yielding the optimized target segmentation result.
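The final re-binarization step can be sketched as follows, assuming foreground pixels carry the first gray value (1) and background pixels the second gray value (0), as in the example above:

```python
import numpy as np

def remove_hollow(target: np.ndarray, first_binary: np.ndarray,
                  first_gray: int = 1, second_gray: int = 0) -> np.ndarray:
    """Force every pixel marked as first target background (first gray value
    in the first binary image, which includes the hollowed-out area) to the
    background value in the target segmentation result."""
    optimized = target.copy()
    optimized[first_binary == first_gray] = second_gray
    return optimized
```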
To facilitate understanding of the first histogram and of how the first target background area including the hollowed-out area is determined from it, please refer to fig. 8, which is a schematic diagram of the first histogram. The box outlined on the first histogram corresponds to the gray value range of the first target background area. The abscissa of the first histogram represents the gray value and the ordinate represents the number of pixel points taking that gray value; the pixel counts are, of course, computed on the gray map.
Specifically, determining, according to the first histogram, a first target background area including a hollowed-out area in the silhouette image may refer to: firstly, determining a preset gray value interval; then, based on the gray value interval, determining a gray upper limit value and a gray lower limit value corresponding to the first target background area in the first histogram; and finally, determining a first target background area comprising the hollowed-out area in the silhouette image according to the gray upper limit value and the gray lower limit value.
More specifically, determining the gray upper limit value and the gray lower limit value corresponding to the first target background area in the first histogram based on the gray value interval may proceed as follows. First, traverse the first histogram downwards from the upper limit of the gray value interval, in order of gray value from large to small; if the gray gradient value corresponding to the currently traversed gray value is larger than a preset first gradient threshold value, determine that gray value as the gray upper limit value of the first target background area. Then, traverse the first histogram downwards from the lower limit of the gray value interval, again in order of gray value from large to small; if the gray gradient value corresponding to the currently traversed gray value is smaller than a preset second gradient threshold value and the number of pixel points corresponding to that gray value is smaller than a preset pixel number threshold value, determine that gray value as the gray lower limit value of the first target background area. Here the gray gradient value denotes the ratio of the change in the number of pixel points to the change in gray value.
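The traversal just described can be sketched as follows; the gray value interval and all threshold values are hypothetical examples, not values fixed by the patent:

```python
def find_background_gray_bounds(hist, interval=(150, 200),
                                first_grad_thresh=50.0, second_grad_thresh=5.0,
                                pixel_count_thresh=100):
    lo, hi = interval
    upper = lower = None

    # Gray gradient value at v: change in pixel count per unit change in gray
    # value, taken between neighbouring bins while walking from large to small.
    def gradient(v):
        return abs(float(hist[v]) - float(hist[v + 1]))

    # Upper limit: walk down from the interval's upper limit until the
    # gradient exceeds the first gradient threshold.
    for v in range(hi, lo, -1):
        if gradient(v) > first_grad_thresh:
            upper = v
            break

    # Lower limit: walk down from the interval's lower limit until the
    # gradient is small and the pixel count is below the threshold.
    for v in range(lo, 0, -1):
        if gradient(v) < second_grad_thresh and hist[v] < pixel_count_thresh:
            lower = v
            break

    return lower, upper
```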
In other words, in the process of determining the first target background area including the hollowed-out area in the silhouette image according to the first histogram, the area is actually determined by combining valley detection, gradient detection and preset threshold values.
In this embodiment, a preset gray value interval is set correspondingly for different cameras, and when an image shot by a certain camera is segmented, a first target background area including a hollowed-out area is determined by adopting the preset gray value interval corresponding to the camera. For example, the preset gray value interval may be a gray value between 150 and 200.
When the gray upper limit value and the gray lower limit value corresponding to the first target background area are determined, suppose the preset gray value interval is 150 to 200 and refer to fig. 8, reading the abscissa from right to left. Above a gray value of 200 there are essentially no pixel points, so the slope of the curve in fig. 8 (i.e., the gray gradient value) is small; near a gray value of 180 the slope becomes large, i.e., the gray gradient value is very large, so the gray upper limit value (180) of the first target background area can be determined from the gray gradient value. The gray lower limit value (140) is determined on the same principle, except that a preset pixel number threshold value is combined: the lower limit is reached when the gray gradient value corresponding to a gray value is smaller than the preset second gradient threshold value and the corresponding number of pixel points is smaller than the preset pixel number threshold value, as happens near the gray value 140 in fig. 8. It can also be seen from fig. 8 that the number of pixel points corresponding to gray values 100-150 is small, so this range can be regarded as the demarcation between foreground and background.
In general, the gray values of the pixel points in the foreground area of the image are small and concentrated in the range 0-100, while those of the background area are roughly above 150; the first histogram makes it convenient to obtain the number of pixel points for each gray value. Valley detection serves to locate the demarcation between the background and foreground areas; as fig. 8 shows, the gray values 100-150 correspond to that demarcation.
To facilitate understanding of the segmentation result with the hollowed-out area removed from the foreground, please refer to fig. 9, which is a schematic diagram of that result. The optimized first target segmentation result in fig. 9 is the segmentation result after the hollowed-out area has been removed from the foreground of the target segmentation result; compared with the target segmentation result, its foreground area no longer contains the hollowed-out area and therefore matches the actual object more closely. In fig. 3, fig. 5 and fig. 9 of the drawings, the original image has been converted to grayscale to satisfy drawing requirements; in practice the original image may be a color image.
When the reflection area in the foreground of the target segmentation result needs to be removed, optimizing the target segmentation result to obtain an optimized target segmentation result may refer to: firstly, determining a target reflection area in a second initial foreground area according to the gray value distribution of the pixel points of the second initial foreground area in the gray map corresponding to the silhouette image, the second initial foreground area being the area confirmed as foreground in the target segmentation result. In fact, the second initial foreground area is substantially the same as the first initial foreground area: both refer to the area identified as foreground in the target segmentation result, and different names are used only to distinguish the first case, where that area includes a hollowed-out area, from the second case, where it includes a reflection area. Then, the target segmentation result is optimized according to the target reflection area to obtain the optimized target segmentation result.
More specifically, as one way of determining the target reflection area in the second initial foreground area from the gray value distribution of the pixel points of the second initial foreground area in the gray map corresponding to the silhouette image: firstly, the gray map corresponding to the silhouette image is determined; then, according to the gray map and the target segmentation result, a second histogram representing the gray value distribution of the pixel points of the second initial foreground area in the gray map is determined, the second histogram representing the correspondence between each gray value taken by those pixel points and the number of pixel points taking that gray value; finally, the target reflection area in the second initial foreground area is determined from the second histogram.
To facilitate understanding of the above process of removing the reflection area from the foreground of the target segmentation result, please refer to fig. 10, which is a schematic diagram of this process.
Firstly, the silhouette image is converted into a gray map, and a second histogram of the gray map over the second initial foreground area is obtained. The gray values of the reflection area are larger than those of the object area, so the gray demarcation point between object and reflection can be determined from the second histogram. The initial reflection area is then determined by extracting, on the target segmentation result, the area whose gray values exceed this demarcation point. In fact, the initial reflection area also contains the highlight area of the object: the highlight area is part of the object area rather than a reflection, but may be misjudged as one because its gray values are larger than those of the non-highlight part of the object. The highlight area therefore has to be restored to the object area in the subsequent processing, i.e., it is removed from the initial reflection area, and the target reflection area is thereby determined.
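Under the same caveat — the valley-search window and all names are assumptions, since the patent does not fix how the demarcation point is chosen — this step might look like:

```python
import numpy as np

def initial_reflection_area(gray, target_seg):
    """gray: gray map of the silhouette image (uint8); target_seg: second
    binary image with 1 = second initial foreground region. Returns the
    initial reflection area (still containing any highlight area)."""
    fg_values = gray[target_seg == 1]
    second_histogram = np.bincount(fg_values, minlength=256)
    # Hypothetical valley search for the object/reflection demarcation point.
    demarcation = int(np.argmin(second_histogram[50:200])) + 50
    reflection = ((target_seg == 1) & (gray > demarcation)).astype(np.uint8)
    return reflection, demarcation
```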
The highlight area is removed from the third binary image corresponding to the initial reflection area, yielding a binary image containing only the target reflection area; finally, the optimized second target segmentation result is determined from this binary image and the second binary image corresponding to the target segmentation result.
To facilitate understanding of the segmentation result with the reflection area removed from the foreground, please refer to fig. 11, which is a schematic diagram of that result. The optimized second target segmentation result in fig. 11 is the segmentation result after the reflection area has been removed from the foreground of the target segmentation result; compared with the target segmentation result, its foreground area no longer contains the reflection area and therefore matches the actual object more closely.
In this embodiment, determining the target reflection area in the second initial foreground area according to the second histogram may refer to: firstly, determining an initial reflection area in a second initial foreground area according to a second histogram; then, judging whether a non-reflection area exists in the initial reflection area; if so, removing the non-reflection area in the initial reflection area to determine the target reflection area.
In this embodiment, the target segmentation result is a second binary image in which the predicted value of the pixel corresponding to the second initial foreground region is set to the first gray value, and the predicted value of the pixel corresponding to the second initial background region is set to the second gray value.
In this embodiment, the method further comprises: according to the initial reflection area, setting the predicted value of the pixel points corresponding to the initial reflection area in the silhouette image to the first gray value, and setting the predicted value of the pixel points outside the initial reflection area to the second gray value, thereby obtaining a third binary image binarizing the predicted values of the pixel points of the silhouette image; meanwhile, a fourth binary image is obtained from the second binary image and the third binary image, in which the predicted value of the pixel points corresponding to the non-highlight area of the object is set to the first gray value and the predicted value of the remaining pixel points is set to the second gray value. To facilitate understanding of the third and fourth binary images, please refer to fig. 12, which is a schematic diagram of both in this embodiment. In this embodiment, the initial reflection area may include the target reflection area and the highlight area of the object. Obtaining the fourth binary image from the second and third binary images may refer to: subtracting the third binary image from the second binary image, so that the foreground minus the reflection candidates leaves the non-highlight area of the object.
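As a sketch (placeholder arrays; values in {0, 1}, dtype uint8), the fourth binary image follows from a saturating subtraction:

```python
import cv2
import numpy as np

# second_binary: 1 inside the second initial foreground region
# third_binary:  1 inside the initial reflection area (reflection + highlight)
second_binary = np.zeros((480, 640), np.uint8)   # placeholder data
third_binary = np.zeros((480, 640), np.uint8)    # placeholder data

# Foreground minus reflection candidates leaves the non-highlight object
# region; cv2.subtract saturates at 0, so reflection pixels lying outside
# the foreground cannot produce negative values.
fourth_binary = cv2.subtract(second_binary, third_binary)
```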
After the third and fourth binary images are obtained, judging whether a non-reflection area exists in the initial reflection area may refer to: judging whether a non-reflection area exists in the initial reflection area according to the third binary image, the fourth binary image and a preset non-reflection area detection strategy.
To make the judgment of whether a non-reflection area exists in the initial reflection area more accurate, the contours remaining at the edges of the third binary image and the specified connected regions are removed, yielding a fifth binary image. To facilitate understanding of the fifth binary image, please refer to fig. 13, which is a schematic diagram of the fifth binary image in this embodiment.
The contours remaining at the edges of the third binary image and the specified connected regions are removed because, in this embodiment, bounding boxes are set on the basis of connected regions; if the thin lines connected at the edges were not removed, they could bridge small regions together during connected-region computation, making the range selected by the bounding boxes inaccurate.
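One way to perform this cleanup — the one-pixel border clearing and the area threshold are assumptions, since the patent only names the goal — is:

```python
import cv2
import numpy as np

def clean_third_binary(third_binary, min_area=200):    # min_area is hypothetical
    cleaned = third_binary.copy()
    # Clear a one-pixel frame so components bridged through the border break apart.
    cleaned[0, :] = cleaned[-1, :] = 0
    cleaned[:, 0] = cleaned[:, -1] = 0
    # Remove the specified (here: small) connected regions.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(cleaned, connectivity=8)
    for i in range(1, n):                              # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] < min_area:
            cleaned[labels == i] = 0
    return cleaned                                     # the fifth binary image
```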
After the fifth binary image is obtained, determining whether a non-reflection area exists in the initial reflection area according to the third binary image, the fourth binary image and a preset non-reflection area detection strategy may refer to: judging whether a non-reflection area exists in the initial reflection area or not based on a first bounding box in the fifth binary image, a second bounding box in the fourth binary image and a preset non-reflection area detection strategy, wherein the first bounding box is used for identifying an area formed by pixels with gray values of a first gray value in the fifth binary image, and the second bounding box is used for identifying an area formed by pixels with gray values of the first gray value in the fourth binary image. In order to facilitate understanding of the first bounding box in the fifth binary image and the second bounding box in the fourth binary image, please refer to fig. 14, which is a schematic diagram of the first bounding box in the fifth binary image and the second bounding box in the fourth binary image of the present embodiment.
The preset non-reflection area detection strategy may take the following form: based on the second bounding box and the first bounding box, judging whether the center of the reflection area (in theory the reflection area proper, containing no highlight area) lies below the center of the object area; whether the proportion of the overlapping area between the second bounding box and the first bounding box does not exceed a first ratio, and whether the proportion of the overlapping area within the reflection area does not exceed a second ratio; and whether the lower boundary of the reflection area lies below the lower boundary of the object area. As an example, in this embodiment the center of the reflection area must lie below the center of the object area; the overlap between the second and first bounding boxes must not exceed 60%, and the overlap must not account for more than 90% of the reflection area; meanwhile, the lower boundary of the reflection area must lie below the lower boundary of the object area.
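Under one plausible reading of the bounding boxes — obj_box being the second bounding box from the fourth binary image (non-highlight object region) and refl_box the first bounding box from the fifth binary image — the strategy can be sketched as follows, with boxes given as (x, y, w, h), y growing downward, and the 60% and 90% ratios taken from the example above:

```python
def is_reflection(obj_box, refl_box, first_ratio=0.6, second_ratio=0.9):
    ox, oy, ow, oh = obj_box
    rx, ry, rw, rh = refl_box
    # Condition 1: the center of the reflection area lies below the object center.
    if ry + rh / 2 <= oy + oh / 2:
        return False
    # Condition 2: the overlap stays within the first ratio of the object box
    # and within the second ratio of the reflection area.
    iw = max(0, min(ox + ow, rx + rw) - max(ox, rx))
    ih = max(0, min(oy + oh, ry + rh) - max(oy, ry))
    overlap = iw * ih
    if overlap > first_ratio * ow * oh or overlap > second_ratio * rw * rh:
        return False
    # Condition 3: the reflection's lower boundary is below the object's.
    return ry + rh > oy + oh
```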
In this way, the binary image containing only the target reflection area in fig. 10 can be determined. This binary image is one in which the predicted value of the pixel points of the target reflection area in the silhouette image is set to the first gray value and the predicted value of the remaining pixel points is set to the second gray value.
In fact, in this embodiment, in the process of obtaining the optimized first target segmentation result or the optimized second target segmentation result, after the hollowed-out area or the ghost area is removed, the intermediate segmentation result with the hollowed-out area or the ghost area removed may also be input into the trained image segmentation result optimization model, so as to finally obtain the optimized first target segmentation result or the optimized second target segmentation result.
The application provides an image processing method in which, after an original image and a silhouette image are obtained, image segmentation is performed at a first image resolution; after the first and second segmentation results are obtained, image interpolation processing is applied directly to them to obtain a third segmentation result corresponding to the first segmentation result at the original image resolution and a fourth segmentation result corresponding to the second segmentation result at the original image resolution, the original image resolution being greater than the first image resolution. Image segmentation is thus performed at low image resolution, the low-resolution segmentation results are converted into high-resolution segmentation results by image interpolation, and the target segmentation result is then obtained on the basis of the high-resolution results, which further improves the segmentation effect and, through the high image resolution, achieves image segmentation at the pixel level. Meanwhile, the segmentation approach adopted by the application places no category restriction on the objects in the image, improving the applicability of image segmentation.
Meanwhile, a high-precision, high-resolution segmentation effect is achieved with the 720-degree shooting equipment; high-precision segmentation of arbitrary objects is realized by predicting segmentation results with a two-stage deep learning model, with strong robustness to the background on which the object is placed, the lighting on the object, and the shooting angle. The image processing method of this embodiment reaches a pixel-level segmentation effect, with an overall IOU above 0.99352. The semantic segmentation IOU (Intersection over Union) is an index for evaluating the performance of a semantic segmentation model, representing the proportion of the overlap between the predicted segmentation result and the theoretical segmentation result relative to their union: the IOU value is obtained by dividing the intersection area of the predicted and theoretical segmentation results by their union area. The higher the IOU, the better the predicted result coincides with the theoretical result and the better the model performs; IOU values typically range from 0 to 1.
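For reference, the IOU between a predicted binary mask and a theoretical (ground-truth) mask can be computed as:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred and gt are binary masks with 1 = foreground."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / float(union) if union else 1.0
```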
Second embodiment
The second embodiment of the present application also provides an image processing apparatus corresponding to the image processing method provided in the first embodiment of the present application. Since the device embodiment is substantially similar to the first embodiment, the description is relatively simple, and reference is made to the partial description of the first embodiment for relevant points. The device embodiments described below are merely illustrative.
Fig. 15 is a schematic diagram of an image processing apparatus according to a second embodiment of the present application.
The image processing apparatus 1500 includes:
a first obtaining unit 1501 for obtaining an original image representing an object, a silhouette image representing an object; the silhouette image is an image corresponding to the original image;
a second obtaining unit 1502 configured to obtain a first image of the original image at a first image resolution and a second image of the silhouette image at the first image resolution;
a dividing unit 1503, configured to divide the first image and the second image to obtain a first division result corresponding to the first image and a second division result corresponding to the second image; the first segmentation result is used for indicating that each pixel point in the first image belongs to a foreground or a background, and the second segmentation result is used for indicating that each pixel point in the second image belongs to the foreground or the background;
an interpolation processing unit 1504, configured to perform image interpolation processing on the first segmentation result and the second segmentation result respectively, based on the original image resolution of the original image and the silhouette image, to obtain a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result; the original image resolution is greater than the first image resolution;
A target segmentation result obtaining unit 1505, configured to obtain a target segmentation result according to the third segmentation result and the fourth segmentation result.
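For orientation, the flow these units implement can be sketched end to end as follows; the low resolution, the channel-stacking aggregation, and the predict/optimize callables are assumptions standing in for details the patent leaves to the trained models:

```python
import cv2
import numpy as np

def segment(original, silhouette, predict, optimize, low_res=(512, 512)):
    """predict and optimize are placeholders for the trained image
    segmentation result prediction and optimization models."""
    h, w = original.shape[:2]                       # original image resolution
    first = cv2.resize(original, low_res)           # first image
    second = cv2.resize(silhouette, low_res)        # second image
    seg1, seg2 = predict(first), predict(second)    # first / second segmentation results
    # Image interpolation back to the original resolution.
    seg3 = cv2.resize(seg1, (w, h), interpolation=cv2.INTER_LINEAR)   # third result
    seg4 = cv2.resize(seg2, (w, h), interpolation=cv2.INTER_LINEAR)   # fourth result
    aggregated = np.stack([seg3, seg4], axis=-1)    # aggregation processing
    return optimize(aggregated)                     # target segmentation result
```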
Optionally, the dividing unit is specifically configured to:
and respectively taking the first image and the second image as input data of a trained image segmentation result prediction model, and inputting the input data into the trained image segmentation result prediction model to obtain the first segmentation result and the second segmentation result.
Optionally, the target segmentation result obtaining unit is specifically configured to:
performing aggregation treatment on the third segmentation result and the fourth segmentation result to obtain an aggregation segmentation result;
and taking the aggregate segmentation result as input data of a trained image segmentation result optimization model, and inputting the input data into the trained image segmentation result optimization model to obtain the target segmentation result.
Optionally, the apparatus further comprises: an optimizing unit; the optimizing unit is specifically configured to optimize the target segmentation result to obtain an optimized target segmentation result.
Optionally, the optimizing unit is specifically configured to:
determining a first target background area comprising a hollowed-out area in the silhouette image based on gray value distribution of a gray level image corresponding to the silhouette image;
And optimizing the target segmentation result according to the first target background area to obtain an optimized target segmentation result.
Optionally, the optimizing unit is specifically configured to:
determining a first histogram for representing gray value distribution of each pixel point in the gray map according to the gray value of each pixel point in the gray map; the first histogram is used for representing the corresponding relation between each gray value corresponding to the pixel point in the gray map and the number of the pixel points of each gray value;
and determining a first target background area comprising a hollowed-out area in the silhouette image according to the first histogram.
Optionally, the apparatus further comprises: a first binary image obtaining unit; the first binary image obtaining unit is specifically configured to:
according to the first target background area, setting a predicted value of a pixel point corresponding to the first target background area in the silhouette image as a first gray level value, and setting a predicted value of a pixel point corresponding to the first target foreground area as a second gray level value, so as to obtain a first binary image;
the optimizing unit is specifically configured to:
setting, in the target segmentation result, the gray value of the pixel points whose gray value in the first binary image is the first gray value to the second gray value, and obtaining an optimized target segmentation result, wherein the target segmentation result is a second binary image with the predicted value of the pixel points corresponding to the first initial foreground region set to the first gray value and the predicted value of the pixel points corresponding to the first initial background region set to the second gray value, the first initial background region is a background region not comprising the hollowed-out region, the first initial foreground region is a foreground region comprising the hollowed-out region, and the first target foreground region is a foreground region not comprising the hollowed-out region.
Optionally, the optimizing unit is specifically configured to:
determining a preset gray value interval;
determining a gray upper limit value and a gray lower limit value corresponding to a first target background area in the first histogram based on the gray value interval;
and determining a first target background area comprising a hollowed-out area in the silhouette image according to the gray level upper limit value and the gray level lower limit value.
Optionally, the optimizing unit is specifically configured to:
based on the sequence of gray values from large to small in the first histogram, traversing downwards from the upper limit value of the gray value interval in the first histogram, and determining the first current traversing gray value as the gray upper limit value of the first target background area if the gray gradient value corresponding to the first current traversing gray value currently traversed in the first histogram is larger than a preset first gradient threshold value;
based on the sequence of gray values from large to small in the first histogram, traversing downwards in the first histogram from the lower limit value of the gray value interval, and determining the second current traversing gray value as the gray lower limit value of the first target background area if the gray gradient value corresponding to the second current traversing gray value traversed in the first histogram is smaller than a preset second gradient threshold value and the number of pixels corresponding to the second current traversing gray value is smaller than a preset pixel number threshold value;
The gray gradient value represents the ratio of the change amount of the gray value corresponding to the number of pixel points to the change amount of the gray value.
Optionally, the optimizing unit is specifically configured to:
determining a target reflection area in a second initial foreground area according to gray value distribution of pixel points of the second initial foreground area in a gray image corresponding to the silhouette image, wherein the second initial foreground area is an area confirmed as a foreground in the target segmentation result;
and optimizing the target segmentation result according to the target reflection region to obtain an optimized target segmentation result.
Optionally, the optimizing unit is specifically configured to:
determining a gray level map corresponding to the silhouette image;
determining a second histogram for representing gray value distribution of pixels of a second initial foreground region in the gray map according to the gray map and the target segmentation result; the second histogram is used for representing the corresponding relation between each gray value corresponding to the pixel point of the second initial foreground area in the gray map and the number of the pixel points of each gray value;
and determining a target reflection area in the second initial foreground area according to the second histogram.
Optionally, the optimizing unit is specifically configured to:
determining an initial reflection area in the second initial foreground area according to the second histogram;
judging whether a non-reflection area exists in the initial reflection area;
and if so, removing the non-reflection area in the initial reflection area, and determining the target reflection area.
Optionally, the target segmentation result is a second binary image in which a predicted value of a pixel point corresponding to the second initial foreground region is set to a first gray value, and a predicted value of a pixel point corresponding to the second initial background region is set to a second gray value;
the apparatus further comprises: an auxiliary binary image obtaining unit; the auxiliary binary image obtaining unit is specifically configured to:
according to the initial reflection area, setting a predicted value of the pixel points corresponding to the initial reflection area in the silhouette image to a first gray value, and setting a predicted value of the pixel points corresponding to the area outside the initial reflection area to a second gray value, so as to obtain a third binary image for binarizing the predicted values of the pixel points of the silhouette image;
according to the second binary image and the third binary image, a fourth binary image for binarizing the predicted value of the pixel point of the silhouette image is obtained, wherein the predicted value of the pixel point corresponding to the non-highlight region of the object in the fourth binary image is set to be a first gray level value, and the predicted value of the pixel point corresponding to the region except for the non-highlight region of the object is set to be a second gray level value;
The optimizing unit is specifically configured to:
and judging whether a non-reflection area exists in the initial reflection area according to the third binary image, the fourth binary image and a preset non-reflection area detection strategy.
Optionally, the method further comprises: the removing unit is specifically used for: removing the residual outline of the edge and the appointed communication region in the third binary image to obtain a fifth binary image;
the optimizing unit is specifically configured to:
judging whether a non-reflection area exists in the initial reflection area or not based on a first bounding box in the fifth binary image, a second bounding box in the fourth binary image and a preset non-reflection area detection strategy, wherein the first bounding box is used for marking an area formed by pixels with gray values being first gray values in the fifth binary image, and the second bounding box is used for marking an area formed by pixels with gray values being first gray values in the fourth binary image.
Third embodiment
The third embodiment of the present application also provides an electronic device corresponding to the method of the first embodiment of the present application.
As shown in fig. 16, fig. 16 is a schematic diagram of an electronic device according to a third embodiment of the present application.
In this embodiment, an optional hardware structure of the electronic device 1600 may be as shown in fig. 16, including: at least one processor 1601, at least one memory 1602 and at least one communication bus 1605; the memory 1602 includes a program 1603 and data 1604.
Bus 1605 may be a communication device that transfers data between components within the electronic device 1600, such as an internal bus (e.g., a CPU-memory bus, CPU being short for central processing unit) or an external bus (e.g., a universal serial bus port or a peripheral component interconnect express port).
In addition, the electronic device further includes: at least one network interface 1606 and at least one peripheral interface 1607. The network interface 1606 provides wired or wireless communication with an external network 1608 (e.g., the Internet, an intranet, a local area network, a mobile communication network, etc.); in some embodiments, the network interface 1606 may include any number of network interface controllers (NICs), radio frequency (RF) modules, transponders, transceivers, modems, routers, gateways, wired network adapters, wireless network adapters, Bluetooth adapters, infrared adapters, near field communication (NFC) adapters, cellular network chips, and the like, in any combination.
The peripheral interface 1607 is used to connect peripherals, such as peripheral 1 (1609 in fig. 16), peripheral 2 (1610 in fig. 16) and peripheral 3 (1611 in fig. 16). Peripherals, i.e., peripheral devices, may include, but are not limited to, cursor control devices (e.g., a mouse, touchpad or touchscreen), keyboards, displays (e.g., cathode ray tube displays, liquid crystal displays or light-emitting diode displays), video input devices (e.g., a video camera or an input interface communicatively coupled to a video archive), and the like.
The processor 1601 may be a CPU, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The memory 1602 may comprise high-speed RAM (Random Access Memory), and may also comprise non-volatile memory, such as at least one disk memory.
The processor 1601 invokes programs and data stored in the memory 1602 to perform a method according to the first embodiment of the present application.
Fourth embodiment
The fourth embodiment of the present application also provides a computer storage medium storing a computer program that is executed by a processor to perform the method of the first embodiment of the present application, corresponding to the method of the first embodiment of the present application.
While the application has been described in terms of preferred embodiments, it is not intended to be limiting, but rather, it will be apparent to those skilled in the art that various changes and modifications can be made herein without departing from the spirit and scope of the application as defined by the appended claims.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory media such as modulated data signals and carrier waves.
2. It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.

Claims (17)

1. An image processing method, comprising:
obtaining an original image for representing an object, a silhouette image for representing the object; the silhouette image is an image corresponding to the original image;
Obtaining a first image of the original image at a first image resolution and a second image of the silhouette image at the first image resolution;
respectively carrying out image segmentation on the first image and the second image to obtain a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image; the first segmentation result is used for indicating that each pixel point in the first image belongs to a foreground or a background, and the second segmentation result is used for indicating that each pixel point in the second image belongs to the foreground or the background;
respectively carrying out image interpolation processing on the first segmentation result and the second segmentation result based on the original image resolution of the original image and the silhouette image, and obtaining a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result; the original image resolution is greater than the first image resolution;
and obtaining a target segmentation result according to the third segmentation result and the fourth segmentation result.
2. The method according to claim 1, wherein the performing image segmentation on the first image and the second image to obtain a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image includes:
And respectively taking the first image and the second image as input data of a trained image segmentation result prediction model, and inputting the input data into the trained image segmentation result prediction model to obtain the first segmentation result and the second segmentation result.
3. The method of claim 1, wherein the obtaining a target segmentation result from the third segmentation result and the fourth segmentation result comprises:
performing aggregation treatment on the third segmentation result and the fourth segmentation result to obtain an aggregation segmentation result;
and taking the aggregate segmentation result as input data of a trained image segmentation result optimization model, and inputting the input data into the trained image segmentation result optimization model to obtain the target segmentation result.
4. The method as recited in claim 1, further comprising: and optimizing the target segmentation result to obtain an optimized target segmentation result.
5. The method of claim 4, wherein optimizing the target segmentation result to obtain an optimized target segmentation result comprises:
determining a first target background area comprising a hollowed-out area in the silhouette image based on gray value distribution of a gray level image corresponding to the silhouette image;
And optimizing the target segmentation result according to the first target background area to obtain an optimized target segmentation result.
6. The method of claim 5, wherein determining a first target background area in the silhouette image including a hollowed-out area based on a gray value distribution of a gray map corresponding to the silhouette image comprises:
determining a first histogram for representing gray value distribution of each pixel point in the gray map according to the gray value of each pixel point in the gray map; the first histogram is used for representing the corresponding relation between each gray value corresponding to the pixel point in the gray map and the number of the pixel points of each gray value;
and determining a first target background area comprising a hollowed-out area in the silhouette image according to the first histogram.
7. The method as recited in claim 6, further comprising:
according to the first target background area, setting a predicted value of a pixel point corresponding to the first target background area in the silhouette image as a first gray level value, and setting a predicted value of a pixel point corresponding to the first target foreground area as a second gray level value, so as to obtain a first binary image;
Optimizing the target segmentation result according to the first target background area to obtain an optimized target segmentation result, including:
setting, in the target segmentation result, the gray value of the pixel points whose gray value in the first binary image is the first gray value to the second gray value, and obtaining an optimized target segmentation result, wherein the target segmentation result is a second binary image with the predicted value of the pixel points corresponding to the first initial foreground region set to the first gray value and the predicted value of the pixel points corresponding to the first initial background region set to the second gray value, the first initial background region is a background region not comprising the hollowed-out region, the first initial foreground region is a foreground region comprising the hollowed-out region, and the first target foreground region is a foreground region not comprising the hollowed-out region.
8. The method of claim 6, wherein determining a first target background area in the silhouette image that includes a hollowed-out area according to the first histogram comprises:
determining a preset gray value interval;
determining a gray upper limit value and a gray lower limit value corresponding to a first target background area in the first histogram based on the gray value interval;
And determining a first target background area comprising a hollowed-out area in the silhouette image according to the gray level upper limit value and the gray level lower limit value.
9. The method of claim 8, wherein determining a gray upper limit value and a gray lower limit value for a first target background region in the first histogram based on the gray value interval comprises:
based on the sequence of gray values from large to small in the first histogram, traversing downwards from the upper limit value of the gray value interval in the first histogram, and determining the first current traversing gray value as the gray upper limit value of the first target background area if the gray gradient value corresponding to the first current traversing gray value currently traversed in the first histogram is larger than a preset first gradient threshold value;
based on the sequence of gray values from large to small in the first histogram, traversing downwards in the first histogram from the lower limit value of the gray value interval, and determining the second current traversing gray value as the gray lower limit value of the first target background area if the gray gradient value corresponding to the second current traversing gray value traversed in the first histogram is smaller than a preset second gradient threshold value and the number of pixels corresponding to the second current traversing gray value is smaller than a preset pixel number threshold value;
The gray gradient value represents the ratio of the change amount of the gray value corresponding to the number of pixel points to the change amount of the gray value.
10. The method of claim 4, wherein optimizing the target segmentation result to obtain an optimized target segmentation result comprises:
determining a target reflection area in a second initial foreground area according to gray value distribution of pixel points of the second initial foreground area in a gray image corresponding to the silhouette image, wherein the second initial foreground area is an area confirmed as a foreground in the target segmentation result;
and optimizing the target segmentation result according to the target reflection region to obtain an optimized target segmentation result.
11. The method according to claim 10, wherein the determining the target reflection area in the second initial foreground area according to the gray value distribution of the pixels of the second initial foreground area in the gray map corresponding to the silhouette image includes:
determining a gray level map corresponding to the silhouette image;
determining a second histogram for representing gray value distribution of pixels of a second initial foreground region in the gray map according to the gray map and the target segmentation result; the second histogram is used for representing the corresponding relation between each gray value corresponding to the pixel point of the second initial foreground area in the gray map and the number of the pixel points of each gray value;
And determining a target reflection area in the second initial foreground area according to the second histogram.
12. The method of claim 11, wherein the determining a target reflection area in the second initial foreground area from the second histogram comprises:
determining an initial reflection area in the second initial foreground area according to the second histogram;
judging whether a non-reflection area exists in the initial reflection area;
and if so, removing the non-reflection area in the initial reflection area, and determining the target reflection area.
13. The method of claim 12, wherein the target segmentation result is a second binary image in which a predicted value of a pixel corresponding to a second initial foreground region is set to a first gray level and a predicted value of a pixel corresponding to a second initial background region is set to a second gray level;
the method further comprises the steps of: according to the initial reflection area, setting a predicted value of the pixel points corresponding to the initial reflection area in the silhouette image to a first gray value, and setting a predicted value of the pixel points corresponding to the area outside the initial reflection area to a second gray value, so as to obtain a third binary image for binarizing the predicted values of the pixel points of the silhouette image;
According to the second binary image and the third binary image, a fourth binary image for binarizing the predicted value of the pixel point of the silhouette image is obtained, wherein the predicted value of the pixel point corresponding to the non-highlight region of the object in the fourth binary image is set to be a first gray level value, and the predicted value of the pixel point corresponding to the region except for the non-highlight region of the object is set to be a second gray level value;
the determining whether a non-reflection area exists in the initial reflection area includes:
and judging whether a non-reflection area exists in the initial reflection area according to the third binary image, the fourth binary image and a preset non-reflection area detection strategy.
14. The method as recited in claim 13, further comprising: removing the contours remaining at the edges and the specified connected regions in the third binary image to obtain a fifth binary image;
the judging whether a non-reflection area exists in the initial reflection area according to the third binary image, the fourth binary image and the preset non-reflection area detection strategy comprises the following steps:
judging whether a non-reflection area exists in the initial reflection area or not based on a first bounding box in the fifth binary image, a second bounding box in the fourth binary image and a preset non-reflection area detection strategy, wherein the first bounding box is used for marking an area formed by pixels with gray values being first gray values in the fifth binary image, and the second bounding box is used for marking an area formed by pixels with gray values being first gray values in the fourth binary image.
15. An image processing apparatus, comprising:
a first obtaining unit configured to obtain an original image representing an object, a silhouette image representing the object; the silhouette image is an image corresponding to the original image;
a second obtaining unit configured to obtain a first image of the original image at a first image resolution and a second image of the silhouette image at the first image resolution;
the segmentation unit is used for respectively carrying out image segmentation on the first image and the second image to obtain a first segmentation result corresponding to the first image and a second segmentation result corresponding to the second image; the first segmentation result is used for indicating that each pixel point in the first image belongs to a foreground or a background, and the second segmentation result is used for indicating that each pixel point in the second image belongs to the foreground or the background;
the interpolation processing unit is used for respectively carrying out image interpolation processing on the first segmentation result and the second segmentation result based on the original image resolution of the original image and the silhouette image, to obtain a third segmentation result corresponding to the first segmentation result and a fourth segmentation result corresponding to the second segmentation result; the original image resolution is greater than the first image resolution;
And the target segmentation result obtaining unit is used for obtaining a target segmentation result according to the third segmentation result and the fourth segmentation result.
16. An electronic device, comprising:
a processor;
a memory for storing a computer program to be run by a processor for performing the method of any one of claims 1-14.
17. A computer storage medium, characterized in that the computer storage medium stores a computer program, which is executed by a processor, for performing the method of any of claims 1-14.
CN202310844760.XA 2023-07-11 2023-07-11 Image processing method, device, electronic equipment and computer storage medium Active CN116612146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310844760.XA CN116612146B (en) 2023-07-11 2023-07-11 Image processing method, device, electronic equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN116612146A CN116612146A (en) 2023-08-18
CN116612146B true CN116612146B (en) 2023-11-17

Family

ID=87683872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310844760.XA Active CN116612146B (en) 2023-07-11 2023-07-11 Image processing method, device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN116612146B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681994B (en) * 2018-05-11 2023-01-10 京东方科技集团股份有限公司 Image processing method and device, electronic equipment and readable storage medium

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101231697A (en) * 2007-01-24 2008-07-30 三星电子株式会社 Apparatus and method of segmenting an image in an image coding and/or decoding system
CN102800094A (en) * 2012-07-13 2012-11-28 南京邮电大学 Fast color image segmentation method
CN109658365A (en) * 2017-10-11 2019-04-19 阿里巴巴集团控股有限公司 Image processing method, device, system and storage medium
CN109377499A (en) * 2018-09-12 2019-02-22 中山大学 A kind of Pixel-level method for segmenting objects and device
CN110363774A (en) * 2019-06-17 2019-10-22 上海联影智能医疗科技有限公司 Image partition method, device, computer equipment and storage medium
CN112602114A (en) * 2019-08-01 2021-04-02 京东方科技集团股份有限公司 Image processing method and device, neural network and training method, and storage medium
CN113393477A (en) * 2020-03-13 2021-09-14 上海哔哩哔哩科技有限公司 Image processing method and system
CN111583264A (en) * 2020-05-06 2020-08-25 上海联影智能医疗科技有限公司 Training method for image segmentation network, image segmentation method, and storage medium
CN113516662A (en) * 2021-01-08 2021-10-19 清华大学 Point cloud segmentation method and device based on multi-resolution fusion
CN113822851A (en) * 2021-06-15 2021-12-21 腾讯科技(深圳)有限公司 Image segmentation method, device, equipment and storage medium
CN113421189A (en) * 2021-06-21 2021-09-21 Oppo广东移动通信有限公司 Image super-resolution processing method and device and electronic equipment
CN113658163A (en) * 2021-08-24 2021-11-16 王程 High-resolution SAR image segmentation method based on multi-stage collaborative improved FCM
CN116129049A (en) * 2023-02-02 2023-05-16 阿里巴巴(中国)有限公司 Image processing method, apparatus, storage medium, and program product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A hybrid approach for image super-resolution of low depth of field images; Pankaj Mishra et al.; 2015 International Conference on Signal Processing and Communication (ICSC); full text *
Real-time semantic segmentation network guided by high- and low-dimensional features; 虞资兴 et al.; 计算机应用 (Computer Applications); full text *

Also Published As

Publication number Publication date
CN116612146A (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN110705583B (en) Cell detection model training method, device, computer equipment and storage medium
US11790499B2 (en) Certificate image extraction method and terminal device
Peng et al. Image haze removal using airlight white correction, local light filter, and aerial perspective prior
WO2015070723A1 (en) Eye image processing method and apparatus
WO2021083059A1 (en) Image super-resolution reconstruction method, image super-resolution reconstruction apparatus, and electronic device
CN111325717B (en) Mobile phone defect position identification method and equipment
CN116503388B (en) Defect detection method, device and storage medium
CN115880288B (en) Detection method, system and computer equipment for electronic element welding
CN116503414B (en) Screen defect detection method, device, computer equipment and storage medium
CN111368587A (en) Scene detection method and device, terminal equipment and computer readable storage medium
CN113052754B (en) Method and device for blurring picture background
CN111031359B (en) Video playing method and device, electronic equipment and computer readable storage medium
CN114677394A (en) Matting method, matting device, image pickup apparatus, conference system, electronic apparatus, and medium
CN113537037A (en) Pavement disease identification method, system, electronic device and storage medium
WO2020119454A1 (en) Method and apparatus for color reproduction of image
CN113888509A (en) Method, device and equipment for evaluating image definition and storage medium
CN111161299A (en) Image segmentation method, computer program, storage medium, and electronic device
CN110310341B (en) Method, device, equipment and storage medium for generating default parameters in color algorithm
CN116612146B (en) Image processing method, device, electronic equipment and computer storage medium
WO2024016715A1 (en) Method and apparatus for testing imaging consistency of system, and computer storage medium
CN109658360B (en) Image processing method and device, electronic equipment and computer storage medium
CN111539975A (en) Method, device and equipment for detecting moving target and storage medium
CN116403200A (en) License plate real-time identification system based on hardware acceleration
CN116993654A (en) Camera module defect detection method, device, equipment, storage medium and product
CN109141457A (en) Navigate appraisal procedure, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant