WO2022133627A1 - Image segmentation method and apparatus, and device and storage medium - Google Patents


Info

Publication number
WO2022133627A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
module
segmentation
label
features
Prior art date
Application number
PCT/CN2020/137858
Other languages
French (fr)
Chinese (zh)
Inventor
曹桂平
Original Assignee
广州视源电子科技股份有限公司
Application filed by 广州视源电子科技股份有限公司 filed Critical 广州视源电子科技股份有限公司
Priority to CN202080099096.5A (published as CN115349139A)
Priority to PCT/CN2020/137858 (WO2022133627A1)
Publication of WO2022133627A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation

Definitions

  • the embodiments of the present application relate to the technical field of image processing, and in particular, to an image segmentation method, apparatus, device, and storage medium.
  • Image segmentation is one of the common techniques in image processing. It is used to accurately extract the region of interest from the image to be processed and to use that region as the target region image, so as to facilitate subsequent processing of the target region image (such as background replacement, matting out the target region, etc.).
  • Portrait-based image segmentation is an important application in the field of image segmentation.
  • Portrait-based image segmentation refers to the accurate separation of the portrait area and the background area in the image to be processed.
  • It is of great significance to perform portrait-based image segmentation on online video data. In scenarios such as online conferences or online live broadcasts, image segmentation is performed on the online video data to accurately separate the portrait area and the background area, and the background area is then replaced with a new background image, so as to achieve the goal of protecting user privacy.
  • Image segmentation methods mainly include threshold-based, region-based, edge-based, and graph-theory- and energy-functional-based methods.
  • The threshold-based method segments according to the grayscale features of the image; its drawback is that it is only suitable for images in which the grayscale values of the portrait area fall uniformly outside the range of grayscale values of the background area.
  • the region-based method divides the image into different regions according to the similarity criterion of the spatial neighborhood, and its disadvantage is that it cannot handle complex images.
  • The edge-based method mainly uses the discontinuity of local image features (such as abrupt pixel changes at the edge of the portrait) to obtain the boundary of the portrait region, and its disadvantage is that the computational complexity is high.
  • The methods based on graph theory and energy functionals mainly use the energy functional of the image to perform portrait segmentation; their disadvantage is that the amount of calculation is huge and artificial prior information is required. Due to these defects, the above techniques cannot be applied to scenarios that require real-time, simple, and accurate image segmentation of online video data.
  • Embodiments of the present application provide an image segmentation method, apparatus, device, and storage medium, so as to solve the technical problem that the above technology cannot accurately perform image segmentation on online video data.
  • an embodiment of the present application provides an image segmentation method, including:
  • acquiring a current frame image in video data, where a target object is displayed in the video data;
  • inputting the current frame image into a trained image segmentation model to obtain a first segmented image based on the target object;
  • performing smoothing processing on the first segmented image to obtain a second segmented image based on the target object;
  • taking the next frame image in the video data as the current frame image, and returning to perform the operation of inputting the current frame image into the trained image segmentation model, until a corresponding second segmented image is obtained for each frame image in the video data.
  • an embodiment of the present application further provides an image segmentation device, including:
  • a data acquisition module, configured to acquire the current frame image in the video data, where the target object is displayed in the video data
  • a first segmentation module for inputting the current frame image into a trained image segmentation model to obtain a first segmented image based on the target object
  • a second segmentation module configured to perform smoothing processing on the first segmented image to obtain a second segmented image based on the target object
  • a repeated segmentation module, used for taking the next frame image in the video data as the current frame image and returning to perform the operation of inputting the current frame image into the trained image segmentation model, until a corresponding second segmented image is obtained for each frame image in the video data.
  • the embodiments of the present application further provide an image segmentation device, including:
  • one or more processors;
  • a memory for storing one or more programs;
  • wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image segmentation method described in the first aspect.
  • an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the image segmentation method described in the first aspect.
  • The above-mentioned image segmentation method, apparatus, device, and storage medium acquire video data containing the target object, input each frame image of the video data into the image segmentation model to obtain the corresponding first segmented image, and then perform smoothing processing on the first segmented image to obtain the second segmented image. This technical means solves the technical problem that some image segmentation technologies cannot accurately segment online video data.
  • The online video data can be segmented accurately and in real time. Owing to the self-learning of the image segmentation model, the method can be applied to online video data with complex images, and in application it can be used directly by simply deploying the image segmentation model, without artificial prior information, which reduces the complexity of image segmentation and expands the application scenarios of image segmentation methods.
  • FIG. 1 is a flowchart of an image segmentation method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of another image segmentation method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an image segmentation model provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an original image provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a segmentation result image provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an edge result image provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another image segmentation model provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image segmentation apparatus provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an image segmentation device provided by an embodiment of the present application.
  • The terms "first" and "second" are only used to distinguish one entity, operation, or object from another entity, operation, or object, and do not necessarily require or imply any actual relationship or order between these entities, operations, or objects.
  • For example, "first" and "second" in the first segmented image and the second segmented image are only used to distinguish two different segmented images.
  • The image segmentation method provided in this embodiment of the present application may be performed by an image segmentation device, which may be implemented in software and/or hardware, and the image segmentation device may be composed of two or more physical entities or of a single physical entity.
  • the image segmentation device may be a computer, a mobile phone, a tablet, or an interactive smart tablet, and other smart devices with data computing and analysis capabilities.
  • FIG. 1 is a flowchart of an image segmentation method provided by an embodiment of the present application.
  • the image segmentation method specifically includes:
  • Step 110 Acquire the current frame image in the video data, and the target object is displayed in the video data.
  • the video data is the video data for which image segmentation is currently required, which may be online video data or offline video data.
  • the video data includes multiple frames of images, and each frame of images displays a target object, which can be considered as an object that needs to be separated from the background image.
  • The background images of the frame images in the video data may be the same or different, which is not limited in the embodiment. The target object may change as the video data plays, but during the change process the type of the target object does not change.
  • For example, when the target object is a human being, the human image in the video data may change (such as a person being replaced or a new person being added), but the target object in the video data is always a human being.
  • In the embodiment, the target object being a human being is taken as an example for description.
  • the source of the video data is not limited in this embodiment.
  • In one example, the video data is a piece of video shot by an image capture device (such as a webcam, a camera, etc.) connected to the image segmentation device.
  • the video data is a conference screen obtained from a network in a video conference scenario.
  • the video data is a live broadcast image obtained from a network in a live broadcast scenario.
  • performing image segmentation on the video data refers to separating the region where the target object is located in each frame of image in the video data.
  • the target object is exemplarily described as a human being.
  • the processing of the video data is in units of frames, that is, the images in the video data are acquired frame by frame, and the images are processed to obtain a final image segmentation result.
  • the currently processed image is recorded as the current frame image, and the processing of the current frame image is taken as an example for description.
  • Step 120 Input the current frame image into the trained image segmentation model to obtain the first segmented image based on the target object.
  • the image segmentation model is a pre-trained neural network model, which is used to segment the target object in the current frame image and output the segmentation result corresponding to the current frame image.
  • the segmentation result is recorded as the first segmented image
  • the portrait area and the background area of the current frame image can be determined through the first segmented image, wherein the portrait area can be considered as the area where the target object (human) is located.
  • the first segmented image is a binary image, and its pixel values include two types: 0 and 1, wherein, the area with a pixel value of 0 belongs to the background area of the current frame image, and the area with a pixel value of 1 belongs to the current frame image. portrait area.
  • In one embodiment, the pixel values are converted into two types, 0 and 255, before the first segmented image is displayed, where the area with a pixel value of 0 belongs to the background area and the area with a pixel value of 255 belongs to the portrait area.
  • The resolution of the first segmented image is the same as the resolution of the current frame image. It can be understood that when the image segmentation model has a resolution requirement for the input image, that is, when an image of a fixed resolution needs to be input, it is necessary to determine whether the resolution of the current frame image meets the resolution requirement. If it does not, resolution conversion is performed on the current frame image to obtain a current frame image that meets the resolution requirement.
  • Accordingly, resolution conversion is also performed on the first segmented image, so that the resolution of the first segmented image is the same as the resolution of the original current frame image (that is, the current frame image before resolution conversion).
  • If the image segmentation model does not have a resolution requirement for the input image, the current frame image can be directly input into the image segmentation model to obtain a first segmented image with the same resolution.
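  • As a minimal sketch of this resolution handling (assuming OpenCV and a hypothetical fixed model input size of 224×224), the conversion before and after the model could look like this:

```python
import cv2
import numpy as np

MODEL_SIZE = (224, 224)  # assumed fixed input resolution of the model

def segment_frame(frame, model):
    """Resize the frame to the model resolution, run the model,
    then resize the mask back to the original frame resolution."""
    h, w = frame.shape[:2]
    resized = cv2.resize(frame, MODEL_SIZE, interpolation=cv2.INTER_LINEAR)
    mask = model(resized)  # hypothetical call returning a binary mask (values 0/1)
    # restore the mask to the original frame resolution
    mask = cv2.resize(mask.astype(np.uint8), (w, h), interpolation=cv2.INTER_NEAREST)
    return mask
```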
  • the structure and parameters of the image segmentation model can be set according to actual conditions.
  • the image segmentation model adopts an autoencoder structure.
  • autoencoder is a kind of artificial neural network used in semi-supervised learning and unsupervised learning.
  • The autoencoder includes two parts: an encoder and a decoder, where the encoder is used to extract the features in the image, and the decoder is used to decode the extracted features to obtain the learning result (for example, the first segmented image).
  • the encoder adopts a lightweight network to reduce the amount of data processing and calculation when extracting features, and to speed up the processing speed.
  • the decoder can be implemented by residual blocks combined with channel obfuscation, upsampling, etc., to achieve fully automatic real-time image segmentation.
  • The features of the current frame image at different resolutions can be extracted by the encoder, and the decoder then performs operations such as upsampling, fusion, and decoding on each feature to reuse each feature, thereby obtaining an accurate first segmented image.
  • the image segmentation model is deployed under the forward inference framework.
  • the specific type of the forward reasoning framework can be set according to the actual situation, for example, the forward reasoning framework is the openvino framework.
  • When deployed in the forward inference framework, the image segmentation model has a low dependence on the GPU, is relatively portable, and does not occupy a large amount of storage space.
  • Step 130 Smooth the first segmented image to obtain a second segmented image based on the target object.
  • The first segmented image may exhibit edge jaggedness, which can be understood as jagged edges between the portrait area and the background area that make the separation of the two areas look too stiff.
  • the first segmented image is smoothed, that is, the edge jaggedness in the first segmented image is smoothed, so as to obtain a segmented image with smoother edges.
  • the segmented image is denoted as the second segmented image.
  • the second segmented image can also be considered as the final segmented result of the current frame image. It can be understood that the second segmented image is also a binary image, and its pixel values include two types: 0 and 1.
  • The area with a pixel value of 0 belongs to the background area of the current frame image, and the area with a pixel value of 1 belongs to the portrait area of the current frame image.
  • the smoothing processing is implemented by means of Gaussian smoothing filtering.
  • the Gaussian kernel function is used in the Gaussian smoothing filtering to process the first segmented image to obtain the second segmented image.
  • the Gaussian kernel function is a commonly used kernel function.
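  • As an illustrative sketch of the smoothing step (the kernel size, sigma, and re-binarization threshold are assumptions, not values specified here), Gaussian smoothing of the binary mask with OpenCV could look like this:

```python
import cv2
import numpy as np

def smooth_mask(first_segmented, ksize=7, sigma=0):
    """Apply Gaussian smoothing to the first segmented image (values 0/1)
    and re-binarize to obtain the second segmented image."""
    blurred = cv2.GaussianBlur(first_segmented.astype(np.float32), (ksize, ksize), sigma)
    # threshold back to a 0/1 binary image (threshold value is an assumption)
    second_segmented = (blurred >= 0.5).astype(np.uint8)
    return second_segmented
```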
  • Step 140 take the next frame image in the video data as the current frame image, and return to perform the operation of inputting the current frame image to the trained image segmentation model, until each frame image in the video data obtains the corresponding second segmentation image.
  • The processing procedure is to take the next frame of image as the current frame image, repeat steps 110 to 130 to obtain the second segmented image of the new current frame image, then obtain the next frame of image again, and repeat the above process until a corresponding second segmented image is obtained for each frame of image in the video data, at which point image segmentation of the video data is achieved.
  • the current frame image can be processed according to actual needs.
  • In an embodiment, the method further includes: acquiring a target background image, where the target background image contains the target background; and replacing the background of the current frame image according to the target background image and the second segmented image, so as to obtain a new current frame image.
  • the target background refers to a new background used after the background is replaced.
  • the target background image is the image that contains the target background.
  • the target background image and the second segmented image have the same resolution.
  • the target background image may be an image selected by the user of the image segmentation device, or may be a default image of the image segmentation device.
  • the background of the current frame image is replaced to obtain a replaced image.
  • the image after the background replacement is recorded as the new image of the current frame.
  • The background replacement method is: determining, by using the second segmented image, the pixels of the current frame image that belong to the portrait area and the pixels that belong to the background area; after that, retaining the portrait area and replacing the corresponding background area with the target background in the target background image, so as to obtain the new current frame image.
  • Exemplarily, the portrait area can be retained by I × S2 (that is, the current frame image I is multiplied by the second segmented image S2, so that the pixels of the current frame image corresponding to pixels with a value of 1 in the second segmented image are retained), and the background area can be replaced by B × (1 − S2) (that is, the target background image B is multiplied by (1 − S2), so that the pixels of the target background image corresponding to pixels with a value of 0 in the second segmented image are retained). The new current frame image is then I × S2 + B × (1 − S2).
  • In this way, a new current frame image corresponding to each current frame image can be obtained through the second segmented image. After that, the new current frame images can form new video data after background replacement.
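  • A minimal numpy sketch of this background replacement, assuming the current frame image I, the target background image B, and the second segmented image S2 share the same resolution:

```python
import numpy as np

def replace_background(frame, target_background, s2):
    """Compose the new frame as I * S2 + B * (1 - S2).

    frame: current frame image I, shape (H, W, 3)
    target_background: target background image B, shape (H, W, 3)
    s2: second segmented image, shape (H, W), values 0 (background) / 1 (portrait)
    """
    s2 = s2.astype(np.float32)[..., None]  # (H, W, 1) so it broadcasts over channels
    new_frame = frame * s2 + target_background * (1.0 - s2)
    return new_frame.astype(frame.dtype)
```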
  • In the above description, the video data is assumed to include the target object. In practice, some frames of the video data may not include the target object; in this case, the pixel values of the corresponding segmented image are all 0.
  • each frame of the video data is input into the image segmentation model to obtain the corresponding first segmented image, and then the first segmented image is smoothed to obtain the second segmented image
  • the video data can be accurately segmented, especially the online video data, which ensures the processing speed of the online video data.
  • Owing to the self-learning of the image segmentation model, the method can be applied to video data with complex images, and in application it can be used directly by simply deploying the image segmentation model, without artificial prior information, which reduces the complexity of image segmentation.
  • the application scenarios of image segmentation methods are expanded.
  • FIG. 2 is a flowchart of another image segmentation method provided by an embodiment of the present application. This image segmentation method is based on the above-mentioned image segmentation method, and exemplifies the training process of the image segmentation model. Referring to Figure 2, the image segmentation method specifically includes:
  • Step 210 Acquire a training data set, where the training data set includes multiple original images.
  • the training data refers to the data that the image segmentation model learns when training the image segmentation model.
  • the training data is in the form of images, so the training data is referred to as the original image, and the original image and the video data contain the same type of target objects.
  • a training dataset refers to a dataset containing a large number of original images. That is, during the training process, a large number of original images are selected from the training data set for the image segmentation model to learn, so as to improve the accuracy of the image segmentation model.
  • Video data contains a large number of images. If the original images were collected from video data, the images would need to be collected frame by frame, which would consume a lot of labor and production cost, and each collected original image would contain a large amount of repeated content, which is not conducive to training the image segmentation model. Therefore, in the embodiment, the training data set is constructed from independent original images instead of video data. The constructed training data set can then contain original images with different portrait poses in different scenes, where the scene is preferably a natural scene.
  • a plurality of natural scenes are preselected, and in each natural scene, a plurality of images containing human beings are captured by an image acquisition device as original images, wherein the postures of the human beings in the plurality of images are different.
  • Considering that the parameters of the image acquisition device (such as its position, aperture size, and degree of focus) and the lighting in the natural environment affect the performance of the image segmentation model, when constructing the training data set, multiple original images under different lighting conditions and different shooting parameters are collected for the same portrait pose in the same natural scene, so as to ensure the performance of the image segmentation model when processing video data with different scenes, portrait poses, lighting conditions, and shooting parameters.
  • an existing public image data set can also be used as the training data set, for example, the public data set Supervisely can be used as the training data set, or the public data set EG1800 can be used as the training data set.
  • Step 220 Construct a label data set according to the training data set, where the label data set includes a plurality of segmentation label images and a plurality of edge label images, and each original image corresponds to one segmentation label image and one edge label image.
  • the label data can be understood as reference data for determining whether the image segmentation model is accurate, which plays a role of supervision. If the output result of the image segmentation model is more similar to the corresponding label data, it means that the accuracy of the image segmentation model is higher, that is, the performance is better; otherwise, the accuracy of the image segmentation model is lower. It can be understood that the process of training the image segmentation model is the process of making the output result of the image segmentation model more similar to the corresponding label data.
  • the image segmentation model outputs a segmented image and an edge image corresponding to the original image, wherein the segmented image refers to a binary image obtained by performing image segmentation on the target object in the original image.
  • the segmented image output by the image segmentation model in the training process is recorded as the segmentation result image.
  • the edge image refers to a binary image representing the edge between the portrait area and the background area in the original image.
  • the edge image output by the image segmentation model in the training process is recorded as the edge result image.
  • In the embodiment, the label data is set according to the output results of the image segmentation model and includes the segmentation label image and the edge label image, where the segmentation label image corresponds to the segmentation result image and serves as its supervision reference.
  • The edge label image corresponds to the edge result image and serves as its supervision reference.
  • both the edge label image and the segmented label image can be obtained from the above-mentioned original image.
  • the portrait area, background area and edge area are marked in each original image, and then the edge label image and segmentation label image are obtained according to the portrait area, background area and edge area.
  • In another embodiment, manual annotation is used to mark the portrait region and the background region in each original image, a segmentation label image is then obtained according to the portrait region and the background region, and an edge label image is obtained according to the segmentation label image.
  • step 220 includes steps 221-225:
  • Step 221 Acquire an annotation result for the original image.
  • the labeling result refers to the result obtained after labeling the portrait area and background area in the original image.
  • the labeling result is obtained by manual labeling, that is, the portrait area and the background area are manually marked in the above-mentioned original image, and then the image segmentation device obtains the labeling result according to the marked portrait area and background area.
  • Step 222 Obtain a corresponding segmented label image according to the labeling result.
  • the pixel value of each pixel included in the portrait region in the original image is changed to 255, and the pixel value of each pixel included in the background region in the original image is changed to 0, thereby obtaining the segmented label image.
  • the segmented label image is a binary image.
  • Step 223 Perform an erosion operation on the segmented label image to obtain an erosion image.
  • the erosion operation can be understood as reducing and refining the white area (ie, the portrait area) with a pixel value of 255 in the segmented label image.
  • the image obtained by performing the erosion operation on the segmented label image is recorded as the erosion image. It can be understood that the number of pixels occupied by the white area in the eroded image is smaller than the number of pixels occupied by the white area in the segmented label image, and the white area in the segmented label image can completely cover the white area in the eroded image.
  • Step 224 perform a Boolean operation on the segmented label image and the eroded image to obtain an edge label image corresponding to the original image.
  • Boolean operations include union, intersection, and subtraction.
  • a plurality of objects that perform Boolean operations are operation objects.
  • the operation objects include the segmented label image and the eroded image, more specifically, the segmented label image and the white area in the eroded image.
  • the result obtained by the Boolean operation may be recorded as a Boolean object.
  • the Boolean object is an edge label image.
  • union means that the resulting Boolean object contains the volume of the two operands.
  • the Boolean object obtained by combining the segmented label image and the eroded image is the white area in the segmented label image.
  • Intersection means that the resulting Boolean object contains only the common volume of the two operation objects (that is, only the overlapping positions). Since the white area in the segmented label image completely covers the white area in the eroded image, the Boolean object obtained by intersecting the segmented label image with the eroded image is the white area in the eroded image.
  • Subtraction means that the Boolean object contains the volume of the operation object from which the intersection volume is subtracted.
  • The Boolean object obtained by subtracting the eroded image from the segmented label image is the white area that remains after the white area corresponding to the eroded image is removed from the white area of the segmented label image. It can be understood that, since the eroded image is obtained by shrinking the white area of the segmented label image, the edges of the white areas in the eroded image and in the segmented label image are highly similar, so the subtraction yields a white area that represents only the edge, that is, the edge label image.
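  • As an illustrative sketch of the erosion and subtraction steps (the structuring element size and iteration count are assumptions), OpenCV could be used as follows:

```python
import cv2
import numpy as np

def make_edge_label(segmentation_label, ksize=5, iterations=1):
    """Derive the edge label image from a segmentation label image
    (portrait pixels = 255, background pixels = 0)."""
    kernel = np.ones((ksize, ksize), np.uint8)
    eroded = cv2.erode(segmentation_label, kernel, iterations=iterations)  # shrink the white area
    edge_label = cv2.subtract(segmentation_label, eroded)                  # Boolean subtraction
    return edge_label
```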
  • Step 225 Obtain a label data set according to the segmented label image and the edge label image.
  • A label data set is formed by the segmentation label images and the edge label images. It can be understood that the segmentation label images and the edge label images can be considered as ground truth, that is, the correct labels.
  • Step 230 Train the image segmentation model according to the training data set and the label data set.
  • an original image is input to the image segmentation model, and a loss function is constructed according to the output result of the image segmentation model and the corresponding label data in the label dataset, and then the model parameters of the image segmentation model are updated according to the loss function.
  • another original image is input into the updated image segmentation model to construct the loss function again, and the model parameters of the image segmentation model are updated again according to the loss function, and the above training process is repeated until the loss function converges.
  • When the values of the loss function obtained in successive calculations fall within a set range, the loss function can be considered to have converged, and the accuracy of the output results of the image segmentation model can be regarded as stable; therefore, the image segmentation model can be considered to be trained.
  • the specific structure of the image segmentation model can be set according to the actual situation.
  • In the embodiment, an image segmentation model that includes a normalization module, an encoding module, channel confusion modules, residual modules, multiple upsampling modules, an output module, and an edge module is taken as an example for description.
  • the image segmentation model is exemplarily described with the structure shown in FIG. 3 .
  • FIG. 3 is a schematic structural diagram of an image segmentation model provided by an embodiment of the present application. Referring to FIG. 3, the image segmentation model includes a normalization module 21, an encoding module 22, four channel confusion modules 23, three residual modules 24, four multiple upsampling modules 25, an output module 26, and an edge module 27.
  • step 230 includes steps 231-2310:
  • Step 231 Input the original image to the normalization module to obtain a normalized image.
  • FIG. 4 is a schematic diagram of an original image provided by an embodiment of the present application.
  • the original image contains a portrait area, and it should be noted that the original image used in Figure 4 comes from the public dataset Supervisely.
  • normalization refers to the process of performing a series of standard processing and transformation on the image to transform the image into a fixed standard form.
  • the obtained standard image is called a normalized image.
  • Normalization is divided into linear normalization and nonlinear normalization.
  • the original image is processed by means of linear normalization.
  • linear normalization is to normalize the pixel values in each image from [0, 255] to [-1, 1], and the resolution of the obtained normalized image is equal to the resolution of the image before linear normalization .
  • the normalization module is a module that implements a linear normalization operation. After the original image is input to the normalization module, the normalization module outputs a normalized image with a pixel value of [-1, 1].
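  • A minimal sketch of this linear normalization, assuming an 8-bit input image:

```python
import numpy as np

def normalize_image(image_uint8):
    """Linearly map pixel values from [0, 255] to [-1, 1];
    the resolution of the image is unchanged."""
    return image_uint8.astype(np.float32) / 127.5 - 1.0
```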
  • Step 232 Use the encoding module to obtain multi-layer image features of the normalized image, where the resolutions of the image features of each layer are different.
  • the encoding module is used to extract features in the normalized image.
  • the extracted features are recorded as image features.
  • the image features may reflect information such as color features, texture features, shape features, and spatial relationship features in the normalized image, including global information and/or local information.
  • the encoding module is a lightweight network, where the lightweight network refers to a neural network with a small amount of parameters, a small amount of computation, and a short inference time.
  • the type of the lightweight network used by the encoding module can be selected according to the actual situation.
  • In the embodiment, the encoding module 22 being a MobileNetV2 network is taken as an example for description.
  • After the normalized image passes through MobileNetV2, multi-layer image features are output, where the resolutions of the image features of the layers are different and are related by integer multiples; optionally, the resolution of each layer of image features is smaller than the resolution of the original image.
  • The image features of the layers are arranged from top to bottom in order of resolution from high to low, that is, the image features with the highest resolution are located in the highest layer and the image features with the lowest resolution are located in the lowest layer. It can be understood that the number of layers of image features output by the encoding module can be set according to the actual situation. For example, when the resolution of the original image is 224×224, the encoding module outputs four layers of image features. In this case, referring to FIG. 3:
  • the resolution of the highest-layer (first-layer) image features is 112×112 (denoted as Feature112×112 in FIG. 3);
  • the resolution of the second-highest-layer (second-layer) image features is 56×56 (denoted as Feature56×56 in FIG. 3);
  • the resolution of the second-lowest-layer (third-layer) image features is 28×28 (denoted as Feature28×28 in FIG. 3);
  • the resolution of the lowest-layer (fourth-layer) image features is 14×14 (denoted as Feature14×14 in FIG. 3).
  • the image features of each layer contain more and more information from the bottom to the top.
  • the encoding module can be understood as an encoder in the image segmentation model.
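  • A hedged sketch of how such multi-resolution encoder features could be collected from a torchvision MobileNetV2 backbone using forward hooks; the tap indices are assumptions chosen so that a 224×224 input yields 112×112, 56×56, 28×28, and 14×14 feature maps, and are not specified by the embodiment:

```python
import torch
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2(weights=None).features
tap_indices = [1, 3, 6, 13]  # assumed stages producing 112x112, 56x56, 28x28, 14x14 maps
features = {}

def make_hook(name):
    def hook(module, inputs, output):
        features[name] = output  # store the intermediate feature map
    return hook

for idx in tap_indices:
    backbone[idx].register_forward_hook(make_hook(f"layer{idx}"))

x = torch.randn(1, 3, 224, 224)  # normalized image in [-1, 1]
_ = backbone(x)
for name, feat in features.items():
    print(name, tuple(feat.shape))
```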
  • Step 233 Input the image features of each layer into the corresponding channel confusion module respectively to obtain multi-layer confusion features, and each layer of image features corresponds to a channel confusion module.
  • The channel confusion module is used to fuse the features between the channels within a layer, so as to enrich the information contained in the image features of each layer and ensure the accuracy of the image segmentation model without increasing the subsequent calculation amount. It can be understood that each layer of image features corresponds to one channel confusion module. As shown in FIG. 3, the four layers of image features correspond to four channel confusion modules 23, and each channel confusion module 23 is used to fuse the image features between multiple channels in the corresponding layer.
  • In the embodiment, the channel confusion module is composed of a 1×1 convolution layer, a batch normalization (BN) layer, and an activation function layer, where the activation function layer adopts the ReLU activation function.
  • The 1×1 convolution layer is used to realize the confusion of image features between channels, and the BN layer plus the activation function layer can make the confused image features more stable.
  • the features output by the channel confusion module are recorded as confusion features. It can be understood that each layer of image features has corresponding confusion features, and the resolution of the confusion features and image features in the same layer is the same. In one embodiment, except for the confusing features with the lowest resolution, other confusing features are central layer features, that is, other layers may be considered as network central layers.
  • In FIG. 3, the confusion features of the lowest layer are denoted as Decode14×14, and the confusion features of the other layers are denoted as Center28×28, Center56×56, and Center112×112, respectively.
  • the digital part represents the resolution.
  • the confusion feature output by the channel confusion module can also be regarded as the feature obtained after decoding the image feature, that is, the channel confusion module can also realize the function of decoding in addition to the confusion feature.
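  • A sketch of one possible channel confusion module as described above (a 1×1 convolution followed by BN and ReLU); the channel counts in the example are assumptions:

```python
import torch
import torch.nn as nn

class ChannelConfusion(nn.Module):
    """1x1 convolution to mix information across channels, followed by BN + ReLU."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# e.g. confusion of a 96-channel 14x14 encoder feature (channel counts assumed)
confuse = ChannelConfusion(96, 128)
decode_14 = confuse(torch.randn(1, 96, 14, 14))
```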
  • Step 234 Upsample the confusion features of each layer, except the confusion feature with the highest resolution, and fuse each upsampled confusion feature with the confusion feature of the next higher resolution to obtain the fusion feature corresponding to that higher resolution.
  • Upsampling can be understood as enlarging the feature to enlarge the resolution of the feature.
  • the up-sampling is implemented by a linear interpolation method, that is, a suitable interpolation algorithm is used to insert new elements between the obfuscated features, so as to expand the resolution of the obfuscated features.
  • the resolution of the confusion feature can be enlarged by up-sampling, so that the enlarged resolution is equal to the one-level higher resolution.
  • The next-higher resolution refers to the resolution that is higher than, and closest to, the resolution currently being upsampled; correspondingly, the resolution being upsampled can be regarded as the next-lower resolution relative to that next-higher resolution.
  • In the embodiment, the resolution of each layer is one level higher than the resolution of the layer below it. It can be understood that, since the resolution of the confusion feature of any layer is related to its next-higher resolution by an integer multiple, the upsampling factor can be determined according to that multiple.
  • the resolution of the confusion feature of a certain layer is 0.5 times the resolution of the higher level
  • the resolution of the confusion feature of this layer can be enlarged by means of double upsampling.
  • In the embodiment, the confusion feature of the higher resolution is fused with the corresponding upsampled confusion feature of the lower resolution through a skip connection, so as to reuse the confusion features and ensure that more information is used in subsequent processing. It can be understood that image segmentation is a kind of dense pixel prediction (dense prediction); therefore, the image segmentation model requires richer features.
  • In the embodiment, the feature obtained after fusion is recorded as a fusion feature. In this case, except for the confusion feature with the lowest resolution, each layer of confusion features has a corresponding fusion feature.
  • the operation of feature fusion can be understood as a concatenate (vector splicing) operation.
  • Exemplarily, the size of the fusion feature of each layer is the sum of the size of the confusion feature of this layer and the size of the upsampled confusion feature of the lower resolution. For example, C in [N, C, H, W] of the confusion feature of this layer before fusion is 3, and C in [N, C, H, W] of the upsampled lower-resolution confusion feature before fusion is also 3, where N is the batch number, C is the number of channels, H is the height, and W is the width, and H×W can be understood as the resolution. It should be noted that, since the highest resolution has no next-higher resolution, there is no need to upsample the confusion feature with the highest resolution.
  • Referring to FIG. 3, the confusion feature Decode14×14 of the lowest layer is upsampled by a factor of 2 so that its resolution is doubled, that is, a feature with a resolution of 28×28 is obtained.
  • The confusion feature Center28×28 of the next-higher resolution (i.e., the second-lowest layer) is fused, through a skip connection, with the 28×28 feature obtained by double upsampling of the lowest layer, so as to obtain the fusion feature of the second-lowest layer.
  • Similarly, the confusion feature Center28×28 of the second-lowest layer is upsampled by a factor of 2 so that its resolution is doubled, that is, a feature with a resolution of 56×56 is obtained.
  • The confusion feature Center56×56 is fused, through a skip connection, with the 56×56 feature obtained by double upsampling of the second-lowest layer, so as to obtain the fusion feature of the second-highest layer. In the same way, the fusion feature of the highest layer is obtained.
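  • A minimal PyTorch sketch of the double upsampling and skip-connection fusion between adjacent layers; the tensor shapes and channel counts are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def fuse_with_higher(lower_confusion, higher_confusion):
    """Upsample the lower-resolution confusion feature by 2x (bilinear interpolation)
    and concatenate it with the next-higher-resolution confusion feature."""
    upsampled = F.interpolate(lower_confusion, scale_factor=2,
                              mode="bilinear", align_corners=False)
    return torch.cat([higher_confusion, upsampled], dim=1)  # concatenate along channels

decode_14 = torch.randn(1, 128, 14, 14)   # assumed channel counts
center_28 = torch.randn(1, 128, 28, 28)
fusion_28 = fuse_with_higher(decode_14, center_28)  # shape (1, 256, 28, 28)
```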
  • Step 235 Input the fusion features of each layer into the corresponding residual modules respectively to obtain multi-layer first decoding features, where each layer of fusion features corresponds to one residual module, and the confusion feature with the lowest resolution is used as the first decoding feature with the lowest resolution.
  • the residual module is used to further extract and decode the fusion features, and the residual module may include one or more residual blocks (Residual Block, RS Block).
  • In the embodiment, the residual module including one residual block is taken as an example for description, and the structure of the residual block can be set according to the actual situation. It can be understood that each layer of fusion features corresponds to one residual module, and the feature output by the residual module has the same resolution as the fusion feature of that layer. Since the residual module further extracts and decodes the fusion feature, that is, the feature output by the residual module is a decoded feature, the feature output by the residual module is recorded as the first decoding feature in the embodiment.
  • Since the confusion feature with the lowest resolution has no corresponding fusion feature, there is no need to set a residual module for the lowest-resolution layer; the confusion feature with the lowest resolution can be directly regarded as the first decoding feature of this layer.
  • In this way, the corresponding first decoding feature of each layer can be obtained.
  • Referring to FIG. 3, the model includes three residual modules 24. The first decoding feature output after the fusion feature of the second-lowest layer is input to its residual module is denoted as RS Block28×28, that is, its resolution is 28×28.
  • The first decoding feature output after the fusion feature of the second-highest layer is input to its residual module is denoted as RS Block56×56, that is, its resolution is 56×56.
  • The first decoding feature output after the fusion feature of the highest layer is input to its residual module is denoted as RS Block112×112, that is, its resolution is 112×112.
  • The first decoding feature of the lowest layer is Decode14×14.
  • Step 236 Input the first decoding feature of each layer into the corresponding multiple upsampling module to obtain multiple second decoding features, where each layer of first decoding features corresponds to one multiple upsampling module, and each second decoding feature has the same resolution as the original image.
  • the multiple upsampling module is configured to perform multiple upsampling on the first decoded feature, so that the resolution after the multiple upsampling is equal to the resolution of the original image.
  • The specific factor of the multiple upsampling can be determined according to the resolution of the first decoding feature and the resolution of the original image. For example, if the resolution of the first decoding feature is 14×14 and the resolution of the original image is 224×224, then the first decoding feature needs to be upsampled by a factor of 16 to obtain a decoded feature with a resolution of 224×224.
  • The final output binary image (segmentation result image) is used to distinguish the foreground (such as the portrait area) from the background; therefore, the segmentation task of the image segmentation model is a two-class segmentation task. In this case, before obtaining the segmentation result image, it is necessary to obtain decoding features with 2 channels.
  • Therefore, in addition to performing multiple upsampling on the first decoding feature, the multiple upsampling module also needs to change the number of channels of the upsampled first decoding feature to 2, because multiple upsampling only changes the resolution of the first decoding feature of each layer and does not change its number of channels.
  • Exemplarily, a 1×1 convolutional layer is set in the multiple upsampling module, that is, after the first decoding feature is upsampled by the corresponding factor, a 1×1 convolutional layer is connected to change the number of channels of the upsampled first decoding feature to 2.
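  • A sketch of one possible multiple upsampling module (upsample to the original resolution, then a 1×1 convolution down to 2 channels); the interpolation mode and input channel count are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultipleUpsample(nn.Module):
    """Upsample a first decoding feature to the original image resolution
    and reduce it to 2 channels (foreground / background)."""
    def __init__(self, in_channels, target_size=(224, 224), num_classes=2):
        super().__init__()
        self.target_size = target_size
        self.proj = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, x):
        x = F.interpolate(x, size=self.target_size, mode="bilinear", align_corners=False)
        return self.proj(x)

# e.g. the 14x14 first decoding feature upsampled 16x to 224x224 (channel count assumed)
up = MultipleUpsample(in_channels=128)
second_decoding = up(torch.randn(1, 128, 14, 14))  # shape (1, 2, 224, 224)
```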
  • the image segmentation model can also perform multi-classification segmentation tasks. At this time, before obtaining the final output image, it is also necessary to obtain decoding features with the number of channels equal to the number of classifications. For example, if the image segmentation model performs a five-category segmentation task, before finally outputting a five-category segmentation result image, it is necessary to obtain 5-channel decoding features.
  • the pixel values of the pixels in the segmentation label image need to be converted from 0 and 255 to 0 and 1, that is, the pixel value of 0 is converted to 0.
  • a pixel with a pixel value of 255 is converted to 1.
  • the feature output by the multiple upsampling module is denoted as the second decoding feature.
  • the first decoding feature of each layer corresponds to a multiple upsampling module
  • the second decoding feature with 2 channels and the same resolution as the original image can be obtained through the multiple upsampling module.
  • the second decoding feature can be considered as a network prediction output obtained after decoding the image features of the current layer.
  • the second decoding feature of each layer can be regarded as a temporary output result obtained after decoding the image feature of the layer.
  • the final segmentation result image can be obtained by temporarily outputting the result.
  • Step 237 Combine the multi-layer second decoding features and input them to the output module to obtain a segmentation result image.
  • The output module integrates the second decoding features of each layer to obtain a segmentation result image (i.e., a binary image).
  • the second decoding features of each layer are first fused (ie, concatenate), so that the output module can obtain more abundant features, thereby restoring a more accurate image.
  • the output module uses the fused second decoding feature to obtain the segmentation result image.
  • The specific process of the output module is: connect the fused second decoding feature to a 1×1 convolutional layer to obtain a 2-channel decoding feature.
  • It can be understood that the fused second decoding feature is merely the second decoding features of the layers concatenated together; through the 1×1 convolutional layer in the output module, the fused second decoding feature can be further decoded, so that the final decoding feature is output with reference to the second decoding feature of each layer.
  • The final decoding feature has 2 channels and describes the result of the binary classification, that is, whether each pixel in the original image belongs to the portrait area or the background area. After that, the decoding feature is passed through the softmax function and the argmax function to obtain the segmentation result image. That is, the output module consists of a 1×1 convolutional layer and an activation function layer.
  • the activation function layer consists of the softmax function and the argmax function.
  • The data processed by the softmax function can be understood as the output of the logits layer; that is, the softmax function interprets the meaning represented by the decoding features output by the 1×1 convolution layer and obtains a probability description.
  • the argmax function is a common function to obtain the output result, that is, the corresponding segmentation result image is output by the argmax function.
  • the four second decoding features are fused and input to the output module 26.
  • In the output module, a 1×1 convolutional layer is first passed through to obtain a 2-channel decoding feature (denoted as Refine224×224 in FIG. 3).
  • Then the segmentation result image is obtained through the activation function layer (denoted as output224×224 in FIG. 3).
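  • A sketch of one possible output module (concatenate the second decoding features, apply a 1×1 convolution, then softmax and argmax); the number of branches and channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class OutputModule(nn.Module):
    """Fuse the per-layer second decoding features and produce the segmentation result image."""
    def __init__(self, num_branches=4, num_classes=2):
        super().__init__()
        # each second decoding feature is assumed to have num_classes channels before fusion
        self.refine = nn.Conv2d(num_branches * num_classes, num_classes, kernel_size=1)

    def forward(self, second_decodings):
        fused = torch.cat(second_decodings, dim=1)  # concatenate along channels
        logits = self.refine(fused)                 # Refine: 2-channel decoding feature
        probs = torch.softmax(logits, dim=1)        # activation function layer (softmax)
        return torch.argmax(probs, dim=1)           # binary segmentation result (0/1)

out_mod = OutputModule()
seg = out_mod([torch.randn(1, 2, 224, 224) for _ in range(4)])  # shape (1, 224, 224)
```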
  • the pixel value of each pixel in the segmentation result image output by the image segmentation model is 0 or 1, wherein, the pixel with a pixel value of 0 is a pixel in the background area, and a pixel with a pixel value of 1 is in the portrait area. pixel.
  • the pixel value of each pixel is multiplied by 255.
  • FIG. 5 is a schematic diagram of a segmentation result image provided by an embodiment of the present application. After inputting the training data shown in FIG. 4 into the image segmentation model shown in FIG. 3, a segmentation result image is obtained. After multiplying by 255, the segmentation result image shown in Figure 5 can be obtained.
  • Step 238 Input the first decoded feature with the highest resolution to the edge module to obtain an edge result image.
  • In the embodiment, an edge module is set in the image segmentation model to perform additional supervision on the first decoding feature with the highest resolution; that is, the edge module acts as a regularization constraint to improve the ability of the image segmentation model to learn edges.
  • the specific structural embodiment of the edge module is not limited.
  • In the embodiment, the edge module being a 1×1 convolutional layer is taken as an example for description. Exemplarily, after the first decoding feature with the highest resolution is input into the edge module, an edge feature with 2 channels and the same resolution as the original image can be obtained, and through this edge feature a binary image that expresses only the edge can be obtained.
  • the binary image expressing the edge is recorded as the edge result image.
  • the pixel value of each pixel in the edge result image is 0 or 1, wherein, the pixel with the pixel value of 1 represents the pixel where the edge is located, and the pixel with the pixel value of 0 represents the pixel where the non-edge is located.
  • the first decoded feature with the highest resolution has richer detailed information, therefore, more accurate edge features can be obtained through the first decoded feature with the highest resolution.
  • Referring to FIG. 3, an edge feature with a resolution of 224×224 can be obtained, which is denoted as edge224×224 in FIG. 3.
  • FIG. 6 is a schematic diagram of an edge result image provided by an embodiment of the present application.
  • the training data shown in FIG. 4 is input into the image segmentation model shown in FIG. 3 to obtain an edge result image. After multiplying by 255, the edge result image shown in Figure 6 can be obtained.
  • Step 239 Construct a loss function according to each second decoding feature, the edge result image, the corresponding segmentation label image and the edge label image, and update the model parameters of the image segmentation model according to the loss function.
  • the loss function of the segmentation network model is composed of a segmentation loss function and an edge loss function.
  • the segmentation loss function can reflect the segmentation ability of the segmentation network model, and the segmentation loss function is obtained according to the second decoding feature of each layer and the segmentation label image.
  • a sub-loss function can be obtained based on the second decoding feature of each layer and the segmentation label image, and the segmentation loss function can be obtained by combining the sub-loss functions of each layer. It can be understood that the calculation method of each sub-loss function is the same.
  • the sub-loss function is calculated by the Iou function, and the Iou function can be defined as: the ratio of the area of the intersection of the predicted pixel region (that is, the second decoding feature) and the label pixel region (that is, the segmented label image) to the area of the union.
  • the Iou function can reflect the overlapping similarity between the binary image corresponding to the second decoding feature and the segmented label image, and at this time, the sub-loss function calculated by the Iou function can reflect the loss of overlapping similarity.
  • the edge loss function can reflect the ability of the segmentation network model to learn edges, and the edge loss function is obtained from the edge result image and the edge label image.
  • In the embodiment, the edge loss function adopts the focal loss, which is a common loss function that can reduce the weight of a large number of easy negative samples in training; it can also be understood as a form of hard example mining.
  • Exemplarily, the loss function of the segmentation network model is expressed as: Loss = loss_1 + loss_2 + ... + loss_n + loss_edge, where:
  • Loss represents the loss function of the segmentation network model;
  • n represents the total number of layers corresponding to the second decoding features;
  • loss_1 represents the sub-loss function calculated from the second decoding feature with the highest resolution and the corresponding segmentation label image, and loss_n represents the sub-loss function calculated from the second decoding feature with the lowest resolution and the corresponding segmentation label image;
  • A_n represents the second decoding feature with the lowest resolution;
  • B represents the corresponding segmentation label image;
  • Iou_n represents the overlap similarity between A_n and B;
  • loss_edge is the focal loss edge loss function.
  • Exemplarily, the image segmentation model has a total of n layers (n ≥ 2), that is, there are n layers of second decoding features.
  • n sub-loss functions can be obtained according to the second decoding features of the n layers and the segmentation label image.
  • The first layer has the highest resolution, and its corresponding sub-loss function is recorded as loss_1; the resolution of the second layer is the second highest, and its corresponding sub-loss function is recorded as loss_2; the resolution of the nth layer is the lowest, and its corresponding sub-loss function is recorded as loss_n. Since each sub-loss function is calculated in the same manner, the embodiment takes the nth-layer sub-loss function as an example for description.
  • Exemplarily, loss_n = 1 - Iou_n, where Iou_n = |A_n ∩ B| / |A_n ∪ B|; that is, loss_n represents the loss of the nth-layer Iou function.
  • A_n represents the second decoding feature of the nth layer, and B represents the corresponding segmentation label image.
  • A_n ∩ B represents the intersection of A_n and B, and A_n ∪ B represents their union.
  • Iou_n represents the overlap similarity between A_n and B; accordingly, 1 - Iou_n represents the loss of overlap similarity.
  • loss_edge represents the edge loss function and is a focal loss function: loss_edge(p_t) = -α_t (1 - p_t)^γ log(p_t).
  • p_t represents the predicted probability that a pixel in the edge result image is an edge.
  • α_t represents the balance weight coefficient, which is used to balance positive and negative samples.
  • γ represents the modulation coefficient, which is used to control the weight of hard and easy-to-classify samples.
  • The values of α_t and γ can be set according to the actual situation.
  • The loss_edge can be obtained from loss_edge(p_t) of each pixel in the edge result image. Specifically, loss_edge(p_t) is summed over all pixels and the mean is calculated; the calculated mean is used as loss_edge.
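  • A minimal PyTorch sketch of this composite loss, written with each sub-loss as 1 - Iou over a per-layer foreground probability map and a mean focal loss over the edge prediction; the α and γ values and the use of probability maps as inputs are assumptions:

```python
import torch

def iou_loss(pred_probs, label, eps=1e-6):
    """1 - IoU between the predicted foreground probability map and the 0/1 label."""
    inter = (pred_probs * label).sum()
    union = pred_probs.sum() + label.sum() - inter
    return 1.0 - (inter + eps) / (union + eps)

def focal_edge_loss(edge_probs, edge_label, alpha=0.25, gamma=2.0, eps=1e-6):
    """Mean focal loss over all pixels of the edge prediction."""
    p_t = torch.where(edge_label > 0.5, edge_probs, 1.0 - edge_probs)
    return (-alpha * (1.0 - p_t) ** gamma * torch.log(p_t + eps)).mean()

def total_loss(second_decodings, seg_label, edge_probs, edge_label):
    """Sum of per-layer IoU sub-losses plus the focal edge loss."""
    loss = sum(iou_loss(p, seg_label) for p in second_decodings)
    return loss + focal_edge_loss(edge_probs, edge_label)
```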
  • the model parameters of the image segmentation model can be updated according to the loss function, so that the performance of the updated image segmentation model is higher.
  • Step 2310 Select the next original image, and return to perform the operation of inputting the original image to the normalization module until the loss function converges.
  • After the image segmentation model stabilizes, the training is determined to be over, and the image segmentation model can then be applied to segment the portraits in the video data.
  • the method further includes: when the image segmentation model is not a network model recognizable by the forward inference framework, converting the image segmentation model into a network model recognizable by the forward inference framework.
  • the image segmentation model is trained in a corresponding framework, which is usually a framework such as tensorflow and pytorch.
  • the pytorch framework is used as an example for description.
  • the pytorch framework is mainly used for model design, training and testing. Since the image segmentation model runs in real time in an application on the image segmentation device, and the pytorch framework occupies a large amount of memory, running the image segmentation model under the pytorch framework inside that application would greatly increase the storage space occupied by the application.
  • the forward inference framework is generally aimed at a specific platform (such as an embedded platform), and different platforms have different hardware configurations. When the forward inference framework is deployed on a platform, it can exploit the hardware configuration of that platform to make reasonable use of resources and accelerate computation; that is, the forward inference framework can optimize and accelerate the models it runs locally.
  • the forward inference framework is mainly used for the prediction process of the model, where the prediction process includes the testing process of the model and the application process of the model, but does not include the training process of the model. The forward inference framework has a low dependence on the GPU and is lightweight, so it does not make the application occupy a large amount of storage space. Therefore, when applying the image segmentation model, the image segmentation model is run in a forward inference framework. In one embodiment, before applying the image segmentation model, it is determined whether the image segmentation model runs in the forward inference framework; if it does, the image segmentation model is applied directly.
  • otherwise, the image segmentation model is converted into a network model recognizable by the forward inference framework.
  • the specific type of the forward reasoning framework can be set according to the actual situation, for example, the forward reasoning framework is an openvino framework.
  • a specific way to convert the image segmentation model under the pytorch framework into an image segmentation model under the openvino framework is: first use an existing pytorch conversion tool to convert the image segmentation model into an Open Neural Network Exchange (ONNX) model, and then use the openvino conversion tool to convert the ONNX model into an image segmentation model under the openvino framework.
  • ONNX is a standard for representing deep learning models, which enables models to be transferred between different frameworks.
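As a hedged illustration of this two-step conversion (not the patent's prescribed commands), a PyTorch model could first be exported to ONNX roughly as follows; the input shape, file names and opset version are assumptions.

```python
import torch

def export_to_onnx(model: torch.nn.Module, onnx_path: str = "segmentation.onnx"):
    """Export the trained segmentation model to ONNX so that the OpenVINO
    Model Optimizer can subsequently convert it to an openvino (IR) network."""
    model.eval()
    dummy = torch.randn(1, 3, 224, 224)     # illustrative fixed input resolution
    torch.onnx.export(model, dummy, onnx_path,
                      input_names=["image"], output_names=["mask"],
                      opset_version=11)

# The resulting ONNX file would then be passed to OpenVINO's converter,
# e.g. something like:  mo --input_model segmentation.onnx
```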
  • the method further includes: deleting the edge module.
  • the advantage of setting the edge module in the training process is to improve the learning ability of the image segmentation model for the edge, thereby ensuring the accuracy of the segmentation result image.
  • after training, the edge module can be deleted, that is, the data processing of the edge module is cancelled when the image segmentation model is applied, so as to reduce the data processing amount of the image segmentation model and improve the processing speed.
  • the encoding module of the image segmentation model adopts a lightweight network, which can reduce the amount of data processing during encoding.
  • the channel confusion module can confuse the image features between channels without significantly increasing the amount of calculation, so as to enrich the feature information within the channels and ensure the accuracy of the image segmentation model.
  • by setting the edge module, the learning ability of the image segmentation model for edges is improved, which further ensures the accuracy of the image segmentation model; in the application process, the edge module is deleted to reduce the calculation amount of the image segmentation model.
  • converting the image segmentation model into an image segmentation model under the forward inference framework can reduce the dependence of the image segmentation model on the GPU and reduce the storage space occupied by the application running the image segmentation model.
  • the trained image segmentation model can accurately segment the portrait area in the video data without human prior information or interaction. After testing, on an ordinary PC with an integrated graphics card, processing each frame of image in the video data takes only about 20 ms, so real-time automatic portrait segmentation can be realized.
  • the image segmentation model further includes: a decoding module.
  • the method further includes: inputting the first decoding feature with the highest resolution to the decoding module to obtain a corresponding new first decoding feature.
  • FIG. 7 is a schematic structural diagram of another image segmentation model provided by an embodiment of the present application. Compared with the image segmentation model shown in FIG. 3 , the image segmentation model shown in FIG. 7 further includes a decoding module 28 .
  • the first decoding feature with the highest resolution is passed through a decoding module for further decoding, that is, a new first decoding feature is obtained.
  • the new first decoding feature can be considered as the first decoding feature finally obtained by the highest-resolution level, and the new first decoding feature is then input into the multiple upsampling module and the edge module set at the highest-resolution level. It can be understood that the number of channels and the resolution of the new first decoding feature are the same as those of the original first decoding feature.
  • the first decoding feature after the decoding module 28 in FIG. 7 is denoted as Refine 112×112, and its resolution is the same as that of RS Block 112×112.
  • the decoding module is a convolutional network, and the number and structure of the convolutional layers are not limited.
  • the accuracy of the first decoding feature of the highest layer can be improved, thereby improving the accuracy of the image segmentation model.
  • among the first decoding features, the lower the resolution, the more high-level semantic features it carries, and the higher the resolution, the richer the detail features it carries.
  • for the first decoding feature with the highest resolution, a sawtooth phenomenon appears if it is up-sampled directly, that is, the detail features become jagged. Therefore, a decoding module is added for it, so that the transition of the finally obtained new first decoding feature is more uniform and jaggedness is avoided.
  • the first decoding features of the other layers basically show no aliasing after up-sampling, and even if no decoding module is set for them, the accuracy of the image segmentation model is not affected; therefore, there is no need to set a decoding module for the other layers. It can be understood that, in practical applications, if aliasing occurs after up-sampling the first decoding features of other layers, a decoding module may also be set for them, so as to improve the accuracy of the image segmentation model.
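Since the patent only states that the decoding module is a convolutional network whose output keeps the channel count and resolution of the highest-resolution first decoding feature, the following is merely one plausible sketch of such a refinement block; the depth and layer choices are assumptions.

```python
import torch.nn as nn

class RefineBlock(nn.Module):
    """Illustrative decoding module: refines the highest-resolution first decoding
    feature while preserving its channel count and spatial resolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)  # same shape in, same shape out
```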
  • the target object is described as a human being, and in practical applications, the target object can also be any other object.
  • FIG. 8 is a schematic structural diagram of an image segmentation apparatus provided by an embodiment of the present application.
  • the image segmentation apparatus includes: a data acquisition module 301 , a first segmentation module 302 , a second segmentation module 303 and a repeated segmentation module 304 .
  • the data acquisition module 301 is used to acquire the current frame image in the video data, and the target object is displayed in the video data;
  • the first segmentation module 302 is used to input the current frame image into the trained image segmentation model, to obtain the first segmented image based on the target object;
  • the second segmentation module 303 is used for smoothing the first segmented image to obtain the second segmented image based on the target object;
  • the repeated segmentation module 304 is used to take the next frame image in the video data as the current frame image, and return to perform the operation of inputting the current frame image into the trained image segmentation model, until each frame image in the video data obtains a corresponding second segmented image.
  • a training acquisition module, for acquiring a training data set, where the training data set includes a plurality of original images; a label construction module, for constructing a label data set according to the training data set, where the label data set contains multiple segmentation label images and multiple edge label images, and one original image corresponds to one segmentation label image and one edge label image; and a model training module, for training the image segmentation model according to the training data set and the label data set.
  • the image segmentation model includes: a normalization module 21, an encoding module 22, a channel confusion module 23, a residual module 24, and a multiple upsampling module 25 , an output module 26 and an edge module 27 .
  • the above model training module includes: a normalization unit, for inputting the original image into the normalization module 21 to obtain a normalized image; an encoding unit, for obtaining, by using the encoding module 22, multi-layer image features of the normalized image, where the resolution of each layer of image features is different;
  • a channel confusion unit, for inputting each layer of image features into the corresponding channel confusion module 23 respectively, so as to obtain multi-layer confusion features, where each layer of image features corresponds to one channel confusion module 23;
  • a fusion unit, for up-sampling each layer of confusion features other than the confusion feature with the highest resolution, and fusing it with the confusion feature of a higher resolution, so as to obtain the fusion feature of the higher-resolution layer;
  • each second decoding feature has the same resolution as the original image;
  • a segmentation output unit, for combining the multi-layer second decoding features and inputting them into the output module 26 to obtain the segmentation result image;
  • an edge output unit, for inputting the first decoding feature with the highest resolution into the edge module 27 to obtain an edge result image;
  • a parameter updating unit, for constructing a loss function according to each second decoding feature, the edge result image, and the corresponding segmentation label image and edge label image, and updating the model parameters of the image segmentation model according to the loss function;
  • an image selection unit, for selecting the next original image and returning to perform the operation of inputting the original image into the normalization module, until the loss function converges.
  • the image segmentation model further includes: a decoding module 28 .
  • the above model training module further includes: a decoding unit, for inputting each layer of fusion features into the corresponding residual module 24 respectively to obtain multi-layer first decoding features, and then inputting the first decoding feature with the highest resolution into the decoding module 28 to obtain the corresponding new first decoding feature.
  • the encoding module includes the MobileNetV2 network.
  • the loss function is expressed as: Loss = loss_iou_1 + loss_iou_2 + ... + loss_iou_n + loss_edge, where Loss represents the loss function, n represents the total number of layers corresponding to the second decoding features, loss_iou_1 represents the sub-loss function calculated from the second decoding feature with the highest resolution and the corresponding segmentation label image, loss_iou_n represents the sub-loss function calculated from the second decoding feature with the lowest resolution and the segmentation label image, A_n represents the second decoding feature with the lowest resolution, B represents the corresponding segmentation label image, Iou_n represents the overlap similarity between A_n and B, and loss_edge is the Focal loss function.
  • an edge deletion module is further included, which is used to delete the edge module after the loss function of the image segmentation model converges.
  • a frame conversion module is further included, which is used, after the image segmentation model is trained according to the training data set and the label data set, to convert the image segmentation model into a network model recognizable by the forward inference framework when the image segmentation model is not a network model recognizable by the forward inference framework.
  • the label construction module includes: a label acquisition unit, for acquiring the labeling result for the original image; a segmentation label obtaining unit, for obtaining the corresponding segmentation label image according to the labeling result; an erosion unit, for performing an erosion operation on the segmentation label image to obtain an eroded image; a Boolean unit, for performing a Boolean operation on the segmentation label image and the eroded image to obtain the edge label image corresponding to the original image; and a data set construction unit, for obtaining the label data set according to the segmentation label images and the edge label images.
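The erosion-plus-Boolean construction of the edge label can be sketched with OpenCV as follows; the structuring-element size is an assumption, and the Boolean operation is taken here to be an exclusive-or of the segmentation label with its eroded version, which leaves a thin band along the portrait/background boundary.

```python
import cv2
import numpy as np

def make_edge_label(seg_label: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """seg_label: binary (0/1) segmentation label image of the portrait region."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    eroded = cv2.erode(seg_label.astype(np.uint8), kernel, iterations=1)
    # XOR keeps only the pixels removed by erosion, i.e. the portrait/background edge
    return cv2.bitwise_xor(seg_label.astype(np.uint8), eroded)
```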
  • the apparatus further includes: a target background acquisition module, for acquiring a target background image after the first segmented image is smoothed to obtain the second segmented image based on the target object, where the target background image contains a target background; and a background replacement module, for replacing the background of the current frame image according to the target background image and the second segmented image, so as to obtain a new image of the current frame.
  • the image segmentation device provided above can be used to execute the image segmentation method provided by any of the above embodiments, and has corresponding functions and beneficial effects.
  • the units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized;
  • the specific names of the functional units are only for the convenience of distinguishing from each other, and are not used to limit the protection scope of the present application.
  • FIG. 9 is a schematic structural diagram of an image segmentation device provided by an embodiment of the present application.
  • the image segmentation device includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of processors 40 in the image segmentation device may be one or more, and one processor 40 is taken as an example in FIG. 9.
  • the processor 40 , the memory 41 , the input device 42 , and the output device 43 in the image segmentation device may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 9 .
  • the memory 41 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the image segmentation method in the embodiments of the present application (for example, the data acquisition module 301, the first segmentation module 302, the second segmentation module 303 and the repeated segmentation module 304 in the image segmentation apparatus).
  • the processor 40 executes various functional applications and data processing of the image segmentation device by running the software programs, instructions and modules stored in the memory 41 , that is, to implement the above-mentioned image segmentation method.
  • the memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the image dividing apparatus, and the like.
  • memory 41 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • the memory 41 may further include memory located remotely from the processor 40, and these remote memories may be connected to the image segmentation apparatus through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the input device 42 may be used to receive input numerical or character information, and to generate key signal input related to user settings and function control of the image segmentation apparatus.
  • the output device 43 may include a display device such as a display screen.
  • the above-mentioned image segmentation device includes the image segmentation apparatus, can be used to execute any of the image segmentation methods, and has corresponding functions and beneficial effects.
  • embodiments of the present application also provide a storage medium containing computer-executable instructions, when the computer-executable instructions are executed by a computer processor, for performing relevant operations in the image segmentation method provided by any embodiment of the present application , and has corresponding functions and beneficial effects.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product.
  • the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
  • the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • the present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, where the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory in computer readable media, random access memory (RAM) and/or non-volatile memory in the form of, for example, read only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

Abstract

The embodiments of the present application relate to the technical field of image processing. Provided are an image segmentation method and apparatus, and a device and a storage medium. The image segmentation method comprises: acquiring the current frame of image in video data, wherein a target object is displayed in the video data; inputting the current frame of image into a trained image segmentation model, so as to obtain a first segmented image based on the target object; performing smoothing processing on the first segmented image, so as to obtain a second segmented image based on the target object; and taking the next frame of image in the video data as the current frame of image, and returning to carrying out the operation of inputting the current frame of image into the trained image segmentation model, until a corresponding second segmented image is obtained for each frame of image in the video data. By means of the method, the technical problem in the prior art of it not being possible to accurately carry out image segmentation on online video data can be solved.

Description

Image segmentation method, apparatus, device and storage medium

Technical Field
本申请实施例涉及图像处理技术领域,尤其涉及一种图像分割方法、装置、设备及存储介质。The embodiments of the present application relate to the technical field of image processing, and in particular, to an image segmentation method, apparatus, device, and storage medium.
Background Art
图像分割是图像处理中常见的技术之一,其用于精确提取出待处理图像中的感兴趣区域,并将感兴趣区域作为目标区域图像,以便于后续对目标区域图像的处理(如背景替换、扣取目标区域图像等处理)。基于人像的图像分割是图像分割领域中的一项重要应用。基于人像的图像分割是指将待处理图像中的人像区域和背景区域进行准确分离。当前,随着计算机和网络技术的发展,对在线的视频数据进行基于人像的图像分割具有重要的意义。如在线会议或在线直播等场景下,对在线的视频数据进行图像分割,以将视频数据中的人像区域和背景区域进行准确分离,之后,对背景区域进行背景图像的替换,以达到保护用户隐私的目的。Image segmentation is one of the common techniques in image processing, which is used to accurately extract the region of interest in the image to be processed, and use the region of interest as the target region image to facilitate subsequent processing of the target region image (such as background replacement). , deducting the image of the target area, etc.). Portrait-based image segmentation is an important application in the field of image segmentation. Portrait-based image segmentation refers to the accurate separation of the portrait area and the background area in the image to be processed. At present, with the development of computer and network technology, it is of great significance to perform portrait-based image segmentation for online video data. In scenarios such as online conferences or online live broadcasts, image segmentation is performed on the online video data to accurately separate the portrait area and the background area in the video data, and then the background image is replaced in the background area to protect user privacy. the goal of.
发明人在实现本申请的过程中,发现一些图像分割技术存在如下缺陷:图像分割主要包括基于阈值、基于区域、基于边缘以及基于图论和能量泛函的方法。其中,基于阈值的方法需要根据图像中的灰度特征进行分割,其缺陷在于仅适用于人像区域的灰度值均匀分布在背景区域灰度值之外的图像。基于区域的方法是按照空间邻域的相似性准则将图像分割成不同的区域,其缺陷在于无法处理复杂的图像。基于边缘的方法主要利用图像局部特征的不连续性(如人脸边缘的像素突变)得到人像区域的边界,其缺陷在于计算复杂度高。基于图论和能量泛函的方法主要利用图像的能量泛函进行人像分割,其缺陷在于计算量巨大且需要人为先验信息。由于上述技术的缺陷,使其无法适用于对在线的视频数据进行实时、简单、准确的图像分割的场景。In the process of realizing the present application, the inventor found that some image segmentation techniques have the following defects: image segmentation mainly includes methods based on threshold, region-based, edge-based, graph theory and energy functional. Among them, the threshold-based method needs to be segmented according to the grayscale features in the image, and its drawback is that it is only suitable for images in which the grayscale values of the portrait area are evenly distributed outside the grayscale values of the background area. The region-based method divides the image into different regions according to the similarity criterion of the spatial neighborhood, and its disadvantage is that it cannot handle complex images. The edge-based method mainly uses the discontinuity of local image features (such as the pixel mutation of the face edge) to obtain the boundary of the portrait region, and its disadvantage is that the computational complexity is high. The methods based on graph theory and energy functional mainly use the energy functional of the image to perform portrait segmentation, but the disadvantage is that the amount of calculation is huge and artificial prior information is required. Due to the defects of the above technology, it cannot be applied to the scene of real-time, simple and accurate image segmentation for online video data.
In conclusion, how to perform image segmentation on any online video data in real time, simply and accurately has become a technical problem that urgently needs to be solved.
SUMMARY OF THE INVENTION
Embodiments of the present application provide an image segmentation method, apparatus, device and storage medium, so as to solve the technical problem that the above technologies cannot accurately perform image segmentation on online video data.
In a first aspect, an embodiment of the present application provides an image segmentation method, including:
acquiring the current frame image in video data, where a target object is displayed in the video data;
inputting the current frame image into a trained image segmentation model to obtain a first segmented image based on the target object;
smoothing the first segmented image to obtain a second segmented image based on the target object;
taking the next frame image in the video data as the current frame image, and returning to perform the operation of inputting the current frame image into the trained image segmentation model, until each frame image in the video data obtains a corresponding second segmented image.
In a second aspect, an embodiment of the present application further provides an image segmentation apparatus, including:
a data acquisition module, for acquiring the current frame image in video data, where a target object is displayed in the video data;
a first segmentation module, for inputting the current frame image into a trained image segmentation model to obtain a first segmented image based on the target object;
a second segmentation module, for smoothing the first segmented image to obtain a second segmented image based on the target object;
a repeated segmentation module, for taking the next frame image in the video data as the current frame image, and returning to perform the operation of inputting the current frame image into the trained image segmentation model, until each frame image in the video data obtains a corresponding second segmented image.
In a third aspect, an embodiment of the present application further provides an image segmentation device, including:
one or more processors;
a memory, for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the image segmentation method described in the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the image segmentation method described in the first aspect is implemented.
The above image segmentation method, apparatus, device and storage medium acquire video data containing a target object, input each frame image of the video data into an image segmentation model to obtain a corresponding first segmented image, and then smooth the first segmented image to obtain a second segmented image. This technical means solves the technical problem that some image segmentation technologies cannot accurately perform image segmentation on online video data. By using an autoencoder-based image segmentation model together with smoothing, online video data can be segmented accurately in real time; due to the self-learning ability of the image segmentation model, the method can be applied to online video data with complex images; and in the application process, the method can be used directly once the image segmentation model is deployed, without artificial prior information, which simplifies the complexity of image segmentation and expands the application scenarios of image segmentation methods.
Description of Drawings
FIG. 1 is a flowchart of an image segmentation method provided by an embodiment of the present application;
FIG. 2 is a flowchart of another image segmentation method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an image segmentation model provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an original image provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a segmentation result image provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an edge result image provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another image segmentation model provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an image segmentation apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an image segmentation device provided by an embodiment of the present application.
Detailed Description
The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are used to explain the present application rather than to limit it. In addition, it should be noted that, for the convenience of description, the drawings show only some rather than all of the structures related to the present application.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity, operation or object from another entity, operation or object, and do not necessarily require or imply any such actual relationship or order between these entities, operations or objects. For example, the "first" and "second" in the first segmented image and the second segmented image are used to distinguish two different segmented images.
The image segmentation method provided in the embodiments of the present application may be executed by an image segmentation device, which may be implemented by software and/or hardware, and the image segmentation device may be composed of two or more physical entities or of a single physical entity. For example, the image segmentation device may be a computer, a mobile phone, a tablet, an interactive smart tablet, or another smart device with data computing and analysis capabilities.
FIG. 1 is a flowchart of an image segmentation method provided by an embodiment of the present application. Referring to FIG. 1, the image segmentation method specifically includes:
Step 110: Acquire the current frame image in the video data, where a target object is displayed in the video data.
The video data is the video data on which image segmentation currently needs to be performed, and it may be online video data or offline video data. The video data includes multiple frames of images, and each frame of image displays a target object, which can be considered as the object that needs to be separated from the background image. Optionally, the background images of the frames in the video data may be the same or different, which is not limited in the embodiment, and the target object may change as the video data plays, but during the change, the type of the target object does not change. For example, when the target object is a human being, the human image in the video data may change (such as replacing a person or adding a new person), but the target object in the video data is always a human being. In the following embodiments, the target object is a human being as an example. Optionally, the source of the video data is not limited in this embodiment. For example, the video data is a piece of video shot by an image acquisition apparatus (such as a webcam or a camera) connected to the image segmentation device. For another example, the video data is a conference picture obtained from a network in a video conference scenario. For another example, the video data is a live broadcast picture obtained from a network in a live broadcast scenario.
Exemplarily, performing image segmentation on the video data refers to separating out the region where the target object is located in each frame image of the video data; in the embodiment, the target object is exemplarily described as a human being. Exemplarily, the video data is processed in units of frames, that is, the images in the video data are acquired frame by frame and processed to obtain the final image segmentation result. In the embodiment, the image currently being processed is recorded as the current frame image, and the processing of the current frame image is taken as an example for description.
Step 120: Input the current frame image into the trained image segmentation model to obtain a first segmented image based on the target object.
The image segmentation model is a pre-trained neural network model, which is used to segment the target object in the current frame image and output the segmentation result corresponding to the current frame image. In the embodiment, the segmentation result is recorded as the first segmented image, and the portrait area and the background area of the current frame image can be determined through the first segmented image, where the portrait area can be considered as the area where the target object (human) is located. In one embodiment, the first segmented image is a binary image whose pixel values include two types, 0 and 1, where the area with a pixel value of 0 belongs to the background area of the current frame image and the area with a pixel value of 1 belongs to the portrait area of the current frame image. It can be understood that, in order to facilitate the visual display of the first segmented image, the pixel values are converted into 0 and 255 before displaying the first segmented image, where the area with a pixel value of 0 belongs to the background area and the area with a pixel value of 255 belongs to the portrait area. The resolution of the first segmented image is the same as the resolution of the current frame image. It can be understood that when the image segmentation model has a resolution requirement for the input image, that is, when an image of fixed resolution needs to be input, it is necessary to determine first whether the resolution of the current frame image meets the resolution requirement; if not, resolution conversion is performed on the current frame image to obtain a current frame image that meets the resolution requirement. In this case, after the first segmented image is obtained, resolution conversion is also performed on the first segmented image, so that the resolution of the first segmented image is the same as the resolution of the original current frame image (that is, the current frame image before the resolution conversion). When the image segmentation model has no resolution requirement for the input image, the current frame image can be directly input into the image segmentation model to obtain a first segmented image of the same resolution.
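As a non-authoritative sketch of this per-frame inference step, assuming the model expects a fixed 224×224 input and returns a single-channel probability map, the resolution handling described above might look like the following; all names and thresholds are illustrative.

```python
import cv2
import numpy as np

def segment_frame(model, frame: np.ndarray, model_size=(224, 224)) -> np.ndarray:
    """Return the first segmented image (binary 0/1 mask) at the frame's resolution."""
    h, w = frame.shape[:2]
    inp = cv2.resize(frame, model_size)            # satisfy the model's fixed input resolution
    prob = model(inp)                              # assumed callable returning an (H, W) probability map
    mask = (prob > 0.5).astype(np.uint8)           # 1 = portrait region, 0 = background region
    return cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)  # back to original resolution
```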
Exemplarily, the structure and parameters of the image segmentation model can be set according to the actual situation. In the embodiment, the image segmentation model adopts an autoencoder structure. An autoencoder is a class of artificial neural networks used in semi-supervised and unsupervised learning, whose function is to perform representation learning on the input information by taking the input information as the learning target. The autoencoder includes two parts, an encoder and a decoder, where the encoder is used to extract the features in the image and the decoder is used to decode the extracted features to obtain the learning result (for example, the first segmented image in the embodiment). Optionally, the encoder adopts a lightweight network to reduce the amount of data processing and calculation when extracting features and to speed up processing. The decoder can be implemented by residual blocks combined with channel confusion, upsampling and other processes, so as to achieve fully automatic real-time image segmentation. In the embodiment, the encoder extracts features of the current frame image at different resolutions, and the decoder then performs operations such as upsampling, fusion and decoding on the features so as to reuse them, thereby obtaining an accurate first segmented image.
Optionally, the image segmentation model is deployed under a forward inference framework. The specific type of the forward inference framework can be set according to the actual situation; for example, the forward inference framework is the openvino framework. When deployed under the forward inference framework, the image segmentation model has a low dependence on the GPU, is relatively lightweight, and does not occupy a large amount of storage space.
Step 130: Smooth the first segmented image to obtain a second segmented image based on the target object.
In the embodiment, there are different degrees of edge jaggedness in the first segmented image, where edge jaggedness can be understood as the edge between the portrait area and the background area being jagged, which makes the separation of the portrait area and the background area look too abrupt. In the embodiment, in order to reduce the influence of edge jaggedness, the first segmented image is smoothed, that is, the edge jaggedness in the first segmented image is smoothed, so as to obtain a segmented image with smoother edges. In the embodiment, the smoothed segmented image is recorded as the second segmented image. The second segmented image can also be considered as the final segmentation result of the current frame image. It can be understood that the second segmented image is also a binary image whose pixel values include two types, 0 and 1, where the area with a pixel value of 0 belongs to the background area of the current frame image and the area with a pixel value of 1 belongs to the portrait area of the current frame image.
The technical means used in the smoothing processing can be set according to the actual situation. In the embodiment, the smoothing processing is implemented by means of Gaussian smoothing filtering. Exemplarily, a Gaussian kernel function is used in the Gaussian smoothing filtering to process the first segmented image to obtain the second segmented image. The Gaussian kernel function is a commonly used kernel function, and the smoothing processing can be expressed as S_2 = S_1 * G, where S_2 represents the second segmented image, S_1 represents the first segmented image, and G represents the Gaussian kernel function.
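A minimal sketch of the S_2 = S_1 * G smoothing with OpenCV follows; the kernel size and sigma are illustrative, and after filtering the mask takes intermediate values near the boundary, which is what softens the jagged edge.

```python
import cv2

def smooth_mask(first_segmented, ksize=5, sigma=0):
    # S2 = S1 * G: convolve the binary mask with a Gaussian kernel
    return cv2.GaussianBlur(first_segmented.astype("float32"), (ksize, ksize), sigma)
```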
Step 140: Take the next frame image in the video data as the current frame image, and return to perform the operation of inputting the current frame image into the trained image segmentation model, until each frame image in the video data obtains a corresponding second segmented image.
Exemplarily, after the second segmented image is obtained, it can be considered that the image segmentation of the current frame image has been completed, and therefore the next frame image in the video data can be processed. The processing procedure is to take the next frame image as the current frame image and repeat steps 110 to 130 to obtain the second segmented image of the new current frame image, then acquire the next frame image again and repeat the above process, until each frame image in the video data obtains a corresponding second segmented image, thereby realizing image segmentation.
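Putting the frame loop together, a hedged end-to-end sketch could look like the following; `segment_frame` and `smooth_mask` refer to the illustrative helpers sketched earlier, `model` is assumed to be the trained image segmentation model loaded beforehand, and the video source is an assumption.

```python
import cv2

cap = cv2.VideoCapture(0)                 # camera index or a video file path (assumed source)
while True:
    ok, frame = cap.read()
    if not ok:
        break                             # every frame has been processed
    first_seg = segment_frame(model, frame)      # step 120: first segmented image
    second_seg = smooth_mask(first_seg)          # step 130: second segmented image
    # second_seg can now be used, e.g. for background replacement (see below)
cap.release()
```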
It can be understood that, after the second segmented image is obtained, the current frame image can be processed according to actual needs. In the embodiment, background replacement is taken as an example. In this case, after smoothing the first segmented image to obtain the second segmented image based on the target object, the method further includes: acquiring a target background image, where the target background image contains a target background; and replacing the background of the current frame image according to the target background image and the second segmented image, so as to obtain a new image of the current frame.
The target background refers to the new background used after the background is replaced, and the target background image refers to an image containing the target background. Optionally, the target background image and the second segmented image have the same resolution. The target background image may be an image selected by the user of the image segmentation device, or a default image of the image segmentation device. Exemplarily, after the target background image is acquired, background replacement is performed on the current frame image to obtain a replaced image. In the embodiment, the image after background replacement is recorded as the new image of the current frame. Exemplarily, the background replacement is performed as follows: the pixels where the portrait area is located and the pixels where the background area is located in the current frame image are determined through the second segmented image; after that, the portrait area is retained and the corresponding background area is replaced with the relevant target background in the target background image, so as to obtain the new image of the current frame. The background replacement can be expressed as I' = I × S_2 + (1 - S_2) × B, where S_2 represents the second segmented image, I' represents the new image of the current frame, I represents the current frame image, and B represents the target background image. In the above formula, the portrait area is retained by I × S_2 (that is, after the current frame image is multiplied by the second segmented image, the pixels of the current frame image that correspond to pixels with value 1 in the second segmented image are retained), and the background area is replaced by (1 - S_2) × B (that is, after the target background image is multiplied by (1 - S_2), the pixels of the target background image that correspond to pixels with value 0 in the second segmented image are retained). It can be understood that, when performing image segmentation on the video data, after the second segmented image of each frame is obtained, the new image of the current frame corresponding to that frame can be obtained through its second segmented image. Afterwards, the new images of the frames can form the new video data after background replacement.
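The formula I' = I × S_2 + (1 - S_2) × B maps directly to array arithmetic; a small sketch follows, where the channel broadcasting, resizing and dtype handling are the only additions beyond the formula itself.

```python
import cv2
import numpy as np

def replace_background(frame: np.ndarray, mask: np.ndarray, target_background: np.ndarray) -> np.ndarray:
    """I' = I*S2 + (1 - S2)*B, with mask values in [0, 1] at the frame's resolution."""
    bg = cv2.resize(target_background, (frame.shape[1], frame.shape[0]))
    m = mask.astype(np.float32)[..., None]               # (H, W, 1) so it broadcasts over the color channels
    out = frame.astype(np.float32) * m + bg.astype(np.float32) * (1.0 - m)
    return out.astype(np.uint8)
```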
It can be understood that, in the embodiment, in order to facilitate the understanding of the technical solution, the video data is limited to containing the target object. In practical applications, the video data may not contain the target object; in this case, the first segmented image obtained by the above method is a segmented image whose pixel values are all 0.
In the above, by acquiring video data containing the target object, inputting each frame image of the video data into the image segmentation model to obtain the corresponding first segmented image, and then smoothing the first segmented image to obtain the second segmented image, the technical problem that some image segmentation technologies cannot accurately perform image segmentation on online video data is solved. By using the autoencoder-based image segmentation model together with smoothing, video data, and especially online video data, can be segmented accurately, and the processing speed of online video data is guaranteed. Moreover, due to the self-learning ability of the image segmentation model, the method is applicable to video data with complex images, and in the application process the model can be applied directly once deployed, without artificial prior information, which simplifies the complexity of image segmentation and expands the application scenarios of the image segmentation method.
It can be understood that the above image segmentation method can be regarded as the application process of the image segmentation model. In practical applications, the performance of the image segmentation model directly affects the result of image segmentation; therefore, in addition to the application of the image segmentation model, the training process of the image segmentation model is also an important part. Exemplarily, FIG. 2 is a flowchart of another image segmentation method provided by an embodiment of the present application. This image segmentation method is based on the above image segmentation method and exemplarily describes the training process of the image segmentation model. Referring to FIG. 2, the image segmentation method specifically includes:
Step 210: Acquire a training data set, where the training data set contains multiple original images.
The training data refers to the data from which the image segmentation model learns when the image segmentation model is trained. In the embodiment, the training data is in the form of images, so the training data is referred to as original images, and the original images contain the same type of target object as the video data. Exemplarily, the training data set refers to a data set containing a large number of original images; that is, during training, a large number of original images are selected from the training data set for the image segmentation model to learn, so as to improve the accuracy of the image segmentation model.
Exemplarily, the video data contains a large number of images. If the original images were collected from the video data, images would need to be collected frame by frame, which would consume a large amount of labor and production cost, and the collected original images would contain a large amount of repeated content, which is not conducive to training the image segmentation model. Therefore, in the embodiment, independent original images are used instead of video data to construct the training data set. In this case, the constructed training data set can contain original images with different portrait poses in different scenes, where the scenes are preferably natural scenes. That is, multiple natural scenes are selected in advance, and in each natural scene an image acquisition apparatus is used to shoot multiple images containing humans as the original images, where the human poses in the multiple images are different. Optionally, in order to reduce the influence of the shooting parameters of the image acquisition apparatus (such as its position, aperture size and degree of focus) and of the lighting in the natural environment on the performance of the image segmentation model, when constructing the training data set, multiple original images under different lighting and different shooting parameters are collected for the same portrait pose in the same natural scene, so as to guarantee the performance of the image segmentation model when processing video data under different scenes, different portrait poses, different lighting and different shooting parameters.
It can be understood that an existing public image data set may also be used as the training data set, for example the public data set Supervisely, or the public data set EG1800.
Step 220: Construct a label data set according to the training data set, where the label data set contains multiple segmentation label images and multiple edge label images, and one original image corresponds to one segmentation label image and one edge label image.
Exemplarily, the label data can be understood as reference data for determining whether the image segmentation model is accurate, and it plays a supervising role. The more similar the output of the image segmentation model is to the corresponding label data, the higher the accuracy of the image segmentation model, that is, the better its performance; otherwise, the lower the accuracy of the image segmentation model. It can be understood that the process of training the image segmentation model is the process of making the output of the image segmentation model increasingly similar to the corresponding label data.
一个实施例中,将原始图像输入图像分割模型后,图像分割模型输出原始图像对应的分割图像和边缘图像,其中,分割图像是指对原始图像中的目标对象进行图像分割后得到的二值图像,实施例中,将训练过程中图像分割模型输出的分割图像记为分割结果图像。边缘图像是指表示原始图像中人像区域和背景区域之间边缘的二值图像,实施例中,将训练过程中图像分割模型输出的边缘图像记为边缘结果图像。为了对图像分割模型进行精准训练,实施例中,根据图像分割模块输出结果设置标签数据包含分割标签图像和边缘标签图像,其中,分割标签图像对应于分割结果图像,用于对分割结果图像起到参考作用,边缘标签图像对应于边缘结果图像,用于对边缘结果图像起到参考作用。每张原始图像均存在对应的分割标签图像和边缘标签图像,各分割标签图像和边缘标签图像组成标签数据集。In one embodiment, after the original image is input into the image segmentation model, the image segmentation model outputs a segmented image and an edge image corresponding to the original image, wherein the segmented image refers to a binary image obtained by performing image segmentation on the target object in the original image. , in the embodiment, the segmented image output by the image segmentation model in the training process is recorded as the segmentation result image. The edge image refers to a binary image representing the edge between the portrait area and the background area in the original image. In the embodiment, the edge image output by the image segmentation model in the training process is recorded as the edge result image. In order to accurately train the image segmentation model, in the embodiment, the label data is set according to the output result of the image segmentation module, including the segmentation label image and the edge label image, wherein the segmentation label image corresponds to the segmentation result image, and is used to play a role in the segmentation result image. For reference, the edge label image corresponds to the edge result image, and is used for reference to the edge result image. Each original image has corresponding segmented label images and edge label images, and each segmented label image and edge label image constitutes a label dataset.
示例性的,边缘标签图像和分割标签图像均可以通过上述的原始图像得到。例如,采用人工标注的方式,在各原始图像中标记出人像区域和背景区域以及边缘区域,之后,根据人像区域、背景区域和边缘区域得到边缘标签图像和分割标签图像。又如,采用人工标注的方式,在各原始图像中标记出人像区域和背景区域,之后,根据人像区域和背景区域得到分割标签图像,并根据分割标签图像得到边缘标签图像。Exemplarily, both the edge label image and the segmented label image can be obtained from the above-mentioned original image. For example, by using manual labeling, the portrait area, background area and edge area are marked in each original image, and then the edge label image and segmentation label image are obtained according to the portrait area, background area and edge area. For another example, a human-marking method is used to mark a portrait region and a background region in each original image, and then a segmented label image is obtained according to the portrait region and the background region, and an edge label image is obtained according to the segmented label image.
实施例中,以通过人工标注得到分割标签图像并通过分割标签图像得到边缘标签图像的方式进行示例性描述,在本实施例中,步骤220包括步骤221-步骤225:In the embodiment, an exemplary description is given in the manner of obtaining a segmented label image by manual annotation and obtaining an edge label image by segmenting the label image. In this embodiment, step 220 includes steps 221-225:
Step 221: Acquire an annotation result for the original image.
The annotation result refers to the result obtained after marking the portrait area and the background area in the original image. In the embodiment, the annotation result is obtained by manual annotation, that is, the portrait area and the background area are manually marked in the original image, and the image segmentation device then obtains the annotation result from the marked portrait area and background area.
Step 222: Obtain the corresponding segmentation label image according to the annotation result.
Exemplarily, according to the annotation result, the pixel value of each pixel belonging to the portrait area in the original image is set to 255, and the pixel value of each pixel belonging to the background area is set to 0, thereby obtaining the segmentation label image. It can be understood that the segmentation label image is a binary image.
Step 223: Perform an erosion operation on the segmentation label image to obtain an eroded image.
The erosion operation can be understood as shrinking and thinning the white area (that is, the portrait area) whose pixel value is 255 in the segmentation label image. In the embodiment, the image obtained by performing the erosion operation on the segmentation label image is recorded as the eroded image. It can be understood that the number of pixels occupied by the white area in the eroded image is smaller than the number of pixels occupied by the white area in the segmentation label image, and the white area in the segmentation label image completely covers the white area in the eroded image.
Step 224: Perform a Boolean operation on the segmentation label image and the eroded image to obtain the edge label image corresponding to the original image.
Boolean operations include union, intersection, and subtraction. The objects on which a Boolean operation is performed are the operands; in the embodiment, the operands are the segmentation label image and the eroded image, more specifically the white areas in the two images. The result of a Boolean operation can be recorded as a Boolean object; in the embodiment, the Boolean object is the edge label image. Exemplarily, union means that the resulting Boolean object contains the volume of both operands. Since the white area in the segmentation label image completely covers the white area in the eroded image, the Boolean object obtained by the union of the two images is the white area in the segmentation label image. Intersection means that the resulting Boolean object contains only the volume common to both operands (that is, only the overlapping positions); since the white area in the segmentation label image completely covers the white area in the eroded image, the Boolean object obtained by intersecting the two images is the white area in the eroded image. Subtraction means that the Boolean object contains the volume of one operand with the intersecting volume removed; for example, subtracting the eroded image from the segmentation label image yields the white area obtained by removing, from the white area of the segmentation label image, the white area corresponding to the eroded image. It can be understood that, since the eroded image is obtained by shrinking the white area of the segmentation label image, the edges of the white areas in the eroded image and in the segmentation label image are highly similar; therefore, after the Boolean subtraction of the two images, a white area representing only the edge is obtained, that is, the edge label image. At this time, the edge label image can be expressed as GT_edge = GT - GT_erode, where GT_edge represents the edge label image, GT represents the segmentation label image, and GT_erode represents the eroded image. It can be understood that the edge label image is a binary image whose resolution is equal to that of the segmentation label image.
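The following is a minimal sketch of steps 223 and 224, assuming OpenCV is available; the 5x5 erosion kernel is an assumption, since the embodiment does not specify the erosion parameters.

```python
import cv2
import numpy as np

def make_edge_label(seg_label: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Derive an edge label image from a binary segmentation label (values 0/255).

    GT_edge = GT - GT_erode: erode the portrait (white) area, then subtract the
    eroded mask from the original mask so that only a thin band along the
    portrait/background boundary remains.
    """
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    eroded = cv2.erode(seg_label, kernel, iterations=1)   # shrink the white area
    edge_label = cv2.subtract(seg_label, eroded)          # Boolean subtraction of the two masks
    return edge_label
```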
Step 225: Obtain the label data set according to the segmentation label images and the edge label images.
After the segmentation label image and the edge label image of each original image are obtained according to the above steps, the label data set is composed of all segmentation label images and all edge label images. It can be understood that the segmentation label images and the edge label images can be regarded as the ground truth, that is, the correct labels.
Step 230: Train the image segmentation model according to the training data set and the label data set.
Exemplarily, one original image is input to the image segmentation model, a loss function is constructed from the output of the image segmentation model and the corresponding label data in the label data set, and the model parameters of the image segmentation model are then updated according to the loss function. After that, another original image is input to the updated image segmentation model to construct the loss function again, and the model parameters are updated again accordingly. The above training process is repeated until the loss function converges. When the values of the loss function computed over several consecutive iterations fall within a set range, the loss function can be considered to have converged and the accuracy of the output of the image segmentation model can be considered stable; therefore, the image segmentation model can be considered fully trained.
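A minimal training-loop sketch of the above procedure is shown below, assuming the pytorch framework. The names `model`, `dataset`, and `compute_loss` are hypothetical stand-ins for the image segmentation model, the paired (original image, label) data, and the loss described in step 239; the optimizer, learning rate, and convergence tolerance are assumptions.

```python
import torch

def train(model, dataset, compute_loss, lr=1e-3, tol=1e-4, patience=5):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    recent = []
    for image, seg_label, edge_label in dataset:           # one original image per iteration
        outputs = model(image)                              # second decoding features and edge result
        loss = compute_loss(outputs, seg_label, edge_label)
        optimizer.zero_grad()
        loss.backward()                                     # update model parameters from the loss
        optimizer.step()
        recent.append(loss.item())
        # loss values over consecutive iterations stay within a set range: converged
        if len(recent) >= patience and max(recent[-patience:]) - min(recent[-patience:]) < tol:
            break
    return model
```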
Exemplarily, the specific structure of the image segmentation model can be set according to the actual situation. In the embodiment, the image segmentation model is described as including a normalization module, an encoding module, channel confusion modules, residual modules, multiple upsampling modules, an output module, and an edge module. For ease of understanding, the image segmentation model is exemplarily described with the structure shown in FIG. 3 , which is a schematic structural diagram of an image segmentation model provided by an embodiment of the present application. Referring to FIG. 3 , the image segmentation model includes a normalization module 21, an encoding module 22, four channel confusion modules 23, three residual modules 24, four multiple upsampling modules 25, an output module 26, and an edge module 27. In this embodiment, step 230 includes steps 231 to 2310:
Step 231: Input the original image into the normalization module to obtain a normalized image.
In the embodiment, an original image with a resolution of 224×224 is taken as an example. For example, FIG. 4 is a schematic diagram of an original image provided by an embodiment of the present application. Referring to FIG. 4 , the original image contains one portrait area; it should be noted that the original image used in FIG. 4 comes from the public data set Supervisely.
Exemplarily, normalization refers to the process of applying a series of standard transformations to an image so that it takes a fixed standard form; the resulting standard image is called a normalized image. Normalization is divided into linear normalization and nonlinear normalization; in the embodiment, the original image is processed by linear normalization. Linear normalization maps the pixel values of each image from [0, 255] to [-1, 1], and the resolution of the resulting normalized image is equal to the resolution of the image before normalization. It can be understood that the normalization module is the module that implements the linear normalization operation; after the original image is input to the normalization module, the normalization module outputs a normalized image with pixel values in [-1, 1].
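A minimal sketch of the linear normalization described above is given below; it assumes the input is an 8-bit image array and leaves the resolution unchanged.

```python
import numpy as np

def normalize(image: np.ndarray) -> np.ndarray:
    """Linearly map pixel values from [0, 255] to [-1, 1]; resolution is unchanged."""
    return image.astype(np.float32) / 127.5 - 1.0
```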
Step 232: Use the encoding module to obtain multi-layer image features of the normalized image, where the image features of each layer have different resolutions.
The encoding module is used to extract features from the normalized image; in the embodiment, the extracted features are recorded as image features. It can be understood that the image features can reflect information such as color features, texture features, shape features, and spatial relationship features in the normalized image, including global information and/or local information. Exemplarily, the encoding module is a lightweight network, where a lightweight network refers to a neural network with few parameters, a small amount of computation, and a short inference time. The type of lightweight network used by the encoding module can be selected according to the actual situation; in the embodiment, referring to FIG. 3 , the encoding module 22 is described as a MobileNetV2 network.
In one embodiment, the normalized image passes through MobileNetV2 to produce multi-layer image features, where the resolutions of the image features of the layers differ and are related by multiples; optionally, the resolution of each layer of image features is smaller than the resolution of the original image. In one embodiment, the layers of image features are arranged from top to bottom in order of decreasing resolution, that is, the image features with the highest resolution are in the highest layer and those with the lowest resolution are in the lowest layer. It can be understood that the number of feature layers output by the encoding module can be set according to the actual situation. For example, when the resolution of the original image is 224×224, the encoding module outputs four layers of image features. In this case, referring to FIG. 3 , among the four layers of image features output by the encoding module 22, the resolution of the highest-layer (first-layer) image features is 112×112 (denoted Feature112×112 in FIG. 3 ), the resolution of the second-highest-layer (second-layer) image features is 56×56 (denoted Feature56×56 in FIG. 3 ), the resolution of the second-lowest-layer (third-layer) image features is 28×28 (denoted Feature28×28 in FIG. 3 ), and the resolution of the lowest-layer (fourth-layer) image features is 14×14 (denoted Feature14×14 in FIG. 3 ). It can be understood that the image features of the layers contain more and more information from bottom to top. Adjacent layers of image features are related by the same multiple, and the resolution of each layer of image features is smaller than the resolution of the original image. It can be understood that the correspondence between resolutions and layers in the embodiment is only intended to explain the image segmentation model, not to limit it.
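A sketch of extracting four feature maps at 1/2, 1/4, 1/8, and 1/16 of the input resolution from torchvision's MobileNetV2 is shown below. The tapped layer indices are an assumption about where each stride change occurs; the embodiment does not specify which internal MobileNetV2 blocks are used.

```python
import torch
from torchvision.models import mobilenet_v2

backbone = mobilenet_v2().features
taps = {1: "112x112", 3: "56x56", 6: "28x28", 13: "14x14"}  # assumed tap points

def encode(x: torch.Tensor):
    """Return the multi-layer image features for a 1x3x224x224 normalized image."""
    features = []
    for idx, layer in enumerate(backbone):
        x = layer(x)
        if idx in taps:
            features.append(x)
        if idx == max(taps):
            break
    return features  # ordered from highest (112x112) to lowest (14x14) resolution
```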
It should be noted that the embodiment does not limit the number of channels contained in each layer of image features.
It can be understood that the encoding module can be regarded as the encoder in the image segmentation model.
Step 233: Input the image features of each layer into the corresponding channel confusion module to obtain multi-layer confusion features, where each layer of image features corresponds to one channel confusion module.
The channel confusion module is used to fuse the features across the channels within a layer, so as to enrich the information contained in the image features of each layer and to ensure the accuracy of the image segmentation model without increasing the subsequent amount of computation. It can be understood that each layer of image features corresponds to one channel confusion module; as shown in FIG. 3 , the four layers of image features correspond to four channel confusion modules 23, and each channel confusion module 23 is used to fuse the image features across the multiple channels of the corresponding layer.
In one embodiment, the channel confusion module consists of a 1×1 convolution layer, a batch normalization (BN) layer, and an activation function layer, where the activation function layer uses the ReLU activation function. The 1×1 convolution layer realizes the mixing of image features across channels, and the BN layer plus the activation function layer makes the mixed image features more stable. It can be understood that the above structure of the channel confusion module is only an exemplary description; in practical applications, other structures may also be used for the channel confusion module.
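A minimal pytorch sketch of this 1×1 conv + BN + ReLU structure is shown below; the output channel count is an assumption, since the embodiment does not fix the channel numbers.

```python
import torch.nn as nn

class ChannelConfusion(nn.Module):
    """Channel confusion module sketch: 1x1 convolution + batch normalization + ReLU."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),  # mix features across channels
            nn.BatchNorm2d(out_channels),                                      # stabilize the mixed features
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)  # resolution is unchanged
```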
Exemplarily, the features output by the channel confusion module are recorded as confusion features. It can be understood that each layer of image features has corresponding confusion features, and within the same layer the confusion features and the image features have the same resolution. In one embodiment, except for the confusion features with the lowest resolution, the other confusion features are center-layer features, that is, the other layers can be regarded as network center layers. Taking FIG. 3 as an example, after passing through the respective channel confusion modules 23, the confusion features of the lowest layer are denoted Decode14×14, and the confusion features of the other layers are denoted Center28×28, Center56×56, and Center112×112, where the numeric part indicates the resolution.
It can be understood that the confusion features output by the channel confusion module can also be regarded as features obtained by decoding the image features; that is, in addition to mixing features across channels, the channel confusion module also plays a decoding role.
Step 234: Except for the confusion features of the highest-resolution layer, upsample the confusion features of every other layer and fuse them with the confusion features of the next-higher resolution to obtain the fusion features corresponding to that higher resolution.
Upsampling can be understood as enlarging a feature to increase its resolution. In the embodiment, upsampling is implemented by linear interpolation, that is, a suitable interpolation algorithm is used to insert new elements between the confusion features so as to enlarge their resolution.
In this step, upsampling enlarges the resolution of the confusion features so that the enlarged resolution equals the next-higher resolution. The next-higher resolution refers to the resolution that is higher than, and immediately above, the resolution currently being upsampled; correspondingly, the resolution being upsampled can be regarded as the next-lower resolution of that higher resolution. For example, in FIG. 3 , except for the lowest layer, the resolution of each layer is one level higher than the resolution of the layer below it. It can be understood that, since the resolution of the confusion features of any layer and the next-higher resolution are related by a multiple, the upsampling factor can be determined from that multiple. For example, if the resolution of the confusion features of a layer is 0.5 times the next-higher resolution, then double upsampling can be used to enlarge the resolution of that layer's confusion features. After that, the confusion features of the higher resolution are fused, via a skip connection, with the upsampled confusion features from the next-lower resolution, so as to reuse the confusion features and ensure that richer features are available in subsequent processing. It can be understood that image segmentation is a form of dense prediction, so the image segmentation model requires richer features. In the embodiment, the fused features are recorded as fusion features; thus, except for the confusion features with the lowest resolution, the confusion features of every layer have corresponding fusion features. The feature fusion operation can be understood as a concatenate operation. It can be understood that the size of the fusion features of each layer is the sum of the sizes of this layer's confusion features and the upsampled confusion features from the next-lower resolution. For example, if C in [NCHW] is 3 for this layer's confusion features before fusion, and C in [NCHW] is 3 for the upsampled lower-resolution confusion features before fusion, then C in [NCHW] of the resulting fusion features is 6, while N, H, and W remain unchanged. Here N is the batch size, C is the number of channels, H is the height, W is the width, and H×W can be understood as the resolution. It should be noted that, since there is no resolution higher than the highest one, the confusion features with the highest resolution do not need to be upsampled. A fusion sketch is given after the following example.
For example, referring to the image segmentation model shown in FIG. 3 , the lowest-layer confusion features Decode14×14 are upsampled by a factor of two so that their resolution doubles, yielding features with a resolution of 28×28; the confusion features Center28×28 of the next-higher resolution (the second-lowest layer) are then fused, via a skip connection, with the doubly upsampled 28×28 features of the lowest layer to obtain the fusion features of the second-lowest layer. Similarly, the confusion features Center28×28 of the second-lowest layer are upsampled by a factor of two to obtain features with a resolution of 56×56, and the confusion features Center56×56 of the next-higher resolution (the second-highest layer) are fused, via a skip connection, with the doubly upsampled 56×56 features of the second-lowest layer to obtain the fusion features of the second-highest layer. The fusion features of the highest layer are obtained in the same way.
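A minimal sketch of this upsample-and-concatenate fusion is shown below; bilinear interpolation is assumed as the linear interpolation method, and the variable names are illustrative only.

```python
import torch
import torch.nn.functional as F

def fuse(higher_res_confusion: torch.Tensor, lower_res_confusion: torch.Tensor) -> torch.Tensor:
    """Skip-connection fusion: upsample the lower-resolution confusion features by 2x
    and concatenate them with the next-higher-resolution confusion features along
    the channel dimension of the NCHW tensor (channel counts add up)."""
    upsampled = F.interpolate(lower_res_confusion, scale_factor=2,
                              mode="bilinear", align_corners=False)
    return torch.cat([higher_res_confusion, upsampled], dim=1)

# e.g. fuse(center_28, decode_14) -> fusion features of the second-lowest layer (28x28)
```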
Step 235: Input the fusion features of each layer into the corresponding residual module to obtain multi-layer first decoding features, where each layer of fusion features corresponds to one residual module, and the confusion features with the lowest resolution serve as the first decoding features with the lowest resolution.
The residual module is used to further extract and decode the fusion features. The residual module may contain one or more residual blocks (RS Block); in the embodiment, a residual module containing one residual block is taken as an example, and the structure of the residual block can be set according to the actual situation (a sketch is given after the example below). It can be understood that each layer of fusion features corresponds to one residual module, and the features output by the residual module have the same resolution as that layer's fusion features. Since the residual module further extracts and decodes the fusion features, that is, the features output by the residual module are decoded features, the features output by the residual module are recorded in the embodiment as the first decoding features.
It can be understood that, since the confusion features with the lowest resolution have no corresponding fusion features, there is no need to set a residual module in the lowest-resolution layer; the confusion features with the lowest resolution can be regarded directly as the first decoding features of that layer. Correspondingly, the fusion features of the other layers pass through their corresponding residual modules to obtain the corresponding first decoding features.
Taking FIG. 3 as an example, it contains three residual modules 24. The first decoding features output after the fusion features of the second-lowest layer are input to the residual module are denoted RS Block28×28, that is, their resolution is 28×28. The first decoding features output after the fusion features of the second-highest layer are input to the residual module are denoted RS Block56×56, that is, their resolution is 56×56. The first decoding features output after the fusion features of the highest layer are input to the residual module are denoted RS Block112×112, that is, their resolution is 112×112. The first decoding features of the lowest layer are Decode14×14.
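Since the embodiment leaves the exact structure of the residual block open, the following is only one possible sketch; the two 3×3 conv + BN layers and the 1×1 projection shortcut are assumptions.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A generic residual block (RS Block) sketch; resolution is preserved."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.shortcut = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))  # residual addition
```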
Step 236: Input the first decoding features of each layer into the corresponding multiple upsampling module to obtain multi-layer second decoding features, where each layer of first decoding features corresponds to one multiple upsampling module, and each set of second decoding features has the same resolution as the original image.
Exemplarily, the multiple upsampling module is used to upsample the first decoding features by a multiple so that the resulting resolution equals the resolution of the original image. The specific multiple is determined by the resolution of the first decoding features and the resolution of the original image; for example, if the resolution of the first decoding features is 14×14 and the resolution of the original image is 224×224, the first decoding features need to be upsampled by a factor of 16 to obtain decoding features with a resolution of 224×224.
It can be understood that, for a binary-classification image segmentation model, the binary image it finally outputs (the segmentation result image) is used to distinguish the foreground (for example, the portrait area) from the background; therefore, the segmentation task of the image segmentation model is a binary segmentation task, and before the segmentation result image is obtained, decoding features with 2 channels need to be obtained first. In the embodiment, in addition to upsampling the first decoding features, the multiple upsampling module also needs to change the number of channels of the upsampled first decoding features to 2. For the first decoding features of each layer, upsampling only changes the resolution, not the number of channels; therefore, in the embodiment, a 1×1 convolution layer is set in the multiple upsampling module, that is, the first decoding features are upsampled and then passed through a 1×1 convolution layer, which changes the number of channels of the upsampled first decoding features to 2. In practical applications, the image segmentation model can also perform multi-class segmentation tasks; in that case, before the final output image is obtained, decoding features whose number of channels equals the number of classes likewise need to be obtained. For example, if the image segmentation model performs a five-class segmentation task, 5-channel decoding features need to be obtained before the five-class segmentation result image is output. It should be noted that, when the corresponding segmentation label image is used for supervision, in order to facilitate computing the loss function, the pixel values in the segmentation label image need to be converted from 0 and 255 to 0 and 1, that is, pixels with value 0 are converted to 0 and pixels with value 255 are converted to 1. In this case, when training the segmentation network model, in order for the image segmentation model to finally output 2-channel decoding features, the ground truth of the segmentation label image needs to be converted into one-hot form, that is, each class has one channel, and a pixel in a channel has value 1 when it belongs to that class and value 0 in the other channels.
In the embodiment, for convenience of description, the features output by the multiple upsampling module are recorded as the second decoding features. It can be understood that the first decoding features of each layer correspond to one multiple upsampling module, and the multiple upsampling module produces second decoding features with 2 channels and the same resolution as the original image. The second decoding features can be regarded as the network prediction output obtained after decoding the image features of the current layer.
For example, referring to FIG. 3 , after the four layers of first decoding features pass through their corresponding multiple upsampling modules 25, four sets of second decoding features with a resolution of 224×224 and 2 channels are obtained; in FIG. 3 , the four sets of second decoding features are all denoted 224×224.
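A minimal sketch of the multiple upsampling module described above is given below; bilinear interpolation and the 224×224 output size are assumptions matching the example resolution in the embodiment.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultipleUpsampling(nn.Module):
    """Upsample the first decoding features to the original image resolution, then
    use a 1x1 convolution to reduce the channel count to the number of classes
    (2 for portrait/background)."""
    def __init__(self, in_channels: int, num_classes: int = 2, out_size: int = 224):
        super().__init__()
        self.out_size = out_size
        self.proj = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, x):
        x = F.interpolate(x, size=(self.out_size, self.out_size),
                          mode="bilinear", align_corners=False)
        return self.proj(x)  # N x 2 x 224 x 224 second decoding features
```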
It can be understood that the second decoding features of each layer can be regarded as an intermediate output obtained after decoding that layer's image features, and the final segmentation result image is obtained from these intermediate outputs.
Step 237: Concatenate the multi-layer second decoding features and input them into the output module to obtain the segmentation result image.
Since the image segmentation model ultimately needs to output one segmentation result image, after the second decoding features are obtained, the output module integrates the second decoding features of all layers to obtain one segmentation result image (that is, a binary image). Exemplarily, the second decoding features of all layers are first concatenated, so that the output module can draw on richer features and thereby recover a more accurate image; the output module then uses the concatenated second decoding features to obtain the segmentation result image. The specific procedure of the output module is as follows: the concatenated second decoding features are passed through a 1×1 convolution layer to obtain 2-channel decoding features. It can be understood that concatenation merely merges the second decoding features together, whereas the 1×1 convolution layer in the output module further decodes the concatenated second decoding features, so that the final decoding features are output with reference to the second decoding features of all layers; these final decoding features have 2 channels and describe the binary-classification result, that is, whether each pixel in the original image belongs to the portrait area or the background area. The decoding features are then passed through the softmax function and the argmax function to obtain the segmentation result image. That is, the output module consists of a 1×1 convolution layer and an activation function layer, where the activation function layer consists of the softmax function and the argmax function. The data processed by the softmax function can be understood as the output of the logical layer, that is, the meaning represented by the decoding features output by the 1×1 convolution layer is interpreted to obtain the logical-layer description. When the labels are in one-hot form, the argmax function is the usual function for obtaining the output result, that is, the argmax function outputs the corresponding segmentation result image.
For example, referring to FIG. 3 , the four sets of second decoding features are concatenated and input into the output module 26; they first pass through a 1×1 convolution layer to obtain 2-channel decoding features (denoted Refine224×224 in FIG. 3 ), and then pass through the activation function layer to obtain the segmentation result image (denoted output224×224 in FIG. 3 ).
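A minimal sketch of the output module described above is shown below; the input channel count of the 1×1 convolution depends on the concatenated second decoding features and is left as a parameter.

```python
import torch
import torch.nn as nn

class OutputModule(nn.Module):
    """Output module sketch: 1x1 convolution over the concatenated second decoding
    features, softmax over the 2 class channels, then argmax to produce the binary
    segmentation result image (0 = background, 1 = portrait)."""
    def __init__(self, in_channels: int, num_classes: int = 2):
        super().__init__()
        self.refine = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, second_decoding_features):
        x = torch.cat(second_decoding_features, dim=1)   # concatenate along channels
        logits = self.refine(x)                          # Refine224x224
        probs = torch.softmax(logits, dim=1)
        return torch.argmax(probs, dim=1)                # output224x224, values 0 or 1
```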
It can be understood that the pixel values in the segmentation result image output by the image segmentation model are 0 or 1, where pixels with value 0 belong to the background area and pixels with value 1 belong to the portrait area. To facilitate visualization of the segmentation result image, the pixel value of each pixel is multiplied by 255 when the segmentation result image is displayed. For example, FIG. 5 is a schematic diagram of a segmentation result image provided by an embodiment of the present application: the training data shown in FIG. 4 is input into the image segmentation model shown in FIG. 3 to obtain the segmentation result image, and after each pixel value of the segmentation result image is multiplied by 255, the segmentation result image shown in FIG. 5 is obtained.
Step 238: Input the first decoding features with the highest resolution into the edge module to obtain the edge result image.
In order to improve the ability of the image segmentation model to learn the edge between the portrait area and the background area, an edge module is set in the image segmentation model in the embodiment, so that the edge module provides additional supervision of the first decoding features with the highest resolution; it acts as a regularization constraint and improves the model's ability to learn edges. The specific structure of the edge module is not limited by the embodiment; in the embodiment, an edge module consisting of a single 1×1 convolution layer is taken as an example. Exemplarily, after the first decoding features with the highest resolution are input into the edge module, edge features with 2 channels and the same resolution as the original image are obtained, from which a binary image representing only the edge can be obtained. In the embodiment, this binary image representing the edge is recorded as the edge result image. It can be understood that the pixel values in the edge result image are 0 or 1, where pixels with value 1 are edge pixels and pixels with value 0 are non-edge pixels. It should be noted that the first decoding features with the highest resolution contain richer detail information; therefore, more accurate edge features can be obtained from them.
For example, as shown in FIG. 3 , after the highest-layer first decoding features RS Block112×112 pass through the edge module 27, edge features with a resolution of 224×224 are obtained, denoted edge224×224 in FIG. 3 .
To facilitate visualization of the edge result image, the pixel value of each pixel is multiplied by 255 when the edge result image is displayed. For example, FIG. 6 is a schematic diagram of an edge result image provided by an embodiment of the present application: the training data shown in FIG. 4 is input into the image segmentation model shown in FIG. 3 to obtain the edge result image, and after each pixel value of the edge result image is multiplied by 255, the edge result image shown in FIG. 6 is obtained.
It can be understood that, except for the normalization module and the encoding module, the other modules can be regarded as modules that make up the decoder.
Step 239: Construct a loss function according to each set of second decoding features, the edge result image, the corresponding segmentation label image, and the edge label image, and update the model parameters of the image segmentation model according to the loss function.
The loss function of the segmentation network model consists of a segmentation loss function and an edge loss function. The segmentation loss function reflects the segmentation ability of the segmentation network model and is obtained from the second decoding features of each layer and the segmentation label image: a sub-loss function is obtained from each layer's second decoding features and the segmentation label image, and the sub-loss functions of all layers are combined to obtain the segmentation loss function. It can be understood that every sub-loss function is computed in the same way. In one embodiment, the sub-loss function is computed with the IoU function, which can be defined as the ratio of the area of the intersection of the predicted pixel region (the second decoding features) and the label pixel region (the segmentation label image) to the area of their union; that is, the IoU function reflects the overlap similarity between the binary image corresponding to the second decoding features and the segmentation label image, so the sub-loss function computed from the IoU function reflects the loss of overlap similarity. Exemplarily, the edge loss function reflects the ability of the segmentation network model to learn edges and is obtained from the edge result image and the edge label image. In one embodiment, since edge pixels account for only a small proportion of the pixels in the whole original image, the edge loss function uses the focal loss, a common loss function that reduces the weight of the large number of easy negative samples during training and can also be understood as a form of hard example mining.
Exemplarily, the loss function of the segmentation network model is expressed as:
Loss = loss_iou^1 + loss_iou^2 + ... + loss_iou^n + loss_edge
where Loss represents the loss function of the segmentation network model, n represents the total number of layers corresponding to the second decoding features, loss_iou^1 represents the sub-loss function computed from the second decoding features with the highest resolution and the corresponding segmentation label image, loss_iou^n represents the sub-loss function computed from the second decoding features with the lowest resolution and the segmentation label image, loss_iou^n = 1 - Iou_n, Iou_n = (A_n ∩ B)/(A_n ∪ B), A_n represents the second decoding features with the lowest resolution, B represents the corresponding segmentation label image, Iou_n represents the overlap similarity between A_n and B, and loss_edge is the focal loss function.
Exemplarily, the image segmentation model has n layers in total (n ≥ 2), that is, there are n layers of second decoding features, so n sub-loss functions are obtained from the n layers of second decoding features and the segmentation label image. The first layer has the highest resolution, and its sub-loss function is denoted loss_iou^1; the second layer has the second-highest resolution, and its sub-loss function is denoted loss_iou^2; and so on, with the n-th layer having the lowest resolution and its sub-loss function denoted loss_iou^n. Since all sub-loss functions are computed in the same way, the embodiment describes the n-th-layer sub-loss function as an example. Exemplarily,
loss_iou^n = 1 - Iou_n
which represents the loss of the n-th-layer IoU function, where
Iou_n = (A_n ∩ B)/(A_n ∪ B)
A_n represents the second decoding features of the n-th layer, B represents the corresponding segmentation label image, A_n ∩ B represents the intersection of A_n and B, A_n ∪ B represents the union of A_n and B, and Iou_n represents the overlap similarity between A_n and B; thus 1 - Iou_n represents the loss of overlap similarity. It can be understood that the more similar the binary image corresponding to the second decoding features and the segmentation label image are, the smaller the corresponding sub-loss function, the better the segmentation ability of the image segmentation model, and the higher the segmentation accuracy. Exemplarily, loss_edge represents the edge loss function; in the embodiment, loss_edge is the focal loss:
loss_edge(p_t) = -α_t (1 - p_t)^γ log(p_t)
where p_t represents the predicted probability that a pixel in the edge result image is an edge pixel, α_t represents a balancing weight coefficient used to balance positive and negative samples, and γ represents a modulation coefficient used to control the weight of hard and easy samples. The values of α_t and γ can be set according to the actual situation. loss_edge is obtained from loss_edge(p_t) of each pixel in the edge result image, specifically by summing loss_edge(p_t) over all pixels, computing the mean, and taking the computed mean as loss_edge.
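A minimal sketch of this loss, assuming the pytorch framework, is given below. The values of alpha and gamma are assumptions (the embodiment leaves them to be set per the actual situation), and the softmax-based probability maps and the soft intersection/union used here are one possible differentiable realization of the IoU sub-loss.

```python
import torch

def iou_loss(pred_probs: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-layer sub-loss: 1 - IoU between the predicted foreground probability map
    and the 0/1 segmentation label."""
    inter = (pred_probs * target).sum()
    union = (pred_probs + target - pred_probs * target).sum()
    return 1.0 - inter / (union + eps)

def focal_loss(edge_probs: torch.Tensor, edge_target: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Edge loss: mean focal loss over all pixels of the edge result."""
    p_t = torch.where(edge_target == 1, edge_probs, 1 - edge_probs)
    a_t = torch.where(edge_target == 1, alpha * torch.ones_like(p_t), (1 - alpha) * torch.ones_like(p_t))
    return (-a_t * (1 - p_t) ** gamma * torch.log(p_t.clamp_min(1e-6))).mean()

def total_loss(second_decodings, edge_logits, seg_label, edge_label):
    """Loss = sum of per-layer IoU sub-losses + focal edge loss."""
    seg_loss = sum(iou_loss(torch.softmax(d, dim=1)[:, 1], seg_label) for d in second_decodings)
    edge_probs = torch.softmax(edge_logits, dim=1)[:, 1]
    return seg_loss + focal_loss(edge_probs, edge_label)
```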
After the loss function is obtained, the model parameters of the image segmentation model can be updated according to the loss function, so that the updated image segmentation model performs better.
Step 2310: Select the next original image and return to the operation of inputting the original image into the normalization module, until the loss function converges.
It can be understood that, after the model parameters of the image segmentation model have been modified according to the loss function, one round of training can be considered complete. At this point, another original image and the corresponding segmentation label image and edge label image are selected to train the image segmentation model, the loss function is computed again, and the model parameters are modified accordingly. After many rounds of training, if the values of the loss function computed over the current consecutive rounds fall within a preset numerical range, the loss function is considered to have converged, that is, the image segmentation model is stable. It can be understood that the specific bounds of the preset numerical range can be set according to the actual situation.
Once the image segmentation model is stable, training is determined to be complete; the image segmentation model can then be applied to segment the portraits in video data.
On the basis of the above embodiment, after the image segmentation model is trained according to the training data set and the label data set, the method further includes: when the image segmentation model is not a network model recognizable by a forward inference framework, converting the image segmentation model into a network model recognizable by the forward inference framework.
The image segmentation model is trained in a corresponding framework, usually tensorflow, pytorch, or the like; in the embodiment, the pytorch framework is taken as an example. The pytorch framework is mainly used for the design, training, and testing of models. Since the image segmentation model runs in real time in the image segmentation device, and the pytorch framework occupies a large amount of memory, running a pytorch-based image segmentation model inside an application of the image segmentation device would greatly increase the storage space occupied by that application. At the same time, running the image segmentation model under the pytorch framework depends heavily on a graphics processing unit (GPU); if no GPU is installed in the image segmentation device, the image segmentation model will run slowly. A forward inference framework, in contrast, generally targets a specific platform (such as an embedded platform); different platforms have different hardware configurations, and when the forward inference framework is deployed on a platform it can exploit the platform's hardware configuration, make reasonable use of resources, and perform optimization and acceleration, that is, the forward inference framework can optimize and accelerate the models deployed inside it when running them. A forward inference framework is mainly used for the prediction stage of a model, which includes testing and prediction (application) but not training; moreover, a forward inference framework depends little on the GPU and is lightweight, so it does not make the application occupy a large amount of storage space. Therefore, when the image segmentation model is applied, it is run in a forward inference framework. In one embodiment, before applying the image segmentation model, it is first determined whether the image segmentation model runs in the forward inference framework. If it does, the image segmentation model is applied directly; if it does not, the image segmentation model is converted into a network model recognizable by the forward inference framework. Exemplarily, the specific type of forward inference framework can be set according to the actual situation; for example, the forward inference framework is the openvino framework.
In this case, a specific way of converting the image segmentation model under the pytorch framework into an image segmentation model under the openvino framework is: use the existing pytorch conversion tool to convert the image segmentation model into an Open Neural Network Exchange (ONNX) model, and then use the openvino conversion tool to convert the ONNX model into an image segmentation model under the openvino framework. ONNX is a standard for representing deep learning models that allows models to be transferred between different frameworks.
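A minimal sketch of the pytorch-to-ONNX step of this conversion is given below, assuming the trained model takes a 1×3×224×224 input; the output file name and opset version are assumptions. The resulting ONNX file would then be passed to the openvino conversion tool to obtain the model under the openvino framework.

```python
import torch

# `model` is the trained image segmentation model (hypothetical variable name).
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "image_segmentation.onnx",   # assumed file name
    input_names=["image"],
    output_names=["segmentation"],
    opset_version=11,            # assumed opset
)
```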
On the basis of the above embodiment, after the loss function of the image segmentation model converges, the method further includes: deleting the edge module.
It can be understood that the benefit of setting the edge module during training is to improve the ability of the image segmentation model to learn edges, thereby ensuring the accuracy of the segmentation result image. During application of the image segmentation model, only the first segmented image needs to be output and the edge result image is not needed, and the image segmentation model has already acquired the ability to learn edges; therefore, when the image segmentation model is applied, the edge module can be deleted, that is, the data processing of the edge module is omitted during application, which reduces the amount of data processed by the image segmentation model and improves the processing speed.
In summary, collecting original images from different scenes avoids the heavy workload and production cost of collecting original images frame by frame from video data, and original images from different scenes contain little repeated content, which helps improve the learning ability of the image segmentation model. The encoding module of the image segmentation model adopts a lightweight network, which reduces the amount of data processed during encoding; at the same time, the channel confusion module mixes image features across channels without significantly increasing the amount of computation, enriching the feature information in each channel and thereby ensuring the accuracy of the image segmentation model. Moreover, upsampling the confusion features and fusing them with the confusion features of the next-higher resolution enriches the detail features at different resolutions and further ensures the accuracy of the image segmentation model. In addition, using the fusion features and the second decoding features allows the features of each layer to be reused and deeply supervised, which improves the utilization of the information contained in the features, strengthens the transfer of information and enhances the supervisory effect of the label data. Providing the edge module improves the model's ability to learn edges and further ensures its accuracy, and during application the edge module is deleted to reduce the amount of computation. Converting the image segmentation model into a model under a forward inference framework reduces its dependence on the GPU and reduces the storage space occupied by the application that runs it. During application, the trained image segmentation model can accurately segment the portrait region in video data without any human prior or interaction; in tests on an ordinary PC with an integrated graphics card, the processing time for each frame of video is only about 20 ms, so real-time automatic portrait segmentation can be achieved.
On the basis of the above embodiments, the image segmentation model further includes a decoding module. Correspondingly, after step 235, the method further includes: inputting the first decoding feature with the highest resolution into the decoding module to obtain a corresponding new first decoding feature.
FIG. 7 is a schematic structural diagram of another image segmentation model provided by an embodiment of the present application. Compared with the image segmentation model shown in FIG. 3, the image segmentation model shown in FIG. 7 further includes a decoding module 28.
Exemplarily, after the first decoding feature with the highest resolution is obtained through the residual module, it is passed through a decoding module for further decoding, yielding a new first decoding feature. This new first decoding feature can be regarded as the final first decoding feature of the highest-resolution level, and it is then input to the multiple upsampling module and the edge module provided at that level. It can be understood that the channel number and resolution of the new first decoding feature are the same as those of the original first decoding feature. For example, the first decoding feature obtained after the decoding module 28 in FIG. 7 is denoted Refine112×112, and its resolution is the same as that of RS Block112×112. In one embodiment, the decoding module is a convolutional network; the number and structure of its convolutional layers are not limited in the embodiments. The decoding module improves the accuracy of the highest-level first decoding feature and therefore the accuracy of the image segmentation model. It should be noted that, for the first decoding features, the lower the resolution, the more high-level the semantic features, and the higher the resolution, the richer the detail features. If the first decoding feature with the highest resolution is upsampled directly, jagged (aliasing) artifacts appear in the detail features; a decoding module is therefore added for it so that the resulting new first decoding feature transitions more smoothly and aliasing is avoided. The first decoding features of the other levels show essentially no aliasing after upsampling, and adding decoding modules for them would have little effect on the accuracy of the image segmentation model, so no decoding module needs to be provided for the other levels. It can be understood that, in practical applications, if aliasing appears after upsampling the first decoding features of other levels, decoding modules may also be provided for them to improve the accuracy of the image segmentation model.
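Since the disclosure only states that the decoding (refine) module is a convolutional network whose layer count and structure are not limited, the following is merely one assumed shape for it, keeping the channel count and resolution unchanged as required.

```python
import torch.nn as nn

class RefineBlock(nn.Module):
    """Assumed refine/decoding module: preserves channels and resolution, smooths details."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # output has the same shape as the input first decoding feature
        return self.body(x)
```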
It can be understood that the above image segmentation methods are described with a human as the target object; in practical applications, the target object may be any other object.
FIG. 8 is a schematic structural diagram of an image segmentation apparatus provided by an embodiment of the present application. Referring to FIG. 8, the image segmentation apparatus includes: a data acquisition module 301, a first segmentation module 302, a second segmentation module 303 and a repeated segmentation module 304.
The data acquisition module 301 is configured to acquire the current frame image in video data in which a target object is displayed; the first segmentation module 302 is configured to input the current frame image into a trained image segmentation model to obtain a first segmented image based on the target object; the second segmentation module 303 is configured to smooth the first segmented image to obtain a second segmented image based on the target object; and the repeated segmentation module 304 is configured to take the next frame image in the video data as the current frame image and return to the operation of inputting the current frame image into the trained image segmentation model, until a corresponding second segmented image has been obtained for every frame image in the video data.
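The per-frame flow these modules implement can be sketched as a plain loop; the OpenCV capture calls and the Gaussian kernel used for smoothing below are illustrative assumptions, not details from the disclosure.

```python
import cv2

def segment_video(video_path, segment_frame):
    """Run the trained model (wrapped in `segment_frame`) on every frame of a video."""
    cap = cv2.VideoCapture(video_path)
    results = []
    while True:
        ok, frame = cap.read()            # current frame image
        if not ok:
            break                         # no more frames in the video data
        first_mask = segment_frame(frame)                      # first segmented image
        second_mask = cv2.GaussianBlur(first_mask, (5, 5), 0)  # smoothed second segmented image
        results.append(second_mask)
    cap.release()
    return results
```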
On the basis of the above embodiment, the apparatus further includes: a training acquisition module, configured to acquire a training data set containing a plurality of original images; a label construction module, configured to construct a label data set from the training data set, the label data set containing a plurality of segmentation label images and a plurality of edge label images, one original image corresponding to one segmentation label image and one edge label image; and a model training module, configured to train the image segmentation model according to the training data set and the label data set.
On the basis of the above embodiment, taking the image segmentation model shown in FIG. 3 as an example, the image segmentation model includes: a normalization module 21, an encoding module 22, channel confusion modules 23, residual modules 24, multiple upsampling modules 25, an output module 26 and an edge module 27. In this case, the model training module includes: a normalization unit, configured to input the original image into the normalization module 21 to obtain a normalized image; an encoding unit, configured to use the encoding module 22 to obtain multiple layers of image features of the normalized image, the image features of each layer having a different resolution; a channel confusion unit, configured to input the image features of each layer into the corresponding channel confusion module 23 to obtain multiple layers of confusion features, each layer of image features corresponding to one channel confusion module 23; a fusion unit, configured to upsample the confusion features of every layer except the highest-resolution one and fuse them with the confusion features of the next-higher resolution to obtain the fusion features of that higher resolution (as in FIG. 3, except for the confusion feature of the highest layer, the confusion feature of every other layer is upsampled and then fused with the confusion feature of the layer above it to obtain the fusion feature of that upper layer); a residual unit, configured to input the fusion features of each layer into the corresponding residual module 24 to obtain multiple layers of first decoding features, each layer of fusion features corresponding to one residual module 24, and the lowest-resolution confusion feature serving as the lowest-resolution first decoding feature; a multiple upsampling unit, configured to input the first decoding feature of each layer into the corresponding multiple upsampling module 25 to obtain multiple layers of second decoding features, the first decoding feature of each layer corresponding to one multiple upsampling module and each second decoding feature having the same resolution as the original image; a segmentation output unit, configured to combine the multiple layers of second decoding features and input them to the output module 26 to obtain a segmentation result image; an edge output unit, configured to input the highest-resolution first decoding feature to the edge module 27 to obtain an edge result image; a parameter updating unit, configured to construct a loss function according to each second decoding feature, the edge result image, the corresponding segmentation label image and the edge label image, and to update the model parameters of the image segmentation model according to the loss function; and an image selection unit, configured to select the next original image and return to the operation of inputting the original image to the normalization module, until the loss function converges.
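For the fusion unit in particular, a compact sketch of the upsample-and-fuse step between adjacent resolution levels might look as follows; element-wise addition (and therefore matching channel counts), a fixed scale factor of 2 and bilinear interpolation are all assumptions, since the disclosure does not pin down the fusion operation.

```python
import torch.nn.functional as F

def fuse_levels(confusion_feats):
    """confusion_feats: channel-confused features ordered highest resolution first.
    Returns one fused feature per level; the lowest level is passed through as-is."""
    fused = []
    for i in range(len(confusion_feats) - 1):
        up = F.interpolate(confusion_feats[i + 1], scale_factor=2,
                           mode="bilinear", align_corners=False)  # upsample lower-resolution feature
        fused.append(confusion_feats[i] + up)   # assumed element-wise fusion with the higher level
    fused.append(confusion_feats[-1])           # lowest resolution: used directly
    return fused
```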
On the basis of the above embodiment, referring to FIG. 7, the image segmentation model further includes a decoding module 28. Correspondingly, the model training module further includes: a decoding unit, configured to input the highest-resolution first decoding feature to the decoding module 28 after the fusion features of each layer have been input into the corresponding residual modules 24 to obtain the multiple layers of first decoding features, so as to obtain a corresponding new first decoding feature.
On the basis of the above embodiment, the encoding module includes a MobileNetV2 network.
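One hedged way to obtain multi-resolution image features from a MobileNetV2 backbone is to tap intermediate layers of the torchvision implementation; the tap indices below are assumptions chosen to give one feature map per downsampling stage and are not specified by the disclosure.

```python
import torch
import torchvision

class MobileNetV2Encoder(torch.nn.Module):
    """Taps intermediate MobileNetV2 feature maps at several resolutions (assumed tap points)."""
    def __init__(self, tap_indices=(1, 3, 6, 13, 18)):
        super().__init__()
        # pretrained weights can be loaded if desired; omitted here for brevity
        self.features = torchvision.models.mobilenet_v2().features
        self.tap_indices = set(tap_indices)

    def forward(self, x):
        feats = []
        for i, layer in enumerate(self.features):
            x = layer(x)
            if i in self.tap_indices:
                feats.append(x)           # one image-feature map per resolution level
        return feats
```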
On the basis of the above embodiment, the loss function is expressed by the formula shown in the published figure (Figure PCTCN2020137858-appb-000013), where Loss denotes the loss function, n denotes the total number of layers corresponding to the second decoding features, the sub-loss function of Figure PCTCN2020137858-appb-000014 is calculated from the highest-resolution second decoding feature and the corresponding segmentation label image, the sub-loss function of Figure PCTCN2020137858-appb-000015 is calculated from the lowest-resolution second decoding feature and the segmentation label image (see also Figures PCTCN2020137858-appb-000016 and PCTCN2020137858-appb-000017), A_n denotes the lowest-resolution second decoding feature, B denotes the corresponding segmentation label image, Iou_n denotes the overlap similarity between A_n and B, and loss_edge is a Focal loss function.
On the basis of the above embodiment, the apparatus further includes: an edge deletion module, configured to delete the edge module after the loss function of the image segmentation model converges.
On the basis of the above embodiment, the apparatus further includes: a framework conversion module, configured to, after the image segmentation model has been trained according to the training data set and the label data set, convert the image segmentation model into a network model recognizable by the forward inference framework when the image segmentation model is not such a network model.
On the basis of the above embodiment, the label construction module includes: an annotation acquisition unit, configured to acquire the annotation result for the original image; a segmentation label acquisition unit, configured to obtain the corresponding segmentation label image from the annotation result; an erosion unit, configured to perform an erosion operation on the segmentation label image to obtain an eroded image; a Boolean unit, configured to perform a Boolean operation on the segmentation label image and the eroded image to obtain the edge label image corresponding to the original image; and a data set construction unit, configured to obtain the label data set from the segmentation label image and the edge label image.
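The erosion-then-Boolean construction of the edge label can be illustrated with OpenCV; the kernel size and the choice of XOR as the Boolean operation are assumptions for the sketch, not values fixed by the disclosure.

```python
import cv2
import numpy as np

def make_edge_label(seg_label, kernel_size=5):
    """seg_label: binary segmentation label image (uint8, values 0/255)."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    eroded = cv2.erode(seg_label, kernel)              # eroded image
    edge_label = cv2.bitwise_xor(seg_label, eroded)    # Boolean op keeps only the boundary rim
    return edge_label
```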
On the basis of the above embodiment, the apparatus further includes: a target background acquisition module, configured to acquire a target background image containing a target background after the first segmented image has been smoothed to obtain the second segmented image based on the target object; and a background replacement module, configured to perform background replacement on the current frame image according to the target background image and the second segmented image, so as to obtain a new current frame image.
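Background replacement with the smoothed second segmented image can be sketched as simple alpha compositing; resizing the target background to the frame size and treating the mask as an 8-bit alpha map are assumptions made for illustration.

```python
import cv2
import numpy as np

def replace_background(frame, mask, background):
    """Composite the target-object region of `frame` over `background` using `mask`."""
    background = cv2.resize(background, (frame.shape[1], frame.shape[0]))
    alpha = mask.astype(np.float32) / 255.0      # smoothed second segmented image as alpha
    if alpha.ndim == 2:
        alpha = alpha[:, :, None]                # broadcast over the color channels
    composed = alpha * frame + (1.0 - alpha) * background
    return composed.astype(np.uint8)             # new current frame image
```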
The image segmentation apparatus provided above can be used to execute the image segmentation method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.
It is worth noting that, in the above embodiments of the image segmentation apparatus, the units and modules included are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units are used only to distinguish them from each other and are not intended to limit the protection scope of the present application.
FIG. 9 is a schematic structural diagram of an image segmentation device provided by an embodiment of the present application. As shown in FIG. 9, the image segmentation device includes a processor 40, a memory 41, an input device 42 and an output device 43; the number of processors 40 in the image segmentation device may be one or more, and one processor 40 is taken as an example in FIG. 9. The processor 40, the memory 41, the input device 42 and the output device 43 in the image segmentation device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 9.
As a computer-readable storage medium, the memory 41 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the image segmentation method in the embodiments of the present application (for example, the data acquisition module 301, the first segmentation module 302, the second segmentation module 303 and the repeated segmentation module 304 in the image segmentation apparatus). The processor 40 executes the various functional applications and data processing of the image segmentation device by running the software programs, instructions and modules stored in the memory 41, that is, implements the above image segmentation method.
The memory 41 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the image segmentation device, and the like. In addition, the memory 41 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the memory 41 may further include memories located remotely relative to the processor 40, and these remote memories may be connected to the image segmentation device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input device 42 may be used to receive input numerical or character information and to generate key signal inputs related to user settings and function control of the image segmentation device. The output device 43 may include a display device such as a display screen.
The above image segmentation device includes the image segmentation apparatus and can be used to execute any of the image segmentation methods, with the corresponding functions and beneficial effects.
In addition, the embodiments of the present application further provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the relevant operations in the image segmentation method provided by any embodiment of the present application, with the corresponding functions and beneficial effects.
As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product.
Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage and the like) containing computer-usable program code. The present application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems) and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces and memory. The memory may include non-persistent memory, random access memory (RAM) and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include" or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
Note that the above are only preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present application. Therefore, although the present application has been described in some detail through the above embodiments, the present application is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present application; the scope of the present application is determined by the scope of the appended claims.

Claims (13)

  1. 一种图像分割方法,其中,包括:An image segmentation method, comprising:
    获取视频数据中的当前帧图像,所述视频数据中显示有目标对象;Obtain the current frame image in the video data, and the target object is displayed in the video data;
    将所述当前帧图像输入至训练好的图像分割模型,以得到基于所述目标对象的第一分割图像;The current frame image is input into the trained image segmentation model to obtain the first segmented image based on the target object;
    对所述第一分割图像进行平滑处理,以得到基于所述目标对象的第二分割图像;smoothing the first segmented image to obtain a second segmented image based on the target object;
    Take the next frame image in the video data as the current frame image, and return to perform the operation of inputting the current frame image into the trained image segmentation model, until a corresponding second segmented image is obtained for each frame image in the video data.
  2. 根据权利要求1所述的图像分割方法,其中,还包括:The image segmentation method according to claim 1, wherein, further comprising:
    获取训练数据集,所述训练数据集包含多张原始图像;Obtain a training data set, the training data set includes a plurality of original images;
    根据所述训练数据集构建标签数据集,所述标签数据集包含多张分割标签图像和多张边缘标签图像,一张所述原始图像对应一张分割标签图像和一张边缘标签图像;Build a label data set according to the training data set, the label data set includes a plurality of segmentation label images and a plurality of edge label images, and one of the original images corresponds to a segmentation label image and an edge label image;
    根据所述训练数据集和所述标签数据集训练所述图像分割模型。The image segmentation model is trained based on the training dataset and the label dataset.
  3. 根据权利要求2所述的图像分割方法,其中,所述图像分割模型包括:归一化模块、编码模块、通道混淆模块、残差模块、多倍上采样模块、输出模块以及边缘模块;The image segmentation method according to claim 2, wherein the image segmentation model comprises: a normalization module, an encoding module, a channel confusion module, a residual module, a multiple upsampling module, an output module and an edge module;
    所述根据所述训练数据集和所述标签数据集训练所述图像分割模型包括:The training of the image segmentation model according to the training data set and the label data set includes:
    将所述原始图像输入至所述归一化模块,以得到归一化图像;Inputting the original image to the normalization module to obtain a normalized image;
    利用所述编码模块得到所述归一化图像的多层图像特征,且每层所述图像特征的分辨率不同;Using the encoding module to obtain the multi-layer image features of the normalized image, and the resolution of the image features of each layer is different;
    分别将各层所述图像特征输入至对应的通道混淆模块,以得到多层混淆特征,每层所述图像特征对应一个通道混淆模块;The image features of each layer are respectively input to the corresponding channel confusion module to obtain multi-layer confusion features, and each layer of the image features corresponds to a channel confusion module;
    除分辨率最高的混淆特征外,将其他每层的混淆特征进行上采样,并与高一级分辨率的混淆特征进行融合以得到高一级分辨率对应的融合特征;In addition to the confusion features with the highest resolution, the confusion features of each other layer are up-sampled, and fused with the confusion features of the higher resolution to obtain the fusion features corresponding to the higher resolution;
    Input the fusion features of each layer into the corresponding residual modules respectively to obtain multiple layers of first decoding features, wherein each layer of fusion features corresponds to one residual module and the confusion feature with the lowest resolution serves as the first decoding feature with the lowest resolution;
    Input the first decoding features of each layer into the corresponding multiple upsampling modules respectively to obtain multiple layers of second decoding features, wherein the first decoding feature of each layer corresponds to one multiple upsampling module and each second decoding feature has the same resolution as the original image;
    将多层所述第二解码特征联合后输入至所述输出模块,以得到分割结果图像;Combining multiple layers of the second decoding features and inputting them to the output module to obtain a segmentation result image;
    将分辨率最高的第一解码特征输入至所述边缘模块,以得到边缘结果图像;Inputting the first decoded feature with the highest resolution to the edge module to obtain an edge result image;
    根据各所述第二解码特征、边缘结果图像、对应的分割标签图像和边缘标签图像构造损失函数,并根据所述损失函数更新所述图像分割模型的模型参数;Construct a loss function according to each of the second decoding features, the edge result image, the corresponding segmentation label image and the edge label image, and update the model parameters of the image segmentation model according to the loss function;
    选择下一原始图像,并返回执行将所述原始图像输入至所述归一化模块的操作,直到所述损失函数收敛为止。The next original image is selected, and the operation of inputting the original image to the normalization module is performed back until the loss function converges.
  4. 根据权利要求3所述的图像分割方法,其中,所述图像分割模型还包括:解码模块;The image segmentation method according to claim 3, wherein the image segmentation model further comprises: a decoding module;
    所述分别将各层所述融合特征输入至对应的残差模块,以得到多层第一解码特征之后,还包括:After inputting the fusion features of each layer into the corresponding residual modules to obtain the multi-layer first decoding features, the method further includes:
    将分辨率最高的第一解码特征输入至所述解码模块,以得到对应的新的第一解码特征。The first decoding feature with the highest resolution is input to the decoding module to obtain a corresponding new first decoding feature.
  5. 根据权利要求3所述的图像分割方法,其中,所述编码模块包括MobileNetV2网络。The image segmentation method according to claim 3, wherein the encoding module comprises a MobileNetV2 network.
  6. The image segmentation method according to claim 3, wherein the loss function is expressed by the formula shown in Figure PCTCN2020137858-appb-100001, where Loss denotes the loss function, n denotes the total number of layers corresponding to the second decoding features, the sub-loss function of Figure PCTCN2020137858-appb-100002 is calculated from the second decoding feature with the highest resolution and the corresponding segmentation label image, the sub-loss function of Figure PCTCN2020137858-appb-100003 is calculated from the second decoding feature with the lowest resolution and the segmentation label image (see also Figures PCTCN2020137858-appb-100004 and PCTCN2020137858-appb-100005), A_n denotes the second decoding feature with the lowest resolution, B denotes the corresponding segmentation label image, Iou_n denotes the overlap similarity between A_n and B, and loss_edge is a Focal loss function.
  7. 根据权利要求3所述的图像分割方法,其中,所述图像分割模型的损失函数收敛之后,还包括:The image segmentation method according to claim 3, wherein after the loss function of the image segmentation model converges, the method further comprises:
    删除所述边缘模块。Delete the edge module.
  8. 根据权利要求2所述的图像分割方法,其中,所述根据所述训练数据集和所述标签数据集训练图像分割模型之后,还包括:The image segmentation method according to claim 2, wherein after the image segmentation model is trained according to the training data set and the label data set, the method further comprises:
    在所述图像分割模型不是前向推理框架可识别的网络模型时,将所述图像分割模型转换成所述前向推理框架可识别的网络模型。When the image segmentation model is not a network model recognizable by the forward inference framework, the image segmentation model is converted into a network model recognizable by the forward inference framework.
  9. 根据权利要求2所述的图像分割方法,其中,所述根据所述训练数据集构建标签数据集包括:The image segmentation method according to claim 2, wherein the constructing a label data set according to the training data set comprises:
    获取针对所述原始图像的标注结果;obtaining an annotation result for the original image;
    根据所述标注结果得到对应的分割标签图像;Obtain a corresponding segmented label image according to the labeling result;
    对所述分割标签图像进行腐蚀操作,以得到腐蚀图像;performing an erosion operation on the segmented label image to obtain an eroded image;
    对所述分割标签图像和所述腐蚀图像进行布尔操作,以得到所述原始图像对应的边缘标签图像;performing a Boolean operation on the segmented label image and the eroded image to obtain an edge label image corresponding to the original image;
    根据所述分割标签图像和所述边缘标签图像得到标签数据集。A label dataset is obtained according to the segmented label image and the edge label image.
  10. 根据权利要求1所述的图像分割方法,其中,所述对所述第一分割图像进行平滑处理,以得到基于所述目标对象的第二分割图像之后,还包括:The image segmentation method according to claim 1, wherein after performing smoothing processing on the first segmented image to obtain a second segmented image based on the target object, the method further comprises:
    获取目标背景图像,所述目标背景图像中包含目标背景;obtaining a target background image, where the target background image includes the target background;
    根据所述目标背景图像和所述第二分割图像对所述当前帧图像进行背景替换,以得到当前帧新图像。Background replacement is performed on the current frame image according to the target background image and the second segmented image to obtain a new image of the current frame.
  11. 一种图像分割装置,其中,包括:An image segmentation device, comprising:
    数据获取模块,用于获取视频数据中的当前帧图像,所述视频数据中显示有目标对象;a data acquisition module for acquiring the current frame image in the video data, where the target object is displayed;
    第一分割模块,用于将所述当前帧图像输入至训练好的图像分割模型,以得到基于所述目标对象的第一分割图像;a first segmentation module, for inputting the current frame image into a trained image segmentation model to obtain a first segmented image based on the target object;
    第二分割模块,用于对所述第一分割图像进行平滑处理,以得到基于所述目标对象的第二分割图像;a second segmentation module, configured to perform smoothing processing on the first segmented image to obtain a second segmented image based on the target object;
    a repeated segmentation module, configured to take the next frame image in the video data as the current frame image and return to perform the operation of inputting the current frame image into the trained image segmentation model, until a corresponding second segmented image has been obtained for each frame image in the video data.
  12. 一种图像分割设备,其中,包括:An image segmentation device, comprising:
    一个或多个处理器one or more processors
    存储器,用于存储一个或多个程序;memory for storing one or more programs;
    当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-10中任一所述的图像分割方法。When the one or more programs are executed by the one or more processors, the one or more processors implement the image segmentation method according to any one of claims 1-10.
  13. 一种计算机可读存储介质,其上存储有计算机程序,其中,该程序被处理器执行时实现如权利要求1-10中任一所述的图像分割方法。A computer-readable storage medium on which a computer program is stored, wherein when the program is executed by a processor, the image segmentation method according to any one of claims 1-10 is implemented.
PCT/CN2020/137858 2020-12-21 2020-12-21 Image segmentation method and apparatus, and device and storage medium WO2022133627A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080099096.5A CN115349139A (en) 2020-12-21 2020-12-21 Image segmentation method, device, equipment and storage medium
PCT/CN2020/137858 WO2022133627A1 (en) 2020-12-21 2020-12-21 Image segmentation method and apparatus, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/137858 WO2022133627A1 (en) 2020-12-21 2020-12-21 Image segmentation method and apparatus, and device and storage medium

Publications (1)

Publication Number Publication Date
WO2022133627A1 true WO2022133627A1 (en) 2022-06-30

Family

ID=82157066

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/137858 WO2022133627A1 (en) 2020-12-21 2020-12-21 Image segmentation method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN115349139A (en)
WO (1) WO2022133627A1 (en)

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN114882076A (en) * 2022-07-11 2022-08-09 中国人民解放军国防科技大学 Lightweight video object segmentation method based on big data memory storage
CN115277452A (en) * 2022-07-01 2022-11-01 中铁第四勘察设计院集团有限公司 ResNet self-adaptive acceleration calculation method based on edge-end cooperation and application
CN116189194A (en) * 2023-04-27 2023-05-30 北京中昌工程咨询有限公司 Drawing enhancement segmentation method for engineering modeling
CN117237397A (en) * 2023-07-13 2023-12-15 天翼爱音乐文化科技有限公司 Portrait segmentation method, system, equipment and storage medium based on feature fusion

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN116824308B (en) * 2023-08-30 2024-03-22 腾讯科技(深圳)有限公司 Image segmentation model training method and related method, device, medium and equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN110827292A (en) * 2019-10-23 2020-02-21 中科智云科技有限公司 Video instance segmentation method and device based on convolutional neural network
CN110910391A (en) * 2019-11-15 2020-03-24 安徽大学 Video object segmentation method with dual-module neural network structure
WO2020170167A1 (en) * 2019-02-21 2020-08-27 Sony Corporation Multiple neural networks-based object segmentation in a sequence of color image frames

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
WO2020170167A1 (en) * 2019-02-21 2020-08-27 Sony Corporation Multiple neural networks-based object segmentation in a sequence of color image frames
CN110827292A (en) * 2019-10-23 2020-02-21 中科智云科技有限公司 Video instance segmentation method and device based on convolutional neural network
CN110910391A (en) * 2019-11-15 2020-03-24 安徽大学 Video object segmentation method with dual-module neural network structure

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN115277452A (en) * 2022-07-01 2022-11-01 中铁第四勘察设计院集团有限公司 ResNet self-adaptive acceleration calculation method based on edge-end cooperation and application
CN115277452B (en) * 2022-07-01 2023-11-28 中铁第四勘察设计院集团有限公司 ResNet self-adaptive acceleration calculation method based on edge-side coordination and application
CN114882076A (en) * 2022-07-11 2022-08-09 中国人民解放军国防科技大学 Lightweight video object segmentation method based on big data memory storage
CN116189194A (en) * 2023-04-27 2023-05-30 北京中昌工程咨询有限公司 Drawing enhancement segmentation method for engineering modeling
CN116189194B (en) * 2023-04-27 2023-07-14 北京中昌工程咨询有限公司 Drawing enhancement segmentation method for engineering modeling
CN117237397A (en) * 2023-07-13 2023-12-15 天翼爱音乐文化科技有限公司 Portrait segmentation method, system, equipment and storage medium based on feature fusion

Also Published As

Publication number Publication date
CN115349139A (en) 2022-11-15

Similar Documents

Publication Publication Date Title
WO2022133627A1 (en) Image segmentation method and apparatus, and device and storage medium
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
Gautam et al. Realistic river image synthesis using deep generative adversarial networks
US11651477B2 (en) Generating an image mask for a digital image by utilizing a multi-branch masking pipeline with neural networks
CN114008663A (en) Real-time video super-resolution
DE102016005407A1 (en) Joint depth estimation and semantic labeling of a single image
US11393100B2 (en) Automatically generating a trimap segmentation for a digital image by utilizing a trimap generation neural network
CN111832570A (en) Image semantic segmentation model training method and system
CN112990222B (en) Image boundary knowledge migration-based guided semantic segmentation method
CA3137297C (en) Adaptive convolutions in neural networks
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN115565043A (en) Method for detecting target by combining multiple characteristic features and target prediction method
WO2022109922A1 (en) Image matting implementation method and apparatus, and device and storage medium
CN112070040A (en) Text line detection method for video subtitles
KR20210029692A (en) Method and storage medium for applying bokeh effect to video images
CN117237648B (en) Training method, device and equipment of semantic segmentation model based on context awareness
CN112364933A (en) Image classification method and device, electronic equipment and storage medium
JP2022090633A (en) Method, computer program product and computer system for improving object detection within high-resolution image
CN115205624A (en) Cross-dimension attention-convergence cloud and snow identification method and equipment and storage medium
Tran et al. Encoder–decoder network with guided transmission map: Robustness and applicability
Lin et al. Deep asymmetric extraction and aggregation for infrared small target detection
Zhang Detect forgery video by performing transfer learning on deep neural network
Yetiş Auto-conversion from 2D drawing to 3D model with deep learning
EP3401843A1 (en) A method, an apparatus and a computer program product for modifying media content
Deshpande et al. Fusion of handcrafted edge and residual learning features for image colorization

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20966208

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20966208

Country of ref document: EP

Kind code of ref document: A1