CN111199547B - Image segmentation method and device and terminal equipment - Google Patents


Info

Publication number
CN111199547B
Authority
CN
China
Prior art keywords
pixel
super
image
segmentation
semantic
Prior art date
Legal status
Active
Application number
CN201811386656.6A
Other languages
Chinese (zh)
Other versions
CN111199547A (en)
Inventor
王树朋
Current Assignee
TCL Technology Group Co Ltd
Original Assignee
TCL Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by TCL Technology Group Co Ltd filed Critical TCL Technology Group Co Ltd
Priority to CN201811386656.6A priority Critical patent/CN111199547B/en
Publication of CN111199547A publication Critical patent/CN111199547A/en
Application granted granted Critical
Publication of CN111199547B publication Critical patent/CN111199547B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention is applicable to the technical field of image processing and provides an image segmentation method, an image segmentation apparatus, and a terminal device. The method comprises the following steps: acquiring an image; performing super-pixel segmentation on the image to obtain a super-pixel segmentation result; analyzing the image with a deep neural network to obtain a semantic feature for each pixel of the image; combining the super-pixel segmentation result with the per-pixel semantic features to obtain a semantic feature for each super-pixel of the image; and, based on the semantic feature of each super-pixel, merging adjacent super-pixels to obtain a merged super-pixel segmentation result. The invention solves the technical problem of low image segmentation accuracy in the prior art.

Description

Image segmentation method and device and terminal equipment
Technical Field
The present invention belongs to the technical field of image processing, and in particular relates to an image segmentation method, an image segmentation apparatus, and a terminal device.
Background
Traditional interactive methods such as GrabCut or FloodFill take the pixel as the minimum unit, adopt color information as the feature, and perform foreground or background segmentation based on background modeling or classification. Treating pixels as the minimum unit gives these methods poor edge preservation, which easily leads to false segmentation at object edges. Super-pixel segmentation, by contrast, over-segments an image, treating each local region with similar features as a whole; each such local region is called a super-pixel. Because this over-segmentation places boundaries where the features inside and outside a super-pixel differ, it preserves object edges well. The prior art has therefore introduced the super-pixel segmentation concept, borrowing its excellent edge retention to obtain image segmentation schemes that preserve target edges well.
However, existing target segmentation methods based on super-pixel segmentation often adopt simple color features as the features of a super-pixel. Such simple feature information struggles with segmentation tasks where the target and its surroundings have similar colors: only high-contrast images can be handled, while low-contrast images with similar colors often defeat these methods, so a good segmentation result is not obtained. As shown in fig. 1, even after multiple interactions the segmented object still exhibits false segmentation in the edge area. This affects later image editing operations such as foreground beautification and/or background replacement. There is therefore a need for an image segmentation method that improves segmentation accuracy and reduces false segmentation.
Disclosure of Invention
In view of the above, embodiments of the present invention provide an image segmentation method, an image segmentation apparatus, and a terminal device, which solve the technical problem of low image segmentation accuracy in the prior art.
A first aspect of an embodiment of the present invention provides a method for image segmentation, including:
acquiring an image;
performing super-pixel segmentation on the image to obtain a super-pixel segmentation result;
analyzing the image by using a deep neural network to obtain semantic features of each pixel of the image;
combining the super-pixel segmentation result with the semantic feature of each pixel to obtain the semantic feature of each super-pixel of the image;
and based on the semantic features of each super pixel, performing super pixel combination on adjacent super pixels to obtain a combined super pixel segmentation result.
A second aspect of an embodiment of the present invention provides an apparatus for image segmentation, including:
the acquisition module is used for acquiring the image;
the super-pixel segmentation module is used for carrying out super-pixel segmentation on the image to obtain a super-pixel segmentation result;
the pixel analysis module is used for analyzing the image by using a deep neural network to obtain the semantic feature of each pixel of the image;
the combining module is used for combining the super-pixel segmentation result and the semantic feature of each pixel to obtain the semantic feature of each super-pixel of the image;
and the merging module is used for merging the adjacent super pixels based on the semantic features of each super pixel to obtain a merged super pixel segmentation result.
A third aspect of an embodiment of the present invention provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
In the embodiments of the invention, the semantic feature of each pixel of the image is obtained using a deep neural network, and adjacent super-pixels are merged based on the per-pixel semantic features and the super-pixel segmentation result to obtain a merged super-pixel segmentation result. The invention uses a deep neural network that excels in the semantic segmentation field as a super-pixel feature extractor or initial segmentation tool; the semantic features it produces have high discriminative power, which improves image segmentation performance and solves the technical problem of low image segmentation accuracy in the prior art.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic illustration of an image segmentation result provided in the prior art;
FIG. 2 is a schematic flow chart of an implementation of a method for image segmentation according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating another implementation of an image segmentation method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a further implementation of a method for image segmentation according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an apparatus for image segmentation according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another image segmentation apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical solutions of the invention, specific examples are described below.
Fig. 2 shows the implementation flow of an image segmentation method according to an embodiment of the present invention. The method is applicable to image segmentation scenarios, for example foreground and/or background segmentation in an image, and is performed by an image segmentation apparatus. The apparatus is typically arranged in a terminal device and implemented in software and/or hardware. The terminal device may be any device with computing capability, such as an intelligent mobile terminal, a computer, or a server. As shown in fig. 2, the image segmentation method includes steps S201 to S205.
S201, acquiring an image.
In the embodiment of the invention, the image is an image that includes a target object. It may be an originally captured image or an image derived from one, for example a cropped region of an originally captured image. The target is the object to be segmented, such as a person, and/or an animal (such as a cat or a dog), and/or an item (such as a bottle, a cup, or a mold), and the target may be the foreground and/or the background of the image. Illustratively, as shown in fig. 1, the target is the cat in fig. 1.
S202, performing super-pixel segmentation on the image to obtain a super-pixel segmentation result.
In the embodiment of the invention, super-pixel segmentation may be performed on the image using methods such as SLIC, SLICO, or gSLIC; that is, the image is segmented into a number of super-pixels, yielding the super-pixel segmentation result.
It should be noted that, in the embodiment of the present invention, the image is segmented into a number of independent regions of similar size, namely super-pixels. The pixels inside each super-pixel have similar features; different super-pixels may have similar or different features. Super-pixel segmentation preserves the edge contours of the target more accurately.
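For illustration only (this sketch is not part of the original patent text), step S202 could be run with the SLIC implementation in scikit-image; the input path and the parameter values below are assumptions:

    # Illustrative sketch of step S202 with scikit-image's SLIC; the input
    # path, n_segments, and compactness values are assumptions.
    from skimage import io
    from skimage.segmentation import slic

    image = io.imread("input.jpg")  # hypothetical input image
    # labels[y, x] holds the super-pixel index of each pixel
    labels = slic(image, n_segments=400, compactness=10, start_label=0)
    print("number of super pixels:", labels.max() + 1)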
S203, analyzing the image by using a deep neural network to obtain the semantic feature of each pixel of the image.
The deep neural network is a model that analyzes the image to obtain the semantic feature of each pixel; it is a model trained in advance on sample data.
The embodiment of the invention builds the deep neural network based on machine learning techniques, including but not limited to logistic regression, support vector machines, and convolutional neural networks. Different training methods can be selected depending on the network adopted. For a better understanding of this patent, the process of constructing the deep neural network is described here using a convolutional neural network as an example.
A convolutional neural network is a special type of artificial neural network, distinguished from other neural network models mainly by its convolution operations. Convolutional neural networks perform well in many areas, such as image processing. In general, a convolutional neural network is a hierarchical model whose input is raw data, such as an RGB image. Through a stack of convolution operations, pooling operations, nonlinear activation mappings, and the like, the network extracts high-level semantic information from the raw input layer by layer; this process is called the feed-forward pass. Finally, the last layer of the network formalizes the target task as an objective function. The error or loss between the predicted value and the true value is computed and propagated back from the last layer, layer by layer, using the back-propagation algorithm; the parameters of each layer are updated and the feed-forward pass runs again. This alternation repeats until the network converges, completing model training.
In the embodiment of the invention, the image serves as the input of the convolutional neural network and the semantic feature of each pixel serves as the output; the deep neural network is obtained by training with the back-propagation algorithm.
The deep neural network is learned from a sample database of a large number of images. Because the sample database has broad data sources and a large sample size, covering the full image distribution, the learned deep model has good robustness and generalization.
In the embodiment of the invention, the image is input into the trained deep neural network; that is, the trained network analyzes the image to obtain the semantic feature of each pixel. Illustratively, the semantic feature is a 64-dimensional feature vector; the dimension of the semantic feature may vary with the learned deep neural network, and the value here is only an example and is not to be construed as limiting the invention.
It should be noted that the semantic features obtained by the deep neural network have extremely strong discriminative power and achieve a very high segmentation IoU, providing a favorable foundation for improving the accuracy of the subsequent image segmentation.
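For illustration only, a minimal PyTorch sketch of step S203 under the 64-dimensional example above; the three-layer architecture is an assumption, not the network trained in the patent:

    # Illustrative sketch of step S203: a toy fully convolutional network that
    # outputs a 64-dimensional semantic feature per pixel. The layer layout is
    # assumed; only the 64-dimensional output follows the text.
    import torch
    import torch.nn as nn

    feature_net = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=3, padding=1),
    )

    rgb = torch.rand(1, 3, 256, 256)           # placeholder RGB image tensor
    with torch.no_grad():
        per_pixel_features = feature_net(rgb)  # shape: (1, 64, 256, 256)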
S204, combining the super-pixel segmentation result and the semantic feature of each pixel to obtain the semantic feature of each super-pixel of the image.
In the embodiment of the present invention, each super-pixel is an independent region containing a number of pixels, so the semantic feature of each super-pixel is obtained by combining the super-pixel segmentation result from step S202 with the per-pixel semantic features from step S203.
As an embodiment of the present invention, step 204 includes: and carrying out weighted average on the semantic features of the pixels included in each super pixel to obtain the semantic features of each super pixel of the image.
As another embodiment of the present invention, step 204 includes: and calculating the average value of the semantic features of the pixels included in each super pixel to obtain the semantic feature of each super pixel of the image.
In the embodiment of the invention, the semantic feature of each super-pixel is computed from the semantic features of the pixels it contains. This further improves the discriminative power of the super-pixel semantic features while keeping the data volume small and the resource usage low, which in turn improves the accuracy of the subsequent segmentation result.
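For illustration only, a sketch of step S204 following the mean-over-pixels rule of the second embodiment above; the array names and shapes are assumptions:

    # Illustrative sketch of step S204: average the semantic features of the
    # pixels inside each super pixel. `labels` (H, W) and `features` (H, W, 64)
    # are assumed to come from the two previous steps.
    import numpy as np

    def superpixel_features(labels: np.ndarray, features: np.ndarray) -> np.ndarray:
        n_superpixels = int(labels.max()) + 1
        sp_features = np.zeros((n_superpixels, features.shape[-1]))
        for s in range(n_superpixels):
            mask = labels == s
            sp_features[s] = features[mask].mean(axis=0)  # mean over member pixels
        return sp_features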
S205, based on the semantic features of each super pixel, super-pixel combination is carried out on adjacent super pixels, and a combined super-pixel segmentation result is obtained.
In the embodiment of the present invention, step S205 includes: calculating the similarity of the semantic features of two adjacent super-pixels; if the similarity is greater than a preset threshold, merging the two super-pixels and updating the semantic feature of the merged super-pixel; and repeating until the similarity of any two adjacent super-pixels is less than or equal to the preset threshold, which yields the merged super-pixel segmentation result.
Wherein updating the semantic features of the merged superpixels includes: and calculating the average value of the semantic features of two adjacent superpixels to obtain the semantic features of the merged superpixels.
The average may be a simple mean or a weighted mean; the present invention does not specifically limit this. By jointly considering the semantic features of the two merged super-pixels, the discriminative power of the super-pixel semantic features is retained to the maximum extent, which further safeguards the accuracy of the subsequent segmentation result.
Further, the calculating of the similarity of the semantic features of two adjacent super-pixels includes:
calculating the similarity by one of two formulas, shown as images in the original publication and not reproduced here;
where i and j represent two adjacent superpixels, Ftr_i represents the semantic feature of superpixel i, Ftr_j represents the semantic feature of superpixel j, Ftr_ik represents the k-th component of the semantic feature Ftr_i of superpixel i, Ftr_jk represents the k-th component of the semantic feature Ftr_j of superpixel j, k ranges from 1 to n, and n is an integer.
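For illustration only, a sketch of step S205. Since the two similarity formulas are not reproduced here, cosine similarity over the components Ftr_ik and Ftr_jk is used as an assumed stand-in, and the threshold value is likewise an assumption; the mean update of the merged feature follows the description above:

    # Illustrative sketch of step S205. Cosine similarity is an assumed
    # stand-in for the unreproduced formulas; the threshold is also assumed.
    # The merged feature is updated as the mean of the pair, per the text.
    import numpy as np

    def similarity(ftr_i: np.ndarray, ftr_j: np.ndarray) -> float:
        # sum_k Ftr_ik * Ftr_jk / (|Ftr_i| * |Ftr_j|)
        return float(np.dot(ftr_i, ftr_j) /
                     (np.linalg.norm(ftr_i) * np.linalg.norm(ftr_j) + 1e-12))

    def merge_superpixels(sp_features, adjacency, threshold=0.95):
        parent = list(range(len(sp_features)))   # union-find over super pixels
        feats = [np.array(f, dtype=float) for f in sp_features]

        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]    # path halving
                a = parent[a]
            return a

        merged = True
        while merged:                            # repeat until no pair qualifies
            merged = False
            for i, j in adjacency:               # adjacency: pairs of neighbors
                ri, rj = find(i), find(j)
                if ri != rj and similarity(feats[ri], feats[rj]) > threshold:
                    parent[rj] = ri
                    feats[ri] = (feats[ri] + feats[rj]) / 2.0  # mean update
                    merged = True
        return [find(i) for i in range(len(sp_features))]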
In other embodiments of the present invention, the semantic segmentation result may also be directly corrected using the superpixel segmentation result to obtain an image segmentation result.
The embodiment of the invention thus completes image segmentation. Traditional foreground segmentation schemes based on super-pixel segmentation often employ relatively simple features, such as RGB or Lab color features, for super-pixel merging. These simple color features, however, struggle with target segmentation when foreground and background are similar in color, which has severely restricted the application of super-pixel segmentation in the foreground segmentation field. The breakthrough progress of deep neural networks in semantic segmentation clearly demonstrates that their learned features have extremely strong discriminative power; yet although semantic segmentation achieves a very high segmentation IoU, its accuracy remains limited by imprecision in edge regions, where different semantic features interfere with one another. Super-pixel segmentation offers strong edge contours but weak semantic cohesion within regions, whereas the deep neural network offers weak edge contours but strong semantic cohesion; the embodiment of the invention combines the two to obtain an accurate target segmentation scheme.
On the basis of the embodiment shown in fig. 2, as shown in fig. 3, after step S205 the method further includes steps S301 and S302.
S301, performing semantic segmentation on the image by using a semantic segmentation network to obtain a category label of each pixel of the image.
In the embodiment of the invention, the semantic segmentation network, also called a saliency network, is used to analyze the image to obtain a category label for each pixel. Like the deep neural network, the semantic segmentation network is a model trained in advance on sample data.
The embodiment of the invention builds the semantic segmentation network based on machine learning techniques; its construction principle is the same as that of the deep neural network and is not repeated here.
In the embodiment of the invention, the image is input into the trained semantic segmentation network; that is, the trained network analyzes the image to obtain the category label of each pixel. Illustratively, the category labels comprise foreground and background; in another example, the category labels comprise four classes: bottle, cup, mold, and background. The number of category labels may vary with the learned semantic segmentation network, and the values here are only examples and are not to be construed as limiting the invention.
S302, merging the merged super-pixel segmentation result and the class label of each pixel to obtain the segmentation result of the image.
In the embodiment of the invention, the merged super-pixel segmentation result and the per-pixel category labels are fused, and the boundaries are adjusted, to obtain the segmentation result of the image.
The merged super-pixel segmentation result comprises the boundary of each super-pixel and the semantic feature of each super-pixel. The boundary of each merged super-pixel is adjusted according to the category labels of the pixels it contains: for example, if the category label of one or more pixels just outside the boundary of the current super-pixel matches the category label of the majority of the pixels inside that boundary, those outside pixels are considered part of the current super-pixel and are merged into it. This yields the segmentation result of the image.
In the embodiment shown in fig. 3, building on the embodiment shown in fig. 2, the per-pixel category labels obtained through the semantic segmentation network are additionally fused in, and the edges of the merged super-pixel segmentation result are further adjusted, which further improves the accuracy of image segmentation.
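For illustration only, a sketch of step S302 that assigns each merged super-pixel the majority category label of its pixels; this majority-vote rule is an assumed simplification of the boundary adjustment described above, not the patent's exact procedure:

    # Illustrative sketch of step S302: give each merged super pixel the
    # majority category label of its pixels. Majority voting is an assumed
    # simplification of the boundary adjustment described above.
    import numpy as np

    def fuse_labels(merged_labels: np.ndarray, class_labels: np.ndarray) -> np.ndarray:
        result = np.zeros_like(class_labels)
        for s in np.unique(merged_labels):
            mask = merged_labels == s
            values, counts = np.unique(class_labels[mask], return_counts=True)
            result[mask] = values[np.argmax(counts)]  # majority class of region
        return result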
Optionally, on the basis of the embodiment shown in fig. 2 or fig. 3, after step 205 or step 302, steps 401 to 402 are further included, as shown in fig. 4.
S401, loading a display interface comprising the segmentation result.
In the embodiment of the present invention, after the merged super-pixel segmentation result is obtained in step S205, a display interface including that result is loaded; alternatively, after the segmentation result of the image is obtained in step S302, a display interface including the readjusted super-pixel segmentation result is loaded.
The terminal device includes a monitoring unit that monitors whether the user triggers a super-pixel addition or deletion event on the display interface. A super-pixel addition or deletion event is an event of adding, or an event of deleting, a foreground super-pixel or a background super-pixel, and is triggered by a touch gesture on the display interface; for example, clicking or circling a certain super-pixel triggers the addition or deletion of that super-pixel. Clicking includes a single click or a double click, and circling includes a touch operation drawing an arbitrary closed figure.
S402, if the event of adding or deleting the super pixel is monitored to trigger on the display interface, adjusting the super pixel segmentation result according to the monitored event of adding or deleting the super pixel, and loading the display interface comprising the adjusted super pixel segmentation result.
In the embodiment of the invention, the current super-pixel, that is, the selected target super-pixel, is determined from the monitored addition or deletion event. If the selected target super-pixel is currently displayed as foreground, for example in a bright style, the triggered event is a foreground super-pixel deletion event; the target super-pixel is then reassigned to the background and displayed in the background style, for example a dark style.
Through this simple human-computer interaction, the super-pixel segmentation result can be further adjusted, reducing user operations and improving user experience. In other words, accurate foreground object segmentation greatly reduces the human-computer interaction required by later image editing operations, giving a better user experience.
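For illustration only, a sketch of the interactive adjustment of steps S401 to S402: clicking a super-pixel toggles it between foreground and background; the event wiring and rendering are omitted, and all names here are hypothetical:

    # Illustrative sketch of steps S401-S402: toggle a clicked super pixel
    # between foreground and background. Event wiring and rendering are
    # omitted; all names here are hypothetical.
    import numpy as np

    FOREGROUND, BACKGROUND = 1, 0

    def on_superpixel_click(x: int, y: int, merged_labels: np.ndarray,
                            fg_mask: np.ndarray) -> np.ndarray:
        target = merged_labels[y, x]             # the selected target super pixel
        region = merged_labels == target
        currently_fg = bool(fg_mask[region].any())
        fg_mask[region] = BACKGROUND if currently_fg else FOREGROUND
        return fg_mask                           # caller refreshes the display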
As shown in fig. 5, an apparatus for image segmentation according to an embodiment of the present invention includes:
an acquisition module 501 for acquiring an image;
the super-pixel segmentation module 502 is configured to perform super-pixel segmentation on the image to obtain a super-pixel segmentation result;
a pixel analysis module 503, configured to analyze the image by using a deep neural network, so as to obtain semantic features of each pixel of the image;
a combination module 504, configured to combine the superpixel segmentation result and the semantic feature of each pixel to obtain the semantic feature of each superpixel of the image;
the merging module 505 is configured to perform superpixel merging on the adjacent superpixels based on the semantic feature of each superpixel, so as to obtain a merged superpixel segmentation result.
It should be noted that, for the implementation process of the image segmentation apparatus provided in this embodiment, reference may be made to the implementation process of the image segmentation method shown in fig. 2, which is not repeated here.
Optionally, as shown in fig. 6, the apparatus 600 for image segmentation further includes:
the semantic segmentation module 601 is configured to perform semantic segmentation on the image by using a semantic segmentation network to obtain a class label of each pixel of the image;
and the fusion module 602 is configured to fuse the merged super-pixel segmentation result and the class label of each pixel to obtain the segmentation result of the image.
It should be noted that, for the implementation process of the image segmentation apparatus provided in this embodiment, reference may be made to the implementation process of the image segmentation method shown in fig. 3, which is not repeated here.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 7, the terminal device 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72 stored in the memory 71 and executable on the processor 70, such as a program for image segmentation. The processor 70, when executing the computer program 72, implements the steps in the method embodiment of image segmentation described above, such as steps S201 to S205 shown in fig. 2. Alternatively, the processor 70, when executing the computer program 72, performs the functions of the modules/units of the apparatus embodiments described above, e.g., the functions of the units 501 to 505 shown in fig. 5.
By way of example, the computer program 72 may be partitioned into one or more modules/units that are stored in the memory 71 and executed by the processor 70 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used to describe the execution of the computer program 72 in the processor 70.
For example, the computer program 72 may be divided into an acquisition module, a super-pixel segmentation module, a pixel analysis module, a combination module, and a merging module (modules in a virtual apparatus), whose specific functions are as follows:
the acquisition module is used for acquiring the image;
the super-pixel segmentation module is used for carrying out super-pixel segmentation on the image to obtain a super-pixel segmentation result;
the pixel analysis module is used for analyzing the image by using a deep neural network to obtain the semantic feature of each pixel of the image;
the combining module is used for combining the super-pixel segmentation result and the semantic feature of each pixel to obtain the semantic feature of each super-pixel of the image;
and the merging module is used for merging the adjacent super pixels based on the semantic features of each super pixel to obtain a merged super pixel segmentation result.
The terminal device 7 may include, but is not limited to, a processor 70 and a memory 71. It will be appreciated by those skilled in the art that fig. 7 is merely an example of the terminal device 7 and does not constitute a limitation of it; the terminal device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the terminal device may further include an input-output device, a network access device, a bus, etc.
The processor 70 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a Field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing the computer program as well as other programs and data required by the terminal device 7, and may also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not detailed or illustrated in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated modules/units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be appropriately adjusted according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be included within the scope of the present invention.

Claims (9)

1. A method of image segmentation, the method comprising:
acquiring an image;
performing super-pixel segmentation on the image to obtain a super-pixel segmentation result;
analyzing the image by using a deep neural network to obtain semantic features of each pixel of the image;
combining the super-pixel segmentation result with the semantic feature of each pixel to obtain the semantic feature of each super-pixel of the image;
based on the semantic features of each super pixel, super pixel combination is carried out on adjacent super pixels, and a combined super pixel segmentation result is obtained;
after the merged superpixel segmentation result is obtained, the method further comprises the following steps:
performing semantic segmentation on the image by using a semantic segmentation network to obtain a category label of each pixel of the image;
merging the merged super-pixel segmentation result and the class label of each pixel to obtain the segmentation result of the image; the merged superpixel segmentation result comprises the boundary of the superpixel and the semantic feature of each superpixel, and the boundary of the merged superpixel is adjusted according to the class label of the pixel included by each merged superpixel.
2. The method of claim 1, wherein said combining the superpixel segmentation result with the semantic feature of each pixel to obtain the semantic feature of each superpixel of the image comprises:
and calculating the average value of the semantic features of the pixels included in each super pixel to obtain the semantic feature of each super pixel of the image.
3. The method of claim 1, wherein the performing superpixel merging on the neighboring superpixels based on the semantic feature of each superpixel to obtain the merged superpixel segmentation result comprises:
calculating the similarity of the semantic features of two adjacent super pixels, if the similarity is larger than a preset threshold, combining the two adjacent super pixels, and updating the semantic features of the combined super pixels until the similarity of any two adjacent super pixels is smaller than or equal to the preset threshold, so as to obtain a combined super pixel segmentation result.
4. The method of claim 3, wherein updating the semantic features of the merged superpixel comprises:
and calculating the average value of the semantic features of two adjacent superpixels to obtain the semantic features of the merged superpixels.
5. A method according to claim 3, wherein said calculating the similarity of semantic features of two adjacent superpixels comprises:
calculating the similarity of the semantic features of two adjacent super pixels by one of two formulas, shown as images in the original publication and not reproduced here;
where i and j represent two adjacent superpixels, Ftr_i represents the semantic feature of superpixel i, Ftr_j represents the semantic feature of superpixel j, Ftr_ik represents the k-th component of the semantic feature Ftr_i of superpixel i, Ftr_jk represents the k-th component of the semantic feature Ftr_j of superpixel j, k ranges from 1 to n, and n is an integer.
6. The method of claim 1, wherein the method further comprises:
loading a display interface comprising a segmentation result;
if the event of adding or deleting the super pixel is monitored to trigger on the display interface, the super pixel segmentation result is adjusted according to the monitored event of adding or deleting the super pixel, and the display interface comprising the adjusted super pixel segmentation result is loaded.
7. An apparatus for image segmentation, the apparatus comprising:
the acquisition module is used for acquiring the image;
the super-pixel segmentation module is used for carrying out super-pixel segmentation on the image to obtain a super-pixel segmentation result;
the pixel analysis module is used for analyzing the image by using a deep neural network to obtain the semantic feature of each pixel of the image;
the combining module is used for combining the super-pixel segmentation result and the semantic feature of each pixel to obtain the semantic feature of each super-pixel of the image;
the merging module is used for merging the adjacent super pixels based on the semantic features of each super pixel to obtain a merged super pixel segmentation result;
the image segmentation apparatus further includes:
the semantic segmentation module is used for carrying out semantic segmentation on the image by utilizing a semantic segmentation network to obtain a category label of each pixel of the image;
the fusion module is used for fusing the merged super-pixel segmentation result and the class label of each pixel to obtain the segmentation result of the image; the merged superpixel segmentation result comprises the boundary of the superpixel and the semantic feature of each superpixel, and the boundary of the merged superpixel is adjusted according to the class label of the pixel included by each merged superpixel.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.
CN201811386656.6A 2018-11-20 2018-11-20 Image segmentation method and device and terminal equipment Active CN111199547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811386656.6A CN111199547B (en) 2018-11-20 2018-11-20 Image segmentation method and device and terminal equipment

Publications (2)

Publication Number Publication Date
CN111199547A CN111199547A (en) 2020-05-26
CN111199547B 2024-01-23

Family

ID=70744075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811386656.6A Active CN111199547B (en) 2018-11-20 2018-11-20 Image segmentation method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN111199547B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914822A (en) * 2012-12-31 2014-07-09 清华大学 Interactive video foreground object extraction method based on super pixel segmentation
WO2016016033A1 (en) * 2014-07-31 2016-02-04 Thomson Licensing Method and apparatus for interactive video segmentation
CN107886513A (en) * 2016-09-29 2018-04-06 法乐第(北京)网络科技有限公司 A kind of device for determining training sample
CN106709924A (en) * 2016-11-18 2017-05-24 中国人民解放军信息工程大学 Deep convolutional neutral network and superpixel-based image semantic segmentation method
CN107424159A (en) * 2017-07-28 2017-12-01 西安电子科技大学 Image, semantic dividing method based on super-pixel edge and full convolutional network
CN108319985A (en) * 2018-02-07 2018-07-24 北京航空航天大学 The method and apparatus of linguistic indexing of pictures

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yun Liu et al. "DEL: Deep Embedding Learning for Efficient Image Segmentation." Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), 2018, pp. 2-4. *

Also Published As

Publication number Publication date
CN111199547A (en) 2020-05-26

Similar Documents

Publication Publication Date Title
CN112232293B (en) Image processing model training method, image processing method and related equipment
CN111010590A (en) Video clipping method and device
CN106934337B (en) Method for operating image detection apparatus and computer-readable storage medium
CN110443140B (en) Text positioning method, device, computer equipment and storage medium
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
CN111145209A (en) Medical image segmentation method, device, equipment and storage medium
CN111340195A (en) Network model training method and device, image processing method and storage medium
US20240078680A1 (en) Image segmentation method, network training method, electronic equipment and storage medium
CN108154191B (en) Document image recognition method and system
CN107272899B (en) VR (virtual reality) interaction method and device based on dynamic gestures and electronic equipment
CN111080654B (en) Image lesion region segmentation method and device and server
JP2023501820A (en) Face parsing methods and related devices
CN108520263B (en) Panoramic image identification method and system and computer storage medium
WO2022194079A1 (en) Sky region segmentation method and apparatus, computer device, and storage medium
CN110889437A (en) Image processing method and device, electronic equipment and storage medium
CN112102929A (en) Medical image labeling method and device, storage medium and electronic equipment
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
CN111932545A (en) Image processing method, target counting method and related device thereof
CN112966687B (en) Image segmentation model training method and device and communication equipment
CN111383207A (en) Musculoskeletal ultrasonic image processing method, system and device and readable storage medium
CN111199547B (en) Image segmentation method and device and terminal equipment
CN109598206B (en) Dynamic gesture recognition method and device
CN113807407B (en) Target detection model training method, model performance detection method and device
CN113573137B (en) Video canvas boundary detection method, system, terminal equipment and storage medium
CN115375901A (en) Image object detection and instance segmentation method, system, computing device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 516006 TCL science and technology building, No. 17, Huifeng Third Road, Zhongkai high tech Zone, Huizhou City, Guangdong Province
Applicant after: TCL Technology Group Co.,Ltd.
Address before: 516006 Guangdong province Huizhou Zhongkai hi tech Development Zone No. nineteen District
Applicant before: TCL Corp.
GR01 Patent grant