CN111598912A - Image segmentation method and device

Info

Publication number: CN111598912A
Application number: CN201910126877.8A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 董健 (Dong Jian)
Current Assignee: Beijing Qihoo Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Beijing Qihoo Technology Co Ltd
Filing date: 2019-02-20 (application filed by Beijing Qihoo Technology Co Ltd)
Priority date: 2019-02-20 (the priority date is an assumption and is not a legal conclusion)
Publication date: 2020-08-28
Prior art keywords: segmentation result, object segmentation, class, segmentation, suppression processing
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; edge detection
    • G06T 7/136: Segmentation; edge detection involving thresholding
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; image sequence


Abstract

The invention relates to the technical field of image recognition, and in particular to an image segmentation method and device. The method comprises: acquiring a target image to be segmented; performing semantic segmentation and object segmentation on the target image to obtain, correspondingly, a semantic segmentation result and an object segmentation result; filtering out first-class object segmentation results from the object segmentation results and retaining second-class object segmentation results, where a first-class object segmentation result is one whose score is lower than a preset score threshold and a second-class object segmentation result is one whose score is equal to or higher than the preset score threshold; performing non-maximum suppression processing on the second-class object segmentation results; and fusing the semantic segmentation result with the second-class object segmentation results subjected to non-maximum suppression processing, and outputting a panoramic segmentation result corresponding to the target image. The invention improves the accuracy of image recognition.

Description

Image segmentation method and device
Technical Field
The invention relates to the technical field of image recognition, in particular to an image segmentation method and device.
Background
Image recognition refers to the technique of processing, analyzing, and understanding images with a computer in order to recognize the individual objects they contain. Image recognition often involves image segmentation, the technique and process of dividing an image into several specific regions with unique properties and picking out the objects of interest. With the continuing development of smartphones, the Internet of Things, autonomous driving, and related technologies, ever higher requirements are placed on the accuracy of image analysis. Early image recognition relied on the following two techniques. The first is semantic segmentation, which assigns a class label to each pixel in an image; for example, if an image contains the three classes person, cat, and dog, the semantic segmentation result identifies those three classes in the image. The second is object segmentation, which detects and segments each target instance in an image, for example identifying person1, person2, cat1, cat2, dog1, and dog2. As these technologies develop, more and more application scenarios, such as autonomous driving, dashboard cameras, and live streaming, require panoramic segmentation, in which semantic segmentation and object segmentation are completed simultaneously. Because semantic segmentation and object segmentation are two mutually independent processes, their segmentation results are often contradictory and cannot be fused directly, which reduces the accuracy of image recognition.
Disclosure of Invention
In view of the above, the present invention has been made to provide an image segmentation method and apparatus that overcome or at least partially solve the above problems.
According to a first aspect of the present invention, there is provided an image segmentation method, the method comprising:
acquiring a target image to be segmented;
respectively carrying out semantic segmentation and object segmentation on the target image, and correspondingly obtaining a semantic segmentation result and an object segmentation result;
filtering out first-class object segmentation results from the object segmentation results, and retaining second-class object segmentation results, wherein a first-class object segmentation result is an object segmentation result whose score is lower than a preset score threshold, and a second-class object segmentation result is an object segmentation result whose score is equal to or higher than the preset score threshold;
performing non-maximum suppression processing on the second-class object segmentation results;
and fusing the semantic segmentation result with the second-class object segmentation results subjected to non-maximum suppression processing, and outputting a panoramic segmentation result corresponding to the target image.
Preferably, the performing non-maximum suppression processing on the second-class object segmentation results includes:
and performing non-maximum suppression processing separately on the second-class object segmentation results corresponding to each confusion class label, and performing non-maximum suppression processing on the second-class object segmentation results corresponding to class labels other than the confusion class labels.
Preferably, before performing the non-maximum suppression processing on the second-class object segmentation results corresponding to each confusion class label and on the second-class object segmentation results corresponding to class labels other than the confusion class labels, the method further includes:
and determining the confusion class labels by counting each class label in a class label training set.
Preferably, the fusing the semantic segmentation result and the second-class object segmentation results subjected to non-maximum suppression processing includes:
and selecting, based on the semantic segmentation result and the second-class object segmentation results subjected to non-maximum suppression processing, the result with the largest score as the target segmentation result of each pixel in the target image.
Preferably, performing semantic segmentation on the target image to obtain the semantic segmentation result includes:
and adding a category label to each pixel in the target image based on a semantic segmentation model, and obtaining the score of each pixel belonging to the category label.
Preferably, performing object segmentation on the target image to obtain the object segmentation result includes:
separating each object from the target image;
adding a class label to each object, and obtaining the score of each object belonging to the class label.
According to a second aspect of the present invention, there is provided an image segmentation apparatus, comprising:
the acquisition module is used for acquiring a target image to be segmented;
the segmentation module is used for respectively carrying out semantic segmentation and object segmentation on the target image to correspondingly obtain a semantic segmentation result and an object segmentation result;
the filtering module is used for filtering out first-class object segmentation results from the object segmentation results and retaining second-class object segmentation results, wherein a first-class object segmentation result is an object segmentation result whose score is lower than a preset score threshold, and a second-class object segmentation result is an object segmentation result whose score is equal to or higher than the preset score threshold;
the non-maximum suppression processing module is used for performing non-maximum suppression processing on the second-class object segmentation results;
and the fusion module is used for fusing the semantic segmentation result with the second-class object segmentation results subjected to non-maximum suppression processing, and outputting a panoramic segmentation result corresponding to the target image.
Preferably, the non-maximum suppression processing module includes:
and the non-maximum suppression processing unit is used for performing non-maximum suppression processing separately on the second-class object segmentation results corresponding to each confusion class label, and performing non-maximum suppression processing on the second-class object segmentation results corresponding to class labels other than the confusion class labels.
Preferably, the apparatus further comprises:
and the statistic module is used for determining the confusion class label by counting each class label in the class label training set.
Preferably, the fusion module comprises:
and the fusion unit is used for selecting the result with the largest score as the target segmentation result of each pixel in the target image based on the semantic segmentation result and the second-class object segmentation result subjected to non-maximum suppression processing.
Preferably, the segmentation module includes:
and the semantic segmentation unit is used for adding a class label to each pixel in the target image based on a semantic segmentation model and obtaining the score of each pixel belonging to the class label.
Preferably, the segmentation module includes: an object segmentation unit;
the object segmentation unit includes:
a segmentation subunit, configured to separate each object from the target image;
and the adding subunit is used for adding a class label to each object and obtaining the score of each object belonging to the class label.
According to a third aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the method steps as in the first aspect described above.
According to a fourth aspect of the present invention, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method steps as in the first aspect when executing the program.
According to the image segmentation method and device, after a target image to be segmented is obtained, semantic segmentation and object segmentation are performed on the target image, and a semantic segmentation result and an object segmentation result are obtained correspondingly. The first-class object segmentation results, whose scores are lower than a preset score threshold, are then filtered out, and the second-class object segmentation results, whose scores are equal to or higher than the preset score threshold, are retained. Non-maximum suppression processing is then performed on the second-class object segmentation results. Finally, the semantic segmentation result and the second-class object segmentation results subjected to non-maximum suppression processing are fused, and a panoramic segmentation result corresponding to the target image is output. Because the object segmentation results are filtered and then subjected to non-maximum suppression processing, the semantic segmentation result and the object segmentation results can be fused effectively, contradictions between them are avoided, and the accuracy of image recognition is improved.
The foregoing description is only an overview of the technical solutions of the present invention. In order to make the technical means of the present invention more clearly understood, and to make the above and other objects, features, and advantages of the present invention more readily apparent, embodiments of the invention are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a flow chart of an image segmentation method in an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating a target image to be segmented in an embodiment of the present invention;
FIG. 3 is a diagram illustrating a panoramic segmentation result according to an embodiment of the present invention;
FIG. 4 is a diagram showing a configuration of an image segmentation apparatus in the embodiment of the present invention;
FIG. 5 shows a block diagram of a computer device in an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides an image segmentation method applied to the field of image processing. As shown in FIG. 1, the method comprises the following steps:
step 101: and acquiring a target image to be segmented.
Specifically, the image segmentation method provided by the embodiment of the invention can be applied to the analysis of pictures and videos. For a picture, the picture itself is the target image to be segmented. For a video, each frame of the video is a target image to be segmented.
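By way of illustration only, a minimal sketch of this acquisition step follows; the patent prescribes no code, and the OpenCV dependency and the helper name iter_target_images are assumptions of the sketch.

```python
# Illustrative sketch of step 101, assuming OpenCV (cv2) is available.
# For a picture, the picture itself is the target image; for a video,
# every frame is a target image to be segmented.
import cv2

def iter_target_images(path, is_video=False):
    """Yield target image(s) to be segmented from a picture or a video file."""
    if not is_video:
        yield cv2.imread(path)       # the whole picture is the target image
        return
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:                   # end of video
            break
        yield frame                  # each frame is a target image
    cap.release()
```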
After step 101 is completed, step 102 is performed: and respectively carrying out semantic segmentation and object segmentation on the target image, and correspondingly obtaining a semantic segmentation result and an object segmentation result.
Specifically, semantic segmentation is performed on the target image to obtain a semantic segmentation result, and object segmentation is performed on the target image to obtain an object segmentation result.
Semantic segmentation, a typical computer vision problem, takes an image as input and converts it into a mask with the regions of interest highlighted. Using semantic segmentation, each pixel in the image can be assigned a class label, and each pixel also receives a score corresponding to that class label, which characterizes the probability that the pixel belongs to the class label. Specifically, the semantic segmentation process includes: adding a class label to each pixel in the target image based on a semantic segmentation model, and obtaining the score of each pixel belonging to the corresponding class label. The semantic segmentation model is obtained by training on a data set; architectures adopted by semantic segmentation models include classification with a convolutional neural network followed by refinement with a conditional random field, and classifier architectures applied by semantic segmentation models include U-Net, DeepLab, Dilation10, and the like.
For example, after semantic segmentation is performed on a target image, a first pixel in the target image carries the person label with a score of 80, where 80 represents an 80% probability that the pixel is a person; a second pixel carries the dog label with a score of 70, where 70 represents a 70% probability that the pixel is a dog; and a third pixel carries the cat label with a score of 90, where 90 represents a 90% probability that the pixel is a cat.
It should be noted that the labels assigned in semantic segmentation are parent-level labels; that is, semantic segmentation does not refine each parent-level label and therefore does not distinguish different instances belonging to the same category. For example, pixels carrying the person label are not separated by object instance: in practice, if all pixels with the person label are filled with blue, then person a and person b are both filled with blue, and semantic segmentation does not distinguish person a from person b.
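The following is a minimal sketch, not the patent's implementation, of the semantic segmentation output described above; the assumption that the model emits a (C, H, W) per-class probability map, and the function name semantic_segment, are illustrative only.

```python
# Sketch of the semantic segmentation output: a parent-level label and a
# score per pixel. The (C, H, W) probability-map format is an assumption.
import numpy as np

def semantic_segment(prob_map, class_names):
    """Return per-pixel parent-level labels and their scores.

    prob_map: (C, H, W) array of per-pixel class probabilities.
    class_names: list of C parent-level labels, e.g. ["person", "cat", "dog"].
    """
    label_idx = prob_map.argmax(axis=0)          # best class per pixel
    scores = prob_map.max(axis=0)                # probability of that class
    labels = np.asarray(class_names, dtype=object)[label_idx]
    return labels, scores                        # each of shape (H, W)
```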
Object segmentation integrates the processes of target detection, image classification, and image segmentation, and is used to distinguish different object instances in an image. Object segmentation not only performs pixel-level classification but also distinguishes different instances on the basis of the specific categories; for example, it can not only detect several people in an image but also distinguish which pixels belong to the first person and which belong to the second. Using object segmentation, each pixel in the image can be assigned a class label, and each pixel also receives a score corresponding to that class label, which likewise characterizes the probability that the pixel belongs to the class label. Specifically, the object segmentation process includes: separating each object from the target image; and adding a class label to each object and obtaining the score of each object belonging to that class label. Common object segmentation methods include DeepMask, Multi-task Network Cascades, InstanceFCN, Mask R-CNN, and the like.
It should be noted that, in the embodiment of the present invention, adding a category label to an object includes adding the category label to each pixel contained in the object; that is, once an object is given a category label, all pixels contained in the object are given that category label, and likewise each pixel contained in the object carries the score corresponding to the object.
For example, after object segmentation is performed on a target image containing a person a and a person b, each of them is an object: person a carries the person1 label with a score of 70, and person b carries the person2 label with a score of 80. All pixels contained in the region corresponding to person a carry the person1 label and the score 70, and all pixels contained in the region corresponding to person b carry the person2 label and the score 80. The score 70 represents the probability that the region corresponding to person a belongs to an independent person (i.e., to person1), and the score 80 represents the probability that the region corresponding to person b belongs to an independent person (i.e., to person2).
It should be noted that, in the embodiment of the present invention, the labels assigned in object segmentation are child-level labels, which are refinements of parent-level labels; for example, if the parent-level label is person, the child-level labels may be person1, person2, person3, and so on.
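The object segmentation result described above can be pictured with a small assumed data structure; the dataclass and its field names are illustrative, not the patent's.

```python
# Hedged sketch of an object segmentation result as described above: every
# pixel inside an object's mask inherits the object's label and score.
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectResult:
    mask: np.ndarray   # boolean (H, W) mask of the pixels contained in the object
    box: tuple         # (x1, y1, x2, y2) bounding box of the object
    cls: str           # parent-level class label, e.g. "person"
    label: str         # child-level label, e.g. "person1", "person2"
    score: float       # probability that the region is this independent instance

def paint_pixels(objects, shape):
    """Give every pixel in each object's mask that object's label and score."""
    labels = np.full(shape, None, dtype=object)
    scores = np.zeros(shape)
    for obj in objects:
        labels[obj.mask] = obj.label
        scores[obj.mask] = obj.score
    return labels, scores
```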
Object segmentation is also referred to as instance segmentation. The semantic segmentation and the object segmentation can each be realized by any semantic segmentation method or object segmentation method in the related art, and are not limited to the embodiments described above.
After the object segmentation result is obtained, step 103 is executed: and filtering out a first class object segmentation result in the object segmentation results, and keeping a second class object segmentation result in the object segmentation results, wherein the first class object segmentation result is an object segmentation result with a score lower than a preset score threshold value, and the second class object segmentation result is an object segmentation result with a score equal to or higher than the preset score threshold value.
Specifically, a score threshold is preset. The preset score threshold is set according to the credibility of the segmentation results, and a user can adjust the preset score threshold up or down; segmentation results of low credibility among the object segmentation results can be filtered out by means of the preset score threshold. In the embodiment of the invention, the object segmentation results are filtered: object segmentation results whose score is lower than the preset score threshold are filtered out, and only object segmentation results whose score is equal to or higher than the preset score threshold are retained. Specifically, an object segmentation result whose score is lower than the preset score threshold is a first-class object segmentation result, and an object segmentation result whose score is equal to or higher than the preset score threshold is a second-class object segmentation result; by filtering all the object segmentation results, the first-class object segmentation results are filtered out and only the second-class object segmentation results are retained.
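Under the assumptions of the preceding sketches, step 103 reduces to a one-line filter; the default threshold of 0.5 below is a placeholder on a 0-1 score scale (the patent's examples use 0-100, and the threshold is left to the user).

```python
# Sketch of step 103: drop first-class results (score below the preset
# threshold) and keep second-class results (score equal to or above it).
def filter_object_results(objects, score_threshold=0.5):
    return [obj for obj in objects if obj.score >= score_threshold]
```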
After step 103 is completed, step 104 is performed: performing non-maximum suppression processing on the second-class object segmentation results.
Specifically, step 104 includes: performing non-maximum suppression processing separately on the second-class object segmentation results corresponding to each confusion class label, and performing non-maximum suppression processing on the second-class object segmentation results corresponding to class labels other than the confusion class labels.
Further, before the non-maximum suppression processing is performed on the second-class object segmentation results corresponding to each confusion class label and on the second-class object segmentation results corresponding to class labels other than the confusion class labels, the method further includes: determining the confusion class labels by counting each class label in a class label training set.
In practice, there are often similar category labels with a high degree of overlap, for example a degree of overlap higher than a preset overlap threshold such as 80%; such category labels are easily confused, i.e., they are confusion class labels. For example, the two categories of a standing person and a bicyclist are easily confused, so they are confusion class labels. The confusion class labels can be obtained by counting statistics over the class label training set; the method of obtaining them belongs to the prior art and is not limited by the embodiment of the invention.
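As the patent defers the statistic to the prior art, the sketch below shows only one assumed way of determining confusion class labels: counting, over a labelled training set, how often instances of one class overlap heavily with instances of another class, and flagging classes whose overlap rate exceeds a preset threshold. The 0.8 figures are placeholders mirroring the 80% example above.

```python
# Hedged sketch of determining confusion class labels from a class label
# training set. Reuses the ObjectResult sketch from above.
from collections import Counter

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

def find_confusion_labels(training_images, iou_thr=0.8, min_rate=0.8):
    """training_images: list of per-image lists of ObjectResult instances."""
    overlapped, total = Counter(), Counter()
    for instances in training_images:
        for a in instances:
            total[a.cls] += 1
            # does this instance overlap heavily with some other class?
            if any(b is not a and b.cls != a.cls and iou(a.box, b.box) >= iou_thr
                   for b in instances):
                overlapped[a.cls] += 1
    return {c for c in total if overlapped[c] / total[c] >= min_rate}
```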
In the embodiment of the present invention, for the confusion class labels, non-maximum suppression processing is applied inside each confusion class label; that is, non-maximum suppression processing is performed separately on the second-class object segmentation results corresponding to each confusion class label. For example, if the retained second-class object segmentation results include a "standing person" class label and a "bicyclist" class label, then, since the two class labels are confusion class labels, non-maximum suppression processing is performed on the second-class object segmentation results corresponding to the "standing person" class label, and non-maximum suppression processing is performed separately on the second-class object segmentation results corresponding to the "bicyclist" class label.
In addition, for class labels other than the confusion class labels, non-maximum suppression processing is performed jointly on all the second-class object segmentation results corresponding to those other class labels. For example, if the class labels contained in the retained second-class object segmentation results are the "standing person", "bicyclist", "dog", and "cat" class labels, then non-maximum suppression processing is applied jointly to all second-class object segmentation results corresponding to the "dog" and "cat" class labels.
In the embodiment of the present invention, the two non-maximum suppression processing procedures may be implemented with the same algorithm or with different algorithms, which is not limited here. By means of the non-maximum suppression processing, the optimal target bounding box can be found among the overlapping candidate boxes in the segmentation results, and redundant bounding boxes are eliminated.
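Putting the grouping into code, a hedged sketch of step 104 follows; greedy box NMS is assumed as the suppression algorithm (the patent allows any), and iou() is the helper from the previous sketch.

```python
# Sketch of step 104: NMS is run separately inside each confusion class label
# and once jointly over results whose class labels are not confusion labels.
def nms(objects, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the best box of each overlap cluster."""
    kept = []
    for obj in sorted(objects, key=lambda o: o.score, reverse=True):
        if all(iou(obj.box, k.box) < iou_threshold for k in kept):
            kept.append(obj)
    return kept

def grouped_nms(objects, confusion_labels):
    kept = []
    for label in confusion_labels:                    # inside each confusion label
        kept += nms([o for o in objects if o.cls == label])
    others = [o for o in objects if o.cls not in confusion_labels]
    kept += nms(others)                               # all remaining labels together
    return kept
```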
After completing step 104, step 105 is performed: fusing the semantic segmentation result with the second-class object segmentation results subjected to non-maximum suppression processing, and outputting a panoramic segmentation result corresponding to the target image.
Further, the fusing process comprises: based on the semantic segmentation result and all the second-class object segmentation results subjected to non-maximum suppression processing, selecting the result with the largest score as the target segmentation result of each pixel in the target image, so that each pixel obtains one target segmentation result; the target segmentation results of all pixels in the target image together form the panoramic segmentation result.
Specifically, after semantic segmentation, a class label and a score for each pixel in the target image can be obtained from the semantic segmentation result; these may also be called the semantic class label and the semantic segmentation score. After the non-maximum suppression processing, each pixel covered by a second-class object segmentation result likewise has a class label and a score, which may also be called the object class label and the object segmentation score. Further, for each pixel in the target image, when several class labels exist for a given pixel, the class label with the highest score is taken as the class label of that pixel, and this class label is the target segmentation result of the pixel.
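A hedged sketch of the fusion in step 105, reusing the arrays and the ObjectResult sketch from above; it is an illustration, not the patent's code.

```python
# Sketch of step 105: per pixel, the label with the largest score wins,
# whether it comes from the semantic result or from a surviving object result.
import numpy as np

def fuse(sem_labels, sem_scores, objects):
    """Return the panoramic segmentation: one winning label per pixel."""
    out_labels = sem_labels.copy()
    out_scores = sem_scores.copy()
    for obj in objects:                    # second-class results after NMS
        win = obj.mask & (obj.score > out_scores)
        out_labels[win] = obj.label        # child-level label replaces parent label
        out_scores[win] = obj.score
    return out_labels
```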
The image segmentation method of the present invention will be illustrated with reference to an example:
as shown in fig. 2, an existing target image to be segmented is firstly subjected to semantic segmentation to obtain a semantic segmentation result, where the semantic segmentation result includes a semantic class label and a semantic segmentation score corresponding to each pixel in the target image, and the target image is subjected to object segmentation to obtain an object segmentation result. After obtaining the object segmentation result, the object segmentation result contains an object class label and an object segmentation score for each object, the object class label and the object segmentation score for each pixel in the object being the same as the object. Then, filtering out object segmentation results with scores lower than a preset score threshold value in the object segmentation results, and only keeping object segmentation results with scores equal to or higher than the preset score threshold value in the object segmentation results, namely only keeping object segmentation results of the second type. Then, the second-class object segmentation result is subjected to non-maximum suppression processing, if the confusion class labels exist in the retained object segmentation results, the second-class object segmentation result corresponding to each confusion class label is subjected to non-maximum suppression processing, the second-class object segmentation results corresponding to other class labels except the confusion class labels are subjected to non-maximum suppression processing, and if the confusion class labels do not exist in the retained object segmentation results, the non-maximum suppression processing is performed on all the retained second-class object segmentation results. And finally, fusing the semantic segmentation result and the second object segmentation result subjected to non-maximum suppression processing, and outputting a panoramic segmentation result corresponding to the target image, as shown in fig. 3.
Based on the same inventive concept, an embodiment of the present invention further provides an image segmentation apparatus, as shown in fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain a target image to be segmented;
a segmentation module 402, configured to perform semantic segmentation and object segmentation on the target image, respectively, and obtain a semantic segmentation result and an object segmentation result correspondingly;
a filtering module 403, configured to filter out first-class object segmentation results from the object segmentation results and retain second-class object segmentation results, where a first-class object segmentation result is an object segmentation result whose score is lower than a preset score threshold, and a second-class object segmentation result is an object segmentation result whose score is equal to or higher than the preset score threshold;
a non-maximum suppression processing module 404, configured to perform non-maximum suppression processing on the second-class object segmentation results;
and a fusion module 405, configured to fuse the semantic segmentation result with the second-class object segmentation results subjected to non-maximum suppression processing, and output a panoramic segmentation result corresponding to the target image.
Preferably, the non-maximum suppression processing module 404 includes:
and the non-maximum suppression processing unit is used for performing non-maximum suppression processing separately on the second-class object segmentation results corresponding to each confusion class label, and performing non-maximum suppression processing on the second-class object segmentation results corresponding to class labels other than the confusion class labels.
Preferably, the apparatus further comprises:
and the statistic module is used for determining the confusion class label by counting each class label in the class label training set.
Preferably, the fusion module 405 includes:
and the fusion unit is used for selecting the result with the largest score as the target segmentation result of each pixel in the target image based on the semantic segmentation result and the second-class object segmentation result subjected to non-maximum suppression processing.
Preferably, the segmentation module 402 includes:
and the semantic segmentation unit is used for adding a class label to each pixel in the target image based on a semantic segmentation model and obtaining the score of each pixel belonging to the class label.
Preferably, the segmentation module 402 includes: an object segmentation unit;
the object segmentation unit includes:
a segmentation subunit, configured to separate each object from the target image;
and the adding subunit is used for adding a class label to each object and obtaining the score of each object belonging to the class label.
Based on the same inventive concept, embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method steps described in the foregoing embodiments.
Based on the same inventive concept, an embodiment of the present invention further provides a computer device, as shown in FIG. 5. For convenience of description, only the portions related to the embodiment of the present invention are shown; for specific technical details not disclosed, please refer to the method portion of the embodiment of the present invention. The computer device may be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sale) terminal, a vehicle-mounted computer, and the like; here a mobile phone is taken as an example:
fig. 5 is a block diagram illustrating a partial structure associated with a computer device provided by an embodiment of the present invention. Referring to fig. 5, the computer apparatus includes: a memory 501 and a processor 502. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 5 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components.
The following describes the components of the computer device in detail with reference to fig. 5:
the memory 501 may be used to store software programs and modules, and the processor 502 executes various functional applications and data processing by operating the software programs and modules stored in the memory 501. The memory 501 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.), and the like. Further, the memory 501 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 502 is a control center of the computer device, and performs various functions and processes data by operating or executing software programs and/or modules stored in the memory 501 and calling data stored in the memory 501. Alternatively, processor 502 may include one or more processing units; preferably, the processor 502 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications.
In the embodiment of the present invention, the processor 502 included in the computer device may have functions corresponding to the method steps in any of the foregoing embodiments.
In summary, according to the image segmentation method and device of the present invention, after a target image to be segmented is obtained, semantic segmentation and object segmentation are performed on the target image, and a semantic segmentation result and an object segmentation result are obtained correspondingly. The first-class object segmentation results, whose scores are lower than a preset score threshold, are filtered out, and the second-class object segmentation results, whose scores are equal to or higher than the preset score threshold, are retained. Non-maximum suppression processing is then performed on the second-class object segmentation results. Finally, the semantic segmentation result and the second-class object segmentation results subjected to non-maximum suppression processing are fused, and a panoramic segmentation result corresponding to the target image is output. Because the object segmentation results are filtered and then subjected to non-maximum suppression processing, the semantic segmentation result and the object segmentation results can be fused effectively, contradictions are avoided, and the accuracy of image recognition is improved.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in accordance with embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
A1, an image segmentation method, comprising:
acquiring a target image to be segmented;
respectively carrying out semantic segmentation and object segmentation on the target image, and correspondingly obtaining a semantic segmentation result and an object segmentation result;
filtering out first-class object segmentation results from the object segmentation results, and retaining second-class object segmentation results, wherein a first-class object segmentation result is an object segmentation result whose score is lower than a preset score threshold, and a second-class object segmentation result is an object segmentation result whose score is equal to or higher than the preset score threshold;
performing non-maximum suppression processing on the second-class object segmentation results;
and fusing the semantic segmentation result with the second-class object segmentation results subjected to non-maximum suppression processing, and outputting a panoramic segmentation result corresponding to the target image.
A2, the image segmentation method according to A1, wherein the performing non-maximum suppression processing on the second-class object segmentation results includes:
and performing non-maximum suppression processing separately on the second-class object segmentation results corresponding to each confusion class label, and performing non-maximum suppression processing on the second-class object segmentation results corresponding to class labels other than the confusion class labels.
A3, the image segmentation method according to A2, wherein before performing the non-maximum suppression processing on the second-class object segmentation results corresponding to each confusion class label and on the second-class object segmentation results corresponding to class labels other than the confusion class labels, the method further comprises:
and determining the confusion class label by counting each class label in the class label training set.
A4, the image segmentation method according to A1, wherein the fusing the semantic segmentation result and the second-class object segmentation results subjected to non-maximum suppression processing comprises:
and selecting the result with the largest score as the target segmentation result of each pixel in the target image based on the semantic segmentation result and the second-class object segmentation result subjected to non-maximum suppression processing.
A5, the image segmentation method according to A1, wherein the semantic segmentation of the target image to obtain the semantic segmentation result comprises:
and adding a category label to each pixel in the target image based on a semantic segmentation model, and obtaining the score of each pixel belonging to the category label.
A6, the image segmentation method according to A1, wherein performing object segmentation on the target image to obtain the object segmentation result includes:
separating each object from the target image;
adding a class label to each object, and obtaining the score of each object belonging to the class label.
B7, an image segmentation apparatus, comprising:
the acquisition module is used for acquiring a target image to be segmented;
the segmentation module is used for respectively carrying out semantic segmentation and object segmentation on the target image to correspondingly obtain a semantic segmentation result and an object segmentation result;
the filtering module is used for filtering out first-class object segmentation results from the object segmentation results and retaining second-class object segmentation results, wherein a first-class object segmentation result is an object segmentation result whose score is lower than a preset score threshold, and a second-class object segmentation result is an object segmentation result whose score is equal to or higher than the preset score threshold;
the non-maximum suppression processing module is used for performing non-maximum suppression processing on the second-class object segmentation results;
and the fusion module is used for fusing the semantic segmentation result with the second-class object segmentation results subjected to non-maximum suppression processing, and outputting a panoramic segmentation result corresponding to the target image.
B8, the image segmentation apparatus according to B7, wherein the non-maximum suppression processing module includes:
and the non-maximum suppression processing unit is used for performing non-maximum suppression processing separately on the second-class object segmentation results corresponding to each confusion class label, and performing non-maximum suppression processing on the second-class object segmentation results corresponding to class labels other than the confusion class labels.
B9, the image segmentation apparatus according to B8, wherein the apparatus further comprises:
and the statistic module is used for determining the confusion class label by counting each class label in the class label training set.
B10, the image segmentation apparatus according to B7, wherein the fusion module comprises:
and the fusion unit is used for selecting the result with the largest score as the target segmentation result of each pixel in the target image based on the semantic segmentation result and the second-class object segmentation result subjected to non-maximum suppression processing.
B11, the image segmentation apparatus according to B7, wherein the segmentation module comprises:
and the semantic segmentation unit is used for adding a class label to each pixel in the target image based on a semantic segmentation model and obtaining the score of each pixel belonging to the class label.
B12, the image segmentation apparatus according to B7, wherein the segmentation module comprises: an object segmentation unit;
the object segmentation unit includes:
a segmentation subunit, configured to separate each object from the target image;
and the adding subunit is used for adding a class label to each object and obtaining the score of each object belonging to the class label.
C13, a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, carries out the method steps according to any one of A1-A6.
D14, a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps according to any one of A1-A6 when executing the program.

Claims (10)

1. A method of image segmentation, the method comprising:
acquiring a target image to be segmented;
respectively carrying out semantic segmentation and object segmentation on the target image, and correspondingly obtaining a semantic segmentation result and an object segmentation result;
filtering out first-class object segmentation results from the object segmentation results, and retaining second-class object segmentation results, wherein a first-class object segmentation result is an object segmentation result whose score is lower than a preset score threshold, and a second-class object segmentation result is an object segmentation result whose score is equal to or higher than the preset score threshold;
performing non-maximum suppression processing on the second-class object segmentation results;
and fusing the semantic segmentation result with the second-class object segmentation results subjected to non-maximum suppression processing, and outputting a panoramic segmentation result corresponding to the target image.
2. The image segmentation method according to claim 1, wherein the performing non-maximum suppression processing on the second-class object segmentation results comprises:
and performing non-maximum suppression processing separately on the second-class object segmentation results corresponding to each confusion class label, and performing non-maximum suppression processing on the second-class object segmentation results corresponding to class labels other than the confusion class labels.
3. The image segmentation method according to claim 2, wherein before performing the non-maximum suppression processing on the second-class object segmentation result corresponding to each confusion class label and performing the non-maximum suppression processing on the second-class object segmentation results corresponding to other class labels except the confusion class label, the method further comprises:
and determining the confusion class label by counting each class label in the class label training set.
4. The image segmentation method according to claim 1, wherein the fusing the semantic segmentation result and the second-class object segmentation results subjected to non-maximum suppression processing comprises:
and selecting the result with the largest score as the target segmentation result of each pixel in the target image based on the semantic segmentation result and the second-class object segmentation result subjected to non-maximum suppression processing.
5. The image segmentation method of claim 1, wherein performing semantic segmentation on the target image to obtain the semantic segmentation result comprises:
and adding a category label to each pixel in the target image based on a semantic segmentation model, and obtaining the score of each pixel belonging to the category label.
6. The image segmentation method of claim 1, wherein performing object segmentation on the target image to obtain the object segmentation result comprises:
separating each object from the target image;
adding a class label to each object, and obtaining the score of each object belonging to the class label.
7. An image segmentation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a target image to be segmented;
the segmentation module is used for respectively carrying out semantic segmentation and object segmentation on the target image to correspondingly obtain a semantic segmentation result and an object segmentation result;
the filtering module is used for filtering out first-class object segmentation results from the object segmentation results and retaining second-class object segmentation results, wherein a first-class object segmentation result is an object segmentation result whose score is lower than a preset score threshold, and a second-class object segmentation result is an object segmentation result whose score is equal to or higher than the preset score threshold;
the non-maximum suppression processing module is used for performing non-maximum suppression processing on the second-class object segmentation results;
and the fusion module is used for fusing the semantic segmentation result with the second-class object segmentation results subjected to non-maximum suppression processing, and outputting a panoramic segmentation result corresponding to the target image.
8. The image segmentation apparatus of claim 7, wherein the non-maximum suppression processing module comprises:
and the non-maximum suppression processing unit is used for performing non-maximum suppression processing separately on the second-class object segmentation results corresponding to each confusion class label, and performing non-maximum suppression processing on the second-class object segmentation results corresponding to class labels other than the confusion class labels.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method steps of any one of claims 1-6 when executing the program.
CN201910126877.8A (priority date 2019-02-20, filing date 2019-02-20): Image segmentation method and device. Status: Pending. Published as CN111598912A.

Priority Applications (1)

Application CN201910126877.8A (priority date 2019-02-20, filing date 2019-02-20): Image segmentation method and device

Publications (1)

Publication number CN111598912A, published 2020-08-28

Family

Family ID: 72192050

Family Applications (1)

Application CN201910126877.8A (priority date 2019-02-20, filing date 2019-02-20): Image segmentation method and device

Country Status (1)

CN: CN111598912A


Cited By (3)

* Cited by examiner, † Cited by third party

CN113052858A * (priority date 2021-03-23, published 2021-06-29), University of Electronic Science and Technology of China: Panorama segmentation method based on semantic stream
CN113344957A * (priority date 2021-07-19, published 2021-09-03), Beijing Chengshi Wanglin Information Technology Co Ltd: Image processing method, image processing apparatus, and non-transitory storage medium
CN113344957B * (priority date 2021-07-19, published 2022-03-01), Beijing Chengshi Wanglin Information Technology Co Ltd: Image processing method, image processing apparatus, and non-transitory storage medium


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination