WO2020079704A1 - Method and system for performing semantic segmentation of plurality of entities in an image - Google Patents

Method and system for performing semantic segmentation of plurality of entities in an image

Info

Publication number
WO2020079704A1
WO2020079704A1 (PCT/IN2019/050528)
Authority
WO
WIPO (PCT)
Prior art keywords
image
entity
wroi
neural network
model
Prior art date
Application number
PCT/IN2019/050528
Other languages
French (fr)
Inventor
Shivam Shah
Harshit Pande
Original Assignee
Sigtuple Technologies Private Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sigtuple Technologies Private Limited
Publication of WO2020079704A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30041Eye; Retina; Ophthalmic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a method and system for performing semantic segmentation of a plurality of entities in an image. In an embodiment, the present disclosure discloses identifying a Weak Region of Interest (WRoI) comprising at least one smaller entity enclosed within a larger entity using a first model of a neural network. Further, a location of the center of the WRoI is estimated for weakly predicting a portion of the larger entity. Thereafter, the portion of the larger entity is dynamically cropped from the WRoI and analyzed using a second model of the neural network for segmenting each of the plurality of entities in the image. In an embodiment, the present disclosure helps in segmenting multiple entities in an image using a weakly predicted portion of a larger entity in the image, thereby reducing the computational complexity involved in segmentation of the plurality of entities.

Description

METHOD AND SYSTEM FOR PERFORMING SEMANTIC SEGMENTATION OF PLURALITY OF ENTITIES IN AN IMAGE
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The present subject matter is, in general, related to image segmentation and more particularly, but not exclusively, to a method and system for performing semantic segmentation of a plurality of entities in an image.
BACKGROUND
Medical images are used for analyzing the medical condition of a subject. Automatic screening of the medical images helps in identifying objects present in the medical images or in identifying any abnormal condition. As an example, screening of medical images of a blood/urine smear helps in identifying blood platelets or bacteria present in the blood/urine sample. Similarly, a medical image obtained from a fundus camera helps in analyzing the cup or disc regions in the fundus image. Generally, the screening of medical images is performed by a process of segmentation. There exist several segmentation techniques for segmenting the medical images.
One of the segmentation techniques that is largely in use to analyze the medical images is semantic segmentation. However, semantic segmentation has a common problem of data skewness, since each pixel is treated as a data point during the segmentation. Further, in semantic segmentation, it is a challenge to segment multiple sub-entities belonging to an object, since the data points of interest are present in a minute ratio compared to the background pixels in the medical images. Thus, any trivial approach would fail to segment multiple entities using a single trained neural network model and would require multiple trained models to segment these entities.
Particularly, in the case of fundus photographs, an ophthalmologist may be required to evaluate the cup-to-disc ratio, as well as the characteristic configuration of disc-rim thickness, to make various analyses. Both of these parameters may be deduced by accurately segmenting the cup and the disc regions in the fundus photographs. However, the semantic segmentation approach may fail to produce desired results when working with fundus photographs obtained from multiple sources/cameras, as the data points would differ depending on the source/camera being used. Further, since the cup region is located inside the disc, the semantic segmentation approach may require multiple trained models for segmenting both the cup and the disc regions. Consequently, the semantic segmentation approach involved in segmentation of multiple entities would be computationally complex, as it involves multiple training models, multiple deep learning architectures and numerous encoder-decoder parameters.
SUMMARY
One or more shortcomings of the prior art may be overcome, and additional advantages may be provided through the present disclosure. Additional features and advantages may be realized through the techniques of the present disclosure. Other embodiments and aspects of the disclosure are described in detail herein and are considered a part of the claimed disclosure.
Disclosed herein is a method for semantic segmentation of a plurality of entities in an image. The method comprises identifying, by a segmentation system, a Weak Region of Interest (WRoI) comprising at least one smaller entity enclosed within a larger entity, in the image by analyzing the image using a first model of a neural network trained for predicting the WRoI. Upon identifying, the method comprises estimating a location of center of the WRoI for weakly predicting a portion of the larger entity. Thereafter, the method comprises dynamically cropping the portion of the larger entity from the WRoI. Finally, the method comprises analyzing the cropped portion of the larger entity using a second model of the neural network, trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image.
Further, the present disclosure relates to a segmentation system for performing semantic segmentation of a plurality of entities in an image. The segmentation system comprises a processor and a memory. The memory is communicatively coupled to the processor and stores processor-executable instructions, which on execution, cause the processor to identify a Weak Region of Interest (WRoI), comprising at least one smaller entity enclosed within a larger entity, in the image by analyzing the image using a first model of a neural network trained for predicting the WRoI. Further, the instructions cause the processor to estimate a location of center of the WRoI for weakly predicting a portion of the larger entity. Thereafter, the instructions cause the processor to dynamically crop the portion of the larger entity from the WRoI. Finally, the instructions cause the processor to analyze the cropped portion of the WRoI using a second model of the neural network, trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, explain the disclosed principles. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the figures to reference like features and components. Some embodiments of systems and/or methods in accordance with embodiments of the present subject matter are now described, by way of example only, and with reference to the accompanying figures, in which:
FIG. 1 (Prior art) illustrates an exemplary architecture for performing semantic segmentation of a plurality of entities in an image in accordance with an existing method;
FIG. 2 illustrates an exemplary architecture for performing semantic segmentation of a plurality of entities in an image in accordance with some embodiments of the present disclosure;
FIG. 3 shows a detailed block diagram illustrating a segmentation system in accordance with some embodiments of the present disclosure;
FIG. 4 shows a flowchart illustrating a method of semantic segmentation of a plurality of entities in an image in accordance with some embodiments of the present disclosure;

FIGS. 5A - 5F show exemplary images illustrating segmentation of a cup region and a disc region from a fundus image in accordance with some embodiments of the present disclosure; and
FIG. 6 illustrates a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and executed by a computer or processor, whether such computer or processor is explicitly shown.
DETAILED DESCRIPTION
In the present document, the word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the specific forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
The terms "comprises", "comprising", "includes", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device, or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by "comprises... a" does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.

FIG. 1 illustrates an existing architecture for performing semantic segmentation of a plurality of entities from an image. As an example, suppose the existing segmentation system 100 is used to segment a disc region and a cup region from a fundus image. In this scenario, since the cup region is located inside the disc region, it may be very difficult to segment both the cup and the disc regions using a single deep learning model. Therefore, the existing segmentation system 100 may employ two different deep learning models that are trained separately to identify the cup region and the disc region. As an example, as shown in FIG. 1, the existing segmentation system 100 may contain a plurality of deep learning models 1031, 1032, ..., 103N, each trained to identify a particular region from the input image, i.e., the fundus image. For example, the deep learning model 1031 may be trained to identify Entity 1 (disc) in the input image and may require a learning architecture specific to identification of Entity 1. Similarly, the deep learning model 1032 may be trained to identify Entity 2 (cup) and may require a different learning architecture, which is specific to identification of Entity 2. Further, each of the deep learning models may require a separate set of encoder-decoder networks and encoder-decoder parameters for segmenting the plurality of entities from the input images. As a result, the computational complexity of the existing segmentation system is significantly higher.
The present disclosure attempts to overcome the above limitations and discloses a method and a segmentation system for performing semantic segmentation of a plurality of entities in an image, using a single neural network model. In an embodiment, the present disclosure provides a multi-modal training approach for segmenting fine grained objects from an image.
In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.

FIG. 2 illustrates an exemplary architecture for performing semantic segmentation of a plurality of entities in an image in accordance with some embodiments of the present disclosure.
In an embodiment, the segmentation system 200 may be configured with a neural network 203, which is trained for segmenting a plurality of entities from an input image 201. As an example, the neural network 203 may be a Convolutional Neural Network (CNN) or any other deep learning and/or machine learning technique. In an implementation, a portion of the neural network 203, namely, a first model of the neural network 203A, may be trained for predicting a Weak Region of Interest (WRoI) from the input image 201. As an example, the WRoI may be a region in the image which comprises at least one smaller entity enclosed within a larger entity. In an embodiment, the input image 201 may be analyzed by the first model of the neural network 203A for identifying the WRoI in the input image 201.

Similarly, a second portion of the neural network 203, namely, the second model of the neural network 203B, may be trained for predicting and segmenting each of the plurality of entities, including a smaller entity enclosed within a larger entity, from the WRoI of the input image 201. That is, when a cropped portion 213 of the input image 201, comprising the WRoI, is analyzed by the second model of the neural network 203B, the second model may segment each of the plurality of entities from the input image 201.

In an embodiment, both the first model and the second model of the neural network 203 may comprise a plurality of encoders and decoders, encoder-decoder parameters and loss functions. In an embodiment, both the first model and the second model may use the same encoder 205 and encoder parameters for identifying the WRoI and for segmenting the plurality of entities from the WRoI, respectively. On the other hand, the decoders (207, 215) and decoder parameters of the first model and the second model may be distinct, depending on the nature of the plurality of entities to be segmented from the input image 201.
In an embodiment, the input image 201 may be analyzed using the first model of the neural network 203A for identifying the WRoI from the input image 201. Thereafter, the segmentation system 200 may estimate a location of center of the WRoI for weakly predicting a portion of the larger entity within the WRoI. Subsequently, the segmentation system 200 may dynamically crop the portion of the larger entity from the WRoI. In an embodiment, upon cropping the portion of the larger entity from the WRoI, the cropped portion 213 of the larger entity is passed on to the second model of the neural network 203B for segmenting the at least one smaller entity enclosed within the larger entity. Thus, the segmentation system 200 performs segmentation of the plurality of entities in the input image 201 using a single neural network 203.
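By way of illustration only, a minimal PyTorch-style sketch of such a single network with a shared encoder and two distinct decoder heads is given below. The specific layer configuration, channel counts and the names SharedEncoder, Decoder and SegmentationNetwork are assumptions made for this sketch and are not prescribed by the present disclosure.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Encoder 205: shared by both the first and the second model (assumed layout)."""
    def __init__(self, in_channels=3, base=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # 512 -> 256 (or 256 -> 128 for crops)
            nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # halve spatial size again
        )

    def forward(self, x):
        return self.features(x)

class Decoder(nn.Module):
    """A small decoder head; the first model (207) and the second model (215) use distinct instances."""
    def __init__(self, in_channels, out_classes):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_channels, in_channels // 2, 2, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels // 2, out_classes, 2, stride=2),
        )

    def forward(self, x):
        return self.up(x)

class SegmentationNetwork(nn.Module):
    """Single neural network 203 with one shared encoder and two decoder heads."""
    def __init__(self):
        super().__init__()
        self.encoder = SharedEncoder()
        self.wroi_decoder = Decoder(32, out_classes=1)    # first model 203A: WRoI mask logits
        self.entity_decoder = Decoder(32, out_classes=3)  # second model 203B: background / disc / cup

    def forward_first(self, full_image):
        # First pass: full image in, WRoI prediction out.
        return self.wroi_decoder(self.encoder(full_image))

    def forward_second(self, cropped_patch):
        # Second pass: cropped portion in, per-entity segmentation out.
        return self.entity_decoder(self.encoder(cropped_patch))
```

In this sketch, the encoder weights are literally the same parameters in both passes, while each pass selects its own decoder, which mirrors the sharing described above.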
FIG. 3 shows a detailed block diagram illustrating a segmentation system 200 in accordance with some embodiments of the present disclosure.
In an implementation, the segmentation system 200 may include an I/O interface 301, a processor 303, and a memory 305. The I/O interface 301 may be configured to receive an input image 201 from one or more sources associated with the segmentation system 200. As an example, the one or more sources of the input image 201 may include, without limiting to, an image capturing device, an image repository and the like. The processor 303 may be configured to perform one or more functions of the segmentation system 200 while segmenting a plurality of entities in the input image 201. In some implementations, the memory 305 may be communicatively coupled to the processor 303 and may store data 307 and other intermediate results generated when the input image 201 is being analyzed.
In some implementations, the segmentation system 200 may include one or more modules 309 for performing various operations in accordance with embodiments of the present disclosure. In an embodiment, the data 307 may be stored within the memory 305 and may include, without limiting to, input image 201, details of Weak Region of Interest (WRoI) 313, cropped portion 213 and other data 317.
In some embodiments, the data 307 may be stored within the memory 305 in the form of various data structures. Additionally, the data 307 may be organized using data models, such as relational or hierarchical data models. The other data 317 may include temporary data and files generated by the one or more modules 309 for performing various functions of the segmentation system 200.
In an embodiment, the input image 201 may be an image of a region of interest, which comprises a plurality of entities to be segmented. As an example, the input image 201 may be an image of fundus of an animal. Accordingly, the plurality of entities present in the input image 201 may include, without limiting to, at least a cup region of the fundus and a disc region of the fundus. In an embodiment, the input image 201 may be received from an image capturing device associated with the segmentation system 200. In an embodiment, one or more image pre-processing actions may be performed on the input image 201 before analyzing the input image 201. As an example, the one or more image pre-processing actions may include, without limiting to, image resizing, de-blurring and the like.
In an embodiment, the details of the WRoI 313 may include information related to the WRoI identified in the input image 201. The WRoI may be identified from the input image 201 by analyzing the input image 201 using a first model of a neural network configured in the segmentation system 200. In an embodiment, the WRoI may be a region comprising at least one smaller entity enclosed within a larger entity.
In an embodiment, the cropped portion 213 of the input image 201 may correspond to the portion of the larger entity in the WRoI. In an implementation, the portion of the larger entity may be predicted from the WRoI by estimating the location of the center of the WRoI in the input image 201. That is, the cropped portion 213 of the input image 201 may be equivalent to the portion of the larger entity, measured from the center of the WRoI.
In an embodiment, each of the data 307 stored in the segmentation system 200 may be processed by the modules 309 of the segmentation system 200. In one implementation, the modules 309 may be stored as a part of the processor 303. In another implementation, the modules 309 may be communicatively coupled to the processor 303 for performing one or more functions of the segmentation system 200. The modules 309 may include, without limiting to, an image acquiring module 319, a first model of the neural network 203A, a region estimation module 323, a cropping module 325, a second model of the neural network 203B and other modules 329.
As used herein, the term module refers to an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality. In an embodiment, the other modules 329 may be used to perform various miscellaneous functionalities of the segmentation system 200. It will be appreciated that such modules 309 may be represented as a single module or a combination of different modules.

In an embodiment, the image acquiring module 319 may be used for acquiring and/or receiving the input image 201 from one or more external sources. In an implementation, the image acquiring module 319 may be an image capturing device used for capturing images of a region of interest. In another implementation, the image acquiring module 319 may be communicatively associated with an external image capturing device for capturing/acquiring images of the region of interest. Further, the one or more external sources associated with the image acquiring module 319 may include an image database, which stores a plurality of images of the region of interest.
In an embodiment, the first model of the neural network 203A may be used for identifying the WRoI from the input image 201. In an implementation, the first model of the neural network 203A may correspond to a portion of the neural network 203, which is trained for identifying the WRoI from the input image 201. The first model of the neural network 203A may be trained using a set of encoders 205 and decoders (207, 215), encoder-decoder parameters and loss functions required for identifying the WRoI from the input image 201.
In an embodiment, the region estimation module 323 may be used for estimating a location of center of the WRoI identified by the first model of the neural network 203A. Further, the region estimation module 323 may use the location of the center of the WRoI for weakly predicting a portion of the larger entity in the WRoI. That is, the region estimation module 323 may identify a region covered by the larger entity within the WRoI.
In an embodiment, the cropping module 325 may be used for dynamically cropping the portion of the larger entity from the WRoI. That is, the cropping module 325 ensures that only the portion of the larger entity, enclosing the at least one smaller entity, is extracted from the WRoI. In an embodiment, the cropping module 325 may use the location of the center of the WRoI as a reference point for cropping the portion of the larger entity.
In an embodiment, the second model of the neural network 203B may be used for analyzing the cropped portion 213 of the larger entity and segmenting each of the at least one smaller entity from the cropped portion 213 of the larger entity. In an implementation, the second model of the neural network 203B may correspond to a second portion of the neural network 203, which is trained for segmenting the plurality of entities in the input image 201. The second model of the neural network 203B may be trained using encoders 205 and decoders (207, 215), encoder-decoder parameters and loss functions required for identifying and segmenting the plurality of entities from the input image 201. In an implementation, the same set of encoder 205 and encoder parameters may be used for training both the first model and the second model of the neural network 203. That is, the first model may identify the WRoI by analyzing the input image 201 using a first set of encoders 205 and encoder parameters and a first set of decoders 207 and decoder parameters. Thereafter, the second model may segment each of the plurality of entities from the cropped portion 213 of the WRoI using the first set of encoders 205 and related parameters (used by the first model) and a second set of decoders 215 and related parameters, which are different from the first set of decoders 207 and related parameters used by the first model.
That is, the segmentation system 200 uses a single trained neural network 203 for segmentation of the plurality of entities in the input image 201 by configuring specific portions of the trained neural network 203 to perform specific actions. Consequently, the operation of the segmentation system 200 may be computationally less complex, since a single trained neural network 203 is used for identifying the plurality of entities. Additionally, the computational complexity of the segmentation system 200 may be further reduced since common encoder 205 and encoder parameters are used by both the first model and the second model of the neural network 203.
FIG. 4 shows a flowchart illustrating a method of semantic segmentation of a plurality of entities in accordance with some embodiments of the present disclosure.
As illustrated in FIG. 4, the method 400 includes one or more blocks illustrating a method of semantic segmentation of a plurality of entities in an image using a segmentation system 200, for example, the segmentation system 200 shown in FIG. 2. The method 400 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, and functions, which perform specific functions or implement specific abstract data types.
The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method. Additionally, individual blocks may be deleted from the methods without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
At block 401, the method 400 includes identifying, by the segmentation system 200, a Weak Region of Interest (WRoI) in the image by analyzing the image using a first model 203A of a neural network trained for predicting the WRoI. In an embodiment, the WRoI may include, without limiting to, at least one smaller entity enclosed within a larger entity.
At block 403, the method 400 includes estimating, by the segmentation system 200, a location of center of the WRoI for weakly predicting a portion of the larger entity.
At block 405, the method 400 includes dynamically cropping, by the segmentation system 200, the portion of the larger entity from the WRoI. In an embodiment, dynamically cropping the portion of the larger entity may be based on a predetermined cropping factor dynamically selected based on size of the image.
At block 407, the method 400 includes analyzing, by the segmentation system 200, the cropped portion 213 of the larger entity using a second model 203B of the neural network 203 trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image. In an embodiment, segmenting the at least one smaller entity and the larger entity may include extracting the portion of the larger entity using a first set of encoders 205 and a first set of decoders 207 corresponding to the second model of the neural network 203B. Further, the segmenting may include extracting region of the at least one smaller entity from the portion of the larger entity using the first set of encoders 205 and a second set of decoders 215 corresponding to the second model of the neural network 203B. In an embodiment, the first set of decoders 207 may be different from the second set of decoders 215.
In an implementation, the neural network 203 may include, without limiting to, encoders 205, decoders (207, 215) and encoder-decoder parameters corresponding to the first model and the second model of the neural network 203.
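As a sketch of the cropping of blocks 403 and 405, the following helper shows one way the dynamic crop could be realised. The cropping factor of 0.5 (so that a 512 * 512 image yields a 256 * 256 crop, consistent with the exemplary embodiment below), the assumption of a square image, and the clamping of the window to the image boundary are illustrative choices, not requirements of the disclosure.

```python
import numpy as np

def crop_around_center(image: np.ndarray, center_rc: tuple, crop_factor: float = 0.5) -> np.ndarray:
    """Crop a square window centred on (row, col); the window size is a
    crop_factor fraction of the (assumed square) image side length."""
    n = image.shape[0]
    size = int(round(n * crop_factor))          # e.g. 512 * 0.5 = 256
    half = size // 2
    r, c = int(round(center_rc[0])), int(round(center_rc[1]))
    # Clamp so the crop window never leaves the image.
    r0 = min(max(r - half, 0), n - size)
    c0 = min(max(c - half, 0), n - size)
    return image[r0:r0 + size, c0:c0 + size]

# Example: a 256 * 256 patch around an estimated WRoI centre in a 512 * 512 image.
img = np.zeros((512, 512, 3), dtype=np.uint8)
patch = crop_around_center(img, (260.0, 245.0))
print(patch.shape)  # (256, 256, 3)
```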
Exemplary embodiment:
FIGS. 5A - 5F show exemplary images illustrating segmentation of a cup region and a disc region from a fundus image in accordance with some embodiments of the present disclosure. FIG. 5A shows an exemplary fundus image comprising a cup region and a disc region (encircled regions in the center of FIG. 5A), which need to be segmented for further analysis of the fundus. In FIG. 5A, the outer region and/or the larger region may be identified as the disc region of the fundus. Further, the inner region and/or the smaller region enclosed by the disc region may be identified as the cup region of the fundus. In an embodiment, segmentation of the cup region and the disc region may be necessary for examining one or more anomalies in the fundus.
FIG. 5B shows a pre-processed version of the input image of FIG. 5A. As an example, one of the pre-processing actions performed on the input image of FIG. 5A may be resizing of the input image to a predetermined resolution/size. As an example, the predetermined resolution may be 512 * 512 pixels. Thus, FIG. 5B shows a resized version of the input image in FIG. 5A.
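Such a resize can be performed with any standard image library; the snippet below uses OpenCV purely as an example, and the library choice and interpolation method are assumptions of this sketch rather than part of the disclosure.

```python
import cv2

def preprocess_fundus(path: str, target: int = 512):
    """Load a fundus image and resize it to target x target pixels (as in FIG. 5B)."""
    image = cv2.imread(path, cv2.IMREAD_COLOR)
    if image is None:
        raise FileNotFoundError(path)
    return cv2.resize(image, (target, target), interpolation=cv2.INTER_AREA)
```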
FIG. 5C shows a Weak Region of Interest (WRoI) identified from the pre-processed input image in FIG. 5B. That is, upon pre-processing the input image, the pre-processed image may be passed on to the first model of the neural network 203A, which analyzes the input image of FIG. 5B and identifies the WRoI from the input image, as shown in FIG. 5C. Here, it may be observed that the identified WRoI corresponds to a combined region comprising both the larger entity (i.e., the disc region) and the at least one smaller entity (i.e., the cup region) enclosed within the larger entity.
FIG. 5D shows a cropped portion of the input image, which is obtained by extracting the identified WRoI from the input image of FIG. 5B. As an example, the cropped portion of the WRoI may correspond to the portion of the disc region in the fundus image. In an embodiment, the cropped portion may be obtained by cropping the disc region from the WRoI with reference to location of a center of the WRoI. In an embodiment, the location of the center of the WRoI may be estimated using equations (1) and (2) below:
$$r_c = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} i \cdot M_{disc}^{i,j}}{\sum_{i=1}^{N}\sum_{j=1}^{N} M_{disc}^{i,j}} \qquad (1)$$

$$c_c = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} j \cdot M_{disc}^{i,j}}{\sum_{i=1}^{N}\sum_{j=1}^{N} M_{disc}^{i,j}} \qquad (2)$$

Wherein,

the matrix $M_{disc}$ ($M_{disc}^{i,j} \in \{0, 1\}$) represents the disc mask prediction in the WRoI;

'N' = 512 (the number of pixels across the WRoI); and

$r_c$ and $c_c$ respectively represent the row and column values of the cropped center in the cropped matrix.
In an embodiment, once the location of the center of the WRoI is estimated, a 256 * 256 region may be cropped from the identified WRoI with reference to the location of the center of the WRoI. Thus, the cropped portion may be of the size 256 * 256.
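Equations (1) and (2) amount to taking the centroid (mean row and column index) of the predicted disc mask; a direct NumPy rendering of that computation, followed by a 256 * 256 slice measured from the estimated center, is sketched below. The fallback for an empty mask is an added assumption and is not part of the disclosure.

```python
import numpy as np

def wroi_center(disc_mask: np.ndarray):
    """Estimate (r_c, c_c) as the centroid of the binary disc mask prediction
    (equations (1) and (2)); disc_mask is an N x N array with values in {0, 1}."""
    rows, cols = np.nonzero(disc_mask)
    if rows.size == 0:
        # Assumed fallback: geometric centre when no disc pixel was predicted.
        n = disc_mask.shape[0]
        return n / 2.0, n / 2.0
    return float(rows.mean()), float(cols.mean())

# Example: centre of a synthetic 512 x 512 mask, then a 256 x 256 crop around it
# (boundary clamping is omitted here; see the cropping sketch earlier).
mask = np.zeros((512, 512), dtype=np.uint8)
mask[200:300, 220:320] = 1
r_c, c_c = wroi_center(mask)           # approximately (249.5, 269.5)
r0, c0 = int(r_c) - 128, int(c_c) - 128
crop = mask[r0:r0 + 256, c0:c0 + 256]  # 256 x 256 region measured from the centre
print(r_c, c_c, crop.shape)
```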
FIG. 5E shows the segmented disc and cup regions from the cropped portion shown in FIG. 5D. As an example, the bright/white region in the center of FIG. 5E may correspond to the cup region of the fundus. Similarly, the outer shaded region, enclosing the cup region, may correspond to the disc region of the fundus. In an embodiment, the cup region and the disc region, as shown in FIG. 5E, may be obtained by passing the cropped image of FIG. 5D through the second model of the neural network 203B.
FIG. 5F shows the final outcome of the segmentation. The outcome of FIG. 5F may be obtained by remapping the segmented cup and disc regions onto the original input image of the fundus shown in FIG. 5B. Thus, the outcome of the segmentation system 200 may be the segmented regions corresponding to each of the plurality of entities comprised in the input image.
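The disclosure does not detail the remapping step; one straightforward way to realise it, assuming the 256 * 256 prediction was taken from a known (r0, c0) offset within the 512 * 512 pre-processed image, is to paste the predicted labels back into a full-size label map at that offset, as sketched below.

```python
import numpy as np

def remap_to_full_image(pred_labels: np.ndarray, r0: int, c0: int, full_size: int = 512) -> np.ndarray:
    """Place the cropped-region prediction (e.g. 0=background, 1=disc, 2=cup)
    back at its original location inside a full-size label map."""
    full = np.zeros((full_size, full_size), dtype=pred_labels.dtype)
    h, w = pred_labels.shape
    full[r0:r0 + h, c0:c0 + w] = pred_labels
    return full
```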
Computer System:

FIG. 6 illustrates a block diagram of an exemplary computer system 600 for implementing embodiments consistent with the present disclosure. In an embodiment, the computer system 600 may be the segmentation system 200 shown in FIG. 2, which may be used for segmenting a plurality of entities from an input image 201. The computer system 600 may include a central processing unit ("CPU" or "processor") 602. The processor 602 may comprise at least one data processor for executing program components for executing user- or system-generated business processes. A user may include a person, an analyst or a user of the segmentation system 200, and the like. The processor 602 may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc.
The processor 602 may be disposed in communication with one or more input/output (I/O) devices (611 and 612) via I/O interface 601. The I/O interface 601 may employ communication protocols/methods such as, without limitation, audio, analog, digital, stereo, IEEE-1394, serial bus, Universal Serial Bus (USB), infrared, PS/2, BNC, coaxial, component, composite, Digital Visual Interface (DVI), high-definition multimedia interface (HDMI), Radio Frequency (RF) antennas, S-Video, Video Graphics Array (VGA), IEEE 802.11a/b/g/n/x, Bluetooth, cellular (e.g., Code-Division Multiple Access (CDMA), High-Speed Packet Access (HSPA+), Global System For Mobile Communications (GSM), Long-Term Evolution (LTE) or the like), etc. Using the I/O interface 601, the computer system 600 may communicate with one or more I/O devices 611 and 612. In some implementations, the I/O interface 601 may also be used to connect to one or more data sources for receiving the input image 201.
In some embodiments, the processor 602 may be disposed in communication with a communication network 609 via a network interface 603. The network interface 603 may communicate with the communication network 609. The network interface 603 may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), Transmission Control Protocol/Internet Protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. Using the network interface 603 and the communication network 609, the computer system 600 may receive the input image 201 from one or more external and/or internal sources of the input image 201. In an implementation, the communication network 609 can be implemented as one of several types of networks, such as an intranet or a Local Area Network (LAN) and such within the organization. The communication network 609 may either be a dedicated network or a shared network, which represents an association of several types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other. Further, the communication network 609 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, etc.
In some embodiments, the processor 602 may be disposed in communication with a memory 605 (e.g., RAM 613, ROM 614, etc. as shown in FIG. 6) via a storage interface 604. The storage interface 604 may connect to memory 605 including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as Serial Advanced Technology Attachment (SATA), Integrated Drive Electronics (IDE), IEEE-1394, Universal Serial Bus (USB), fiber channel, Small Computer Systems Interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, Redundant Array of Independent Discs (RAID), solid-state memory devices, solid-state drives, etc.
The memory 605 may store a collection of program or database components, including, without limitation, user/application interface 606, an operating system 607, a web browser 608, and the like. In some embodiments, computer system 600 may store user/application data 606, such as the data, variables, records, etc. as described in this invention. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle® or Sybase®.
The operating system 607 may facilitate resource management and operation of the computer system 600. Examples of operating systems include, without limitation, APPLE® MACINTOSH® OS X®, UNIX®, UNIX-like system distributions (e.g., BERKELEY SOFTWARE DISTRIBUTION® (BSD), FREEBSD®, NETBSD®, OPENBSD, etc.), LINUX® DISTRIBUTIONS (e.g., RED HAT®, UBUNTU®, KUBUNTU®, etc.), IBM® OS/2®, MICROSOFT® WINDOWS® (XP®, VISTA®/7/8, 10 etc.), APPLE® IOS®, GOOGLE™ ANDROID™, BLACKBERRY® OS, or the like. The user interface 606 may facilitate display, execution, interaction, manipulation, or operation of program components through textual or graphical facilities. For example, the user interface 606 may provide computer interaction interface elements on a display system operatively connected to the computer system 600, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, and the like. Further, Graphical User Interfaces (GUIs) may be employed, including, without limitation, APPLE® MACINTOSH® operating systems' Aqua®, IBM® OS/2®, MICROSOFT® WINDOWS® (e.g., Aero, Metro, etc.), web interface libraries (e.g., ActiveX®, JAVA®, JAVASCRIPT®, AJAX, HTML, ADOBE® FLASH®, etc.), or the like.
The web browser 608 may be a hypertext viewing application. Secure web browsing may be provided using Secure Hypertext Transport Protocol (HTTPS), Secure Sockets Layer (SSL), Transport Layer Security (TLS), and the like. The web browsers 608 may utilize facilities such as AJAX, DHTML, ADOBE® FLASH®, JAVASCRIPT®, JAVA®, Application Programming Interfaces (APIs), and the like. Further, the computer system 600 may implement a mail server stored program component. The mail server may utilize facilities such as ASP, ACTIVEX®, ANSI® C++/C#, MICROSOFT® .NET, CGI SCRIPTS, JAVA®, JAVASCRIPT®, PERL®, PHP, PYTHON®, WEBOBJECTS®, etc. The mail server may utilize communication protocols such as Internet Message Access Protocol (IMAP), Messaging Application Programming Interface (MAPI), MICROSOFT® exchange, Post Office Protocol (POP), Simple Mail Transfer Protocol (SMTP), or the like. In some embodiments, the computer system 600 may implement a mail client stored program component. The mail client may be a mail viewing application, such as APPLE® MAIL, MICROSOFT® ENTOURAGE®, MICROSOFT® OUTLOOK®, MOZILLA® THUNDERBIRD®, and the like.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present invention. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term“computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, that is, non-transitory. Examples include Random Access Memory (RAM), Read-Only Memory (ROM), volatile memory, nonvolatile memory, hard drives, Compact Disc (CD) ROMs, Digital Video Disc (DVDs), flash drives, disks, and any other known physical storage media.
Advantages of the embodiment of the present disclosure are illustrated herein.
In an embodiment, the method of the present disclosure segments multiple entities in an image using a weakly predicted portion of a larger entity in the image, thereby reducing the computational complexity involved in segmentation of the plurality of entities.

In an embodiment, the method of the present disclosure uses a single neural network for predicting the plurality of entities in the image. Thus, the method of the present disclosure eliminates the requirement of training and using multiple learning models for segmenting the plurality of entities in the region. Thus, the method of the present disclosure is computationally less complex and makes the best utilization of the computational resources.

In an embodiment, the method of the present disclosure may be used to segment any number of entities from the input image. Thus, the method of the present disclosure offers high scalability with no or minimal additional computational requirements.
The terms "an embodiment", "embodiment", "embodiments", "the embodiment", "the embodiments", "one or more embodiments", "some embodiments", and "one embodiment" mean "one or more (but not all) embodiments of the invention(s)" unless expressly specified otherwise.
The terms "including", "comprising",“having” and variations thereof mean "including but not limited to", unless expressly specified otherwise.
The enumerated listing of items does not imply that any or all the items are mutually exclusive, unless expressly specified otherwise.

The terms "a", "an" and "the" mean "one or more", unless expressly specified otherwise.
A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the invention. When a single device or article is described herein, it will be clear that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be clear that a single device/article may be used in place of the more than one device or article, or a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or the features of a device may be alternatively embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the invention need not include the device itself.
Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based here on. Accordingly, the embodiments of the present invention are intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims. While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
Referral Numerals:

Claims

CLAIMS:
1. A method of semantic segmentation of a plurality of entities in an image, the method comprising:
identifying, by a segmentation system, a Weak Region of Interest (WRoI), comprising at least one smaller entity enclosed within a larger entity, in the image by analyzing the image using a first model of a neural network, trained for predicting the WRoI;
estimating, by the segmentation system, a location of center of the WRoI for weakly predicting a portion of the larger entity;
dynamically cropping, by the segmentation system, the portion of the larger entity from the WRoI; and
analyzing, by the segmentation system, the cropped portion of the larger entity using a second model of the neural network, trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image.
2. The method as claimed in claim 1, wherein the neural network comprises encoders, decoders and encoder-decoder parameters corresponding to the first model and the second model of the neural network.
3. The method as claimed in claim 1, wherein dynamically cropping the portion of the larger entity is based on a predetermined cropping factor dynamically selected based on size of the image.
4. The method as claimed in claim 1, wherein segmenting the at least one smaller entity and the larger entity comprises:
extracting the portion of the larger entity using a first set of encoders and a first set of decoders corresponding to the second model of the neural network; and

extracting region of the at least one smaller entity from the portion of the larger entity using the first set of encoders and a second set of decoders corresponding to the second model of the neural network, wherein the first set of decoders is different from the second set of decoders.
5. The method as claimed in claim 1 further comprises mapping each of the plurality of segmented entities with the image for generating segmentation results, indicating location of each of the plurality of entities within the image.
6. A segmentation system for performing semantic segmentation of a plurality of entities in an image, the segmentation system comprises:
a processor; and
a memory, communicatively coupled to the processor, wherein the memory stores processor-executable instructions, which on execution, cause the processor to:
identify a Weak Region of Interest (WRoI), comprising at least one smaller entity enclosed within a larger entity, in the image by analyzing the image using a first model of a neural network, trained for predicting the WRoI;
estimate a location of center of the WRoI for weakly predicting a portion of the larger entity;
dynamically crop the portion of the larger entity from the WRoI; and

analyze the cropped portion of the WRoI using a second model of the neural network, trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image.
7. The segmentation system as claimed in claim 6, wherein the neural network comprises encoders, decoders and encoder-decoder parameters corresponding to the first model and the second model of the neural network.
8. The segmentation system as claimed in claim 6, wherein dynamically cropping the portion of the larger entity is based on a predetermined cropping factor dynamically selected based on size of the image.
9. The segmentation system as claimed in claim 6, wherein the processor segments the at least one smaller entity and the larger entity by:
extracting the portion of the larger entity using a first set of encoders and a first set of decoders corresponding to the second model of the neural network; and

extracting region of the at least one smaller entity from the portion of the larger entity using the first set of encoders and a second set of decoders corresponding to the second model of the neural network, wherein the first set of decoders is different from the second set of decoders.
10. The segmentation system as claimed in claim 6, wherein the processor is further configured to map each of the plurality of segmented entities with the image to generate segmentation results, indicating location of each of the plurality of entities within the image.
PCT/IN2019/050528 2018-10-16 2019-07-16 Method and system for performing semantic segmentation of plurality of entities in an image WO2020079704A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201841039266 2018-10-16
IN201841039266A IN201841039266A (en) 2018-10-16 2019-07-16

Publications (1)

Publication Number Publication Date
WO2020079704A1

Family

ID=70283743

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2019/050528 WO2020079704A1 (en) 2018-10-16 2019-07-16 Method and system for performing semantic segmentation of plurality of entities in an image

Country Status (2)

Country Link
IN (1) IN201841039266A (en)
WO (1) WO2020079704A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021226900A1 (en) * 2020-05-14 2021-11-18 安徽中科智能感知产业技术研究院有限责任公司 Cotton crop row detection method and apparatus based on computer vision, and storage medium


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8175368B2 (en) * 2005-04-05 2012-05-08 Scimed Life Systems, Inc. Systems and methods for image segmentation with a multi-state classifier
US20140363052A1 (en) * 2013-06-06 2014-12-11 Xerox Corporation Adaptive character segmentation method and system for automated license plate recognition

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111640116A (en) * 2020-05-29 2020-09-08 广西大学 Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN111640116B (en) * 2020-05-29 2023-04-18 广西大学 Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN113768461A (en) * 2021-09-14 2021-12-10 北京鹰瞳科技发展股份有限公司 Fundus image analysis method and system and electronic equipment
WO2024012251A1 (en) * 2022-07-11 2024-01-18 北京字跳网络技术有限公司 Semantic segmentation model training method and apparatus, and electronic device and storage medium

Also Published As

Publication number Publication date
IN201841039266A (en) 2019-06-28

Similar Documents

Publication Publication Date Title
WO2020079704A1 (en) Method and system for performing semantic segmentation of plurality of entities in an image
US9554085B2 (en) Method and device for dynamically controlling quality of a video
US10846525B2 (en) Method and system for identifying cell region of table comprising cell borders from image document
EP3678088B1 (en) Method and system of convolution in neural network with variable dilation rate
US10678848B2 (en) Method and a system for recognition of data in one or more images
EP3355201A1 (en) A method and system for establishing a relationship between a plurality of user interface elements
EP3156966A1 (en) Method and device for generating panoramic images with real-time annotations
US10769472B2 (en) Method and system counting plurality of objects placed in a region
US10297056B2 (en) Method and system for remotely annotating an object in real-time
EP3906511A1 (en) Method and device for identifying machine learning models for detecting entities
EP3355240B1 (en) A method and a system for generating a multi-level classifier for image processing
US20170213326A1 (en) Method and system for processing an image extracted from a document
US10410340B2 (en) Method and system for marking content on the surface of an object using laser
US11340439B2 (en) Method and system for auto focusing a microscopic imaging system
US20230334656A1 (en) Method and system for identifying abnormal images in a set of medical images
US11269172B2 (en) Method and system for reconstructing a field of view
US10929992B2 (en) Method and system for rendering augmented reality (AR) content for textureless objects
EP3054669A1 (en) Method and device for assisting a user to capture images
EP3110133A1 (en) Systems and method for performing real-time image vectorization
US11187644B2 (en) Method and system for determining total count of red blood cells in peripheral blood smear
US10699417B2 (en) Method and system for acquisition of optimal images of object in multi-layer sample
WO2019229510A1 (en) Method and system for performing hierarchical classification of objects in microscopic image
US20210201031A1 (en) Method and system for providing augmented reality information for an object
WO2019220192A1 (en) A method and system for dynamically generating medical reports
US20200211192A1 (en) Method and system for generating a structure map for retinal images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19874107

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19874107

Country of ref document: EP

Kind code of ref document: A1
