WO2020079704A1 - Method and system for performing semantic segmentation of plurality of entities in an image - Google Patents

Method and system for performing semantic segmentation of plurality of entities in an image

Info

Publication number
WO2020079704A1
WO2020079704A1 (PCT/IN2019/050528)
Authority
WO
WIPO (PCT)
Prior art keywords
image
entity
wroi
neural network
model
Prior art date
Application number
PCT/IN2019/050528
Other languages
French (fr)
Inventor
Shivam Shah
Harshit Pande
Original Assignee
Sigtuple Technologies Private Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sigtuple Technologies Private Limited
Publication of WO2020079704A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30041Eye; Retina; Ophthalmic

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a method and system for performing semantic segmentation of a plurality of entities in an image. In an embodiment, the present disclosure discloses identifying a Weak Region of Interest (WRoI) comprising at least one smaller entity enclosed within a larger entity using a first model of a neural network. Further, a location of the center of the WRoI is estimated for weakly predicting a portion of the larger entity. Thereafter, the portion of the larger entity is dynamically cropped from the WRoI and analyzed using a second model of the neural network for segmenting each of the plurality of entities in the image. In an embodiment, the present disclosure helps in segmenting multiple entities in an image using a weakly predicted portion of a larger entity in the image, thereby reducing the computational complexity involved in segmentation of the plurality of entities.

Description

METHOD AND SYSTEM FOR PERFORMING SEMANTIC SEGMENTATION OF PLURALITY OF ENTITIES IN AN IMAGE
The following specification particularly describes the invention and the manner in which it is to be performed.
TECHNICAL FIELD
The present subject matter is, in general, related to image segmentation and more particularly, but not exclusively, to a method and system for performing semantic segmentation of a plurality of entities in an image.
BACKGROUND
Medical images are used for analyzing the medical condition of a subject. Automatic screening of the medical images helps in identifying objects present in the medical images or in identifying any abnormal condition. As an example, screening of medical images of a blood/urine smear helps in identifying blood platelets or bacteria present in the blood/urine sample. Similarly, a medical image obtained from a fundus camera helps in analyzing the cup or disc regions in the fundus image. Generally, the screening of medical images is performed by a process of segmentation. There exist several segmentation techniques for segmenting the medical images.
One of the segmentation techniques that is largely in use to analyze the medical images is semantic segmentation. However, semantic segmentation has a common problem of data skewness, since each pixel is treated as a data point during the segmentation. Further, in semantic segmentation, it is a challenge to segment multiple sub-entities belonging to an object, since the data points of interest are present in a minute ratio compared to the background pixels in the medical images. Thus, any trivial approach would fail to segment multiple entities using a single trained neural network model and would require multiple trained models to segment these entities.
Particularly, in the case of fundus photographs, an ophthalmologist may be required to evaluate the cup-to-disc ratio, as well as the characteristic configuration of disc-rim thickness, to make various analyses. Both of these parameters may be deduced by accurately segmenting the cup and the disc regions in the fundus photographs. However, the semantic segmentation approach may fail to produce desired results when working with fundus photographs obtained from multiple sources/cameras, as the data points would differ depending on the source/camera being used. Further, since the cup region is located inside the disc, the semantic segmentation approach may require multiple trained models for segmenting both the cup and the disc regions. Consequently, the semantic segmentation approach involved in segmentation of multiple entities would be computationally complex, as it involves multiple training models, multiple deep learning architectures and numerous encoder-decoder parameters.
SUMMARY
One or more shortcomings of the prior art may be overcome, and additional advantages may be provided through the present disclosure. Additional features and advantages may be realized through the techniques of the present disclosure. Other embodiments and aspects of the disclosure are described in detail herein and are considered a part of the claimed disclosure.
Disclosed herein is a method for semantic segmentation of a plurality of entities in an image. The method comprises identifying, by a segmentation system, a Weak Region of Interest (WRoI) comprising at least one smaller entity enclosed within a larger entity, in the image by analyzing the image using a first model of a neural network trained for predicting the WRoI. Upon identifying, the method comprises estimating a location of center of the WRoI for weakly predicting a portion of the larger entity. Thereafter, the method comprises dynamically cropping the portion of the larger entity from the WRoI. Finally, the method comprises analyzing the cropped portion of the larger entity using a second model of the neural network, trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image.
Further, the present disclosure relates to a segmentation system for performing semantic segmentation of a plurality of entities in an image. The segmentation system comprises a processor and a memory. The memory is communicatively coupled to the processor and stores processor-executable instructions, which on execution, cause the processor to identify a Weak Region of Interest (WRoI), comprising at least one smaller entity enclosed within a larger entity, in the image by analyzing the image using a first model of a neural network trained for predicting the WRoI. Further, the instructions cause the processor to estimate a location of center of the WRoI for weakly predicting a portion of the larger entity. Thereafter, the instructions cause the processor to dynamically crop the portion of the larger entity from the WRoI. Finally, the instructions cause the processor to analyze the cropped portion of the WRoI using a second model of the neural network, trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image.
The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, explain the disclosed principles. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the figures to reference like features and components. Some embodiments of systems and/or methods in accordance with embodiments of the present subject matter are now described, by way of example only, and with reference to the accompanying figures, in which:
FIG. 1 (Prior art) illustrates an exemplary architecture for performing semantic segmentation of a plurality of entities in an image in accordance with an existing method;
FIG. 2 illustrates an exemplary architecture for performing semantic segmentation of a plurality of entities in an image in accordance with some embodiments of the present disclosure;
FIG. 3 shows a detailed block diagram illustrating a segmentation system in accordance with some embodiments of the present disclosure;
FIG. 4 shows a flowchart illustrating a method of semantic segmentation of a plurality of entities in an image in accordance with some embodiments of the present disclosure;

FIGS. 5A - 5F show exemplary images illustrating segmentation of a cup region and a disc region from a fundus image in accordance with some embodiments of the present disclosure; and
FIG. 6 illustrates a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and executed by a computer or processor, whether such computer or processor is explicitly shown.
DETAILED DESCRIPTION
In the present document, the word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the specific forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope of the disclosure.
The terms "comprises", "comprising", "includes", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device, or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by "comprises... a" does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.

FIG. 1 illustrates an existing architecture for performing semantic segmentation of a plurality of entities from an image. As an example, suppose the existing segmentation system 100 is used to segment a disc region and a cup region from a fundus image. In this scenario, since the cup region is located inside the disc region, it may be very difficult to segment both the cup and the disc regions using a single deep learning model. Therefore, the existing segmentation system 100 may employ two different deep learning models that are trained separately to identify the cup region and the disc region. As an example, as shown in FIG. 1, the existing segmentation system 100 may contain a plurality of deep learning models 1031, 1032, ..., 103N, each trained to identify a particular region from the input image, i.e., the fundus image. For example, the deep learning model 1031 may be trained to identify Entity 1 (disc) in the input image and may require a learning architecture specific to identification of Entity 1. Similarly, the deep learning model 1032 may be trained to identify Entity 2 (cup) and may require a different learning architecture, which is specific to identification of Entity 2. Further, each of the deep learning models may require a separate set of encoder-decoder networks and encoder-decoder parameters for segmenting the plurality of entities from the input images. As a result, the computational complexity of the existing segmentation system is significantly higher.
The present disclosure attempts to overcome the above limitations and discloses a method and a segmentation system for performing semantic segmentation of a plurality of entities in an image, using a single neural network model. In an embodiment, the present disclosure provides a multi-modal training approach for segmenting fine grained objects from an image.
In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.

FIG. 2 illustrates an exemplary architecture for performing semantic segmentation of a plurality of entities in an image in accordance with some embodiments of the present disclosure.
In an embodiment, the segmentation system 200 may be configured with a neural network 203, which is trained for segmenting a plurality of entities from an input image 201. As an example, the neural network 203 may be a Convolutional Neural Network (CNN) or any other deep learning and/or machine learning technique. In an implementation, a portion of the neural network 203, namely, a first model of the neural network 203A, may be trained for predicting a Weak Region of Interest (WRoI) from the input image 201. As an example, the WRoI may be a region in the image which comprises at least one smaller entity enclosed within a larger entity. In an embodiment, the input image 201 may be analyzed by the first model of the neural network 203A for identifying the WRoI in the input image 201.

Similarly, a second portion of the neural network 203, namely, the second model of the neural network 203B, may be trained for predicting and segmenting each of the plurality of entities, including a smaller entity enclosed within a larger entity, from the WRoI of the input image 201. That is, when a cropped portion 213 of the input image 201, comprising the WRoI, is analyzed by the second model of the neural network 203B, the second model may segment each of the plurality of entities from the input image 201.

In an embodiment, both the first model and the second model of the neural network 203 may comprise a plurality of encoders and decoders, encoder-decoder parameters and loss functions. In an embodiment, both the first model and the second model may use the same encoder 205 and encoder parameters for identifying the WRoI and for segmenting the plurality of entities from the WRoI, respectively. On the other hand, the decoders (207, 215) and decoder parameters of the first model and the second model may be distinct, depending on the nature of the plurality of entities to be segmented from the input image 201.
In an embodiment, the input image 201 may be analyzed using the first model of the neural network 203A for identifying the WRoI from the input image 201. Thereafter, the segmentation system 200 may estimate a location of center of the WRoI for weakly predicting a portion of the larger entity within the WRoI. Subsequently, the segmentation system 200 may dynamically crop the portion of the larger entity from the WRoI. In an embodiment, upon cropping the portion of the larger entity from the WRoI, the cropped portion 213 of the larger entity is passed on to the second model of the neural network 203B for segmenting the at least one smaller entity enclosed within the larger entity. Thus, the segmentation system 200 performs segmentation of the plurality of entities in the input image 201 using a single neural network 203.
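By way of illustration only, a minimal PyTorch-style sketch of such a single network with a shared encoder and two distinct decoder heads is given below. The specific layer configuration, channel counts and the names SharedEncoder, Decoder and SegmentationNetwork are assumptions made for this sketch and are not prescribed by the present disclosure.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Encoder 205: shared by both the first and the second model (assumed layout)."""
    def __init__(self, in_channels=3, base=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, base, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # 512 -> 256 (or 256 -> 128 for crops)
            nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # halve spatial size again
        )

    def forward(self, x):
        return self.features(x)

class Decoder(nn.Module):
    """A small decoder head; the first model (207) and the second model (215) use distinct instances."""
    def __init__(self, in_channels, out_classes):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_channels, in_channels // 2, 2, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels // 2, out_classes, 2, stride=2),
        )

    def forward(self, x):
        return self.up(x)

class SegmentationNetwork(nn.Module):
    """Single neural network 203 with one shared encoder and two decoder heads."""
    def __init__(self):
        super().__init__()
        self.encoder = SharedEncoder()
        self.wroi_decoder = Decoder(32, out_classes=1)    # first model 203A: WRoI mask logits
        self.entity_decoder = Decoder(32, out_classes=3)  # second model 203B: background / disc / cup

    def forward_first(self, full_image):
        # First pass: full image in, WRoI prediction out.
        return self.wroi_decoder(self.encoder(full_image))

    def forward_second(self, cropped_patch):
        # Second pass: cropped portion in, per-entity segmentation out.
        return self.entity_decoder(self.encoder(cropped_patch))
```

In this sketch, the encoder weights are literally the same parameters in both passes, while each pass selects its own decoder, which mirrors the sharing described above.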
FIG. 3 shows a detailed block diagram illustrating a segmentation system 200 in accordance with some embodiments of the present disclosure.
In an implementation, the segmentation system 200 may include an I/O interface 301, a processor 303, and a memory 305. The I/O interface 301 may be configured to receive an input image 201 from one or more sources associated with the segmentation system 200. As an example, the one or more sources of the input image 201 may include, without limiting to, an image capturing device, an image repository and the like. The processor 303 may be configured to perform one or more functions of the segmentation system 200 while segmenting a plurality of entities in the input image 201. In some implementations, the memory 305 may be communicatively coupled to the processor 303 and may store data 307 and other intermediate results generated when the input image 201 is being analyzed.
In some implementations, the segmentation system 200 may include one or more modules 309 for performing various operations in accordance with embodiments of the present disclosure. In an embodiment, the data 307 may be stored within the memory 305 and may include, without limiting to, input image 201, details of Weak Region of Interest (WRoI) 313, cropped portion 213 and other data 317.
In some embodiments, the data 307 may be stored within the memory 305 in the form of various data structures. Additionally, the data 307 may be organized using data models, such as relational or hierarchical data models. The other data 317 may include temporary data and files generated by the one or more modules 309 for performing various functions of the segmentation system 200.
In an embodiment, the input image 201 may be an image of a region of interest, which comprises a plurality of entities to be segmented. As an example, the input image 201 may be an image of fundus of an animal. Accordingly, the plurality of entities present in the input image 201 may include, without limiting to, at least a cup region of the fundus and a disc region of the fundus. In an embodiment, the input image 201 may be received from an image capturing device associated with the segmentation system 200. In an embodiment, one or more image pre-processing actions may be performed on the input image 201 before analyzing the input image 201. As an example, the one or more image pre-processing actions may include, without limiting to, image resizing, de-blurring and the like.
In an embodiment, the details of the WRoI 313 may include information related to the WRoI identified in the input image 201. The WRoI may be identified from the input image 201 by analyzing the input image 201 using a first model of a neural network configured in the segmentation system 200. In an embodiment, the WRoI may be a region comprising at least one smaller entity enclosed within a larger entity.
In an embodiment, the cropped portion 213 of the input image 201 may correspond to the portion of the larger entity in the WRoI. In an implementation, the portion of the larger entity may be predicted from the WRoI by estimating the location of the center of the WRoI in the input image 201. That is, the cropped portion 213 of the input image 201 may be equivalent to the portion of the larger entity, measured from the center of the WRoI.
In an embodiment, each of the data 307 stored in the segmentation system 200 may be processed by the modules 309 of the segmentation system 200. In one implementation, the modules 309 may be stored as a part of the processor 303. In another implementation, the modules 309 may be communicatively coupled to the processor 303 for performing one or more functions of the segmentation system 200. The modules 309 may include, without limiting to, an image acquiring module 319, a first model of the neural network 203A, a region estimation module 323, a cropping module 325, a second model of the neural network 203B and other modules 329.
As used herein, the term module refers to an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality. In an embodiment, the other modules 329 may be used to perform various miscellaneous functionalities of the segmentation system 200. It will be appreciated that such modules 309 may be represented as a single module or a combination of different modules.

In an embodiment, the image acquiring module 319 may be used for acquiring and/or receiving the input image 201 from one or more external sources. In an implementation, the image acquiring module 319 may be an image capturing device used for capturing images of a region of interest. In another implementation, the image acquiring module 319 may be communicatively associated with an external image capturing device for capturing/acquiring images of the region of interest. Further, the one or more external sources associated with the image acquiring module 319 may include an image database, which stores a plurality of images of the region of interest.
In an embodiment, the first model of the neural network 203A may be used for identifying the WRoI from the input image 201. In an implementation, the first model of the neural network 203A may correspond to a portion of the neural network 203, which is trained for identifying the WRoI from the input image 201. The first model of the neural network 203A may be trained using a set of encoders 205 and decoders (207, 215), encoder-decoder parameters and loss functions required for identifying the WRoI from the input image 201.
In an embodiment, the region estimation module 323 may be used for estimating a location of center of the WRoI identified by the first model of the neural network 203A. Further, the region estimation module 323 may use the location of the center of the WRoI for weakly predicting a portion of the larger entity in the WRoI. That is, the region estimation module 323 may identify a region covered by the larger entity within the WRoI.
In an embodiment, the cropping module 325 may be used for dynamically cropping the portion of the larger entity from the WRoI. That is, the cropping module 325 ensures that only the portion of the larger entity, enclosing the at least one smaller entity, is extracted from the WRoI. In an embodiment, the cropping module 325 may use the location of the center of the WRoI as a reference point for cropping the portion of the larger entity.
In an embodiment, the second model of the neural network 203B may be used for analyzing the cropped portion 213 of the larger entity and segmenting each of the at least one smaller entity from the cropped portion 213 of the larger entity. In an implementation, the second model of the neural network 203B may correspond to a second portion of the neural network 203, which is trained for segmenting the plurality of entities in the input image 201. The second model of the neural network 203B may be trained using encoders 205 and decoders (207, 215), encoder-decoder parameters and loss functions required for identifying and segmenting the plurality of entities from the input image 201. In an implementation, the same set of encoder 205 and encoder parameters may be used for training both the first model and the second model of the neural network 203. That is, the first model may identify the WRoI by analyzing the input image 201 using a first set of encoders 205 and encoder parameters and a first set of decoders 207 and decoder parameters. Thereafter, the second model may segment each of the plurality of entities from the cropped portion 213 of the WRoI using the first set of encoders 205 and related parameters (used by the first model) and a second set of decoders 215 and related parameters, which are different from the first set of decoders 207 and related parameters used by the first model.
That is, the segmentation system 200 uses a single trained neural network 203 for segmentation of the plurality of entities in the input image 201 by configuring specific portions of the trained neural network 203 to perform specific actions. Consequently, the operation of the segmentation system 200 may be computationally less complex, since a single trained neural network 203 is used for identifying the plurality of entities. Additionally, the computational complexity of the segmentation system 200 may be further reduced since common encoder 205 and encoder parameters are used by both the first model and the second model of the neural network 203.
FIG. 4 shows a flowchart illustrating a method of semantic segmentation of a plurality of entities in accordance with some embodiments of the present disclosure.
As illustrated in FIG. 4, the method 400 includes one or more blocks illustrating a method of semantic segmentation of a plurality of entities in an image using a segmentation system 200, for example, the segmentation system 200 shown in FIG. 2. The method 400 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, and functions, which perform specific functions or implement specific abstract data types.
The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method. Additionally, individual blocks may be deleted from the methods without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
At block 401, the method 400 includes identifying, by the segmentation system 200, a Weak Region of Interest (WRoI) in the image by analyzing the image using a first model 203A of a neural network trained for predicting the WRoI. In an embodiment, the WRoI may include, without limiting to, at least one smaller entity enclosed within a larger entity.
At block 403, the method 400 includes estimating, by the segmentation system 200, a location of center of the WRoI for weakly predicting a portion of the larger entity.
At block 405, the method 400 includes dynamically cropping, by the segmentation system 200, the portion of the larger entity from the WRoI. In an embodiment, dynamically cropping the portion of the larger entity may be based on a predetermined cropping factor dynamically selected based on size of the image.
At block 407, the method 400 includes analyzing, by the segmentation system 200, the cropped portion 213 of the larger entity using a second model 203B of the neural network 203 trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image. In an embodiment, segmenting the at least one smaller entity and the larger entity may include extracting the portion of the larger entity using a first set of encoders 205 and a first set of decoders 207 corresponding to the second model of the neural network 203B. Further, the segmenting may include extracting region of the at least one smaller entity from the portion of the larger entity using the first set of encoders 205 and a second set of decoders 215 corresponding to the second model of the neural network 203B. In an embodiment, the first set of decoders 207 may be different from the second set of decoders 215.
In an implementation, the neural network 203 may include, without limiting to, encoders 205, decoders (207, 215) and encoder-decoder parameters corresponding to the first model and the second model of the neural network 203.
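As a sketch of the cropping of blocks 403 and 405, the following helper shows one way the dynamic crop could be realised. The cropping factor of 0.5 (so that a 512 * 512 image yields a 256 * 256 crop, consistent with the exemplary embodiment below), the assumption of a square image, and the clamping of the window to the image boundary are illustrative choices, not requirements of the disclosure.

```python
import numpy as np

def crop_around_center(image: np.ndarray, center_rc: tuple, crop_factor: float = 0.5) -> np.ndarray:
    """Crop a square window centred on (row, col); the window size is a
    crop_factor fraction of the (assumed square) image side length."""
    n = image.shape[0]
    size = int(round(n * crop_factor))          # e.g. 512 * 0.5 = 256
    half = size // 2
    r, c = int(round(center_rc[0])), int(round(center_rc[1]))
    # Clamp so the crop window never leaves the image.
    r0 = min(max(r - half, 0), n - size)
    c0 = min(max(c - half, 0), n - size)
    return image[r0:r0 + size, c0:c0 + size]

# Example: a 256 * 256 patch around an estimated WRoI centre in a 512 * 512 image.
img = np.zeros((512, 512, 3), dtype=np.uint8)
patch = crop_around_center(img, (260.0, 245.0))
print(patch.shape)  # (256, 256, 3)
```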
Exemplary embodiment:
FIGS. 5A - 5F show exemplary images illustrating segmentation of a cup region and a disc region from a fundus image in accordance with some embodiments of the present disclosure. FIG. 5A shows an exemplary fundus image comprising a cup region and a disc region (encircled regions in the center of FIG. 5A), which need to be segmented for further analysis of the fundus. In FIG. 5A, the outer region and/or the larger region may be identified as the disc region of the fundus. Further, the inner region and/or the smaller region enclosed by the disc region may be identified as the cup region of the fundus. In an embodiment, segmentation of the cup region and the disc region may be necessary for examining one or more anomalies in the fundus.
FIG. 5B shows a pre-processed version of the input image of FIG. 5A. As an example, one of the pre-processing actions performed on the input image of FIG. 5A may be resizing of the input image to a predetermined resolution/size. As an example, the predetermined resolution may be 512 * 512 pixels. Thus, FIG. 5B shows a resized version of the input image in FIG. 5A.
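Such a resize can be performed with any standard image library; the snippet below uses OpenCV purely as an example, and the library choice and interpolation method are assumptions of this sketch rather than part of the disclosure.

```python
import cv2

def preprocess_fundus(path: str, target: int = 512):
    """Load a fundus image and resize it to target x target pixels (as in FIG. 5B)."""
    image = cv2.imread(path, cv2.IMREAD_COLOR)
    if image is None:
        raise FileNotFoundError(path)
    return cv2.resize(image, (target, target), interpolation=cv2.INTER_AREA)
```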
FIG. 5C shows a Weak Region of Interest (WRoI) identified from the pre-processed input image in FIG. 5B. That is, upon pre-processing the input image, the pre-processed image may be passed on to the first model of the neural network 203A, which analyzes the input image of FIG. 5B and identifies the WRoI from the input image, as shown in FIG. 5C. Here, it may be observed that the identified WRoI corresponds to a combined region comprising both the larger entity (i.e., the disc region) and the at least one smaller entity (i.e., the cup region) enclosed within the larger entity.
FIG. 5D shows a cropped portion of the input image, which is obtained by extracting the identified WRoI from the input image of FIG. 5B. As an example, the cropped portion of the WRoI may correspond to the portion of the disc region in the fundus image. In an embodiment, the cropped portion may be obtained by cropping the disc region from the WRoI with reference to location of a center of the WRoI. In an embodiment, the location of the center of the WRoI may be estimated using equations (1) and (2) below:
$$r_c = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} i \cdot M_{disc}^{i,j}}{\sum_{i=1}^{N}\sum_{j=1}^{N} M_{disc}^{i,j}} \qquad (1)$$

$$c_c = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} j \cdot M_{disc}^{i,j}}{\sum_{i=1}^{N}\sum_{j=1}^{N} M_{disc}^{i,j}} \qquad (2)$$

Wherein,

the matrix $M_{disc}$ ($M_{disc}^{i,j} \in \{0, 1\}$) represents the disc mask prediction in the WRoI;

'N' = 512 (the number of pixels across the WRoI); and

$r_c$ and $c_c$ respectively represent the row and column values of the cropped center in the cropped matrix.
In an embodiment, once the location of the center of the WRoI is estimated, a 256 * 256 region may be cropped from the identified WRoI with reference to the location of the center of the WRoI. Thus, the cropped portion may be of the size 256 * 256.
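Equations (1) and (2) amount to taking the centroid (mean row and column index) of the predicted disc mask; a direct NumPy rendering of that computation, followed by a 256 * 256 slice measured from the estimated center, is sketched below. The fallback for an empty mask is an added assumption and is not part of the disclosure.

```python
import numpy as np

def wroi_center(disc_mask: np.ndarray):
    """Estimate (r_c, c_c) as the centroid of the binary disc mask prediction
    (equations (1) and (2)); disc_mask is an N x N array with values in {0, 1}."""
    rows, cols = np.nonzero(disc_mask)
    if rows.size == 0:
        # Assumed fallback: geometric centre when no disc pixel was predicted.
        n = disc_mask.shape[0]
        return n / 2.0, n / 2.0
    return float(rows.mean()), float(cols.mean())

# Example: centre of a synthetic 512 x 512 mask, then a 256 x 256 crop around it
# (boundary clamping is omitted here; see the cropping sketch earlier).
mask = np.zeros((512, 512), dtype=np.uint8)
mask[200:300, 220:320] = 1
r_c, c_c = wroi_center(mask)           # approximately (249.5, 269.5)
r0, c0 = int(r_c) - 128, int(c_c) - 128
crop = mask[r0:r0 + 256, c0:c0 + 256]  # 256 x 256 region measured from the centre
print(r_c, c_c, crop.shape)
```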
FIG. 5E shows the segmented disc and cup regions from the cropped portion shown in FIG. 5D. As an example, the bright/white region in the center of FIG. 5E may correspond to the cup region of the fundus. Similarly, the outer shaded region, enclosing the cup region, may correspond to the disc region of the fundus. In an embodiment, the cup region and the disc region, as shown in FIG. 5E, may be obtained by passing the cropped image of FIG. 5D through the second model of the neural network 203B.
FIG. 5F shows the final outcome of the segmentation. The outcome of FIG. 5F may be obtained by remapping the segmented cup and disc regions onto the original input image of the fundus shown in FIG. 5B. Thus, the outcome of the segmentation system 200 may be the segmented regions corresponding to each of the plurality of entities comprised in the input image.
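The disclosure does not detail the remapping step; one straightforward way to realise it, assuming the 256 * 256 prediction was taken from a known (r0, c0) offset within the 512 * 512 pre-processed image, is to paste the predicted labels back into a full-size label map at that offset, as sketched below.

```python
import numpy as np

def remap_to_full_image(pred_labels: np.ndarray, r0: int, c0: int, full_size: int = 512) -> np.ndarray:
    """Place the cropped-region prediction (e.g. 0=background, 1=disc, 2=cup)
    back at its original location inside a full-size label map."""
    full = np.zeros((full_size, full_size), dtype=pred_labels.dtype)
    h, w = pred_labels.shape
    full[r0:r0 + h, c0:c0 + w] = pred_labels
    return full
```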
Computer System:

FIG. 6 illustrates a block diagram of an exemplary computer system 600 for implementing embodiments consistent with the present disclosure. In an embodiment, the computer system 600 may be the segmentation system 200 shown in FIG. 2, which may be used for segmenting a plurality of entities from an input image 201. The computer system 600 may include a central processing unit ("CPU" or "processor") 602. The processor 602 may comprise at least one data processor for executing program components for executing user- or system-generated business processes. A user may include a person, an analyst or a user of the segmentation system 200, and the like. The processor 602 may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc.
The processor 602 may be disposed in communication with one or more input/output (I/O) devices (611 and 612) via I/O interface 601. The I/O interface 601 may employ communication protocols/methods such as, without limitation, audio, analog, digital, stereo, IEEE-1394, serial bus, Universal Serial Bus (USB), infrared, PS/2, BNC, coaxial, component, composite, Digital Visual Interface (DVI), high-definition multimedia interface (HDMI), Radio Frequency (RF) antennas, S-Video, Video Graphics Array (VGA), IEEE 802.11a/b/g/n/x, Bluetooth, cellular (e.g., Code-Division Multiple Access (CDMA), High-Speed Packet Access (HSPA+), Global System For Mobile Communications (GSM), Long-Term Evolution (LTE) or the like), etc. Using the I/O interface 601, the computer system 600 may communicate with one or more I/O devices 611 and 612. In some implementations, the I/O interface 601 may also be used to connect to one or more data sources for receiving the input image 201.
In some embodiments, the processor 602 may be disposed in communication with a communication network 609 via a network interface 603. The network interface 603 may communicate with the communication network 609. The network interface 603 may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), Transmission Control Protocol/Internet Protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. Using the network interface 603 and the communication network 609, the computer system 600 may receive the input image 201 from one or more external and/or internal sources of the input image 201. In an implementation, the communication network 609 can be implemented as one of several types of networks, such as an intranet or a Local Area Network (LAN) and such within the organization. The communication network 609 may either be a dedicated network or a shared network, which represents an association of several types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other. Further, the communication network 609 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, etc.
In some embodiments, the processor 602 may be disposed in communication with a memory 605 (e.g., RAM 613, ROM 614, etc. as shown in FIG. 6) via a storage interface 604. The storage interface 604 may connect to memory 605 including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as Serial Advanced Technology Attachment (SATA), Integrated Drive Electronics (IDE), IEEE-1394, Universal Serial Bus (USB), fiber channel, Small Computer Systems Interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, Redundant Array of Independent Discs (RAID), solid-state memory devices, solid-state drives, etc.
The memory 605 may store a collection of program or database components, including, without limitation, user/application interface 606, an operating system 607, a web browser 608, and the like. In some embodiments, computer system 600 may store user/application data 606, such as the data, variables, records, etc. as described in this invention. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle® or Sybase®.
The operating system 607 may facilitate resource management and operation of the computer system 600. Examples of operating systems include, without limitation, APPLE® MACINTOSH® OS X®, UNIX®, UNIX-like system distributions (e.g., BERKELEY SOFTWARE DISTRIBUTION® (BSD), FREEBSD®, NETBSD®, OPENBSD, etc.), LINUX® DISTRIBUTIONS (e.g., RED HAT®, UBUNTU®, KUBUNTU®, etc.), IBM® OS/2®, MICROSOFT® WINDOWS® (XP®, VISTA®/7/8, 10 etc.), APPLE® IOS®, GOOGLE™ ANDROID™, BLACKBERRY® OS, or the like. The user interface 606 may facilitate display, execution, interaction, manipulation, or operation of program components through textual or graphical facilities. For example, the user interface 606 may provide computer interaction interface elements on a display system operatively connected to the computer system 600, such as cursors, icons, check boxes, menus, scrollers, windows, widgets, and the like. Further, Graphical User Interfaces (GUIs) may be employed, including, without limitation, APPLE® MACINTOSH® operating systems' Aqua®, IBM® OS/2®, MICROSOFT® WINDOWS® (e.g., Aero, Metro, etc.), web interface libraries (e.g., ActiveX®, JAVA®, JAVASCRIPT®, AJAX, HTML, ADOBE® FLASH®, etc.), or the like.
The web browser 608 may be a hypertext viewing application. Secure web browsing may be provided using Secure Hypertext Transport Protocol (HTTPS), Secure Sockets Layer (SSL), Transport Layer Security (TLS), and the like. The web browsers 608 may utilize facilities such as AJAX, DHTML, ADOBE® FLASH®, JAVASCRIPT®, JAVA®, Application Programming Interfaces (APIs), and the like. Further, the computer system 600 may implement a mail server stored program component. The mail server may utilize facilities such as ASP, ACTIVEX®, ANSI® C++/C#, MICROSOFT® .NET, CGI SCRIPTS, JAVA®, JAVASCRIPT®, PERL®, PHP, PYTHON®, WEBOBJECTS®, etc. The mail server may utilize communication protocols such as Internet Message Access Protocol (IMAP), Messaging Application Programming Interface (MAPI), MICROSOFT® exchange, Post Office Protocol (POP), Simple Mail Transfer Protocol (SMTP), or the like. In some embodiments, the computer system 600 may implement a mail client stored program component. The mail client may be a mail viewing application, such as APPLE® MAIL, MICROSOFT® ENTOURAGE®, MICROSOFT® OUTLOOK®, MOZILLA® THUNDERBIRD®, and the like.
Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present invention. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term“computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, that is, non-transitory. Examples include Random Access Memory (RAM), Read-Only Memory (ROM), volatile memory, nonvolatile memory, hard drives, Compact Disc (CD) ROMs, Digital Video Disc (DVDs), flash drives, disks, and any other known physical storage media.
Advantages of the embodiment of the present disclosure are illustrated herein.
In an embodiment, the method of the present disclosure segments multiple entities in an image using a weakly predicted portion of a larger entity in the image, thereby reducing the computational complexity involved in segmentation of the plurality of entities.

In an embodiment, the method of the present disclosure uses a single neural network for predicting the plurality of entities in the image. Thus, the method of the present disclosure eliminates the requirement of training and using multiple learning models for segmenting the plurality of entities in the region. Thus, the method of the present disclosure is computationally less complex and makes the best utilization of the computational resources.

In an embodiment, the method of the present disclosure may be used to segment any number of entities from the input image. Thus, the method of the present disclosure offers high scalability with no or minimal additional computational requirements.
The terms "an embodiment", "embodiment", "embodiments", "the embodiment", "the embodiments", "one or more embodiments", "some embodiments", and "one embodiment" mean "one or more (but not all) embodiments of the invention(s)" unless expressly specified otherwise.
The terms "including", "comprising",“having” and variations thereof mean "including but not limited to", unless expressly specified otherwise.
The enumerated listing of items does not imply that any or all the items are mutually exclusive, unless expressly specified otherwise.

The terms "a", "an" and "the" mean "one or more", unless expressly specified otherwise.
A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the invention. When a single device or article is described herein, it will be clear that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be clear that a single device/article may be used in place of the more than one device or article, or a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or the features of a device may be alternatively embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the invention need not include the device itself.
Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based here on. Accordingly, the embodiments of the present invention are intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims. While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
Referral Numerals:

Claims

CLAIMS:
1. A method of semantic segmentation of a plurality of entities in an image, the method comprising:
identifying, by a segmentation system, a Weak Region of Interest (WRoI), comprising at least one smaller entity enclosed within a larger entity, in the image by analyzing the image using a first model of a neural network, trained for predicting the WRoI;
estimating, by the segmentation system, a location of center of the WRoI for weakly predicting a portion of the larger entity;
dynamically cropping, by the segmentation system, the portion of the larger entity from the WRoI; and
analyzing, by the segmentation system, the cropped portion of the larger entity using a second model of the neural network, trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image.
2. The method as claimed in claim 1, wherein the neural network comprises encoders, decoders and encoder-decoder parameters corresponding to the first model and the second model of the neural network.
3. The method as claimed in claim 1, wherein dynamically cropping the portion of the larger entity is based on a predetermined cropping factor dynamically selected based on size of the image.
4. The method as claimed in claim 1, wherein segmenting the at least one smaller entity and the larger entity comprises:
extracting the portion of the larger entity using a first set of encoders and a first set of decoders corresponding to the second model of the neural network; and

extracting region of the at least one smaller entity from the portion of the larger entity using the first set of encoders and a second set of decoders corresponding to the second model of the neural network, wherein the first set of decoders is different from the second set of decoders.
5. The method as claimed in claim 1 further comprises mapping each of the plurality of segmented entities with the image for generating segmentation results, indicating location of each of the plurality of entities within the image.
6. A segmentation system for performing semantic segmentation of a plurality of entities in an image, the segmentation system comprises:
a processor; and
a memory, communicatively coupled to the processor, wherein the memory stores processor-executable instructions, which on execution, cause the processor to:
identify a Weak Region of Interest (WRoI), comprising at least one smaller entity enclosed within a larger entity, in the image by analyzing the image using a first model of a neural network, trained for predicting the WRoI;
estimate a location of center of the WRoI for weakly predicting a portion of the larger entity;
dynamically crop the portion of the larger entity from the WRoI; and

analyze the cropped portion of the WRoI using a second model of the neural network, trained for segmenting each of the at least one smaller entity and the larger entity, thereby segmenting each of the plurality of entities in the image.
7. The segmentation system as claimed in claim 6, wherein the neural network comprises encoders, decoders and encoder-decoder parameters corresponding to the first model and the second model of the neural network.
8. The segmentation system as claimed in claim 6, wherein dynamically cropping the portion of the larger entity is based on a predetermined cropping factor dynamically selected based on size of the image.
9. The segmentation system as claimed in claim 6, wherein the processor segments the at least one smaller entity and the larger entity by:
extracting the portion of the larger entity using a first set of encoders and a first set of decoders corresponding to the second model of the neural network; and

extracting region of the at least one smaller entity from the portion of the larger entity using the first set of encoders and a second set of decoders corresponding to the second model of the neural network, wherein the first set of decoders is different from the second set of decoders.
10. The segmentation system as claimed in claim 6, wherein the processor is further configured to map each of the plurality of segmented entities with the image to generate segmentation results, indicating location of each of the plurality of entities within the image.
PCT/IN2019/050528 2018-10-16 2019-07-16 Method and system for performing semantic segmentation of plurality of entities in an image WO2020079704A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201841039266 2018-10-16
IN201841039266A IN201841039266A (en) 2018-10-16 2019-07-16

Publications (1)

Publication Number Publication Date
WO2020079704A1

Family

ID=70283743

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2019/050528 WO2020079704A1 (en) 2018-10-16 2019-07-16 Method and system for performing semantic segmentation of plurality of entities in an image

Country Status (2)

Country Link
IN (1) IN201841039266A (en)
WO (1) WO2020079704A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021226900A1 (en) * 2020-05-14 2021-11-18 安徽中科智能感知产业技术研究院有限责任公司 Cotton crop row detection method and apparatus based on computer vision, and storage medium


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8175368B2 (en) * 2005-04-05 2012-05-08 Scimed Life Systems, Inc. Systems and methods for image segmentation with a multi-state classifier
US20140363052A1 (en) * 2013-06-06 2014-12-11 Xerox Corporation Adaptive character segmentation method and system for automated license plate recognition

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111640116A (en) * 2020-05-29 2020-09-08 广西大学 Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN111640116B (en) * 2020-05-29 2023-04-18 广西大学 Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN113768461A (en) * 2021-09-14 2021-12-10 北京鹰瞳科技发展股份有限公司 Fundus image analysis method and system and electronic equipment
WO2024012251A1 (en) * 2022-07-11 2024-01-18 北京字跳网络技术有限公司 Semantic segmentation model training method and apparatus, and electronic device and storage medium

Also Published As

Publication number Publication date
IN201841039266A (en) 2019-06-28

Similar Documents

Publication Publication Date Title
WO2020079704A1 (en) Method and system for performing semantic segmentation of plurality of entities in an image
US9554085B2 (en) Method and device for dynamically controlling quality of a video
US10846525B2 (en) Method and system for identifying cell region of table comprising cell borders from image document
EP3678088B1 (en) Method and system of convolution in neural network with variable dilation rate
US10678848B2 (en) Method and a system for recognition of data in one or more images
EP3355201A1 (en) A method and system for establishing a relationship between a plurality of user interface elements
EP3156966A1 (en) Method and device for generating panoramic images with real-time annotations
US10769472B2 (en) Method and system counting plurality of objects placed in a region
US10297056B2 (en) Method and system for remotely annotating an object in real-time
EP3906511A1 (en) Method and device for identifying machine learning models for detecting entities
EP3355240B1 (en) A method and a system for generating a multi-level classifier for image processing
US20170213326A1 (en) Method and system for processing an image extracted from a document
US10410340B2 (en) Method and system for marking content on the surface of an object using laser
US11340439B2 (en) Method and system for auto focusing a microscopic imaging system
US20230334656A1 (en) Method and system for identifying abnormal images in a set of medical images
US11269172B2 (en) Method and system for reconstructing a field of view
US10929992B2 (en) Method and system for rendering augmented reality (AR) content for textureless objects
EP3054669A1 (en) Method and device for assisting a user to capture images
EP3110133A1 (en) Systems and method for performing real-time image vectorization
US11187644B2 (en) Method and system for determining total count of red blood cells in peripheral blood smear
US10699417B2 (en) Method and system for acquisition of optimal images of object in multi-layer sample
WO2019229510A1 (en) Method and system for performing hierarchical classification of objects in microscopic image
US20210201031A1 (en) Method and system for providing augmented reality information for an object
WO2019220192A1 (en) A method and system for dynamically generating medical reports
US20200211192A1 (en) Method and system for generating a structure map for retinal images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19874107

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19874107

Country of ref document: EP

Kind code of ref document: A1
