WO2022171267A1 - System, method, and computer executable code for organism quantification - Google Patents

System, method, and computer executable code for organism quantification

Info

Publication number
WO2022171267A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
organisms
organism
detected
property
Prior art date
Application number
PCT/EP2021/053025
Other languages
French (fr)
Inventor
Aries DWIPUTERA
Ivan Tay
Wai YAN
Original Assignee
Aquaeasy Pte. Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aquaeasy Pte. Ltd. filed Critical Aquaeasy Pte. Ltd.
Priority to CN202180093271.4A priority Critical patent/CN116830162A/en
Priority to PCT/EP2021/053025 priority patent/WO2022171267A1/en
Publication of WO2022171267A1 publication Critical patent/WO2022171267A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24133 Distances to prototypes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

Definitions

  • An aspect of the disclosure relates to an organism quantification system. Another aspect of the disclosure relates to a method for quantifying organisms. Another aspect of the disclosure relates to a non-transitory computer-readable medium storing computer executable code.
  • Shrimp farmers periodically estimate the size of their shrimps to get a sense of how good the harvest is at any one time. They do this by randomly sampling the shrimps that are captured by a feed tray. The shrimps captured in the feed tray are then measured, or their length and/or weight is estimated. Currently, this length and/or weight data collection is done manually, which is time-consuming. There is therefore a need for an improved method and system of measuring shrimps.
  • the system may include a mobile computing device, e.g., a smartphone or a tablet, including a camera and a controller configured to control the camera to take a digital image of organisms.
  • the system may include a distance estimator configured to estimate, from the digital image, a distance from the camera to the organisms.
  • the system may include a bounding box generator including a first trained neural network configured to estimate bounding boxes of detected organisms, detected on the digital image.
  • the system may include a semantic segmenter including a second trained neural network, the semantic segmenter being configured to generate masked representations corresponding to the bounding boxes.
  • the system may include a property estimator including a third trained neural network configured to estimate a property of the detected organisms based on the masked representation.
  • the mobile computing device may further include a display, and the controller may be configured to output the estimated property, or information based on the estimated property, onto the display.
  • An aspect of the disclosure relates to a method for quantifying organisms.
  • the method may include estimating, by a distance estimator, a distance from a camera to the organisms.
  • the method may include determining that the distance is within a pre-determined range and taking a digital image of the organisms with the camera, for example the camera of a smartphone or a tablet.
  • the method may include generating, by a bounding box generator including a first trained neural network, estimated bounding boxes corresponding to detected organisms detected on the digital image.
  • the method may include generating, by a semantic segmenter including a second trained neural network, masked representations corresponding to the bounding boxes.
  • the method may include estimating, by a property estimator including a third trained neural network, a property of the detected organisms based on the masked representations.
  • the method may include outputting the estimated property, or information based on the estimated property, onto a display.
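  • As an illustration of how these method steps fit together, the following minimal Python sketch orchestrates the pipeline end to end. All component objects (camera, distance_estimator, bbox_generator, segmenter, property_estimator, display) and their interfaces are hypothetical placeholders chosen for this sketch, not the disclosed implementation; the 25 cm to 30 cm range is merely the exemplary range mentioned herein.

```python
# Minimal orchestration sketch of the quantification method; all component
# objects and their interfaces are hypothetical placeholders, not the
# patent's actual implementation.
from dataclasses import dataclass

@dataclass
class QuantificationResult:
    count: int
    weights_g: list            # one estimated weight per detected organism
    average_weight_g: float

def quantify(camera, distance_estimator, bbox_generator, segmenter,
             property_estimator, display, distance_range_cm=(25.0, 30.0)):
    # 1. Estimate camera-to-organism distance from a preview frame.
    distance_cm = distance_estimator.estimate(camera.preview_frame())
    if not (distance_range_cm[0] <= distance_cm <= distance_range_cm[1]):
        display.show(f"Move camera: distance {distance_cm:.1f} cm out of range")
        return None
    # 2. Distance is in range: take the digital image automatically.
    image = camera.capture()
    # 3. First NN: bounding boxes of detected organisms.
    boxes = bbox_generator.detect(image)
    # 4. Second NN: one masked representation per bounding box.
    masks = [segmenter.segment(image, box) for box in boxes]
    # 5. Third NN: per-organism property (here: weight in grams).
    weights = [property_estimator.estimate(m) for m in masks]
    result = QuantificationResult(
        count=len(weights),
        weights_g=weights,
        average_weight_g=sum(weights) / len(weights) if weights else 0.0,
    )
    # 6. Output the estimated property (or derived information) on the display.
    display.show(result)
    return result
```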
  • An aspect of the disclosure relates to a non-transitory computer-readable medium storing computer executable code including instructions for quantifying organisms according to the method disclosed herein.
  • An aspect of the disclosure relates to a computer executable code including instructions for quantifying organisms according to the method disclosed herein.
  • FIG. 1 shows a schematic illustration of a use condition of the system in accordance with various embodiments
  • FIG. 2 illustrates a schematic diagram of a system in accordance with various embodiments, by way of example
  • FIG. 3 illustrates a schematic diagram of a system as a variation of the system in FIG. 2;
  • FIG. 4 illustrates a method 500 for quantifying organisms in accordance with various embodiments
  • FIG. 5 shows an exemplary bounding box generator 200 including a first trained neural network NN1 in the form of a Faster R-CNN.
  • FIG. 6 shows an example 280 including a digital image of a tray 20 having organisms 10;
  • FIG. 7 shows an example of the architecture of the second neural network NN2, in the form of a U-net
  • FIG. 8 shows an example 380 including, on the left, a cropped image corresponding to a bounding box BB3 and an organism AN3, which after being segmented by the semantic segmenter 300 results in a masked representation 384, in accordance with various embodiments;
  • FIG. 9 shows an example 380 including, from the left to the right, a cropped image corresponding to a bounding box BB3 and an organism AN3; a mask 383 generated based on the cropped image; and a masked representation 384, in accordance with various embodiments;
  • FIG. 10 shows an example of training the second neural network NN2 including a training sample 390 included in training data, in accordance with various embodiments
  • FIG. 11 shows an example of an architecture 400 of the third neural network, in accordance with various embodiments.
  • FIG. 12 shows an exemplary flow chart for estimating the property of the organisms, in accordance with various embodiments
  • FIG. 13 shows an exemplary workflow for training the third trained neural network NN3 of the property estimator 400
  • FIG. 14 shows an example of intermediate images during processing; the images are shown for illustration purposes.
  • FIG. 15 shows tables 610 and 620 showing a property of the detected organisms AN1, AN2, AN3.
  • the articles “a”, “an” and “the” as used with regard to a feature or element include a reference to one or more of the features or elements.
  • the term “and/or” includes any and all combinations of one or more of the associated listed items.
  • the term “organism” may refer to an animal, for example a crustacean, such as a shrimp.
  • the organism may be a farmed organism.
  • the organism may be from a same species, from a same variety, or from a same breed; for example, the same species may be Litopenaeus vannamei. While shrimps are used herein in the form of examples, the disclosure is not limited thereto.
  • the terms “shrimp” and “prawn” may be used interchangeably and may belong, e.g., to the suborder Dendrobranchiata or the infraorder Caridea.
  • the neural networks as described herein such as the first neural network, the second neural network, and the third neural network may be trained, after which they are named trained NNs, such as the first trained neural network, the second trained neural network, and the third trained neural network respectively. Training may be performed for the organism to be detected, for example, for shrimps, such as Litopenaeus Vannamei.
  • Features, such as structural neural network features, that are described in the context of a neural network may correspondingly be applicable to the neural network after training.
  • FIG. 1 shows a schematic illustration of a use condition of the system.
  • a mobile computing device 110 e.g., a smartphone
  • a tray 20 including organisms, in this case 3 shrimps, is shown.
  • the mobile computing device 110 may take an image 30.
  • image 30 is used to represent the ‘real’ organisms, for ease of reference.
  • the schematic of FIG. 1 is shown in 2 dimensions instead of an accurate 3D perspective for simplicity.
  • the present disclosure describes, among others, a smartphone camera-based shrimp size estimation system that does not necessarily require additional lab equipment, holding stands, boxes, conveyor belts, or special cameras (such as depth cameras or multi-camera sensors).
  • FIG. 2 illustrates a schematic diagram of a system in accordance with various embodiments, by way of example.
  • the organism quantification system 100 may include a mobile computing device 110, e.g., a smartphone or a tablet, including a camera 102 and a controller 106 configured to control the camera 102 to take a digital image 30 (or simply referred to herein as an image) of organisms 10.
  • the controller may include or consist of one or more of: a CPU, a GPU, a Neural Processing Unit (NPU), a system on a chip (SoC), a memory, or a combination thereof.
  • an SoC may integrate one or more of a CPU, a GPU, an NPU, an image signal processor (ISP), and other elements.
  • the system for example in the mobile computing device 110, may include other elements, such as one or more of: a display 104, a memory 108, a communication interface 107.
  • the system 100 may include a distance estimator 120 configured to estimate, from the digital image 30, a distance D1 from the camera to the organisms 10.
  • the distance estimator may output a signal indicating that the distance D1 is below the predefined range, i.e., too short, or above the predefined range, i.e., too long. Indications corresponding to the signal may be shown on the display, which may help the user to place the camera at a correct distance to the organisms 10.
  • the distance estimator may further output a signal indicating that the distance D1 is within the predefined range, which signal may be used to trigger a digital image acquisition.
  • the lower and upper bounds of the predefined range may be selected between 20 cm and 50 cm.
  • the range may have a range width selected between 2 cm and 20 cm.
  • An exemplary range is 25 cm to 30 cm.
  • the predefined range may be determined based on a field of view of the camera, and/or based on the resolution of the camera.
  • an automatic distance estimation method may be used to determine the distance between the mobile computing device and the animals during or before the digital image acquisition process.
  • the distance from the camera to the animals may be, for example, a distance to the animal that is proximal to the camera, or a distance to a tray holding the animals, for example a shrimp feeding tray. Setting the pre-defined range to include the distance at which the images of a training dataset were taken may provide better results with the system and method.
  • a color filter may be implemented to improve detection of the animals and/or the feed tray.
  • a color-filtering-based object detection algorithm may be employed to detect the feed tray.
  • the color filtering may be HSV color filtering.
  • the distance estimation between the camera and the feed tray may be carried out by using a single-point projection principle relating the feed tray's height in the digital image to the actual feed tray's height.
  • Alternatively, other means than a feed tray may be used as the height reference.
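  • As a hedged illustration of the tray detection and distance estimation described above, the following Python sketch combines HSV color filtering (via OpenCV) with the single-point projection relation distance = focal_length * real_height / pixel_height. The HSV bounds, focal length, and tray height below are illustrative assumptions, not values from the disclosure.

```python
# Sketch of tray detection by HSV color filtering followed by distance
# estimation via single-point projection. The HSV bounds, focal length,
# and tray height are illustrative assumptions.
import cv2
import numpy as np

TRAY_HSV_LOW = np.array([35, 60, 60])    # assumed hue range of the feed tray
TRAY_HSV_HIGH = np.array([85, 255, 255])
FOCAL_LENGTH_PX = 1500.0                 # assumed camera focal length in pixels
TRAY_HEIGHT_CM = 30.0                    # assumed real height of the feed tray

def estimate_tray_distance_cm(image_bgr):
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, TRAY_HSV_LOW, TRAY_HSV_HIGH)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None                      # tray not found in this frame
    # Take the largest color-filtered region as the tray.
    tray = max(contours, key=cv2.contourArea)
    _, _, _, tray_height_px = cv2.boundingRect(tray)
    # Single-point (pinhole) projection: distance = f * real_height / pixel_height.
    return FOCAL_LENGTH_PX * TRAY_HEIGHT_CM / tray_height_px

def distance_in_range(distance_cm, low_cm=25.0, high_cm=30.0):
    # Signal that may be used to trigger automatic image acquisition.
    return distance_cm is not None and low_cm <= distance_cm <= high_cm
```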
  • the digital image may be taken automatically.
  • Organisms that live in water may be assessed outside the water environment; this may prevent environmental bias due to water distortion.
  • the system 100 may include a bounding box generator 200 including a first trained neural network NN1 configured to estimate bounding boxes of detected organisms AN1, AN2, AN3, detected on the digital image 30.
  • the first trained neural network NN1 may estimate bounding boxes, each bounding box of the bounding boxes corresponding to one of the organisms, for example a first bounding box BB1 corresponding to a first organism AN1, a second bounding box BB2 corresponding to a second organism AN2, a third bounding box BB3 corresponding to a third organism AN3, and so forth, for example, for all detected organisms.
  • the present disclosure may make reference to 3 organisms AN1, AN2, AN3 and their calculated results, such as 3 bounding boxes BB1, BB2, BB3, 3 masked representations, and others. Such references are for illustration purposes only and the disclosure is not limited to 3, as the number of organisms may be less or may be more, for example more than 10, more than 20, or more than 30. According to various embodiments, a masked representation may be a segmented image.
  • the system 100 may include a semantic segmenter 300 including a second trained neural network NN2, the semantic segmenter 300 being configured to generate masked representations 384 corresponding to the bounding boxes BB1, BB2, BB3.
  • Each masked representation of the masked representations 384 may correspond to one of the bounding boxes, and to one of the organisms, for example a first masked representation corresponding to a first organism AN1 (and first bounding box BB1), a second masked representation corresponding to a second organism AN2 (and second bounding box BB2), a third masked representation corresponding to a third organism AN3 (and third bounding box BB3), and so forth, for example, for all detected organisms.
  • the system 100 may include a property estimator 400 including a third trained neural network NN3 configured to estimate a property of the detected organisms AN1, AN2, AN3 based on the masked representation 384.
  • the property may include a property of each of the organisms, for example a first property for the first organism AN1, a second property for the second organism AN2, a third property for the third organism AN3, and so forth, e.g., for all detected organisms.
  • the property may include a property representative of more than one organism, for example representative of all the detected organisms.
  • the property may include a first weight for the first organism AN1, a second weight for the second organism AN2, a third weight for the third organism AN3, and so forth, e.g., for all detected organisms.
  • the property may include an average weight of all detected organisms (10).
  • the controller 106 may be configured to store the estimated property, or information based on the estimated property, in the memory 108, and/or send the estimated property, or the information based on the estimated property, to an external device via the communication interface 107.
  • examples of an external device are a server and a cloud.
  • the controller 106 may be configured to output the estimated property, or the information based on the estimated property, onto the display.
  • information based on the estimated property may be, e.g., a property average, such as an average weight, or a property distribution, such as a weight distribution.
  • the bounding box generator 200, the semantic segmenter 300, and the property estimator 400 may be implemented in the mobile computing device 110; for example, one, two, or all of the bounding box generator 200, the semantic segmenter 300, and the property estimator 400 may be implemented as a set of instructions stored in the memory 108 which, when executed on the controller, cause the controller to perform the tasks they are configured to perform as explained in accordance with various embodiments.
  • one, two, or all of the bounding box generator 200, the semantic segmenter 300, and the property estimator 400 may be implemented external to the mobile computing device 110, for example on a server.
  • FIG. 3 illustrates a schematic diagram of a system as a variation of the system in FIG. 2, in accordance with some embodiments, wherein the bounding box generator 200, the semantic segmenter 300, and the property estimator 400 are implemented external to the mobile computing device 110, for example on a computing system 130 which may be communicatively coupled to the communication interface 107.
  • the system 100 may include the computing system 130.
  • the computing system 130 may be, for example, a server, a cloud, or a laptop.
  • a method 500 for quantifying organisms in accordance with various embodiments will be described in connection with FIG. 4 by way of illustration. According to various embodiments, the method 500 may be implemented on the system 100.
  • the method 500 may include estimating 502, by a distance estimator 120, a distance from the camera 102 to the organisms 10.
  • the method 500 may include determining 504 that the distance is within a pre-determined range and taking a digital image 30 of the organisms 10 with the camera 102.
  • the distance estimator may output a signal indicating that the distance D1 is below the predefined range, i.e., too short, or above the predefined range, i.e., too long. Indications corresponding to the signal may be shown on the display, which may help the user to place the camera at a correct distance to the organisms 10.
  • the distance estimator may further output a signal indicating that the distance D1 is within the predefined range, which signal may be used to trigger a digital image acquisition.
  • the lower and upper bounds of the predefined range may be selected between 20 cm and 50 cm.
  • the range may have a range width selected between 2 cm and 20 cm.
  • An exemplary range is 25 cm to 30 cm.
  • the predefined range may be determined based on a field of view of the camera, and/or based on the resolution of the camera.
  • the method 500 may include generating 506, by a bounding box generator 200 including a first trained neural network NN1, estimated bounding boxes BB1, BB2, BB3.
  • a bounding box may be estimated for each of the organisms, also named as detected organisms AN1, AN2, AN3, which are detected on the digital image 30 by the first trained neural network NN1.
  • the first trained neural network NN1 may estimate bounding boxes, each bounding box of the bounding boxes corresponding to one of the organisms, for example a first bounding box BB1 corresponding to a first organism AN1, a second bounding box BB2 corresponding to a second organism AN2, a third bounding box BB3 corresponding to a third organism AN3, and so forth, for example, for all detected organisms.
  • the method 500 may include generating 508, by a semantic segmenter 300 including a second trained neural network NN2, masked representations 384 corresponding to the bounding boxes BB1, BB2, BB3.
  • Each masked representation of the masked representations 384 may correspond to one of the bounding boxes, and to one of the organisms, for example a first masked representation corresponding to a first organism AN1 (and first bounding box BB1), a second masked representation corresponding to a second organism AN2 (and second bounding box BB2), a third masked representation corresponding to a third organism AN3 (and third bounding box BB3), and so forth, for example, for all detected organisms.
  • the method 500 may include estimating 510, by a property estimator 400 including a third trained neural network NN3, a property of the detected organisms AN1, AN2, AN3 based on the masked representations 384.
  • the property may include a property of each of the organisms, for example a first property for the first organism AN1, a second property for the second organism AN2, a third property for the third organism AN3, and so forth, e.g., for all detected organisms.
  • the property may include a property representative of more than one organism, for example representative of all the detected organisms.
  • the property may include a first weight for the first organism AN1, a second weight for the second organism AN2, a third weight for the third organism AN3, and so forth, e.g., for all detected organisms.
  • the property may include an average weight of all detected organisms (10).
  • the method may include storing the estimated property, or information based on the estimated property, in a memory 108, and/or sending the estimated property, or the information based on the estimated property, to an external device, e.g., via the communication interface 107.
  • examples of an external device are a server and a cloud.
  • the method may include, by the controller 106, outputting the estimated property, or the information based on the estimated property, onto the display.
  • information based on the estimated property may be, e.g., a property average, such as an average weight, or a property distribution, such as a weight distribution.
  • the first trained neural network NN1 may be a Faster Region-based Convolutional Neural Network (Faster R-CNN).
  • a Faster R-CNN may include a feature extractor 210 (e.g., including convolutional layers) for extracting features based on an image 30 (shown with, but not limited to, shrimps), a region proposal network 212 for generating proposals 214, and a box classifier 216.
  • the feature extractor allows for shared feature extraction through convolutional layers.
  • the region proposal network 212 generates the object proposals 214.
  • the region proposal network may include an object classification module and a proposal pruning module.
  • the box classifier may be, e.g., a classification and regression network which returns a detection score of the region.
  • the box classifier may include a classifier and a box regression module.
  • FIG. 5 shows an exemplary bounding box generator 200 including a first trained neural network NN1 in the form of a Faster R-CNN.
  • in overview, the Faster R-CNN object detection model, as applied to the shrimp detection task, first extracts high-level feature maps from the input image using a pre-trained Inception-V3 backbone deep convolutional neural network.
  • the extracted feature maps are passed through the RPN (Region Proposal Network) layer, which returns the organism’s (e.g., shrimp’s) body object proposals.
  • the output may be applied to an RoI (Region of Interest) pooling layer to bring all the object proposals down to the same size.
  • these object proposals are passed through the box classifier (e.g., two fully connected layers) to classify each proposal as organism (e.g., shrimp) or not, and to predict the bounding box coordinates of the organism’s (shrimp’s) location.
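  • The following sketch illustrates a detection pass of this kind using torchvision's off-the-shelf Faster R-CNN (ResNet-50 FPN backbone) as a stand-in for the Inception-V3-backed detector described above; a real system would first fine-tune such a model on annotated organism images, and the image path and score threshold are illustrative assumptions.

```python
# Illustrative detection pass with an off-the-shelf Faster R-CNN from
# torchvision (ResNet-50 FPN backbone), standing in for the detector
# described above; not the disclosed, fine-tuned model.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_organisms(image_path, score_threshold=0.7):
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        # The model returns, per image, a dict with 'boxes' (xmin, ymin,
        # xmax, ymax), 'labels' and 'scores'.
        prediction = model([image])[0]
    keep = prediction["scores"] >= score_threshold
    return prediction["boxes"][keep]     # one bounding box per detection

boxes = detect_organisms("tray.jpg")     # hypothetical example image
print(f"detected {len(boxes)} candidate organisms")
```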
  • FIG. 6 shows an example 280 including a digital image of a tray 20 having organisms 10, in this example, shrimps.
  • Each of the organisms 10 has a bounding box as result of the detection.
  • a record may be generated for each detected organism (e.g., the one from BB1), for example a label (such as AN1, AN2, ...) and bounding box coordinates, e.g., in the format xmin, ymin, xmax, ymax, which represent relative offsets from a pre-defined coordinate origin in the digital image.
  • the cropped image may be stored with the record.
  • a deep learning-based object detection algorithm may be used to detect the organisms (e.g., live shrimps) and localize each detected organism’s pixel area inside the image.
  • the organism detection model outputs, for the input image, the bounding box location of each organism's body pixel area (xmin, ymin, xmax, ymax), and may further output the number of organism detections as a count.
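  • A minimal sketch of such per-organism records is shown below; the field names and the in-memory list are illustrative assumptions, not a prescribed format.

```python
# Sketch of per-organism detection records as described above; field names
# are illustrative choices, not a prescribed format.
def build_records(boxes, image):
    records = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes, start=1):
        crop = image[int(ymin):int(ymax), int(xmin):int(xmax)]  # cropped image
        records.append({
            "label": f"AN{i}",                          # e.g. AN1, AN2, ...
            "bbox": (float(xmin), float(ymin), float(xmax), float(ymax)),
            "crop": crop,                               # optionally stored
        })
    return records, len(records)                        # records and count
```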
  • each of the first trained neural network NN1, the second trained neural network NN2, and the third neural network NN3, may have been trained, by using a training dataset as ground truth, which training dataset may be split into a training subset and a testing subset.
  • the process of training may proceed until the errors become satisfactorily low (e.g., a minimum has been achieved); otherwise the training, such as forward and backward propagation and the update of network parameters, may continue. Errors may be deviations of the results from the ground truth.
  • the training dataset may include training images (each including one or more organisms) and one or more annotated bounding boxes for each training image, in other words, one bounding box for each organism captured by the image.
  • the annotation may be human made.
  • Each of the organisms of each image of the training dataset may be annotated by providing a respective bounding box and/or a label.
  • Examples of labels are labels including an indexing number, e.g., ‘organism1’, ‘organism2’, and so forth.
  • the training dataset may include pairs of images, e.g., each pair including an original cropped image (e.g., cropped according to a bounding box) and a corresponding segmentation mask.
  • the segmentation masks may be human annotated.
  • Cropped images may be images cropped by manual human annotation and/or by a trained neural network, e.g., the first trained neural network (such as an output of a Faster R-CNN detection model).
  • the training dataset may include masked representations annotated with their corresponding desired properties’ values during the data collection process.
  • the property may be weight.
  • the first trained neural network NN1 may be trained, e.g., by using a training dataset, which may be split into a training subset and a testing subset.
  • the training dataset may include sets of different images of organisms, for example by using different sizes of the organisms.
  • the images may be taken by having a fixed distance of the camera to the organisms, e.g., to a shrimp tray holding shrimps.
  • the training dataset may include 1000 or more images, such as 5000 or more images.
  • each of the organisms of each image of the training dataset is annotated by providing a respective bounding box and/or a label.
  • Accurate bounding box locations of objects and a large amount of labeled images help train a robust detector. Therefore, the ground-truth labels and bounding box coordinates of the organisms inside the images are manually annotated by human inspectors before training.
  • the annotated training dataset may then be split into a training subset for training the detector and a test subset for evaluating the detector. For example, 80% of the annotated images of the training dataset may be used for training and the rest for performance evaluation purposes.
  • the first neural network may update the parameter values of the fully connected layers using a backpropagation algorithm, e.g., with a stochastic gradient descent optimization technique.
  • the backpropagation process repeats until the errors satisfy a minimum error condition; otherwise the network again propagates forward and backward and updates the network parameters.
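  • The following generic sketch shows the 80/20 split and the repeated forward/backward propagation with stochastic gradient descent for a torchvision-style detection model; the dataset object and its annotation format are assumed and not specified by the disclosure.

```python
# Generic sketch of the 80/20 split and SGD fine-tuning of a detector;
# the dataset and its annotation format are assumptions.
import torch
from torch.utils.data import random_split, DataLoader

def collate(batch):
    return tuple(zip(*batch))            # detection targets vary in size

def train_detector(model, dataset, epochs=10, lr=0.005):
    n_train = int(0.8 * len(dataset))    # 80% training, 20% evaluation
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    loader = DataLoader(train_set, batch_size=2, shuffle=True, collate_fn=collate)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            # torchvision detection models return a dict of losses in train mode.
            losses = model(list(images), list(targets))
            loss = sum(losses.values())
            optimizer.zero_grad()
            loss.backward()              # backward propagation
            optimizer.step()             # parameter update
    return model, test_set
```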
  • FIG. 7 shows an example of the architecture of the second neural network NN2, in the form of a U-net, included in the semantic segmenter 300.
  • a U-Net segmentation model architecture may include two parts.
  • the first part may be the contraction path (also called the encoder) used to capture the image's context.
  • the encoder may be a traditional stack of convolutional and max-pooling layers.
  • the second part may be a symmetric expanding path (also called the decoder), which enables precise localization using transposed convolutions.
  • the second neural network NN2 may be, or include, an end-to-end fully convolutional network (FCN).
  • an FCN is free of any dense layer and can therefore accept an image of any size, e.g., it contains only convolutional layers.
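  • The following reduced U-Net-style sketch (PyTorch) illustrates the contraction path, the expanding path with transposed convolutions, and a skip connection; it is far smaller than a production model and is not the disclosed architecture.

```python
# Minimal U-Net-style fully convolutional network: a contracting encoder,
# an expanding decoder with a transposed convolution, and a skip connection.
# A reduced sketch for illustration only.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 16)          # encoder: convolutions ...
        self.pool = nn.MaxPool2d(2)            # ... and max pooling
        self.enc2 = conv_block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # decoder upsampling
        self.dec1 = conv_block(32, 16)         # after concatenating the skip
        self.out = nn.Conv2d(16, 1, 1)         # 1-channel mask logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.out(d1)                    # pixel-wise organism/background
```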
  • the semantic segmenter 300 may be configured to generate the masked representation 384 of the cropped image 381 based on the mask 383, for example to generate a masked image in which pixels of the other digital image portion, which is free of the detected organism, are set to a pre-defined constant value.
  • the second trained neural network NN2 may be configured to receive a cropped image 381, which may be cropped from the digital image 30 according to a bounding box BB3 of the estimated bounding boxes BB1, BB2, BB3 and corresponds to a detected organism AN3 of the detected organisms AN1, AN2, AN3.
  • the second trained neural network NN2 may be configured to generate a mask 383, for example in the image format as a mask image.
  • first pixels of the mask image correspond to a digital image portion 386 of the detected organism AN3 and second pixels different from the first pixels correspond to another digital image portion 387 which may be free of the detected organism AN3.
  • generating masked representations 508 may include receiving, by the second trained neural network NN2, a cropped image 381, which has been cropped from the digital image 30 according to a bounding box BB3 of the estimated bounding boxes BB1, BB2, BB3 and corresponds to a detected organism AN3 of the detected organisms AN1, AN2, AN3.
  • Generating masked representations 508 may include generating, e.g., by the second trained neural network NN2, a mask 383, for example in the image format as a mask image.
  • first pixels of the mask image correspond to a digital image portion 386 of the detected organism AN3 and second pixels different from the first pixels correspond to another digital image portion 387 which may be free of the detected organism AN3.
  • the method may be repeated for the remaining cropped images.
  • FIG. 8 shows an example 380 including, on the left, a cropped image corresponding to a bounding box BB3 and an organism AN3, which after being segmented by the semantic segmenter 300 results in a masked representation 384. It can be seen that the cropped image on the left partially shows another organism (AN2), which is successfully removed by semantic segmentation in the masked representation 384.
  • FIG. 9 shows an example 380 including, from the left to the right, a cropped image corresponding to a bounding box BB3 and an organism AN3; a mask 383 generated based on the cropped image; and a masked representation 384.
  • the first pixels 386 may be set to a first constant value and the second pixels 387 may be set to a second constant value, such that, when an operation is applied to the cropped image 381, the information of the other digital image portion 387, which is free of the detected organism AN3, is deleted; this is represented in FIG. 9, for illustration purposes, as the white area in 384.
  • first pixels may be 1 (or the max value of each pixel) and second pixels may be 0, and the operation may be a logical AND operation.
  • the information of the digital image portion is kept (illustrated by shrimp AN3).
  • the operation may be carried out, e.g., on the digital image, in a newly created cropped masked representation, or in another variant.
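  • A minimal NumPy sketch of applying the mask to a cropped image is shown below; the background constant (white) and the element-wise selection are illustrative choices rather than the prescribed operation.

```python
# Sketch of applying a binary mask to a cropped image so that pixels free
# of the detected organism are set to a constant value (white here).
import numpy as np

def apply_mask(cropped_bgr, mask, background_value=255):
    # mask: HxW array with 1 for organism pixels and 0 for background pixels.
    organism = (mask > 0)[..., None]                     # HxWx1 boolean
    masked = np.where(organism, cropped_bgr, background_value)
    return masked.astype(cropped_bgr.dtype)              # masked representation
```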
  • generating masked representations 508 (e.g., via semantic segmentation) is found to be useful for identifying the organisms present in the digital image, counting the organisms, and extracting each organism's pixel location individually for primary segmentation.
  • the training dataset may include pairs of images, such as original cropped images and mask images. These pairs may be used for building the supervised learning-based segmentation model of the second neural network NN2.
  • training sample 390 includes a manually annotated region segmentation mask (masked representation), such as mask MGT1 (regions 396 and 397), of an original cropped image, such as cropped image BB.
  • Cropped images may be images cropped by manual human annotation and/or by a trained neural network, e.g., the first trained neural network (such as an output of the previous Faster R-CNN shrimp detection model).
  • training of U-Net segmentation model architecture may include two parts. The first part may be the contraction path (also called the encoder) used to capture the image's context.
  • the encoder may be a traditional stack of convolutional and max-pooling layers.
  • the second part may be a symmetric expanding path (also called the decoder), which enables precise localization using transposed convolutions.
  • the second neural network NN2 may be, or include, an end-to-end FCN.
  • an FCN is free of any dense layer and can therefore accept an image of any size, e.g., it contains only convolutional layers.
  • the training cropped images BB are encoded in the second neural network NN2 via forward propagation and the training masks MGT1 are decoded via backward propagation.
  • a first training dataset included 5000 cropped shrimp images stored in memory from the output of the first trained neural network NN1 trained to detect shrimps, e.g., a Faster R-CNN detection model. Those images include background noise and overlapping regions of multiple shrimps.
  • the training image dataset also includes 5000 images, which are manually annotated region segmentation masks of the cropped shrimp images; these images may be, e.g., grayscale, or have only two colors. These pairs were used for training the second neural network, e.g., for building a supervised learning-based segmentation model.
  • the training procedure for shrimp segmentation included a U-Net model compiled with the Adam optimizer and using a binary cross-entropy loss function, because only two classes were used, namely the shrimp body part and the background.
  • the process of backpropagation kept repeating until the errors became satisfactorily low, e.g., a minimum; otherwise the process propagates forward and backward and updates the network parameters, as shown in FIG. 7.
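  • The following sketch shows such a training loop with the Adam optimizer and a binary cross-entropy loss, reusing the TinyUNet sketch above; the tensors passed in are assumed to be prepared cropped images and their ground-truth masks, and the hyperparameters are illustrative.

```python
# Sketch of the segmentation training loop: Adam and binary cross-entropy
# over two classes (organism body vs. background). Inputs are assumptions.
import torch
import torch.nn as nn

def train_segmenter(model, crops, masks, epochs=20, lr=1e-3):
    # crops: (N, 3, H, W) float tensors; masks: (N, 1, H, W) in {0, 1}.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()        # binary cross-entropy on logits
    model.train()
    for _ in range(epochs):
        logits = model(crops)
        loss = criterion(logits, masks)
        optimizer.zero_grad()
        loss.backward()                       # backward propagation
        optimizer.step()                      # parameter update
    return model
```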
  • the second trained neural network (e.g., the semantic shrimp segmentation model) takes the detection result image from the bounding box generator 200 as input in the prediction phase, and returns as output the predicted shrimp body mask region.
  • the predicted mask is then overlaid on the original input image (the cropped image) for the final segmentation result, thus generating the masked representations 384.
  • the masked representations may be further processed by the property estimator 400 and the third trained neural network NN3 included in the property estimator 400.
  • the third trained neural network NN3 may include a sequence of convolution layers and max pooling layers followed by an end stage.
  • the end stage may include one or more, e.g., two, fully connected layers FC, which may be followed by a softmax for output.
  • Each convolutional layer of the sequence of convolution layers may include a Rectified Linear Unit (ReLU) activation.
  • An example of an architecture 400 of the third neural network is shown in FIG. 11.
  • the third trained neural network NN3 may include a convolutional regression neural network.
  • the convolutional regression neural network may be a VGG convolutional regression neural network configured to extract features from the masked representation 384.
  • the third trained neural network may include a fully connected layer configured to estimate the property based on the features extracted by the convolutional layers.
  • the convolutional regression neural network may be a VGG convolutional regression neural network and wherein estimating 510 a property of the detected organisms AN1, AN2, AN3 may include extracting, by the VGG convolutional regression neural network, features from the masked representation 384.
  • Examples of the third neural network are VGG Net-D (a VGG including 16 layers, VGG-16), VGG Net-E, or a variation thereof.
  • FIG. 12 shows an exemplary flow chart for estimating the property of the organisms. In the example, a masked representation 384 of an organism, in this case a shrimp, is shown, which is input into the property estimator 400; the output of the property estimator 400 is a weight prediction 410.
  • FIG. 13 shows an exemplary workflow for training the third trained neural network NN3 of the property estimator 400.
  • the cropped shrimp images 384 in the first training dataset were annotated with their corresponding weight values during the data collection process. These segmented shrimp images with labeled weight values were then used to train the third neural network NN3, e.g., an image-based regression neural network. The weight values are used as ground truth 430.
  • the shrimp weight regression neural network used the pre-trained parameters of the VGG-16 (ImageNet) transfer learning model as a feature extractor and added one fully connected layer with a linear activation function for the weight estimation 410.
  • the pre-trained parameter values of the VGG-16 model are frozen (no parameter updates with backpropagation) for low-level image feature extraction.
  • the shrimp regression network only updates the last fully connected layer's parameter values, using a backpropagation algorithm with the Adam optimization technique.
  • the backpropagation process keeps repeating until the mean squared error and mean absolute error become satisfactory, e.g., a minimum (in 420); otherwise it again propagates forward and backward and updates the network parameters.
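  • The sketch below mirrors this transfer-learning setup in PyTorch: ImageNet-pretrained VGG-16 with all pre-trained parameters frozen, a new single-output fully connected layer, and Adam with a mean-squared-error loss; the input shape and hyperparameters are assumptions for illustration.

```python
# Sketch of the weight-regression network: frozen pre-trained VGG-16 and a
# new single-unit FC output layer, trained with Adam and an MSE loss.
import torch
import torch.nn as nn
import torchvision

def build_weight_regressor():
    vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1")
    for p in vgg.parameters():
        p.requires_grad = False               # freeze pre-trained parameters
    vgg.classifier[6] = nn.Linear(4096, 1)    # new last FC layer (trainable)
    return vgg

def train_regressor(model, images, weights_g, epochs=50, lr=1e-4):
    # images: (N, 3, 224, 224) masked representations; weights_g: (N, 1) grams.
    optimizer = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=lr)
    mse = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        pred = model(images)
        loss = mse(pred, weights_g)           # compare against ground-truth weights
        optimizer.zero_grad()
        loss.backward()                       # only the new last layer is updated
        optimizer.step()
    return model
```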
  • the segmented input image may be passed through the shrimp weight estimation model to predict the shrimp's weight as a numerical value with a measurement unit (e.g., grams), as shown in FIG. 12.
  • the property may be weight, which is shown in the examples for illustration purposes.
  • the organisms 10 may be shrimps of a same species, as shown in the examples for illustration purposes.
  • FIG. 14 shows an example of intermediate images during processing; the images are shown for illustration purposes, while the corresponding data may be stored in any suitable digital format.
  • Image 32 shows an image of a tray with 3 shrimps and their corresponding bounding boxes BB1, BB2, and BB3 annotated on the image.
  • Image 33’ shows how each cropped image (of image 33) is segmented, thereby deleting the background of the image and effectively separating shrimps with overlapping bounding boxes.
  • Each of the shrimp images is fed into the property estimator 400 including the third trained neural network NN3, for estimating a property of the detected organisms (e.g., shrimps).
  • FIG. 15 shows tables 610 and 620 showing a property of the detected organisms AN1, AN2, and AN3.
  • the property includes weight which is 1.9 grams, 2.1 grams, and 2.5 grams for organisms AN1, AN2, and AN3, respectively.
  • the count of 3 is shown and the property includes the average weight of 2.167 grams.
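  • As a tiny worked example, the summary information in the tables can be derived from the per-organism estimates as follows (values taken from the example above):

```python
# Deriving the count and average weight shown in the example tables.
weights_g = {"AN1": 1.9, "AN2": 2.1, "AN3": 2.5}    # per-organism weights (grams)
count = len(weights_g)                               # 3
average_weight_g = round(sum(weights_g.values()) / count, 3)
print(count, average_weight_g)                       # 3 2.167
```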
  • the present disclosure describes an end-to-end deep-learning-based organism size estimation approach that does not require manually handcrafted measurement features (such as length, width, and carapace length), unlike conventional machine learning techniques, and that does not use pre-determined thresholding-based image processing algorithms.
  • a farmer needs only to take pictures of live shrimps inside the feed tray using a smartphone camera.
  • the organisms are aquatic organisms, e.g., shrimps.
  • the tray may be drained of water, thereby avoiding distortions of the image.
  • when the camera is at a pre-determined distance or within a pre-determined distance range from the feed tray, the image is taken automatically. The resulting image is then passed to the organism size estimation AI model to obtain the shrimp count and the estimated weight of each individual organism inside the mobile application.
  • a non-transitory computer-readable medium storing computer executable code including instructions for quantifying organisms according to the method disclosed herein.
  • a computer executable code including instructions for quantifying organisms according to the method disclosed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An aspect of the disclosure relates to an organism quantification system including: a camera; and a controller configured to control the camera to take a digital image of organisms; the system including a distance estimator configured to estimate, from the digital image, a distance from the camera to the organisms; a bounding box generator including a first trained neural network configured to estimate bounding boxes of detected organisms, detected on the digital image; a semantic segmenter including a second trained neural network, the semantic segmenter being configured to generate masked representations corresponding to the bounding boxes; and a property estimator including a third trained neural network configured to estimate a property of the detected organisms based on the masked representations. Other aspects relate to a method for quantifying organisms, a non-transitory computer-readable medium, and computer executable code comprising instructions to carry out the method.

Description

SYSTEM, METHOD, AND COMPUTER EXECUTABLE CODE FOR ORGANISM
QUANTIFICATION
TECHNICAL FIELD
[0001] An aspect of the disclosure relates to an organism quantification system. Another aspect of the disclosure relates to a method for quantifying organisms. Another aspect of the disclosure relates to a non-transitory computer-readable medium storing computer executable code.
BACKGROUND
[0002] Shrimp farmers periodically estimate the size of their shrimps to get a sense of how good the harvest is at any one time. They do this by randomly sampling the shrimps that are captured by a feed tray. The shrimps captured in the feed tray are then measured, or their length and/or weight is estimated. Currently, this length and/or weight data collection is done manually, which is time-consuming. There is therefore a need for an improved method and system of measuring shrimps.
SUMMARY
[0003] An aspect of the disclosure relates to an organism quantification system (also simply referred to herein as the system). The system may include a mobile computing device, e.g., a smartphone or a tablet, including a camera and a controller configured to control the camera to take a digital image of organisms. The system may include a distance estimator configured to estimate, from the digital image, a distance from the camera to the organisms. The system may include a bounding box generator including a first trained neural network configured to estimate bounding boxes of detected organisms, detected on the digital image. The system may include a semantic segmenter including a second trained neural network, the semantic segmenter being configured to generate masked representations corresponding to the bounding boxes. The system may include a property estimator including a third trained neural network configured to estimate a property of the detected organisms based on the masked representations. The mobile computing device may further include a display, and the controller may be configured to output the estimated property, or information based on the estimated property, onto the display.
[0004] An aspect of the disclosure relates to a method for quantifying organisms. The method may include estimating, by a distance estimator, a distance from a camera to the organisms. The method may include determining that the distance is within a pre-determined range and taking a digital image of the organisms with the camera, for example the camera of a smartphone or a tablet. The method may include generating, by a bounding box generator including a first trained neural network, estimated bounding boxes corresponding to detected organisms detected on the digital image. The method may include generating, by a semantic segmenter including a second trained neural network, masked representations corresponding to the bounding boxes. The method may include estimating, by a property estimator including a third trained neural network, a property of the detected organisms based on the masked representations. The method may include outputting the estimated property, or information based on the estimated property, onto a display.
[0005] An aspect of the disclosure relates to a non-transitory computer-readable medium storing computer executable code including instructions for quantifying organisms according to the method disclosed herein.
[0006] An aspect of the disclosure relates to a computer executable code including instructions for quantifying organisms according to the method disclosed herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] The invention will be better understood with reference to the detailed description when considered in conjunction with the non-limiting examples and the accompanying drawings, in which:
- FIG. 1 shows a schematic illustration of a use condition of the system in accordance with various embodiments;
- FIG. 2 illustrates a schematic diagram of a system in accordance with various embodiments, by way of example;
- FIG. 3 illustrates a schematic diagram of a system as a variation of the system in FIG. 2;
- FIG. 4 illustrates a method 500 for quantifying organisms in accordance with various embodiments;
- FIG. 5 shows an exemplary bounding box generator 200 including a first trained neural network NN1 in the form of a Faster R-CNN.
- FIG. 6 shows an example 280 including a digital image of a tray 20 having organisms 10;
- FIG. 7 shows an example of the architecture of the second neural network NN2, in the form of a U-net;
- FIG. 8 shows an example 380 including, on the left, a cropped image corresponding to a bounding box BB3 and an organism AN3, which after being segmented by the semantic segmenter 300 results in a masked representation 384, in accordance with various embodiments;
- FIG. 9 shows an example 380 including, from the left to the right, a cropped image corresponding to a bounding box BB3 and an organism AN3; a mask 383 generated based on the cropped image; and a masked representation 384, in accordance with various embodiments;
- FIG. 10 shows an example of training the second neural network NN2 including a training sample 390 included in training data, in accordance with various embodiments;
- FIG. 11 shows an example of an architecture 400 of the third neural network, in accordance with various embodiments;
- FIG. 12 shows an exemplary flow chart for estimating the property of the organisms, in accordance with various embodiments;
- FIG. 13 shows an exemplary workflow for training the third trained neural network NN3 of the property estimator 400;
- FIG. 14 shows an example of intermediate images during processing, the images being shown for illustration purposes; and
- FIG. 15 shows tables 610 and 620 showing a property of the detected organisms AN1, AN2, AN3.
[0008] In the figures, a same schematic image may be shown for training and quantification for illustration purposes; however, the skilled person would understand that training and carrying out organism quantification use different images.
DETAILED DESCRIPTION
[0009] The following detailed description refers to the accompanying drawings that show, by way of illustration, specific details and embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure. Other embodiments may be utilized, and structural and logical changes may be made without departing from the scope of the disclosure. The various embodiments are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.
[0010] Embodiments described in the context of one of the systems or methods are analogously valid for the other systems or methods. Similarly, embodiments described in the context of a system are analogously valid for a method, and vice-versa.
[0011] Features that are described in the context of an embodiment may correspondingly be applicable to the same or similar features in the other embodiments. Features that are described in the context of an embodiment may correspondingly be applicable to the other embodiments, even if not explicitly described in these other embodiments. Furthermore, additions and/or combinations and/or alternatives as described for a feature in the context of an embodiment may correspondingly be applicable to the same or similar feature in the other embodiments.
[0012] In the context of various embodiments, the articles “a”, “an” and “the” as used with regard to a feature or element include a reference to one or more of the features or elements.
[0013] As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
[0014] In the context of various embodiments, the term “organism” may refer to an animal, for example a crustacean, such as a shrimp. The organism may be a farmed organism. The organism may be from a same species, from a same variety, or from a same breed; for example, the same species may be Litopenaeus vannamei. While shrimps are used herein in the form of examples, the disclosure is not limited thereto.
[0015] In the context of various embodiments, the terms “shrimp” and “prawn” may be used interchangeably and may belong, e.g., to the suborder Dendrobranchiata or the infraorder Caridea.
[0016] The neural networks as described herein, such as the first neural network, the second neural network, and the third neural network may be trained, after which they are named trained NNs, such as the first trained neural network, the second trained neural network, and the third trained neural network respectively. Training may be performed for the organism to be detected, for example, for shrimps, such as Litopenaeus Vannamei. Features, such as structural neural network features, that are described in the context of a neural network may correspondingly be applicable to the neural network after training.
[0017] FIG. 1 shows a schematic illustration of a use condition of the system. A mobile computing device 110, e.g., a smartphone, may be held by a user at a distance D1 from the organisms 10. In the example of FIG. 1, a tray 20 including organisms, in this case 3 shrimps, is shown. When the distance is within a pre-determined range, the mobile computing device 110 may take an image 30. In FIG. 1, image 30 is used to represent the ‘real’ organisms, for ease of reference. Also, the schematic of FIG. 1 is shown in 2 dimensions instead of an accurate 3D perspective for simplicity.
[0018] The present disclosure describes, among others, a smartphone camera-based shrimp size estimation system that does not necessarily require additional lab equipment, holding stands, boxes, conveyor belts, or special cameras (such as depth cameras or multi-camera sensors). The use of state-of-the-art deep convolutional neural networks with smartphone camera vision, as disclosed herein, makes it possible to estimate the shrimp weight quickly, precisely, and easily.
[0019] FIG. 2 illustrates a schematic diagram of a system in accordance with various embodiments, by way of example. According to various embodiments, the organism quantification system 100 may include a mobile computing device 110, e.g., a smartphone or a tablet, including a camera 102 and a controller 106 configured to control the camera 102 to take a digital image 30 (or simply referred to herein as an image) of organisms 10. In examples, the controller may include or consist of one or more of: a CPU, a GPU, a Neural Processing Unit (NPU), a system on a chip (SoC), a memory, or a combination thereof. An SoC may integrate one or more of a CPU, a GPU, an NPU, an image signal processor (ISP), and other elements. The system, for example in the mobile computing device 110, may include other elements, such as one or more of: a display 104, a memory 108, a communication interface 107.
[0020] According to various embodiments, the system 100 may include a distance estimator 120 configured to estimate, from the digital image 30, a distance D1 from the camera to the organisms 10. For example, the distance estimator may output a signal indicating that the distance D1 is below the predefined range, i.e., too short, or above the predefined range, i.e., too long. Indications corresponding to the signal may be shown on the display, which may help the user to place the camera at a correct distance to the organisms 10. In the example, the distance estimator may further output a signal indicating that the distance D1 is within the predefined range, which signal may be used to trigger a digital image acquisition. According to various embodiments, the lower and upper bounds of the predefined range may be selected between 20 cm and 50 cm. The range may have a range width selected between 2 cm and 20 cm. An exemplary range is 25 cm to 30 cm. Alternatively or in addition, according to various embodiments, the predefined range may be determined based on a field of view of the camera, and/or based on the resolution of the camera.
[0021] In the present disclosure, and in accordance with various embodiments, an automatic distance estimation method may be used to determine the distance between the mobile computing device and the animals during or before the digital image acquisition process. According to various embodiments, the distance from the camera to the animals may be, for example, a distance to the animal that is proximal to the camera, or a distance to a tray holding the animals, for example a shrimp feeding tray. Setting the pre-defined range to include the distance at which the images of a training dataset were taken may provide better results with the system and method.
[0022] In accordance with various embodiments, a color filter may be implemented to improve detection of the animals and/or the feed tray. For example, a color-filtering-based object detection algorithm may be employed to detect the feed tray. In one example, the color filtering may be HSV color filtering.
[0023] In accordance with various embodiments, the distance estimation between the camera and the feed tray may be carried out by using a single-point projection principle relating the feed tray's height in the digital image to the actual feed tray's height. Alternatively, other means than a feed tray may be used as the height reference.
[0024] According to various embodiments, when it is determined that the distance is within the pre-determined range, for example if the camera is close to the feed tray at a pre-determined distance (e.g., 28 cm), the digital image may be taken automatically.
[0025] Organisms that live in water, e.g., shrimps, may be estimated outside the water environment; this may prevent environmental bias due to water distortion.
[0026] According to various embodiments, the system 100 may include a bounding box generator 200 including a first trained neural network NN1 configured to estimate bounding boxes of detected organisms AN1, AN2, AN3, detected on the digital image 30. The first trained neural network NN1 may estimate bounding boxes, each bounding box of the bounding boxes corresponding to one of the organisms, for example a first bounding box BB1 corresponding to a first organism AN1, a second bounding box BB2 corresponding to a second organism AN2, a third bounding box BB3 corresponding to a third organism AN3, and so forth, for example, for all detected organisms.
[0027] The present disclosure may make reference to 3 organisms AN1, AN2, AN3 and their calculated results, such as 3 bounding boxes BB1, BB2, BB3, 3 masked representations, and others. Such references are for illustration purposes only and the disclosure is not limited to 3, as the number of organisms may be less or more, for example more than 10, more than 20, or more than 30. According to various embodiments, a masked representation may be a segmented image.
[0028] According to various embodiments, the system 100 may include a semantic segmenter 300 including a second trained neural network NN2; the semantic segmenter 300 may be configured to generate masked representations 384 corresponding to the bounding boxes BB1, BB2, BB3. Each masked representation of the masked representations 384 may correspond to one of the bounding boxes, and to one of the organisms, for example a first masked representation corresponding to a first organism AN1 (and first bounding box BB1), a second masked representation corresponding to a second organism AN2 (and second bounding box BB2), a third masked representation corresponding to a third organism AN3 (and third bounding box BB3), and so forth, for example, for all detected organisms.
[0029] According to various embodiments, the system 100 may include a property estimator 400 including a third trained neural network NN3 configured to estimate a property of the detected organisms AN1, AN2, AN3 based on the masked representations 384. According to various embodiments, the property may include a property of each of the organisms, for example a first property for the first organism AN1, a second property for the second organism AN2, a third property for the third organism AN3, and so forth, e.g., for all detected organisms. Alternatively or in addition, the property may include a property representative of more than one organism, for example representative of all the detected organisms. In examples, the property may include a first weight for the first organism AN1, a second weight for the second organism AN2, a third weight for the third organism AN3, and so forth, e.g., for all detected organisms. In examples, the property may include an average weight of all detected organisms (10).
[0030] According to various embodiments, the controller 106 may be configured to store the estimated property, or information based on the estimated property, into the memory 108, and/or send the estimated property, or the information based on the estimated property, to an external device via the communication interface 107. Examples of an external device are a server or a cloud. Alternatively or in addition, according to various embodiments, the controller 106 may be configured to output the estimated property, or the information based on the estimated property, onto the display. According to various embodiments, information based on the estimated property may be, e.g., a property average such as a weight average, or a property distribution such as a weight distribution.
[0031] In the system shown in connection with FIG. 2, and in accordance with some embodiments, the bounding box generator 200, the semantic segmenter 300, and the property estimator 400 are implemented in the mobile computing device 110. For example, one, two, or all of the bounding box generator 200, the semantic segmenter 300, and the property estimator 400 may be implemented as a set of instructions stored in the memory 108 which are configured to cause the controller, when executed on the controller, to perform the tasks they are configured for, as explained in accordance with various embodiments. Alternatively, one, two, or all of the bounding box generator 200, the semantic segmenter 300, and the property estimator 400 may be implemented external to the mobile computing device 110, for example on a server. FIG. 3 illustrates a schematic diagram of a system as a variation of the system in FIG. 2, in accordance with some embodiments, wherein the bounding box generator 200, the semantic segmenter 300, and the property estimator 400 are implemented external to the mobile computing device 110, for example on a computing system 130 which may be communicatively coupled to the communication interface 107. The system 100 may include the computing system 130. The computing system 130 may be, for example, a server, a cloud, or a laptop.
[0032] A method 500 for quantifying organisms in accordance with various embodiments will be described in connection with FIG. 4 by way of illustration. According to various embodiments, the method 500 may be implemented on the system 100.
[0033] According to various embodiments, the method 500 may include estimating 502, by a distance estimator 120, a distance from the camera 102 to the organisms 10. According to various embodiments, the method 500 may include determining 504 that the distance is within a pre-determined range and taking a digital image 30 of the organisms 10 with the camera 102. For example, the distance estimator may output a signal indicating that the distance D1 is shorter than the predefined range, i.e., too short, or longer than the predefined range, i.e., too long. Indications corresponding to the signal may be shown on the display, which may help the user to place the camera at a correct distance to the organisms 10. In the example, the distance estimator may further output a signal indicating that the distance D1 is within the predefined range, which signal may be used to trigger a digital image acquisition. According to various embodiments, the lower and upper bounds of the predefined range may be selected between 20 cm and 50 cm. The range may have a range width selected between 2 cm and 20 cm. An exemplary range is 25 cm to 30 cm. Alternatively or in addition, according to various embodiments, the predefined range may be determined based on a field of view of the camera and/or based on the resolution of the camera.
[0034] According to various embodiments, the method 500 may include generating 506, by a bounding box generator 200 including a first trained neural network NN1, estimated bounding boxes BB1, BB2, BB3. For example, a bounding box may be estimated for each of the organisms, also named as detected organisms AN1, AN2, AN3, which are detected on the digital image 30 by the first trained neural network NN1. The first trained neural network NN1 may estimate bounding boxes, each bounding box of the bounding boxes corresponding to one of the organisms, for example a first bounding box BB1 corresponding to a first organism AN1, a second bounding box BB2 corresponding to a second organism AN2, a third bounding box BB3 corresponding to a third organism AN3, and so forth, for example, for all detected organisms.
[0035] According to various embodiments, the method 500 may include generating 508, by a semantic segmenter 300 including a second trained neural network NN2, masked representations 384 corresponding to the bounding boxes BB1, BB2, BB3. Each masked representation of the masked representations 384 may correspond to one of the bounding boxes, and to one of the organisms, for example a first masked representation corresponding to a first organism AN1 (and first bounding box BB1), a second masked representation corresponding to a second organism AN2 (and second bounding box BB2), a third masked representation corresponding to a third organism AN3 (and third bounding box BB3), and so forth, for example, for all detected organisms.
[0036] According to various embodiments, the method 500 may include estimating 510, by a property estimator 400 including a third trained neural network NN3, a property of the detected organisms AN1, AN2, AN3 based on the masked representations 384. According to various embodiments, the property may include a property of each of the organisms, for example a first property for the first organism AN1, a second property for the second organism AN2, a third property for the third organism AN3, and so forth, e.g., for all detected organisms. Alternatively or in addition, the property may include a property representative of more than one organism, for example representative of all the detected organisms. In examples, the property may include a first weight for the first organism AN1, a second weight for the second organism AN2, a third weight for the third organism AN3, and so forth, e.g., for all detected organisms. In examples, the property may include an average weight of all detected organisms (10).
[0037] According to various embodiments, the method may include storing the estimated property, or information based on the estimated property, into a memory 108, and/or sending the estimated property, or the information based on the estimated property, to an external device, e.g., via the communication interface 107. Examples of an external device are a server or a cloud. Alternatively or in addition, according to various embodiments, the method may include, by the controller 106, outputting the estimated property, or the information based on the estimated property, onto the display. According to various embodiments, information based on the estimated property may be, e.g., a property average such as a weight average, or a property distribution such as a weight distribution. [0038] According to various embodiments, the first trained neural network NN1 may be a Faster Region-based Convolutional Neural Network (FASTER R-CNN). [0039] According to various embodiments, a Faster R-CNN may include a feature extractor 210 (e.g., including convolutional layers) for extracting features based on an image 30 (shown with, but not limited to, shrimps), a region proposal network 212 for generating proposals 214, and a box classifier 216. The feature extractor allows for shared feature extraction through convolutional layers. The region proposal network 212 generates object proposals and may include an object classification module and a proposal pruning module. The box classifier may be, e.g., a classification and regression network which returns a detection score of the region. The box classifier may include a classifier and a box regression module. FIG. 5 shows an exemplary bounding box generator 200 including a first trained neural network NN1 in the form of a Faster R-CNN.
[0040] In overview, in the shrimp detection task the FASTER R-CNN object detection model first extracts high-level feature maps from the input image using a pre-trained Inception-V3 backbone deep convolutional neural network. The extracted feature maps are passed through the RPN (Region Proposal Network) layer, which returns the organism's (e.g. shrimp's) body object proposals. Afterwards, the output may be applied to an RoI (Region of Interest) pooling layer to bring all the object proposals down to the same size. Finally, these object proposals are passed through the box classifier (e.g. two fully connected layers) to classify them as organism (e.g. shrimp) or not and to predict the bounding box coordinates of the organism's (shrimp's) location.
[0041] FIG. 6 shows an example 280 including a digital image of a tray 20 having organisms 10, in this example shrimps. Each of the organisms 10 has a bounding box as a result of the detection. On the left side of FIG. 6, an image corresponding to a bounding box BB1 and a single organism is shown. A record may be generated for each organism (e.g. the one from BB1), for example a label (such as AN1, AN2, ...) and bounding box coordinates, e.g., in the format xmin, ymin, xmax, ymax, which represent offsets from a pre-defined coordinate origin in the digital image. Alternatively or in addition, the cropped image may be stored with the record.
[0042] According to various embodiments, a deep learning-based object detection algorithm (NN1) may be used to detect the organisms (e.g., live shrimps) and localize each detected organism's pixel area inside the image. For the input image, the organism detection model outputs the bounding box location of each organism's body pixel area (xmin, ymin, xmax, ymax), and may further output the number of organism detections as a count.
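As a hedged sketch of how such detection output may be post-processed (not the implementation of NN1 itself), the snippet below assumes arrays of proposal boxes and confidence scores produced by a trained detector; the 0.5 score threshold and the record format are illustrative assumptions.

```python
import numpy as np

def postprocess_detections(image, boxes, scores, score_threshold=0.5):
    """Filter detector proposals by confidence, crop each detection, and count them.

    image  : (H, W, 3) array of the digital image
    boxes  : (N, 4) array of (xmin, ymin, xmax, ymax) pixel coordinates from NN1
    scores : (N,) array of detection confidences in [0, 1]
    """
    keep = scores >= score_threshold
    kept_boxes = boxes[keep].astype(int)
    crops = [image[ymin:ymax, xmin:xmax] for xmin, ymin, xmax, ymax in kept_boxes]
    records = [{"label": f"organism{i + 1}", "bbox": box.tolist()}
               for i, box in enumerate(kept_boxes)]
    return crops, records, len(crops)  # cropped images, per-organism records, count
```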
[0043] According to various embodiments, each of the first trained neural network NN1, the second trained neural network NN2, and the third trained neural network NN3 may have been trained by using a training dataset as ground truth, which training dataset may be split into a training subset and a testing subset. The process of training may proceed until the errors become satisfactorily low (e.g., a minimum has been achieved); else the training, such as forward and backward propagation and update of network parameters, may continue. Errors may be deviations of the results from the ground truth.
[0044] According to various embodiments, for training the first neural network into the first trained neural network NN1, the training dataset may include training images (each including one or more organisms) and one or more annotated bounding boxes for each training image, in other words, one bounding box for each organism captured by the image. The annotation may be human made. Each of the organisms of each image of the training dataset may be annotated by providing a respective bounding box and/or a label. Examples of labels are labels including an indexing number, e.g., 'organism1', 'organism2', ....
[0045] According to various embodiments, for training the second neural network into the second trained neural network NN2, the training dataset may include pairs of images, e.g., each pair including an original cropped image (e.g., an image cropped according to a bounding box) and a corresponding segmentation mask. The segmentation masks may be human annotated. Cropped images may be images cropped by human manual annotation and/or by a trained neural network, e.g., the first trained neural network (such as an output of a Faster-RCNN detection model).
[0046] According to various embodiments, for training the third neural network NN3 into the third trained neural network, the training dataset may include masked representations annotated with their corresponding desired properties’ values during the data collection process. For example, the property may be weight.
[0047] According to various embodiments, the first trained neural network NN1 may be trained, e.g., by using a training dataset, which may be split into a training subset and a testing subset. The training dataset may include sets of different images of organisms, for example by using different sizes of the organisms. The images may be taken by having a fixed distance of the camera to the organisms, e.g., to a shrimp tray holding shrimps. The training dataset may include 1000 or more images, such as 5000 or more images.
[0048] In one example for data acquisition, different sizes of live shrimps (e.g., 0.1 grams to 35 grams weight of vannamei) are put inside the feed tray for 5 seconds and images are taken via a smartphone camera, e.g., with 4000 pixel x 3000 pixel resolution. Collected shrimp images may be taken at a predefined fixed distance, e.g., 28 cm, from the camera. Such an exemplary data acquisition as described above was used in examples to create a training dataset. A schematic example of acquisition of the images is explained herein in connection with FIG. 1.
[0049] In one example, each of the organisms of each image of the training dataset is annotated by providing a respective bounding box and/or a label. Accurate bounding box locations of objects and a large amount of labeled images help train a robust detector. Therefore, the ground truth labels and bounding box coordinates of the organisms inside the image are manually annotated by human inspectors before training. The annotated training dataset may then be split into a training subset for training the detector and a test subset for evaluating the detector. For example, 80% of the annotated images of the training dataset may be used for training and the rest for performance evaluation purposes.
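Such an 80/20 split might be realized, for example, with scikit-learn as sketched below; the placeholder sample list and the file-naming scheme are assumptions for illustration only.

```python
from sklearn.model_selection import train_test_split

# Placeholder list of (image, annotation) file pairs standing in for the manually
# annotated dataset; the naming scheme is an assumption for illustration.
annotated_samples = [(f"images/shrimp_{i}.jpg", f"labels/shrimp_{i}.xml") for i in range(100)]

train_set, test_set = train_test_split(annotated_samples, test_size=0.2,
                                        random_state=42, shuffle=True)
print(len(train_set), "training images;", len(test_set), "test images")
```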
[0050] In the training phase, the first neural network may update the parameter values for the fully connected layers using a backpropagation algorithm, e.g., with a stochastic gradient descent optimization technique. The process of backpropagation keeps repeating until the errors satisfy a minimum error condition; otherwise the network again propagates forward and backward and updates the network parameters.
[0051] FIG. 7 shows an example of the architecture of the second neural network NN2, in the form of a U-Net, included in the semantic segmenter 300. According to various embodiments, a U-Net segmentation model architecture may include two parts. The first part may be the contraction path (also called the encoder) used to capture the image's context. The encoder may be a traditional stack of convolutional and max-pooling layers. The second part may be a symmetric expanding path (also called the decoder), which enables precise localization using transposed convolutions. Thus, the second neural network NN2 may be, or include, an end-to-end fully convolutional network (FCN). An FCN is free of any dense layer and can accept an image of any size, e.g., containing only convolutional layers.
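A compact, non-limiting tf.keras sketch of such an encoder-decoder is given below; the input size, depth, and filter counts are illustrative assumptions and smaller than a production U-Net.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(128, 128, 3)):
    """Minimal U-Net: contraction path (encoder), symmetric expanding path
    (decoder) with skip connections, no dense layers (fully convolutional)."""
    inputs = layers.Input(shape=input_shape)

    # Encoder: stacks of convolution + max pooling.
    c1 = conv_block(inputs, 16)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 32)
    p2 = layers.MaxPooling2D()(c2)

    # Bottleneck.
    b = conv_block(p2, 64)

    # Decoder: transposed convolutions with skip connections for localization.
    u2 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.concatenate([u2, c2]), 32)
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.concatenate([u1, c1]), 16)

    # One-channel sigmoid output: organism body vs. background.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return tf.keras.Model(inputs, outputs)
```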
[0052] According to various embodiments, the semantic segmenter 300 may be configured to generate the masked representation 384 of the cropped image 381 based on the mask 383, for example to generate a masked image in which pixels of the other digital image portion, which may be free of the detected organism, may be set to a pre-defined constant value. [0053] According to various embodiments, the second trained neural network NN2 may be configured to receive a cropped image 381, which may be cropped from the digital image 30 according to a bounding box BB3 of the estimated bounding boxes BB1, BB2, BB3 and corresponds to a detected organism AN3 of the detected organisms AN1, AN2, AN3. The second trained neural network NN2 may be configured to generate a mask 383, for example in image format as a mask image. According to various embodiments, first pixels of the mask image correspond to a digital image portion 386 of the detected organism AN3 and second pixels different from the first pixels correspond to another digital image portion 387 which may be free of the detected organism AN3.
[0054] According to various embodiments generating masked representations 508 may include receiving, by the second trained neural network NN2, a cropped image 381, which has been cropped from the digital image 30 according to a bounding box BB3 of the estimated bounding boxes BB1, BB2, BB3 and corresponds to a detected organism AN3 of the detected organisms AN1, AN2, AN3. Generating masked representations 508 may include generating, e.g., by the second trained neural network NN2, a mask 383, for example in the image format as a mask image. According to various embodiments, first pixels of the mask image correspond to a digital image portion 386 of the detected organism AN3 and second pixels different from the first pixels correspond to another digital image portion 387 which may be free of the detected organism AN3. The method may be repeated for the remaining cropped images.
[0055] FIG. 8 shows an example 380 including, on the left, a cropped image corresponding to a bounding box BB3 and an organism AN3, which after being segmented by the semantic segmenter 300 results in a masked representation 384. It can be seen that the cropped image on the left partially shows another organism (AN2), which is successfully removed by semantic segmentation in the masked representation 384.
[0056] FIG. 9 shows an example 380 including, from left to right: a cropped image corresponding to a bounding box BB3 and an organism AN3; a mask 383 generated based on the cropped image; and a masked representation 384. Using FIG. 9 as illustration, according to various embodiments, the first pixels 386 may be set to a first constant value and the second pixels 387 may be set to a second constant value, such that when an operation is applied to the cropped image 381, the information of the other digital image portion 387, which is free of the detected organism AN3, is deleted, represented in FIG. 9, for illustration purposes, as the white area in 384. For example, the first pixels may be 1 (or the maximum value of each pixel) and the second pixels may be 0, and the operation may be a logical AND operation. The information of the digital image portion is kept (illustrated by shrimp AN3). According to various embodiments, the operation may be carried out, e.g., on the digital image, on a newly created cropped masked representation, or in another variant.
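The mask application described above may be expressed, for example, as an element-wise operation in NumPy; the 1/0 mask convention follows the example above, while the white (255) background fill is an assumption for illustration.

```python
import numpy as np

def apply_mask(cropped_image, mask, background_value=255):
    """Keep only the organism pixels of the cropped image.

    cropped_image : (H, W, 3) uint8 crop produced from a bounding box
    mask          : (H, W) array in which organism pixels are 1 and background is 0
    """
    organism = mask.astype(bool)
    masked = np.full_like(cropped_image, background_value)  # assumed white background
    masked[organism] = cropped_image[organism]               # keep organism pixels only
    return masked
```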
[0057] According to various embodiments, generating masked representations 508 (e.g., via semantic segmentation) is found to be useful for identifying organisms present in the digital image, counting the organisms, and extracting each organism's pixel location individually to do primary segmentation. When multiple shrimps overlap/intersect each other in the digital image, rectangular bounding box segmentation may fail to extract the organism body from the background. Therefore, a semantic segmentation deep learning algorithm (e.g., U-Net) applied to the bounding box crop image for secondary segmentation produces more reliable results. [0058] According to various embodiments, the training dataset may include pairs of images, such as original crop images and mask images. These pairs may be used for building the supervised learning-based segmentation model of the second neural network NN2. FIG. 10 shows an example of training the second neural network NN2 including a training sample 390 included in the training data; the training sample 390 includes a human manually annotated region segmentation mask (masked representation), such as mask MGT1 (regions 396 and 397), of an original crop image, such as cropped image BB. Cropped images may be images cropped by human manual annotation and/or by a trained neural network, e.g., the first trained neural network (such as an output of the previous Faster-RCNN shrimp detection model). [0059] According to various embodiments, training of the U-Net segmentation model architecture (second neural network NN2) may involve two parts. The first part may be the contraction path (also called the encoder) used to capture the image's context. The encoder may be a traditional stack of convolutional and max-pooling layers. The second part may be a symmetric expanding path (also called the decoder), which enables precise localization using transposed convolutions. Thus, the second neural network NN2 may be, or include, an end-to-end FCN. An FCN is free of any dense layer and can accept an image of any size, e.g., containing only convolutional layers.
[0060] As illustrated in FIG. 10, during training of the second neural network NN2, the training cropped images BB are encoded in the second neural network NN2 via forward propagation and the training masks MGT1 are decoded via backward propagation.
[0061] In examples, data for semantic segmentation was prepared as follows. The region of each shrimp's body part inside the image was annotated in order to train the U-Net semantic segmentation model to achieve precise segmentation results. The training images dataset ("first training dataset") included 5000 cropped shrimp images stored in memory from the output of the first trained neural network NN1 trained to detect shrimps, e.g., the Faster-RCNN detection model. Those images include background noise and overlap regions of multiple shrimps. The training images dataset also included 5000 images which are human manually annotated region segmentation masks of the cropped shrimp images; these images may be, e.g., grayscale, or have only two colors. These pairs were used for training the second neural network, e.g., building a supervised learning-based segmentation model.
[0062] In the examples, the training procedure for shrimp segmentation included a U-Net model compiled with the Adam optimizer and using a binary cross-entropy loss function because only two classes were used, namely shrimp body part and background. The process of backpropagation kept repeating until the errors became satisfactorily low, e.g., a minimum; else the process propagates forward and backward and updates the network parameters, as shown in FIG. 7.
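A non-limiting tf.keras sketch of this training configuration is given below; `build_unet` refers to the illustrative U-Net constructor sketched earlier, and the placeholder arrays, batch size, and epoch count are assumptions rather than values taken from the example.

```python
import numpy as np

# Placeholder arrays standing in for the annotated crop/mask pairs; in practice
# they are loaded from the prepared first training dataset.
train_crops = np.random.rand(32, 128, 128, 3).astype("float32")
train_masks = (np.random.rand(32, 128, 128, 1) > 0.5).astype("float32")

model = build_unet()  # illustrative constructor from the U-Net sketch above
model.compile(optimizer="adam",              # Adam optimizer
              loss="binary_crossentropy",    # two classes: shrimp body vs. background
              metrics=["accuracy"])
model.fit(train_crops, train_masks, validation_split=0.2, batch_size=8, epochs=5)
```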
[0063] According to various embodiments, and as previously described, the second trained neural network (e.g., semantic shrimp segmentation model) takes the previous detection result image from the bounding box generator (200) as input in the prediction phase, and returns as output the predicted mask region of the shrimp body. The predicted mask is then overlaid on the original input image (the cropped image) for the final segmentation result, thus generating the masked representations 384. The masked representations may be further processed by the property estimator 400 and the third trained neural network NN3 included in the property estimator 400.
[0064] According to various embodiments, the third trained neural network NN3 may include a sequence of convolution layers and max pooling layers followed by an end stage. The end stage may include one or more, e.g., 2, fully connected layers FC, which may be followed by a softmax for output. Each convolutional layer of the sequence of convolution layers may include a Rectified Linear Unit (ReLU) activation. An example of an architecture 400 of the third neural network is shown in FIG. 11.
[0065] According to various embodiments the third trained neural network NN3 may include a convolutional regression neural network.
[0066] According to various embodiments the convolutional regression neural network may be a VGG convolutional regression neural network configured to extract features from the masked representation 384.
[0067] According to various embodiments, the third trained neural network may include a fully connected layer configured to estimate the property based on the features of the convolutional layers. According to various embodiments, the convolutional regression neural network may be a VGG convolutional regression neural network, and estimating 510 a property of the detected organisms AN1, AN2, AN3 may include extracting, by the VGG convolutional regression neural network, features from the masked representation 384. Examples of the third neural network are VGG NET-D, a VGG including 16 layers (VGG16), VGG NET-E, or a variation thereof.
[0068] FIG. 12 shows an exemplary flow chart for estimating the property of the organisms. In the example, a masked representation 384 of an organism, in this case a shrimp, is shown, which is input into the property estimator 400; the output of the property estimator 400 is a weight prediction 410. [0069] FIG. 13 shows an exemplary workflow for training the third trained neural network NN3 of the property estimator 400.
[0070] As a dataset for training the third neural network NN3, the cropped shrimp images 384 in the first training dataset were annotated with their corresponding weight values during the data collection process. These segmented shrimp images with labeled weight values were then used to train the third neural network NN3, e.g., an image-based regression neural network. The weight values are used as ground truth 430.
[0071] In examples, the shrimp weight regression neural network used pre-trained parameters of the VGG-16 (ImageNet) transfer learning model as a feature extractor and added one fully connected layer with a linear activation function for weight estimation 410. In the training phase, the pre-trained parameter values of the VGG-16 model are frozen (no parameter update with backpropagation) for low-level image feature extraction. The shrimp regression network only updates the last fully connected layer's parameter values using a backpropagation algorithm with the Adam optimization technique. The backpropagation process keeps repeating until the mean square error and mean absolute error become satisfactory, e.g., a minimum (in 420); else it will again propagate forward and backward and update the network parameters.
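A hedged tf.keras sketch of this transfer-learning setup is given below; the 224x224 input size and the use of a Flatten layer before the fully connected output are assumptions not stated in the example.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_weight_regressor(input_shape=(224, 224, 3)):
    """VGG-16 (ImageNet) backbone frozen as a feature extractor, plus one
    trainable fully connected layer with linear activation for weight output."""
    backbone = tf.keras.applications.VGG16(
        include_top=False, weights="imagenet", input_shape=input_shape
    )
    backbone.trainable = False  # freeze pre-trained parameters (no backprop update)

    inputs = layers.Input(shape=input_shape)
    x = backbone(inputs, training=False)
    x = layers.Flatten()(x)
    outputs = layers.Dense(1, activation="linear")(x)  # predicted weight in grams

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="mean_squared_error",
                  metrics=["mean_absolute_error"])
    return model
```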
[0072] After training is done, the segmented input image may be passed through the shrimp weight estimation model to predict shrimp's weight as numerical values with measurement unit (e.g., grams) as shown in FIG. 12.
[0073] According to various embodiments the property may be weight, which is shown in the examples for illustration purposes.
[0074] According to various embodiments the organisms 10 may be shrimps of a same species, as shown in the examples for illustration purposes.
[0075] FIG. 14 shows an example of intermediate images during processing; the images are shown for illustration purposes, while their corresponding data may be stored in any suitable digital format. Image 32 shows an image of a tray with 3 shrimps and their corresponding bounding boxes BB1, BB2, and BB3 annotated on the image. Image 33' shows how each cropped image (of image 33) is segmented, thereby deleting the background of the image and effectively separating shrimps of overlapping bounding boxes. Each of the shrimp images is fed into the property estimator 400 including the third trained neural network NN3 for estimating a property of the detected organisms (e.g., shrimps).
[0076] FIG. 15 shows tables 610 and 620 showing a property of the detected organisms AN1, AN2, and AN3. In table 610 the property includes weight, which is 1.9 grams, 2.1 grams, and 2.5 grams for organisms AN1, AN2, and AN3, respectively. In table 620, the count of 3 is shown and the property includes the average weight of 2.167 grams.
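The summary values in table 620 follow directly from the per-organism predictions in table 610, as the short sketch below illustrates.

```python
weights = [1.9, 2.1, 2.5]               # per-organism weight predictions in grams (table 610)
count = len(weights)                     # 3
average_weight = sum(weights) / count    # 6.5 / 3 = 2.1667 grams
print(count, round(average_weight, 3))   # 3 2.167, as in table 620
```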
[0077] The present disclosure describes an end-to-end deep-learning-based organism size estimation approach that does not require manual handcrafted measurement features (such as length, width, and carapace length) like conventional machine learning techniques, and does not use pre-determined thresholding-based image processing algorithms. [0078] According to various embodiments, to do the organism (e.g., shrimp) size estimation, a farmer needs to take pictures of live shrimps inside the feed tray using a smartphone camera. According to various embodiments wherein the organisms are aquatic organisms, e.g., shrimps, the tray may be drained of water, thereby avoiding distortions of the image. In some embodiments, if the camera is at a pre-determined distance or distance range to the feed tray, the image is taken automatically. The resulting image is then passed to the organism size estimation AI model to obtain the shrimp count and the estimated individual organism weights inside the mobile application.
[0079] According to various embodiments, a non-transitory computer-readable medium storing computer executable code including instructions for quantifying organisms according to the method disclosed herein is provided.
According to various embodiments, a computer executable code including instructions for quantifying organisms according to the method disclosed herein is provided.
[0080] While the disclosure has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced.

Claims

1. An organism quantification system (100) comprising: a mobile computing device (110) comprising: a camera (102); and a controller (106) configured to control the camera (102) to take a digital image (30) of organisms (10); a distance estimator (120) configured to estimate, from the digital image (30), a distance from the camera to the organisms (10); a bounding box generator (200) comprising a first trained neural network (NN1) configured to estimate bounding boxes of detected organisms (AN1, AN2, AN3), detected on the digital image (30); a semantic segmenter (300) comprising a second trained neural network (NN2), the semantic segmenter (300) being configured to generate masked representations (384) corresponding to the bounding boxes (BB1, BB2, BB3); and a property estimator (400) comprising a third trained neural network (NN3) configured to estimate a property of the detected organisms (AN1, AN2, AN3) based on the masked representation (384).
2. The organism quantification system (100) of claim 1, wherein the first trained neural network (NN1) is a Faster Region-based Convolutional Neural Network (FASTER R-CNN).
3. The organism quantification system (100) of claim 1 or claim 2, wherein the second trained neural network (NN2) is configured to: receive a cropped image (381), which is cropped from the digital image (30) according to a bounding box (BB3) of the estimated bounding boxes (BB1, BB2, BB3) and corresponds to a detected organism (AN3) of the detected organisms (AN1, AN2, AN3); and generate a mask (383) wherein: first pixels correspond to a digital image portion (386) of the detected organism (AN3) and second pixels different from the first pixels correspond to another digital image portion (387) which is free of the detected organism (AN3).
4. The organism quantification system (100) of any of the previous claims, wherein the semantic segmenter (300) is configured to generate the masked representation (384) of the cropped image (381) based on the mask (383).
5. The organism quantification system (100) of any of the previous claims, wherein the third trained neural network (NN3) comprises a convolutional regression neural network.
6. The organism quantification system (100) of claim 5, wherein the convolutional regression neural network is a VGG convolutional regression neural network configured to extract features from the masked representation (384).
7. The organism quantification system (100) of claim 6, wherein the third trained neural network comprises a fully connected layer configured to estimate the property based on the features.
8. The organism quantification system (100) of any of the previous claims, wherein the property is weight.
9. The organism quantification system (100) of any of the previous claims, wherein the organisms (10) are shrimps of a same species.
10. A method (500) for quantifying organisms comprising: estimating (502), by a distance estimator (120), a distance from the camera (102) to the organisms (10); determining (504) that the distance is within a pre-determined range and taking a digital image (30) of organisms (10) with a camera (102); generating (506), by a bounding box generator (200) comprising a first trained neural network (NN1), estimated bounding boxes (BB1, BB2, BB3) corresponding to detected organisms (AN1, AN2, AN3) detected on the digital image (30); generating (508), by a semantic segmenter (300) comprising a second trained neural network (NN2), masked representations (384) corresponding to the bounding boxes (BB1, BB2, BB3); and estimating (510), by a property estimator (400) comprising a third trained neural network (NN3), a property of the detected organisms (AN1, AN2, AN3) based on the masked representations (384).
11. The method (500) of claim 10, wherein the first trained neural network (NN1) is a Faster Region-based Convolutional Neural Network (FASTER R-CNN).
12. The method (500) of claim 10 or claim 11, wherein generating masked representations (508) comprises: receiving, by the second trained neural network (NN2), a cropped image (381), which has been cropped from the digital image (30) according to a bounding box (BB3) of the estimated bounding boxes (BB1, BB2, BB3) and corresponds to a detected organism (AN3) of the detected organisms (AN1, AN2, AN3); and generating a mask (383) wherein: first pixels correspond to a digital image portion (386) of the detected organism (AN3) and second pixels different from the first pixels correspond to another digital image portion (387) which is free of the detected organism (AN3).
13. The method (500) of any of claims 10 to 12, wherein generating masked representations (508) is based on the mask (383).
14. The method (500) of any of the claims 10 to 13, wherein the third trained neural network (NN3) comprises a convolutional regression neural network.
15. The method (500) of claim 14, wherein the convolutional regression neural network is a VGG convolutional regression neural network and wherein estimating (510) a property of the detected organisms (AN1, AN2, AN3) comprises extracting, by the VGG convolutional regression neural network, features from the masked representation (384).
16. The method (500) of claim 15, wherein the third trained neural network comprises a fully connected layer configured to estimate the property based on the features.
17. The method (500) of any of the claims 10 to 16, wherein the property is weight.
18. The method (500) of any of the claims 10 to 17, wherein the organisms (10) are shrimps of a same species.
19. A non-transitory computer-readable medium storing computer executable code comprising instructions for quantifying organisms according to the method of any one of claims 10 to 18.
20. A computer executable code comprising instructions for quantifying organisms according to the method of any one of claims 10 to 18.
PCT/EP2021/053025 2021-02-09 2021-02-09 System, method, and computer executable code for organism quantification WO2022171267A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180093271.4A CN116830162A (en) 2021-02-09 2021-02-09 Systems, methods, and computer-executable code for organism quantification
PCT/EP2021/053025 WO2022171267A1 (en) 2021-02-09 2021-02-09 System, method, and computer executable code for organism quantification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2021/053025 WO2022171267A1 (en) 2021-02-09 2021-02-09 System, method, and computer executable code for organism quantification

Publications (1)

Publication Number Publication Date
WO2022171267A1 true WO2022171267A1 (en) 2022-08-18

Family

ID=74572791

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/053025 WO2022171267A1 (en) 2021-02-09 2021-02-09 System, method, and computer executable code for organism quantification

Country Status (2)

Country Link
CN (1) CN116830162A (en)
WO (1) WO2022171267A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200184206A1 (en) * 2018-01-25 2020-06-11 X Development Llc Fish biomass, shape, and size determination
WO2019232247A1 (en) * 2018-06-01 2019-12-05 Aquabyte, Inc. Biomass estimation in an aquaculture environment
CN111860652A (en) * 2020-07-22 2020-10-30 中国平安财产保险股份有限公司 Method, device, equipment and medium for measuring animal body weight based on image detection

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JUN KYUNGKOO ET AL: "Estimating pig weights from images without constraint on posture and illumination", COMPUTERS AND ELECTRONICS IN AGRICULTURE, vol. 153, 1 October 2018 (2018-10-01), AMSTERDAM, NL, pages 169 - 176, XP055821565, ISSN: 0168-1699, DOI: 10.1016/j.compag.2018.08.006 *
KNAPIK MATEUSZ ET AL: "Evaluation of Deep Learning Strategies for Underwater Object Search", 2019 FIRST INTERNATIONAL CONFERENCE ON SOCIETAL AUTOMATION (SA), IEEE, 4 September 2019 (2019-09-04), pages 1 - 6, XP033677387 *
KONOVALOV DMITRY A ET AL: "Automatic Weight Estimation of Harvested Fish from Images", 2019 DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), IEEE, 2 December 2019 (2019-12-02), pages 1 - 7, XP033683085, DOI: 10.1109/DICTA47822.2019.8945971 *
RAHMAN ARIF ET AL: "Distance estimation of colored objects in image", AIP CONFERENCE PROCEEDINGS, vol. 1746, 1 January 2016 (2016-01-01), NEW YORK, US, pages 1 - 7, XP055850888, ISSN: 0094-243X, Retrieved from the Internet <URL:https://aip.scitation.org/doi/pdf/10.1063/1.4953994> DOI: 10.1063/1.4953994 *
RONNEBERGER OLAF ET AL: "U-Net: Convolutional Networks for Biomedical Image Segmentation", 18 November 2015, ICIAP: INTERNATIONAL CONFERENCE ON IMAGE ANALYSIS AND PROCESSING, 17TH INTERNATIONAL CONFERENCE, NAPLES, ITALY, SEPTEMBER 9-13, 2013. PROCEEDINGS; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER, BERLIN, HEIDELBERG, PAGE(S) 234 - 2, ISBN: 978-3-642-17318-9, XP047331005 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024072597A1 (en) * 2022-09-28 2024-04-04 X Development Llc Adjustable aquaculture camera mounting system

Also Published As

Publication number Publication date
CN116830162A (en) 2023-09-29

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 202180093271.4; Country of ref document: CN)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21704262; Country of ref document: EP; Kind code of ref document: A1)