WO2019015785A1 - Method and system for training a neural network to be used for semantic instance segmentation - Google Patents

Method and system for training a neural network to be used for semantic instance segmentation

Info

Publication number
WO2019015785A1
Authority
WO
WIPO (PCT)
Prior art keywords
vectors
neural network
vector
loss function
template image
Application number
PCT/EP2017/068550
Other languages
French (fr)
Inventor
Hiroaki Shimizu
Davy NEVEN
Bert DE BRABANDERE
Luc Van Gool
Marc PROESMANS
Nico CORNELIS
Original Assignee
Toyota Motor Europe
Katholieke Universiteit Leuven
Application filed by Toyota Motor Europe and Katholieke Universiteit Leuven
Priority to PCT/EP2017/068550 (WO2019015785A1)
Priority to JP2020502990A (JP6989688B2)
Publication of WO2019015785A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Definitions

  • The center of the vectors of an element may be determined as the mean vector of all the vectors belonging to a same element of the template image.
  • Because the loss function treats all the vectors in a computationally efficient manner, it is possible to obtain a loss which reaches the target value quickly and which actually means that all the vectors meet the expected requirements. A fast convergence of the training is obtained.
  • The loss function decreases until reaching the target value at least when, for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases until this distance is less than or equal to a first predefined distance threshold.
  • The loss function decreases until reaching the target value at least when the distances between all the centers of the vectors of each element increase until each of these distances is greater than or equal to a second predefined distance threshold.
  • The loss function is:
  • L = α·Lvar + β·Ldist, with
  • Lvar = (1/C) Σc (1/Nc) Σi [max(0, ‖μc − xi‖ − δv)]²
  • Ldist = (1/(C(C−1))) Σ(cA≠cB) [max(0, 2δd − ‖μcA − μcB‖)]²
  • where C is the number of elements, Nc the number of vectors in element c, xi the i-th vector of element c, μc the center of the vectors of element c, ‖·‖ the distance, δv the first predefined distance threshold and 2δd the second predefined distance threshold.
  • This loss function can be computed efficiently.
  • The loss function is further defined so as to decrease until reaching the target value at least when the distance between each center of the vectors of each element and the origin of the space of the vectors decreases.
  • In this case, the loss function comprises an additional term and is:
  • L = α·Lvar + β·Ldist + γ·Lreg
  • Lreg is a term which pulls the vectors towards the origin of the space of the vectors, for example Lreg = (1/C) Σc ‖μc‖.
  • γ is preferably much less than α or β as it plays a less preponderant role in the loss function.
  • For example, α or β can have a value equal to 1 and γ can be 0.001.
  • Preferably, the coordinates of each pixel of an image inputted to the neural network are also inputted to the neural network.
  • Without this information, elements which have a similar appearance and are arranged in a specific manner may not be considered as two separate instances or elements.
  • With the coordinates, the neural network receives enough information to differentiate the two elements.
  • The invention also provides a method for semantic instance segmentation comprising using the neural network trained using the above-defined method.
  • The method further comprises a post-processing step in which the mean-shift algorithm or the k-means algorithm is applied to the vectors outputted by the neural network.
  • Since the vectors are likely to be placed in distinct and separate hyperspheres, the implementation of the mean-shift algorithm or of the k-means algorithm is facilitated. These algorithms facilitate the identification of pixels belonging to an object.
  • The invention also provides a system for training iteratively a neural network to be used for semantic instance segmentation, wherein, for each iteration, the neural network is configured to output a vector for each pixel of a template image,
  • wherein the template image comprises predefined elements each associated with pixels of the template image and the corresponding vectors.
  • The system comprises a module for calculating a loss using a loss function for each iteration, the loss function being defined so as to decrease until reaching a target value at least when:
  - for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases, and
  - the distances between all the centers of the vectors of each element increase.
  • This system may be configured to perform all the embodiments of the method for training a neural network as defined above.
  • The invention also provides a system for image semantic instance segmentation comprising the neural network trained using the method for training a network as defined above.
  • The steps of the method for training a neural network and/or the steps of the method for semantic instance segmentation are determined by computer program instructions.
  • The invention is also directed to a computer program for executing the steps of a method as described above when this program is executed by a computer.
  • This program can use any programming language and take the form of source code, object code or a code intermediate between source code and object code, such as a partially compiled form, or any other desirable form.
  • The invention is also directed to a computer-readable information medium containing instructions of a computer program as described above.
  • The information medium can be any entity or device capable of storing the program.
  • The medium can include storage means such as a ROM, for example a CD-ROM or a microelectronic circuit ROM, or magnetic storage means, for example a diskette (floppy disk) or a hard disk.
  • The information medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.
  • FIG. 1 is a block diagram of an exemplary method for training a neural network
  • FIG. 2 is a block diagram of an exemplary semantic instance segmentation method
  • FIG. 3 is a schematic diagram of a system for training a neural network and a system for semantic instance segmentation
  • FIG. 4 is a representation of the vectors outputted by a neural network
  • FIG. 5 illustrates the training of a neural network
  • FIG. 6 illustrates the effect of inputting the coordinates of pixels to the neural network.
  • This training is performed using a template image 1 comprising various elements, for example elements of the same type which may or may not overlap (for example two overlapping cars).
  • Each element in the template image is previously known, and each pixel of this template image has a previously known association with an element (for example car number 1, car number 2, background, etc.).
  • The neural network to be trained transforms the template image into a plurality of vectors, each vector corresponding to a pixel of the template image.
  • This plurality of vectors is sometimes called a tensor by the person skilled in the art, and this tensor has the same height and width as the template image, but a different depth, equal to the length of the vectors.
  • The length of the vectors can be chosen depending on the neural network to be trained, or depending on the application. All the vectors have the same length and they all belong to the same vector space.
  • Vectors outputted by a neural network are sometimes called pixel embeddings by the person skilled in the art.
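  • As a purely illustrative sketch (shapes only; the array names are not from the application), the relationship between a template image and the outputted tensor of pixel embeddings can be written as:

```python
import numpy as np

H, W, D = 4, 6, 8            # image height, width, and vector (embedding) length
image = np.zeros((H, W, 3))  # an RGB template image

# The network outputs one D-dimensional vector per pixel: a tensor with
# the same height and width as the template image, but a depth of D.
embeddings = np.zeros((H, W, D))

assert embeddings.shape[:2] == image.shape[:2]  # same height and width
vectors = embeddings.reshape(-1, D)             # one pixel embedding per pixel
print(vectors.shape)  # (24, 8)
```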
  • Preferably, the neural network is initially, before the training, a neural network already able to perform semantic segmentation.
  • The skilled person will know which neural networks can already perform semantic segmentation.
  • For example, the neural network may be a neural network known to the skilled person under the name "SegNet" and described in the document "A deep convolutional encoder-decoder architecture for image segmentation" (V. Badrinarayanan et al., arXiv preprint arXiv:1511.00561, 2015), or a neural network described in the document "Fully convolutional networks for semantic segmentation" (J. Long et al., CVPR, 2015).
  • The neural network outputs vectors referenced 2 on figure 1.
  • In a step E02, the loss is calculated using a loss function which delivers a scalar value, positive or zero, often called a loss.
  • The loss function L can be a linear combination of two terms: L = α·Lvar + β·Ldist, with:
  • α: a predefined constant, preferably positive, for example equal to 1
  • β: a predefined constant, preferably positive, for example equal to 1
  • Lvar: a term which decreases until reaching zero at least when, for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases
  • Ldist: a term which decreases until reaching zero at least when the distances between all the centers of the vectors of each element increase.
  • α and β may be chosen through a grid search or a hyperparameter search with an evaluation performed on a validation set. This can be performed by trying different settings in a structured way so as to choose the best values for α and β.
  • The loss function can decrease until reaching zero at least when, for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases until this distance is less than or equal to a first predefined distance threshold δv.
  • For example, Lvar = (1/C) Σc (1/Nc) Σi [max(0, ‖μc − xi‖ − δv)]², with C the number of elements, Nc the number of vectors in element c, xi the i-th vector of element c, and μc the center of the vectors of element c.
  • The distance can be the L1 or L2 distance well known to the person skilled in the art.
  • The loss function can decrease until reaching zero at least when the distances between all the centers of the vectors of each element increase until each of these distances is greater than or equal to a second predefined distance threshold.
  • Ldist and Lvar are defined so as to ensure that when the loss is equal to zero all the vectors associated with an object are located inside a hypersphere having a radius equal to δv and the centers of all the hyperspheres are separated by at least 2δd.
  • Preferably, δd is superior to 2δv.
  • The loss function can be further defined so as to decrease until reaching zero at least when the distance between each center of the vectors of each element and the origin of the space of the vectors decreases.
  • In this case, the loss function comprises an additional term Lreg and is: L = α·Lvar + β·Ldist + γ·Lreg.
  • γ is preferably much less than α or β as it plays a less preponderant role in the loss function.
  • For example, α or β can have a value equal to 1 and γ can be 0.001.
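  • As an illustrative sketch of the above terms (the function name, the exact normalization and the default values of δv and δd are assumptions, not taken from the application), the loss can be written as:

```python
import numpy as np

def instance_loss(vectors, labels, delta_v=0.5, delta_d=1.5,
                  alpha=1.0, beta=1.0, gamma=0.001):
    """Sketch of a loss with the stated properties.

    vectors: (N, D) array, one embedding per pixel.
    labels:  (N,) array, the known element of each pixel.
    """
    elements = np.unique(labels)
    # Center of each element: the mean of the vectors belonging to it.
    centers = np.array([vectors[labels == c].mean(axis=0) for c in elements])

    # Lvar: pulls each vector to within delta_v of its element's center.
    l_var = 0.0
    for c, mu in zip(elements, centers):
        dists = np.linalg.norm(vectors[labels == c] - mu, axis=1)
        l_var += np.mean(np.maximum(0.0, dists - delta_v) ** 2)
    l_var /= len(elements)

    # Ldist: pushes every pair of centers at least 2 * delta_d apart.
    l_dist, n_pairs = 0.0, 0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            gap = np.linalg.norm(centers[i] - centers[j])
            l_dist += max(0.0, 2 * delta_d - gap) ** 2
            n_pairs += 1
    if n_pairs:
        l_dist /= n_pairs

    # Lreg: a small pull of the centers towards the origin.
    l_reg = np.mean(np.linalg.norm(centers, axis=1))

    return alpha * l_var + beta * l_dist + gamma * l_reg
```

  • With tight clusters whose centers are far apart, the hinged terms Lvar and Ldist vanish and only the small γ-weighted term remains, so the loss approaches its target value.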
  • Once the loss of step E02 has been calculated, if this loss is equal to zero then it is considered that the neural network is trained. Alternatively, it can be considered that the neural network is trained when the loss is below a predefined threshold.
  • Step E03 is then performed and, in this step, the parameters or weights of the neural network are adjusted using the loss calculated in step E02.
  • Step E03 can be performed for example using the method known to the skilled person as stochastic gradient descent.
  • The method then comprises a step E04 which consists in performing at least steps E01 and E02 again with the adjusted neural network.
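  • The iteration of steps E01 to E04 can be sketched as follows; for simplicity, a toy stand-in is used in which the "network output" is adjusted directly with a hand-derived gradient of a simple pull-to-center term, instead of backpropagating through a real network:

```python
import numpy as np

def pull_loss(vectors, labels):
    """Toy Lvar-style loss: mean squared distance of each vector to the
    center of the vectors of its element."""
    loss = 0.0
    elements = np.unique(labels)
    for c in elements:
        mu = vectors[labels == c].mean(axis=0)
        loss += np.mean(np.sum((vectors[labels == c] - mu) ** 2, axis=1))
    return loss / len(elements)

rng = np.random.default_rng(0)
vectors = rng.normal(size=(20, 2))      # E01: the "network output"
labels = np.array([0] * 10 + [1] * 10)  # known pixel-to-element association

lr, history = 0.1, []
for iteration in range(50):                     # E04: iterate
    history.append(pull_loss(vectors, labels))  # E02: calculate the loss
    for c in np.unique(labels):                 # E03: adjust by gradient descent
        mask = labels == c
        mu = vectors[mask].mean(axis=0)
        # gradient of the squared distance to the (fixed) center
        vectors[mask] -= lr * 2 * (vectors[mask] - mu)
print(history[0], history[-1])  # the loss decreases towards zero
```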
  • Training the neural network can be performed on a plurality of different template images. Once the neural network has been trained using the method disclosed on figure 1, it can be used for semantic instance segmentation, as represented on figure 2.
  • The method of figure 2 is performed on an image referenced 3, for example an image which has been acquired by a camera.
  • This image can comprise, for example, two partially overlapping cars and two partially overlapping pedestrians.
  • In a step E11, the image 3 is inputted to the trained neural network so as to perform semantic instance segmentation.
  • Vectors 4 are obtained as the output of the trained neural network.
  • Then, a post-processing step E12 is carried out.
  • Because the neural network has been trained using the above-defined loss function, the vectors are close to being in separate hyperspheres. It should be noted that in most cases, and when used on a real image (and not a template image), the loss is typically slightly above zero.
  • A sub-step E120 is performed in which the k-means or mean-shift algorithm is used on the vectors so as to group together in clusters the pixels which should belong to the same object. This increases the robustness of the method.
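  • A minimal sketch of such a grouping (a greedy threshold-based variant in the spirit of mean-shift; the function name and radius value are illustrative assumptions):

```python
import numpy as np

def cluster_embeddings(vectors, radius=1.5):
    """Greedy grouping: each still-unassigned vector seeds a new cluster
    that absorbs every unassigned vector within `radius` of the seed."""
    labels = np.full(len(vectors), -1)
    next_label = 0
    for i in range(len(vectors)):
        if labels[i] != -1:
            continue
        close = np.linalg.norm(vectors - vectors[i], axis=1) <= radius
        labels[close & (labels == -1)] = next_label
        next_label += 1
    return labels
```

  • If the training has placed the embeddings in well-separated hyperspheres, a radius of the order of δd recovers one cluster, and hence one instance, per element.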
  • A final image 5 with semantic instance segmentation is outputted.
  • The system S1, which can be a computer, comprises a processor PR1 and a non-volatile memory MEM1.
  • A set of instructions INST1 is stored in the non-volatile memory MEM1.
  • The set of instructions INST1 comprises instructions to perform a method for training a neural network to perform semantic instance segmentation, for example the method described in reference to figure 1.
  • The non-volatile memory MEM1 further comprises a neural network NN and at least one template image TIMG.
  • The neural network NN can be used in a separate system S2 configured to perform semantic instance segmentation.
  • The neural network NN can be communicated to the system S2 using a communication network INT, for example the Internet.
  • The system S2 comprises a processor PR2 and a non-volatile memory MEM2 in which a set of instructions INST2 is stored to perform semantic instance segmentation using an image IMG stored in the non-volatile memory MEM2 and the trained neural network TNN also stored in the non-volatile memory MEM2.
  • Figure 4 is a schematic representation of the vectors outputted by a neural network.
  • In this example, the neural network which has been used is not fully trained and the loss is not equal to zero or below a predefined threshold. Also, in this example and for the sake of simplicity, the neural network outputs vectors having a length of 2, which allows using a two-dimensional representation.
  • The various vectors outputted by the neural network are represented as dots 10, 20, and 30, each associated with a pixel of the template image inputted to the neural network.
  • Each pixel of the template image has a known association with an element visible on the template image.
  • The same is also true for the vectors outputted by the neural network:
  • the vectors referenced 10 are all associated with a first object,
  • the vectors referenced 20 are all associated with a second object, and
  • the vectors referenced 30 are all associated with a third object. Even if the training of the neural network is still being performed, the vectors 10, 20 and 30 already substantially form clusters of vectors respectively referenced C1, C2, and C3. It is then possible to determine the centers 11, 12, and 13, respectively of cluster C1, cluster C2, and cluster C3.
  • The loss function is defined so that the vectors 10 get closer (after each iteration of the training) to the center 11, and more precisely so that the vectors 10 are within a distance from the center which is less than a first predefined distance threshold δv.
  • The vectors 10 are thus expected to all be inside the circle having a radius δv and a center 11 represented on the figure,
  • the vectors 20 are expected to all be inside the circle having a radius δv and a center 12 represented on the figure, and
  • the vectors 30 are expected to all be inside the circle having a radius δv and a center 13 represented on the figure.
  • The loss function is further defined so that the centers 11, 12 and 13 get further away from each other (after each iteration of the training), and more precisely so that the centers are each separated by at least a second predefined distance threshold equal to 2δd.
  • The circles corresponding to this threshold and having centers 11, 12 and 13 are also represented on the figure.
  • The movements of the vectors carried out to move the clusters away from each other are represented using thick arrows on the figure.
  • Figure 5 illustrates the training of a neural network through various representations.
  • A template image 100 is represented on the figure, and this image 100 is a photograph of a plant having a variety of leaves and a background to be segmented.
  • Each pixel in the template image 100 has a known association with a specific leaf and it is possible to represent the image 100 as the final segmented image 200 shown below the template image 100.
  • The skilled person may refer to the segmented image 200 as the "ground truth".
  • The row referenced 300 on figure 5 represents the positions of the vectors outputted by the neural network (in this example, the output of the network is in two dimensions) at seven different stages of the training of the neural network, in consecutive order from left to right. This training is done by using the stochastic gradient descent method to adjust the neural network after each training iteration.
  • The seven different stages represented in the row 300 correspond respectively to 0, 2, 4, 8, 16, 32, and 64 adjustments to the neural network using the stochastic gradient descent method.
  • The row referenced 400 represents the output of the neural network without a post-processing step.
  • The images of this row are obtained by taking the output of the neural network, which delivers vectors having two dimensions, and using each component of each vector respectively as a red value and as a green value, the blue value being set at zero (the figure is in greyscale).
  • The row referenced 500 represents the result of a post-processing step in which a thresholding is performed with a radius equal to δd.
  • Figure 6 illustrates the effect of inputting the coordinates of pixels to the neural network.
  • The wording "location awareness" refers to the inputting of the coordinates of each pixel to the neural network.
  • The output of the neural network (vectors and corresponding image) is then shown for the two cases, which are with and without location awareness.
  • Without location awareness, the neural network has difficulty differentiating the two squares when they are respectively close to the upper left corner and the lower right corner.
  • With location awareness, the neural network is always able to differentiate the two squares.
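  • Location awareness can be sketched as appending two channels holding each pixel's normalized coordinates to the image before it is inputted to the neural network (an assumed implementation; the application does not specify the exact encoding):

```python
import numpy as np

def add_coordinate_channels(image):
    """Append normalized x and y coordinate channels to an H x W x C image,
    so that two identical-looking elements at different positions produce
    different inputs to the network."""
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.linspace(0.0, 1.0, h), np.linspace(0.0, 1.0, w),
                         indexing="ij")
    return np.concatenate([image, xs[..., None], ys[..., None]], axis=2)
```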
  • The above-described embodiments allow obtaining a neural network which can be used for semantic instance segmentation with good results.


Abstract

A method for training iteratively a neural network to be used for semantic instance segmentation, wherein, for each iteration, the neural network outputs a vector (10, 20, 30) for each pixel of a template image, wherein the template image comprises predefined elements each associated with pixels of the template image and the corresponding vectors, characterized in that training the neural network is performed using a loss function defined so that the loss function decreases until reaching a target value at least when: - for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases, and - the distances between all the centers of the vectors of each element increase. The invention also concerns a method for semantic instance segmentation and corresponding systems.

Description

Method and system for training a neural network to be used for semantic instance segmentation
Field of the invention
The present invention relates to the field of semantic instance segmentation, and more precisely to the training of neural networks used for semantic instance segmentation.
Description of the Related Art
Semantic instance segmentation is a method for determining the types of objects in an image, for example acquired by a camera, while being able to differentiate objects of a same type.
In the prior art, both instance segmentation methods and semantic segmentation methods have been proposed and these methods often use neural networks or deep neural networks. Semantic segmentation methods have been used to differentiate objects having different types on an image. A semantic segmentation method cannot differentiate two objects having the same type. For example, if an image to be analyzed comprises two overlapping cars and two overlapping pedestrians, a semantic segmentation method will detect an area in the image corresponding to cars and an area in the image corresponding to pedestrians. Various methods for semantic segmentation have been proposed which generally use (deep) convolutional networks.
Instance segmentation methods only aim at identifying separate objects regardless of their type. If the above-mentioned image is analyzed using an instance segmentation method then what will be detected is four separate objects. Various methods have been proposed to achieve this, and most notably methods using (deep) convolutional networks. Some known methods require specific network architectures or rely on object proposals. For example, some methods use a multistage (or cascaded) pipeline in which the object proposal (or bounding box generation) is followed by a separate segmentation and/or classification step. These methods are not satisfactory in terms of speed (because of the multistage computations) and segmentation quality (in particular when faced with occlusions).
It follows from the above that instance segmentation methods and semantic segmentation methods do not provide a complete answer on what actually appears on an image, and semantic instance segmentation methods are thus needed. A desired output of a semantic instance segmentation method for the above image could be a mask highlighting each car and each pedestrian with different colors and labels indicating, for example, car1, car2, pedestrian1, pedestrian2.
Various approaches have been proposed and most notably, some approaches suggest training a neural network to transform an image into a representation that is clustered, and in which each cluster of points corresponds to an instance (sometimes referred to as an element) in the image. This clustered representation may then be post-processed to obtain a representation of the image which highlights the different elements.
It should be noted that training a neural network is an iterative task which can be performed using template images in which each element has already been identified, and a loss-function.
A loss function typically consists in a calculation performed on the output of a neural network to determine if this output is valid, i.e. this output leads to a good detection of each element and its type. The loss function is generally a score which represents how far the output of a neural network is from an expected output.
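As a purely generic illustration (not the loss function of the invention), a loss can be as simple as the mean squared distance between the network output and the expected output:

```python
# A generic loss: a score of how far an output is from the expected output.
def mse_loss(output, expected):
    assert len(output) == len(expected)
    return sum((o - e) ** 2 for o, e in zip(output, expected)) / len(output)

print(mse_loss([1.0, 2.0], [1.0, 2.0]))  # a perfect output scores 0.0
print(mse_loss([1.0, 2.0], [0.0, 2.0]))  # a worse output scores higher: 0.5
```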
Defining a loss function for a semantic instance segmentation method is highly critical, and known loss functions are not satisfactory.
The document "Semantic Instance Segmentation via Deep Metric Learning" by Fathi et al. (hereinafter "Fathi", available for download on the arXiv pre-print server at the URL https://arxiv.org/pdf/1703.10277.pdf) discloses a known method for semantic instance segmentation.
The method of this document uses a loss function which will ensure that pixels that correspond to a same instance object are close in the space of the output of the neural network (typically, for each pixel in an image, a neural network outputs a vector). This loss function also ensures that pixels that correspond to different objects remain far from each other in the network's output representation. The loss function of this document therefore has a lower value when in the output of a neural network the vectors of pixels of a same object are close and when the vectors of pixels of different objects are far, and a higher value otherwise.
The neural network is then modified by taking into account the result of the loss function so as to obtain, in the next iteration, a lower score for the loss function.
The loss function of this document is unsatisfactory. More precisely, the loss function of this document relies on the random selection of a limited number of vectors for each object in the image, and uses extensive calculations.
The selection of a limited number of vectors also leads to a loss function which can be equal to zero while not all pixels verify the above conditions. This leads to a slow convergence of the training.
It is a primary object of the invention to provide methods and system that overcome the deficiencies of the currently available systems and methods.
Summary of the invention
The present invention overcomes one or more deficiencies of the prior art by proposing a method for training iteratively a neural network to be used for semantic instance segmentation, wherein, for each iteration, the neural network outputs a vector for each pixel of a template image, wherein the template image comprises predefined elements each associated with pixels of the template image and the corresponding vectors.
Training the neural network is performed using a loss function in which:
for each vector belonging to an element, the distance between the vector and a center of the vectors of this element is calculated,
the distances between all the centers of the vectors of each element are calculated,
so that the loss function decreases until reaching a target value (for example zero) at least when:
- for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases, and
- the distances between all the centers of the vectors of each element increase.
The target value is a value which is desired to obtain for the loss function. When the loss reaches or goes below the target value it can be considered that the training is complete. Optionally, the target value can be predetermined. In some iterations, or when used on a real image, the target value may not be reached. An element may be an object appearing on the template image. In the template image, there is a plurality of elements which may be of the same type or of different types. Each pixel in the template image has a known association with an element.
It has been observed that in the space of the vectors outputted by the neural network, a good semantic instance segmentation is obtained when:
- all the vectors of an element remain close together in a cluster of vectors, and
- clusters of vectors associated with different elements are spaced apart.
Preferably, the neural network is a neural network already able to perform semantic segmentation, and the above-defined loss function is defined so as to train the network to also perform instance segmentation. The invention advantageously applies on any already available neural network which may have been trained for semantic segmentation, without requiring modifications of the architecture of the neural network.
The inventors of the present invention have observed that using a neural network which has already been trained for semantic segmentation allows obtaining better results for semantic instance segmentation.
The above loss function allows obtaining this result. Additionally, and contrary to the loss function of document Fathi mentioned above, the loss function of the invention is defined so as to take all the vectors into account (the distance between each vector and the corresponding center is calculated, and the distances between all the centers are calculated).
The use of centers of the vectors of an element allows taking into account all the vectors while limiting the computational requirements. As a matter of fact, if all the vectors of an element are close to their corresponding center, then the vectors of this element are all close together.
Also, if all the different centers remain far away from each other, then the vectors of an element are far away from the vectors of another element.
By way of example, the center of the vectors of an element may be determined as the mean vector of all the vectors belonging to a same element of the template image.
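By way of illustration, the computation of each element's center as the mean of its vectors can be sketched as follows in plain Python. The function name and the flat list layout of `embeddings` and `labels` are assumptions made for this example, not taken from the patent text.

```python
# Sketch: compute the center of the vectors of each element as their mean
# vector. "embeddings" is a flat list of output vectors; "labels" gives
# the known element of each corresponding pixel.

def element_centers(embeddings, labels):
    """Return a dict mapping each element label to its mean vector."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for d, value in enumerate(vec):
            acc[d] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc]
            for lab, acc in sums.items()}
```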
Because the loss function treats all the vectors in a computationally efficient manner, it is possible to obtain a loss which reaches the target value quickly and which actually means that all the vectors meet the expected requirements. A fast convergence of the training is obtained.
According to a particular embodiment, the loss function decreases until reaching the target value at least when, for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases until this distance is less than or equal to a first predefined distance threshold.
Thus, when the training is finished or has converged (i.e. when the loss function has reached the target value): all the vectors of an element will be inside a hypersphere centered on the center of the vectors of this element, this hypersphere having a radius equal to the first predefined distance threshold.
According to a particular embodiment, the loss function decreases until reaching the target value at least when the distances between all the centers of the vectors of each element increase until each of the distances is greater than or equal to a second predefined distance threshold.
Thus, when the training is finished or has converged, (i.e. when the loss function has reached the target value): all the vectors of an element will be spaced apart from the vectors of another element by at least the second predefined distance threshold.
If the first and second predefined distance thresholds are used, then all the vectors from an element are within a hypersphere which is spaced apart from other hyperspheres of other elements.
It should be noted that this result cannot be obtained using the loss function disclosed in document Fathi.

According to a particular embodiment, the loss function is:

L = α · Lvar + β · Ldist

with:

$$L_{var} = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\left[\lVert \mu_c - x_i \rVert - \delta_v\right]_+^2$$

$$L_{dist} = \frac{1}{C(C-1)}\sum_{c_A=1}^{C}\;\sum_{\substack{c_B=1\\ c_B \neq c_A}}^{C}\left[2\delta_d - \lVert \mu_{c_A} - \mu_{c_B} \rVert\right]_+^2$$

α a predefined constant,
β a predefined constant,
δv the first predetermined distance threshold,
δd being half the second predetermined distance threshold,
xi a vector of index i,
[x]+ the positive part of x,
||x1 - x2|| the distance between vector x1 and vector x2,
C the number of elements in the template image,
Nc the number of vectors in element c,
μc the center of element c, and
cA and cB elements A and B.
This loss function can be computed efficiently.
According to a particular embodiment, the loss function is further defined so as to decrease until reaching the target value at least when the distances between each center of the vectors of each element and the origin of the space of the vectors decrease.
This prevents the vectors from being too far away from the origin of the space of the vectors. This feature prevents the emergence of mathematical errors (for example infinity errors known to the skilled person).
According to a particular embodiment, the loss function comprises an additional term and is:

L = α · Lvar + β · Ldist + γ · Lreg

with:

$$L_{reg} = \frac{1}{C}\sum_{c=1}^{C}\lVert \mu_c \rVert$$

γ being a predefined constant. Thus, Lreg is a term which pulls the vectors towards the origin of the space of the vectors. It should be noted that γ is preferably much less than α or β as it plays a less preponderant role in the loss function. For example, α or β can have a value equal to 1 and γ can be 0.001.
According to a particular embodiment, for each pixel of an image inputted to the neural network, the coordinates of this pixel are also inputted to the neural network.
It has been observed by the inventors that elements which have a similar appearance arranged in a specific manner (for example an element in an upper right corner and a similar element in a lower left corner) may not be considered as two separate instances or elements. By inputting the coordinates of the pixels of the template image to the neural network, the neural network receives enough information to differentiate the two elements.
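By way of illustration, appending normalized pixel coordinates as extra input channels can be sketched as follows. The function name and the row-major list-of-pixels layout are assumptions made for this example.

```python
# Sketch: append normalized (x, y) coordinates, in [0, 1], to each
# pixel's channel list, so the network can tell spatially separated but
# similar-looking elements apart.

def add_coordinate_channels(pixels, height, width):
    """pixels: row-major list of per-pixel channel lists."""
    out = []
    for row in range(height):
        for col in range(width):
            x = col / (width - 1) if width > 1 else 0.0
            y = row / (height - 1) if height > 1 else 0.0
            out.append(pixels[row * width + col] + [x, y])
    return out
```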
The invention also provides a method for semantic instance segmentation comprising using the neural network trained using the above defined method.
According to a particular embodiment, the method further comprises a post-processing step in which the mean-shift algorithm or the k-means algorithm is applied to the vectors outputted by the neural network.
In the output of the trained network, the vectors are likely to be placed in distinct and separate hyperspheres, which facilitates the implementation of the mean-shift algorithm or of the k-means algorithm. These algorithms facilitate the identification of pixels belonging to an object.
The invention also provides a system for training iteratively a neural network to be used for semantic instance segmentation, wherein, for each iteration, the neural network is configured to output a vector for each pixel of a template image,
wherein the template image comprises predefined elements each associated with pixels of the template image and the corresponding vectors.
The system comprises a module for calculating a loss using a loss function for each iteration, the loss function being defined so as to decrease until reaching a target value at least when:
- for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases, and
- the distances between all the centers of the vectors of each element increase.
This system may be configured to perform all the embodiments of the method for training a neural network as defined above.
The invention also provides a system for image semantic instance segmentation comprising the neural network trained using the method for training a network as defined above.
In one particular embodiment, the steps of the method for training a neural network and/or the steps of the method for semantic instance segmentation are determined by computer program instructions.
Consequently, the invention is also directed to a computer program for executing the steps of a method as described above when this program is executed by a computer.
This program can use any programming language and take the form of source code, object code or a code intermediate between source code and object code, such as a partially compiled form, or any other desirable form.
The invention is also directed to a computer-readable information medium containing instructions of a computer program as described above.
The information medium can be any entity or device capable of storing the program. For example, the medium can include storage means such as a ROM, for example a CD ROM or a microelectronic circuit ROM, or magnetic storage means, for example a diskette (floppy disk) or a hard disk.
Alternatively, the information medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.
Brief description of the drawings
How the present invention may be put into effect will now be described by way of example with reference to the appended drawings, in which:
- figure 1 is a block diagram of an exemplary method for training a neural network,
- figure 2 is a block diagram of an exemplary semantic instance segmentation method,
- figure 3 is a schematic diagram of a system for training a neural network and a system for semantic instance segmentation,
- figure 4 is a representation of the vectors outputted by a neural network,
- figure 5 illustrates the training of a neural network, and
- figure 6 illustrates the effect of inputting the coordinates of pixels to the neural network.
Description of the embodiments

A method for training iteratively a neural network is represented on figure 1.
This training is performed using a template image 1 comprising various elements, for example elements of the same type which may or may not overlap (for example two overlapping cars).
The position of each element in the template image is previously known, and each pixel of this template image has a previously known association with an element (for example car number 1, car number 2, background, etc.).
In a first step E01, the neural network to be trained transforms the template image into a plurality of vectors, each vector corresponding to a pixel of the template image. This plurality of vectors is sometimes called a tensor by the person skilled in the art, and this tensor has the same height and width as the template image, but a different depth equal to the length of the vectors.
The length of the vectors can be chosen depending on the neural network to be trained, or depending on the application. All the vectors have the same length and they all belong to the same vector space.
It should be noted that vectors outputted by a neural network are sometimes called pixel embedding by the person skilled in the art.
Preferably, the neural network is initially, before the training, a neural network already able to perform semantic segmentation. The skilled person will know which neural network can already perform semantic segmentation.
By way of example, the neural network may be a neural network known to the skilled person under the name "SegNet" and described in document "A deep convolutional encoder-decoder architecture for image segmentation" (V. Badrinarayanan et al., arXiv preprint arXiv:1511.00561, 2015), or a neural network described in document "Fully convolutional networks for semantic segmentation" (J. Long et al., CVPR, 2015).
The neural network outputs vectors referenced 2 on figure 1.
These vectors are then used to calculate a loss in step E02. The loss is calculated using a loss function which delivers a scalar value, positive or zero, called the loss.
More precisely, in the loss function, for each vector belonging to an element, the distance between the vector and a center of the vectors of this element is calculated,
the distances between all the centers of the vectors of each element are calculated.
These calculations are used to define a loss function which decreases (between two consecutive iterations) until reaching the target value, which is zero in the present example, at least when:
- for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases, and
- the distances between all the centers of the vectors of each element increase.
For example, the loss function L can be a linear combination of two terms:
L = a · Lvar + β · Ldist
With:
α a predefined constant (preferably positive, for example equal to 1),
β a predefined constant (preferably positive, for example equal to 1),
Lvar a term which decreases until reaching zero at least when, for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases,
Ldist a term which decreases until reaching zero at least when the distances between all the centers of the vectors of each element increase.

α and β may be chosen through a grid search or a hyperparameter search with an evaluation performed on a validation set. This can be performed by trying different settings in a structured way so as to choose the best values for α and β.
It should be noted that these values may both be set at 1.
For example, the loss function can decrease until reaching zero at least when, for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases until this distance is less than or equal to a first predefined distance threshold.
Thus, this example can be implemented by using a term Lvar written as:

$$L_{var} = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\left[\lVert \mu_c - x_i \rVert - \delta_v\right]_+^2$$

With:
δv the first predetermined distance threshold,
xi a vector of index i,
[x]+ the positive part of x,
||x1 - x2|| the distance between vector x1 and vector x2,
C the number of elements in the template image,
Nc the number of vectors in element c, and
μc the center of element c.
It should be noted that the distance can be the L1 or L2 distance well known to the person skilled in the art.
Also for example, the loss function can decrease until reaching zero at least when the distances between all the centers of the vectors of each element increase until each of the distances is greater than or equal to a second predefined distance threshold.
Thus, this example can be implemented by using a term Ldist written as:

$$L_{dist} = \frac{1}{C(C-1)}\sum_{c_A=1}^{C}\;\sum_{\substack{c_B=1\\ c_B \neq c_A}}^{C}\left[2\delta_d - \lVert \mu_{c_A} - \mu_{c_B} \rVert\right]_+^2$$

With (the notations used above for Lvar are also used for Ldist):
δd being half the second predetermined distance threshold, and
cA and cB elements A and B.

The two above-defined terms Ldist and Lvar are defined so as to ensure that, when the loss is equal to zero, all the vectors associated with an object are located inside a hypersphere having a radius equal to δv and the centers of all the hyperspheres are separated by at least 2δd.
Preferably, δd is greater than 2δv.
By way of example, it should be noted that the loss function can be further defined so as to decrease until reaching zero at least when the distances between each center of the vectors of each element and the origin of the space of the vectors decrease.
In this example, the loss function comprises an additional term Lreg and is:

L = α · Lvar + β · Ldist + γ · Lreg

With:

$$L_{reg} = \frac{1}{C}\sum_{c=1}^{C}\lVert \mu_c \rVert$$

γ being a predefined constant.

It should be noted that γ is preferably much less than α or β as it plays a less preponderant role in the loss function. For example, α or β can have a value equal to 1 and γ can be 0.001.
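The complete loss L = α · Lvar + β · Ldist + γ · Lreg can be sketched in plain Python with the L2 distance, as below. The names `embeddings` and `labels` are illustrative; a real implementation would use a deep-learning framework so that gradients can be backpropagated through the loss.

```python
# Minimal sketch of the loss described above (not the patent's code):
# L = alpha * L_var + beta * L_dist + gamma * L_reg, with the L2 distance.
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def discriminative_loss(embeddings, labels, delta_v, delta_d,
                        alpha=1.0, beta=1.0, gamma=0.001):
    # Group vectors by element and compute each center (mean vector).
    grouped = {}
    for vec, lab in zip(embeddings, labels):
        grouped.setdefault(lab, []).append(vec)
    centers = {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
               for lab, vecs in grouped.items()}

    # L_var: hinged distance of every vector to its own center.
    l_var = 0.0
    for lab, vecs in grouped.items():
        mu = centers[lab]
        l_var += sum(max(0.0, l2(mu, v) - delta_v) ** 2 for v in vecs) / len(vecs)
    l_var /= len(centers)

    # L_dist: hinged distance between every pair of distinct centers.
    labs = list(centers)
    l_dist = 0.0
    if len(labs) > 1:
        for a in labs:
            for b in labs:
                if a != b:
                    l_dist += max(0.0, 2 * delta_d - l2(centers[a], centers[b])) ** 2
        l_dist /= len(labs) * (len(labs) - 1)

    # L_reg: mean norm of the centers, pulling them towards the origin.
    l_reg = sum(math.sqrt(sum(x * x for x in mu))
                for mu in centers.values()) / len(centers)

    return alpha * l_var + beta * l_dist + gamma * l_reg
```

With two tight, well-separated clusters, L_var and L_dist are zero and only the small regularization term remains, matching the behaviour described above.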
Using for example the above defined functions, it is possible to calculate the loss of step E02. If this loss is equal to zero then it is considered that the neural network is trained. Alternatively, it can be considered that the neural network is trained when the loss is below a predefined threshold.
If the loss is above zero (or above a predefined threshold), then the training is not completed. Step E03 is then performed; in this step, the parameters (or weights) of the neural network are adjusted using the loss calculated in step E02.
Step E03 can be performed for example using the method known to the skilled person as stochastic gradient descent.
Then, the next iteration is carried out (step E04) which consists in at least performing steps E01 and E02 with the adjusted neural network.
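The iterate-until-target loop of steps E01 to E04 can be sketched generically as below. A finite-difference gradient stands in for backpropagation, and all names are illustrative assumptions for this example.

```python
# Generic sketch of the training loop: compute the loss, stop if it has
# reached the target value, otherwise adjust the parameters by gradient
# descent and iterate. Finite differences replace backpropagation here.

def train_until_target(loss_fn, params, target=0.0, lr=0.1,
                       eps=1e-6, max_iters=1000):
    for _ in range(max_iters):
        loss = loss_fn(params)
        if loss <= target:          # training is considered complete
            break
        grad = []
        for k in range(len(params)):
            bumped = list(params)
            bumped[k] += eps        # numerical gradient of the loss
            grad.append((loss_fn(bumped) - loss) / eps)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params, loss_fn(params)
```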
It should be noted that training the neural network can be performed on a plurality of different template images.

Once the neural network has been trained using the method disclosed on figure 1, it can be used for semantic instance segmentation, as represented on figure 2.
The method of figure 2 is performed on an image referenced 3, for example an image which has been acquired by a camera. This image can comprise, for example, two partially overlapping cars and two partially overlapping pedestrians.
In step Ell, the image 3 is inputted to the trained neural network so as to perform semantic instance segmentation.
Vectors 4 are obtained as the output of the trained neural network.
In order to represent the image 3 under a representation in which the different elements of the image are segmented and labelled (for example car number one in a first color, car number two in a second color, pedestrian one in a third color, and pedestrian two in a fourth color), a post-processing step E12 is carried out.
Because the neural network has been trained using the above- defined loss function, the vectors are close to being in separate hyperspheres. It should be noted that in most cases and when used on a real image (and not a template image), the loss is typically slightly above zero.
In order to facilitate the post-processing, a sub-step E120 is performed in which the k-means or mean-shift algorithm is used on the vectors so as to group together in clusters the pixels which should belong to the same object. This increases the robustness of the method.
Once the vectors (or pixels) have been grouped in separate clusters, it is possible to output an image with different colors for each cluster, and the post-processing is finished. This can be performed by thresholding around said centers with a radius which can be of δd.
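The grouping by thresholding around seeds can be sketched as below. Since the trained network pushes the vectors of a same object into a small hypersphere well separated from the others, a greedy pass that absorbs every unassigned vector within a given radius of a seed is enough on such outputs; the function name and inputs are assumptions for this example.

```python
# Illustrative post-processing sketch: greedy thresholding. Each
# unassigned vector seeds a cluster that absorbs all remaining vectors
# within `radius` of it (e.g. radius = delta_d on well-separated output).
import math

def cluster_by_threshold(vectors, radius):
    labels = [None] * len(vectors)
    next_label = 0
    for i, seed in enumerate(vectors):
        if labels[i] is not None:
            continue
        labels[i] = next_label
        for j in range(i + 1, len(vectors)):
            if labels[j] is None and math.dist(seed, vectors[j]) <= radius:
                labels[j] = next_label
        next_label += 1
    return labels
```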
A final image 5 with semantic instance segmentation is outputted.
The steps of the methods described in reference to figures 1 and 2 can be determined by computer instructions. These instructions can be executed on a processor of a computer, as represented on figure 3.
On this figure, a system for training a neural network SI has been represented. The system SI, which can be a computer, comprises a processor PR1 and a non-volatile memory MEM1. In the non-volatile memory MEM1, a set of instructions INST1 is stored. The set of instructions INST1 comprises instructions to perform a method for training a neural network to perform semantic instance segmentation, for example the method described in reference to figure 1.
The non-volatile memory MEM1 further comprises a neural network NN and at least one template image TIMG.
Once trained, the neural network NN can be used in a separate system S2 configured to perform semantic instance segmentation.
By way of example, the neural network NN can be communicated to the system S2 using a communication network INT, for example the Internet.
The system S2 comprises a processor PR2 and a non-volatile memory MEM2 in which a set of instructions INST2 is stored to perform semantic instance segmentation using an image IMG stored in the non- volatile memory MEM2 and the trained neural network TNN also stored in the non-volatile memory MEM2.
Figure 4 is a schematic representation of the vectors outputted by a neural network.
In this example, the neural network which has been used is not fully trained and the loss is not equal to zero or below a predefined threshold. Also, in this example and for the sake of simplicity, the neural network outputs vectors having a length of 2, which allows using a two-dimensional representation.
The various vectors outputted by the neural network are represented as dots 10, 20, and 30 each associated with a pixel of the template image inputted to the neural network. Each pixel of the template image has a known association with an element visible on the template image. Thus, the same is also true for the vectors outputted by the neural network.
On figure 4, the vectors referenced 10 are all associated with a first object, the vectors referenced 20 are all associated with a second object, and the vectors referenced 30 are all associated with a third object. Even if the training of the neural network is still being performed, the vectors 10, 20 and 30 already substantially form clusters of vectors respectively referenced C1, C2, and C3. It is then possible to determine the centers 11, 12, and 13, respectively of cluster C1, cluster C2, and cluster C3.
These centers are used to calculate the distances between each vector 10 and the center 11, the distances between each vector 20 and the center 12, and the distances between each vector 30 and the center 13.
Additionally, the distances between all the centers 11, 12 and 13 are computed.
The loss function is defined so that the vectors 10 get closer (after each iteration of the training) to the center 11, and more precisely so that the vectors 10 are within a distance from the center which is less than a first predefined distance threshold δν. The vectors 10 are expected to all be inside the circle having a radius δν and a center 11 represented on the figure.
Similarly, the vectors 20 are expected to all be inside the circle having a radius δν and a center 12 represented on the figure, and the vectors 30 are expected to all be inside the circle having a radius δν and a center 13 represented on the figure.
The movements of the vectors towards their center have been represented using thin arrows on the figure.
The loss function is further defined so that the centers 11, 12 and 13 get further away from each other (after each iteration of the training), and more precisely so that the centers are each separated by a second predefined distance threshold equal to 2δd.
The circles of radius δd and centers 11, 12 and 13 are also represented on the figure. The movements of the vectors carried out to move the clusters away from each other are represented using thick arrows on the figure.
Figure 5 illustrates the training of a neural network through various representations.
A template image 100 is represented on the figure, and this image 100 is a photograph of a plant having a variety of leaves and a background to be segmented.
Each pixel in the template image 100 has a known association with a specific leaf and it is possible to represent the image 100 as the final segmented image 200 shown below the template image 100. The skilled person may refer to the segmented image 200 as the "ground truth".
The row referenced 300 on figure 5 represents the positions of the vectors outputted by the neural network (in this example, the output of the network is in two dimensions) at seven different stages of the training of the neural network in consecutive order from left to right. This training is done by using the stochastic gradient descent method to adjust the neural network after each training iteration.
The seven different stages represented in the row 300 correspond respectively to 0, 2, 4, 8, 16, 32, and 64 adjustments to the neural network using the stochastic gradient descent method.
As can be seen in the last stage, the vectors are all placed in non-overlapping circles. These circles have a radius equal to δd described in reference to figure 4.
The row referenced 400 represents the output of the neural network without a post-processing step. The images of this row are obtained by taking the output of the neural network, which delivers vectors having two dimensions, and using each component of each vector respectively as a red value and as a green value, the blue value being set at zero (the figure is in greyscale).

The row referenced 500 represents the result of a post-processing step in which a thresholding is performed with a radius equal to δd.
Figure 6 illustrates the effect of inputting the coordinates of pixels to the neural network.
On this figure, three different input images are used in which two similar elements (a square) are placed at an upper left position and at a lower right position. The two squares are spaced differently in the three input images.
On the figure, the wording "location awareness" refers to the inputting of the coordinates of each pixel to the neural network.
The output of the neural network (vectors and corresponding image) is then shown for the two cases which are with and without location awareness.
It can be seen that, without location awareness, the neural network has difficulties differentiating the two squares when they are respectively close to the upper left corner and the lower right corner. However, by inputting the coordinates of the pixels of the image to the neural network, the neural network is always able to differentiate the two squares.
The above described embodiments allow obtaining a neural network which can be used for semantic instance segmentation with good results.
Very few mistakes are seen on the template images at the end of the training, because the loss function can reach a value of zero which truly indicates that the training on a template image is complete.
Using at least the metric known to the skilled person as Symmetric Best Dice (SBD, disclosed in document "Leaf segmentation in plant phenotyping: a collation study", H. Scharr et al., Machine Vision and Applications, 27(4):585-606), it is possible to obtain an SBD score of 84.2.
The neural networks obtained using the above embodiments therefore provide good results.

Claims

1. A method for training iteratively a neural network to be used for semantic instance segmentation, wherein, for each iteration, the neural network outputs a vector (10, 20, 30) for each pixel of a template image, wherein the template image comprises predefined elements each associated with pixels of the template image and the corresponding vectors,
characterized in that training the neural network is performed using a loss function (L) in which:
for each vector belonging to an element, the distance between the vector and a center (11, 12, 13) of the vectors of this element is calculated,
the distances between all the centers of the vectors of each element are calculated,
so that the loss function decreases until reaching a target value at least when:
- for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases, and
- the distances between all the centers of the vectors of each element increase.
2. The method according to claim 1, wherein the loss function decreases until reaching the target value at least when, for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases until this distance is less than or equal to a first predefined distance threshold (δv).
3. The method according to claim 1 or 2, wherein the loss function decreases until reaching the target value at least when the distances between all the centers of the vectors of each element increase until each of the distances is greater than or equal to a second predefined distance threshold (δd).
4. The method according to any one of claims 1 to 3, wherein the loss function is:

L = α · Lvar + β · Ldist

with:

$$L_{var} = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{N_c}\sum_{i=1}^{N_c}\left[\lVert \mu_c - x_i \rVert - \delta_v\right]_+^2$$

$$L_{dist} = \frac{1}{C(C-1)}\sum_{c_A=1}^{C}\;\sum_{\substack{c_B=1\\ c_B \neq c_A}}^{C}\left[2\delta_d - \lVert \mu_{c_A} - \mu_{c_B} \rVert\right]_+^2$$

α a predefined constant,
β a predefined constant,
δv the first predetermined distance threshold,
δd being half the second predetermined distance threshold,
xi a vector of index i,
[x]+ the positive part of x,
||x1 - x2|| the distance between vector x1 and vector x2,
C the number of elements in the template image,
Nc the number of vectors in element c,
μc the center of element c, and
cA and cB elements A and B.
5. The method according to any one of claims 1 to 4, wherein the loss function is further defined so as to decrease until reaching the target value at least when the distances between each center of the vectors of each element and the origin of the space of the vectors decrease.
6. The method according to the combination of claims 4 and 5, wherein the loss function comprises an additional term and is:

L = α · Lvar + β · Ldist + γ · Lreg

With:

$$L_{reg} = \frac{1}{C}\sum_{c=1}^{C}\lVert \mu_c \rVert$$

γ being a predefined constant.
7. The method according to any one of claims 1 to 6, wherein, for each pixel of an image inputted to the neural network, the coordinates of this pixel are inputted to the neural network.
8. A method for semantic instance segmentation comprising using the neural network trained using the method pursuant to any one of claims 1 to 7 on an image.
9. The method of claim 8, further comprising a post-processing step in which the mean-shift algorithm or the k-means algorithm is applied to the vectors outputted by the neural network.
10. A system for training iteratively a neural network to be used for semantic instance segmentation, wherein, for each iteration, the neural network (NN) is configured to output a vector for each pixel of a template image (TIMG),
wherein the template image comprises predefined elements each associated with pixels of the template image and the corresponding vectors,
characterized in that the system comprises a module (PR1, INST1) for calculating a loss using a loss function in which:
for each vector belonging to an element, the distance between the vector and a center of the vectors of this element is calculated,
the distances between all the centers of the vectors of each element are calculated,
so that the loss function decreases until reaching a target value at least when:
- for each vector belonging to an element, the distance between the vector and a center of the vectors of this element decreases, and
- the distances between all the centers of the vectors of each element increase.
11. A system for image semantic instance segmentation comprising the neural network trained using the method pursuant to any one of claims 1 to 7.
12. A computer program including instructions for executing the steps of a method according to any one of claims 1 to 9 when said program is executed by a computer.
13. A recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method according to any one of claims 1 to 9.
PCT/EP2017/068550 2017-07-21 2017-07-21 Method and system for training a neural network to be used for semantic instance segmentation WO2019015785A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2017/068550 WO2019015785A1 (en) 2017-07-21 2017-07-21 Method and system for training a neural network to be used for semantic instance segmentation
JP2020502990A JP6989688B2 (en) 2017-07-21 2017-07-21 Methods and systems for training neural networks used for semantic instance segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/068550 WO2019015785A1 (en) 2017-07-21 2017-07-21 Method and system for training a neural network to be used for semantic instance segmentation

Publications (1)

Publication Number Publication Date
WO2019015785A1 true WO2019015785A1 (en) 2019-01-24

Family

ID=59581854

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/068550 WO2019015785A1 (en) 2017-07-21 2017-07-21 Method and system for training a neural network to be used for semantic instance segmentation

Country Status (2)

Country Link
JP (1) JP6989688B2 (en)
WO (1) WO2019015785A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7430815B2 (en) * 2020-09-29 2024-02-13 富士フイルム株式会社 Information processing device, information processing method, and information processing program
CN112560496B (en) * 2020-12-09 2024-02-02 北京百度网讯科技有限公司 Training method and device of semantic analysis model, electronic equipment and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP3492991B2 (en) * 2000-09-13 2004-02-03 株式会社東芝 Image processing apparatus, image processing method, and recording medium
JP4799105B2 (en) * 2005-09-26 2011-10-26 キヤノン株式会社 Information processing apparatus and control method therefor, computer program, and storage medium
US10043112B2 (en) * 2014-03-07 2018-08-07 Qualcomm Incorporated Photo management

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US20080292194A1 (en) * 2005-04-27 2008-11-27 Mark Schmidt Method and System for Automatic Detection and Segmentation of Tumors and Associated Edema (Swelling) in Magnetic Resonance (Mri) Images
US9704257B1 (en) * 2016-03-25 2017-07-11 Mitsubishi Electric Research Laboratories, Inc. System and method for semantic segmentation using Gaussian random field network
CN106897390A (en) * 2017-01-24 2017-06-27 北京大学 Target precise search method based on depth measure study

Non-Patent Citations (6)

Title
Bert De Brabandere et al.: "Semantic Instance Segmentation for Autonomous Driving", 19 May 2017 (2017-05-19), XP055462033, retrieved from the Internet <URL:http://juxi.net/workshop/deep-learning-robotic-vision-cvpr-2017/papers/16.pdf> [retrieved on 20180322] *
Fathi et al.: "Semantic Instance Segmentation via Deep Metric Learning", retrieved from the Internet <URL:https://arxiv.org/pdf/1703.10277.pdf>
H. Scharr et al.: "Leaf segmentation in plant phenotyping: a collation study", Machine Vision and Applications, vol. 27, no. 4, pages 585-606, XP035857192, DOI: 10.1007/s00138-015-0737-3
J. Long et al.: "Fully convolutional networks for semantic segmentation", CVPR, 2015
Kilian Q. Weinberger et al.: "Distance Metric Learning for Large Margin Nearest Neighbor Classification", Journal of Machine Learning Research, MIT Press, Cambridge, MA, US, vol. 10, 2 June 2009 (2009-06-02), pages 207-244, XP058264216, ISSN: 1532-4435 *
V. Badrinarayanan et al.: "A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation", vol. 2, 2015

Cited By (22)

Publication number Priority date Publication date Assignee Title
US20210080590A1 (en) * 2018-08-03 2021-03-18 GM Global Technology Operations LLC Conflict resolver for a lidar data segmentation system of an autonomous vehicle
US11915427B2 (en) * 2018-08-03 2024-02-27 GM Global Technology Operations LLC Conflict resolver for a lidar data segmentation system of an autonomous vehicle
US11562171B2 (en) 2018-12-21 2023-01-24 Osaro Instance segmentation by instance label factorization
CN111507343B (en) * 2019-01-30 2021-05-18 广州市百果园信息技术有限公司 Training of semantic segmentation network and image processing method and device thereof
CN111507343A (en) * 2019-01-30 2020-08-07 广州市百果园信息技术有限公司 Training of semantic segmentation network and image processing method and device thereof
KR102510745B1 (en) 2019-02-25 2023-03-15 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 Point cloud segmentation method, computer readable storage medium and computer device
KR20210074353A (en) * 2019-02-25 2021-06-21 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 Point cloud segmentation method, computer readable storage medium and computer device
CN110766281B (en) * 2019-09-20 2022-04-26 国网宁夏电力有限公司电力科学研究院 Transmission conductor wind damage early warning method and terminal based on deep learning
CN110766281A (en) * 2019-09-20 2020-02-07 国网宁夏电力有限公司电力科学研究院 Transmission conductor wind damage early warning method and terminal based on deep learning
CN110751659A (en) * 2019-09-27 2020-02-04 北京小米移动软件有限公司 Image segmentation method and device, terminal and storage medium
CN110751659B (en) * 2019-09-27 2022-06-10 北京小米移动软件有限公司 Image segmentation method and device, terminal and storage medium
CN110765916A (en) * 2019-10-17 2020-02-07 北京中科原动力科技有限公司 Farmland seedling ridge identification method and system based on semantics and example segmentation
CN110765916B (en) * 2019-10-17 2022-08-30 北京中科原动力科技有限公司 Farmland seedling ridge identification method and system based on semantics and example segmentation
CN111028195A (en) * 2019-10-24 2020-04-17 西安电子科技大学 Example segmentation based redirected image quality information processing method and system
CN111210452A (en) * 2019-12-30 2020-05-29 西南交通大学 Certificate photo portrait segmentation method based on graph segmentation and mean shift
CN111210452B (en) * 2019-12-30 2023-04-07 西南交通大学 Certificate photo portrait segmentation method based on graph segmentation and mean shift
CN111709293B (en) * 2020-05-18 2023-10-03 杭州电子科技大学 Chemical structural formula segmentation method based on Resunet neural network
CN111709293A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Chemical structural formula segmentation method based on Resunet neural network
CN111967373B (en) * 2020-08-14 2021-03-30 东南大学 Self-adaptive enhanced fusion real-time instance segmentation method based on camera and laser radar
CN111967373A (en) * 2020-08-14 2020-11-20 东南大学 Self-adaptive enhanced fusion real-time instance segmentation method based on camera and laser radar
CN113673505A (en) * 2021-06-29 2021-11-19 北京旷视科技有限公司 Example segmentation model training method, device and system and storage medium
CN114529191A (en) * 2022-02-16 2022-05-24 支付宝(杭州)信息技术有限公司 Method and apparatus for risk identification

Also Published As

Publication number Publication date
JP6989688B2 (en) 2022-01-05
JP2020527812A (en) 2020-09-10

Similar Documents

Publication Publication Date Title
WO2019015785A1 (en) Method and system for training a neural network to be used for semantic instance segmentation
Neamah et al. Discriminative features mining for offline handwritten signature verification
CN108197644A (en) A kind of image-recognizing method and device
CN109740606B (en) Image identification method and device
US8285056B2 (en) Method and apparatus for computing degree of matching
CN104866868A (en) Metal coin identification method based on deep neural network and apparatus thereof
JP2014041476A (en) Image processing apparatus, image processing method, and program
CN106408037A (en) Image recognition method and apparatus
JP6107531B2 (en) Feature extraction program and information processing apparatus
US11417129B2 (en) Object identification image device, method, and computer program product
JP2010067252A (en) Object region extraction device and object region extraction program
KR102166117B1 (en) Semantic matchaing apparatus and method
JP5430243B2 (en) Image search apparatus, control method therefor, and program
CN111160142B (en) Certificate bill positioning detection method based on numerical prediction regression model
JPWO2015068417A1 (en) Image collation system, image collation method and program
Omarov et al. Machine learning based pattern recognition and classification framework development
Perwej et al. The Kingdom of Saudi Arabia Vehicle License Plate Recognition using Learning Vector Quantization Artificial Neural Network
KR20190134380A (en) A Method of Association Learning for Domain Invariant Human Classifier with Convolutional Neural Networks and the method thereof
Kurlin et al. A persistence-based approach to automatic detection of line segments in images
CN114170465A (en) Attention mechanism-based 3D point cloud classification method, terminal device and storage medium
Patil et al. Deep learning-based approach for indian license plate recognition using optical character recognition
Mitra et al. Machine learning approach for signature recognition by harris and surf features detector
CN105868789B (en) A kind of target detection method estimated based on image-region cohesion
Khan et al. A new feedback-based method for parameter adaptation in image processing routines
Sun et al. An edge detection method based on adjacent dispersion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 17751279
    Country of ref document: EP
    Kind code of ref document: A1
ENP Entry into the national phase
    Ref document number: 2020502990
    Country of ref document: JP
    Kind code of ref document: A
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 17751279
    Country of ref document: EP
    Kind code of ref document: A1