EP4233014A1 - Method for classifying an input image representing a particle in a sample

Info

Publication number: EP4233014A1
Application number: EP21807185.0A
Authority: EP
Inventors: Pierre Mahé; Meriem EL AZAMI; Elodie DEGOUT-CHARMETTE; Zohreh SEDAGHAT; Quentin JOSSO; Fabian ROL
Original assignee: Biomerieux SA; Bioaster
Current assignee: Biomerieux SA; Bioaster
Priority date: 2020-10-20
Filing date: 2021-10-19
Publication date: 2023-08-30
Also published as: JP2023546191A; FR3115386A1; WO2022084618A1; CN116868237A; US20240020949A1

Abstract

The invention relates to a method for classifying at least one input image representing a target particle (11a-11f) in a sample (12), the method being characterized in that it comprises implementing, by data processing means (20) of a client (2), the steps of: (b) extracting a feature map of the target particle (11a-11f) by means of a convolutional neural network pre-trained on a public image database; (c) classifying the input image according to said extracted feature map.

Description

Method for classifying an input image representing a particle in a sample

GENERAL TECHNICAL AREA

The present invention relates to the field of optical acquisition of biological particles. The biological particles can be microorganisms such as bacteria, fungi or yeasts, for example. They can also be cells, multicellular organisms, or any other particles such as pollutant particles or dust.

The invention finds a particularly advantageous application for analyzing the state of a biological particle, for example to know the metabolic state of a bacterium following the application of an antibiotic. The invention makes it possible, for example, to carry out an antibiogram of a bacterium.

STATE OF THE ART

An antibiogram is a laboratory technique aimed at testing the phenotype of a bacterial strain against one or more antibiotics. An antibiogram is conventionally carried out by culturing a sample containing bacteria and an antibiotic.

European patent application No. 2,603,601 describes a method for carrying out an antibiogram by visualizing the state of the bacteria after an incubation period in the presence of an antibiotic. To visualize the bacteria, the bacteria are labeled with fluorescent markers to reveal their structures. The measurement of the fluorescence of the markers then makes it possible to determine whether the antibiotic has acted effectively on the bacteria.

The conventional process for determining the antibiotics that are effective on a bacterial strain consists of taking a sample containing said strain (e.g. from a patient, an animal, a food batch, etc.) then sending the sample to an analysis center. When the analysis center receives the sample, it first cultures the bacterial strain to obtain at least one colony of it, a culture step lasting between 24 and 72 hours. It then prepares from this colony several samples comprising different antibiotics and/or different concentrations of antibiotics, then incubates these samples again. After a new culture period, also of between 24 and 72 hours, each sample is analyzed manually to determine whether the antibiotic has acted effectively. The results are then transmitted to the practitioner so that the most effective antibiotic and/or antibiotic concentration can be applied.

However, the labeling process is particularly long and complex to perform, and these chemical markers have a cytotoxic effect on bacteria. It follows that this mode of visualization does not make it possible to observe the bacteria at several times during their culture, hence the need to use a sufficiently long culture time, of the order of 24 to 72 hours, to guarantee the reliability of the measurement. Other methods of viewing biological particles use a microscope, allowing non-destructive measurement of a sample.

Digital holographic microscopy (DHM) is an imaging technique that overcomes the depth-of-field constraints of conventional optical microscopy. Schematically, it consists in recording a hologram formed by the interference between the light waves diffracted by the observed object and a spatially coherent reference wave. This technique is described in the review article by Myung K. Kim entitled “Principles and techniques of digital holographic microscopy”, published in SPIE Reviews Vol. 1, No. 1, January 2010.

Recently, it has been proposed to use digital holographic microscopy to identify microorganisms in an automated way. Thus, the international application WO2017/207184 describes a method for acquiring an image of a particle that combines a simple acquisition without focusing with a digital reconstruction of the focus, making it possible to observe a biological particle while limiting the acquisition time.

Typically, this solution makes it possible to detect the structural modifications of a bacterium in the presence of an antibiotic after an incubation of only about ten minutes, and to determine its susceptibility after two hours (by detecting the presence or absence of a division, or of a pattern indicating division), unlike the conventional process previously described, which can take several days. Indeed, since the measurements are non-destructive, analyses can be carried out very early in the culture process without risking destroying the sample and therefore prolonging the analysis time.

It is even possible to follow a particle over several successive images so as to form a film representing the evolution of the particle over time (since the particles are not altered after the first analysis) in order to visualize its behavior, for example its speed of movement or its process of cell division.

It is therefore understood that the visualization process gives excellent results. The difficulty lies in interpreting these images or this film, in particular automatically, if one wishes for example to draw a conclusion on the susceptibility of a bacterium to the antibiotic present in the sample.

Various techniques have been proposed, ranging from simple counting of bacteria over time to so-called morphological analysis aimed at detecting specific "configurations" by image analysis. For example, when a bacterium prepares for division, there appear two poles in the distribution, well before the division itself which results in two distinct portions of the distribution.

It was proposed in the article Choi, J., Yoo, J., Lee, M., et al. (2014). A rapid antimicrobial susceptibility test based on single-cell morphological analysis. Science Translational Medicine, 6(267). https://doi.org/10.1126/scitranslmed.3009650 to combine the two techniques to assess an antibiotic effect. However, as underlined by the authors, their approach requires a very fine calibration of a certain number of thresholds which strongly depend on the nature of the morphological changes caused by the antibiotics.

More recently, Yu, H., Jing, W., Iriya, R., et al. (2018). Phenotypic Antimicrobial Susceptibility Testing with Deep Learning Video Microscopy. Analytical Chemistry, 90(10), 6314-6322. https://doi.org/10.1021/acs.analchem.8b01128 describes an approach based on deep learning. The authors propose to extract the morphological characteristics as well as characteristics related to the movement of bacteria using a convolutional neural network (CNN). However, this solution turns out to be very demanding in terms of computing resources, and requires a large base of training images to train the CNN.

The objective technical problem of the present invention is, therefore, to be able to have a solution that is both more efficient and lighter for classifying images of a biological particle.

PRESENTATION OF THE INVENTION

According to a first aspect, the present invention relates to a method for classifying at least one input image representing a target particle in a sample, the method being characterized in that it comprises the implementation, by data processing means of a client, of steps of:

(b) Extraction of a feature map of said target particle by means of a convolutional neural network pre-trained on a public image database;

(c) Classification of said input image based on said extracted feature map.

According to advantageous and non-limiting characteristics:

The particles are represented in a homogeneous way in the input image and in each elementary image, in particular centered and aligned according to a predetermined direction.

The method comprises a step (a) of extracting said input image from a global image of the sample, so as to represent said target particle in said homogeneous manner.

Step (a) comprises segmenting said global image so as to detect said target particle in the sample, then cropping the input image onto said detected target particle.

Step (a) comprises obtaining said overall image from an intensity image of the sample acquired by an observation device.

Step (b) is implemented using a feature extraction sub-network of said pre-trained convolutional neural network.

Said pre-trained convolutional neural network is an image classification network, in particular of the VGG, AlexNet, Inception or ResNet type.

A global pooling layer is added at the end of said feature extraction sub-network so that the extracted feature map has a spatial size of 1x1.

Step (c) is implemented by means of a classifier, the method comprising a step (a0) of learning, by data processing means of a server, the parameters of said classifier from a learning base of already classified feature maps of particles in said sample.

Said classifier is chosen from among a support vector machine, a k-nearest neighbors algorithm, or a convolutional neural network.

Step (c) involves reducing the number of variables in the feature map using the t-SNE algorithm.

The method is a method of classifying a sequence of input images representing said target particle in a sample over time, wherein step (b) comprises concatenating the feature maps extracted for each input image of said sequence.

According to a second aspect, there is proposed a system for classifying at least one input image representing a target particle in a sample comprising at least one client comprising data processing means, characterized in that said data processing means are configured to implement:

- the extraction of a feature map of said target particle by means of a convolutional neural network pre-trained on a public image database;

- the classification of said input image according to said extracted feature map.

According to advantageous and non-limiting characteristics, the system also comprises a device for observing said target particle in the sample.

According to a third and a fourth aspect, there are provided a computer program product comprising code instructions for performing a method according to the first aspect for classifying at least one input image representing a target particle in a sample; and a computer-readable storage means on which a computer program product comprises code instructions for performing a method according to the first aspect for classifying at least one input image representing a target particle in a sample.

PRESENTATION OF FIGURES

Other characteristics and advantages of the present invention will appear on reading the following description of a preferred embodiment. This description will be given with reference to the appended drawings in which:

- Figure 1 is a diagram of an architecture for implementing the method according to the invention;

- Figure 2a shows an example of a device for observing particles in a sample used in a preferred embodiment of the method according to the invention;

- Figure 3a illustrates the obtaining of the input image in an embodiment of the method according to the invention;

- Figure 3b illustrates the obtaining of the input image in a preferred embodiment of the method according to the invention;

- Figure 4 shows the steps of a preferred embodiment of the method according to the invention;

- Figure 5 shows an example of convolutional neural network architecture used in a preferred embodiment of the method according to the invention;

- Figure 6 shows an example of t-SNE projection used in a preferred embodiment of the method according to the invention.

DETAILED DESCRIPTION

Architecture

The invention relates to a method for classifying at least one input image representative of a particle 11a-11f present in a sample 12, referred to as the target particle. It should be noted that the method can be implemented in parallel for all or some of the particles 11a-11f present in a sample 12, each being considered a target particle in turn.

As will be seen, this method may include one or more machine learning components, and in particular one or more classifiers, including a convolutional neural network, CNN.

The input or training data are of the image type, and represent the target particle 11a-11f in a sample 12 (in other words, these are images of the sample in which the target particle is visible). As will be seen, one can have as input a sequence of images of the same target particle 11a-11f (and if necessary a plurality of sequences of images of particles 11a-11f of the sample 12 if several particles are considered).

Sample 12 consists of a liquid such as water, a buffer solution, a culture medium or a reactive medium (including or not including an antibiotic), in which the particles 11a-11f to be observed are found.

Alternatively, sample 12 may be in the form of a solid medium, preferably translucent, such as agar-agar, in which particles 11a-11f are found. Sample 12 can also be a gaseous medium. Particles 11a-11f can be located inside the medium or on the surface of sample 12.

The particles 11a-11f can be microorganisms such as bacteria, fungi or yeasts. They can also be cells, multicellular organisms, or any other particles such as pollutant particles or dust. In the rest of the description, we will take the preferred example in which the particle is a bacterium (and, as we will see, the sample 12 incorporates an antibiotic). The size of the particles 11a-11f observed varies between 500 nm and several hundred µm, or even a few millimeters.

The “classification” of an input image (or of a sequence of input images) consists in determining at least one class among a set of possible classes descriptive of the image. For example, in the case of bacteria-type particles, there can be a binary classification, i.e. two possible classes, “division” or “no division”, respectively indicating resistance or absence of resistance to an antibiotic. The present invention is not limited to any particular kind of classification, even if the example of a binary classification of the effect of an antibiotic on said target particle 11a-11f will mainly be described.

The present methods are implemented within an architecture such as that shown in FIG. 1, by means of a server 1 and a client 2. The server 1 is the learning equipment (implementing the learning method) and the client 2 is user equipment (implementing the classification method), for example a terminal of a doctor or a hospital.

It is quite possible for the two devices 1, 2 to be combined, but preferably the server 1 is a remote device and the client 2 is a consumer device, in particular a desktop computer, a laptop, etc. The client equipment 2 is advantageously connected to an observation device 10, so as to be able to directly acquire said input image (or, as will be seen later, “raw” acquisition data such as a global image of the sample 12, or even electromagnetic matrices), typically to process it live. Alternatively, the input image will be loaded onto the client equipment 2.

In all cases, each device 1, 2 is typically a remote computer device connected to a local network or a wide area network such as the Internet network for the exchange of data. Each comprises data processing means 3, 20 of the processor type, and data storage means 4, 21 such as a computer memory, for example a flash memory or a hard disk. Client 2 typically includes a user interface 22 such as a screen for interacting.

The server 1 advantageously stores a training database, i.e. a set of images of particles 11a-11f under various conditions (see below) and/or a set of already classified feature maps (for example, associated with “division” or “no division” labels indicating resistance or susceptibility to the antibiotic). Note that the learning data may be associated with labels defining the test conditions, for example indicating for bacteria cultures "strains", "antibiotic conditions", "time", etc.

Acquisition

Although, as explained, the present method can directly take as input any image of the target particle 11a-11f, obtained in any way, it preferably begins with a step (a) of obtaining the input image from data provided by an observation device 10.

In a known manner, those skilled in the art may use DHM digital holographic microscopy techniques, in particular as described in international application WO2017/207184. In particular, it will be possible to acquire an intensity image of the sample 12 called a hologram, which is not focused on the target particle (we speak of an “out-of-focus” image), and which can be processed by data processing means (integrated into the device 10 or those 20 of the client 2 for example, see below). It is understood that the hologram "represents" in a certain way all the particles 11a-11f in the sample.

Figure 2 illustrates an example of a device 10 for observing a particle 11a-11f present in a sample 12. The sample 12 is placed between a light source 15, which is spatially and temporally coherent (e.g. a laser) or pseudo-coherent (e.g. a light-emitting diode, a laser diode), and a digital sensor 16 sensitive in the spectral range of the light source. Preferably, the light source 15 has a low spectral width, for example less than 200 nm, less than 100 nm or even less than 25 nm. In the following, reference is made to the central emission wavelength of the light source, for example in the visible range. The light source 15 emits a coherent signal Sn directed at a first face 13 of the sample, for example conveyed by a waveguide such as an optical fiber.

The sample 12 (typically, as explained, a culture medium) is contained in an analysis chamber, delimited vertically by a lower slide and an upper slide, for example conventional microscope slides. The analysis chamber is delimited laterally by an adhesive or by any other waterproof material. The lower and upper slides are transparent to the wavelength of the light source 15, the sample and the chamber transmitting for example more than 50% of the light at the wavelength of the light source under normal incidence on the lower slide.

Preferably, the particles 11a-11f are placed in the sample 12 at the level of the upper slide. The underside of the upper slide comprises for this purpose ligands making it possible to attach the particles, for example polycations (e.g. poly-L-lysine) in the case of microorganisms. This makes it possible to contain the particles in a thickness equal to, or close to, the depth of field of the optical system, namely a thickness of less than 1 mm (e.g. tube lens), and preferably less than 100 µm (e.g. microscope objective). Particles 11a-11f can nevertheless move in the sample 12.

Preferably, the device comprises an optical system 23 consisting, for example, of a microscope objective and of a tube lens, placed in the air and at a fixed distance from the sample. The optical system 23 is optionally equipped with a filter that can be located in front of the objective or between the objective and the tube lens. The optical system 23 is characterized by its optical axis, its object plane, also called focusing plane, at a distance from the objective, and its image plane, conjugate of the object plane by the optical system. In other words, to an object located in the object plane corresponds a sharp image of this object in the image plane, also called the focal plane. The optical properties of system 23 are fixed (e.g. fixed focal length optics). The object and image planes are orthogonal to the optical axis.

The image sensor 16 is located, facing a second face 14 of the sample, in the focal plane or close to the latter. The sensor, for example a CCD or CMOS sensor, comprises a periodic two-dimensional array of sensitive elementary sites, and proximity electronics which regulate the exposure time and the resetting of the sites, in a manner known per se. The output signal of an elementary site is a function of the quantity of radiation of the incident spectral range on said site during the exposure time. This signal is then converted, for example by the proximity electronics, into an image point, or “pixel”, of a digital image. The sensor thus produces a digital image in the form of a matrix with C columns and L rows. Each pixel of this matrix, of coordinates (c, l) in the matrix, corresponds in a manner known per se to a position of Cartesian coordinates (x(c, l), y(c, l)) in the focal plane of the optical system 23, for example the position of the center of the elementary sensitive site of rectangular shape.

The pitch and the fill factor of the periodic array are chosen to respect the Shannon-Nyquist criterion with respect to the size of the particles observed, so as to define at least two pixels per particle. Thus, the image sensor 16 acquires a transmission image of the sample in the spectral range of the light source.
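The sampling constraint above can be checked with simple arithmetic. The following is a minimal sketch only: the function names, the pixel pitch and the magnification are hypothetical values chosen for illustration (the description gives none of them), and the fill factor is ignored.

```python
def pixels_per_particle(particle_size_um, sensor_pitch_um, magnification):
    """Number of sensor pixels spanned by one particle along one axis."""
    # Size of one sensor pixel projected back into the object plane.
    effective_pixel_um = sensor_pitch_um / magnification
    return particle_size_um / effective_pixel_um


def satisfies_sampling_criterion(particle_size_um, sensor_pitch_um, magnification):
    """True when the particle is covered by at least two pixels."""
    return pixels_per_particle(particle_size_um, sensor_pitch_um, magnification) >= 2.0


# Hypothetical values for illustration only: a 0.5 um particle (the smallest
# size mentioned in the description), a 4.4 um pixel pitch, a 40x magnification.
n_pixels = pixels_per_particle(0.5, 4.4, 40)
criterion_met = satisfies_sampling_criterion(0.5, 4.4, 40)
```

With these assumed values the particle spans roughly 4.5 pixels, so the criterion is met; at a 10x magnification it would no longer be.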

The image acquired by the image sensor 16 includes holographic information insofar as it results from the interference between a wave diffracted by the particles 11a-11f and a reference wave having passed through the sample without having interacted with it. It is of course understood, as described above, that in the context of a CMOS or CCD sensor, the acquired digital image is an intensity image, the phase information therefore being encoded in this intensity image.

Alternatively, it is possible to divide the coherent signal Sn coming from the light source 15 into two components, for example by means of a semi-transparent plate. The first component then serves as a reference wave and the second component is diffracted by the sample 12, the image in the image plane of the optical system 23 resulting from the interference between the diffracted wave and the reference wave.

Referring to Figure 3a, it is possible in step (a) to reconstruct from the hologram at least one global image of the sample 12, then to extract said input image from the global image of the sample.

It is in fact understood that the target particle 11a-11f must be represented in a homogeneous manner in the input image, in particular centered and aligned in a predetermined direction (for example the horizontal direction). The input images must also have a standardized size (it is also desirable that only the target particle 11a-11f be visible in the input image). The input image is thus called a “thumbnail”, and a size of 250x250 pixels can for example be defined. In the case of a sequence of input images, one image is taken for example per minute during a time interval of 120 minutes, the sequence thus forming a 3D “stack” of size 250x250x120.

The reconstruction of the global image is implemented as explained by data processing means of the device 10 or those 20 of the client 2.

Typically, one builds (for a given acquisition time) a series of complex matrices called “electromagnetic matrices”, which model, from the intensity image of the sample 12 (the hologram), the light wavefront propagated along the optical axis for a plurality of deviations from the focusing plane of the optical system 23, and in particular deviations positioned in the sample.

These matrices can be projected into real space (for example via the Hermitian norm), so as to constitute a stack of global images at various focusing distances.
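As an illustration of this projection, the sketch below reads the “Hermitian norm” as the modulus of each complex entry and builds a stack of real images indexed by focusing distance. The function names, the toy 2x2 matrices and the distances are hypothetical; the numerical propagation that would produce the electromagnetic matrices from the hologram is not shown.

```python
def project_to_real(electromagnetic_matrix):
    """Map a complex matrix to a real image by taking the modulus of each entry."""
    return [[abs(z) for z in row] for row in electromagnetic_matrix]


def build_image_stack(matrices_by_distance):
    """Return {focusing distance: real image} from {focusing distance: complex matrix}."""
    return {d: project_to_real(m) for d, m in matrices_by_distance.items()}


# Toy 2x2 complex matrices at two hypothetical focusing distances (in um).
matrices = {
    -10.0: [[3 + 4j, 0j], [1 + 0j, 0 - 2j]],
    0.0: [[6 + 8j, 1j], [0j, 5 + 12j]],
}
stack = build_image_stack(matrices)
```

Each entry of `stack` plays the role of one global image of the sample at one focusing distance.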

From there one can determine an average focusing distance (and select the corresponding global image, or recalculate it from the hologram), or even determine an optimal focusing distance for the target particle (and again select the corresponding global image, or recalculate it from the hologram).

In any case, with reference to Figure 3b, step (a) advantageously comprises segmenting said global image(s) so as to detect said target particle in the sample, then cropping the input image onto the detected target particle. In particular, said input image can be extracted from the global image of the sample so as to represent said target particle in said homogeneous manner.
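The cropping part of step (a) can be sketched as follows, assuming that the segmentation has already supplied the pixel coordinates of the center of the target particle. The function name `crop_thumbnail` and the zero padding at the image borders are assumptions made for this example; the alignment of the particle along a predetermined direction is not handled here.

```python
def crop_thumbnail(image, center_row, center_col, size, fill=0):
    """Return a size x size window of `image` centered on (center_row, center_col).

    Pixels that fall outside the image are set to `fill`.
    """
    height, width = len(image), len(image[0])
    half = size // 2
    thumbnail = []
    for r in range(center_row - half, center_row - half + size):
        row = []
        for c in range(center_col - half, center_col - half + size):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else fill)
        thumbnail.append(row)
    return thumbnail


# Toy 5x5 "global image" whose pixel value encodes its position (10 * row + column).
global_image = [[10 * r + c for c in range(5)] for r in range(5)]
thumb = crop_thumbnail(global_image, center_row=2, center_col=2, size=3)
corner = crop_thumbnail(global_image, center_row=0, center_col=0, size=3)
```

In the setting of the description, `size` would be 250 so that every thumbnail has the same standardized size, whatever the position of the particle in the global image.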

In general, the segmentation makes it possible to detect all the particles of interest, by removing artifacts such as filaments or micro-colonies, so as to improve the global image(s); one of the detected particles is then selected as the target particle, and the corresponding thumbnail is extracted. As explained, this can be done for all detected particles. The segmentation may be implemented in any known manner. In the example of FIG. 3b, one begins with a fine segmentation to eliminate the artifacts, then one implements a coarser segmentation, this time to detect the particles 11a-11f. A person skilled in the art may use any known segmentation technique.

If we wish to obtain a sequence of input images for a target particle 11a-11f, we can implement tracking techniques to follow the possible displacements of the particle from one global image to the next.

It should be noted that all the input images obtained for a sample (for several or even all of the particles of the sample 12, and this over time) can be pooled to form a descriptive base of the sample 12 (in other words a descriptive base of the experiment), as can be seen on the right of FIG. 3a, in particular copied from the storage means 21 of the client 2. For example, if the particles 11a-11f are bacteria and the sample 12 contains (or does not contain) an antibiotic, this descriptive base contains all the information on the growth, morphology, internal structure and optical properties of these bacteria over the entire field of acquisition. As we will see, this descriptive base can be transmitted to the server 1 for integration into said learning base.

Feature extraction

With reference to FIG. 4, the present method is particularly distinguished in that it separates a step (b) of extracting a feature map from the input image from a step (c) of classifying the input image based on said feature map, instead of attempting to classify the input image directly. As will be seen, each step can involve an independent machine learning mechanism, hence the fact that said learning base of the server 1 can comprise both images of particles and feature maps, which are not necessarily already classified. The main step (b) is thus a step of extraction, by the data processing means 20 of the client 2, of a feature map of said target particle, that is to say a "coding" of the target particle.

In the remainder of this description, a distinction will be made between the number of "dimensions" of the feature maps, in the geometric sense, that is to say the number of independent directions in which these maps extend (for example, a vector is an object of dimension 1, and the present characteristic maps are at least of dimension 2, advantageously of dimension 3), and the number of "variables" of these characteristic maps, i.e. the size according to each dimension, i.e. the number of independent degrees of freedom (which corresponds in practice to the notion of dimension in a vector space - more precisely, the set of feature maps having a given number of variables constitutes a vector space of dimension equal to this number of variables).

An example will thus be described below in which the feature map extracted at the end of step (b) is a three-dimensional object (i.e. of dimension 3) of size 7x7x512, thus having 25088 variables.
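The distinction between “dimensions” and “variables” can be made concrete with the 7x7x512 example above. In this short sketch the helper name `count_variables` is hypothetical; the number of dimensions is the length of the shape and the number of variables is the product of the sizes.

```python
from math import prod


def count_variables(shape):
    """Number of independent values in a feature map of the given shape."""
    return prod(shape)


feature_map_shape = (7, 7, 512)
n_dimensions = len(feature_map_shape)               # 3
n_variables = count_variables(feature_map_shape)    # 7 * 7 * 512 = 25088
```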

Here, we propose to use a convolutional neural network, CNN, for step (b). It is recalled that CNNs are particularly suitable for vision tasks. Generally, a CNN is able to directly classify an input image (i.e. to do both steps (b) and (c)).

Here, decoupling step (b) from step (c) makes it possible to limit the use of the CNN to feature extraction, and one can simply use for this step (b) a convolutional neural network pre-trained on a public image database, i.e. one for which training has already taken place independently. This is called “transfer learning”.

In other words, it is not necessary to train or retrain the CNN on the training base of images of particles 11a-11f, which can therefore be free of annotations. Indeed, it is understood that annotating thousands of images by hand would be very long and very expensive. It could also prove complex because, in the case of bacteria, it would require deciding on a division time for each bacterium, which may not be well defined at the level of the individual bacterium.

Indeed, to carry out the task of feature extraction, it suffices for the CNN to be discriminating, that is to say capable of identifying differences between images, even if it was trained on a public image database that has nothing to do with the present input images. Advantageously, said CNN is an image classification network, insofar as it is known that such networks manipulate feature maps that are especially discriminating with respect to the classes of the images, and are therefore particularly suitable in the present context of the particles 11a-11f to be classified, even though this is not the task for which the CNN was originally trained. It will be understood that detection, recognition or even image segmentation networks are particular cases of classification networks, since they in fact carry out the task of classification (of the whole image or of objects in the image) plus another task (such as determining the bounding box coordinates of classified objects for a detection network, or generating a segmentation mask for a segmentation network).

As regards the public base of training images, one can for example take the well-known public database ImageNet, which includes more than 1.5 million annotated images, and which is suitable for the supervised learning of almost any image processing CNN (for classification, recognition tasks, etc.).

Thus, one can advantageously take an “off-the-shelf” CNN without even needing to train it. Classification CNNs are known, for example of the VGG type (“Visual Geometry Group”, for example the VGG-16 model), AlexNet, Inception or even ResNet, pre-trained on the ImageNet database (i.e. they can be retrieved with their parameters initialized to the values obtained after training on ImageNet). Fig. 5 represents the architecture of VGG-16 (with 16 layers).

Generally, a CNN consists of two parts:

- A first feature extraction sub-network, most often comprising a succession of blocks composed of convolution layers and activation layers (for example the ReLU function) to increase the depth of the feature maps, terminated by a pooling layer to reduce the size of the feature map (usually by a factor of 2). So in the example of Figure 5, the VGG-16 has as explained 16 layers divided into 5 blocks. The first takes between the input image (224x224 spatial size, with 3 channels corresponding to the RGB character of the image) includes 2 convolution+ReLU sequences (a convolution layer and an activation layer with ReLLI function) raising the depth to 64 then a layer of max pooling (we can also use global average pooling), with the output of a feature map of size 112x112x64 (the first two dimensions are the spatial dimensions, and the third dimension is the depth - thus we divide by two each spatial dimension). The second block has an architecture identical to the first block and generates at the output of the last convolution+ReLU set a feature map of size 112x112x128 (doubled depth) and at the output of the max pooling layer a feature map of size 56x56x128. The third block presents this time three convolution+ReLU sets and generates from the last convolution+ReLU set a feature map of size 56x56x256 (doubled depth) and as output at the output of the max pooling layer a feature map of size 28x28x256 . The fourth and fifth blocks have an architecture identical to the third block and successively generate at output maps of characteristics of size 14x14x512 and 7x7x512 (the depth no longer increases). This characteristic card is the “final” card. It will be understood that we are limited to no card sizes at any level whatsoever, and that the sizes mentioned above are only examples.

- A second feature processing sub-network, in particular a classifier if the CNN is a classification network. This sub-network takes as input the final feature map generated by the first sub-network, and returns the expected result, for example the class of the input image if the CNN performs classification. This second sub-network typically contains one or more fully connected (FC) layers and a final activation layer, for example softmax (which is the case for VGG-16).

Both sub-networks are usually trained at the same time in a supervised manner. Thus, step (b) is preferably implemented by means of the feature extraction sub-network of said pre-trained convolutional neural network, i.e. the first part as highlighted in Figure 5 for the example of VGG-16.
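By way of non-limiting illustration, the feature map sizes quoted above for VGG-16 can be checked with the short sketch below. It assumes convolutions that preserve the spatial size and a pooling layer that halves each spatial dimension; the names `VGG16_BLOCKS`, `feature_map_shapes` and the parameter `n_blocks` are hypothetical and introduced only for this example.

```python
# Sketch: follow the feature-map shape through the five VGG-16 blocks.
# Assumption: convolutions keep the spatial size, each pooling layer halves it.

# (number of convolution+ReLU sets, output depth) for each block
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def feature_map_shapes(height=224, width=224, n_blocks=5):
    """Return the (height, width, depth) shape at the output of each block."""
    shapes = []
    for _n_convs, depth in VGG16_BLOCKS[:n_blocks]:
        # The convolutions set the depth; the pooling layer then divides
        # each spatial dimension by two.
        height, width = height // 2, width // 2
        shapes.append((height, width, depth))
    return shapes


for index, shape in enumerate(feature_map_shapes(), start=1):
    print(f"block {index}: {shape}")
```

Passing a smaller `n_blocks` stops the computation at an earlier block, which gives the size of the map that a shorter extraction sub-network would output.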

More precisely, said pre-trained CNN, such as VGG-16, is not designed to return feature maps, these being only an internal state. By "truncating" the pre-trained CNN, i.e. by using only the layers of the first sub-network, the final feature map, containing the "deepest" information, is obtained as output.

It is understood that it is also entirely possible to take as the feature extraction sub-network a part that does not go as far as the final feature map, for example only blocks 1 to 3 instead of going up to block 5. The information is then spatially more extensive but less deep.

In the case where there is a sequence of input images, step (b) thus advantageously comprises the extraction of one feature map per input image, these maps being able to be combined into a single feature map called the "profile" of the target particle. More precisely, the maps are all the same size and form a sequence of maps, so it suffices to concatenate them in the order of the input images so as to obtain a feature map of “great depth”.
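As a minimal sketch of this concatenation, the example below stacks same-sized maps along the depth dimension in the order of the input images. Feature maps are represented as nested Python lists indexed as [row][column][channel]; this representation, the function name `concatenate_along_depth` and the toy values are assumptions made for illustration.

```python
def concatenate_along_depth(feature_maps):
    """Stack same-sized H x W x D maps into one H x W x (D * T) map, in input order."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    for fmap in feature_maps:
        if len(fmap) != height or any(len(row) != width for row in fmap):
            raise ValueError("all feature maps must share the same spatial size")
    return [
        [
            # At each spatial position, chain the channels of map 0, map 1, ...
            [value for fmap in feature_maps for value in fmap[i][j]]
            for j in range(width)
        ]
        for i in range(height)
    ]


# Toy sequence of three 2 x 2 x 2 maps, one per input image (made-up values).
sequence = [
    [[[t, t + 0.5] for _ in range(2)] for _ in range(2)]
    for t in range(3)
]
profile = concatenate_along_depth(sequence)
print(len(profile), len(profile[0]), len(profile[0][0]))  # 2 2 6
```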

Alternatively or in addition, the feature maps corresponding to several input images associated with several particles 11a-11f of sample 12 can be summed.

The present technique thus makes it possible to obtain a high semantic level feature map without requiring either high computing power or an annotated database.

Note that the number of variables in the feature map can remain enormous, especially in the case of an input image sequence.

In order to reduce this number, it can be noted that the position of the activated zones in the feature map does not matter. Indeed, the particle 11a-11f is generally alone in the middle of the input image, even if there are sometimes small clusters. In any case, since we are not trying to locate the particles 11a-11f, information averaged over the image is enough to discriminate effectively.

Thus we can reduce the spatial size of the feature map down to 1x1 (without affecting the depth, i.e. the extracted map is of size 1x1xP), i.e. we transform this map into a vector (of the same size P as the depth of the feature map), for example by means of a global pooling layer, in particular global average pooling, that is to say an averaging over the two spatial dimensions.

In other words, said global pooling layer is added at the end of the feature extraction sub-network (after the max pooling layer of the last block). This can be done after any block depending on the desired depth of the feature map, and it will be understood that the gain is all the greater when the global pooling layer is inserted "early", since the spatial dimensions are then larger and the depths shallower.
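By way of illustration only, global average pooling can be sketched in plain Python as follows. The function name `global_average_pooling`, the nested-list representation ([row][column][channel]) and the toy values are assumptions; an actual implementation would normally rely on the pooling layer of the deep learning library used.

```python
def global_average_pooling(feature_map):
    """Average an H x W x P map over its two spatial dimensions, giving P values."""
    height, width = len(feature_map), len(feature_map[0])
    depth = len(feature_map[0][0])
    sums = [0.0] * depth
    for row in feature_map:
        for pixel in row:
            for channel, value in enumerate(pixel):
                sums[channel] += value
    return [total / (height * width) for total in sums]


# Toy 2 x 2 x 3 map (made-up values).
toy_map = [
    [[1.0, 0.0, 2.0], [1.0, 4.0, 2.0]],
    [[1.0, 0.0, 6.0], [1.0, 4.0, 6.0]],
]
print(global_average_pooling(toy_map))  # [1.0, 2.0, 4.0]
```

The position of the activated zones is lost in the result, which is consistent with the observation that only information averaged over the image is needed.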

For example, taking VGG-16 truncated after block 5, we go from a feature map of size 7x7x512 to a feature map of size 1x1x512, i.e. a vector of size 512. In the case of a stack of 120 input images, we obtain a vector of size 512x120=61440. By taking VGG-16 truncated after block 2, we go from a feature map of size 56x56x128 to a feature map of size 1x1x128, i.e. a vector of size 128. In the case of a stack of 120 input images, we obtain a vector of size 128x120=15360.
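The arithmetic of this example can be reproduced with the sketch below, which also shows the number of variables of each map before global pooling. The helper names `flattened_size` and `profile_size` are hypothetical.

```python
def flattened_size(height, width, depth):
    """Number of variables in an H x W x P feature map before global pooling."""
    return height * width * depth


def profile_size(depth, n_images):
    """Number of variables after global pooling and concatenation over a stack."""
    return depth * n_images


# Truncation points quoted in the text: (spatial height, spatial width, depth).
truncations = {"block 5": (7, 7, 512), "block 2": (56, 56, 128)}
for name, (h, w, p) in truncations.items():
    print(name, flattened_size(h, w, p), "->", p, "| stack of 120:", profile_size(p, 120))
```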

Classification

In a step (c), said input image is classified according to said extracted feature map (if applicable the reduced map).

It is understood that any technique allowing a descriptive analysis of the feature map(s) could be used, in particular classifiers learned on said learning base, several examples of which will be seen. As such, like step (b0), the method can include a step (a0) of learning the classifier, by the data processing means 3 of the server 1, from a learning base. This step is typically implemented well in advance, in particular by the remote server 1. As explained, the learning base can include a number of feature maps of learning images, which take up very little space.

The feature map obtained in step (b) (especially in the case of a stack of input images) can have a very high number of variables, so it is preferable to use techniques for reducing the number of variables.

In this respect, the t-SNE (t-distributed stochastic neighbor embedding) algorithm can be used. It is a non-linear method for reducing the number of variables for data visualization, which represents a set of points of a high-dimensional space (the space of feature map values) in a two- or three-dimensional space, so that the data can then be visualized with a scatter plot. The t-SNE algorithm attempts to find an optimal configuration (known as the t-SNE projection, or “embedding”) according to an information theory criterion so as to respect the proximities between points: two points that are close (respectively distant) in the original space should be close (respectively distant) in the low-dimensional space.
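The text does not name the criterion. In the usual formulation of t-SNE, which is assumed here, it is the Kullback-Leibler divergence between the pairwise similarities p_ij computed in the original space and the similarities q_ij computed between the projected points y_i:

```latex
\mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}}
```

The positions y_i are adjusted so as to minimize this divergence, which penalizes most strongly pairs of points that are close in the original space but far apart in the projection.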

The t-SNE algorithm can be implemented both at the particle level (a target particle 11a-11f compared to the individual particles for which a map is available in the learning base) and at the field level (for the whole sample 12, in the case of a plurality of input images representing a plurality of particles 11a-11f), in particular in the case of single images rather than stacks.

It should be noted that the t-SNE projection of the learning base can be computed well in advance, so that all that remains is to place in it the feature map of the input image considered. In practice, an explicit formulation of the projection function is not necessarily available, so that it may still be necessary to recalculate the projections each time. However, it is possible to speed up the calculations and reduce the memory footprint by going through a first step of linear reduction of the number of variables (for example PCA, Principal Component Analysis) before calculating the t-SNE projection of the feature maps of the learning base and of the input image considered. In this case, the PCA projections of the learning base can be stored in memory.

For the classifier strictly speaking, the k-nearest neighbors (k-NN) method can be used, in particular based on the result of the t-SNE algorithm (the projection, or "embedding", obtained).

The idea is to look at the points neighboring the point that corresponds to the feature map of the input image(s) considered, and to look at their classification. For example, if the neighboring points are classified as “no division”, we can assume that the input image considered must be classified as “no division”. It should be noted that the neighbors considered can possibly be limited, for example according to the strain, the antibiotic, etc. Figure 6 shows two examples of t-SNE embeddings obtained at the field level for a strain of E. coli for various concentrations of cefpodoxime. In the top example, two blocks can clearly be seen, which visually shows the existence of a minimum inhibitory concentration (MIC) above which there is an impact on morphology and therefore on cell division. A map falling near the upper part can be classified as "division" and a map falling near the lower part as "no division". In the bottom example, only the highest concentration stands out (and therefore seems to have an antibiotic effect).
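A minimal sketch of such a k-NN vote on a two-dimensional embedding is given below. The coordinates and labels are made up for illustration, and the function name `knn_classify` and the choice k=3 are assumptions; an odd k avoids ties in the binary case.

```python
import math
from collections import Counter


def knn_classify(query, points, labels, k=3):
    """Majority vote among the k embedded points closest to `query`."""
    ranked = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up 2D embedding of already classified maps: an upper and a lower group.
embedding = [(0.0, 5.0), (0.5, 5.5), (1.0, 4.8), (0.2, -5.0), (0.8, -4.5), (1.1, -5.2)]
labels = ["division"] * 3 + ["no division"] * 3

print(knn_classify((0.4, 4.9), embedding, labels))   # division
print(knn_classify((0.6, -4.7), embedding, labels))  # no division
```

Restricting the neighbors considered (for example to one strain or one antibiotic) would amount to filtering `points` and `labels` before the call.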

According to a second embodiment, a support vector machine (SVM) is used as classifier, again for a binary classification (for example again “division” or “no division”). This simple method is particularly effective on single input images (SVM applied to feature maps). The hyper-parameter C of the SVM can be optimized by using a grid search and a cross-validation (known as "k-folds", with in particular k=5, in which the original base is divided into k samples, then one of the k samples is selected as the validation set and the other k-1 samples constitute the training set).
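The k-fold splitting and the grid search can be sketched as follows, without any SVM library. The callback `score` is hypothetical: it is assumed to train an SVM with a given value of C on the training indices and return its accuracy on the validation indices. Here it is replaced by a dummy function purely so that the sketch runs. In practice the samples would normally be shuffled before splitting.

```python
import math


def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so that sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def grid_search(candidates, n_samples, score, k=5):
    """Return the candidate with the best mean validation score over the k folds."""
    def mean_score(c):
        folds = list(k_fold_splits(n_samples, k))
        return sum(score(c, train, val) for train, val in folds) / len(folds)
    return max(candidates, key=mean_score)


def dummy_score(c, train, validation):
    """Stand-in for 'train an SVM with this C, return validation accuracy'."""
    return -abs(math.log10(c))


best_c = grid_search([0.01, 0.1, 1, 10, 100], n_samples=10, score=dummy_score)
print(best_c)  # 1
```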

According to a third embodiment, in the case where there are input image sequences (3D stack) and therefore deeper feature maps, a convolutional neural network (CNN) is used as a classifier.

Relatively simple architectures can be chosen for this CNN, for example a succession of blocks each made up of a convolution layer, an activation layer (ReLU function for example) and a pooling layer (for example max pooling). Two such blocks are sufficient for efficient binary classification. It is also possible to sub-sample the inputs (in particular along the “time” dimension) to further reduce the memory footprint.

The learning of the CNN can be done in a conventional way. The learning cost function can consist of a conventional data fidelity term (cross-entropy), to be minimized via a gradient descent algorithm.
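To illustrate the principle only, the sketch below minimizes a binary cross-entropy by gradient descent. A one-feature logistic model is used as a deliberately simplified stand-in for the CNN classifier; the data, the learning rate and all function names are assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(probabilities, targets, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    total = 0.0
    for p, y in zip(probabilities, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(targets)


def gradient_step(weight, bias, inputs, targets, learning_rate=0.5):
    """One gradient-descent update of the logistic stand-in model."""
    n = len(inputs)
    errors = [sigmoid(weight * x + bias) - y for x, y in zip(inputs, targets)]
    grad_w = sum(e * x for e, x in zip(errors, inputs)) / n
    grad_b = sum(errors) / n
    return weight - learning_rate * grad_w, bias - learning_rate * grad_b


# Made-up one-dimensional data: 0 = "no division", 1 = "division".
inputs = [-2.0, -1.0, 1.0, 2.0]
targets = [0, 0, 1, 1]

weight, bias = 0.0, 0.0
losses = []
for _ in range(50):
    probabilities = [sigmoid(weight * x + bias) for x in inputs]
    losses.append(cross_entropy(probabilities, targets))
    weight, bias = gradient_step(weight, bias, inputs, targets)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```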

In all the embodiments, the learned classifier can be stored if necessary on data storage means 21 of the client 2 for use in classification. Note that the same classifier can be embedded on many clients 2; only one learning phase is necessary.

Computer program product

According to a second and a third aspect, the invention relates to a computer program product comprising code instructions for the execution (in particular on the data processing means 3, 20 of the server 1 and/or of the client 2) of a method for classifying at least one input image representing a target particle 11a-11f in a sample 12, as well as to storage means readable by computer equipment (a memory 4, 21 of the server 1 and/or of the client 2) on which this computer program product is stored.

Claims

1. Method for classifying at least one input image representing a target particle (11a-11f) in a sample (12), the method being characterized in that it comprises the implementation, by data processing means (20) of a client (2), of steps of:

(b) Extraction of a feature map of said target particle (11a-11f) by means of a convolutional neural network pre-trained on a public image database;

(c) Classification of said input image based on said extracted feature map.

2. Method according to claim 1, in which the particles (11a-11f) are represented in a homogeneous manner in the input image and in each elementary image, in particular centered and aligned in a predetermined direction.

3. Method according to claim 2, comprising a step (a) of extracting said input image from a global image of the sample, so as to represent said target particle (11a-11f) in said homogeneous manner.

4. Method according to claim 3, in which step (a) comprises segmenting said global image so as to detect said target particle (11a-11f) in the sample (12), then cropping the input image on said detected target particle (11a-11f).

5. Method according to one of claims 3 and 4, in which step (a) comprises obtaining said global image from an intensity image of the sample (12) acquired by an observation device (10).

6. Method according to one of claims 1 to 5, in which step (b) is implemented by means of a feature extraction sub-network of said pre-trained convolutional neural network.

7. Method according to claim 6, in which said pre-trained convolutional neural network is an image classification network, in particular of the VGG, AlexNet, Inception or ResNet type.

8. Method according to one of claims 6 and 7, wherein a global pooling layer is added at the end of said feature extraction sub-network so that the extracted feature map has a spatial size of 1x1.

9. Method according to one of claims 1 to 8, in which step (c) is implemented by means of a classifier, the method comprising a step (a0) of learning, by data processing means (3) of a server (1), the parameters of said classifier from a learning base of already classified feature maps of particles (11a-11f) in said sample (12).

10. The method of claim 9, wherein said classifier is selected from a support vector machine, a k-nearest neighbor algorithm, or a convolutional neural network.

11. Method according to one of claims 1 to 10, in which step (c) comprises reducing the number of variables of the feature map by means of the t-SNE algorithm.

12. Method according to one of claims 1 to 11, for classifying a sequence of input images representing said target particle (11a-11f) in a sample (12) over time, in which step (b) comprises concatenating the feature maps extracted for each input image of said sequence.

13. System for classifying at least one input image representing a target particle (11a-11f) in a sample (12) comprising at least one client (2) comprising data processing means (20), characterized in that said data processing means (20) are configured to implement:

- extracting a feature map of said target particle (11a-11f) by means of a convolutional neural network pretrained on a public image database;

- classifying said input image based on said extracted feature map.

14. System according to claim 13, further comprising a device (10) for observing said target particle (11a-11f) in the sample (12).

15. Computer program product comprising code instructions for the execution of a method according to one of claims 1 to 12 for classifying at least one input image representing a target particle (11a-11f) in a sample (12), when said program is executed on a computer.

16. Storage means readable by computer equipment on which a computer program product comprising code instructions for the execution of a method according to one of claims 1 to 12 for classifying at least one input image representing a target particle (11a-11f) in a sample (12) is stored.