WO2020057753A1 - A method and a system for training a model performing semantic segmentation of nighttime images - Google Patents

A method and a system for training a model performing semantic segmentation of nighttime images

Info

Publication number
WO2020057753A1
WO2020057753A1 (PCT/EP2018/075681, EP2018075681W)
Authority
WO
WIPO (PCT)
Prior art keywords
images
semantic segmentation
labelled
twilight
semantic
Prior art date
Application number
PCT/EP2018/075681
Other languages
French (fr)
Inventor
Nicolas VIGNARD
Patrizia ZUPPINGER
Dengxin DAI
Luc Van GOOL
Original Assignee
Toyota Motor Europe
Eth Zurich
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toyota Motor Europe, Eth Zurich filed Critical Toyota Motor Europe
Priority to PCT/EP2018/075681 priority Critical patent/WO2020057753A1/en
Publication of WO2020057753A1 publication Critical patent/WO2020057753A1/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method and system for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising: a - obtaining (S01) a first set of labelled images (101) taken at daylight, the labelled images being annotated with predefined semantic segmentation labels, b - training (S02) a semantic segmentation model using the first set of labelled images, c - applying (S03) the semantic segmentation model of step b to a second set of unlabeled images (102) taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations (102') of the images of the second set, d - labelling (S04) the second set of unlabeled images (102) with the semantic segmentations (102') of the images of the second set to obtain a second set of labelled images (102"), and e - training (S05) the semantic segmentation model using the first set of labelled images (101) and the second set of labelled images (102").

Description

A method and a system for training a model performing semantic segmentation of nighttime images
Field of the invention
The present invention relates to the field of image processing, and more precisely to the semantic segmentation of images taken at night time, i.e. without solar illumination.
Description of the Related Art
Semantic image segmentation is a method to automatically determine the semantic labels of the objects which appear in an image. For example, the image may be acquired by a camera mounted in a vehicle. Semantic segmentation of such an image allows recognizing cars, pedestrians, traffic lanes, etc. Therefore, semantic segmentation is the backbone technique for autonomous driving systems or other automated systems.
Semantic image segmentation typically uses models such as neural networks to perform the segmentation. These models need to be trained.
Training a model typically comprises inputting known images to the model. For these images, a predetermined semantic segmentation is already known (an operator may have prepared the predetermined semantic segmentations of each image by annotating the images). The output of the model is then evaluated in view of the predetermined semantic segmentation, and the parameters of the model are adjusted if the output of the model differs from the predetermined semantic segmentation of an image. It follows that in order to train a semantic segmentation model, a large number of images and predetermined semantic segmentations are necessary.
It has been observed that the illumination condition at nighttime (in particular when there is no direct solar illumination but e.g. only electrical street lights, i.e. nocturnal artificial lighting) creates visibility problems for drivers and for automated systems. While sensors and computer vision algorithms are constantly getting better, the improvements are usually benchmarked with images taken during daylight time. Those methods often fail to work in nighttime conditions. This prevents the automated systems from actually being used: it is not conceivable for a vehicle to avoid nighttime, and the vehicle has to be able to distinguish different objects in both daytime and nighttime conditions.
Compared to daylight, nocturnal artificial lighting degrades the visibility of a scene significantly, according to the darkness of the driving scene.
It is thus desirable to train semantic segmentation models with nighttime images (images taken at night time with no direct solar illumination but e.g. only electrical street lights, i.e. nocturnal artificial lighting).
However, obtaining semantic segmentation data on nighttime images (for example nighttime pictures taken by a camera) is particularly difficult and time-consuming, especially if an operator has to annotate the nighttime images by hand with semantic labels before feeding the nighttime images to the model.
With regard to another problem in the field of image segmentation, a method has been proposed for dense fog scene understanding by using both synthetic and real fog (cf.: European Conference on Computer Vision, Sakaridis, C., Dai, D., Hecker, S., Van Gool, L., 2018, or: International Journal of Computer Vision, Sakaridis, C., Dai, D., Van Gool, L., 2018: "Semantic foggy scene understanding with synthetic data"). In their work, images taken under light fog are used as a bridge to transfer semantic knowledge from the clear-weather condition to the dense-fog condition. However, due to the different objective and in particular due to the different characteristics of foggy images compared to nighttime images (in particular with nocturnal artificial lighting), the proposed learning algorithm is not suitable for the objective of the present disclosure.
Robust object recognition using visible light cameras remains a difficult problem. This is because the structural, textural and/or color features needed for object recognition sometimes do not exist or are heavily disturbed by artificial lights, to the point where it is difficult to recognize the objects even for humans. The problem is further compounded by camera noise and motion blur. Far-infrared (FIR) cameras can be another choice, cf. e.g.: A. Gonzalez, Z. Fang, Y. Socarras, J. Serrat, D. Vazquez, J. Xu, and A. M. Lopez. Pedestrian detection at day/night time with visible and fir cameras: A comparison. Sensors, 16(6), 2016. They are, however, expensive and only provide images of relatively low resolution.
It is a primary object of the invention to provide methods and system that overcome the deficiencies of the currently available systems and methods.
In particular, it remains desirable to reliably train a model performing semantic segmentation of nighttime images, which does not require labelled nighttime images, and to provide a semantic segmentation model for semantic segmentation of nighttime images.
Summary of the invention
The present invention overcomes one or more deficiencies of the prior art by proposing a method for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising: a - obtaining a first set of labelled images taken at daylight, the labelled images being annotated with predefined semantic segmentation labels, b - training a semantic segmentation model using the first set of labelled images,
c - applying the semantic segmentation model of step b to a second set of unlabeled images taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations of the images (desirably of each image) of the second set,
d - labelling the second set of unlabeled images (desirably each image of the second set) with the semantic segmentations of the images of the second set to obtain a second set of labelled images, and
e - training the semantic segmentation model using the first set of labelled images and the second set of labelled images.
Accordingly, the present disclosure desirably adopts visible light cameras for semantic segmentation of nighttime road scenes. Another reason for this desired choice is that large-scale datasets (with annotations) are available for daytime images (videos) taken by visible light cameras, cf. e.g.: M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes dataset for semantic urban scene understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
In other words, to overcome the problem of collecting and annotating images for all illumination conditions, the present disclosure proposes to depart from the traditional paradigm of manually labelling all images and proposes another route, also different from moving to learning with synthetic scenes (as e.g. disclosed in International Journal of Computer Vision, Sakaridis, C., Dai, D., Van Gool, L., 2018: "Semantic foggy scene understanding with synthetic data"). Instead, it is proposed to progressively adapt the semantic models trained for daytime scenes to nighttime scenes, by using images taken at twilight time. Accordingly, the method is based on a progressive self-learning scheme.
Hence, the present disclosure makes model adaptation from daytime to nighttime feasible.
The images of the first set of labelled images and of the second set of unlabeled images may each represent a driving scene.
The second set of unlabeled images may represent different or similar driving scenes compared to the first set of labelled images.
In step e an untrained semantic segmentation model (e.g. having untrained random parameters or weights) may be trained using the first set of labelled images and the second set of labelled images. Alternatively, the already trained semantic segmentation model of step b may be further trained, e.g. based on a mixed combination of the first set of labelled images and the second set of labelled images.
In step d the second set of labelled images may be obtained automatically by providing the second set of unlabeled images together with the related semantic segmentations of the images of the second set, i.e. as a result of the segmentation performed in step c.
The method may further comprise (e.g. after step e):
c' - applying the semantic segmentation model of step e to a third set of unlabeled images taken at twilight of a second predefined degree, where solar illumination is less than at twilight of the first predefined degree and more than at nighttime, to obtain semantic segmentations of the images of the third set,
d' - labelling the third set of unlabeled images with the semantic segmentations of the images of the third set to obtain a third set of labelled images, e' - training the semantic segmentation model using the first set of labelled images, the second set of labelled images and the third set of labelled images.
Accordingly, the semantic models for daytime scenes can be trained even more progressively to nighttime scenes, by using images taken at a first more illuminated twilight and a second darker twilight.
In step e' an untrained semantic segmentation model (e.g. having untrained random parameters or weights) may be trained using the first, second and third sets of labelled images. Alternatively, the already trained semantic segmentation model of step e may be further trained, e.g. based on a mixed combination of the first, second and third sets of labelled images.
In step e' the semantic segmentation model might also be trained using only the first set of labelled images and the third set of labelled images. This option might be used e.g. when in step e' the model already trained in step e is further trained.
The method may further comprise (e.g. after step e'):
c" - applying the semantic segmentation model of step e' to a fourth set of unlabeled images taken at twilight of a third predefined degree, where solar illumination is less than at twilight of the second predefined degree and more than at nighttime, to obtain semantic segmentations of the images of the fourth set,
d" - labelling the fourth set of unlabeled images with the semantic segmentations of the images of the third set to obtain a fourth set of labelled images,
e" - training the semantic segmentation model using the first set of labelled images, the second set of labelled images, the third set of labelled images and the fourth set of labelled images.
Accordingly, the semantic models for daytime scenes can be trained even more progressively to nighttime scenes, by using images taken at a first more illuminated twilight; a second darker twilight and a third even darker twilight.
In step e" an untrained semantic segmentation model (e.g. having untrained random parameters or weights) may be trained using the first, second, third and fourth sets of labelled images. Alternatively, the already trained semantic segmentation model of step e' may be further trained, e.g. based on a mixed combination of the first, second, third and fourth sets of labelled images.
In step e" the semantic segmentation model might also be trained using only the first set of labelled images and the fourth set of labelled images. This option might be used e.g. when in step e" the model already trained in step e' is further trained.
The above described progressive adaptation may be continued by adding one or several further sets of increasingly darker twilight images.
In other words, the semantic segmentation may be progressively adapted to be usable for semantic segmentation of images taken at nighttime, by repeating the sequence of steps c to e for one or several times (e.g. in sequences c' to e', c" to e", c"' to e"', and so on), wherein in each subsequent sequence c' to e', c" to e", c"' to e'"..., a further set of unlabeled images taken at increasingly darker twilight is added in step c', c", c'" ... and so on and the semantic segmentation model is trained in step e', e", e'"... using all sets of labelled images.
The sets of labelled images may be mixed (or in other words: sampled) proportionally to form a combined set of labelled images which is used for training the semantic segmentation model.
Proportional mixing may mean that sets of labelled images may be mixed according to their quantity of (sample) images. For example, in case a first set comprises twice as many images as a second set, the combined set may repeatedly comprise two images of the first set followed by one image of the second set.
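Purely as an illustration of this proportional mixing, the following sketch interleaves labelled sets according to their sizes (plain Python; the helper name and the use of simple lists are assumptions made for the example, not part of the disclosure):

```python
from functools import reduce
from itertools import cycle, islice
from math import gcd

def mix_proportionally(*labelled_sets):
    """Interleave labelled sets in proportion to their sizes.

    If the first set holds twice as many images as the second, the combined
    set repeatedly contains two images of the first set followed by one image
    of the second set, as in the example above.
    """
    sizes = [len(s) for s in labelled_sets]
    blocks = reduce(gcd, sizes)                  # number of repeating blocks
    ratios = [n // blocks for n in sizes]        # e.g. sizes 200 and 100 -> ratios 2 and 1
    iterators = [cycle(s) for s in labelled_sets]
    combined = []
    for _ in range(blocks):
        for it, ratio in zip(iterators, ratios):
            combined.extend(islice(it, ratio))   # take `ratio` samples from this set
    return combined
```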
The twilight of the first predefined degree corresponds to civil twilight. In particular, civil twilight may be defined by a solar elevation angle in the range of 0 to 6 degrees below the horizon.
The twilight of the second predefined degree corresponds to nautical twilight. In particular, nautical twilight is defined by a solar elevation angle in the range of 6 to 12 degrees below the horizon.
The twilight of the third predefined degree corresponds to astronomical twilight. In particular, astronomical twilight is defined by a solar elevation angle in the range of 12 to 18 degrees below the horizon.
Accordingly, since in each of the above mentioned cases the sun is below the horizon, no direct illumination by the sun is possible but only indirect illumination by the atmosphere still illuminated by the sun.
Desirably, at nighttime there is no direct illumination by the sun either, and more desirably the atmosphere is not illuminated either. Instead there may only be illumination by the moon and stars and by artificial illumination. For example, nighttime is defined by a solar elevation angle of more than 18 degrees below the horizon.
In contrast, daylight desirably comprises direct solar illumination where the sun is above the horizon. Direct solar illumination more desirably also includes light scattering by e.g. clouds.
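Merely to illustrate these definitions, the illumination condition of a recording could be categorized from the solar elevation angle as in the following sketch (the function name and string labels are assumptions; the thresholds simply restate the ranges above):

```python
def illumination_condition(solar_elevation_deg: float) -> str:
    """Map the solar elevation angle (degrees above the horizon, negative
    when the sun is below the horizon) to an illumination condition."""
    if solar_elevation_deg > 0:
        return "daylight"               # sun above the horizon
    if solar_elevation_deg >= -6:
        return "civil twilight"         # 0 to 6 degrees below the horizon
    if solar_elevation_deg >= -12:
        return "nautical twilight"      # 6 to 12 degrees below the horizon
    if solar_elevation_deg >= -18:
        return "astronomical twilight"  # 12 to 18 degrees below the horizon
    return "nighttime"                  # more than 18 degrees below the horizon
```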
Step e may comprise minimizing:

\min_{\phi^1} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^1(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^1(x_j^1), \hat{y}_j^1\big) \Big\}

wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi^1(x_i^0) is the output of the semantic segmentation model,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight,
x_j^1 is an image of index j in the second set of labelled images,
\hat{y}_j^1 is the predefined semantic segmentation label of image x_j^1.

Step e' may comprise minimizing:

\min_{\phi^2} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^2(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^2(x_j^1), \hat{y}_j^1\big) + \lambda^2 \sum_{k=1}^{l^2} L\big(\phi^2(x_k^2), \hat{y}_k^2\big) \Big\}

wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi^2(x_i^0) is the output of the semantic segmentation model,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight,
x_j^1 is an image of index j in the second set of labelled images,
\hat{y}_j^1 is the predefined semantic segmentation label of image x_j^1,
l^2 is the number of images of the third set of labelled images,
\lambda^2 is a predefined weight,
x_k^2 is an image of index k in the third set of labelled images,
\hat{y}_k^2 is the predefined semantic segmentation label of image x_k^2.

Step e" may comprise minimizing:

\min_{\phi^3} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^3(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^3(x_j^1), \hat{y}_j^1\big) + \lambda^2 \sum_{k=1}^{l^2} L\big(\phi^3(x_k^2), \hat{y}_k^2\big) + \lambda^3 \sum_{q=1}^{l^3} L\big(\phi^3(x_q^3), \hat{y}_q^3\big) \Big\}

wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi^3(x_i^0) is the output of the semantic segmentation model,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight,
x_j^1 is an image of index j in the second set of labelled images,
\hat{y}_j^1 is the predefined semantic segmentation label of image x_j^1,
l^2 is the number of images of the third set of labelled images,
\lambda^2 is a predefined weight,
x_k^2 is an image of index k in the third set of labelled images,
\hat{y}_k^2 is the predefined semantic segmentation label of image x_k^2,
l^3 is the number of images of the fourth set of labelled images,
\lambda^3 is a predefined weight,
x_q^3 is an image of index q in the fourth set of labelled images,
\hat{y}_q^3 is the predefined semantic segmentation label of image x_q^3.
The present disclosure further relates to a semantic segmentation method comprising using the model of step e, e' or e", as described above.
The present disclosure further relates to a system for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising: a module A for obtaining a first set of labelled images taken at daylight, the labelled images being annotated with predefined semantic segmentation labels,
a module B training a semantic segmentation model using the first set of labelled images,
a module C for applying the semantic segmentation model of step b to a second set of unlabeled images taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations of the images of the second set,
a module D for labelling the second set of unlabeled images with the semantic segmentations of the images of the second set to obtain a second set of labelled images,
a module E for training the semantic segmentation model using the first set of labelled images and the second set of labelled images.
The system may comprise further features or modules corresponding to the steps of the method described above, e.g. modules C' to E' and C" to E" corresponding to steps c' to e' and c" to e", respectively. Hence, this system may be configured to perform all the embodiments of the method as defined above.
The present disclosure further relates to a system for semantic segmentation of an image comprising the model of step e, e' or e" or of module E as described above.
The present disclosure further relates to a computer program including instructions for executing the steps of a method as described above when said program is executed by a computer.
This program can use any programming language and take the form of source code, object code or a code intermediate between source code and object code, such as a partially compiled form, or any other desirable form. The present disclosure further relates to a recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method as described above.
The information medium can be any entity or device capable of storing the program. For example, the medium can include storage means such as a ROM, for example a CD ROM or a microelectronic circuit ROM, or magnetic storage means, for example a diskette (floppy disk) or a hard disk.
Alternatively, the information medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.
It is intended that combinations of the above-described elements and those within the specification may be made, except where otherwise contradictory.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, and serve to explain the principles thereof.
Brief description of the drawings
How the present invention may be put into effect will now be described by way of example with reference to the appended drawings, in which: Fig. 1 shows a schematic flow chart of the steps for training a model performing semantic segmentation of nighttime images according to embodiments of the present disclosure; and
Fig. 2 shows a schematic block diagram of a system with an electronic device according to embodiments of the present disclosure.
Description of the embodiments
Reference will now be made in detail to exemplary embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Training a segmentation model with a large amount of human annotations should work for nighttime images, similar to what has been achieved for daytime scene understanding. However, applying this protocol to other illumination conditions is problematic as it is hardly affordable to annotate the same amount of data for all different conditions and their combinations. It is thus proposed to depart from this protocol and propose an automated approach to transfer the knowledge from existing annotations of daytime scenes to nighttime scenes. The approach leverages the fact that illumination changes continuously between daytime and nighttime, through the twilight time. Twilight is the time between dawn and sunrise, or between sunset and dusk. Twilight is defined according to the solar elevation angle, which is the position of the geometric center of the sun relative to the horizon, cf. e.g. Definitions from the US astronomical applications dept (usno). Retrieved 2011-07-22.
During a large portion of twilight time, solar illumination is still sufficient for cameras to capture terrestrial objects and sufficient to limit the interference of artificial lights to a certain amount. These observations lead to our conjecture that the domain discrepancy between daytime scenes and twilight scenes, and the domain discrepancy between twilight scenes and nighttime scenes, are both smaller than the domain discrepancy between daytime scenes and nighttime scenes. Thus, images captured during twilight time can in principle be used to serve our purpose: knowledge transfer from daytime to nighttime. That is, twilight time constructs a bridge for knowledge transfer from our source domain daytime to our target domain nighttime.
In particular, it is proposed to train a semantic segmentation model on daytime images using the standard supervised learning paradigm, and apply the model to a large dataset recorded at civil twilight time to generate the class responses. The three subgroups of twilight are used: civil twilight, nautical twilight, and astronomical twilight. Since the domain gap between daytime condition and civil twilight condition is relatively small, these class responses, along with the images, can then be used to fine-tune the semantic segmentation model so that it can adapt to civil twilight time. The same procedure is continued through nautical twilight and astronomical twilight. Then the final fine-tuned model may be applied to nighttime images. In other words, the semantic knowledge of annotations of daytime scenes may be transferred to nighttime scenes via the unlabeled images recorded at twilight time.
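The overall procedure can be summarized by the following sketch (Python; the callables train and predict and the data structures are assumptions introduced only for illustration, not part of the disclosure):

```python
# Illustrative sketch of the progressive day-to-night adaptation described above.
# `train(model, labelled_sets)` stands for any supervised training routine and
# `predict(model, image)` for per-image segmentation inference.

def progressive_model_adaptation(model, daytime_labelled, twilight_stages, train, predict):
    """Adapt a daytime segmentation model towards nighttime via twilight stages.

    daytime_labelled: list of (image, label) pairs with human annotations (D0)
    twilight_stages:  lists of unlabeled images, ordered from civil to nautical
                      to astronomical twilight (D1, D2, D3)
    """
    model = train(model, [daytime_labelled])              # supervised training on D0
    labelled_sets = [daytime_labelled]
    for unlabeled_images in twilight_stages:
        # Generate "noisy" pseudo-labels (class responses) with the current model
        pseudo_labelled = [(img, predict(model, img)) for img in unlabeled_images]
        labelled_sets.append(pseudo_labelled)
        # Fine-tune on all labelled sets collected so far
        model = train(model, labelled_sets)
    return model                                          # then apply to nighttime images
```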
Fig. 1 shows a schematic flow chart of the steps for training a model performing semantic segmentation of nighttime images according to embodiments of the present disclosure.
This model may be, initially, a neural network or a convolutional neural network which may have been conceived to perform semantic segmentation on images. However, initially, the model has not been trained to perform semantic segmentation of nighttime images.
The images which may be processed by the model (after complete training) may be photographs taken by image sensors. A plurality of objects may be visible on these images, preferably objects of different types which may or may not overlap.
By way of example, the images show a scene which may be visible from a vehicle on a road, for example in a street.
In a first step SOI, a first set of labelled daylight images is obtained. The labelled images are annotated with predefined semantic segmentation labels.
In an example, an image is denoted by x, and images taken at daytime, civil twilight time, nautical twilight time, astronomical twilight time and nighttime are indicated by x^0, x^1, x^2, x^3, and x^4, respectively. The corresponding human annotation for x^0 is provided and denoted by y^0, where y^0(m, n) \in \{1, ..., C\} is the label of pixel (m, n), and C is the total number of classes. Then, the training data consist of the labelled daytime dataset

D^0 = \{(x_i^0, y_i^0)\}_{i=1}^{l^0}

and three unlabeled datasets for the three twilight categories:

D^1 = \{x_j^1\}_{j=1}^{l^1}, \quad D^2 = \{x_k^2\}_{k=1}^{l^2}, \quad D^3 = \{x_q^3\}_{q=1}^{l^3},

where l^0, l^1, l^2, and l^3 are the total numbers of images in the corresponding datasets.
In step S02 a semantic segmentation model is trained using the first set of labelled images.
In the example this is done by:
\min_{\phi^0} \sum_{i=1}^{l^0} L\big(\phi^0(x_i^0), y_i^0\big)    (1)

where L(\cdot,\cdot) is the cross entropy loss function and \phi^0 denotes the semantic segmentation model trained on the daytime data.
In step S03 the trained semantic segmentation model is applied to a second set of unlabeled images 102 of a first twilight degree, e.g. civil twilight. Consequently, semantic segmentations 102' of the images of the second set are obtained.
In the example this is done by applying the segmentation model \phi^0 to the images x_j^1 recorded at civil twilight time to obtain "noisy" semantic labels: \hat{y}_j^1 = \phi^0(x_j^1).
In step S04 the second set of unlabeled images 102 is labelled with the semantic segmentations 102' of the images of the second set to obtain a second set of labelled images 102".
In the example this is done by augmenting the dataset D^1 to

\hat{D}^1 = \{(x_j^1, \hat{y}_j^1)\}_{j=1}^{l^1}.
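A minimal sketch of this pseudo-labelling step, under the assumption of a PyTorch-style segmentation network that outputs per-pixel class scores (all names here are illustrative):

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(model, twilight_loader, device="cuda"):
    """Apply the daytime model (phi^0) to twilight images and keep the argmax
    class per pixel as a "noisy" semantic label, forming the augmented dataset."""
    model.eval()
    augmented = []
    for images in twilight_loader:          # images: (B, 3, H, W) tensor batches
        images = images.to(device)
        scores = model(images)              # (B, C, H, W) per-pixel class scores
        pseudo = scores.argmax(dim=1)       # (B, H, W) hard labels
        augmented.extend(zip(images.cpu(), pseudo.cpu()))
    return augmented
```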
In step S05 the semantic segmentation model is trained using a combination of the sets of labelled images 101, 102".
In the example this is done by fine-tuning (e.g. retraining) the semantic model on D^0 and \hat{D}^1:

\phi^1 \leftarrow \phi^0,    (2)

and then (eq. 3):

\min_{\phi^1} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^1(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^1(x_j^1), \hat{y}_j^1\big) \Big\}    (3)
wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi^1(x_i^0) is the output of the semantic segmentation model,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight, in particular a hyper-parameter balancing the weights of the two data sources,
x_j^1 is an image of index j in the second set of labelled images,
\hat{y}_j^1 is the predefined semantic segmentation label of image x_j^1.
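For illustration, one optimization step of eq. 3 could be written as follows in PyTorch (a sketch assuming the model returns per-pixel class scores and that the pseudo-labels are stored as class indices; names are illustrative):

```python
import torch.nn.functional as F

def finetune_step(model, optimizer, day_batch, twilight_batch, lambda_1=1.0):
    """One step of eq. 3: cross entropy on a labelled daytime batch plus
    lambda_1 times the cross entropy on a pseudo-labelled twilight batch.
    Images have shape (B, 3, H, W) and labels (B, H, W)."""
    x0, y0 = day_batch          # human-annotated daytime images and labels
    x1, y1 = twilight_batch     # twilight images and their pseudo-labels
    loss = F.cross_entropy(model(x0), y0) + lambda_1 * F.cross_entropy(model(x1), y1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```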
The sequence of steps S03 to S05 (or c to e) may be repeated for one or a plurality of times as steps S03' to S05', S03" to S05", etc. (i.e. c' to e', c" to e", etc.).
In each sequence, in step c', c", etc. the trained semantic segmentation model may be applied to a further set of unlabeled images 103, 104, etc. of a further, increasingly darker twilight degree, e.g. for nautical and astronomical twilight, and semantic segmentations 103', 104', etc. of the images of the further set may be obtained.
In a subsequent step d', d", etc. the further set of unlabeled images 103, 104, etc. may be labelled with the semantic segmentations 103', 104', etc. of the images of the further set to obtain a further set of labelled images 103", 104", etc.
In a subsequent step e', e", etc. the semantic segmentation model may be trained using a combination of the sets of all labelled images (101, 102", 103", 104", etc.).
In the example, step e' (adding images of e.g. nautical twilight) comprises fine-tuning (e.g. retraining) the semantic model on D^0, \hat{D}^1 and \hat{D}^2, where \hat{D}^2 = \{(x_k^2, \hat{y}_k^2)\}_{k=1}^{l^2} is obtained analogously to \hat{D}^1 in steps c' and d':

\phi^2 \leftarrow \phi^1,    (4)

and then (eq. 5):

\min_{\phi^2} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^2(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^2(x_j^1), \hat{y}_j^1\big) + \lambda^2 \sum_{k=1}^{l^2} L\big(\phi^2(x_k^2), \hat{y}_k^2\big) \Big\}    (5)
wherein:
l^2 is the number of images of the third set of labelled images,
\lambda^2 is a predefined weight, in particular a hyper-parameter balancing the weight of \hat{D}^2,
x_k^2 is an image of index k in the third set of labelled images,
\hat{y}_k^2 is the predefined semantic segmentation label of image x_k^2.
In the example, step e" (adding images of e.g. astronomical twilight) comprises fine-tuning (e.g. retraining) the semantic model on D° and D1, D2, D3 \ f 4- f 2 , (6) and then (eq. 7):
Figure imgf000020_0002
wherein:
l^3 is the number of images of the fourth set of labelled images,
\lambda^3 is a predefined weight, in particular a hyper-parameter balancing the weight of \hat{D}^3,
x_q^3 is an image of index q in the fourth set of labelled images,
\hat{y}_q^3 is the predefined semantic segmentation label of image x_q^3.
The resulting model may then be applied to nighttime images for performing image segmentation.
The method may be termed Progressive Model Adaptation. During training, in order to balance the weights of different data sources (in Equation 3, Equation 5 and Equation 7), equal weight may be empirically given to all training datasets. An optimal value can be obtained via cross-validation. The optimization of Equation 3, Equation 5 and Equation 7 is implemented by feeding to the training algorithm a stream of hybrid data, for which images in the considered datasets are sampled (i.e. mixed) proportionally according to the parameters \lambda^1, \lambda^2, and \lambda^3. For example, they may all be set to 1, which means all datasets are sampled at the same rate.
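One possible way to realize such a hybrid data stream is sketched below (an assumption-based example: each dataset is drawn at a rate proportional to its weight, so equal weights yield equal sampling rates):

```python
import random

def hybrid_stream(datasets, weights, num_samples, seed=0):
    """Yield (image, label) pairs where dataset d is drawn with probability
    proportional to weights[d]; with all weights equal to 1 every dataset is
    sampled at the same rate."""
    rng = random.Random(seed)
    total = float(sum(weights))
    probabilities = [w / total for w in weights]
    for _ in range(num_samples):
        d = rng.choices(range(len(datasets)), weights=probabilities, k=1)[0]
        yield rng.choice(datasets[d])
```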
Rather than applying the model trained on daytime images directly to nighttime images, Progressive Model Adaptation breaks the problem down into three progressive steps to adapt the semantic model. In each of these steps, the domain gap is much smaller than the domain gap between the daytime domain and the nighttime domain. Due to the unsupervised nature of this domain adaptation, the algorithm will also be affected by the noise in the labels. The daytime dataset D^0 is always used for the training, to balance between noisy data of similar domains and clean data of a distinct domain.
The steps of the method described in reference to figure 1 can be determined by computer program instructions. These instructions can be executed by a processor of a system, as represented on figure 2.
Fig. 2 shows a schematic block diagram of a system with an electronic device according to embodiments of the present disclosure.
In this figure, a system 200 for training a model has been represented. This system 200, which may be a computer, comprises a processor 201 and a non-volatile memory 202. The system 200 may also comprise, be configured to be integrated in, or form a part of a vehicle 400. The system 200 may not only be configured for training a semantic segmentation model but also to apply the trained model on nighttime images (in particular in case it is part of a vehicle 400). The system 200 may further be connected to a (passive) optical sensor 300, in particular a digital camera. The digital camera 300 is configured such that it can record a scene in front of the vehicle 400, and in particular output digital data providing appearance (color, e.g. RGB) information of the scene. The camera 300 desirably generates image data comprising a 2D or 3D image of the environment. There may also be provided a set of monocular cameras which generate a panoramic 2D or 3D image.
In the non-volatile memory 202, a set of instructions is stored and this set of instructions comprises instructions to perform a method for training a model.
In particular, these instructions and the processor 201 may respectively form a plurality of modules:
a module A for obtaining a first set of labelled images (101) taken at daylight, the labelled images being annotated with predefined semantic segmentation labels,
a module B training a semantic segmentation model using the first set of labelled images,
a module C for applying the semantic segmentation model of step b to a second set of unlabeled images 102 taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations 102' of the images of the second set,
a module D for labelling the second set of unlabeled images 102 with the semantic segmentations 102' of the images of the second set to obtain a second set of labelled images 102",
a module E for training the semantic segmentation model using the first set of labelled images 101 and the second set of labelled images 102".
Although the present invention has been described above with reference to certain specific embodiments, it will be understood that the invention is not limited by the particularities of the specific embodiments. Numerous variations, modifications and developments may be made in the above-described embodiments within the scope of the appended claims.

Claims

Claims
1. A method for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising:
a - obtaining (SOI) a first set of labelled images (101) taken at daylight, the labelled images being annotated with predefined semantic segmentation labels,
b - training (S02) a semantic segmentation model using the first set of labelled images,
c - applying (S03) the semantic segmentation model of step b to a second set of unlabeled images (102) taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations (102') of the images of the second set,
d - labelling (S04) the second set of unlabeled images (102) with the semantic segmentations (102') of the images of the second set to obtain a second set of labelled images (102"), and
e - training (S05) the semantic segmentation model using the first set of labelled images (101) and the second set of labelled images (102").
2. The method of claim 1, further comprising:
c' - applying (S03') the semantic segmentation model of step e to a third set of unlabeled images (103) taken at twilight of a second predefined degree, where solar illumination is less than at twilight of the first predefined degree and more than at nighttime, to obtain semantic segmentations (103') of the images of the third set, d' - labelling (S04') the third set of unlabeled images (103) with the semantic segmentations (103') of the images of the third set to obtain a third set of labelled images (103"),
e' - training (S05') the semantic segmentation model using the first set of labelled images (101), the second set of labelled images (102") and the third set of labelled images (103").
3. The method of claim 2, further comprising:
c" - applying (S03") the semantic segmentation model of step e' to a fourth set of unlabeled images (104) taken at twilight of a third predefined degree, where solar illumination is less than at twilight of the second predefined degree and more than at nighttime, to obtain semantic segmentations (104') of the images of the fourth set,
d" - labelling (S04") the fourth set of unlabeled images (104) with the semantic segmentations (104') of the images of the third set to obtain a fourth set of labelled images (104")
e" - training (S05") the semantic segmentation model using the first set of labelled images (101), the second set of labelled images (102"), the third set of labelled images (103") and the fourth set of labelled images (104").
4. The method according to any one of the preceding claims, wherein the semantic segmentation model is progressively adapted to be usable for semantic segmentation of images taken at nighttime by repeating the sequence of steps c to e one or several times (c' to e', c" to e"), wherein in each subsequent sequence c' to e', c" to e" a further set of unlabeled images taken at increasingly darker twilight is added in step c', c" and the semantic segmentation model is trained in step e', e" using all sets of labelled images (101, 102", 103", 104").
5. The method according to any one of the preceding claims, wherein the sets of labelled images (101, 102", 103", 104") are mixed proportionally to form a combined set of labelled images which is used for training (S05', S05", S05'") the semantic segmentation model.
6. The method according to any one of the preceding claims, wherein
the twilight of the first predefined degree corresponds to civil twilight, and/or
the twilight of the second predefined degree corresponds to nautical twilight, and/or
the twilight of the third predefined degree corresponds to astronomical twilight.
7. The method according to the preceding claim, wherein civil twilight is defined by a solar elevation angle in the range of 0 to 6 degrees below the horizon,
nautical twilight is defined by a solar elevation angle in the range of 6 to 12 degrees below the horizon, and
astronomical twilight is defined by a solar elevation angle in the range of 12 to 18 degrees below the horizon.
8. The method of any one of the preceding claims, wherein step e comprises:
$$\min_{\phi}\ \sum_{i=1}^{l^0} L\big(\phi(x_i^0),\, y_i^0\big) \;+\; \lambda^1 \sum_{j=1}^{l^1} L\big(\phi(x_j^1),\, y_j^1\big)$$
wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi(x) is the output of the semantic segmentation model for an image x,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight,
x_j^1 is an image of index j in the second set of labelled images,
y_j^1 is the predefined semantic segmentation label of image x_j^1.
9. The method of any one of the preceding claims 2-8, wherein step e' comprises:
$$\min_{\phi}\ \sum_{i=1}^{l^0} L\big(\phi(x_i^0),\, y_i^0\big) \;+\; \lambda^1 \sum_{j=1}^{l^1} L\big(\phi(x_j^1),\, y_j^1\big) \;+\; \lambda^2 \sum_{k=1}^{l^2} L\big(\phi(x_k^2),\, y_k^2\big)$$
wherein:
l^2 is the number of images of the third set of labelled images,
\lambda^2 is a predefined weight,
x_k^2 is an image of index k in the third set of labelled images,
y_k^2 is the predefined semantic segmentation label of image x_k^2.
10. The method of any one of the preceding claims 3-9, wherein step e" comprises:
$$\min_{\phi}\ \sum_{i=1}^{l^0} L\big(\phi(x_i^0),\, y_i^0\big) \;+\; \lambda^1 \sum_{j=1}^{l^1} L\big(\phi(x_j^1),\, y_j^1\big) \;+\; \lambda^2 \sum_{k=1}^{l^2} L\big(\phi(x_k^2),\, y_k^2\big) \;+\; \lambda^3 \sum_{q=1}^{l^3} L\big(\phi(x_q^3),\, y_q^3\big)$$
wherein:
l^3 is the number of images of the fourth set of labelled images,
\lambda^3 is a predefined weight,
x_q^3 is an image of index q in the fourth set of labelled images,
y_q^3 is the predefined semantic segmentation label of image x_q^3.
11. A semantic segmentation method comprising using the model of step e, e' or e" of any one of claims 1 to 8 on an image.
12. A system for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising:
a module A for obtaining a first set of labelled images (101) taken at daylight, the labelled images being annotated with predefined semantic segmentation labels,
a module B for training a semantic segmentation model using the first set of labelled images,
a module C for applying the semantic segmentation model of step b to a second set of unlabeled images (102) taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations (102') of the images of the second set,
a module D for labelling the second set of unlabeled images (102) with the semantic segmentations (102') of the images of the second set to obtain a second set of labelled images (102"),
a module E for training the semantic segmentation model using the first set of labelled images (101) and the second set of labelled images (102").
13. A system for semantic segmentation of an image comprising the model of step e, e' or e" of any one of claims 1 to 11 or of module E of claim 12.
14. A computer program including instructions for executing the steps of a method according to any one of claims 1 to 11 when said program is executed by a computer.
15. A recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method according to any one of claims 1 to 11.
PCT/EP2018/075681 2018-09-21 2018-09-21 A method and a system for training a model performing semantic segmentation of nighttime images WO2020057753A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2018/075681 WO2020057753A1 (en) 2018-09-21 2018-09-21 A method and a system for training a model performing semantic segmentation of nighttime images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2018/075681 WO2020057753A1 (en) 2018-09-21 2018-09-21 A method and a system for training a model performing semantic segmentation of nighttime images

Publications (1)

Publication Number Publication Date
WO2020057753A1 true WO2020057753A1 (en) 2020-03-26

Family

ID=63683201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/075681 WO2020057753A1 (en) 2018-09-21 2018-09-21 A method and a system for training a model performing semantic segmentation of nighttime images

Country Status (1)

Country Link
WO (1) WO2020057753A1 (en)


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
A. GONZALEZ; Z. FANG; Y. SOCARRAS; J. SERRAT; D. VAZQUEZ; J. XU; A. M. LOPEZ: "Pedestrian detection at day/night time with visible and fir cameras: A comparison", SENSORS, vol. 16, no. 6, 2016
DAI DENGXIN ET AL: "Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime", 2018 21ST INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), IEEE, 4 November 2018 (2018-11-04), pages 3819 - 3824, XP033470035, ISBN: 978-1-7281-0321-1, [retrieved on 20181207], DOI: 10.1109/ITSC.2018.8569387 *
M. CORDTS; M. OMRAN; S. RAMOS; T. REHFELD; M. ENZWEILER; R. BENENSON; U. FRANKE; S. ROTH; B. SCHIELE: "The cityscapes dataset for semantic urban scene understanding", PROC. OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR, 2016
SAKARIDIS CHRISTOS ET AL: "Semantic Foggy Scene Understanding with Synthetic Data", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, NORWELL, US, vol. 126, no. 9, 23 March 2018 (2018-03-23), pages 973 - 992, XP036568429, ISSN: 0920-5691, [retrieved on 20180323], DOI: 10.1007/S11263-018-1072-8 *
SAKARIDIS, C.; DAI, D.; HECKER, S.; VAN GOOL, L., EUROPEAN CONFERENCE OF COMPUTER VISION, 2018
SAKARIDIS, C.; DAI, D.; VAN GOOL, L.: "Semantic foggy scene understanding with synthetic data", INTERNATIONAL JOURNAL OF COMPUTER VISION, 2018
WEI-CHIH HUNG ET AL: "Adversarial Learning for Semi-Supervised Semantic Segmentation", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 February 2018 (2018-02-22), XP081107064 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021223113A1 (en) * 2020-05-06 2021-11-11 深圳市大疆创新科技有限公司 Metering method, camera, electronic device, and computer-readable storage medium
CN111767831A (en) * 2020-06-28 2020-10-13 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
CN111767831B (en) * 2020-06-28 2024-01-12 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
US20220113745A1 (en) * 2020-10-09 2022-04-14 Ford Global Technologies, Llc Systems And Methods For Nighttime Delivery Mobile Robot With Hybrid Infrared and Flash Lighting
US11609583B2 (en) * 2020-10-09 2023-03-21 Ford Global Technologies, Llc Systems and methods for nighttime delivery mobile robot with hybrid infrared and flash lighting
CN113743410A (en) * 2021-02-09 2021-12-03 京东数字科技控股股份有限公司 Image processing method, apparatus and computer-readable storage medium
CN113743410B (en) * 2021-02-09 2024-04-09 京东科技控股股份有限公司 Image processing method, apparatus and computer readable storage medium

Similar Documents

Publication Publication Date Title
WO2020057753A1 (en) A method and a system for training a model performing semantic segmentation of nighttime images
Wang et al. SFNet-N: An improved SFNet algorithm for semantic segmentation of low-light autonomous driving road scenes
CN110399856B (en) Feature extraction network training method, image processing method, device and equipment
US11132771B2 (en) Bright spot removal using a neural network
Sun et al. Ess: Learning event-based semantic segmentation from still images
CN109410144B (en) End-to-end image defogging processing method based on deep learning
CN109509156B (en) Image defogging processing method based on generation countermeasure model
Zakaria et al. Lane detection in autonomous vehicles: A systematic review
CN110751206A (en) Multi-target intelligent imaging and identifying device and method
WO2020239196A1 (en) System and method for training a generative adversarial model generating image samples of different brightness levels
CN112513928A (en) Method and system for training a model to perform semantic segmentation on a hazy image
WO2020020445A1 (en) A method and a system for processing images to obtain foggy images
Zheng et al. Low-light image and video enhancement: A comprehensive survey and beyond
CN114972934A (en) Comparison self-supervision learning method for remote sensing image representation
CN114419603A (en) Automatic driving vehicle control method and system and automatic driving vehicle
Qin et al. An end-to-end traffic visibility regression algorithm
US10735660B2 (en) Method and device for object identification
Fursa et al. Worsening perception: Real-time degradation of autonomous vehicle perception performance for simulation of adverse weather conditions
US20220284696A1 (en) System and method for training a model to perform semantic segmentation on low visibility images using high visibility images having a close camera view
CN114565597B (en) Night road pedestrian detection method based on YOLO v3-tiny-DB and transfer learning
Li et al. Hybrid Feature based Pyramid Network for Nighttime Semantic Segmentation.
Gomes et al. A Deep Learning Approach for Reconstruction of Color Images in Different Lighting Conditions Based on Autoencoder Technique
Tang et al. NDPC-Net: A dehazing network in nighttime hazy traffic environments
Imam et al. Semantic segmentation under severe imaging conditions
CN113362236B (en) Point cloud enhancement method, point cloud enhancement device, storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18774027

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18774027

Country of ref document: EP

Kind code of ref document: A1