WO2020057753A1 - A method and a system for training a model performing semantic segmentation of nighttime images - Google Patents

A method and a system for training a model performing semantic segmentation of nighttime images

Info

Publication number
WO2020057753A1
WO2020057753A1 (PCT/EP2018/075681, EP2018075681W)
Authority
WO
WIPO (PCT)
Prior art keywords
images
semantic segmentation
labelled
twilight
semantic
Prior art date
Application number
PCT/EP2018/075681
Other languages
French (fr)
Inventor
Nicolas VIGNARD
Patrizia ZUPPINGER
Dengxin DAI
Luc Van GOOL
Original Assignee
Toyota Motor Europe
Eth Zurich
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toyota Motor Europe, Eth Zurich filed Critical Toyota Motor Europe
Priority to PCT/EP2018/075681 priority Critical patent/WO2020057753A1/en
Publication of WO2020057753A1 publication Critical patent/WO2020057753A1/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method and system for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising: a - obtaining (S01) a first set of labelled images (101) taken at daylight, the labelled images being annotated with predefined semantic segmentation labels, b - training (S02) a semantic segmentation model using the first set of labelled images, c - applying (S03) the semantic segmentation model of step b to a second set of unlabeled images (102) taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations (102') of the images of the second set, d - labelling (S04) the second set of unlabeled images (102) with the semantic segmentations (102') of the images of the second set to obtain a second set of labelled images (102"), and e - training (S05) the semantic segmentation model using the first set of labelled images (101) and the second set of labelled images (102").

Description

A method and a system for training a model performing semantic segmentation of nighttime images
Field of the invention
The present invention relates to the field of image processing, and more precisely to the semantic segmentation of images taken at night time, i.e. without solar illumination.
Description of the Related Art
Semantic image segmentation is a method to automatically determine the semantic labels of the objects which appear in an image. For example, the image may be acquired by a camera mounted in a vehicle. Semantic segmentation of such an image allows recognizing cars, pedestrians, traffic lanes, etc. Therefore, semantic segmentation is the backbone technique for autonomous driving systems or other automated systems.
Semantic image segmentation typically uses models such as neural networks to perform the segmentation. These models need to be trained.
Training a model typically comprises inputting known images to the model. For these images, a predetermined semantic segmentation is already known (an operator may have prepared the predetermined semantic segmentations of each image by annotating the images). The output of the model is then evaluated in view of the predetermined semantic segmentation, and the parameters of the model are adjusted if the output of the model differs from the predetermined semantic segmentation of an image. It follows that in order to train a semantic segmentation model, a large number of images and predetermined semantic segmentations are necessary.
It has been observed that the illumination condition at nighttime (in particular when there is no direct solar illumination but e.g. only electrical street lights, i.e. nocturnal artificial lighting) creates visibility problems for drivers and for automated systems. While sensors and computer vision algorithms are constantly getting better, the improvements are usually benchmarked with images taken during daylight time. Those methods often fail to work in nighttime conditions. This prevents the automated systems from actually being used: it is not conceivable for a vehicle to avoid nighttime, and the vehicle has to be able to distinguish different objects in both daytime and nighttime conditions.
Compared to daylight, nocturnal artificial lighting degrades the visibility of a scene significantly, according to the darkness of the driving scene.
It is thus desirable to train semantic segmentation models with nighttime images (images taken at night time with no direct solar illumination but e.g. only electrical street lights, i.e. nocturnal artificial lighting).
However, obtaining semantic segmentation data on nighttime images (for example nighttime pictures taken by a camera) is particularly difficult and time-consuming, especially if an operator has to annotate the nighttime images by hand with semantic labels before feeding the nighttime images to the model.
With regard to another problem in the field of image segmentation, a method has been proposed for dense fog scene understanding by using both synthetic and real fog (cf.: European Conference on Computer Vision, Sakaridis, C., Dai, D., Hecker, S., Van Gool, L., 2018, or: International Journal of Computer Vision, Sakaridis, C., Dai, D., Van Gool, L., 2018: "Semantic foggy scene understanding with synthetic data"). In their work, images taken under light fog are used as a bridge to transfer semantic knowledge from the clear-weather condition to the dense-fog condition. However, due to the different objective and in particular due to the different characteristics of foggy images compared to nighttime images (in particular with nocturnal artificial lighting), the proposed learning algorithm is not suitable for the objective of the present disclosure.
Robust object recognition using visible light cameras remains a difficult problem. This is because the structural, textural and/or color features needed for object recognition sometimes do not exist or are heavily disturbed by artificial lights, to the point where it is difficult to recognize the objects even for humans. The problem is further compounded by camera noise and motion blur. Far-infrared (FIR) cameras can be another choice, cf. e.g.: A. Gonzalez, Z. Fang, Y. Socarras, J. Serrat, D. Vazquez, J. Xu, and A. M. Lopez. Pedestrian detection at day/night time with visible and fir cameras: A comparison. Sensors, 16(6), 2016. They are, however, expensive and only provide images of relatively low resolution.
It is a primary object of the invention to provide methods and system that overcome the deficiencies of the currently available systems and methods.
In particular, it remains desirable to reliably train a model performing semantic segmentation of nighttime images, which does not require labelled nighttime images, and to provide a semantic segmentation model for semantic segmentation of nighttime images.
Summary of the invention
The present invention overcomes one or more deficiencies of the prior art by proposing a method for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising: a - obtaining a first set of labelled images taken at daylight, the labelled images being annotated with predefined semantic segmentation labels, b - training a semantic segmentation model using the first set of labelled images,
c - applying the semantic segmentation model of step b to a second set of unlabeled images taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations of the images (desirably of each image) of the second set,
d - labelling the second set of unlabeled images (desirably each image of the second set) with the semantic segmentations of the images of the second set to obtain a second set of labelled images, and
e - training the semantic segmentation model using the first set of labelled images and the second set of labelled images.
Accordingly, the present disclosure desirably adopts visible light cameras for semantic segmentation of nighttime road scenes. Another reason for this desired choice is that large-scale datasets (with annotations) are available for daytime images (videos) taken by visible light cameras, cf. e.g.: M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes dataset for semantic urban scene understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
In other words, to overcome the problem of collecting and annotating images for all illumination conditions, the present disclosure proposes to depart from the traditional paradigm of manually labelling all images and proposes another route, also different from moving to learning with synthetic scenes (as e.g. disclosed in International Journal of Computer Vision, Sakaridis, C., Dai, D., Van Gool, L., 2018: "Semantic foggy scene understanding with synthetic data"). Instead, it is proposed to progressively adapt the semantic models trained for daytime scenes to nighttime scenes, by using images taken at twilight time. Accordingly, the method is based on a progressive self-learning scheme.
Hence, the present disclosure makes model adaptation from daytime to nighttime feasible.
The images of the first set of labelled images and of the second set of unlabeled images may each represent a driving scene.
The second set of unlabeled images may represent different or similar driving scenes compared to the first set of labelled images.
In step e an untrained semantic segmentation model (e.g. having untrained random parameters or weights) may be trained using the first set of labelled images and the second set of labelled images. Alternatively, the already trained semantic segmentation model of step b may be further trained, e.g. based on a mixed combination of the first set of labelled images and the second set of labelled images.
In step d the second set of labelled images may be obtained automatically by providing the second set of unlabeled images together with the related semantic segmentations of the images of the second set, i.e. as a result of the segmentation performed in step c.
The method may further comprise (e.g. after step e):
c' - applying the semantic segmentation model of step e to a third set of unlabeled images taken at twilight of a second predefined degree, where solar illumination is less than at twilight of the first predefined degree and more than at nighttime, to obtain semantic segmentations of the images of the third set,
d' - labelling the third set of unlabeled images with the semantic segmentations of the images of the third set to obtain a third set of labelled images, e' - training the semantic segmentation model using the first set of labelled images, the second set of labelled images and the third set of labelled images.
Accordingly, the semantic models for daytime scenes can be trained even more progressively to nighttime scenes, by using images taken at a first more illuminated twilight and a second darker twilight.
In step e' an untrained semantic segmentation model (e.g. having untrained random parameters or weights) may be trained using the first, second and third sets of labelled images. Alternatively, the already trained semantic segmentation model of step e may be further trained, e.g. based on a mixed combination of the first, second and third sets of labelled images.
In step e' the semantic segmentation model might also be trained using only the first set of labelled images and the third set of labelled images. This option might be used e.g. when in step e' the model already trained in step e is further trained.
The method may further comprise (e.g. after step e'):
c" - applying the semantic segmentation model of step e' to a fourth set of unlabeled images taken at twilight of a third predefined degree, where solar illumination is less than at twilight of the second predefined degree and more than at nighttime, to obtain semantic segmentations of the images of the fourth set,
d" - labelling the fourth set of unlabeled images with the semantic segmentations of the images of the third set to obtain a fourth set of labelled images,
e" - training the semantic segmentation model using the first set of labelled images, the second set of labelled images, the third set of labelled images and the fourth set of labelled images.
Accordingly, the semantic models for daytime scenes can be trained even more progressively to nighttime scenes, by using images taken at a first more illuminated twilight; a second darker twilight and a third even darker twilight.
In step e" an untrained semantic segmentation model (e.g. having untrained random parameters or weights) may be trained using the first, second, third and fourth sets of labelled images. Alternatively, the already trained semantic segmentation model of step e' may be further trained, e.g. based on a mixed combination of the first, second, third and fourth sets of labelled images.
In step e" the semantic segmentation model might also be trained using only the first set of labelled images and the fourth set of labelled images. This option might be used e.g. when in step e" the model already trained in step e' is further trained.
The above described progressive adaptation may be continued by adding one or several further sets of increasingly darker twilight images.
In other words, the semantic segmentation may be progressively adapted to be usable for semantic segmentation of images taken at nighttime, by repeating the sequence of steps c to e for one or several times (e.g. in sequences c' to e', c" to e", c"' to e"', and so on), wherein in each subsequent sequence c' to e', c" to e", c"' to e'"..., a further set of unlabeled images taken at increasingly darker twilight is added in step c', c", c'" ... and so on and the semantic segmentation model is trained in step e', e", e'"... using all sets of labelled images.
The sets of labelled images may be mixed (or in other words: sampled) proportionally to form a combined set of labelled images which is used for training the semantic segmentation model.
Proportional mixing may mean that sets of labelled images may be mixed according to their quantity of (sample) images. For example, in case a first set comprises twice as many images as a second set, the combined set may repeatedly comprise two images of the first set followed by one image of the second set.
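Purely as an illustration of this proportional mixing, the following sketch interleaves labelled sets according to their sizes (plain Python; the helper name and the use of simple lists are assumptions made for the example, not part of the disclosure):

```python
from functools import reduce
from itertools import cycle, islice
from math import gcd

def mix_proportionally(*labelled_sets):
    """Interleave labelled sets in proportion to their sizes.

    If the first set holds twice as many images as the second, the combined
    set repeatedly contains two images of the first set followed by one image
    of the second set, as in the example above.
    """
    sizes = [len(s) for s in labelled_sets]
    blocks = reduce(gcd, sizes)                  # number of repeating blocks
    ratios = [n // blocks for n in sizes]        # e.g. sizes 200 and 100 -> ratios 2 and 1
    iterators = [cycle(s) for s in labelled_sets]
    combined = []
    for _ in range(blocks):
        for it, ratio in zip(iterators, ratios):
            combined.extend(islice(it, ratio))   # take `ratio` samples from this set
    return combined
```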
The twilight of the first predefined degree corresponds to civil twilight. In particular, civil twilight may be defined by a solar elevation angle in the range of 0 to 6 degrees below the horizon.
The twilight of the second predefined degree corresponds to nautical twilight. In particular, nautical twilight is defined by a solar elevation angle in the range of 6 to 12 degrees below the horizon.
The twilight of the third predefined degree corresponds to astronomical twilight. In particular, astronomical twilight is defined by a solar elevation angle in the range of 12 to 18 degrees below the horizon.
Accordingly, since in each of the above mentioned cases the sun is below the horizon, no direct illumination by the sun is possible but only indirect illumination by the atmosphere still illuminated by the sun.
Desirably, at nighttime there is no direct illumination by the sun either, and more desirably the atmosphere is not illuminated either. Instead there may only be illumination by the moon and stars and by artificial illumination. For example, nighttime is defined by a solar elevation angle of more than 18 degrees below the horizon.
In contrast, daylight desirably comprises direct solar illumination where the sun is above the horizon. Direct solar illumination more desirably also includes light scattering by e.g. clouds.
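Merely to illustrate these definitions, the illumination condition of a recording could be categorized from the solar elevation angle as in the following sketch (the function name and string labels are assumptions; the thresholds simply restate the ranges above):

```python
def illumination_condition(solar_elevation_deg: float) -> str:
    """Map the solar elevation angle (degrees above the horizon, negative
    when the sun is below the horizon) to an illumination condition."""
    if solar_elevation_deg > 0:
        return "daylight"               # sun above the horizon
    if solar_elevation_deg >= -6:
        return "civil twilight"         # 0 to 6 degrees below the horizon
    if solar_elevation_deg >= -12:
        return "nautical twilight"      # 6 to 12 degrees below the horizon
    if solar_elevation_deg >= -18:
        return "astronomical twilight"  # 12 to 18 degrees below the horizon
    return "nighttime"                  # more than 18 degrees below the horizon
```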
Step e may comprise minimizing:

\min_{\phi^1} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^1(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^1(x_j^1), \hat{y}_j^1\big) \Big\}

wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi^1(x_i^0) is the output of the semantic segmentation model,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight,
x_j^1 is an image of index j in the second set of labelled images,
\hat{y}_j^1 is the predefined semantic segmentation label of image x_j^1.

Step e' may comprise minimizing:

\min_{\phi^2} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^2(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^2(x_j^1), \hat{y}_j^1\big) + \lambda^2 \sum_{k=1}^{l^2} L\big(\phi^2(x_k^2), \hat{y}_k^2\big) \Big\}

wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi^2(x_i^0) is the output of the semantic segmentation model,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight,
x_j^1 is an image of index j in the second set of labelled images,
\hat{y}_j^1 is the predefined semantic segmentation label of image x_j^1,
l^2 is the number of images of the third set of labelled images,
\lambda^2 is a predefined weight,
x_k^2 is an image of index k in the third set of labelled images,
\hat{y}_k^2 is the predefined semantic segmentation label of image x_k^2.

Step e" may comprise minimizing:

\min_{\phi^3} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^3(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^3(x_j^1), \hat{y}_j^1\big) + \lambda^2 \sum_{k=1}^{l^2} L\big(\phi^3(x_k^2), \hat{y}_k^2\big) + \lambda^3 \sum_{q=1}^{l^3} L\big(\phi^3(x_q^3), \hat{y}_q^3\big) \Big\}

wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi^3(x_i^0) is the output of the semantic segmentation model,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight,
x_j^1 is an image of index j in the second set of labelled images,
\hat{y}_j^1 is the predefined semantic segmentation label of image x_j^1,
l^2 is the number of images of the third set of labelled images,
\lambda^2 is a predefined weight,
x_k^2 is an image of index k in the third set of labelled images,
\hat{y}_k^2 is the predefined semantic segmentation label of image x_k^2,
l^3 is the number of images of the fourth set of labelled images,
\lambda^3 is a predefined weight,
x_q^3 is an image of index q in the fourth set of labelled images,
\hat{y}_q^3 is the predefined semantic segmentation label of image x_q^3.
The present disclosure further relates to a semantic segmentation method comprising using the model of step e, e' or e", as described above.
The present disclosure further relates to a system for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising: a module A for obtaining a first set of labelled images taken at daylight, the labelled images being annotated with predefined semantic segmentation labels,
a module B training a semantic segmentation model using the first set of labelled images,
a module C for applying the semantic segmentation model of step b to a second set of unlabeled images taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations of the images of the second set,
a module D for labelling the second set of unlabeled images with the semantic segmentations of the images of the second set to obtain a second set of labelled images,
a module E for training the semantic segmentation model using the first set of labelled images and the second set of labelled images.
The system may comprise further features or modules corresponding to the steps of the method described above, e.g. modules C' to E' and C" to E" corresponding to steps c' to e' and c" to e", respectively. Hence, this system may be configured to perform all the embodiments of the method as defined above.
The present disclosure further relates to a system for semantic segmentation of an image comprising the model of step e, e' or e" or of module E as described above.
The present disclosure further relates to a computer program including instructions for executing the steps of a method as described above when said program is executed by a computer.
This program can use any programming language and take the form of source code, object code or a code intermediate between source code and object code, such as a partially compiled form, or any other desirable form. The present disclosure further relates to a recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method as described above.
The information medium can be any entity or device capable of storing the program. For example, the medium can include storage means such as a ROM, for example a CD ROM or a microelectronic circuit ROM, or magnetic storage means, for example a diskette (floppy disk) or a hard disk.
Alternatively, the information medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.
It is intended that combinations of the above-described elements and those within the specification may be made, except where otherwise contradictory.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure, as claimed.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, and serve to explain the principles thereof.
Brief description of the drawings
How the present invention may be put into effect will now be described by way of example with reference to the appended drawings, in which: Fig. 1 shows a schematic flow chart of the steps for training a model performing semantic segmentation of nighttime images according to embodiments of the present disclosure; and
Fig. 2 shows a schematic block diagram of a system with an electronic device according to embodiments of the present disclosure.
Description of the embodiments
Reference will now be made in detail to exemplary embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Training a segmentation model with a large amount of human annotations should work for nighttime images, similar to what has been achieved for daytime scene understanding. However, applying this protocol to other illumination conditions is problematic as it is hardly affordable to annotate the same amount of data for all different conditions and their combinations. It is thus proposed to depart from this protocol and propose an automated approach to transfer the knowledge from existing annotations of daytime scenes to nighttime scenes. The approach leverages the fact that illumination changes continuously between daytime and nighttime, through the twilight time. Twilight is the time between dawn and sunrise, or between sunset and dusk. Twilight is defined according to the solar elevation angle, which is the position of the geometric center of the sun relative to the horizon, cf. e.g. Definitions from the US astronomical applications dept (usno). Retrieved 2011-07-22.
During a large portion of twilight time, solar illumination is still sufficient for cameras to capture terrestrial objects and sufficient to limit the interference of artificial lights to a certain amount. These observations lead to our conjecture that the domain discrepancy between daytime scenes and twilight scenes, and the domain discrepancy between twilight scenes and nighttime scenes, are both smaller than the domain discrepancy between daytime scenes and nighttime scenes. Thus, images captured during twilight time can in principle be used to serve our purpose: knowledge transfer from daytime to nighttime. That is, twilight time constructs a bridge for knowledge transfer from our source domain daytime to our target domain nighttime.
In particular, it is proposed to train a semantic segmentation model on daytime images using the standard supervised learning paradigm, and apply the model to a large dataset recorded at civil twilight time to generate the class responses. The three subgroups of twilight are used: civil twilight, nautical twilight, and astronomical twilight. Since the domain gap between daytime condition and civil twilight condition is relatively small, these class responses, along with the images, can then be used to fine-tune the semantic segmentation model so that it can adapt to civil twilight time. The same procedure is continued through nautical twilight and astronomical twilight. Then the final fine-tuned model may be applied to nighttime images. In other words, the semantic knowledge of annotations of daytime scenes may be transferred to nighttime scenes via the unlabeled images recorded at twilight time.
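The overall procedure can be summarized by the following sketch (Python; the callables train and predict and the data structures are assumptions introduced only for illustration, not part of the disclosure):

```python
# Illustrative sketch of the progressive day-to-night adaptation described above.
# `train(model, labelled_sets)` stands for any supervised training routine and
# `predict(model, image)` for per-image segmentation inference.

def progressive_model_adaptation(model, daytime_labelled, twilight_stages, train, predict):
    """Adapt a daytime segmentation model towards nighttime via twilight stages.

    daytime_labelled: list of (image, label) pairs with human annotations (D0)
    twilight_stages:  lists of unlabeled images, ordered from civil to nautical
                      to astronomical twilight (D1, D2, D3)
    """
    model = train(model, [daytime_labelled])              # supervised training on D0
    labelled_sets = [daytime_labelled]
    for unlabeled_images in twilight_stages:
        # Generate "noisy" pseudo-labels (class responses) with the current model
        pseudo_labelled = [(img, predict(model, img)) for img in unlabeled_images]
        labelled_sets.append(pseudo_labelled)
        # Fine-tune on all labelled sets collected so far
        model = train(model, labelled_sets)
    return model                                          # then apply to nighttime images
```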
Fig. 1 shows a schematic flow chart of the steps for training a model performing semantic segmentation of nighttime images according to embodiments of the present disclosure.
This model may be, initially, a neural network or a convolutional neural network which may have been conceived to perform semantic segmentation on images. However, initially, the model has not been trained to perform semantic segmentation of nighttime images.
The images which may be processed by the model (after complete training) may be photographs taken by image sensors. A plurality of objects may be visible on these images, preferably objects of different types which may or may not overlap.
By way of example, the images show a scene which may be visible from a vehicle on a road, for example in a street.
In a first step SOI, a first set of labelled daylight images is obtained. The labelled images are annotated with predefined semantic segmentation labels.
In an example, an image is denoted by x, and images taken at daytime, civil twilight time, nautical twilight time, astronomical twilight time and nighttime are indicated by x^0, x^1, x^2, x^3, and x^4, respectively. The corresponding human annotation for x^0 is provided and denoted by y^0, where y^0(m, n) \in \{1, ..., C\} is the label of pixel (m, n), and C is the total number of classes. Then, the training data consist of the labelled daytime dataset

D^0 = \{(x_i^0, y_i^0)\}_{i=1}^{l^0}

and three unlabeled datasets for the three twilight categories:

D^1 = \{x_j^1\}_{j=1}^{l^1}, \quad D^2 = \{x_k^2\}_{k=1}^{l^2}, \quad D^3 = \{x_q^3\}_{q=1}^{l^3},

where l^0, l^1, l^2, and l^3 are the total numbers of images in the corresponding datasets.
In step S02 a semantic segmentation model is trained using the first set of labelled images.
In the example this is done by:
\min_{\phi^0} \sum_{i=1}^{l^0} L\big(\phi^0(x_i^0), y_i^0\big)    (1)

where L(\cdot,\cdot) is the cross entropy loss function and \phi^0 denotes the semantic segmentation model trained on the daytime data.
In step S03 the trained semantic segmentation model is applied to a second set of unlabeled images 102 of a first twilight degree, e.g. civil twilight. Consequently, semantic segmentations 102' of the images of the second set are obtained.
In the example this is done by applying the segmentation model \phi^0 to the images x_j^1 recorded at civil twilight time to obtain "noisy" semantic labels: \hat{y}_j^1 = \phi^0(x_j^1).
In step S04 the second set of unlabeled images 102 is labelled with the semantic segmentations 102' of the images of the second set to obtain a second set of labelled images 102".
In the example this is done by augmenting the dataset D^1 to

\hat{D}^1 = \{(x_j^1, \hat{y}_j^1)\}_{j=1}^{l^1}.
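A minimal sketch of this pseudo-labelling step, under the assumption of a PyTorch-style segmentation network that outputs per-pixel class scores (all names here are illustrative):

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(model, twilight_loader, device="cuda"):
    """Apply the daytime model (phi^0) to twilight images and keep the argmax
    class per pixel as a "noisy" semantic label, forming the augmented dataset."""
    model.eval()
    augmented = []
    for images in twilight_loader:          # images: (B, 3, H, W) tensor batches
        images = images.to(device)
        scores = model(images)              # (B, C, H, W) per-pixel class scores
        pseudo = scores.argmax(dim=1)       # (B, H, W) hard labels
        augmented.extend(zip(images.cpu(), pseudo.cpu()))
    return augmented
```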
In step S05 the semantic segmentation model is trained using a combination of the sets of labelled images 101, 102".
In the example this is done by fine-tuning (e.g. retraining) the semantic model on D^0 and \hat{D}^1:

\phi^1 \leftarrow \phi^0,    (2)

and then (eq. 3):

\min_{\phi^1} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^1(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^1(x_j^1), \hat{y}_j^1\big) \Big\}    (3)
wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi^1(x_i^0) is the output of the semantic segmentation model,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight, in particular a hyper-parameter balancing the weights of the two data sources,
x_j^1 is an image of index j in the second set of labelled images,
\hat{y}_j^1 is the predefined semantic segmentation label of image x_j^1.
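For illustration, one optimization step of eq. 3 could be written as follows in PyTorch (a sketch assuming the model returns per-pixel class scores and that the pseudo-labels are stored as class indices; names are illustrative):

```python
import torch.nn.functional as F

def finetune_step(model, optimizer, day_batch, twilight_batch, lambda_1=1.0):
    """One step of eq. 3: cross entropy on a labelled daytime batch plus
    lambda_1 times the cross entropy on a pseudo-labelled twilight batch.
    Images have shape (B, 3, H, W) and labels (B, H, W)."""
    x0, y0 = day_batch          # human-annotated daytime images and labels
    x1, y1 = twilight_batch     # twilight images and their pseudo-labels
    loss = F.cross_entropy(model(x0), y0) + lambda_1 * F.cross_entropy(model(x1), y1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```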
The sequence of steps S03 to S05 (or c to e) may be repeated for one or a plurality of times as steps S03' to S05', S03" to S05", etc. (i.e. c' to e', c" to e", etc.).
In each sequence, in step c', c", etc. the trained semantic segmentation model may be applied to a further set of unlabeled images 103, 104, etc. of a further, increasingly darker twilight degree, e.g. for nautical and astronomical twilight, and semantic segmentations 103', 104', etc. of the images of the further set may be obtained.
In a subsequent step d', d", etc. the further set of unlabeled images 103, 104, etc. may be labelled with the semantic segmentations 103', 104', etc. of the images of the further set to obtain a further set of labelled images 103", 104", etc.
In a subsequent step e', e", etc. the semantic segmentation model may be trained using a combination of the sets of all labelled images (101, 102", 103", 104", etc.).
In the example, step e' (adding images of e.g. nautical twilight) comprises fine-tuning (e.g. retraining) the semantic model on D^0, \hat{D}^1 and \hat{D}^2, where \hat{D}^2 = \{(x_k^2, \hat{y}_k^2)\}_{k=1}^{l^2} is obtained analogously to \hat{D}^1 in steps c' and d':

\phi^2 \leftarrow \phi^1,    (4)

and then (eq. 5):

\min_{\phi^2} \Big\{ \sum_{i=1}^{l^0} L\big(\phi^2(x_i^0), y_i^0\big) + \lambda^1 \sum_{j=1}^{l^1} L\big(\phi^2(x_j^1), \hat{y}_j^1\big) + \lambda^2 \sum_{k=1}^{l^2} L\big(\phi^2(x_k^2), \hat{y}_k^2\big) \Big\}    (5)
wherein:
l^2 is the number of images of the third set of labelled images,
\lambda^2 is a predefined weight, in particular a hyper-parameter balancing the weight of \hat{D}^2,
x_k^2 is an image of index k in the third set of labelled images,
\hat{y}_k^2 is the predefined semantic segmentation label of image x_k^2.
In the example, step e" (adding images of e.g. astronomical twilight) comprises fine-tuning (e.g. retraining) the semantic model on D° and D1, D2, D3 \ f 4- f 2 , (6) and then (eq. 7):
Figure imgf000020_0002
wherein:
l^3 is the number of images of the fourth set of labelled images,
\lambda^3 is a predefined weight, in particular a hyper-parameter balancing the weight of \hat{D}^3,
x_q^3 is an image of index q in the fourth set of labelled images,
\hat{y}_q^3 is the predefined semantic segmentation label of image x_q^3.
The resulting model may then be applied to nighttime images for performing image segmentation.
The method may be termed Progressive Model Adaptation. During training, in order to balance the weights of different data sources (in Equation 3, Equation 5 and Equation 7), equal weight may be empirically given to all training datasets. An optimal value can be obtained via cross-validation. The optimization of Equation 3, Equation 5 and Equation 7 is implemented by feeding to the training algorithm a stream of hybrid data, for which images in the considered datasets are sampled (i.e. mixed) proportionally according to the parameters \lambda^1, \lambda^2, and \lambda^3. For example, they may all be set to 1, which means all datasets are sampled at the same rate.
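One possible way to realize such a hybrid data stream is sketched below (an assumption-based example: each dataset is drawn at a rate proportional to its weight, so equal weights yield equal sampling rates):

```python
import random

def hybrid_stream(datasets, weights, num_samples, seed=0):
    """Yield (image, label) pairs where dataset d is drawn with probability
    proportional to weights[d]; with all weights equal to 1 every dataset is
    sampled at the same rate."""
    rng = random.Random(seed)
    total = float(sum(weights))
    probabilities = [w / total for w in weights]
    for _ in range(num_samples):
        d = rng.choices(range(len(datasets)), weights=probabilities, k=1)[0]
        yield rng.choice(datasets[d])
```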
Rather than applying the model trained on daytime images directly to nighttime images, Progressive Model Adaptation breaks the problem down into three progressive steps to adapt the semantic model. In each of these steps, the domain gap is much smaller than the domain gap between the daytime domain and the nighttime domain. Due to the unsupervised nature of this domain adaptation, the algorithm will also be affected by the noise in the labels. The daytime dataset D^0 is always used for the training, to balance between noisy data of similar domains and clean data of a distinct domain.
The steps of the method described in reference to figure 1 can be determined by computer program instructions. These instructions can be executed by a processor of a system, as represented on figure 2.
Fig. 2 shows a schematic block diagram of a system with an electronic device according to embodiments of the present disclosure.
In this figure, a system 200 for training a model has been represented. This system 200, which may be a computer, comprises a processor 201 and a non-volatile memory 202. The system 200 may also comprise, be configured to be integrated in, or form a part of a vehicle 400. The system 200 may not only be configured for training a semantic segmentation model but also to apply the trained model on nighttime images (in particular in case it is part of a vehicle 400). The system 200 may further be connected to a (passive) optical sensor 300, in particular a digital camera. The digital camera 300 is configured such that it can record a scene in front of the vehicle 400, and in particular output digital data providing appearance (color, e.g. RGB) information of the scene. The camera 300 desirably generates image data comprising a 2D or 3D image of the environment. There may also be provided a set of monocular cameras which generate a panoramic 2D or 3D image.
In the non-volatile memory 202, a set of instructions is stored and this set of instructions comprises instructions to perform a method for training a model.
In particular, these instructions and the processor 201 may respectively form a plurality of modules:
a module A for obtaining a first set of labelled images (101) taken at daylight, the labelled images being annotated with predefined semantic segmentation labels,
a module B training a semantic segmentation model using the first set of labelled images,
a module C for applying the semantic segmentation model of step b to a second set of unlabeled images 102 taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations 102' of the images of the second set,
a module D for labelling the second set of unlabeled images 102 with the semantic segmentations 102' of the images of the second set to obtain a second set of labelled images 102",
a module E for training the semantic segmentation model using the first set of labelled images 101 and the second set of labelled images 102".
Although the present invention has been described above with reference to certain specific embodiments, it will be understood that the invention is not limited by the particularities of the specific embodiments. Numerous variations, modifications and developments may be made in the above-described embodiments within the scope of the appended claims.

Claims

Claims
1. A method for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising:
a - obtaining (SOI) a first set of labelled images (101) taken at daylight, the labelled images being annotated with predefined semantic segmentation labels,
b - training (S02) a semantic segmentation model using the first set of labelled images,
c - applying (S03) the semantic segmentation model of step b to a second set of unlabeled images (102) taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations (102') of the images of the second set,
d - labelling (S04) the second set of unlabeled images (102) with the semantic segmentations (102') of the images of the second set to obtain a second set of labelled images (102"), and
e - training (S05) the semantic segmentation model using the first set of labelled images (101) and the second set of labelled images (102").
2. The method of claim 1, further comprising:
c' - applying (S03') the semantic segmentation model of step e to a third set of unlabeled images (103) taken at twilight of a second predefined degree, where solar illumination is less than at twilight of the first predefined degree and more than at nighttime, to obtain semantic segmentations (103') of the images of the third set, d' - labelling (S04') the third set of unlabeled images (103) with the semantic segmentations (103') of the images of the third set to obtain a third set of labelled images (103"),
e' - training (S05') the semantic segmentation model using the first set of labelled images (101), the second set of labelled images (102") and the third set of labelled images (103").
3. The method of claim 2, further comprising:
c" - applying (S03") the semantic segmentation model of step e' to a fourth set of unlabeled images (104) taken at twilight of a third predefined degree, where solar illumination is less than at twilight of the second predefined degree and more than at nighttime, to obtain semantic segmentations (104') of the images of the fourth set,
d" - labelling (S04") the fourth set of unlabeled images (104) with the semantic segmentations (104') of the images of the third set to obtain a fourth set of labelled images (104")
e" - training (S05") the semantic segmentation model using the first set of labelled images (101), the second set of labelled images (102"), the third set of labelled images (103") and the fourth set of labelled images (104").
4. The method according to any one of the preceding claims, wherein the semantic segmentation model is progressively adapted to be usable for semantic segmentation of images taken at nighttime by repeating the sequence of steps c to e one or several times (c' to e', c" to e"), wherein in each subsequent sequence c' to e', c" to e" a further set of unlabeled images taken at increasingly darker twilight is added in step c', c" and the semantic segmentation model is trained in step e', e" using all sets of labelled images (101, 102", 103", 104").
5. The method according to any one of the preceding claims, wherein the sets of labelled images (101, 102", 103", 104") are mixed proportionally to form a combined set of labelled images which is used for training (S05', S05", S05'") the semantic segmentation model.
6. The method according to any one of the preceding claims, wherein
the twilight of the first predefined degree corresponds to civil twilight, and/or
the twilight of the second predefined degree corresponds to nautical twilight, and/or
the twilight of the third predefined degree corresponds to astronomical twilight.
7. The method according to the preceding claim, wherein civil twilight is defined by a solar elevation angle in the range of 0 to 6 degrees below the horizon,
nautical twilight is defined by a solar elevation angle in the range of 6 to 12 degrees below the horizon, and
astronomical twilight is defined by a solar elevation angle in the range of 12 to 18 degrees below the horizon.
8. The method of any one of the preceding claims, wherein step e comprises:
$$\min_{\phi}\ \sum_{i=1}^{l^0} L\big(\phi(x_i^0),\, y_i^0\big) \;+\; \lambda^1 \sum_{j=1}^{l^1} L\big(\phi(x_j^1),\, y_j^1\big)$$
wherein:
l^0 is the number of images of the first set of labelled images,
x_i^0 is an image of index i in the first set of labelled images,
y_i^0 is the predefined semantic segmentation label of image x_i^0,
L(x, y) is the cross entropy loss function of x and y,
\phi(x) is the output of the semantic segmentation model for an image x,
l^1 is the number of images of the second set of labelled images,
\lambda^1 is a predefined weight,
x_j^1 is an image of index j in the second set of labelled images,
y_j^1 is the predefined semantic segmentation label of image x_j^1.
9. The method of any one of the preceding claims 2-8, wherein step e' comprises:
$$\min_{\phi}\ \sum_{i=1}^{l^0} L\big(\phi(x_i^0),\, y_i^0\big) \;+\; \lambda^1 \sum_{j=1}^{l^1} L\big(\phi(x_j^1),\, y_j^1\big) \;+\; \lambda^2 \sum_{k=1}^{l^2} L\big(\phi(x_k^2),\, y_k^2\big)$$
wherein:
l^2 is the number of images of the third set of labelled images,
\lambda^2 is a predefined weight,
x_k^2 is an image of index k in the third set of labelled images,
y_k^2 is the predefined semantic segmentation label of image x_k^2.
10. The method of any one of the preceding claims 3-9, wherein step e" comprises:
$$\min_{\phi}\ \sum_{i=1}^{l^0} L\big(\phi(x_i^0),\, y_i^0\big) \;+\; \lambda^1 \sum_{j=1}^{l^1} L\big(\phi(x_j^1),\, y_j^1\big) \;+\; \lambda^2 \sum_{k=1}^{l^2} L\big(\phi(x_k^2),\, y_k^2\big) \;+\; \lambda^3 \sum_{q=1}^{l^3} L\big(\phi(x_q^3),\, y_q^3\big)$$
wherein:
l^3 is the number of images of the fourth set of labelled images,
\lambda^3 is a predefined weight,
x_q^3 is an image of index q in the fourth set of labelled images,
y_q^3 is the predefined semantic segmentation label of image x_q^3.
11. A semantic segmentation method comprising using the model of step e, e' or e" of any one of claims 1 to 8 on an image.
12. A system for training a semantic segmentation model performing semantic segmentation of images taken at nighttime, comprising:
a module A for obtaining a first set of labelled images (101) taken at daylight, the labelled images being annotated with predefined semantic segmentation labels,
a module B for training a semantic segmentation model using the first set of labelled images,
a module C for applying the semantic segmentation model of step b to a second set of unlabeled images (102) taken at twilight of a first predefined degree, where solar illumination is less than at daylight and more than at nighttime, to obtain semantic segmentations (102') of the images of the second set,
a module D for labelling the second set of unlabeled images (102) with the semantic segmentations (102') of the images of the second set to obtain a second set of labelled images (102"),
a module E for training the semantic segmentation model using the first set of labelled images (101) and the second set of labelled images (102").
13. A system for semantic segmentation of an image comprising the model of step e, e' or e" of any one of claims 1 to 11 or of module E of claim 12.
14. A computer program including instructions for executing the steps of a method according to any one of claims 1 to 11 when said program is executed by a computer.
15. A recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method according to any one of claims 1 to 11.
PCT/EP2018/075681 2018-09-21 2018-09-21 A method and a system for training a model performing semantic segmentation of nighttime images WO2020057753A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2018/075681 WO2020057753A1 (en) 2018-09-21 2018-09-21 A method and a system for training a model performing semantic segmentation of nighttime images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2018/075681 WO2020057753A1 (en) 2018-09-21 2018-09-21 A method and a system for training a model performing semantic segmentation of nighttime images

Publications (1)

Publication Number Publication Date
WO2020057753A1 true WO2020057753A1 (en) 2020-03-26

Family

ID=63683201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/075681 WO2020057753A1 (en) 2018-09-21 2018-09-21 A method and a system for training a model performing semantic segmentation of nighttime images

Country Status (1)

Country Link
WO (1) WO2020057753A1 (en)


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
A. GONZALEZ; Z. FANG; Y. SOCARRAS; J. SERRAT; D. VAZQUEZ; J. XU; A. M. LOPEZ: "Pedestrian detection at day/night time with visible and fir cameras: A comparison", SENSORS, vol. 16, no. 6, 2016
DAI DENGXIN ET AL: "Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime", 2018 21ST INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), IEEE, 4 November 2018 (2018-11-04), pages 3819 - 3824, XP033470035, ISBN: 978-1-7281-0321-1, [retrieved on 20181207], DOI: 10.1109/ITSC.2018.8569387 *
M. CORDTS; M. OMRAN; S. RAMOS; T. REHFELD; M. ENZWEILER; R. BENENSON; U. FRANKE; S. ROTH; B. SCHIELE: "The cityscapes dataset for semantic urban scene understanding", PROC. OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR, 2016
SAKARIDIS CHRISTOS ET AL: "Semantic Foggy Scene Understanding with Synthetic Data", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, NORWELL, US, vol. 126, no. 9, 23 March 2018 (2018-03-23), pages 973 - 992, XP036568429, ISSN: 0920-5691, [retrieved on 20180323], DOI: 10.1007/S11263-018-1072-8 *
SAKARIDIS, C.; DAI, D.; HECKER, S.; VAN GOOL, L., EUROPEAN CONFERENCE OF COMPUTER VISION, 2018
SAKARIDIS, C.; DAI, D.; VAN GOOL, L.: "Semantic foggy scene understanding with synthetic data", INTERNATIONAL JOURNAL OF COMPUTER VISION, 2018
WEI-CHIH HUNG ET AL: "Adversarial Learning for Semi-Supervised Semantic Segmentation", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 February 2018 (2018-02-22), XP081107064 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021223113A1 (en) * 2020-05-06 2021-11-11 深圳市大疆创新科技有限公司 Metering method, camera, electronic device, and computer-readable storage medium
CN111767831A (en) * 2020-06-28 2020-10-13 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
CN111767831B (en) * 2020-06-28 2024-01-12 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for processing image
US20220113745A1 (en) * 2020-10-09 2022-04-14 Ford Global Technologies, Llc Systems And Methods For Nighttime Delivery Mobile Robot With Hybrid Infrared and Flash Lighting
US11609583B2 (en) * 2020-10-09 2023-03-21 Ford Global Technologies, Llc Systems and methods for nighttime delivery mobile robot with hybrid infrared and flash lighting
CN113743410A (en) * 2021-02-09 2021-12-03 京东数字科技控股股份有限公司 Image processing method, apparatus and computer-readable storage medium
CN113743410B (en) * 2021-02-09 2024-04-09 京东科技控股股份有限公司 Image processing method, apparatus and computer readable storage medium

Similar Documents

Publication Publication Date Title
WO2020057753A1 (en) A method and a system for training a model performing semantic segmentation of nighttime images
Wang et al. SFNet-N: An improved SFNet algorithm for semantic segmentation of low-light autonomous driving road scenes
CN110399856B (en) Feature extraction network training method, image processing method, device and equipment
US11132771B2 (en) Bright spot removal using a neural network
Sun et al. Ess: Learning event-based semantic segmentation from still images
CN109410144B (en) End-to-end image defogging processing method based on deep learning
CN109509156B (en) Image defogging processing method based on generation countermeasure model
Zakaria et al. Lane detection in autonomous vehicles: A systematic review
CN110751206A (en) Multi-target intelligent imaging and identifying device and method
WO2020239196A1 (en) System and method for training a generative adversarial model generating image samples of different brightness levels
CN112513928A (en) Method and system for training a model to perform semantic segmentation on a hazy image
WO2020020445A1 (en) A method and a system for processing images to obtain foggy images
Zheng et al. Low-light image and video enhancement: A comprehensive survey and beyond
CN114972934A (en) Comparison self-supervision learning method for remote sensing image representation
CN114419603A (en) Automatic driving vehicle control method and system and automatic driving vehicle
Qin et al. An end-to-end traffic visibility regression algorithm
US10735660B2 (en) Method and device for object identification
Fursa et al. Worsening perception: Real-time degradation of autonomous vehicle perception performance for simulation of adverse weather conditions
US20220284696A1 (en) System and method for training a model to perform semantic segmentation on low visibility images using high visibility images having a close camera view
CN114565597B (en) Night road pedestrian detection method based on YOLO v3-tiny-DB and transfer learning
Li et al. Hybrid Feature based Pyramid Network for Nighttime Semantic Segmentation.
Gomes et al. A Deep Learning Approach for Reconstruction of Color Images in Different Lighting Conditions Based on Autoencoder Technique
Tang et al. NDPC-Net: A dehazing network in nighttime hazy traffic environments
Imam et al. Semantic segmentation under severe imaging conditions
CN113362236B (en) Point cloud enhancement method, point cloud enhancement device, storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18774027

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18774027

Country of ref document: EP

Kind code of ref document: A1