IL309458A - Automatic creation of a dataset of realistic training images - Google Patents
Automatic creation of a dataset of realistic training images
- Publication number
- IL309458A
- Authority
- IL
- Israel
- Prior art keywords
- given
- image
- synthetic entity
- sensor
- synthetic
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/17—Terrestrial scenes taken from planes or by drones
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Remote Sensing (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Description
AUTOMATIC GENERATION OF DATASET(S) OF REALISTIC TRAINING IMAGES
TECHNICAL FIELD
The presently disclosed subject matter relates to the field of training of detection algorithms operative to detect targets based on data collected by one or more sensors.
BACKGROUND
Images of a scene acquired by a camera mounted on a platform (such as an aerial vehicle) can be processed by a detection algorithm to detect the presence of one or more targets.
The detection algorithm needs to be trained in order to be able to perform target detection in the images.
In order to perform this training, a dataset including a sufficient number of training images must be generated. A conventional approach uses real images of targets to generate the dataset. However, in practice, it is difficult to obtain real images of targets. There is therefore a need to provide new systems and methods which improve the current state of the art.
GENERAL DESCRIPTION
In accordance with certain aspects of the presently disclosed subject matter, there is provided a system comprising one or more processing circuitries configured to obtain a given image of a sequence of images acquired by a sensor and given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtain at least one synthetic entity whose representation depends on at least part of the given metadata, and generate a training image which comprises the given image, or an image derived from said given image, and said at least one synthetic entity, wherein the training image is associated with a label informative of the at least one synthetic entity.
In addition to the above features, the system according to this aspect of the presently disclosed subject matter can optionally comprise one or more of features (i) to (xxxii) below, in any technically possible combination or permutation:
i. obtaining the at least one synthetic entity comprises obtaining at least one reference synthetic entity and generating the at least one synthetic entity based on the at least one reference synthetic entity and said at least part of the given metadata;
ii. the system is configured to enable user selection of a type of the reference synthetic entity;
iii. the label is informative of at least one of a position of the at least one synthetic entity in the training image, or a type of the at least one synthetic entity;
iv. the data informative of the given image is informative of acquisition of the given image by the sensor;
v. the system is configured to use the training image and the label to train an algorithm to perform target detection in one or more images;
vi. the system is configured to use the training image and the label to train an algorithm to perform target detection in one or more images, wherein the one or more images have been acquired by a given sensor, wherein a type of the given sensor matches a type of said sensor according to a matching criterion;
vii. the system is configured to use the training image and the label to train an algorithm to perform target detection of one or more targets of a given type in one or more images, wherein the at least one synthetic entity present in the training image is informative of an entity of a type matching the given type according to a matching criterion;
viii. the system is configured to, for each given image of a plurality of different images of a sequence of images acquired by a sensor, obtain given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtain at least one given synthetic entity whose representation depends on said at least part of the given metadata, and generate a given labelled training image which comprises the given image, or an image derived from said given image, and said at least one given synthetic entity, wherein the given training image is associated with a given label informative of the at least one given synthetic entity, thereby obtaining a dataset of a plurality of labelled training images;
ix. the system is configured to use the dataset of labelled training images to train an algorithm to perform target detection;
x. the given synthetic entity is of a same type for all of the plurality of images;
xi. the system is configured to use at least part of the given metadata to obtain a synthetic entity with a visual representation which is realistic with respect to the given image, according to a criterion;
xii. the system is configured to enable user selection of a type of the sensor which acquired the sequence of images;
xiii. the system is configured to enable user selection of one or more climate conditions, wherein the system is configured to use the one or more climate conditions to generate the synthetic entity whose representation depends on said one or more climate conditions;
xiv. the given metadata is informative of a given orientation of the sensor in the acquisition of the given image, wherein an orientation of the synthetic entity is determined based on said given orientation;
xv. the given metadata is informative of a given field of view or of a given zoom of the sensor in the acquisition of the given image, wherein one or more dimensions of the synthetic entity are determined based on said given field of view or said given zoom;
xvi. the system is configured to use at least part of the given metadata to take into account one or more effects associated with the sensor acquisition of the given image, on the representation of said synthetic entity;
xvii. the system is configured to use at least part of the given metadata to generate the synthetic entity with a visual representation which simulates an acquisition of the synthetic entity in a scene of the given image acquired by the sensor;
xviii. the system is configured to use at least part of the given metadata to generate the synthetic entity with a visual representation which simulates a radiance of the synthetic entity as if the synthetic entity had been acquired by the sensor in a scene of the given image;
xix. the given metadata is informative of at least one of: (i) time at which the given image has been acquired by the sensor; (ii) day on which the given image has been acquired by the sensor; (iii) location at which the given image has been acquired by the sensor; (iv) period of time during which the given image has been acquired by the sensor; (v) one or more climate conditions under which the given image has been acquired by the sensor;
xx. the sensor is a camera, wherein the system is configured to simulate one or more effects associated with at least one of the time, the day, the location, the period of time or the one or more climate conditions on the representation of the synthetic entity, as if this synthetic entity had been acquired by the sensor at at least one of said time, day, location, period of time or under said one or more climate conditions;
xxi. the one or more effects comprise a heat distribution of the synthetic entity;
xxii. the one or more effects comprise a luminosity of the synthetic entity;
xxiii. the system is configured to obtain data informative of climate conditions and use said data to generate the synthetic entity;
xxiv. the system is configured to add the synthetic entity on the given image at a random position;
xxv. the system is configured to add the synthetic entity at a position in the given image, which meets a criterion of realism;
xxvi. the system is configured to select a location of the synthetic entity in the given image based on data Dtopography informative of a topography of a scene of the given image;
xxvii. the given metadata comprises data Dposition informative of a position of a scene present in the given image acquired by the sensor, wherein the system is configured to use Dposition to determine data Dtopography informative of a topography of the scene and to select a position of the synthetic entity in the given image based on Dtopography;
xxviii. the system is configured to select an orientation of the synthetic entity based on data informative of a topography of a scene of the given image in which the synthetic entity is to be displayed;
xxix. the system is configured to add to the given image or to the training image, a shadow associated with said synthetic entity;
xxx. the system is configured to generate said shadow based on at least part of the given metadata associated with the given image;
xxxi. the system is configured to determine said shadow based on at least one of a time, a day, a location, a period of time, or one or more climate conditions associated with acquisition of the given image by the sensor; and
xxxii. the sensor includes: a camera, a radar system, an IR camera, a day-camera, a night-camera, a Light Detection and Ranging system (LIDAR), or a Synthetic-aperture radar system (SAR).
In accordance with other aspects of the presently disclosed subject matter, there is provided a method comprising, by one or more processing circuitries, obtaining a given image of a sequence of images acquired by a sensor and given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtaining at least one synthetic entity whose representation depends on at least part of the given metadata, and generating a training image which comprises the given image, or an image derived from said given image, and said at least one synthetic entity, wherein the training image is associated with a label informative of the at least one synthetic entity.
The method can implement one or more of the features (i) to (xxxii) described above for the system.
In accordance with other aspects of the presently disclosed subject matter, there is provided a non-transitory computer readable medium comprising instructions that, when executed by one or more processing circuitries, cause the one or more processing circuitries to perform: obtaining a given image of a sequence of images acquired by a sensor and given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtaining at least one synthetic entity whose representation depends on at least part of the given metadata, and generating a training image which comprises the given image, or an image derived from said given image, and said at least one synthetic entity, wherein the training image is associated with a label informative of the at least one synthetic entity.
The non-transitory computer readable medium can comprise instructions that, when executed by one or more processing circuitries, cause the one or more processing circuitries to implement one or more of the features (i) to (xxxii) described above for the system.
In addition to the above features, the non-transitory computer readable medium comprises instructions that, when executed by one or more processing circuitries, cause the one or more processing circuitries to perform: using a dataset of labelled training images to train an algorithm to perform target detection, wherein generation of at least one given labelled training image of the dataset includes obtaining a given image acquired by a sensor and given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtaining at least one synthetic entity whose representation depends on at least part of the given metadata, and generating said given labelled training image which comprises the given image, or an image derived from said given image, and said at least one synthetic entity, wherein the given labelled training image is associated with a label informative of the at least one synthetic entity.
According to some examples, the proposed solution enables generating dataset(s) of realistic training images.
According to some examples, the proposed solution enables automatically generating dataset(s) of realistic training images.
According to some examples, the proposed solution enables generating dataset(s) of realistic labelled training images.
According to some examples, the proposed solution enables generating very large set(s) of dataset(s) of realistic training images.
According to some examples, the proposed solution improves training of detection algorithm(s).
According to some examples, the proposed solution is adaptive to different types of sensors. In particular, it can be used to train detection algorithm(s) associated with sensors acquiring data/images in the infrared (IR) range.
According to some examples, the proposed solution is flexible.
According to some examples, the proposed solution enables selection by the user of the type(s) of target(s), type(s) of sensor(s), and other parameters for which dataset(s) of realistic training images need to be generated.
According to some examples, the dataset(s) of realistic training images generated by the proposed solution can be used for various applications.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to understand the invention and to see how it can be carried out in practice, embodiments will be described, by way of non-limiting examples, with reference to the accompanying drawings, in which:
- Fig. 1 illustrates an embodiment of a system usable to generate and/or use labelled training images;
- Fig. 2A illustrates a generalized flow-chart of a method of generating labelled training images;
- Fig. 2B illustrates a non-limitative example of the method of Fig. 2A;
- Fig. 3A illustrates a generalized flow-chart of a method of adding a synthetic entity to an image based on topographic data;
- Fig. 3B illustrates a non-limitative example of the method of Fig. 3A;
- Fig. 4A illustrates a non-limitative example of a labelled training image;
- Fig. 4B illustrates a generalized flow-chart of another method of generating labelled training images;
- Fig. 5A illustrates a generalized flow-chart of a method of selecting an orientation of a synthetic entity to be added to an image, based on metadata of the image;
- Figs. 5B to 5E illustrate non-limitative examples of the method of Fig. 5A;
- Fig. 6A illustrates a generalized flow-chart of a method of selecting one or more dimensions of a synthetic entity to be added to an image, based on metadata of the image;
- Figs. 6B and 6C illustrate non-limitative examples of the method of Fig. 6A;
- Fig. 7A illustrates a generalized flow-chart of a method of generating a synthetic entity to be added to an image, based on metadata indicative of a time and/or a location at which the image has been acquired;
- Fig. 7B illustrates a generalized flow-chart of a method of generating a synthetic entity to be added to an image, based on metadata indicative of climate conditions under which the image has been acquired;
- Figs. 8A to 8C illustrate generalized flow-charts of methods of simulating the heat distribution of a synthetic entity to be added to an image, based on metadata of the image;
- Fig. 8D illustrates a map storing the material of different parts of the synthetic entity;
- Fig. 8E illustrates a non-limitative example of the method of Fig. 8A;
- Fig. 9A illustrates a generalized flow-chart of a method of selecting an orientation of a synthetic entity to be added to an image, based on topographic data;
- Fig. 9B illustrates a non-limitative example of a training image including a real background and a synthetic entity;
- Fig. 9C illustrates a non-limitative example of a fully simulated image;
- Fig. 10A illustrates a generalized flow-chart of a method of generating a synthetic shadow;
- Fig. 10B illustrates a non-limitative example of the method of Fig. 10A;
- Fig. 11A illustrates a generalized flow-chart of a method of training a detection algorithm using labelled training images;
- Fig. 11B illustrates a non-limitative example of the method of Fig. 11A; and
- Fig. 11C illustrates a non-limitative example of target detection using a trained detection algorithm.
DETAILED DESCRIPTION
Attention is now drawn to Fig. 1, which describes a system 100. System 100 (or at least part thereof) can be used to perform various methods, such as, but not limited to, one or more of the methods described hereinafter.
System 100 includes at least one processing circuitry 106. The processing circuitry 106 includes one or more processors operatively connected to one or more computer memories.
The processing circuitry 106 can receive data generated by at least one sensor 150 (or by a plurality of sensors 150). The data can be stored in a computer memory 149 (database) which can be accessed by the processing circuitry 106. The data include one or more images 151 (such as sequence(s) of images), and metadata 152 associated with the images. In some examples, the data include a plurality of files (e.g., video files), each file corresponding to a different sequence of images. In some examples, the one or more images 151 can be stored as a video.
Each given image 151 can be associated with given metadata. The given metadata 152 include data informative of the given image. Data informative of the given image can include e.g., data informative of acquisition of the given image by the sensor 150, such as values for different parameters describing the acquisition of the given image 151 by the sensor 150. The parameters can correspond to parameters of the sensor, and/or to environmental parameters, etc.
The given metadata 152 can include, for example: orientation of the sensor 150 at the time it acquired the given image, position of the sensor 150 at the time it acquired the given image, field of view at which the sensor 150 acquired the given image, time at which the sensor 150 acquired the given image (e.g. time of the day expressed in hours, minutes, and seconds - or any other adapted representation), day at which the sensor 150 acquired the given image (e.g. day, month, and year - or any other adapted representation), period of time at which the sensor 150 acquired the given image (e.g., summer, winter, quarter of the year, or any other adapted representation, etc.), field of view used by the sensor 150 to acquire the given image, zoom used by the sensor 150 to acquire the given image, etc.
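By way of a non-limitative illustration only, the sketch below shows one possible way of representing such per-image metadata as a record; the field names, the types and the Python representation are assumptions made for the purpose of the example and are not prescribed by the present description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class FrameMetadata:
    """Hypothetical per-image metadata record (metadata 152); field names are illustrative."""
    acquisition_time: datetime                     # time and day of acquisition
    sensor_position: Tuple[float, float, float]    # e.g., latitude, longitude, altitude
    sensor_azimuth_deg: float                      # orientation of the sensor
    sensor_elevation_deg: float
    field_of_view_deg: float                       # field of view used for the acquisition
    zoom_factor: float                             # zoom used for the acquisition
    season: Optional[str] = None                   # e.g., "summer", "winter"
    climate: Optional[dict] = None                 # e.g., {"temperature_c": 31, "visibility_km": 8}
```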
Note that the sensor 150 can be mounted on a terrestrial vehicle (e.g., a car), an aerial vehicle (an aircraft, an Unmanned Aerial Vehicle, a helicopter, a balloon, etc.), a marine vehicle (e.g., a ship, an underwater vehicle), a space vehicle (e.g., a satellite) or any other adapted platform (which can be mobile or static).
The sensor 150 can correspond to at least one of: a camera, a radar system, an IR camera, a day-camera, a night-camera, a Light Detection and Ranging system (LIDAR), a Synthetic-aperture radar system (SAR), etc. This list is not limitative.
The processing circuitry 106 is operative to receive data (user input 119) provided by a user (also called operator). In some examples, the processing circuitry 106 implements a user interface 105. The user interface 105 is a computer-implemented user interface, such as a graphical user interface (GUI). A computer memory (accessible by the processing circuitry 106) can be loaded with executable instructions enabling operation of the user interface. The user interface 105 enables the user to provide instructions and/or select data. This will be discussed further hereinafter.
The processing circuitry 106 can communicate (e.g., through an interface) with an output unit 130 (display unit). In particular, data generated by the processing circuitry 106 can be displayed on the output unit 130. Note that the user interface can be displayed on the output unit 130 or on another output unit.
As explained hereinafter, the processing circuitry 106 is operative to generate, based on the images 151 and the metadata 152, one or more dataset(s) 200 of training images. Each dataset includes a plurality of labelled training images.
Each dataset 200 can be used to train at least one detection algorithm 120 (implemented by at least one processing circuitry 107). The detection algorithm 120 is operative to detect and/or localize and/or track one or more targets within one or more images provided by a given sensor. In order to perform this target detection, the detection algorithm 120 needs to be trained. The one or more dataset(s) 200 can be used to train the detection algorithm 120.
Elements of the system 100 depicted in Fig. 1 can be made up of any combination of software and hardware and/or firmware. Elements of the system depicted in Fig. 1 may be centralized in one location or dispersed over more than one location. In other examples of the presently disclosed subject matter, the system of Fig. 1 may comprise fewer, more, and/or different elements than those shown in Fig. 1.
Likewise, the specific division of the functionality of the disclosed system to specific parts, as described below, is provided by way of example, and other various alternatives are also construed within the scope of the presently disclosed subject matter.
Attention is now drawn to Figs. 2A and 2B, which depict a method which can be used to generate one or more labelled training images.
The method includes (operation 200) obtaining a given image 235 (also called frame) of a sequence 240 of images acquired by a sensor 150.
In some examples, the method includes obtaining a sequence of images (video) acquired by the sensor 150. The sequence of images can be obtained from the sensor 150 itself and/or from a computer memory in which this sequence of images is stored. The given image is then selected within the sequence of images.
In some examples, the given image 235 is selected randomly within the sequence 240 of images.
In other examples, a user can select the given image 235 within the sequence 240 of images and/or can provide rules enabling selecting the given image 235. The rules can dictate e.g., the period of time in which the given image 235 to be selected appears in the sequence 240 of images, or other rules (e.g., type of scene present in the given image, location of the scene present in the given image, etc.). The method further includes (operation 210) obtaining given metadata 236 associated with the given image 235. As mentioned above, the given metadata 236 comprise data informative of the given image 235. The given metadata 236 can be obtained from a database 149 in which the sequence 240 of images (from which the given image 235 has been selected) is stored. Different examples of metadata have been provided above and can be obtained at operation 210.
The method further includes (operation 220) obtaining at least one synthetic entity (or a plurality of synthetic entities) whose representation (visual representation/visual display) depends on at least part of the given metadata 236.
In some examples, obtaining the at least one synthetic entity (at operation 220) can include the following operations.
A reference synthetic entity 241 (also called raw synthetic entity) is first obtained, for example from a database storing different types of synthetic entities and/or using a software enabling generation of reference synthetic entities. The user can select (using the user interface 105) parameters of the reference synthetic entity, such as (but not limited to), the type of the reference synthetic entity (e.g., a jeep), the number of the reference synthetic entities (for example, per image), one or more of the dimensions of the reference synthetic entity, etc. The representation of the reference synthetic entity 241 is then modified in order to obtain the synthetic entity 250, using at least part of the given metadata 236, as further explained hereinafter.
In some examples, the user can select climate conditions (cloud, fog, visibility, etc.) for which the synthetic entity has to be generated. The user can select climate conditions which correspond to the actual climate conditions that are present in the given image. The user can first visualize the given image in order to understand the actual climate conditions present in the given image, and/or can use an external database informative of worldwide climate conditions in order to determine actual climate conditions.
In other examples, the climate conditions are present in the metadata (the metadata already explicitly provide the climate conditions), or extracted from the given metadata (the climate conditions can be deduced from the location of the scene present in the given image, and the time at which the scene has been acquired, as explained hereinafter).
In some examples, a reference synthetic entity is obtained for reference climate conditions (with an optimal visibility), and the reference synthetic entity is modified, based on the selected climate conditions, in order to obtain the synthetic entity.
For example, if visibility is low, representation of the synthetic entity can be selected to be blurred. If there is rain, the colours of the synthetic entity can be adapted accordingly. These examples are not limitative.
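By way of a non-limitative illustration, the following sketch shows one possible way of adapting a reference synthetic entity to selected climate conditions (blurring under low visibility, muting colours under rain). The threshold, the Pillow-based implementation and the function name are assumptions made for the example only.

```python
from PIL import Image, ImageEnhance, ImageFilter


def apply_climate_conditions(entity: Image.Image,
                             visibility_km: float,
                             raining: bool) -> Image.Image:
    """Blur the entity when visibility is low and mute/darken its colours under rain."""
    out = entity
    if visibility_km < 5.0:                          # assumed threshold for "low visibility"
        # Stronger blur as visibility decreases.
        out = out.filter(ImageFilter.GaussianBlur(radius=5.0 / max(visibility_km, 0.5)))
    if raining:
        out = ImageEnhance.Color(out).enhance(0.7)        # mute the colours
        out = ImageEnhance.Brightness(out).enhance(0.85)  # slightly darker
    return out
```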
The given image 235 is a real image, which corresponds to an image of a real scene acquired by the sensor 150. Note that the image provided by the sensor 150 may have been pre-processed to generate the given image 235 (such as by using an image processing algorithm/filtering algorithm), but remains informative of a real scene.
The synthetic entity can represent a real entity, such as, but not limited to: a ground vehicle (car, truck, military vehicle, etc.), an aerial vehicle (aircraft, plane, helicopter, drone, balloon), a marine vehicle (ship), or more generally a ground object, an aerial object, or a marine object.
Although the synthetic entity is generally informative of a real entity (this is however not limitative, and the synthetic entity could be informative of a virtual entity), it remains a synthetic object generated by a computer (in contrast to a real scene acquired by the sensor). Therefore, it is desired that the synthetic entity appears (as much as possible) realistically on the given image (real scene). The given metadata are therefore used to generate a synthetic entity whose visual representation is realistic with respect to the given image on which it is displayed, according to a criterion. The criterion can dictate that the synthetic entity appears, as much as possible, on the given image, as if it had been acquired by the given sensor which acquired the given image. As explained hereinafter, this is important in order to generate a realistic training image usable to train detection algorithm(s) operating on data provided by sensor(s) such as sensor 150.
This is enabled by the method of Fig. 2A, in which given metadata specifically informative of the acquisition of the given image are obtained and used to generate the synthetic entity. The visual representation of the synthetic entity can be tailored to the real scene appearing in the given image, by virtue of usage of the given metadata informative of the given image.
In some examples, the given metadata are used to obtain a synthetic entity with a visual representation which simulates an acquisition of the synthetic entity by the sensor complying with the given metadata. Note that the synthetic entity cannot be acquired by the sensor, since the synthetic entity is virtual and the sensor acquires real entities.
However, it is possible to simulate what the representation of the synthetic entity would be if it had been present in the scene acquired by the sensor, by using the given metadata informative of the given image of the scene acquired by the sensor.
Various methods enabling generating the synthetic entity by using the given metadata are described hereinafter.
Note that operation 220 can include obtaining a plurality of different synthetic entities (each synthetic entity having a visual representation which depends on the given metadata). This can be obtained by using different reference synthetic entities, which generate, in turn, different synthetic entities. For example, a first reference synthetic entity is informative of a truck, a second reference synthetic entity is informative of a motorcycle, etc.
The method of Fig. 2A further includes generating (operation 230) a training image 270 which includes the given image 235 (or an image derived from the given image), and the synthetic entity 250 (whose representation has been determined based on the given metadata 236).
Operation 230 can include adding the synthetic entity (whose representation has been determined based on the given metadata 236) to the given image. The synthetic entity is superimposed on the given image, in order to obtain a training image including both the given image (or an image derived from the given image) and the synthetic entity. In other words, the visual representation of the synthetic entity can be added to the given image, thereby obtaining the training image.
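By way of a non-limitative illustration, a minimal sketch of such a superimposition is given below, assuming the synthetic entity is available as an image with an alpha channel; the function name and the Pillow-based implementation are illustrative assumptions.

```python
from typing import Tuple

from PIL import Image


def compose_training_image(given_image: Image.Image,
                           entity: Image.Image,
                           position: Tuple[int, int]) -> Image.Image:
    """Superimpose an RGBA synthetic entity on the given (real) image at the
    selected position, producing the training image."""
    training_image = given_image.convert("RGBA").copy()
    # The entity's alpha channel serves as the paste mask, so only its own
    # pixels are added to the real background.
    training_image.paste(entity, position, mask=entity)
    return training_image.convert("RGB")
```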
Note that if a plurality of synthetic entities has been obtained at operation 220, they can be added to the given image 235 in order to obtain the training image.
In some examples, the synthetic entity can be further modified using the given metadata, in order to generate the training image. For example, a shadow can be added to the synthetic entity in the training image. This will be discussed further hereinafter.
The position at which the synthetic entity is added in the given image can be selected using different methods.
In some examples, this position is selected randomly.
In other examples (see Figs. 3A and 3B), the position is selected using topographical data.
The method of Fig. 3A includes using (operation 300) at least part of the given metadata 350 associated with the given image 360 (to which the synthetic entity has to be added) to determine data Dposition (see reference 370) informative of a position of a scene visible in the given image. Dposition can be extracted from the given metadata 350. Dposition can correspond to the GPS coordinates (note that the coordinates can be expressed using a different referential) of the scene present in the given image.
The method of Fig. 3A further includes using (operation 310) data Dposition to determine data informative of a topography of at least part of the scene. Indeed, since the worldwide location (Dposition) of the scene is known, it is possible to extract from a topography map 380 (also called height map 380), data informative of the topography of the scene present in the given image. Data informative of the topography can include the height profile of the scene present in the given image.
The method of Fig. 3A further includes (operation 320) using data informative of the topography of the scene to determine a position at which the synthetic entity is added in the given image. Assume for example that the user requires the synthetic entity to be displayed on mountains, but not on valleys or flat locations. This can be selected by the user via the user interface 105 and/or can be defined as a predefined rule. Data informative of the topography of the scene can be used to differentiate between pixels of the given image which correspond to mountains, and pixels of the given image which do not correspond to mountains. This differentiation enables adding the synthetic entity only on mountains present in the given image (see e.g., area 385 in the given image 360). Note that this example is not limitative.
In some examples, the given metadata already include data informative of the topography of the scene present in the given image. This data can be used directly to select the location at which the synthetic entity is added in the given image.
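By way of a non-limitative illustration, the sketch below shows one possible placement rule in the spirit of Fig. 3A, assuming the height map has already been re-projected onto the pixel grid of the given image; the height threshold and the names are illustrative assumptions.

```python
from typing import Optional, Tuple

import numpy as np


def select_position_on_mountains(height_map: np.ndarray,
                                 min_height_m: float = 800.0,
                                 rng: Optional[np.random.Generator] = None) -> Tuple[int, int]:
    """Return a random (row, col) pixel whose terrain height exceeds a threshold,
    so that the synthetic entity is only placed on mountainous areas."""
    rng = rng or np.random.default_rng()
    rows, cols = np.nonzero(height_map >= min_height_m)
    if rows.size == 0:
        raise ValueError("No pixel in the scene satisfies the placement rule")
    i = rng.integers(rows.size)
    return int(rows[i]), int(cols[i])
```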
In some examples, it is possible to generate a plurality of different training images, based on the same given image. This can be performed (note that other methods can be used) by adding the synthetic entity to different locations of the given image. A first training image includes the given image and the synthetic entity located at a first location in the training image, and a second training image includes the given image and the synthetic entity located at a second location (different from the first location) in the training image, etc.
Operation 230 includes associating the training image 270 with a label 265 informative of the at least one synthetic entity (a synthetic entity generated based on the given metadata 236). A labelled training image 271 is obtained. The label 265 can be stored as metadata of the training image 270 and/or as a tag of the training image 270 and/or as embedded information or data of the training image 270. In some examples, a pointer links the training image 270 to the label 265 in a computer memory. In some examples, the label 265 is informative of the position of the synthetic entity in the training image 270. It can indicate the pixel position of the synthetic entity (or of each of a plurality of synthetic entities) in the training image 270. In some examples, this can include the position of a bounding box (e.g., rectangle, square) including the synthetic entity, and/or the position of a line delineating the contour of the synthetic entity in the training image 270. This is not limitative.
In some examples, the label 265 is informative of the type of the synthetic entity in the training image 270. Indeed, the synthetic entity is informative of an entity (e.g., a vehicle, a person, etc.) and the type of the synthetic entity corresponds to the type of this entity.
In addition to the parameters (describing position in the training image and type of the synthetic entity) described above, it is possible to store additional (optional) information in the label, as described hereinafter. This is however not limitative.
In some examples, the label 265 can be informative of the type of area (e.g., mountains, valleys, roads, etc.) in which the synthetic entity is located in the training image 270.
In some examples, the label 265 is informative of the sensor which acquired the real image used to generate the training image (the real image is the background of the training image). The label 265 can indicate e.g., the type of sensor.
In some examples, the label 265 can include any additional relevant information informative of the synthetic entity and/or of the training image and/or of a relationship between the synthetic entity and the training image.
A non-limitative example is illustrated in Fig. 4A, which illustrates a non-limitative example of a label 465 associated with a training image 470. In this example, the label 465 provides the position of the synthetic entity 475 in the training image 470, and the type (truck) of the synthetic entity 475.
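By way of a non-limitative illustration, a label such as the one of Fig. 4A could be stored, for example, as the following record; the exact storage format and the numerical values are assumptions made for the example.

```python
# Illustrative label record for the training image 470 of Fig. 4A; the storage
# format (metadata, tag or embedded data) and the numerical values are assumptions.
label_465 = {
    "entity_type": "truck",                          # type of the synthetic entity 475
    "bounding_box": {"x": 412, "y": 233,             # pixel position (values illustrative)
                     "width": 96, "height": 54},
    "area_type": "road",                             # optional: type of area in which it is located
    "sensor_type": "IR camera",                      # optional: sensor that acquired the background
}
```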
The method of Fig. 2A (operations 210 to 230) can be repeated (see operation 235).
In some examples, at the next iteration (noted iteration i+1) of the method of Fig. 2A, at operation 200, a different given image is selected from the sequence of images. This given image is used to generate a different training image, with a different background than the one used at the previous iteration (noted iteration i) of the method. Since a different given image is selected, different given metadata can be obtained at operation 210. At operation 220, a synthetic entity, whose representation depends on the given metadata obtained at this new iteration (iteration i+1) of the method, is obtained.
As explained above, operation 220 can include generating the synthetic entity based on a reference synthetic entity and the given metadata (associated with the given image). In some examples, the reference synthetic entity used at iteration i+1 can be the same as the reference synthetic entity used at iteration i. In some examples, the reference synthetic entity used at iteration i+1 can be different from the reference synthetic entity used at iteration i.
The representation of the synthetic entity obtained at operation 220 (at iteration i+1) may differ from the representation of the synthetic entity obtained at operation 220 (at iteration i). This can be due to various factors.
This can be due to the fact that a different reference synthetic entity is used. For example, at iteration i, the reference synthetic entity represents a car of a first brand and at iteration i+1, the reference synthetic entity represents a car of a second brand. This example is not limitative.
This can also be due to the fact that the given metadata obtained at operation 210 (at iteration i+1) differ from the given metadata obtained at operation 210 (at iteration i).
For example, at iteration i, the sensor has a first orientation, and at iteration i+1, the sensor has a second orientation (different from the first orientation). As a consequence, even if the same reference synthetic entity is used (at iterations i and i+1), a different synthetic entity will be obtained (since it is as if the synthetic entity had been acquired by the sensor from two different angles). This example is not limitative.
The method of Fig. 2A can be repeated N times (with N a predefined number, or selected by the user), to obtain a dataset of N labelled training images. As explained above, the N training images can be different from one another. They can differ by the background scene (because a different image acquired by the sensor is used), and/or by the representation and/or location of the synthetic entity added to the background scene. The label of each training image is informative of the synthetic entity present in the training image (and can also contain information pertaining to the background scene).
The N iterations can be performed simultaneously (parallel processing) or one after the other (sequential processing).
Fig. 4B illustrates an example of generating a dataset of training images, in which the synthetic entity is of the same type in the different training images. In some cases, it is desired to train a detection algorithm to detect a certain type of object or element in images acquired by a sensor. For example, it is desired to train a detection algorithm to detect trucks in images acquired by a camera. The method of Fig. 4B can be used to generate different training images, all including one or more trucks, at different locations and/or orientations and/or with a different background (or with other differences).
The method of Fig. 4B includes operation 400 (similar to operation 200) and operation 410 (similar to operation 210). The method of Fig. 4B includes obtaining (operation 415) a reference synthetic entity of a given type (for example, a truck). The method of Fig. 4B further includes (operation 416) generating at least one synthetic entity based on the reference synthetic entity and the given metadata. Various examples are provided hereinafter to enable modification of the reference synthetic entity based on the given metadata (e.g., orientation of the sensor, climate conditions, etc.).
The method of Fig. 4B further includes generating (operation 430) a training image which comprises the given image, or an image derived from said given image, and the synthetic entity, wherein the training image is associated with a label informative of the synthetic entity (labelled training image).
Operations 400 to 430 can be repeated N times. In this example, at each iteration, a different given image can be obtained at operation 400. However, the type of reference synthetic entity which is used at operation 415 can be the same between the different iterations. For example, each time the method is repeated, a reference synthetic entity representative of a truck can be used (this example is not limitative). The same reference synthetic entity can be used, or different reference synthetic entities (of the same type) can be used. For example, reference synthetic entities informative of trucks of different brands can be used.
As a consequence, a set of labelled training images is obtained, informative of trucks located in different real background scenes. This set of labelled training images can be used to train a detection algorithm to detect trucks in real images acquired by a sensor.
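By way of a non-limitative illustration, the repetition of operations 400 to 430 could be sketched as follows, where the helper callables passed as parameters stand for the adaptation and placement steps described above; their names and signatures are illustrative assumptions.

```python
import random


def build_dataset(frames, metadata, reference_entity, adapt_entity, place_entity,
                  n_images, entity_type="truck"):
    """Repeat operations 400 to 430 over different frames of the sequence.

    adapt_entity(reference_entity, frame_metadata) -> synthetic entity (operation 416)
    place_entity(frame, entity, frame_metadata) -> (training_image, bounding_box) (operation 430)
    """
    dataset = []
    for idx in random.sample(range(len(frames)), k=n_images):      # operation 400: different given images
        given_image, given_metadata = frames[idx], metadata[idx]   # operation 410
        entity = adapt_entity(reference_entity, given_metadata)    # operation 416
        training_image, bbox = place_entity(given_image, entity, given_metadata)
        dataset.append((training_image, {"entity_type": entity_type, "bounding_box": bbox}))
    return dataset
```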
Attention is now drawn to Figs. 5A to 5D, which depict a method of obtaining a synthetic entity whose representation depends on the given metadata of a given image (to which the synthetic entity is to be added, as explained with reference to Fig. 2A).
As explained with reference to Fig. 2A, a given image is obtained, together with given metadata including data informative of the given image. The data can be informative of the acquisition of the given image by a sensor (operation 500, equivalent to operations 200 and 210).
According to some examples, the given metadata (obtained at operation 500) is informative of a given orientation of the sensor 150 (see Fig. 5B) in the acquisition of the given image.
The given orientation can include azimuth angle of the sensor 150 and/or elevation angle of the sensor 150, at the time the given image has been acquired by the sensor. The given orientation can be an absolute orientation (expressed in an Earth referential), or relative to the scene acquired in the given image.
In some examples, the given orientation is expressed with reference to a platform (e.g., ground or aerial vehicle) on which the sensor 150 is mounted. The metadata can further include the orientation of the platform (at the time the given image has been acquired). As a consequence, the orientation of the sensor in an absolute referential can be determined (based on the orientation of the sensor with respect to the platform and the orientation of the platform itself).
The given orientation of the sensor 150 is indicative of the direction of the line of sight 509 of the sensor 150.
The method of Fig. 5A further includes selecting (operation 510) an orientation of the synthetic entity based on the given orientation of the sensor which acquired the given image. In particular, operation 510 can include selecting an orientation of the synthetic entity such that the synthetic entity is displayed in the training image, as if the synthetic entity had been acquired by the sensor with this given orientation. For example, assume that the orientation of the sensor is such that it acquires the scene with a line of sight which is vertical to the scene (see Fig. 5B). As a consequence, a synthetic entity viewed from the top can be used. An example is depicted in Fig. 5D.
Another non-limitative example is illustrated in Fig. 5E.
Assume that a reference synthetic entity 520 is obtained from a database. In this example, the reference synthetic entity 520 is a truck, represented as if it had been acquired with a known reference orientation (azimuth angle α_ref, elevation angle β_ref) by the sensor 150.
The reference synthetic entity 520 can be converted into another synthetic entity 530 (with a different visual representation) based on the given orientation (azimuth angle α_image, elevation angle β_image) of the sensor 150 extracted from the metadata. This conversion can include rotating the reference synthetic entity 520, such that it is represented as if it had been acquired with the given orientation (azimuth angle α_image, elevation angle β_image) of the sensor 150 used to acquire the scene. A rotation of the reference synthetic entity 520 proportional to the differences |α_image − α_ref| and |β_image − β_ref| can be used.
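By way of a non-limitative illustration, a simplified two-dimensional version of this conversion is sketched below; a fuller implementation would typically re-render a three-dimensional model at the sensor's azimuth and elevation, whereas the sketch only applies the azimuth difference as an in-plane rotation of a top-view reference sprite (an assumption made for the example).

```python
from PIL import Image


def orient_entity(reference_entity: Image.Image,
                  azimuth_ref_deg: float,
                  azimuth_image_deg: float) -> Image.Image:
    """Rotate a top-view reference sprite by the azimuth difference between the
    reference orientation and the sensor orientation taken from the metadata."""
    delta_azimuth = azimuth_image_deg - azimuth_ref_deg
    # expand=True keeps the whole rotated sprite; pixels outside it stay transparent.
    return reference_entity.rotate(delta_azimuth, expand=True)
```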
Attention is now drawn to Figs. 6A to 6C, which depict another method of obtaining a synthetic entity whose representation depends on the given metadata of a given image (to which the synthetic entity is to be added, as explained with reference to Fig. 2A).
As explained with reference to Fig. 2A, a given image is obtained, together with given metadata including data informative of the acquisition of the given image (operation 600, equivalent to operations 200 and 210).
According to some examples, the given metadata (obtained at operation 600) is informative of the field of view 660 and/or of the zoom of the sensor 150 (see Fig. 6A) in the acquisition of the given image. The field of view (FOV) is a solid angle through which the sensor 150 is sensitive to electromagnetic radiation. The FOV decreases as the camera zooms in, and increases as the camera zooms out. "Zooming in" refers to increasing the size of the object which is acquired, without changing the position of the sensor, and, conversely, "zooming out" makes the object smaller.
The method of Fig. 6A further includes selecting (operation 610) one or more dimensions of the synthetic entity (to be displayed in the training image) based on the FOV or the zoom of the sensor 150 which acquired the given image.
In some examples, a database can store a reference synthetic entity with dimensions corresponding to a given reference value of the FOV or of the zoom. The actual FOV or actual zoom of the sensor 150 extracted from the given metadata and used to acquire the given image is then used to modify one or more dimensions of the reference synthetic entity, in order to obtain the synthetic entity. If the actual zoom indicates that a zoom-in has been performed with respect to the given reference value of the zoom, a synthetic entity with larger dimensions than the reference synthetic entity is generated. If the actual zoom indicates that a zoom-out has been performed with respect to the given reference value of the zoom, a synthetic entity with smaller dimensions than the reference synthetic entity is generated.
A non-limitative example is illustrated in Fig. 6C.
Assume that a reference synthetic entity 620 (corresponding to a reference zoom value Z_ref) is obtained from a database. Assume that the given metadata indicates that the actual zoom value used to acquire the given image is equal to Z_image. A dilation (contraction or expansion) of the reference synthetic entity 620 proportional to the difference |Z_image − Z_ref| can be used.
In the example of Fig. 6C, the actual zoom value Z_image is equal to twice the reference zoom value Z_ref. Therefore, the reference synthetic entity 620 is converted into another synthetic entity 630 with dimensions doubled with respect to the reference synthetic entity 620.
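By way of a non-limitative illustration, the scaling described with reference to Figs. 6A to 6C could be sketched as follows, where the dilation factor is taken as the ratio of the actual zoom to the reference zoom (which reproduces the doubling of dimensions in the example above); the implementation details are assumptions.

```python
from PIL import Image


def scale_entity(reference_entity: Image.Image,
                 zoom_ref: float,
                 zoom_image: float) -> Image.Image:
    """Dilate or contract the reference entity according to the zoom ratio
    extracted from the given metadata."""
    factor = zoom_image / zoom_ref
    new_size = (max(1, round(reference_entity.width * factor)),
                max(1, round(reference_entity.height * factor)))
    return reference_entity.resize(new_size)
```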
In some examples, generation of the synthetic entity can take into account one or more effects associated with the sensor acquisition to generate said synthetic entity. In particular, the given metadata can be used to simulate the effect of the acquisition of the synthetic entity by the sensor, if the synthetic entity had been acquired in a real scene associated with this given metadata. Various examples are provided hereinafter.
Fig. 7A illustrates an example of generation of the synthetic entity. The method of Fig. 7A includes obtaining a given image, together with given metadata, including data informative of the acquisition of the given image by the sensor (operation 700, equivalent to operations 200 and 210).
According to some examples, the given metadata (obtained at operation 700) is informative of a time at which the given image has been acquired by the sensor. This time can correspond to the time of the day. A non-limitative example is 5:23 PM.
According to some examples, the given metadata (obtained at operation 700) is informative of a day on which the given image has been acquired by the sensor. A non- limitative example is September 12, 2023.
According to some examples, the given metadata (obtained at operation 700) is informative of a location at which the given image has been acquired by the sensor. A non- limitative example is Paris.
According to some examples, the given metadata (obtained at operation 700) is informative of a period of time during which the given image has been acquired by the sensor. The period of time can include for example "summer", "autumn", etc.
The method of Fig. 7A further includes generating (operation 710) a synthetic entity whose representation depends on the time and/or the day and/or the location and/or the period of time during which the given image has been acquired. In particular, operation 710 can include generating the synthetic entity, such that its visual representation simulates an acquisition of the synthetic entity by the sensor at this time and/or day and/or location and/or period of time.
Indeed, depending on the type of sensor, the visual representation of the synthetic entity changes depending on the time and/or day and/or location and/or period of time, if this synthetic entity had been acquired by the sensor at least one of said time, day, location or period of time. For example, for a conventional camera operating in the range of visible wavelength, the luminosity of the synthetic entity (if it had been acquired by the camera) is higher during daytime than during the night. Similarly, the luminosity of the synthetic entity (if it had been acquired by the camera) is higher on a summer day than on a winter day. Similarly, the luminosity of the synthetic entity (if it had been acquired by the camera) is higher in Brazil than in Sweden.
The time and/or day and/or location and/or period of time during which the given image has been acquired enables determining expected climate conditions (such as light intensity, heat, temperature, humidity, etc.) for which the given image has been acquired, and using the expected climate conditions to generate the synthetic entity accordingly. In some examples, a database stores expected climate conditions for different values of the time, day, location, and/or period of time. The method can use the time and/or day and/or location and/or period of time at which the given image has been acquired, to extract the corresponding expected climate conditions. The corresponding expected climate conditions can be used to generate the synthetic entity, such that the visual representation of the synthetic entity corresponds to the visual representation that would have the synthetic entity if it had been acquired by the sensor in these climate conditions.
Luminosity and/or colour of the synthetic entity can be adapted to match the expected climate conditions.
In some examples, a reference synthetic entity can be obtained from a database.
The reference synthetic entity can be generated for reference values of the climate conditions, and/or for reference values of the time and/or day and/or location and/or period of time.
Assume for example that the reference synthetic entity has been generated for a reference value of the luminosity, noted L_ref. Assume that the given metadata indicates, or can be used to determine, the actual luminosity L_image in the given image. In some examples, the actual luminosity can be extracted from a database which stores expected luminosity for different values of the time and/or day and/or location and/or period of time.
The reference synthetic entity can be modified into a synthetic entity whose luminosity (or radiance) matches the actual luminosity L_image.
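By way of a non-limitative illustration, a minimal sketch of such a luminosity adjustment is given below; the linear scaling of brightness by the ratio L_image/L_ref and the Pillow-based implementation are assumptions made for the example.

```python
from PIL import Image, ImageEnhance


def match_luminosity(reference_entity: Image.Image,
                     luminosity_ref: float,
                     luminosity_image: float) -> Image.Image:
    """Brighten or darken the reference entity so that it matches the expected
    luminosity of the scene of the given image."""
    factor = luminosity_image / luminosity_ref
    return ImageEnhance.Brightness(reference_entity).enhance(factor)
```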
Fig. 7B illustrates a variant of the method of Fig. 7A. In Fig. 7B, the given metadata already includes one or more given climate conditions. The climate conditions can include temperature, luminosity, humidity, etc.
The method of Fig. 7B includes obtaining a given image, together with given metadata including data informative of the given image (operation 720, equivalent to operations 200 and 210).
According to some examples, the given metadata (obtained at operation 720) is informative of one or more climate conditions under which the given image has been acquired by the sensor. This can include e.g., light intensity, heat, temperature, humidity, etc.
The method of Fig. 7B further includes generating (operation 730) a synthetic entity such that its visual representation simulates an acquisition of the synthetic entity by the sensor under one or more of the extracted climate conditions.
Figs. 8A and 8B show other examples of taking into account one or more effects associated with the sensor acquisition to generate the synthetic entity. In particular, these methods simulate the heat distribution of the synthetic entity, as if it had been present in the scene acquired by the sensor.
The method of Fig. 8A includes obtaining a given image, together with given metadata including data informative of the acquisition of the given image by the sensor (operation 800, equivalent to operations 200 and 210).
According to some examples, the given metadata (obtained at operation 800) is informative of at least one of:
- time at which the given image has been acquired by the sensor;
- day on which the given image has been acquired by the sensor;
- location at which the given image has been acquired by the sensor;
- period of time during which the given image has been acquired by the sensor.
In some examples, the method of Fig. 8A further includes simulating the effect associated with at least one of the time, the day, the location, or the period of time on a visual representation of the synthetic entity, as if this synthetic entity had been acquired by the sensor at at least one of said time, day, location, or period of time. The effect can correspond to a heat distribution of the synthetic entity, as if this synthetic entity had been acquired by the sensor at said time, day, location, or period of time.
Assume that the sensor is a night camera or an IR camera (note that this is not limitative). When this type of sensor acquires an object, it outputs an image informative of the heat (heat distribution) of the object. For a given object, this heat distribution differs depending on the time, day, location, or period of time. Indeed, the temperature/heat during daytime is higher than at night. Similarly, the temperature/heat is higher on a summer day than on a winter day. Similarly, the temperature/heat is higher in Brazil than in Sweden.
In some examples, a database can store a set of rules defining the effect of the time, day, location, or period of time on the heat distribution. This set of rules can be built using experimental data (multiple images acquired at different periods of time, which can be used to build a model using regression analysis, or other modelling techniques) and/or simulated data.
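By way of a non-limitative illustration, such a set of rules could be represented, for example, as a simple lookup keyed on season and time of day, fitted from experimental or simulated data; the keys and values below are illustrative assumptions only.

```python
# Assumed lookup of expected surface temperature, built from experimental or
# simulated data; keys and values are illustrative only.
EXPECTED_SURFACE_TEMP_C = {
    ("summer", "day"): 45.0,
    ("summer", "night"): 22.0,
    ("winter", "day"): 8.0,
    ("winter", "night"): -2.0,
}


def expected_heat(season: str, hour: int) -> float:
    """Return an expected surface temperature used to scale the heat distribution."""
    time_of_day = "day" if 7 <= hour <= 19 else "night"
    return EXPECTED_SURFACE_TEMP_C[(season, time_of_day)]
```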
Note that the method of Fig. 8A (see Fig. 8B) can be performed equivalently by obtaining one or more climate conditions (heat, temperature, etc.) under which the given image has been acquired (operation 820). In some examples, the climate conditions can be extracted from the given metadata of the given image.
The method of Fig. 8B includes (operation 830) simulating the effect associated with the one or more climate conditions on a visual representation of the synthetic entity, as if this synthetic entity had been acquired by the sensor under these one or more climate conditions. The effect can correspond to a heat distribution of the synthetic entity, as if this synthetic entity had been acquired by the sensor under these one or more climate conditions.
In some examples (see Fig. 8C),the synthetic entity can be generated using a reference synthetic entity which is modified based on the given metadata. The method of Fig. 8Ccan include obtaining (operation 840)a given image of a sequence of images acquired by a sensor and given metadata associated with the given image, wherein the given metadata comprise data (noted Dclimate) informative of at least one of a time, a day, a location, a period of time, or climate conditions at which the given image has been acquired by the sensor. The method of Fig. 8Cfurther includes obtaining (operation 850) a reference synthetic entity associated with a reference heat distribution. The reference heat distribution can correspond to the heat distribution under predefined standard climate conditions (temperature, humidity), or for a given time of the year in a given location (e.g., mid-year in Paris). The method of Fig. 8Cfurther includes using (operation 860)the reference synthetic entity and the data Ddimate to generate a synthetic entity whose representation simulates a heat distribution of the synthetic entity, as if it had been acquired by the sensor at said time and/or day and/or location and/or period of time and/or under the climate conditions.
Implementation of operation 860 can be performed as follows (this is however not limitative).
The synthetic entity can be associated with a "map" describing, for each part of the synthetic entity, the position of the part and the corresponding material represented by this part (a synthetic entity is virtual, which is why each part of the synthetic entity only represents/simulates a real part made of a real material). For example, if the synthetic entity corresponds to a car (see Fig. 8D, in which each colour represents a different material), the map indicates that the body 861 of the car is made of metal (the type of metal can be stored in the map), the windshield 862 of the car is made of glass, and the wheels 863 are made of rubber. Note that the material of additional parts of the vehicle can be stored in the map.
The map can be used to determine the radiance of each part of the synthetic entity, for the given time or the given climate conditions for which the image has been acquired.
The radiance of the different parts of the synthetic entity (also called radiance map) corresponds to the heat distribution generated at operation 860.
Different methods can be used to determine the radiance of each part of the synthetic entity. In some examples, the following non-limitative model (Lambert's Cosine Law) can be used: Radiance = Irradiance * Diffuse Reflectance * cos(θ). In this model, Radiance corresponds to the light emitted from the material, Irradiance corresponds to the light received by the material, Diffuse Reflectance is a property of the material (which describes the fraction of light scattered in all directions, affecting overall brightness - it is also called the diffuse reflection coefficient, between 0 and 1), and θ is the angle between the surface normal and the light direction (known from the given metadata describing e.g. the position of the sun).
Additional factors can be taken into account to determine the radiance of each part of the synthetic entity, such as (but not limited to): properties of the material, for example absorption (light absorbed by the material, decreasing overall radiance), texture and roughness (which affect light scattering and diffusion patterns, and therefore the radiance distribution), and specular reflectance; and/or lighting conditions, for example the spectrum of the light, and shadowing and occlusion, which can block light from reaching certain areas, creating variations in radiance.
In some examples, the radiance of each part of the synthetic entity can be determined based on the map of materials by using software operative to simulate radiance, such as (but not limited to) Mitsuba (see https://www.mitsuba-renderer.org) and/or Radiance (see https://www.radiance-online.org/). Such tools can handle various material properties and lighting interactions, providing accurate radiance simulations.
In some examples, calculation of the radiance can take into account certain hot parts of the synthetic entity. For example, in the example of Fig. 8D, the map can store the location of the motor of the vehicle, which increases radiance of the cover of the motor (in particular, radiance of the centre of the cover, located above the motor, is increased due to the presence of the motor). The simulated radiance of the cover of the motor can be increased by a coefficient which takes into account the presence of the motor underneath.
The coefficient can be obtained using e.g., simulations, and/or based on real infrared images of vehicles.
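By way of a non-limitative sketch, the radiance map of operation 860 can be computed roughly as follows (Python), using the Lambert cosine model above, a per-part material map, and a hypothetical coefficient for parts heated by the motor. The material properties, surface normals, and coefficient values below are illustrative assumptions only, not values prescribed by this description.

```python
import numpy as np

# Illustrative diffuse reflectance values; real values would come from the
# map of materials associated with the synthetic entity.
MATERIALS = {
    "metal_body": {"diffuse_reflectance": 0.55},
    "glass":      {"diffuse_reflectance": 0.08},
    "rubber":     {"diffuse_reflectance": 0.90},
}

def part_radiance(irradiance, material, surface_normal, light_direction, hot_coeff=1.0):
    """Radiance of one part: Irradiance * Diffuse Reflectance * cos(theta),
    optionally boosted by a coefficient for parts located above a hot element
    such as the motor."""
    n = surface_normal / np.linalg.norm(surface_normal)
    l = light_direction / np.linalg.norm(light_direction)
    cos_theta = max(np.dot(n, l), 0.0)          # no contribution from back-lit faces
    rho = MATERIALS[material]["diffuse_reflectance"]
    return irradiance * rho * cos_theta * hot_coeff

# Sun direction derived from the given metadata (e.g. position of the sun).
sun_dir = np.array([0.3, 0.2, 0.93])
irradiance = 800.0  # illustrative irradiance value

radiance_map = {
    "body":        part_radiance(irradiance, "metal_body", np.array([0.0, 0.0, 1.0]), sun_dir),
    "windshield":  part_radiance(irradiance, "glass",      np.array([0.0, 0.5, 0.87]), sun_dir),
    "wheels":      part_radiance(irradiance, "rubber",     np.array([0.0, 1.0, 0.0]), sun_dir),
    # cover of the motor: same material as the body, boosted by the motor underneath
    "motor_cover": part_radiance(irradiance, "metal_body", np.array([0.0, 0.0, 1.0]), sun_dir,
                                 hot_coeff=1.4),
}
print(radiance_map)
```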
Note that these methods and/or software tools can be used in other embodiments described in the present application, e.g., for determining the radiance of the synthetic entity in the visible spectrum (camera).
Fig. 8E illustrates an example of the method of Fig. 8C, in which the reference synthetic entity 870 is converted into the synthetic entity 880. In the synthetic entity 880, the heat distribution (radiance map) of the visual representation takes into account the effect of the conditions under which the given image has been acquired.
Attention is now drawn to Fig. 9A.
The method of Fig. 9A includes obtaining (operation 900) data informative of a topography of at least part of the scene present in the given image acquired by the sensor.
As mentioned with reference to Fig. 3A, the metadata can include a position of the scene present in the given image. This position can be used to determine data informative of a topography of at least part of the scene, using e.g., a topography map.
The method of Fig. 9A further includes selecting an orientation of the synthetic entity based on data informative of a topography of a scene of the given image in which the synthetic entity is to be displayed. For example, assume that the synthetic entity has to be added to a part of the given image corresponding to a slope (e.g. the slope of a mountain). The orientation of the synthetic entity is not the same when it is added to a slope as when it is added to a flat part. Therefore, by knowing the topography of the scene, and the location in the scene at which the synthetic entity is added, the correct orientation of the synthetic entity can be selected. The angle of the slope on which the synthetic entity is to be added can be used to select the correct orientation.
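As a non-limitative sketch (assuming a regular elevation grid and hypothetical helper names), the slope at the insertion point can be estimated from the topography data and converted into tilt angles for the synthetic entity:

```python
import numpy as np

def slope_orientation(elevation, row, col, cell_size=1.0):
    """Return (pitch, roll) in degrees for a synthetic entity placed at (row, col)
    of an elevation grid, so that the entity lies flat on the local terrain."""
    # Finite-difference gradient of the terrain around the insertion point.
    dz_dx = (elevation[row, col + 1] - elevation[row, col - 1]) / (2 * cell_size)
    dz_dy = (elevation[row + 1, col] - elevation[row - 1, col]) / (2 * cell_size)
    pitch = np.degrees(np.arctan(dz_dy))   # tilt along one grid axis
    roll = np.degrees(np.arctan(dz_dx))    # tilt along the other grid axis
    return pitch, roll

# Illustrative topography: a constant slope of roughly 5.7 degrees along one axis.
elevation = np.fromfunction(lambda r, c: 0.1 * r, (100, 100))
print(slope_orientation(elevation, row=50, col=50, cell_size=1.0))
```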
Attention is drawn to Fig. 9B.
Fig. 9B depicts an example of a training image 970, which can be obtained using e.g., the various methods in the present description. The training image 970 includes as background 910 a given image acquired by a sensor (in this non-limitative example, an infrared camera). The training image 970 further includes the synthetic entity 920, which has been added to the background 910. In this example, the effect of the climate conditions (under which the given image has been acquired) on the heat distribution of the synthetic entity has been taken into account, in accordance with the methods of Figs. 8A to 8C.
Note that the synthetic entity 920 has been inserted at a position which meets a criterion of realism: indeed, the synthetic entity 920 is displayed at a realistic position (on the ground).
The criterion of realism indicates that the synthetic entity is displayed at a position which is compliant with its type: for example, a ground vehicle cannot appear in the sky.
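A minimal Python sketch of such a criterion of realism follows, assuming a hypothetical per-pixel ground mask derived from the given image or its metadata; the mask, function name, and footprint handling are illustrative assumptions.

```python
import numpy as np

def is_realistic_position(ground_mask, row, col, entity_height, entity_width):
    """Accept the insertion position only if the whole footprint of the synthetic
    entity lies on pixels labelled as ground (a ground vehicle cannot appear in
    the sky)."""
    footprint = ground_mask[row:row + entity_height, col:col + entity_width]
    return footprint.size == entity_height * entity_width and bool(footprint.all())

# Illustrative mask: lower half of the image is ground, upper half is sky.
ground_mask = np.zeros((480, 640), dtype=bool)
ground_mask[240:, :] = True

print(is_realistic_position(ground_mask, row=300, col=100, entity_height=40, entity_width=80))  # True
print(is_realistic_position(ground_mask, row=50,  col=100, entity_height=40, entity_width=80))  # False
```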
Fig. 9C illustrates an image 980 including a synthetic entity 985 added on a simulated background 990 (that is to say a background generated by a computer, and not a background acquired by a sensor). It is clear that the training image 970 is much more realistic than the image 980. Therefore, training of a detection algorithm based on the training image 970 is much more efficient than training with the image 980.
Attention is now drawn to Fig. 10A.
As mentioned above, at least one synthetic entity (generated using given metadata informative of a given image) is added to a given image (displaying a real scene acquired by a sensor).
Given metadata including data informative of time and/or day and/or location and/or period of time and/or one or more climate conditions at which the given image has been acquired by the sensor are obtained (operation 1000).
The method of Fig. 10A includes adding (operation 1010) to the given image a (synthetic) shadow associated with the synthetic entity (see Fig. 10B), in order to obtain the training image. As mentioned above, it is desired to display the synthetic entity with a realistic visual representation in the given image (as if the synthetic entity had been acquired together with the given image by the sensor). Therefore, the method of Fig. 10A can include determining the shadow (associated with the synthetic entity) based on at least part of the given metadata of the given image. In particular, dimensions and/or orientation of the shadow can be determined based on at least part of the given metadata of the given image.
In some examples, the shadow can be determined based on at least one of a time, a day, or a location at which the given image has been acquired. Indeed, these parameters provide information on the position of the sun relative to the scene, which impacts the position and/or orientation of the shadow.
In some examples, the shadow (dimensions, orientation) can be determined based on the climate conditions. Indeed, if the climate conditions indicate an absence of sun, it is expected that the shadow will be small or absent, and if the climate conditions indicate a presence of sun, it is expected that the shadow will be present.
In some examples, the shadow can be determined based on the dimensions of the synthetic entity in the training image. Indeed, a synthetic entity with large dimensions will be associated with a bigger shadow than a synthetic entity with smaller dimensions.
In some examples, a database can store, for each of a plurality of different times and/or days and/or locations and/or periods of time and/or climate conditions, the expected position and/or dimension(s) of a reference shadow for one or more reference objects (with predefined dimensions). Since the actual dimension(s) of the synthetic entity added to the given image are known, it is possible to adapt (using a rule of proportionality) the dimension(s) of the reference shadow to obtain the required synthetic shadow.
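As a non-limitative sketch, the following Python function scales a stored reference shadow to the dimensions of the synthetic entity and orients it opposite to the sun azimuth; the sun elevation/azimuth are assumed to be derived from the time, day, and location in the given metadata, and the values used are illustrative.

```python
def synthetic_shadow(ref_shadow_length, ref_object_height, entity_height,
                     sun_elevation_deg, sun_azimuth_deg, cloudy=False):
    """Scale a reference shadow to the dimensions of the synthetic entity
    (rule of proportionality) and orient it opposite to the sun azimuth.
    Returns (length, direction in degrees), or None when the sun is absent."""
    if cloudy or sun_elevation_deg <= 0:
        return None  # overcast conditions or night: shadow small or absent
    # Proportionality: for a given sun elevation, the shadow length scales
    # with the object height.
    length = ref_shadow_length * (entity_height / ref_object_height)
    direction = (sun_azimuth_deg + 180.0) % 360.0  # shadow points away from the sun
    return length, direction

# Reference shadow stored in the database for a 2 m object at 40 deg sun elevation.
print(synthetic_shadow(ref_shadow_length=2.4, ref_object_height=2.0,
                       entity_height=3.0, sun_elevation_deg=40.0, sun_azimuth_deg=135.0))
```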
Fig. 10B illustrates a non-limitative example, in which the training image 10 includes as background the (real) given image 1011, the synthetic entity 1020, and the synthetic shadow 1030 associated with the synthetic entity 1020. It can be seen that the position of the synthetic shadow 1030 associated with the synthetic entity 1020 is compliant with the position of the real shadow of other real objects (see the real shadow 1040 of the mountain 1050, which has a position relative to the mountain 1050 which matches the position of the synthetic shadow 1030 relative to the synthetic entity 1020).
Note that the various methods enabling generation of the synthetic entity, whose visual representation depends on the given metadata, can be performed sequentially or at the same time.
Attention is now drawn to Fig. 11A.
As explained in the various methods described above, a dataset of labelled training images is obtained (operation 1100). Each training image of the dataset includes one or more synthetic entities added to an image of a real background acquired by a sensor. For each training image, the label is indicative of the one or more synthetic entities of the training image (e.g., position of the synthetic entity in the training image, type of the synthetic entity, etc.).
The method of Fig. 11A further includes training (operation 1110) the detection algorithm 120 (implemented by the at least one processing circuitry 107). The training enables the detection algorithm 120 to perform at least one of: detecting, localizing, tracking, or determining the type of one or more targets within one or more images provided by a sensor. This is not limitative, and the dataset can be used to train a detection algorithm to perform other tasks pertaining to target recognition in images.
According to some examples, if the dataset includes training images which have been generated based on images acquired by a sensor of a given type, the detection algorithm 120 is trained to detect targets in images acquired by the sensor of this given type. In other words, the images on which the detection algorithm 120 is operative to perform target detection have been acquired by a sensor which is of a type matching the type of the sensor which acquired the training images, according to a matching criterion.
The matching criterion can indicate that the types are similar, or the same. This is however not limitative. Note that this does not mean that the detection algorithm 120 cannot perform target detection on images acquired by sensor(s) of a type different from the type of the sensor used to acquire the training images.
For example, assume that the dataset includes training images which have been generated based on images acquired by a night camera. The detection algorithm 120 is therefore trained to detect targets in images acquired by night cameras.
According to some examples, if the dataset includes training images which include synthetic entities of a given type, the detection algorithm 120 is trained to detect targets of this given type.
In other words, the type(s) of target(s) which can be detected by the detection algorithm 120 matches the type(s) of the synthetic entities present in the training images, according to a matching criterion. The matching criterion can indicate that the types are similar, or the same. This is however not limitative. Note that this does not mean that the detection algorithm 120 cannot perform target detection of targets of a type different from the type of the synthetic entities present in the training images.
For example, if the dataset includes training images with synthetic entities corresponding to ground vehicles, the detection algorithm 120 is trained to detect ground vehicles in real images acquired by a sensor.
Fig. 11C illustrates a non-limitative example of usage of the (trained) detection algorithm 120. The detection algorithm 120 is fed with a real image 1150 acquired by a sensor. The image 1150 includes a target 1160. The image 1150 is fed to the detection algorithm 120, which outputs a bounding box 1170 corresponding to the position of the detected target 1160 in the image 1150.
In some examples, the detection algorithm 120 includes one or more machine learning models 1200 (also called machine learning networks), which need to be trained.
By way of non-limiting example, the layers of the machine learning model 1200 (e.g., DNN) can be organized in accordance with a Convolutional Neural Network (CNN) architecture, such as a fully convolutional neural network. This is not limitative.
In other examples, the layers of the machine learning model 1200 (e.g., DNN) can be organized in accordance with the Recurrent Neural Network architecture, Recursive Neural Networks architecture, Generative Adversarial Network (GAN) architecture, or otherwise. Optionally, at least some of the layers can be organized in a plurality of DNN sub-networks. Each layer of the DNN can include multiple basic computational elements (CE), typically referred to in the art as dimensions, neurons, or nodes.
Generally, computational elements of a given layer can be connected with CEs of a preceding layer and/or a subsequent layer. Each connection between a CE of a preceding layer and a CE of a subsequent layer is associated with a weighting value. A given CE can receive inputs from CEs of a previous layer via the respective connections, each given connection being associated with a weighting value which can be applied to the input of the given connection. The weighting values can determine the relative strength of the connections and thus the relative influence of the respective inputs on the output of the given CE. The given CE can be configured to compute an activation value (e.g., the weighted sum of the inputs) and further derive an output by applying an activation function to the computed activation. The activation function can be, for example, an identity function, a deterministic function (e.g., linear, sigmoid, threshold, or the like), a stochastic function, or other suitable function. The output from the given CE can be transmitted to CEs of a subsequent layer via the respective connections. Likewise, as above, each connection at the output of a CE can be associated with a weighting value which can be applied to the output of the CE prior to being received as an input of a CE of a subsequent layer. Further to the weighting values, there can be threshold values (including limiting functions) associated with the connections and CEs.
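By way of a non-limitative numerical illustration, a single computational element as described above (weighted sum of inputs, shifted by a threshold, passed through an activation function) can be sketched in Python as follows; the values and the sigmoid choice are illustrative assumptions.

```python
import numpy as np

def computational_element(inputs, weights, threshold=0.0, activation=None):
    """A single CE: weighted sum of the inputs received from the preceding layer,
    shifted by a threshold, then passed through an activation function."""
    if activation is None:
        activation = lambda a: 1.0 / (1.0 + np.exp(-a))  # sigmoid activation
    activation_value = np.dot(inputs, weights) - threshold
    return activation(activation_value)

# Outputs of three CEs of the preceding layer and the weights of their connections.
print(computational_element(np.array([0.2, 0.7, 0.1]), np.array([0.5, -0.3, 0.8])))
```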
The weighting and/or threshold values of the machine learning model 1200 (e.g., DNN) can be initially selected prior to training, and can be further iteratively adjusted or modified during training to achieve an optimal set of weighting and/or threshold values in a trained DNN. After each iteration, a difference (also called a loss function) can be determined between the actual output produced by the machine learning model 1200 (e.g., DNN) and the target output associated with the respective training data. The difference can be referred to as an error value. Training can be determined to be complete when a cost or loss function indicative of the error value is less than a predetermined value, or when a limited change in performance between iterations is achieved. Optionally, at least some of the DNN subnetworks (if any) can be trained separately, prior to training the entire DNN.
Training of the machine learning model 1200 can include methods such as Backpropagation, or other known techniques.
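As an illustrative sketch only (not the actual detection algorithm 120 nor its dataset), the following Python toy example shows a gradient-descent training loop with the stopping criteria described above: training stops when the loss is below a predetermined value or when the change between iterations is limited. The data, model, and hyperparameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a labelled dataset: feature vectors and binary labels
# (target present / absent).
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

w = np.zeros(5)
b = 0.0
lr = 0.1
loss_threshold = 0.05      # "predetermined value" for the loss
min_improvement = 1e-6     # "limited change in performance between iterations"
prev_loss = np.inf

for iteration in range(10_000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))            # forward pass
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    if loss < loss_threshold or prev_loss - loss < min_improvement:
        break                                          # training complete
    grad_logits = (p - y) / len(y)                     # backpropagated error
    w -= lr * (X.T @ grad_logits)                      # weight update
    b -= lr * grad_logits.sum()
    prev_loss = loss

print(iteration, loss)
```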
The machine learning model 1200 can be trained to perform tasks pertaining to targets, such as detecting, localizing, tracking, and determining the type of one or more targets within one or more images provided by a sensor.
In the detailed description, numerous specific details have been set forth in order to provide a thorough understanding of the disclosure. However, it will be understood by those skilled in the art that the presently disclosed subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to obscure the presently disclosed subject matter.
Unless specifically stated otherwise, as apparent from the aforementioned discussions, it is appreciated that throughout the specification, discussions utilizing terms such as "obtaining", "generating", "training", "using", "determining", "performing", "adding", "selecting", or the like, refer to the action(s) and/or process(es) of at least one processing circuitry that manipulates and/or transforms data into other data, said data represented as physical, such as electronic, quantities and/or said data representing the physical objects.
The terms "computer" or "computer-based system" should be expansively construed to include any kind of hardware-based electronic device with a data processing circuitry (e.g., digital signal processor (DSP), a GPU, a TPU, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), microcontroller, microprocessor etc.), including, by way of non-limiting example, the computer-based system 100 of Fig. 1 and respective parts thereof disclosed in the present application. The data processing circuitry (designated also as processing circuitry, such as processing circuitry 106) can comprise, for example, one or more processors operatively connected to computer memory, loaded with executable instructions for executing operations, as further described below. The data processing circuitry encompasses a single processor or multiple processors, which may be located in the same geographical zone, or may, at least partially, be located in different zones, and may be able to communicate together. The one or more processors can represent one or more general-purpose processing devices such as a microprocessor, a central processing unit, or the like. More particularly, a given processor may be one of: a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or a processor implementing a combination of instruction sets. The one or more processors may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), a network processor, or the like. The one or more processors are configured to execute instructions for performing the operations and steps discussed herein.
The memories referred to herein can comprise one or more of the following: internal memory, such as, e.g., processor registers and cache, etc., main memory such as, e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM), or Rambus DRAM (RDRAM), etc.
The terms "non-transitory memory" and "non-transitory storage medium" used herein should be expansively construed to cover any volatile or non-volatile computer memory suitable to the presently disclosed subject matter. The terms should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The terms shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the computer and that cause the computer to perform any one or more of the methodologies of the present disclosure. The terms shall accordingly be taken to include, but not be limited to, a read only memory ("ROM"), random access memory ("RAM"), magnetic disk storage media, optical storage media, flash memory devices, etc.
It is to be noted that while the present disclosure refers to the processing circuitry 106 (or other processing circuitries, such as processing circuitry 107) being configured to perform various functionalities and/or operations, the functionalities/operations can be performed by the one or more processors of the processing circuitry in various ways. By way of example, the operations described hereinafter can be performed by a specific processor, or by a combination of processors.
The operations described hereinafter can thus be performed by respective processors (or processor combinations) in the processing circuitry, while, optionally, at least some of these operations may be performed by the same processor. The present disclosure should not be limited to be construed as one single processor always performing all the operations.
It is appreciated that, unless specifically stated otherwise, features of the presently disclosed subject matter, which are described in the context of separate embodiments, can also be provided in combination in a single embodiment.
Conversely, various features of the presently disclosed subject matter, which are described in the context of a single embodiment, can also be provided separately or in any suitable sub-combination. It is to be noted that the various features described in the various embodiments or examples can be combined according to all possible technical combinations.
In embodiments of the presently disclosed subject matter, fewer, more, and/or different stages than those shown in the methods described in the appended figures may be executed. In embodiments of the presently disclosed subject matter, one or more stages illustrated in the methods described in the appended figures may be executed in a different order, and/or one or more groups of stages may be executed simultaneously.
It is to be understood that the invention is not limited in its application to the details set forth in the description contained herein or illustrated in the drawings.
It will also be understood that the system according to the invention may be, at least partly, implemented on a suitably programmed computer. Likewise, the invention contemplates a computer program being readable by a computer for executing the method of the invention. The invention further contemplates a non-transitory computer-readable memory tangibly embodying a program of instructions executable by one or more processing circuitries for executing the method(s) of the invention.
The invention is capable of other embodiments and of being practiced and carried out in various ways. Hence, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for designing other structures, methods, and systems for carrying out the several purposes of the presently disclosed subject matter.
Those skilled in the art will readily appreciate that various modifications and changes can be applied to the embodiments of the invention as hereinbefore described without departing from its scope, defined in and by the appended claims.
Claims (50)
1. A system comprising one or more processing circuitries configured to obtain: a given image of a sequence of images acquired by a sensor, given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtain at least one synthetic entity whose representation depends on said at least part of the given metadata, and generate a training image which comprises the given image, or an image derived from said given image, and said at least one synthetic entity, wherein the training image is associated with a label informative of the at least one synthetic entity.
2. The system of claim 1, wherein obtaining the at least one synthetic entity comprises obtaining at least one reference synthetic entity and generating the at least one synthetic entity based on the at least one reference synthetic entity and said at least part of the given metadata.
3. The system of claim 2, configured to enable user selection of a type of the reference synthetic entity.
4. The system of any one of claims 1 to 3, wherein the label is informative of at least one of: (i) a position of the at least one synthetic entity in the training image; (ii) a type of the at least one synthetic entity.
5. The system of any one of claims 1 to 4, wherein the data informative of the given image is informative of acquisition of the given image by the sensor.
6. The system of any one of claims 1 to 5, configured to use the training image and the label to train an algorithm to perform target detection in one or more images.
7. The system of any one of claims 1 to 6, configured to use the training image and the label to train an algorithm to perform target detection in one or more images, wherein the one or more images have been acquired by a given sensor, wherein a type of the given sensor matches a type of said sensor according to a matching criterion.
8. The system of any one of claims 1 to 7, configured to use the training image and the label to train an algorithm to perform target detection of one or more targets of a given type in one or more images, wherein the at least one synthetic entity present in the training image is informative of an entity of a type matching the given type according to a matching criterion.
9. The system of any one of claims 1 to 8, configured to: for each given image of a plurality of different images of a sequence of images acquired by a sensor, obtain given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtain at least one given synthetic entity whose representation depends on said at least part of the given metadata, and generate a given labelled training image which comprises the given image, or an image derived from said given image, and said at least one given synthetic entity, wherein the given training image is associated with a given label informative of the at least one given synthetic entity, thereby obtaining a dataset of a plurality of labelled training images.
10. The system of claim 9, configured to use the dataset of labelled training images to train an algorithm to perform target detection.
11. The system of claim 9 or of claim 10, wherein the given synthetic entity is of a same type for all of the plurality of images.
12. The system of any one of claims 1 to 11, configured to use at least part of the given metadata to obtain a synthetic entity with a visual representation which is realistic with respect to the given image, according to a criterion.
13. The system of any one of claims 1 to 12, configured to enable user selection of a type of sensor of the sensor which acquired the sequence of images.
14. The system of any one of claims 1 to 13, configured to enable user selection of one or more climate conditions, wherein the system is configured to use the one or more climate conditions to generate the synthetic entity whose representation depends on said one or more climate conditions.
15. The system of any one of claims 1 to 14, wherein the given metadata is informative of a given orientation of the sensor in the acquisition of the given image, wherein an orientation of the synthetic entity is determined based on said given orientation.
16. The system of any one of claims 1 to 15, wherein the given metadata is informative of a given field of view or of a given zoom of the sensor in the acquisition of the given image, wherein one or more dimensions of the synthetic entity are determined based on said given field of view or said given zoom.
17. The system of any one of claims 1 to 16, configured to use at least part of the given metadata to take into account one or more effects associated with the sensor acquisition of the given image, on the representation of said synthetic entity.
18. The system of any one of claims 1 to 17, configured to use at least part of the given metadata to generate the synthetic entity with a visual representation which simulates an acquisition of the synthetic entity in a scene of the given image acquired by the sensor.
19. The system of any one of claims 1 to 18, configured to use at least part of the given metadata to generate the synthetic entity with a visual representation which simulates a radiance of the synthetic entity as if the synthetic entity had been acquired by the sensor in a scene of the given image.
20. The system of any one of claims 1 to 19, wherein the given metadata is informative of at least one of: (i) time at which the given image has been acquired by the sensor; (ii) day on which the given image has been acquired by the sensor; (iii) location at which the given image has been acquired by the sensor; (iv) period of time during which the given image has been acquired by the sensor; (v) one or more climate conditions under which the given image has been acquired by the sensor.
21. The system of claim 20, wherein the sensor is a camera, wherein the system is configured to simulate one or more effects associated with at least one of the time, the day, the location, the period of time or the one or more climate conditions on the representation of the synthetic entity, as if this synthetic entity had been acquired by the sensor at at least one of said time, day, location, period of time or under said one or more climate conditions.
22. The system of claim 21, wherein the one or more effects comprise a heat distribution of the synthetic entity.
23. The system of claim 21 or of claim 22, wherein the one or more effects comprise a luminosity of the synthetic entity.
24. The system of any one of claims 1 to 23, configured to obtain data informative of climate conditions and use said data to generate the synthetic entity.
25. The system of any one of claims 1 to 24, configured to add the synthetic entity on the given image at a random position.
26. The system of any one of claims 1 to 25, configured to add the synthetic entity at a position in the given image, which meets a criterion of realism.
27. The system of any one of claims 1 to 26, configured to select a location of the synthetic entity in the given image based on data Dtopography informative of a topography of a scene of the given image.
28. The system of any one of claims 1 to 27, wherein the given metadata comprises data Dposition informative of a position of a scene present in the given image acquired by the sensor, wherein the system is configured to use Dposition to determine data Dtopography informative of a topography of the scene and to select a position of the synthetic entity in the given image based on Dtopography.
29. The system of any one of claims 1 to 28, configured to select an orientation of the synthetic entity based on data informative of a topography of a scene of the given image in which the synthetic entity is to be displayed.
30. The system of any one of claims 1 to 29, configured to add to the given image or to the training image, a shadow associated with said synthetic entity.
31. The system of claim 30, configured to generate said shadow based on at least part of the given metadata associated with the given image.
32. The system of claim 30 or of claim 31, configured to determine said shadow based on at least one of a time, a day, a location, a period of time, or one or more climate conditions associated with acquisition of the given image by the sensor.
33. The system of any one of claims 1 to 32, wherein the sensor includes: a camera, a radar system, an IR camera, a day-camera, a night-camera, a Light Detection and Ranging system (LIDAR), or a Synthetic-aperture radar system (SAR).
34. A method comprising, by one or more processing circuitries: obtaining: a given image of a sequence of images acquired by a sensor, given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtaining at least one synthetic entity whose representation depends on said at least part of the given metadata, and generating a training image which comprises the given image, or an image derived from said given image, and said at least one synthetic entity, wherein the training image is associated with a label informative of the at least one synthetic entity.
35. The method of claim 34, wherein obtaining the at least one synthetic entity comprises obtaining at least one reference synthetic entity and generating the at least one synthetic entity based on the at least one reference synthetic entity and said at least part of the given metadata.
36. The method of claim 34 or of claim 35, wherein the label is informative of at least one of: (i) a position of the at least one synthetic entity in the training image; (ii) a type of the at least one synthetic entity.
37. The method of any one of claims 34 to 36, comprising using the training image and the label to train an algorithm to perform target detection in one or more images.
38. The method of any one of claims 34 to 37, comprising: for each given image of a plurality of different images of a sequence of images acquired by a sensor, obtain given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtain at least one given synthetic entity whose representation depends on said at least part of the given metadata, and generate a given labelled training image which comprises the given image, or an image derived from said given image, and said at least one given synthetic entity, wherein the given training image is associated with a given label informative of the at least one given synthetic entity, thereby obtaining a dataset of a plurality of labelled training images.
39. The method of any one of claims 34 to 38, comprising using at least part of the given metadata to obtain a synthetic entity with a visual representation which is realistic with respect to the given image, according to a criterion.
40. The method of any one of claims 34 to 39, wherein the given metadata is informative of a given orientation of the sensor in the acquisition of the given image, wherein an orientation of the synthetic entity is determined based on said given orientation.
41. The method of any one of claims 34 to 40, wherein the given metadata is informative of a given field of view or of a given zoom of the sensor in the acquisition of the given image, wherein one or more dimensions of the synthetic entity are determined based on said given field of view or said given zoom.
42. The method of any one of claims 34 to 41, comprising using at least part of the given metadata to take into account one or more effects associated with the sensor acquisition of the given image, on the representation of said synthetic entity.
43. The method of any one of claims 34 to 42, comprising using at least part of the given metadata to generate the synthetic entity with a visual representation which simulates an acquisition of the synthetic entity in a scene of the given image acquired by the sensor.
44. The method of any one of claims 34 to 43, wherein the given metadata is informative of at least one of: (i) time at which the given image has been acquired by the sensor; (ii) day on which the given image has been acquired by the sensor; (iii) location at which the given image has been acquired by the sensor; (iv) period of time during which the given image has been acquired by the sensor; (v) one or more climate conditions under which the given image has been acquired by the sensor.
45. The method of claim 44, wherein the sensor is a camera, wherein the method comprises simulating one or more effects associated with at least one of the time, the day, the location, the period of time, or the one or more climate conditions on the representation of the synthetic entity, as if this synthetic entity had been acquired by the sensor at at least one of said time, day, location, period of time or under said one or more climate conditions.
46. The method of claim 45, wherein the one or more effects comprises a heat distribution of the synthetic entity.
47. The method of any one of claims 34 to 46, wherein the given metadata comprises data Dposition informative of a position of a scene present in the given image acquired by the sensor, wherein the method comprises using Dposition to determine data Dtopography informative of a topography of the scene and to select a position of the synthetic entity in the given image based on Dtopography.
48. The method of any one of claims 34 to 47, comprising selecting an orientation of the synthetic entity based on data informative of a topography of a scene of the given image in which the synthetic entity is to be displayed.
49. A non-transitory computer readable medium comprising instructions that, when executed by one or more processing circuitries, cause the one or more processing circuitries to perform: obtaining a given image of a sequence of images acquired by a sensor, given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtaining at least one synthetic entity whose representation depends on said at least part of the given metadata, and generating a training image which comprises the given image, or an image derived from said given image, and said at least one synthetic entity, wherein the training image is associated with a label informative of the at least one synthetic entity.
50. A non-transitory computer readable medium comprising instructions that, when executed by one or more processing circuitries, cause the one or more processing circuitries to perform: using a dataset of labelled training images to train an algorithm to perform target detection, wherein generation of at least one given labelled training image of the dataset includes: obtaining a given image acquired by a sensor and given metadata associated with the given image, wherein the given metadata comprise data informative of the given image, obtaining at least one synthetic entity whose representation depends on said at least part of the given metadata, and generating said given labelled training image which comprises the given image, or an image derived from said given image, and said at least one synthetic entity, wherein the given labelled training image is associated with a label informative of the at least one synthetic entity.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IL309458A IL309458A (en) | 2023-12-17 | 2023-12-17 | Automatic creation of a dataset of realistic training images |
| PCT/IL2024/051182 WO2025134109A1 (en) | 2023-12-17 | 2024-12-12 | Automatic generation of dataset(s) of realistic training images |