WO2020040679A1 - A method and corresponding system for generating video-based models of a target such as a dynamic event - Google Patents
A method and corresponding system for generating video-based models of a target such as a dynamic event
- Publication number
- WO2020040679A1 (international application PCT/SE2019/050707)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- moving
- time
- video
- target
- synchronization
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 43
- 238000003384 imaging method Methods 0.000 claims abstract description 22
- 238000012545 processing Methods 0.000 claims description 41
- 239000007787 solid Substances 0.000 claims description 29
- 230000003068 static effect Effects 0.000 claims description 27
- 238000004590 computer program Methods 0.000 claims description 24
- 238000012800 visualization Methods 0.000 claims description 15
- 230000001360 synchronised effect Effects 0.000 claims description 11
- 238000000605 extraction Methods 0.000 claims description 6
- 230000001427 coherent effect Effects 0.000 claims description 5
- 239000011521 glass Substances 0.000 claims description 3
- 230000002040 relaxant effect Effects 0.000 claims description 3
- 230000015572 biosynthetic process Effects 0.000 claims description 2
- 238000010586 diagram Methods 0.000 description 14
- 238000005516 engineering process Methods 0.000 description 9
- 230000006870 function Effects 0.000 description 9
- 238000004519 manufacturing process Methods 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 238000005259 measurement Methods 0.000 description 3
- 230000036962 time dependent Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 230000001133 acceleration Effects 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 230000003111 delayed effect Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000001788 irregular Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000008672 reprogramming Effects 0.000 description 1
- 239000000779 smoke Substances 0.000 description 1
- 230000026676 system process Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/275—Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C11/00—Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
- G01C11/02—Picture taking arrangements specially adapted for photogrammetry or photographic surveying, e.g. controlling overlapping of pictures
- G01C11/025—Picture taking arrangements specially adapted for photogrammetry or photographic surveying, e.g. controlling overlapping of pictures by scanning the object
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
- G06T7/593—Depth or shape recovery from multiple images from stereo images
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/128—Adjusting depth or disparity
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/296—Synchronisation thereof; Control thereof
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B64—AIRCRAFT; AVIATION; COSMONAUTICS
- B64U—UNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
- B64U10/00—Type of UAV
- B64U10/10—Rotorcrafts
- B64U10/13—Flying platforms
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B64—AIRCRAFT; AVIATION; COSMONAUTICS
- B64U—UNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
- B64U20/00—Constructional aspects of UAVs
- B64U20/80—Arrangement of on-board electronics, e.g. avionics systems or wiring
- B64U20/87—Mounting of imaging devices, e.g. mounting of gimbals
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B64—AIRCRAFT; AVIATION; COSMONAUTICS
- B64U—UNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
- B64U2101/00—UAVs specially adapted for particular uses or applications
- B64U2101/30—UAVs specially adapted for particular uses or applications for imaging, photography or videography
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B64—AIRCRAFT; AVIATION; COSMONAUTICS
- B64U—UNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
- B64U2201/00—UAVs characterised by their flight controls
- B64U2201/20—Remote controls
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G5/00—Traffic control systems for aircraft, e.g. air-traffic control [ATC]
- G08G5/003—Flight plan management
- G08G5/0034—Assembly of a flight plan
-
- G—PHYSICS
- G08—SIGNALLING
- G08G—TRAFFIC CONTROL SYSTEMS
- G08G5/00—Traffic control systems for aircraft, e.g. air-traffic control [ATC]
- G08G5/0047—Navigation or guidance aids for a single aircraft
- G08G5/0069—Navigation or guidance aids for a single aircraft specially adapted for an unmanned aircraft
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/30—Image reproducers
- H04N13/332—Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
- H04N13/344—Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N2013/0074—Stereoscopic image analysis
- H04N2013/0081—Depth or disparity estimation from stereoscopic image signals
Definitions
- the proposed technology generally relates to a method and corresponding system for generating one or more video-based models of a target such as a dynamic event, a dynamic sensor system, a ground-based system configured to generate one or more video-based models of a target as well as a corresponding computer program and computer-program product.
- Video taken from moving or movable vehicles such as aerial vehicles and/or surface- based vehicles and/or underwater vehicles including manned and/or unmanned (uncrewed) vehicles such as drones and/or Unmanned Aerial Vehicles (UAVs) can already today give emergency and security authorities and armed forces support for rapid decisions in crisis situations. Such situations may for example include fires, accidents, flooding and/or other security or military operations and the like. Other target scenarios may include sports events, movie productions and the like.
- Another object is to provide a dynamic sensor system.
- Yet another object is to provide a ground-based system configured to generate one or more video-based models of a target.
- Still another object is to provide a computer program for generating, when executed by a processor, one or more video-based models of a target.
- a method for generating one or more video-based models of a target comprises:
- time synchronization of the video frames of the video streams is provided to obtain, for at least one point in time, a set of one or more simultaneously registered video frames;
- a system configured to generate one or more video-based models of a target.
- the system is configured to provide video streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints.
- Position synchronization of the moving or movable vehicles is provided to create a stable image base.
- Pointing synchronization of the cameras is provided to cover the same object(s) and/or dynamic event(s), and time synchronization of the video frames of the video streams is enabled to obtain, for at least one point in time, a set of simultaneously registered video frames.
- the system is also configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
- a dynamic sensor system comprises a set of synchronously operative moving or movable vehicles equipped with cameras to enable video streams for simultaneously imaging a target from different viewpoints.
- the dynamic sensor system is configured to operate based on position synchronization of the moving or movable vehicles to create a stable image base, and the dynamic sensor system is configured to operate based on pointing synchronization of the cameras to cover the same object(s) and/or dynamic event(s).
- the dynamic sensor system is also configured to enable time synchronization of the video frames of the video streams to allow, for at least one point in time, generation and/or extraction of a set of simultaneously registered video frames.
- the dynamic sensor system is further configured to downlink image data of the video streams to a ground segment from at least one of the moving or movable vehicles to enable generation of one or more three-dimensional/four-dimensional, 3D/4D, models of the target based on the simultaneously registered video frames at the ground segment.
- a ground-based system configured to generate one or more video-based models of a target.
- the system is configured to receive video streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints.
- the moving or movable vehicles being operated based on position synchronization for a stable image base, and the cameras being operated based on pointing synchronization for covering the same object(s) and/or dynamic event(s), and time synchronization of the video frames of the video streams being enabled to allow, for at least one point in time, generation and/or extraction of a set of simultaneously registered video frames.
- the system is also configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
- a computer program for generating, when executed by a processor, one or more video-based models of a target.
- the computer program comprises instructions, which when executed by the processor, cause the processor to:
- a computer-program product comprising a non-transitory computer-readable medium carrying such a computer program.
- FIG. 1 is a schematic diagram illustrating an example of an overall system view including basic system components.
- FIG. 2 is a schematic diagram illustrating an example of an overall system view including basic system components and basic functions.
- FIG. 3 is a schematic diagram illustrating an example of a dynamic sensor system, here exemplified as an aerial segment based on drones equipped with on-board cameras.
- FIG. 4 is a schematic diagram illustrating an example of how momentary 3D models can be generated, e.g. from time-synchronized video frame pairs or more generally n- tuples for visualization and exploration.
- FIG. 5 is a schematic diagram illustrating an example of how the momentary 3D models together can be used to build near real-time 4D models for visualization and exploration.
- FIG. 6 is a schematic diagram illustrating an example of a computer-implementation according to an embodiment.
- FIG. 7 is a schematic flow diagram illustrating an example of a method for generating one or more video-based models of a target.
- video taken from moving or movable vehicles such as aerial vehicles and/or surface-based vehicles (e.g. water and/or ground-based vehicles) and/or underwater vehicles, including manned and/or unmanned (uncrewed) vehicles such as drones and/or Unmanned Aerial Vehicles (UAVs) can already today give emergency and security authorities and armed forces support for rapid decisions in crisis situations.
- the innovation takes the decision support a big step forward by allowing creation of real-time 3D models as well as 4D models, i.e. a series of time- dependent 3D models.
- the proposed technology is also applicable for providing video-based models of other events, such as sport events, and for movie productions and the like.
- the advanced models of the present invention not only allow visualization of fixed objects, but also of dynamic events such as moving vehicles, smoke plumes, floods, accidents and crowds, e.g. using photogrammetry. This gives the user the opportunity to study dynamic events in the same way that we study static 3D models, e.g. from the angles and with the zooming desired by the user.
- the innovation also enables manual and automatic measurements of the dimensions (e.g. volume, area, length, height), directions and speed of moving objects.
- Moving objects, for example vehicles, can however only be reconstructed in three dimensions when they have been imaged from different viewpoints in at least two images simultaneously. Using two or more images in a single video or still image sequence from one drone, the moving object will have changed position and/or shape, thus disabling 3D reconstruction by photogrammetric and computer vision techniques.
- each pair or more generally m-tuple (m ≥ 2, where m represents the number of cameras) of simultaneously registered images or video frames enables generation of a 3D model for a particular point in time
- m represents the number of cameras
- the union of all 3D models for a time interval t0 to tn constitutes a 4D model.
- n represents the number of points in time.
- the user can navigate in both the three spatial dimensions and in the time dimension.
- drone will now be used to represent any suitable controllably moving or movable vehicle from which video sequences of events can be taken and used for 3D/4D modelling.
- unmanned and/or manned vehicles may hold passengers and/or even drivers but still be more or less autonomous, such as self-driving cars.
- unmanned or uncrewed vehicles such as drones may be used.
- in some applications it may be desirable to use unmanned aerial vehicles, whereas in other applications unmanned surface vehicles may be preferred.
- terrain height shall be interpreted as object distance for applicable vehicles and applications, e.g. for vehicles that are not aerial.
- the moving or movable vehicles may be performing very small corrective and/or adaptive movements to maintain a certain position, at least for a given period of time, e.g. when drones are temporarily hovering over a target area.
- target will be used to represent any general target area and/or situation of interest.
- the target may represent one or more dynamic events and include one or more moving objects. This represents a particularly interesting application.
- the proposed technology may also be useful for allowing faster coverage of a given geographical area and/or improved accuracy for real-time processing.
- a method for generating one or more video-based models of a target. For example, reference can be made to the schematic flow diagram of FIG. 7.
- the method comprises:
- time synchronization of the video frames of the video streams is provided to obtain, for at least one point in time, a set of simultaneously registered video frames
- S2 generating, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
- time synchronization of the video frames of the video streams is provided to obtain, for each of a plurality of points in time, a set of simultaneously registered video frames.
- at least one 3D model of the target may be generated, for each one of a selected set of the plurality of points in time, based on the corresponding set of simultaneously registered video frames.
- the method may include the following optional step:
- S3 combining at least a subset of the generated 3D models to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
- the moving or movable vehicles represent any controllably moving or movable vehicles from which video sequences of events can be taken and used for 3D/4D modelling.
- image data based on the video streams may be downlinked to a ground segment from at least one of the moving or movable vehicles and/or a control uplink may be operated for controlling the moving or movable vehicles and/or the cameras from the ground segment.
- the moving or movable vehicles may include aerial vehicles and/or surface based vehicles, including manned or uncrewed vehicles.
- the moving or movable vehicles include drones or unmanned aerial vehicles (UAVs), each having an on-board camera.
- UAVs unmanned aerial vehicles
- synchronous drone flight including camera pointing may be controlled from a ground segment by a front-line officer, using pre-set parameters for the flight, or the flight may be controlled autonomously.
- the drones are configured to fly side by side, covering an area of interest across track.
- the drones are configured to fly one after another along track.
- the drones are configured to hover over an area of interest.
- the positioning and/or pointing synchronization correspond to relative orientation.
- the relative orientation can be computed continuously in real-time based on the content of the video streams and used to control the drones and/or cameras such that they obtain and/or maintain desired positioning and/or pointing synchronization.
- more or less autonomous flight control may be implemented by using the downlinked video streams to determine and adjust the relative positions and attitudes of the drones. Accordingly, corresponding instructions and/or commands may be generated and uplinked from a ground station to the drones to maintain the base and relevant positioning and/or pointing synchronizations.
- the position synchronization and/or the pointing synchronization may be realized by a radio link between the drones, direct or through a ground radio station, in which the positions of the drones may be communicated so that a correct base-to- height ratio can be maintained, or the position synchronization can be realized by a flight plan, which is created before the mission.
- automatic or manual pointing of the cameras may be performed to enable the target to be covered in an optimal way for photogrammetric processing of the video data.
- the cameras may be convergently pointing at the same object, which is imaged in the center of each image, or the cameras may be pointing strictly downwards in nadir for relaxing the requirements on pointing synchronization.
- the camera in each moving or movable vehicle takes a video frame at the same moment as the camera of a cooperative vehicle to provide for synchronous video takes.
- the time synchronization may be realized by matching an image of a moving object from a first vehicle at time ti with a sequence of images at times [ti-n ... ti+n] from a second vehicle to find the optimal co-occurrence of the moving object at time tj, i-n ≤ j ≤ i+n.
- time synchronization based on detecting the index of the first frame in which a distinctly colored light, lit while cameras of the moving or movable vehicles are imaging, is visible in each video, and defining the time shift between the videos as the difference in frame indexes.
- the time synchronization may be realized by starting the cameras at the exact same time.
- stereo images ready to be displayed stereoscopically in Virtual Reality headset or with 3D glasses, may be generated based on the registered video frames.
- Dense image matching may then be performed on the stereo images to generate depth maps, where each pixel value represents relative distance between camera and imaged object.
- areas with moving objects may be identified using the depth maps.
- background static objects are separated from dynamic moving objects, and a coherent 3D model over the target including only static terrain and static objects is generated.
- a database of dynamic moving objects may be created in which solid moving objects and non-solid moving objects are classified and corresponding 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects are provided.
- the 4D model may then be based on the 3D model of static terrain and static objects and the 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects.
- visualization and/or decision support may be provided based on the generated 3D/4D models.
- the viewpoint and zooming within the 4D model may be adapted to user preferences.
- For a system overview, reference can be made to the schematic examples illustrated in FIG. 1 and FIG. 2.
- a system 100 configured to generate one or more video-based models of a target.
- the system 100 is configured to provide video streams from at least two moving or movable vehicles 10-1, 10-2, ..., 10-m (m ≥ 2) equipped with cameras 20-1, 20-2, ..., 20-m for simultaneously imaging the target 50 from different viewpoints.
- Position synchronization of the moving or movable vehicles 10-1, 10-2, ..., 10-m is provided to create a stable image base. Pointing synchronization of the cameras 20-1, 20-2, ...
- the system 100 is also configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
- the system 100 is configured to enable time synchronization of the video frames of the video streams to obtain, for each of a plurality of points in time, a set of simultaneously registered video frames.
- the system 100 may be configured to generate, for each one of a selected set of the plurality of points in time, at least one 3D model of the target based on the corresponding set of simultaneously registered video frames
- system 100 is further configured to combine at least a subset of the generated 3D models to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
- the system may be configured to download or downlink image data based on the video streams to a ground segment 30 from at least one of the moving or movable vehicles and/or to operate a control uplink for controlling the moving or movable vehicles 10-1, 10-2, ..., 10-m and/or the cameras 20-1, 20-2, ..., 20-m from the ground segment 30.
- the system is configured to separate background static objects from dynamic moving objects, and generate a coherent 3D model over the target including only static terrain and static objects, and create a database of dynamic moving objects in which solid moving objects and non-solid moving objects are classified and corresponding 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects.
- the system may then be configured to generate the 4D model based on the 3D model of static terrain and static objects and the 3D models of the dynamic solid moving objects and 3D models of dynamic non- solid moving objects.
- the system 100 comprises processing circuitry 110 and memory 120, the memory 120 comprising instructions, which when executed by the processing circuitry 110, cause the processing circuitry 110 to generate the video-based models of dynamic events, or electronic circuitry configured to generate the video-based models of dynamic events.
- a dynamic sensor system 25 comprises a set of synchronously operative moving or movable vehicles 10-1, 10-2, ..., 10-m equipped with cameras 20-1, 20-2, ..., 20-m to enable video streams for simultaneously imaging a target from different viewpoints.
- the dynamic sensor system 25 is configured to operate based on position synchronization of the moving or movable vehicles 10-1 , 10-2, ... , 10-m to create a stable image base, and the dynamic sensor system 25 is configured to operate based on pointing synchronization of the cameras 20-1 , 20-2, ... , 20-m to cover the same object(s) and/or dynamic event(s).
- the dynamic sensor system 25 is also configured to enable time synchronization of the video frames of the video streams to allow, for at least one point in time, generation and/or extraction of a set of simultaneously registered video frames.
- the dynamic sensor system 25 is further configured to downlink image data of the video streams to a ground segment 30 from at least one of the moving or movable vehicles 10-1, 10-2, ..., 10-m to enable generation of one or more three-dimensional/four-dimensional, 3D/4D, models of the target based on the simultaneously registered video frames at the ground segment 30.
- the dynamic sensor system 25 enables the formation of an aerial segment of synchronously operative drones 10-1, 10-2, ..., 10-m having on-board cameras 20-1, 20-2, ..., 20-m to allow for the synchronized video streams for simultaneously imaging the target from different viewpoints.
- a ground-based system 30 configured to generate one or more video-based models of a target.
- the system 30 is configured to receive video streams from at least two moving or movable vehicles 10-1 , 10-2, ..., 10-m equipped with cameras 20-1 , 20-2, ... , 20-m for simultaneously imaging the target 50 from different viewpoints.
- the moving or movable vehicles 10-1 , 10-2, ... , 10- m being operated based on position synchronization for a stable image base, and the cameras 20-1 , 20-2, ...
- the system 30 is also configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
- the system 30 is configured to enable time synchronization of the video frames of the video streams to obtain, for each of a plurality of points in time, a set of simultaneously registered video frames.
- the system 30 may then be configured to generate, for each one of a selected set of the plurality of points in time, at least one 3D model of the target based on the corresponding set of simultaneously registered video frames.
- system 30 may optionally be configured to combine at least a subset of the generated 3D models to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
- the system 30 is configured to operate a control uplink for controlling the moving or movable vehicles 10-1, 10-2, ..., 10-m and/or the cameras 20-1, 20-2, ..., 20-m.
- the system 30 may be configured to determine the position synchronization and pointing synchronization and send commands on the control uplink to adapt the position synchronization and pointing synchronization of the moving or movable vehicles 10 and/or the cameras 20.
- the system 30 (see FIG. 6) comprises processing circuitry 110 and memory 120, the memory 120 comprising instructions, which when executed by the processing circuitry 110, cause the processing circuitry 110 to generate the video-based models of dynamic events, or electronic circuitry configured to generate the video-based models of dynamic events.
- the proposed technology is based on three types of synchronization:
- the system comprises (see FIG. 1 and/or FIG. 2):
- o Frame rate of the camera can depend on the application.
- o Normally located in the front line and keeps the overall control of the system, including one or more of the following:
- target such as a dynamic event where real-time 3D/4D models would facilitate decision-making, e.g. fires, accidents, flooding or other security or military operations.
- target scenarios may include sports events, movie productions and the like.
- the image base (See FIG. 2 and FIG. 3), i.e. the distance between the drones, is optimized for accurate automatic 3D point reconstruction and is dependent on the height above the terrain and intrinsic parameters of the cameras.
- the synchronous drone flight 1 (including the sensor pointing 2) may be controlled from the ground segment by a front-line officer, mainly using pre-set parameters for the flight.
- the flight may be controlled autonomously.
- the overlap area between synchronized video frames may be determined as an integrated step in the generation of 4D models.
- This overlap may in a particular example be used to determine how the drones need to be reconfigured to maintain an optimal positioning and pointing synchronization.
- the commands required to update the drones positioning and pointing synchronization may in this example be computed and sent to the drones autonomously by the system.
- the positioning and/or pointing synchronization correspond to relative orientation, which if desired can be computed continuously in real-time based on the content of the video streams and used to control the drones and/or cameras such that they obtain and/or maintain desired positioning and/or pointing synchronization.
- position synchronization may be realized by a radio link between the drones (direct or through a ground radio station), in which the positions of the drones are communicated so that a correct base-to-height ratio can be maintained.
- position synchronization may be realized by a flight plan, which is created before the mission, in which distance between the drones is set to a constant value based on flying height, field-of-view and other parameters.
- the system may be using automatic or manual pointing of the cameras.
- the cameras are convergently pointing at the same object, which is imaged in the center of each image.
- the cameras may be pointing strictly downwards in nadir.
- the overlap area is then determined by the base-to-height ratio and the intrinsic parameters of the camera, such as the field-of-view.
- pointing synchronization may be realized by a radio link between the drones (possibly through a ground radio station), in which the positions of the drones are communicated so that a correct target object to center at can be calculated.
- the sensor/camera in each drone needs to take a video frame at the same moment as the cooperative drone (synchronous video takes 3), otherwise the target will move or change shape between the frame takes.
- each frame may be associated with a precise time tagging.
- the difference in time, Δt, is equivalent to a difference in frame indexes in the video sequences, thus enabling time synchronization to the accuracy of the video frame rate, e.g. 1/30 seconds.
- This time calibration can be iterated during flight to avoid time drifts due to e.g. differing internal camera clocks.
- time synchronization may be realized by using a single electronic shutter, connected to both (all) drones before the mission, thus starting the cameras at the exact same time.
- a distinctly coloured light is lit while both (all) drones are imaging.
- the index of the first frame in which the light is visible is automatically detected in each video.
- the difference in frame indexes defines the time shift between the videos (a minimal sketch of this frame-index approach is given at the end of this list).
- a communication link 4 may be established between the aerial segment and the ground segment, including at least a real-time video downlink, and optionally a control link (uplink) for controlling the aerial segment, e.g. the drones and/or the sensors.
- a control link uplink
- the interior orientations i.e. the physical characteristics of the cameras
- the interior orientations may be determined before or during mission, using targets or natural objects.
- Each set/pair of images from different videos/cameras, but the same time t, may be oriented relative to each other through image matching and sensor orientation.
- Stereo images ready to be displayed stereoscopically in VR headsets or with 3D glasses, may be generated, preferably at each of a plurality of points in time (so that a video may be viewed stereoscopically).
- Dense image matching may be performed on the stereo images to generate depth maps, i.e. images where the pixel value is the relative distance between camera and imaged object.
- GPS Global Positioning System
- IMU Inertial Measurement Unit
- object coordinates X, Y, Z
- exterior orientation i.e. the position and orientations of the cameras
- DSM digital surface models
- Orientation data and depth maps from a sequence of already calculated frames may be used to predict the data at the currently processed frame to speed up calculations.
- Depth maps and/or DSM may be saved to enable replaying the real-time scenario and to provide input for the Near-Real-time process.
- the block of images may be adjusted to estimate globally optimal orientation data. o This may initially be done without GPS and IMU data, in which case absolute orientation of the block is done after the adjustment,
- GPS and IMU observations can be included in the adjustment.
- Dense image matching may be performed using the adjusted sensor parameters and existing depth maps from Real-time processing.
- Areas with moving objects can be identified using the depth maps, optical flow and e.g. affine and projective factorization.
- a coherent 3D model over the entire area and including only static terrain and objects (which do not move) may be created.
- a database with moving objects may be created in which (i) solid and (ii) deformable moving objects are classified.
- the database may include time-dependent 3D models.
- the proposed technology may optionally also include a visualization and/or decisionmaking/support tool 6 based on the generated 3D/4D models of the static and/or dynamic events, as indicated herein.
- FIG. 4 is a schematic diagram illustrating an example of how momentary 3D models can be generated, e.g. from time-synchronized video frame pairs or more generally n- tuples for visualization and exploration.
- FIG. 5 illustrates an example of how the momentary 3D models together can be used to build real-time and near real-time 4D models for visualization and exploration.
- real-time processing may give the user an immediate, but somewhat limited, 4D view of the target:
- Sensors 1 and 2 take a stream of video frames, creating an image pair at each moment.
- Each 3D model is exported as a raster (in “image space”) to the visualization module. Together they compose a 4D model of the target.
- the graphics show only two flying sensors, but the system could handle more sensors, which would improve quality of the real-time 4D model.
- real-time visualization may be a 4D model which gives the user an immediate and improved impression of the target. This view is much better than just watching the real-time video from one drone, since the viewpoint and the zooming can be adapted to the user’s preference.
- near real-time processing may give the user a somewhat delayed (by seconds), but full-scale 4D view of the target, which also can be studied in the time domain (go backwards to see what happened earlier):
- the system separates the background static objects from the dynamic moving objects.
- the system separates solid from non-solid moving objects.
- TINs in “object space”
- a TIN Triangulated Irregular Network
- the 3D models of dynamic non-solid objects are exported as a point cloud (in “object space”) to the exploration module.
- Digital TIN data structures may be used in a variety of applications, including geographic information systems (GIS), and computer aided drafting (CAD) for the visual representation of a topographical surface.
- GIS geographic information systems
- CAD computer aided drafting
- a TIN is normally a vector-based representation of the physical land surface or sea bottom, made up of irregularly distributed nodes and lines with three-dimensional coordinates (x, y, and z) that are arranged in a network of non-overlapping triangles.
- the graphics show only two flying sensors, but the system could handle more sensors, which would improve quality of the real-time 4D model.
- near real-time exploration may involve a tool for deeper analysis.
- it may store the generated 4D models and may visualize the models simultaneously, also in the time domain.
- the tool may also allow for more advanced analysis, as for example:
- processing circuitry such as one or more processors or processing units.
- processing circuitry includes, but is not limited to, one or more microprocessors, one or more Graphics Processing Units (GPUs), one or more Digital Signal Processors (DSPs), one or more Central Processing Units (CPUs), video acceleration hardware, and/or any suitable programmable logic circuitry such as one or more Field Programmable Gate Arrays (FPGAs), or one or more Programmable Logic Controllers (PLCs).
- GPU Graphics Processing Unit
- DSPs Digital Signal Processors
- CPUs Central Processing Units
- FPGAs Field Programmable Gate Arrays
- PLCs Programmable Logic Controllers
- the overall functionality may also be partitioned between programmed software, SW, for execution on one or more processors, and one or more pre-configured or possibly reconfigurable hardware circuits such as ASICs and/or FPGAs.
- the actual hardware- software partitioning can be decided by a system designer based on a number of factors including processing speed, cost of implementation and other requirements.
- FIG. 6 is a schematic diagram illustrating an example of a computer-implementation according to an embodiment.
- processing circuitry 110 including one or more processors.
- the processor(s) 110 and memory 120 are interconnected to each other to enable normal software execution.
- An optional input/output device 130 may also be interconnected to the processor(s) 110 and/or the memory 120 to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s).
- processing and visualization may share memory resources through GPU interoperability.
- processor should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
- the processing circuitry including one or more processors is thus configured to perform, when executing the computer program, well-defined processing tasks such as those described herein.
- the processing circuitry does not have to be dedicated to only execute the above- described steps, functions, procedure and/or blocks, but may also execute other tasks.
- the computer program 125; 145 comprises instructions, which when executed by at least one processor 110, cause the processor(s) 110 to perform at least part of the steps and/or actions or tasks described herein.
- a computer program 125; 145 for generating, when executed by a processor 110, one or more video-based models of a target.
- the computer program 125; 145 comprises instructions, which when executed by the processor 110, cause the processor 110 to:
- time synchronization of the video frames of the video streams being provided to obtain, for at least one point in time, a set of simultaneously registered video frames; and generate, for said at least one point in time, at least one three-dimensional, 3D, model or four-dimensional, 4D, model of the target based on the corresponding set of simultaneously registered video frames.
- the software or computer program 125; 145 may be realized as a computer program product, which is normally carried or stored on a computer-readable medium 120; 140, in particular a non-volatile medium.
- the computer program 125; 145 may thus be loaded into the operating memory 120 of a computer or equivalent processing device for execution by the processing circuitry 110 thereof.
- the flow diagram or diagrams presented herein may be regarded as a computer flow diagram or diagrams, when performed by one or more processors.
- a corresponding apparatus may be defined as a group of function modules, where each step performed by the processor corresponds to a function module.
- the function modules are implemented as a computer program running on the processor.
- the computer program residing in memory may thus be organized as appropriate function modules configured to perform, when executed by the processor, at least part of the steps and/or tasks described herein.
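To make the light-based time synchronization above concrete (detecting, in each video, the index of the first frame in which a distinctly coloured light is visible, and taking the difference in frame indexes as the time shift), here is a minimal sketch in Python using OpenCV. The simple brightness threshold, minimum pixel count and fixed frame rate are illustrative assumptions, not details taken from the disclosure.

```python
import cv2
import numpy as np

def first_lit_frame(video_path, threshold=240, min_pixels=50):
    """Return the index of the first frame containing a very bright (lit) region.
    The fixed brightness threshold and pixel count are illustrative assumptions."""
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if np.count_nonzero(gray >= threshold) >= min_pixels:
            capture.release()
            return index
        index += 1
    capture.release()
    raise RuntimeError("light not detected in " + video_path)

def time_shift_between_videos(video_a, video_b, frame_rate=30.0):
    """Time shift (seconds) between two videos, to the accuracy of the frame rate."""
    shift_in_frames = first_lit_frame(video_a) - first_lit_frame(video_b)
    return shift_in_frames / frame_rate
```

The resulting shift can then be used to pair frames with the same time index across the videos, which is what the subsequent per-time-step 3D reconstruction assumes.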
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Image Analysis (AREA)
- Studio Devices (AREA)
- Aviation & Aerospace Engineering (AREA)
- Image Processing (AREA)
Abstract
There is disclosed a method and corresponding systems for generating one or more video-based models of a target. The method comprises providing (S1) video streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints. Position synchronization of the moving or movable vehicles is provided to create a stable image base, which represents the distance between the moving or movable vehicles. Pointing synchronization of the cameras is provided to cover the same object(s) and/or dynamic event(s). Time synchronization of the video frames of the video streams is provided to obtain, for at least one point in time, a set of simultaneously registered video frames. The method further comprises generating (S2), for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
Description
A METHOD AND CORRESPONDING SYSTEM FOR GENERATING VIDEO-BASED
MODELS OF A TARGET SUCH AS A DYNAMIC EVENT
TECHNICAL FIELD
The proposed technology generally relates to a method and corresponding system for generating one or more video-based models of a target such as a dynamic event, a dynamic sensor system, a ground-based system configured to generate one or more video-based models of a target as well as a corresponding computer program and computer-program product.
BACKGROUND
Video taken from moving or movable vehicles such as aerial vehicles and/or surface- based vehicles and/or underwater vehicles including manned and/or unmanned (uncrewed) vehicles such as drones and/or Unmanned Aerial Vehicles (UAVs) can already today give emergency and security authorities and armed forces support for rapid decisions in crisis situations. Such situations may for example include fires, accidents, flooding and/or other security or military operations and the like. Other target scenarios may include sports events, movie productions and the like.
However, there is a general need for improved video-based models of target scenarios such as dynamic events and/or enhanced decision support based on such models, e.g. to extend imagery based 3D models to include also dynamic events for improved visualization, user interaction and/or decision support.
SUMMARY
It is an object to provide a method for generating one or more video-based models of a target.
It is also an object to provide a system configured to generate one or more video-based models of a target.
Another object is to provide a dynamic sensor system.
Yet another object is to provide a ground-based system configured to generate one or more video-based models of a target.
Still another object is to provide a computer program for generating, when executed by a processor, one or more video-based models of a target.
It is also an object to provide a corresponding computer-program product.
According to a first aspect, there is provided a method for generating one or more video-based models of a target. The method comprises:
providing video streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints,
• wherein position synchronization of the moving or movable vehicles is provided to create a stable image base, which represents the distance between the moving or movable vehicles,
• wherein pointing synchronization of the cameras is provided to cover the same object(s) and/or dynamic event(s), and
• wherein time synchronization of the video frames of the video streams is provided to obtain, for at least one point in time, a set of one or more simultaneously registered video frames; and
generating, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
According to a second aspect, there is provided a system configured to generate one or more video-based models of a target. The system is configured to provide video
streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints. Position synchronization of the moving or movable vehicles is provided to create a stable image base. Pointing synchronization of the cameras is provided to cover the same object(s) and/or dynamic event(s), and time synchronization of the video frames of the video streams is enabled to obtain, for at least one point in time, a set of simultaneously registered video frames. The system is also configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
According to a third aspect, there is provided a dynamic sensor system. The dynamic sensor system comprises a set of synchronously operative moving or movable vehicles equipped with cameras to enable video streams for simultaneously imaging a target from different viewpoints. The dynamic sensor system is configured to operate based on position synchronization of the moving or movable vehicles to create a stable image base, and the dynamic sensor system is configured to operate based on pointing synchronization of the cameras to cover the same object(s) and/or dynamic event(s). The dynamic sensor system is also configured to enable time synchronization of the video frames of the video streams to allow, for at least one point in time, generation and/or extraction of a set of simultaneously registered video frames. The dynamic sensor system is further configured to downlink image data of the video streams to a ground segment from at least one of the moving or movable vehicles to enable generation of one or more three-dimensional/four-dimensional, 3D/4D, models of the target based on the simultaneously registered video frames at the ground segment.
According to a fourth aspect, there is provided a ground-based system configured to generate one or more video-based models of a target. The system is configured to receive video streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints. The moving or movable vehicles being operated based on position synchronization for a stable image base, and the cameras being operated based on pointing synchronization for
covering the same object(s) and/or dynamic event(s), and time synchronization of the video frames of the video streams being enabled to allow, for at least one point in time, generation and/or extraction of a set of simultaneously registered video frames. The system is also configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
According to a fifth aspect, there is provided a computer program for generating, when executed by a processor, one or more video-based models of a target. The computer program comprises instructions, which when executed by the processor, cause the processor to:
receive video streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints,
• the moving or movable vehicles being operated based on position synchronization for a stable image base,
• the cameras being operated based on pointing synchronization for covering the same object(s) and/or dynamic event(s), and
• time synchronization of the video frames of the video streams being provided to obtain, for at least one point in time, a set of simultaneously registered video frames; and
generate, for said at least one point in time, at least one three-dimensional, 3D, model or four-dimensional, 4D, model of the target based on the corresponding set of simultaneously registered video frames.
According to a sixth aspect, there is provided a computer-program product comprising a non-transitory computer-readable medium carrying such a computer program.
Other advantages of the invention will be appreciated when reading the below detailed description.
BRIEF DESCRIPTION OF DRAWINGS
The invention, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:
FIG. 1 is a schematic diagram illustrating an example of an overall system view including basic system components.
FIG. 2 is a schematic diagram illustrating an example of an overall system view including basic system components and basic functions.
FIG. 3 is a schematic diagram illustrating an example of a dynamic sensor system, here exemplified as an aerial segment based on drones equipped with on-board cameras.
FIG. 4 is a schematic diagram illustrating an example of how momentary 3D models can be generated, e.g. from time-synchronized video frame pairs or more generally n- tuples for visualization and exploration.
FIG. 5 is a schematic diagram illustrating an example of how the momentary 3D models together can be used to build near real-time 4D models for visualization and exploration.
FIG. 6 is a schematic diagram illustrating an example of a computer-implementation according to an embodiment.
FIG. 7 is a schematic flow diagram illustrating an example of a method for generating one or more video-based models of a target.
DETAILED DESCRIPTION
As mentioned, video taken from moving or movable vehicles such as aerial vehicles and/or surface-based vehicles (e.g. water and/or ground-based vehicles) and/or underwater vehicles, including manned and/or unmanned (uncrewed) vehicles such as drones and/or Unmanned Aerial Vehicles (UAVs) can already today give emergency and security authorities and armed forces support for rapid decisions in crisis situations. The innovation takes the decision support a big step forward by allowing creation of real-time 3D models as well as 4D models, i.e. a series of time-dependent 3D models.
The proposed technology is also applicable for providing video-based models of other events, such as sport events, and for movie productions and the like.
The advanced models of the present invention not only allow visualization of fixed objects, but also of dynamic events such as moving vehicles, smoke plumes, floods, accidents and crowds, e.g. using photogrammetry. This gives the user the opportunity to study dynamic events in the same way that we study static 3D models, e.g. from the angles and with the zooming desired by the user. The innovation also enables manual and automatic measurements of the dimensions (e.g. volume, area, length, height), directions and speed of moving objects.
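As a small, hedged illustration of the measurement capability mentioned above, the sketch below derives the speed and heading of a moving object from two of its timestamped 3D positions taken out of successive momentary 3D models. The function name and the local east-north-up coordinate convention are assumptions for the example; the disclosure does not fix a coordinate frame.

```python
import math

def speed_and_heading(p1, p2, t1, t2):
    """Speed (m/s) and heading (degrees clockwise from north) of an object that
    moves from p1 to p2 between times t1 and t2, with positions given as
    (east, north, up) coordinates in metres."""
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("t2 must be later than t1")
    de, dn, du = (p2[i] - p1[i] for i in range(3))
    speed = math.sqrt(de * de + dn * dn + du * du) / dt
    heading = math.degrees(math.atan2(de, dn)) % 360.0
    return speed, heading

# Example: 15 m east and 5 m north in 2 s -> roughly 7.9 m/s on a heading of about 72 degrees
print(speed_and_heading((100.0, 200.0, 10.0), (115.0, 205.0, 10.0), 0.0, 2.0))
```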
Moving objects, for example vehicles, can however only be reconstructed in three dimensions when they have been imaged from different viewpoints in at least two images simultaneously. Using two or more images in a single video or still image sequence from one drone, the moving object will have changed position and/or shape, thus disabling 3D reconstruction by photogrammetric and computer vision techniques.
This innovation makes use of two or more drones or similar moving or movable vehicles with synchronized or at least synchronizable video streams to enable simultaneous imaging from different viewpoints and hence 3D reconstruction of a
target. For example, each pair or more generally m-tuple (m ≥ 2, where m represents the number of cameras) of simultaneously registered images or video frames enables generation of a 3D model for a particular point in time. The union of all 3D models for a time interval t0 to tn, e.g. the length of the simultaneous video recordings or a selected subset thereof, constitutes a 4D model. Here, n represents the number of points in time. In such a 4D model, the user can navigate in both the three spatial dimensions and in the time dimension.
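To illustrate the union of momentary 3D models into a navigable 4D model, here is a minimal sketch of one possible data structure: a time-ordered collection of per-timestamp point clouds with a nearest-time lookup for navigation in the time dimension. The class and method names are illustrative assumptions; the disclosure does not prescribe any particular representation.

```python
from bisect import bisect_left

class FourDModel:
    """A 4D model as a time-ordered collection of momentary 3D models,
    each represented here as a point cloud (list of (x, y, z) tuples)."""

    def __init__(self):
        self._times = []   # sorted timestamps t0 ... tn (seconds)
        self._models = {}  # timestamp -> momentary 3D model

    def add(self, t, point_cloud):
        """Insert the momentary 3D model generated for time t."""
        i = bisect_left(self._times, t)
        if i == len(self._times) or self._times[i] != t:
            self._times.insert(i, t)
        self._models[t] = point_cloud

    def at(self, t):
        """Return the 3D model whose timestamp is closest to t (time navigation)."""
        if not self._times:
            raise LookupError("empty 4D model")
        i = bisect_left(self._times, t)
        candidates = self._times[max(0, i - 1):i + 1]
        nearest = min(candidates, key=lambda ts: abs(ts - t))
        return self._models[nearest]

# Two momentary models, one video frame (1/30 s) apart
m4d = FourDModel()
m4d.add(0.0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
m4d.add(1 / 30, [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)])
print(len(m4d.at(0.02)))  # -> 2 points, from the model nearest to t = 0.02 s
```

In the disclosure's terms, each entry corresponds to one m-tuple of simultaneously registered frames at a time tk, and the whole collection spans the interval t0 to tn.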
For simplicity, the term video will now be used to represent any kind of image sequences.
For simplicity, the term drone will now be used to represent any suitable controllably moving or movable vehicle from which video sequences of events can be taken and used for 3D/4D modelling.
It should though be understood that other types of vehicles may be used by the present innovation, including aerial and/or surface vehicles, as well as unmanned and/or manned vehicles. It should also be understood that vehicles may hold passengers and/or even drivers but still be more or less autonomous, such as self-driving cars. In particular embodiments, unmanned or uncrewed vehicles such as drones may be used. In some applications, it may be desirable to use unmanned aerial vehicles, whereas in other applications unmanned surface vehicles may be preferred. The term terrain height shall be interpreted as object distance for applicable vehicles and applications, e.g. for vehicles that are not aerial.
It should also be understood that the moving or movable vehicles may be performing very small corrective and/or adaptive movements to maintain a certain position, at least for a given period of time, e.g. when drones are temporarily hovering over a target area.
The term target will be used to represent any general target area and/or situation of interest.
By way of example, the target may represent one or more dynamic events and include one or more moving objects. This represents a particularly interesting application.
It should though be understood that apart from modeling dynamic events, the proposed technology may also be useful for allowing faster coverage of a given geographical area and/or improved accuracy for real-time processing.
According to a first aspect, there is provided a method for generating one or more video-based models of a target. For example, reference can be made to the schematic flow diagram of FIG. 7.
Basically, the method comprises:
S1 : providing synchronized video streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints,
• wherein position synchronization of the moving or movable vehicles is provided to create a stable image base, which represents the distance between the moving or movable vehicles,
• wherein pointing synchronization of the cameras is provided to cover the same object(s) and/or dynamic event(s), and
• wherein time synchronization of the video frames of the video streams is provided to obtain, for at least one point in time, a set of simultaneously registered video frames; and
S2: generating, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
In a particular example, time synchronization of the video frames of the video streams is provided to obtain, for each of a plurality of points in time, a set of simultaneously registered video frames. In this case, at least one 3D model of the target may be generated, for each one of a selected set of the plurality of points in time, based on the corresponding set of simultaneously registered video frames.
Accordingly, the method may include the following optional step:
S3: combining at least a subset of the generated 3D models to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
In general, the moving or movable vehicles represent any controllably moving or movable vehicles from which video sequences of events can be taken and used for 3D/4D modelling.
By way of example, image data based on the video streams may be downlinked to a ground segment from at least one of the moving or movable vehicles and/or a control uplink may be operated for controlling the moving or movable vehicles and/or the cameras from the ground segment.
For example, the moving or movable vehicles may include aerial vehicles and/or surface based vehicles, including manned or uncrewed vehicles.
In a particular example, the moving or movable vehicles include drones or unmanned aerial vehicles (UAVs), each having an on-board camera.
As an example, synchronous drone flight including camera pointing may be controlled from a ground segment by a front-line officer, using pre-set parameters for the flight, or the flight may be controlled autonomously.
In a particular example, the drones are configured to fly side by side, covering an area of interest across track. In another example, the drones are configured to fly one after another along track. In yet an example, the drones are configured to hover over an area of interest.
In general, the positioning and/or pointing synchronization correspond to relative orientation. By way of example, the relative orientation can be computed continuously in real-time based on the content of the video streams and used to control the drones and/or cameras such that they obtain and/or maintain desired positioning and/or pointing synchronization.
For example, more or less autonomous flight control may be implemented by using the downlinked video streams to determine and adjust the relative positions and attitudes of the drones. Accordingly, corresponding instructions and/or commands may be generated and uplinked from a ground station to the drones to maintain the base and relevant positioning and/or pointing synchronizations.
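One possible way to compute the relative orientation from the content of the video streams is sketched below, here using standard OpenCV feature matching and essential-matrix recovery. The sketch assumes a shared camera matrix K and a pair of simultaneously taken frames; it is an illustration, not the specific implementation of the disclosure.

```python
# Sketch (assumption: OpenCV available): estimate the relative orientation
# between two simultaneously taken frames from cooperating drones, which can
# then be compared with the desired base/pointing geometry to derive
# corrective commands. K is the (shared) camera intrinsic matrix.

import cv2
import numpy as np


def relative_orientation(img1, img2, K):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    E, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    # R: relative rotation between the cameras; t: baseline direction
    # (unit vector; the metric base length comes from GPS or the flight plan).
    return R, t
```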
For example, the position synchronization and/or the pointing synchronization may be realized by a radio link between the drones, direct or through a ground radio station, in which the positions of the drones may be communicated so that a correct base-to-height ratio can be maintained, or the position synchronization can be realized by a flight plan, which is created before the mission.
Basically, automatic or manual pointing of the cameras may be performed to enable the target to be covered in an optimal way for photogrammetric processing of the video data.
For example, the cameras may be convergently pointing at the same object, which is imaged in the center of each image, or the cameras may be pointing strictly downwards in nadir for relaxing the requirements on pointing synchronization.
In a particular set of embodiments, the camera in each moving or movable vehicle takes a video frame at the same moment as the camera of a cooperative vehicle to provide for synchronous video takes.
By way of example, the time synchronization may be realized by matching an image of a moving object from a first vehicle at time ti with a sequence of images at times [ti−n ... ti+n] from a second vehicle to find the optimal co-occurrence of the moving object at time tj; -n < j < +n.
Optionally, a difference in time, Δt = tj − ti, may be equivalent to a difference in frame indexes in the video sequences, thus enabling time synchronization to the accuracy of the video frame rate.
It is also possible to realize the time synchronization based on detecting the index of the first frame in which a distinctly colored light, lit while the cameras of the moving or movable vehicles are imaging, is visible in each video, and defining the time shift between the videos as the difference in frame indexes.
Alternatively, the time synchronization may be realized by starting the cameras at the exact same time.
In a particular embodiment, stereo images, ready to be displayed stereoscopically in Virtual Reality headset or with 3D glasses, may be generated based on the registered video frames.
Dense image matching, for example, may then be performed on the stereo images to generate depth maps, where each pixel value represents relative distance between camera and imaged object.
By way of example, areas with moving objects may be identified using the depth maps.
In a particular example, background static objects are separated from dynamic moving objects, and a coherent 3D model over the target including only static terrain and static objects is generated. A database of dynamic moving objects may be created in which solid moving objects and non-solid moving objects are classified and corresponding 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects are provided. The 4D model may then be based on the 3D model of static terrain and static objects and the 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects.
In this way, visualization and/or decision support may be provided based on the generated 3D/4D models.
For example, the viewpoint and zooming within the 4D model may be adapted to user preferences.
For a system overview, reference can be made to the schematic examples illustrated in FIG. 1 and FIG. 2.
According to a second aspect, there is provided a system 100 configured to generate one or more video-based models of a target. The system 100 is configured to provide video streams from at least two moving or movable vehicles 10-1, 10-2, ..., 10-m (m ≥ 2) equipped with cameras 20-1, 20-2, ..., 20-m for simultaneously imaging the target 50 from different viewpoints. Position synchronization of the moving or movable vehicles 10-1, 10-2, ..., 10-m is provided to create a stable image base. Pointing synchronization of the cameras 20-1, 20-2, ..., 20-m is provided to cover the same object(s) and/or dynamic event(s), and time synchronization of the video frames of the video streams is enabled to obtain, for at least one point in time, a set of simultaneously registered video frames. The system 100 is also configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
In a particular example, the system 100 is configured to enable time synchronization of the video frames of the video streams to obtain, for each of a plurality of points in time, a set of simultaneously registered video frames. The system 100 may be configured to generate, for each one of a selected set of the plurality of points in time, at least one 3D model of the target based on the corresponding set of simultaneously registered video frames.
Optionally, the system 100 is further configured to combine at least a subset of the generated 3D models to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
For example, the system may be configured to download or downlink image data based on the video streams to a ground segment 30 from at least one of the moving or movable vehicles and/or to operate a control uplink for controlling the moving or movable vehicles 10-1, 10-2, ..., 10-m and/or the cameras 20-1, 20-2, ..., 20-m from the ground segment 30.
In a particular example, the system is configured to separate background static objects from dynamic moving objects, generate a coherent 3D model over the target including only static terrain and static objects, and create a database of dynamic moving objects in which solid moving objects and non-solid moving objects are classified and corresponding 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects are provided. The system may then be configured to generate the 4D model based on the 3D model of static terrain and static objects and the 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects.
By way of example, the system 100 (see FIG. 6) comprises processing circuitry 110 and memory 120, the memory 120 comprising instructions, which when executed by the processing circuitry 110, cause the processing circuitry 110 to generate the video-based models of dynamic events, or electronic circuitry configured to generate the video-based models of dynamic events.
According to a third aspect, there is provided a dynamic sensor system 25. The dynamic sensor system 25 comprises a set of synchronously operative moving or movable vehicles 10-1, 10-2, ..., 10-m equipped with cameras 20-1, 20-2, ..., 20-m to enable video streams for simultaneously imaging a target from different viewpoints. The dynamic sensor system 25 is configured to operate based on position synchronization of the moving or movable vehicles 10-1, 10-2, ..., 10-m to create a stable image base, and the dynamic sensor system 25 is configured to operate based on pointing synchronization of the cameras 20-1, 20-2, ..., 20-m to cover the same object(s) and/or dynamic event(s). The dynamic sensor system 25 is also configured to enable time synchronization of the video frames of the video streams to allow, for at least one point in time, generation and/or extraction of a set of simultaneously registered video frames. The dynamic sensor system 25 is further configured to downlink image data of the video streams to a ground segment 30 from at least one of the moving or movable vehicles 10-1, 10-2, ..., 10-m to enable generation of one or more three-dimensional/four-dimensional, 3D/4D, models of the target based on the simultaneously registered video frames at the ground segment 30.
By way of example, the dynamic sensor system 25 enables the formation of an aerial segment of synchronously operative drones 10-1, 10-2, ..., 10-m having on-board cameras 20-1, 20-2, ..., 20-m to allow for the synchronized video streams for simultaneously imaging the target from different viewpoints.
According to a fourth aspect, there is provided a ground-based system 30 configured to generate one or more video-based models of a target. The system 30 is configured to receive video streams from at least two moving or movable vehicles 10-1, 10-2, ..., 10-m equipped with cameras 20-1, 20-2, ..., 20-m for simultaneously imaging the target 50 from different viewpoints. The moving or movable vehicles 10-1, 10-2, ..., 10-m are operated based on position synchronization for a stable image base, the cameras 20-1, 20-2, ..., 20-m are operated based on pointing synchronization for covering the same object(s) and/or dynamic event(s), and time synchronization of the video frames of the video streams is enabled to allow, for at least one point in time, generation and/or extraction of a set of simultaneously registered video frames. The system 30 is also configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
In a particular example, the system 30 is configured to enable time synchronization of the video frames of the video streams to obtain, for each of a plurality of points in time, a set of simultaneously registered video frames. The system 30 may then be configured to generate, for each one of a selected set of the plurality of points in time, at least one 3D model of the target based on the corresponding set of simultaneously registered video frames.
Further, the system 30 may optionally be configured to combine at least a subset of the generated 3D models to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
For example, the system 30 is configured to operate a control uplink for controlling the moving or movable vehicles 10-1, 10-2, ..., 10-m and/or the cameras 20-1, 20-2, ..., 20-m.
By way of example, the system 30 may be configured to determine the position synchronization and pointing synchronization and send commands on the control uplink to adapt the position synchronization and pointing synchronization of the moving or movable vehicles 10 and/or the cameras 20.
By way of example, the system 30 (see FIG. 6) comprises processing circuitry 110 and memory 120, the memory 120 comprising instructions, which when executed by the processing circuitry 110, cause the processing circuitry 110 to generate the video-based models of dynamic events, or electronic circuitry configured to generate the video-based models of dynamic events.
In the following, the proposed technology will be described with reference to various non-limiting examples for illustrative purposes.
For simplicity, the innovation will now mainly be described in the context of two drones with video cameras. However, the innovation works equally well with more simultaneous drones and some quality aspects would be even better with more drones.
In particular embodiments, the proposed technology is based on three types of synchronization:
• Position synchronization of the drones to create a precise, stable and known image base;
• Pointing synchronization of the cameras in order to cover the same object;
• Time synchronization of the frames in the video stream to get simultaneous images of the same object.
For many user applications, there is another important part of the innovation, namely the real-time aspect:
• Real-time downlink of the video stream
• Real-time processing
• Real-time visualization of the 4D model
For a better understanding of the present invention it may be useful to begin with a brief system overview. Although the drones are aerial in this particular set of examples, it should be understood that surface-based vehicles or even underwater vehicles may be employed to capture the video sequences.
Example of overall system
In this example, the system comprises (see FIG. 1 and/or FIG. 2):
• Aerial Segment
o A swarm (two or more) of drones controlled from the Ground Segment;
o At least one of the drones can download the image data from the swarm to the Ground Segment.
• Sensor/video/camera system
o Normally a frame camera on-board each of the drones;
o Frame rate of the camera can depend on the application.
• Ground Segment
o Normally located in the front line and keeps the overall control of the system, including one or more of the following:
o Drone flight control,
o Reception of data from the swarm,
o Processing of the data,
o Visualization and decision-making tool for the front-line officer,
o Potential external communication, sharing of data with back-office, etc.
• Target
o Any kind of target such as a dynamic event where real-time 3D/4D models would facilitate decision-making, e.g. fires, accidents, flooding or other security or military operations. Other target scenarios may include sports events, movie productions and the like.
Examples of position synchronization of the drones
The image base (See FIG. 2 and FIG. 3), i.e. the distance between the drones, is optimized for accurate automatic 3D point reconstruction and is dependent on the height above the terrain and intrinsic parameters of the cameras.
With particular reference to FIG. 2, the synchronous drone flight 1 (including the sensor pointing 2) may be controlled from the ground segment by a front-line officer, mainly using pre-set parameters for the flight. As an alternative, the flight may be controlled autonomously.
By way of example, the overlap area between synchronized video frames may be determined as an integrated step in the generation of 4D models. This overlap may in a particular example be used to determine how the drones need to be reconfigured to maintain an optimal positioning and pointing synchronization. The commands required to update the drones positioning and pointing synchronization may in this example be computed and sent to the drones autonomously by the system.
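A simplified sketch of such an overlap check is given below, assuming nadir-pointing cameras, roughly level terrain and footprints approximated as east/north-aligned rectangles. The target overlap ratio and tolerance are illustrative parameters, not values taken from the disclosure.

```python
# Simplified sketch of the overlap check (assumptions: nadir-pointing cameras,
# roughly level terrain, footprints approximated as east/north-aligned
# rectangles). The target overlap ratio is an illustrative parameter.

import math


def footprint(east, north, height_agl, hfov_deg, vfov_deg):
    """Ground footprint as (e_min, e_max, n_min, n_max) for a nadir camera."""
    half_e = height_agl * math.tan(math.radians(hfov_deg) / 2.0)
    half_n = height_agl * math.tan(math.radians(vfov_deg) / 2.0)
    return (east - half_e, east + half_e, north - half_n, north + half_n)


def overlap_ratio(fp1, fp2):
    """Overlap area relative to the first footprint's area."""
    e = max(0.0, min(fp1[1], fp2[1]) - max(fp1[0], fp2[0]))
    n = max(0.0, min(fp1[3], fp2[3]) - max(fp1[2], fp2[2]))
    area1 = (fp1[1] - fp1[0]) * (fp1[3] - fp1[2])
    return (e * n) / area1 if area1 > 0 else 0.0


def spacing_command(ratio, target_ratio=0.6, tolerance=0.05):
    """Return a coarse reconfiguration command for the drone spacing."""
    if ratio < target_ratio - tolerance:
        return "decrease_base"   # too little overlap -> fly closer together
    if ratio > target_ratio + tolerance:
        return "increase_base"   # too much overlap -> widen the base
    return "hold"
```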
As mentioned, the positioning and/or pointing synchronization correspond to relative orientation, which if desired can be computed continuously in real-time based on the content of the video streams and used to control the drones and/or cameras such that they obtain and/or maintain desired positioning and/or pointing synchronization.
For example, position synchronization may be realized by a radio link between the drones (direct or through a ground radio station), in which the positions of the drones are communicated so that a correct base-to-height ratio can be maintained.
As another example, position synchronization may be realized by a flight plan, which is created before the mission, in which the distance between the drones is set to a constant value based on flying height, field-of-view and other parameters.
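The following sketch illustrates such a pre-mission calculation using standard stereo-photogrammetry relations, assuming nadir cameras and level terrain; the numbers in the usage example are purely illustrative.

```python
# Sketch of a pre-mission flight-plan calculation (assumptions: nadir cameras,
# level terrain; the formulas are standard stereo-photogrammetry relations,
# not taken from the disclosure itself).

import math


def planned_base(height_agl, fov_deg, overlap=0.6):
    """Drone separation giving the desired overlap of the two swaths."""
    swath = 2.0 * height_agl * math.tan(math.radians(fov_deg) / 2.0)
    return (1.0 - overlap) * swath


def expected_depth_precision(height_agl, base, focal_px, matching_px=0.5):
    """Approximate 1-sigma height error for dense stereo matching."""
    return (height_agl ** 2) * matching_px / (base * focal_px)


# Example: two drones at 100 m with a 70-degree field of view and 60 % overlap
# would be separated by roughly 56 m, i.e. a base-to-height ratio of about 0.56.
if __name__ == "__main__":
    b = planned_base(100.0, 70.0, 0.6)
    print(round(b, 1), round(expected_depth_precision(100.0, b, 2000.0), 3))
```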
Examples of pointing synchronization of the cameras
In order to make sure that the target is covered in an optimal way for photogrammetric processing of the video data, the system may use automatic or manual pointing of the cameras.
As an example, to maximize the overlap area covered by both (or all) drones, the cameras are convergently pointing at the same object, which is imaged in the center of each image.
As another example, relaxing the requirements on pointing synchronization, the cameras may be pointing strictly downwards in nadir. The overlap area is then determined by the base-to-height ratio and the intrinsic parameters of the camera, such as the field-of-view.
As an example, pointing synchronization may be realized by a radio link between the drones (possibly through a ground radio station), in which the positions of the drones are communicated so that a correct target object to center at can be calculated.
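A minimal sketch of that calculation is shown below, assuming positions expressed in a local east/north/up frame; convergent pointing is obtained by commanding each gimbal to the computed azimuth and depression angle.

```python
# Sketch (assumption: positions expressed in a local east/north/up frame):
# compute the gimbal azimuth and depression angle that make a drone's camera
# point at a common target point, so that all cameras converge on it.

import math


def gimbal_angles(drone_enu, target_enu):
    de = target_enu[0] - drone_enu[0]
    dn = target_enu[1] - drone_enu[1]
    du = target_enu[2] - drone_enu[2]          # negative when the target is below
    azimuth = math.degrees(math.atan2(de, dn)) % 360.0      # 0 deg = north
    horizontal = math.hypot(de, dn)
    depression = math.degrees(math.atan2(-du, horizontal))  # 90 deg = nadir
    return azimuth, depression


# Example: a drone 80 m horizontally to the north-east of the target and 120 m
# above it points its camera towards azimuth ~225 deg, depression ~56 deg.
```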
Examples of time synchronization of the frames
In order to make it possible to create 4D models of moving targets, the sensor/camera in each drone needs to take a video frame at the same moment as the cooperative drone (synchronous video takes 3), otherwise the target will move or change shape between the frame takes.
The precision of the timing, as well as the video frame rate, is dependent on the requirements from each specific application. In addition, each frame may be associated with a precise time tagging.
As an example, time synchronization may be realized by matching an image of a moving object from one drone at time ti with a sequence of images at times [ti−n ... ti+n] from a second drone to find the optimal co-occurrence of the moving object at time tj, −n ≤ j ≤ +n. The difference in time, Δt = tj − ti, is equivalent to a difference in frame indexes in the video sequences, thus enabling time synchronization to the accuracy of the video frame rate, e.g. 1/30 seconds. This time calibration can be iterated during flight to avoid time drifts due to e.g. differing internal camera clocks.
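A simple sketch of this offset search is given below. It assumes the moving object has already been detected and tracked in both videos, yielding one scalar observation per frame (for example an image coordinate of the object); the real matching could of course be more elaborate.

```python
# Sketch of frame-offset estimation (assumption: the moving object has already
# been tracked in both videos, giving one scalar observation per frame). The
# best co-occurrence is the offset j in [-n, +n] that maximises the normalised
# correlation of the two series; delta_t = j / frame_rate.

import numpy as np


def estimate_frame_offset(series_a, series_b, n):
    """Frame-index shift between the videos (positive j: the co-occurring
    observation appears j frames later in video A than in video B)."""
    best_j, best_score = 0, -np.inf
    for j in range(-n, n + 1):
        if j >= 0:
            a, b = series_a[j:], series_b[:len(series_b) - j]
        else:
            a, b = series_a[:j], series_b[-j:]
        k = min(len(a), len(b))
        a, b = np.asarray(a[:k], float), np.asarray(b[:k], float)
        if k < 2 or a.std() == 0 or b.std() == 0:
            continue
        score = np.corrcoef(a, b)[0, 1]
        if score > best_score:
            best_j, best_score = j, score
    return best_j


# delta_t = best_j / 30.0 for a 30 Hz video; re-running this during flight
# counteracts drift between the internal camera clocks.
```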
As another example, time synchronization may be realized by using a single electronic shutter, connected to both (all) drones before the mission, thus starting the cameras at the exact same time.
As another example, a distinctly coloured light is lit while both (all) drones are imaging. The index of the first frame in which the light is visible is automatically detected in each video. The difference in frame indexes defines the time shift between the videos.
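A sketch of such light detection is given below, assuming the synchronization light is distinctly red and clearly brighter than the background; the HSV thresholds and the minimum pixel count are illustrative values that would be tuned per camera.

```python
# Sketch of the light-based synchronisation (assumptions: the synchronisation
# light is distinctly red and clearly brighter than the background; thresholds
# are illustrative and would be tuned per camera and lighting conditions).

import cv2


def first_frame_with_light(video_path, min_pixels=25):
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Bright, saturated red pixels (two hue bands around 0 degrees).
        mask = cv2.inRange(hsv, (0, 150, 200), (10, 255, 255)) | \
               cv2.inRange(hsv, (170, 150, 200), (180, 255, 255))
        if cv2.countNonZero(mask) >= min_pixels:
            cap.release()
            return index
        index += 1
    cap.release()
    return None


# The time shift between two videos is then simply
#   shift = first_frame_with_light(video_1) - first_frame_with_light(video_2)
```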
Examples of communication link(s)
A communication link 4 may be established between the aerial segment and the ground segment, including at least a real-time video downlink, and optionally a control link (uplink) for controlling the aerial segment, e.g. the drones and/or the sensors.
Examples of processing for 3D models
Examples of features for processing 5 for Real-time 3D models:
• The interior orientations (i.e. the physical characteristics of the cameras) may be determined before or during mission, using targets or natural objects.
• Each set/pair of images from different videos/cameras, but the same time t, may be oriented relative to each other through image matching and sensor orientation.
• Stereo images, ready to be displayed stereoscopically in VR headsets or with 3D glasses, may be generated, preferably at each of a plurality of points in time (so that a video may be viewed stereoscopically).
• Dense image matching may be performed on the stereo images to generate depth maps, i.e. images where the pixel value is the relative distance between camera and imaged object.
• GPS (Global Positioning System) and IMU (Inertial Measurement Unit) data may
be used to calculate object coordinates (X, Y, Z) from the depth maps and to calculate exterior orientation (i.e. the position and orientations of the cameras), yielding point clouds and digital surface models (DSMs), thus enabling the production of georeferenced 3D models or point clouds from dense matching (see the sketch after this list).
• Orientation data and depth maps from a sequence of already calculated frames may be used to predict the data at the currently processed frame to speed up calculations.
• Depth maps and/or DSMs may be saved to enable replaying the real-time scenario and to provide input for the Near-Real-time process.
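The chain from a synchronized, rectified frame pair to a depth map and georeferenced object coordinates can be sketched as follows, here using OpenCV's semi-global matcher as one possible dense matcher and assuming that the focal length in pixels, the base length and the exterior orientation from GPS/IMU are known; it is an illustration, not the specific processing of the disclosure.

```python
# Sketch of the real-time processing chain for one synchronised frame pair
# (assumptions: the pair has already been rectified, the focal length in
# pixels, the base length and the exterior orientation from GPS/IMU are
# known; OpenCV's semi-global matcher is one possible dense matcher).

import cv2
import numpy as np


def depth_map(left_gray, right_gray, focal_px, base_m, num_disp=128):
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=num_disp,
                                 blockSize=5, P1=8 * 5 * 5, P2=32 * 5 * 5)
    disparity = sgbm.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.full_like(disparity, np.nan)
    valid = disparity > 0
    depth[valid] = focal_px * base_m / disparity[valid]   # metres from camera
    return depth


def georeferenced_points(depth, K, R, t, step=4):
    """Back-project valid depth pixels to object coordinates (X, Y, Z)."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    rows, cols = depth.shape
    pts = []
    for v in range(0, rows, step):
        for u in range(0, cols, step):
            z = depth[v, u]
            if np.isnan(z):
                continue
            cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
            pts.append(R @ cam + t)     # camera frame -> mapping frame
    return np.array(pts)                # point cloud / input for a DSM
```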
Examples of processing for 4D models
Examples of features for processing 5 for Near-Real-time 4D models:
• For all synchronized video streams (two or more) or for a certain time span, matching may be made between frames within each video stream.
• Together with the depth maps between video streams from the Real-time solution, we get observations for a block of images.
• The block of images may be adjusted to estimate globally optimal orientation data.
o This may initially be done without GPS and IMU data, in which case absolute orientation of the block is done after the adjustment,
o Or GPS and IMU observations can be included in the adjustment.
• Dense image matching may be performed using the adjusted sensor parameters and existing depth maps from Real-time processing.
• Areas with moving objects can be identified using the depth maps, optical flow and e.g. affine and projective factorization.
• A coherent 3D model over the entire area, including only static terrain and objects (which do not move), may be created.
• A database with moving objects may be created in which (i) solid and (ii) deformable moving objects are classified. The database may include time-dependent 3D models.
• Together, the static and moving object 3D models define a 4D time-dependent model of the entire imaged area, as outlined in the data-structure sketch after this list.
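A possible, purely illustrative data structure for such a 4D model is sketched below: a static terrain/object model plus a time-indexed database of moving objects classified as solid or non-solid. The class and attribute names are assumptions introduced for illustration only.

```python
# Data-structure sketch of the resulting 4D model (illustrative only): a static
# terrain/object model plus a time-indexed database of moving objects,
# classified as solid (rigid) or non-solid (deformable).

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MovingObject3D:
    object_id: int
    solid: bool                  # True: rigid vehicle etc.; False: smoke, water, ...
    mesh_or_points: object       # TIN for solid objects, point cloud otherwise
    pose: object = None          # time-dependent position/attitude for solid objects


@dataclass
class Model4D:
    static_model: object         # coherent 3D model of terrain and static objects
    dynamic: Dict[float, List[MovingObject3D]] = field(default_factory=dict)

    def add_time_step(self, t: float, objects: List[MovingObject3D]) -> None:
        self.dynamic[t] = objects

    def at(self, t: float):
        """Scene to render at time t: static background plus moving objects."""
        times = sorted(self.dynamic)
        nearest = min(times, key=lambda s: abs(s - t)) if times else None
        return self.static_model, self.dynamic.get(nearest, [])
```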
Examples of visualization
The proposed technology may optionally also include a visualization and/or decision-making/support tool 6 based on the generated 3D/4D models of the static and/or dynamic events, as indicated herein.
FIG. 4 is a schematic diagram illustrating an example of how momentary 3D models can be generated, e.g. from time-synchronized video frame pairs or more generally n-tuples, for visualization and exploration.
FIG. 5 illustrates an example of how the momentary 3D models together can be used to build real-time and near real-time 4D models for visualization and exploration.
In a particular non-limiting example, for simplicity illustrated with reference to two sensors in FIG. 4, real-time processing may give the user an immediate, but somewhat limited, 4D view of the target:
1. Sensors 1 and 2 take a stream of video frames, creating an image pair at each moment.
2. From each image pair, the system processes a momentary 3D model in real-time.
3. Each 3D model is exported as a raster (in “image space”) to the visualization module. Together they compose a 4D model of the target.
For simplicity, the graphics show only two flying sensors, but the system could handle more sensors, which would improve quality of the real-time 4D model.
For example, real-time visualization may be a 4D model which gives the user an immediate and improved impression of the target. This view is much better than just watching the real-time video from one drone, since the viewpoint and the zooming can be adapted to the user’s preference.
In addition, manual and automatic measurements such as height and volume can be made in the 4D models.
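By way of a simple illustration, a volume or height measurement can be made from a gridded digital surface model of the momentary 3D model; the sketch below assumes a known ground reference surface and a uniform grid cell size, and is not the specific measurement tool of the system.

```python
# Sketch of an automatic volume measurement (assumptions: the momentary 3D
# model is available as a gridded digital surface model in metres, a ground
# reference surface is known, and the grid cell size is uniform).

import numpy as np


def object_volume(dsm, ground, cell_size_m, region_mask=None):
    """Volume (m^3) between the surface model and the ground reference."""
    diff = np.clip(dsm - ground, 0.0, None)         # heights above ground
    if region_mask is not None:
        diff = np.where(region_mask, diff, 0.0)
    return float(np.nansum(diff) * cell_size_m ** 2)


def object_height(dsm, ground, region_mask):
    """Maximum height (m) of the object inside the selected region."""
    return float(np.nanmax(np.where(region_mask, dsm - ground, np.nan)))


# Repeating the measurement on the 3D model of each point in time gives, for
# example, the change of a target volume over the 4D model.
```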
In a particular non-limiting example, for simplicity illustrated with reference to two sensors in FIG. 5, near real-time processing may give the user a somewhat delayed (by seconds), but full-scale 4D view of the target, which also can be studied in the time domain (go backwards to see what happened earlier):
1. Creating momentary 3D models from Sensors 1 and 2 is similar, but parallel, to the Real-time processing.
2. The system separates the background static objects from the dynamic moving objects.
3. The system separates solid from non-solid moving objects.
4. The 3D models of static objects and the dynamic solid objects are exported as TINs (in “object space”) to the exploration module. A TIN (Triangulated Irregular Network) is a representation of a continuous surface consisting entirely of triangular facets.
5. The 3D models of dynamic non-solid objects are exported as a point cloud (in “object space”) to the exploration module.
6. Together all the 3D models compose a 4D model of the target.
Digital TIN data structures may be used in a variety of applications, including geographic information systems (GIS), and computer aided drafting (CAD) for the visual representation of a topographical surface. A TIN is normally a vector-based representation of the physical land surface or sea bottom, made up of irregularly distributed nodes and lines with three-dimensional coordinates (x, y, and z) that are arranged in a network of non-overlapping triangles.
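A TIN can, for example, be built from the irregularly distributed 3D points with a 2D Delaunay triangulation, as sketched below using SciPy; this is an illustrative construction, not the specific export format of the system.

```python
# Sketch of the TIN export (assumption: SciPy is available): a 2D Delaunay
# triangulation of the (x, y) point positions, with z attached per vertex,
# which is a common way a TIN is built from irregularly distributed points.

import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import Delaunay


def build_tin(points_xyz):
    """points_xyz: (N, 3) array -> (vertices, triangle vertex indices)."""
    points_xyz = np.asarray(points_xyz, dtype=float)
    tri = Delaunay(points_xyz[:, :2])          # triangulate in the x/y plane
    return points_xyz, tri.simplices           # (N, 3) vertices, (M, 3) faces


def surface_height(points_xyz):
    """Return a callable giving the TIN surface height at (x, y)."""
    points_xyz = np.asarray(points_xyz, dtype=float)
    return LinearNDInterpolator(points_xyz[:, :2], points_xyz[:, 2])
```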
For simplicity, the graphics show only two flying sensors, but the system could handle more sensors, which would improve quality of the real-time 4D model.
For example, near real-time exploration may involve a tool for deeper analysis. By way of example, it may store the generated 4D models and may visualize the models simultaneously, also in the time domain. The tool may also allow for more advanced analysis, as for example:
• Object tracking
• Changes of target volume
• Static models of moving objects
• Object identification.
At least some of the steps, functions, procedures, modules and/or blocks described herein may be implemented in software such as a computer program for execution by suitable processing circuitry such as one or more processors or processing units. Examples of processing circuitry includes, but is not limited to, one or more microprocessors, one or more Graphics Processing Unit (GPU), one or more Digital
Signal Processors (DSPs), one or more Central Processing Units (CPUs), video acceleration hardware, and/or any suitable programmable logic circuitry such as one or more Field Programmable Gate Arrays (FPGAs), or one or more Programmable Logic Controllers (PLCs).
Alternatively, or as a complement, at least some of the steps, functions, procedures, modules and/or blocks described herein may be implemented in hardware using any conventional technology, such as discrete circuit or integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.
It should also be understood that it may be possible to re-use the general processing capabilities of any conventional device or unit in which the proposed technology is implemented. It may also be possible to re-use existing software, e.g. by reprogramming of the existing software or by adding new software components.
The overall functionality may also be partitioned between programmed software, SW, for execution on one or more processors, and one or more pre-configured or possibly reconfigurable hardware circuits such as ASICs and/or FPGAs. The actual hardware- software partitioning can be decided by a system designer based on a number of factors including processing speed, cost of implementation and other requirements.
FIG. 6 is a schematic diagram illustrating an example of a computer-implementation according to an embodiment. In this particular example, at least some of the steps, functions, procedures, modules and/or blocks described herein are implemented in a computer program, which is loaded into the memory for execution by processing circuitry 110 including one or more processors. The processor(s) 110 and memory 120 are interconnected to each other to enable normal software execution. An optional input/output device 130 may also be interconnected to the processor(s) 110 and/or the memory 120 to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s).
In a particular embodiment, processing and visualization may share memory resources through GPU interoperability.
The term ‘processor’ should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
The processing circuitry including one or more processors is thus configured to perform, when executing the computer program, well-defined processing tasks such as those described herein.
The processing circuitry does not have to be dedicated to only execute the above- described steps, functions, procedure and/or blocks, but may also execute other tasks.
In a particular embodiment, the computer program 125; 145 comprises instructions, which when executed by at least one processor 110, cause the processor(s) 110 to perform at least part of the steps and/or actions or tasks described herein.
In a particular example, there is provided a computer program 125; 145 for generating, when executed by a processor 110, one or more video-based models of a target. The computer program 125; 145 comprises instructions, which when executed by the processor 110, cause the processor 110 to:
receive video streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints,
• the moving or movable vehicles being operated based on position synchronization for a stable image base,
• the cameras being operated based on pointing synchronization for covering the same object(s) and/or dynamic event(s), and
• time synchronization of the video frames of the video streams being provided to obtain, for at least one point in time, a set of simultaneously registered video frames; and
generate, for said at least one point in time, at least one three-dimensional, 3D, model or four-dimensional, 4D, model of the target based on the corresponding set of simultaneously registered video frames.
Optionally, with several 3D models being generated over a plurality of points in time, it is possible to combine at least a subset of the generated 3D models over the plurality of points in time to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
By way of example, the software or computer program 125; 145 may be realized as a computer program product, which is normally carried or stored on a computer-readable medium 120; 140, in particular a non-volatile medium. The computer program 125; 145 may thus be loaded into the operating memory 120 of a computer or equivalent processing device for execution by the processing circuitry 110 thereof.
The flow diagram or diagrams presented herein may be regarded as a computer flow diagram or diagrams, when performed by one or more processors. A corresponding apparatus may be defined as a group of function modules, where each step performed by the processor corresponds to a function module. In this case, the function modules are implemented as a computer program running on the processor.
The computer program residing in memory may thus be organized as appropriate function modules configured to perform, when executed by the processor, at least part of the steps and/or tasks described herein.
The embodiments described above are to be understood as a few illustrative examples of the present invention. It will be understood by those skilled in the art that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the present invention. In particular, different part solutions
in the different embodiments can be combined in other configurations, where technically possible.
Claims
1. A method for generating one or more video-based models of a target, wherein the method comprises:
providing (S1 ) video streams from at least two moving or movable vehicles (10) equipped with cameras (20) for simultaneously imaging the target (50) from different viewpoints,
• wherein position synchronization of the moving or movable vehicles (10) is provided to create a stable image base, which represents the distance between the moving or movable vehicles,
• wherein pointing synchronization of the cameras (20) is provided to cover the same object(s) and/or dynamic event(s), and
• wherein time synchronization of the video frames of the video streams is provided to obtain, for at least one point in time, a set of simultaneously registered video frames; and
generating (S2), for said at least one point in time, at least one three- dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
2. The method of claim 1 , wherein time synchronization of the video frames of the video streams is provided to obtain, for each of a plurality of points in time, a set of simultaneously registered video frames, and
wherein at least one 3D model of the target is generated, for each one of a selected set of the plurality of points in time, based on the corresponding set of simultaneously registered video frames.
3. The method of claim 2, further comprising combining (S3) at least a subset of the generated 3D models to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
4. The method of any of the claims 1 to 3, wherein the moving or movable vehicles (10) represent any controllably moving or movable vehicles from which video sequences of events can be taken and used for 3D/4D modelling.
5. The method of any of the claims 1 to 4, wherein image data based on the video streams are downlinked to a ground segment (30) from at least one of the moving or movable vehicles (10) and/or a control uplink is operated for controlling the moving or movable vehicles (10) and/or the cameras (20) from the ground segment (30).
6. The method of any of the claims 1 to 5, wherein the moving or movable vehicles (10) include aerial vehicles and/or surface-based vehicles, including manned or uncrewed vehicles.
7. The method of any of the claims 1 to 6, wherein the moving or movable vehicles (10) include drones or unmanned aerial vehicles (UAVs), each having an on-board camera (20).
8. The method of claim 7, wherein synchronous drone flight including camera pointing is controlled from a ground segment (30) by a front-line officer, using pre-set parameters for the flight, or the flight is controlled autonomously.
9. The method of claim 7 or 8, wherein the position synchronization and/or the pointing synchronization is realized by a radio link between the drones (10), direct or through a ground radio station, in which the positions of the drones are communicated so that a correct base-to-height ratio can be maintained, or the position synchronization is realized by a flight plan, which is created before the mission.
10. The method of any of the claims 1 to 9, wherein automatic or manual pointing of the cameras (20) is performed to enable the target to be covered in an optimal way for photogrammetric processing of the video data.
1 1. The method of any of the claims 1 to 10, wherein the cameras (20) are convergently pointing at the same object, which is imaged in the center of each image, or the cameras (20) are pointing strictly downwards in nadir for relaxing the requirements on pointing synchronization.
12. The method of any of the claims 1 to 1 1 , wherein the camera (20) in each moving or movable vehicle (10) takes a video frame at the same moment as the camera (20) of a cooperative vehicle (10) to provide for synchronous video takes.
13. The method of any of the claims 1 to 12, wherein the time synchronization is realized by matching an image of a moving object from a first vehicle at time ti with a sequence of images at times [ti−n ... ti+n] from a second vehicle to find the optimal co-occurrence of the moving object at time tj; -n < j < +n.
14. The method of claim 13, wherein a difference in time, Δt = tj − ti, is equivalent to a difference in frame indexes in the video sequences, thus enabling time synchronization to the accuracy of the video frame rate.
15. The method of any of the claims 1 to 12, wherein the time synchronization is based on detecting the index of the first frame in which a distinctly colored light, lit while cameras of the moving or movable vehicles are imaging, is visible in each video, and defining the time shift between the videos as the difference in frame indexes.
16. The method of any of the claims 1 to 12, wherein the time synchronization is realized by starting the cameras (20) at the exact same time.
17. The method of any of the claims 1 to 16, wherein stereo images, ready to be displayed stereoscopically in Virtual Reality headset or with 3D glasses, are generated based on the registered video frames.
18. The method of claim 17, wherein dense image matching is performed on the stereo images to generate depth maps, where each pixel value represents relative distance between camera and imaged object.
19. The method of claim 18, wherein areas with moving objects are identified using the depth maps.
20. The method of any of the claims 1 to 19, wherein background static objects are separated from dynamic moving objects, and a coherent 3D model over the target including only static terrain and static objects is generated, and a database of dynamic moving objects is created in which solid moving objects and non-solid moving objects are classified and corresponding 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects are provided, wherein the 4D model is based on the 3D model of static terrain and static objects and the 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects.
21. The method of any of the claims 1 to 20, wherein visualization and/or decision support is provided based on the generated 3D/4D models.
22. The method of any of the claims 1 to 21 , wherein the viewpoint and zooming within the 4D model are adapted to user preferences.
23. The method of any of the claims 1 to 22, wherein the target represents one or more dynamic events and includes one or more moving objects.
24. A system (100) configured to generate one or more video-based models of a target,
wherein the system (100) is configured to provide video streams from at least two moving or movable vehicles (10) equipped with cameras (20) for simultaneously imaging the target (50) from different viewpoints,
• wherein position synchronization of the moving or movable vehicles (10) is provided to create a stable image base,
• wherein pointing synchronization of the cameras (20) is provided to cover the same object(s) and/or dynamic event(s), and
• wherein time synchronization of the video frames of the video streams is enabled to obtain, for at least one point in time, a set of simultaneously registered video frames; and
wherein the system (100) is configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
25. The system of claim 24, wherein the system (100) is configured to enable time synchronization of the video frames of the video streams to obtain, for each of a plurality of points in time, a set of simultaneously registered video frames, and
wherein the system (100) is configured to generate, for each one of a selected set of the plurality of points in time, at least one 3D model of the target based on the corresponding set of simultaneously registered video frames.
26. The system of claim 25, wherein the system (100) is configured to combine at least a subset of the generated 3D models to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
27. The system of any of the claims 24 to 26, wherein the system (100) is configured to download or downlink image data based on the video streams to a ground segment (30) from at least one of the moving or movable vehicles (10) and/or to operate a control uplink for controlling the moving or movable vehicles (10) and/or the cameras (20) from the ground segment (30).
28. The system of any of the claims 24 to 27, wherein the system (100) is configured to separate background static objects from dynamic moving objects, and generate a
coherent 3D model over the target including only static terrain and static objects, and create a database of dynamic moving objects in which solid moving objects and non-solid moving objects are classified and corresponding 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects,
wherein the system (100) is configured to generate the 4D model based on the 3D model of static terrain and static objects and the 3D models of the dynamic solid moving objects and 3D models of dynamic non-solid moving objects.
29. The system of any of the claims 24 to 28, wherein the system (100) comprises processing circuitry (110) and memory (120), the memory (120) comprising instructions, which when executed by the processing circuitry (110), cause the processing circuitry (110) to generate the video-based models of dynamic events, or electronic circuitry configured to generate the video-based models of dynamic events.
30. A dynamic sensor system (25), wherein the dynamic sensor system (25) comprises a set of synchronously operative moving or movable vehicles (10) equipped with cameras (20) to enable video streams for simultaneously imaging a target (50) from different viewpoints,
wherein the dynamic sensor system (25) is configured to operate based on position synchronization of the moving or movable vehicles (10) to create a stable image base, and
wherein the dynamic sensor system (25) is configured to operate based on pointing synchronization of the cameras (20) to cover the same object(s) and/or dynamic event(s),
wherein the dynamic sensor system (25) is configured to enable time synchronization of the video frames of the video streams to allow, for at least one point in time, generation and/or extraction of a set of simultaneously registered video frames; and
wherein the dynamic sensor system (25) is configured to downlink image data of the video streams to a ground segment (30) from at least one of the moving or movable vehicles (10) to enable generation of one or more three-dimensional/four-
dimensional, 3D/4D, models of the target based on the simultaneously registered video frames at the ground segment (30).
31. The dynamic sensor system of claim 30, wherein the dynamic sensor system (25) enables the formation of an aerial segment of synchronously operative drones (10) having on-board cameras (20) to allow for simultaneously imaging the target from different viewpoints.
32. A ground-based system (30) configured to generate one or more video-based models of a target,
wherein the system (30) is configured to receive video streams from at least two moving or movable vehicles (10) equipped with cameras (20) for simultaneously imaging the target (50) from different viewpoints,
• the moving or movable vehicles (10) being operated based on position synchronization for a stable image base,
• the cameras (20) being operated based on pointing synchronization for covering the same object(s) and/or dynamic event(s), and
• time synchronization of the video frames of the video streams being enabled to allow, for at least one point in time, generation and/or extraction of a set of simultaneously registered video frames; and
wherein the system (30) is configured to generate, for said at least one point in time, at least one three-dimensional, 3D, model of the target based on the corresponding set of simultaneously registered video frames.
33. The ground-based system of claim 32, wherein the system (30) is configured to enable time synchronization of the video frames of the video streams to obtain, for each of a plurality of points in time, a set of simultaneously registered video frames, and
wherein the system (30) is configured to generate, for each one of a selected set of the plurality of points in time, at least one 3D model of the target based on the corresponding set of simultaneously registered video frames.
34. The ground-based system of claim 33, wherein the system (30) is configured to combine at least a subset of the generated 3D models to generate a four-dimensional, 4D, model of the target to enable a user to navigate through the 4D model in the three spatial dimensions and in the time dimension.
35. The ground-based system of any of the claims 32 to 34, wherein the system (30) is configured to operate a control uplink for controlling the moving or movable vehicles (10) and/or the cameras (20).
36. The ground-based system of claim 35, wherein the system (30) is configured to determine the position synchronization and pointing synchronization and send commands on the control uplink to adapt the position synchronization and pointing synchronization of the moving or movable vehicles (10) and/or the cameras (20).
37. The ground-based system of any of the claims 32 to 36, wherein the system (30) comprises processing circuitry (110) and memory (120), the memory (120) comprising instructions, which when executed by the processing circuitry (110), cause the processing circuitry (110) to generate the video-based models of dynamic events, or electronic circuitry configured to generate the video-based models of dynamic events.
38. The system of any of the claims 24 to 37, wherein the target represents one or more dynamic events and includes one or more moving objects.
39. A computer program (125; 145) for generating, when executed by a processor (110), one or more video-based models of a target, wherein the computer program (125; 145) comprises instructions, which when executed by the processor (110), cause the processor (110) to:
receive video streams from at least two moving or movable vehicles equipped with cameras for simultaneously imaging the target from different viewpoints,
• the moving or movable vehicles being operated based on position synchronization for a stable image base,
• the cameras being operated based on pointing synchronization for covering the same object(s) and/or dynamic event(s), and
• time synchronization of the video frames of the video streams being provided to obtain, for at least one point in time, a set of simultaneously registered video frames; and
generate, for said at least one point in time, at least one three-dimensional, 3D, model or four-dimensional, 4D, model of the target based on the corresponding set of simultaneously registered video frames.
40. A computer-program product comprising a non-transitory computer-readable medium (120; 140) carrying a computer program (125; 145) of claim 39.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19850915.0A EP3841744B1 (en) | 2018-08-22 | 2019-07-22 | A method and corresponding system for generating video-based models of a target such as a dynamic event |
US17/268,364 US11483540B2 (en) | 2018-08-22 | 2019-07-22 | Method and corresponding system for generating video-based 3-D models of a target such as a dynamic event |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862720982P | 2018-08-22 | 2018-08-22 | |
US62/720,982 | 2018-08-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020040679A1 true WO2020040679A1 (en) | 2020-02-27 |
Family
ID=69591203
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SE2019/050707 WO2020040679A1 (en) | 2018-08-22 | 2019-07-22 | A method and corresponding system for generating video-based models of a target such as a dynamic event |
Country Status (3)
Country | Link |
---|---|
US (1) | US11483540B2 (en) |
EP (1) | EP3841744B1 (en) |
WO (1) | WO2020040679A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12118890B2 (en) * | 2019-08-14 | 2024-10-15 | United States Of America As Represented By The Secretary Of The Air Force | Stereo vision relative navigation of airborne vehicles |
CN111523409B (en) * | 2020-04-09 | 2023-08-29 | 北京百度网讯科技有限公司 | Method and device for generating position information |
US12062145B2 (en) * | 2022-02-01 | 2024-08-13 | Samsung Electronics Co., Ltd. | System and method for three-dimensional scene reconstruction and understanding in extended reality (XR) applications |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180067908A (en) * | 2016-12-13 | 2018-06-21 | 한국전자통신연구원 | Apparatus for restoring 3d-model and method for using the same |
WO2019129919A1 (en) * | 2017-12-28 | 2019-07-04 | Nokia Technologies Oy | An apparatus, a method and a computer program for volumetric video |
EP3777185A4 (en) * | 2018-04-09 | 2022-01-05 | Nokia Technologies Oy | An apparatus, a method and a computer program for volumetric video |
WO2019225682A1 (en) * | 2018-05-23 | 2019-11-28 | パナソニックIpマネジメント株式会社 | Three-dimensional reconstruction method and three-dimensional reconstruction device |
WO2019230813A1 (en) * | 2018-05-30 | 2019-12-05 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | Three-dimensional reconstruction method and three-dimensional reconstruction device |
2019
- 2019-07-22 US US17/268,364 patent/US11483540B2/en active Active
- 2019-07-22 EP EP19850915.0A patent/EP3841744B1/en active Active
- 2019-07-22 WO PCT/SE2019/050707 patent/WO2020040679A1/en active Search and Examination
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070093945A1 (en) * | 2005-10-20 | 2007-04-26 | Grzywna Jason W | System and method for onboard vision processing |
EP2728308A2 (en) | 2012-10-31 | 2014-05-07 | Kabushiki Kaisha Topcon | Aerial photogrammetry and aerial photogrammetric system |
US20160327950A1 (en) * | 2014-06-19 | 2016-11-10 | Skydio, Inc. | Virtual camera interface and other user interaction paradigms for a flying digital assistant |
US20180091704A1 (en) * | 2015-06-25 | 2018-03-29 | Panasonic Intellectual Property Management Co., | Video synchronization apparatus, and video synchronization method |
CN106454209A (en) * | 2015-08-06 | 2017-02-22 | 航天图景(北京)科技有限公司 | Unmanned aerial vehicle emergency quick action data link system and unmanned aerial vehicle emergency quick action monitoring method based on spatial-temporal information fusion technology |
US20170053169A1 (en) * | 2015-08-20 | 2017-02-23 | Motionloft, Inc. | Object detection and analysis via unmanned aerial vehicle |
US20180091797A1 (en) * | 2016-09-27 | 2018-03-29 | The Boeing Company | Apparatus and method of compensating for relative motion of at least two aircraft-mounted cameras |
CN108615243A (en) * | 2017-01-25 | 2018-10-02 | 北京三星通信技术研究有限公司 | The determination method, apparatus and system of three-dimensional multimedia messages |
US20190037207A1 (en) * | 2017-07-28 | 2019-01-31 | California Institute Of Technology | Collaborative stereo system for three-dimensional terrain and object reconstruction |
US20190250601A1 (en) * | 2018-02-13 | 2019-08-15 | Skydio, Inc. | Aircraft flight user interface |
Non-Patent Citations (2)
Title |
---|
CRACIUN, DANIELA, NUMERISATION CONJOINTE IMAGE-LASER POUR LA MODELISATION 3D DES ENVIRONNEMENTS COMPLEXES OU HABITES, 7 July 2012 (2012-07-07) |
See also references of EP3841744A4 |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2595246A (en) * | 2020-05-19 | 2021-11-24 | Airbus Defence & Space Ltd | Photogrammetry |
WO2021234364A1 (en) * | 2020-05-19 | 2021-11-25 | Airbus Defence And Space Limited | Photogrammetry |
GB2595246B (en) * | 2020-05-19 | 2024-09-18 | Airbus Defence & Space Ltd | Photogrammetry |
JP2023508414A (en) * | 2020-06-30 | 2023-03-02 | ソニーグループ株式会社 | Multi-Drone Visual Content Ingestion System |
JP7366349B2 (en) | 2020-06-30 | 2023-10-23 | ソニーグループ株式会社 | Multi-drone visual content ingestion system |
WO2022096576A1 (en) * | 2020-11-05 | 2022-05-12 | Sony Group Corporation | Method, computer program, and apparatus for determining a relative position of a first aerial vehicle and at least one second aerial vehicle to each other |
CN112383830A (en) * | 2020-11-06 | 2021-02-19 | 北京小米移动软件有限公司 | Video cover determining method and device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
EP3841744B1 (en) | 2024-08-07 |
EP3841744A4 (en) | 2022-05-18 |
US11483540B2 (en) | 2022-10-25 |
EP3841744A1 (en) | 2021-06-30 |
US20210306614A1 (en) | 2021-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11483540B2 (en) | Method and corresponding system for generating video-based 3-D models of a target such as a dynamic event | |
CN110068335B (en) | Unmanned aerial vehicle cluster real-time positioning method and system under GPS rejection environment | |
JP7210165B2 (en) | Method, device and display device for displaying virtual route | |
EP3971674B1 (en) | Systems and methods for uav flight control | |
KR102007567B1 (en) | Stereo drone and method and system for calculating earth volume in non-control points using the same | |
JP4854819B2 (en) | Image information output method | |
AU2010219335B2 (en) | Systems and Methods of Capturing Large Area Images in Detail Including Cascaded Cameras and/or Calibration Features | |
US8774950B2 (en) | Apparatuses, systems, and methods for apparatus operation and remote sensing | |
US11906983B2 (en) | System and method for tracking targets | |
US20190011921A1 (en) | Systems and methods for uav interactive instructions and control | |
CN109154499A (en) | System and method for enhancing stereoscopic display | |
US20180184073A1 (en) | Systems and Methods For Recording Stereo Pairs From Independent Camera Platforms | |
CN107850436A (en) | Merged using the sensor of inertial sensor and imaging sensor | |
US20190049945A1 (en) | Unmanned aerial vehicle swarm photography | |
CN106856566A (en) | A kind of information synchronization method and system based on AR equipment | |
EP2927771B1 (en) | Flying drone trajectory synchronization | |
US20180184063A1 (en) | Systems and Methods For Assembling Time Lapse Movies From Consecutive Scene Sweeps | |
CA3108629A1 (en) | System and method of operation for remotely operated vehicles for simultaneous localization and mapping | |
US8624959B1 (en) | Stereo video movies | |
McConville et al. | Visual odometry using pixel processor arrays for unmanned aerial systems in gps denied environments | |
US20180176441A1 (en) | Methods and Apparatus For Synchronizing Multiple Lens Shutters Using GPS Pulse Per Second Signaling | |
Kung et al. | The fast flight trajectory verification algorithm for Drone Dance System | |
Florea et al. | Wilduav: Monocular uav dataset for depth estimation tasks | |
WO2013062557A1 (en) | Stereo video movies | |
US20180174270A1 (en) | Systems and Methods For Mapping Object Sizes and Positions Onto A Cylindrical Panorama Using A Pivoting Stereoscopic Camera |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19850915; Country of ref document: EP; Kind code of ref document: A1 |
DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101) | |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 2019850915; Country of ref document: EP; Effective date: 20210322 |