WO2023247041A1 - Multi-camera image data processing - Google Patents
- Publication number
- WO2023247041A1 · PCT/EP2022/067160 · EP2022067160W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- camera
- tiles
- fov
- region
- location
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/181—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/95—Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/695—Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30232—Surveillance
Definitions
- Various example embodiments relate to a multi-camera system and to a method, a device and computer-implementable instructions for processing image data produced by the multi-camera system.
- multi-camera video analytics involves object detection and tracking across multiple cameras.
- a camera of a multi-camera system detects an object or an event that prompts streaming. For example, if multiple cameras detect the same object, unique objects detected by multiple cameras are to be identified. If the detected object moves or the camera changes its direction, the detected object may move out of the current camera view. Then another camera, which is able to capture the object, should continue streaming. When another camera better captures the object, the feed source may be changed. For cameras having fixed and non-overlapping fields of view, FOVs, it is sufficient to know the locations of the cameras in order to decide which camera shall pick up the stream after the object leaves the current FOV.
- an apparatus for a multi-camera system comprising at least a first camera and a second camera, wherein coverage of the first camera and coverage of the second camera is divided into a set of tiles, the apparatus comprising:
- -a module configured to detect an object in a first region of the first camera’s field of view, FOV;
- -a module configured to determine a location of the object in terms of one or more tiles of the set of tiles in the first camera’s FOV;
- -a module configured to request mapping information identifying one or more tiles of the set of tiles in the second camera’s FOV that correspond to the one or more tiles of the set of tiles in the first camera’s FOV;
- -a module configured to receive a response from a view mapping database that is configured to identify the one or more tiles of the set of tiles in the second camera’s FOV that correspond to the one or more tiles of the set of tiles in the first camera’s FOV, where the response further identifies one or more additional tiles contiguous with the identified one or more tiles of the set of tiles in the second camera’s FOV;
- -a module configured to share the response from the first camera to the second camera such that the second camera is enabled to find the object in the one or more of the tiles identified in the response.
- a method for a multi-camera system wherein the multi-camera system comprises at least a first camera and a second camera, wherein coverage of the first camera and coverage of the second camera is divided into a set of tiles, the method comprising:
- requesting mapping information identifying one or more tiles of the set of tiles in the second camera’s FOV that correspond to the one or more tiles of the set of tiles in the first camera’s FOV;
- receiving a response from a view mapping database that is configured to identify the one or more tiles of the set of tiles in the second camera’s FOV that correspond to the one or more tiles of the set of tiles in the first camera’s FOV, where the response further identifies one or more additional tiles contiguous with the identified one or more tiles of the set of tiles in the second camera’s FOV;
- a non-transitory computer readable medium comprising program instructions that, when executed by at least one processor, cause a multi-camera system to at least perform:
- -to request mapping information identifying one or more tiles of the set of tiles in the second camera’s FOV that correspond to the one or more tiles of the set of tiles in the first camera’s FOV;
- -to receive a response from a view mapping database that is configured to identify the one or more tiles of the set of tiles in the second camera’s FOV that correspond to the one or more tiles of the set of tiles in the first camera’s FOV, where the response further identifies one or more additional tiles contiguous with the identified one or more tiles of the set of tiles in the second camera’s FOV;
- an apparatus for a multi-camera system wherein the multi-camera system comprises at least a first camera and a second camera, wherein coverage of the first camera and coverage of the second camera is divided into a set of tiles, the apparatus comprising:
- -means for receiving a response from a view mapping database that is configured to identify the one or more tiles of the set of tiles in the second camera’s FOV that correspond to the one or more tiles of the set of tiles in the first camera’s FOV, where the response further identifies one or more additional tiles contiguous with the identified one or more tiles of the set of tiles in the second camera’s FOV;
- a population phase of the view mapping database comprising:
- the second region of the second camera’s FOV at least partially overlapping with the first region of the first camera’s FOV;
- Fig. 1a shows, by way of example, a flow chart of a method for a multi-camera system
- Fig. 1b shows, by way of example, a flow chart of a method for a population phase of a view mapping database
- Fig. 2 shows, by way of example, a block diagram of an apparatus
- Fig. 3 shows, by way of example, a block diagram of a system
- Figs. 4a-b show, by way of example, an aerial view of two cameras
- Figs. 4c-d illustrate, by way of example, an aerial view of a camera
- Figs. 5a-d show, by way of example, a region of a camera view
- Figs. 6a-b show, by way of example, mapping a remote camera region to a local camera region
- Figs. 7a-c show, by way of example, mapping a local camera view.
- a view mapping database is presented for mapping regions of camera views in a multi-camera system.
- for the VMDB, the full coverage of at least some or each camera of the multi-camera system is divided into tiles. The tiles are used as locations of the detected objects. Locations of the common detected objects among camera FOVs are used to populate one or more VMDB instances.
- Mechanisms for spatial, temporal and/or pan-tilt-zoom (PTZ) calibration are utilized in order to address inaccuracies due to object depth, lack of synchronization and camera movements.
- PTZ pan-tilt-zoom
- a field of view, FOV, corresponds to a region perceivable by a camera at a particular time instant.
- an omnidirectional camera has 360-degree coverage, while a FOV may be 120 degrees at a time.
- the image sensor of a camera is configured to perceive or capture incoming light and to convert it into an electrical signal that can be viewed, analysed or stored.
- full coverage corresponds to the FOV.
- Adjustable camera configurations and overlapping FOVs may cause challenges to multi-camera object detection and tracking.
- the term object is used, while it may refer to an object or an event.
- An object may be an object or an event of interest, an object or an event detected in a camera FOV, or an object or an event triggering streaming function.
- a region refers to a region of a camera’s field of view, for example to the detection region at which an object is detected.
- a location refers to the determined location (of the object). The terms are linked and there is an association between the object detection region and the location of the object in the camera FOV.
- a location of a detected object may be determined by a region comprising one or more tiles.
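The tile-based location above can be sketched as a mapping from a detected object's bounding box to the grid tiles it overlaps. The function name, bounding-box convention and 4x4 grid below are illustrative assumptions, not taken from the publication.

```python
def tiles_for_bbox(bbox, frame_size, grid=(4, 4)):
    """Return the set of (row, col) tiles that a bounding box overlaps.

    bbox: (x0, y0, x1, y1) in pixels; frame_size: (width, height);
    grid: (rows, cols) of the tile grid covering the full camera view.
    """
    x0, y0, x1, y1 = bbox
    width, height = frame_size
    rows, cols = grid
    tile_w, tile_h = width / cols, height / rows
    # Clamp so boxes touching the right/bottom edge stay inside the grid.
    c0, c1 = int(x0 // tile_w), min(int(x1 // tile_w), cols - 1)
    r0, r1 = int(y0 // tile_h), min(int(y1 // tile_h), rows - 1)
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}
```

The returned tile set is what would be stored or queried as the object's location, rather than the raw pixel coordinates.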
- Fig. 1a shows, by way of example, a flowchart of a method for a multi-camera system according to an embodiment.
- a method utilizes a view mapping database, VMDB, in a multi-camera system. For at least some or all cameras of the system, the entire coverage of a camera view is divided into tiles. The tiles are used to indicate locations of objects.
- the method of Fig. 1a comprises detecting an object in a first region in the first camera’s FOV 1001, and determining a location of the object in terms of one or more tiles of the set of tiles in the first camera’s FOV 1002.
- the method of Fig. 1a further comprises requesting mapping information identifying one or more tiles of the set of tiles in the second camera’s FOV that correspond to the one or more tiles of the set of tiles in the first camera’s FOV 1003.
- receiving a response from a view mapping database, VMDB, that is configured to identify the one or more tiles of the set of tiles in the second camera’s FOV that correspond to the one or more tiles of the set of tiles in the first camera’s FOV, where the response further identifies one or more additional tiles contiguous with the identified one or more tiles of the set of tiles in the second camera’s FOV 1004; and sharing the response from the first camera to the second camera such that the second camera is enabled to find the object in the one or more tiles identified in the response 1005.
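Steps 1003-1005 can be sketched with an in-memory dictionary standing in for a VMDB instance. The table contents, camera names and the choice of 4-neighbour contiguity below are illustrative assumptions.

```python
# Hypothetical in-memory stand-in for a VMDB instance: maps a tile of the
# local (first) camera's FOV to the corresponding tiles of a remote camera.
VMDB = {
    ("cam1", (0, 3)): {"camera": "cam2", "tiles": [(0, 0)]},
}

NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def query_mapping(camera, tile, grid=(4, 4)):
    """Steps 1003-1004: request mapping info and build a response that also
    lists tiles contiguous with the mapped ones, to absorb small errors."""
    entry = VMDB.get((camera, tile))
    if entry is None:
        return None
    rows, cols = grid
    tiles = set(entry["tiles"])
    for r, c in entry["tiles"]:
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                tiles.add((nr, nc))
    return {"camera": entry["camera"], "tiles": tiles}

# Step 1005: the response is shared with the second camera, which then
# searches for the object only within the returned tiles.
response = query_mapping("cam1", (0, 3))
```

Including the contiguous tiles in the response gives the second camera a slightly enlarged search region without requiring a full-frame search.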
- the VMDB includes data on locations of the common object among, or in the group of, cameras and/or camera coverages or FOVs of the multi-camera system.
- Retrieved data comprises tiles of the common object(s) of the one or more other cameras of the multi-camera system. For example, a neighbouring camera, which is best suited as the next source of an ongoing stream, may be decided based on the retrieved data.
- Fig. 1b shows, by way of example, a flowchart of a method for a population phase of a VMDB.
- the method of Fig. 1b comprises dividing coverage of each camera of the multi-camera system into a set of tiles 101. The tiles are used to indicate locations of objects.
- the method of Fig. 1b further comprises detecting a second object in a second region in the second camera’s FOV 102; and determining a location of the second object in the second camera’s FOV in terms of one or more tiles of the set of tiles in the second camera’s FOV 103.
- the method continues by detecting the second object in the first region of the first camera’s FOV, wherein the second region of the second camera’s FOV at least partially overlaps with the first region of the first camera’s FOV 104; and by determining a location of the second object in the first camera’s FOV in terms of one or more tiles of the set of tiles in the first camera’s FOV 105.
- the method includes spatially calibrating the second region in the second camera’s FOV to the first region in the first camera’s FOV by mapping the one or more tiles determined as the location of the second object in the second camera’s FOV to the one or more tiles determined as the location of the second object in the first camera’s FOV 106; and storing the mapping information in a storage location of the VMDB 107.
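The population steps 106-107 can be sketched as accumulating tile-to-tile correspondences from co-detections of the same object. The co-detection record format, function name and camera identifiers below are illustrative assumptions.

```python
from collections import defaultdict

def populate_vmdb(co_detections):
    """Population sketch: each co-detection pairs the tiles of the same
    object seen simultaneously by two cameras; the correspondences are
    accumulated into a VMDB-like dict keyed by (camera, tile)."""
    vmdb = defaultdict(set)
    for det in co_detections:
        for tile_a in det["tiles_a"]:
            for tile_b in det["tiles_b"]:
                # Store the mapping in both directions so either camera
                # can act as the requesting side later.
                vmdb[(det["cam_a"], tile_a)].add((det["cam_b"], tile_b))
                vmdb[(det["cam_b"], tile_b)].add((det["cam_a"], tile_a))
    return vmdb

# One co-detection: the same object seen at tile (0, 3) of cam1 and
# tile (0, 0) of cam2 (overlapping FOV regions).
observations = [
    {"cam_a": "cam1", "tiles_a": [(0, 3)], "cam_b": "cam2", "tiles_b": [(0, 0)]},
]
vmdb = populate_vmdb(observations)
```

In practice many co-detections over time would refine the mapping; a single observation is shown only to make the data flow concrete.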
- a population phase may be implemented first, before use of the VMDB, in order to fill in data to the VMDB.
- the VMDB is populated with data on camera view locations corresponding to each other among cameras of a multi-camera system.
- a camera captures image data, like image frames or video sequences.
- An object of interest may be detected from the image data. Detecting the object of interest may be a trigger for a streaming function of a multi-camera system. Camera coverages are divided into tiles, and tiles of a camera may be superimposed on the corresponding camera coverage view. If the object is found to move towards an edge of the camera FOV, the device including or hosting the camera is configured to retrieve information on other cameras of the multi-camera system that cover the same object. Regions comprising the tiles which include the detected object in question may be used instead of the whole image. Information on regions of common objects among FOVs of different cameras is retrieved from a VMDB. This enables recognizing the camera that is best suited to continue the streaming.
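The edge-of-FOV trigger described above can be sketched as a simple geometric test; the margin fraction and function name are illustrative assumptions.

```python
def near_fov_edge(bbox, frame_size, margin=0.1):
    """Return True when a detected object's box is within `margin` (as a
    fraction of the frame) of any edge of the camera FOV, i.e. when it may
    be time to query the VMDB for cameras covering the same tiles."""
    x0, y0, x1, y1 = bbox
    width, height = frame_size
    mx, my = width * margin, height * margin
    return x0 < mx or y0 < my or x1 > width - mx or y1 > height - my
```

A device hosting the camera could run this check per frame and only contact the VMDB when it returns True, keeping the common case cheap.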
- the stream is handed over to the next device.
- push origin or representational state transfer, REST, based systems may be utilized for handover.
- a device that is handing over the streaming may crop the image of the detected object.
- the handover request may include a cropped image of the detected object.
- the cropped image may enhance identification of the object, which may be implemented using a function or a service for re-identification, ReID.
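Cropping the detected object for the handover request can be sketched as follows; the nested-list frame representation is an illustrative stand-in for an image array.

```python
def crop_object(frame, bbox):
    """Crop the detected object from a frame (rows of pixels) so the
    handover request can carry only the object image, not the full frame."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in frame[y0:y1]]

# A tiny 4x4 "frame" of (row, col) pixel markers for illustration.
frame = [[(r, c) for c in range(4)] for r in range(4)]
patch = crop_object(frame, (1, 1, 3, 3))
```

With a real image library the same operation would be an array slice; the point is that only the object patch travels with the handover request.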
- Retrieved VMDB data enables running ReID on a retrieved set of one or more tiles, instead of on the whole image data of the cameras whose FOVs include the common object.
- ReID may be used to find the first appearance of an object.
- a stream is started in response to a request including an input image of the object.
- the devices of the multi-camera system may be configured to periodically run ReID on the captured image data from their cameras in order to detect and identify objects.
- the devices are configured to compute similarity between objects detected in the captured images and the input image of the object. When similarity is detected, the object of the input image is considered to be found from the image data, and streaming is started.
- during streaming, only object detection is run instead of ReID, which is more expensive and/or less effective. Tracking the object is implemented using tiles of camera coverage or FOV for locating the object. For historical, past stream requests, the request shall include a timestamp, which may be used as a starting point to run ReID on the images recorded in the devices. Once the object is found, the process may continue by object detection.
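The similarity matching that starts a stream can be sketched as a cosine comparison between feature vectors; the feature extractor itself is out of scope here, and the threshold value is an assumed parameter, not one specified by the publication.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def should_start_stream(query_feature, detected_features, threshold=0.9):
    """ReID matching sketch: the stream starts when any detected object's
    feature vector is similar enough to the feature of the input image."""
    return any(cosine_similarity(query_feature, f) >= threshold
               for f in detected_features)
```

Once a match starts the stream, the cheaper object-detection path would take over for frame-to-frame tracking, as described above.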
- Fig. 2 shows, by way of example, an apparatus capable of processing image data, for example object detection and re-identification.
- device 20 may comprise, for example, a mobile communication device, a cellular phone, a server computer, edge hardware, an image capture device, or a video capture device, such as a camera.
- processor 210 which may comprise, for example, a single- or multi-core processor wherein a single-core processor comprises one processing core and a multi-core processor comprises more than one processing core.
- Processor 210 may comprise, in general, a control device.
- Processor 210 may comprise more than one processor.
- Processor 210 may be a control device.
- a processing core may comprise, for example, a Cortex-A8 processing core manufactured by ARM Holdings or a Steamroller processing core designed by Advanced Micro Devices Corporation.
- Processor 210 may comprise at least one Qualcomm Snapdragon and/or Intel Atom processor.
- Processor 210 may comprise at least one application-specific integrated circuit, ASIC.
- Processor 210 may comprise at least one field-programmable gate array, FPGA.
- Processor 210 may be means for performing method steps in device 20.
- Processor 210 may be configured, at least in part by computer instructions, to perform actions.
- a processor may comprise circuitry, or be constituted as circuitry or circuitries, the circuitry or circuitries being configured to perform phases of methods in accordance with example embodiments described herein.
- circuitry may refer to one or more or all of the following: (a) hardware-only circuit implementations, such as implementations in only analog and/or digital circuitry; (b) combinations of hardware circuits and software, such as, as applicable: (i) a combination of analog and/or digital hardware circuit(s) with software/firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a camera, an edge device or a server, to perform various functions; and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.
- circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware.
- circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.
- Device 20 comprises memory 220.
- Memory 220 may comprise random-access memory and/or permanent memory.
- Memory 220 may comprise at least one RAM chip.
- Memory 220 may comprise solid-state, magnetic, optical and/or holographic memory, for example.
- Memory 220 may be at least in part accessible to processor 210.
- Memory 220 may be at least in part comprised in processor 210.
- Memory 220 may be means for storing information.
- Memory 220 may comprise computer instructions that processor 210 is configured to execute. When computer instructions configured to cause processor 210 to perform certain actions are stored in memory 220, and device 20 overall is configured to run under the direction of processor 210 using computer instructions from memory 220, processor 210 and/or its at least one processing core may be considered to be configured to perform said certain actions.
- Memory 220 may be at least in part external to device 20 but accessible to device 20.
- Device 20 may comprise a transmitter 230.
- Device 20 may comprise a receiver 240.
- Transmitter 230 and receiver 240 may be configured to transmit and receive, respectively, information in accordance with at least one cellular or non-cellular standard.
- Transmitter 230 may comprise more than one transmitter unit.
- Receiver 240 may comprise more than one receiver unit.
- Transmitter 230 and/or receiver 240 may be configured to operate in accordance with global system for mobile communication, GSM, wideband code division multiple access, WCDMA, 5G, long term evolution, LTE, IS-95, wireless local area network, WLAN, Ethernet and/or worldwide interoperability for microwave access, WiMAX, standards, for example.
- Device 20 may comprise a near-field communication, NFC, transceiver 250.
- NFC transceiver 250 may support at least one NFC technology, such as NFC, Bluetooth, Wibree or similar technologies.
- Device 20 may comprise a connection port, like an Ethernet port, enabling wired connection including a cable connection, a router and/or a modulator - demodulator.
- Device 20 may comprise a network interface, like an input-output, IO, port.
- Device 20 may comprise user interface, UI.
- UI may comprise at least one of a display, a keyboard, a touchscreen, a vibrator configured to signal to a user by causing device 20 to vibrate, a speaker and a microphone.
- a user may be able to operate device 20 via UI, for example to manage digital files stored in memory 220 or on a cloud accessible via transmitter 230 and receiver 240, or via NFC transceiver 250, and/or to play games.
- Processor 210 may be furnished with a transmitter configured to output information from processor 210, via electrical leads internal to device 20, to other devices comprised in device 20.
- a transmitter may comprise a serial bus transmitter configured to, for example, output information via at least one electrical lead to memory 220 for storage therein.
- the transmitter may comprise a parallel bus transmitter.
- processor 210 may comprise a receiver configured to receive information in processor 210, via electrical leads internal to device 20, from other devices comprised in device 20.
- Such a receiver may comprise a serial bus receiver configured to, for example, receive information via at least one electrical lead from receiver 240 for processing in processor 210.
- the receiver may comprise a parallel bus receiver.
- Device 20 may comprise further devices not illustrated in Fig. 2.
- device 20 may comprise at least one camera.
- Device 20 may be configured to receive image data from at least one camera.
- a camera may be an almost exclusively uplink-only device configured to upload images or video clips to a network.
- a camera may comprise features, functions and/or modules of one or more of: a bullet camera, a dome camera, a covert camera, a discreet camera, an infrared camera, a night vision camera, a power over Ethernet (PoE) camera, an outdoor camera, a day/night camera, a varifocal camera, a video camera, a network camera, an internet protocol (IP) camera, a wireless camera, a pan-tilt-zoom (PTZ) camera, a high-definition camera, a closed-circuit television (CCTV) camera, and/or a software-defined camera.
- device 20 lacks at least one device described above.
- some devices 20 may lack an NFC transceiver 250 and/or a user identity module.
- Processor 210, memory 220, transmitter 230, receiver 240, NFC transceiver 250, and/or a camera may be interconnected by electrical leads internal to device 20 in a multitude of different ways.
- each of the aforementioned devices may be separately connected to a master bus internal to device 20, to allow for the devices to exchange information.
- this is only one example and depending on the embodiment various ways of interconnecting at least two of the aforementioned devices may be selected.
- a network architecture of a communication system may comprise a radio access architecture based on long term evolution advanced (LTE Advanced, LTE-A) or new radio (NR), also known as fifth generation (5G), without restricting the embodiments to such an architecture.
- LTE-A long term evolution advanced
- NR new radio
- UMTS universal mobile telecommunications system
- UTRAN UMTS terrestrial radio access network
- LTE long term evolution
- WiFi wireless local area network
- WiMAX worldwide interoperability for microwave access
- Bluetooth®
- PCS personal communications services
- WCDMA wideband code division multiple access
- UWB ultra-wideband
- sensor networks
- MANETs mobile ad-hoc networks
- IMS Internet Protocol multimedia subsystems
- a communication system typically comprises more than one network node in which case the network nodes may also be configured to communicate with one another over links, wired or wireless, designed for the purpose. These links may be used for signalling purposes.
- the network node is a computing device configured to control the radio resources of the communication system it is coupled to. Network nodes or their functionalities may be implemented by using any node, host, server or access point, or an entity suitable for such usage.
- 5G mobile communications supports a wide range of use cases and related applications including video streaming, virtual reality, extended reality, augmented reality, different ways of data sharing and various forms of machine-type applications, including vehicular safety, different sensors and real-time control. 5G is expected to have multiple radio interfaces and to be integrable with existing legacy radio access technologies, such as LTE.
- the communication system is also able to communicate with other networks, such as a public switched telephone network (PSTN) or the Internet, or utilize services provided by them, for example via a server.
- the communication network may also be able to support the usage of cloud services.
- Edge cloud may be brought into radio access network (RAN) by utilizing network function virtualization (NFV) and software defined networking (SDN).
- NFV network function virtualization
- SDN software defined networking
- Using edge cloud may mean that access node operations are carried out, at least partly, in a server, host or node operationally coupled to a remote radio head or a base station comprising radio parts. It is also possible that node operations will be distributed among a plurality of servers, nodes or hosts.
- Application of cloud RAN architecture enables RAN real time functions being carried out at the RAN side (in a distributed unit) and non-real time functions being carried out in a centralized manner (in a centralized unit).
- Fig. 3 illustrates, by way of an example, a block diagram of a system.
- the system is a multi-camera system comprising multiple cameras 301-1, 301-2, 301-3.
- the cameras throughout this application may be software defined cameras, whose software, like algorithms and data processing, may be decoupled from camera hardware.
- the cameras 301-1, 301-2, 301-3 are configured to capture image data on their image sensors.
- the cameras 301-1, 301-2, 301-3 provide image data, for example digital image data, image frames or frames of video sequence.
- the system comprises an object detection module 302-1, 302-2, 302-3 for each camera 301-1, 301-2, 301-3.
- the object detection module 302-1, 302-2, 302-3 is configured to detect an object from image data. Object detection may detect instances of semantic objects of certain predefined classes or object types, for example humans, buildings, cars, and so on.
- the object detection module 302-1, 302-2, 302-3 may comprise, for example, face detection, feature recognition and/or colour identification.
- An object detection module 302-1, 302-2, 302-3 may be configured to identify a predefined object type on image data produced by cameras 301-1, 301-2, 301-3.
- the system comprises a re-identification, ReID, module 304-1, 304-2, 304-3 for each camera 301-1, 301-2, 301-3.
- the re-identification module 304-1, 304-2, 304-3 is configured to extract features of the detected objects, which are detected by the object detection module 302-1, 302-2, 302-3.
- the re-identification module 304-1, 304-2, 304-3 is configured to identify unique objects.
- the re-identification module 304-1, 304-2, 304-3 may compute, calculate, compare and/or match similarity of the detected objects in order to identify unique objects.
- a ReID module may comprise a ReID algorithm.
- the ReID algorithm may utilize features, like visual and/or geographical features, in order to associate common objects among, or between, cameras.
- the ReID algorithm may assign an identifier, ID, to every detected object. The same ID is used across all cameras for the same object.
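The ID assignment above can be sketched as matching a new feature vector against a gallery of already-identified objects; the gallery structure, threshold and use of cosine similarity in place of a learned ReID metric are illustrative assumptions.

```python
import math

GALLERY = {}      # global object ID -> representative feature vector
_next_id = [1]    # mutable counter for minting new IDs

def _cos(a, b):
    """Cosine similarity standing in for a learned ReID distance metric."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def assign_id(feature, threshold=0.9):
    """Reuse an existing ID when the feature matches a gallery entry
    closely enough, otherwise mint a new ID. The same ID is thereby
    shared by all cameras observing the same object."""
    for obj_id, ref in GALLERY.items():
        if _cos(feature, ref) >= threshold:
            return obj_id
    obj_id = _next_id[0]
    _next_id[0] += 1
    GALLERY[obj_id] = list(feature)
    return obj_id
```

A production system would also update gallery entries over time and expire stale ones; the sketch keeps only the first representative per ID.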
- the system comprises view mapping databases, VMDB, 303-1, 303-2, 303-3 for each camera 301-1, 301-2, 301-3.
- a VMDB 303-1, 303-2, 303-3 is configured to store mapping information between the cameras 301-1, 301-2, 301-3, including information on regions of a camera FOV and regions of camera FOVs of the other cameras of the system mapped to it.
- Full coverage of each camera 301-1, 301-2, 301-3 is divided into a set of tiles. Tiles correspond to rectangular spatial areas, which cumulatively cover the entire coverage of a camera, for at least some or all cameras of the system.
- a location of a detected object is determined by a region comprising one or more tiles.
- the VMDB is configured to comprise information on detected objects and their locations, which are determined in terms of one or more tiles.
- the VMDB comprises the information regarding regions of the cameras of the camera system whose FOVs include the same object. When the same object is detected in two or more cameras’ FOVs, the regions of the same object between the two or more cameras are found to match.
- mapping information is formed based on the matching regions of two or more cameras’ FOVs and stored in the VMDB.
- identified objects, for example identified using ReID, may be used.
- the VMDB is configured to populate the VMDB instances based on detected and/or identified common objects and locations of those.
- the object detection module 302-1 is configured to produce an object type and an object location (region) for objects detected from captured data of the camera 301-1.
- the other object detection modules 302-2, 302-3 produce the same for captured data of the cameras 301-2, 301-3, correspondingly.
- An object type and a corresponding region may be provided for each detected object and for each camera of the system detecting the object. Regions comprising a detected object are mapped to regions of the other cameras comprising the same object.
- ReID modules 304-1, 304-2, 304-3 are configured to compare and match features. Information is stored in the VMDB instance 303-1, 303-2, 303-3 of the corresponding device 300-1, 300-2, 300-3.
- VMDB instances 303-1, 303-2, 303-3 include information on overlapping areas of the cameras 301-1, 301-2, 301-3, based on the common objects and their locations in the FOVs of the cameras 301-1, 301-2, 301-3.
- VMDB instances 303-1, 303-2, 303-3 comprise information on overlapping regions of the cameras 301-1, 301-2, 301-3 of the system. Using object detection together with the mapping information and overlapping regions to locate objects enables tracking objects and/or events by processing a collection of regions. This avoids the need to identify objects in all the cameras, which is less effective due to feature extraction.
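The region-limited processing described above can be sketched as a lookup that, given the tiles where an object was last seen, returns per-camera tile sets to run detection on. The overlap table contents and camera names below are illustrative assumptions.

```python
# Hypothetical overlap table: tiles of cam1's FOV mapped to overlapping
# tiles in neighbouring cameras' FOVs, as populated during calibration.
OVERLAPS = {
    ("cam1", (2, 3)): {"cam2": {(2, 0), (2, 1)}},
    ("cam1", (3, 3)): {"cam2": {(3, 0)}, "cam3": {(0, 0)}},
}

def regions_to_process(camera, tiles):
    """Collect, per neighbouring camera, only the tiles mapped to the
    object's current tiles, so detection runs on regions rather than on
    the full frames of every camera."""
    out = {}
    for tile in tiles:
        for other, mapped in OVERLAPS.get((camera, tile), {}).items():
            out.setdefault(other, set()).update(mapped)
    return out

regions = regions_to_process("cam1", [(2, 3), (3, 3)])
```

Each neighbouring device would then run object detection only inside its returned tile set, avoiding full-frame feature extraction.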
- ReID modules 304-1, 304-2, 304-3 are configured to handle requests including regions in the form of tiles of the requesting camera FOV.
- ReID modules 304-1, 304-2, 304-3 are configured to extract features and compare features in order to identify objects and assign an identifier to an object. An identifier may enable matching the objects between camera FOVs.
- a cropped image and region information (tiles) of a requesting camera may be sent to a local camera VMDB, which sends the request to a local camera ReID.
- the mapping is stored in a data store or a storage location of the VMDB. Presence of a match and the tiles matching the request are provided by ReID.
- the mapping information is shared to the requesting camera device, or VMDB of it.
- VMDB instances are deployed for each camera.
- a local VMDB instance (303-1, 303-2, 303-3 of Fig. 3) includes a local coverage/FOV of the corresponding camera.
- Each camera 301-1, 301-2, 301-3 of Fig. 3 is served by a device 300-1, 300-2, 300-3.
- Each device 300-1, 300-2, 300-3 of Fig. 3 is configured to host modules providing services for object detection 302-1, 302-2, 302-3, re-identification 304-1, 304-2, 304-3, and VMDB 303-1, 303-2, 303-3.
- the modules may be implemented in containers, virtual machines or processes, for example.
- the modules, which may be called functions and/or services, may comprise executable instructions.
- the device 300-1, 300-2, 300-3 may be an edge device.
- Modules or functions of object detection, ReID and VMDB may be hosted and located centrally.
- VMDB may be configured in a central location to serve multiple cameras.
- a central VMDB may include a global coverage/FOV of multiple cameras.
- one or more instances of the modules or functions may be configured to serve all or a subset of the cameras of the system. Location and number of the instances with respect to the number of cameras may vary.
- Embodiments may use detected objects, as described.
- identified objects may be used.
- the objects may be identified using the ReID function.
- the ReID function may be run on the selected regions only. Tracking may be implemented using the regions and the object detection function.
- An edge device may comprise a hardware and/or a software edge accelerator.
- An edge device enables cloud-scale machine learning inference to run locally, near the sensory data sources, like cameras and microphones. This improves efficiency, latency and throughput by reducing or avoiding the need to send large volumes of data to remote data centres. While cloud environments may be used for compute-heavy machine learning model training, edge devices may provide inference capabilities.
- a machine learning task may involve first preparing captured data into a compatible input format. Captured data may comprise pixels from an image sensor and/or waveforms from a microphone. Secondly, the machine learning task may comprise executing a machine learning model, optionally utilizing artificial intelligence and neural networks.
- the machine learning task may comprise interpreting output of the machine learning model in order to create inferences.
- the machine learning task may comprise serving the inferences through well-defined interfaces, like representational state transfer application programming interfaces, REST APIs. Inferences may be stored in order to enable historical queries.
- Sensors, like cameras, are pervasively deployed for different areas and functions, such as traffic monitoring, industrial applications, virtual reality applications, augmented reality applications and surveillance. Cameras are used for investigation purposes, for example after an event of interest has occurred. Deploying edge computing for a collaborative sensing mechanism enables lowering bandwidth, reducing response time and improving efficiency.
- VMDB is configured to provide overlapping regions among multiple cameras, in other words, between or within a group of multiple cameras.
- VMDB enables implementation of a collaborative visual analytics system for edge environments. This enables reducing the number of times that feature extraction models are applied on image data captured by cameras. Instead, more efficient object detection and bounding box tracking mechanisms are utilized. VMDB provides a mechanism for spatial, temporal and pan-tilt-zoom, PTZ, calibration. VMDB enables addressing inaccuracies due to object depth, lack of synchronization and camera movements.
- Object detection, re-identification, ReID, and the view mapping database, VMDB, may be implemented as executable instructions.
- the corresponding modules of executable instructions may be stored in a memory and executed by a processor.
- the devices and modules of Fig. 3 may operate as standalone devices or be connected, for example via a network, to other computer systems, service systems, server systems, storage systems or peripheral devices.
- Devices and/or modules of the devices of Fig. 3 may communicate with each other through various mechanisms, for example using remote procedure calls, interprocess communication tools, or representational state transfer, REST, application programming interfaces, also known as RESTful APIs.
- a camera may receive a query on a common object location in the camera’s FOV.
- the query may be sent by a remote camera of a multi-camera system.
- the query includes an identified object type and the remote camera FOV region of the object type.
- the query information is compared to the FOV of the query-receiving camera in order to detect the identified object type and its location. The comparison is done in a ReID module.
- the response includes the FOV region(s) which correspond to the identified object type location at the query-receiving camera.
- the response may include overlapping regions of one or more other remote cameras.
- a response may have, for example, the following format:
- { object_class <object type> [ {
- object_location <object bounding box>
- object_region <region in the local camera's field>
- overlapping_regions [
- region <region in the remote camera>
- region <region in the remote camera> ] }
- { object_location <object bounding box>
- object_region <region in the local camera's field>
- overlapping_regions [
- region <region in the remote camera> ] } ] }
- the response, or its fields, may be implemented in a different format and/or comprise additional fields and/or field names.
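As an illustration only (the disclosure fixes no concrete encoding), the fields above could be carried as a nested structure and consumed by the querying device; the concrete values and the `objects` list name are made up:

```python
# Illustrative response for one detected object; field names follow the
# example format above, concrete values are invented for the sketch.
response = {
    "object_class": "car",
    "objects": [
        {
            "object_location": (120, 80, 260, 190),   # bounding box x1, y1, x2, y2
            "object_region": [4, 5],                  # tiles in the local camera's FOV
            "overlapping_regions": [
                {"camera": "SDC2", "region": [5]},
                {"camera": "SDC3", "region": [12, 13]},
            ],
        },
    ],
}

# A querying device can collect every remote region that maps to the object:
remote_tiles = {
    (ov["camera"], tuple(ov["region"]))
    for obj in response["objects"]
    for ov in obj["overlapping_regions"]
}
print(remote_tiles)
```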
- a multi-camera system is monitoring traffic.
- a separate query may be made to each device of the system including or hosting a camera.
- a list of detected cars, bounding boxes of the cars, corresponding regions in the camera's FOV/coverage and regions in other cameras' FOVs/coverages are retrieved. This enables identifying identical objects without performing any additional processing on the image data, and further, deducing the total number of cars by simply using the regions and locations of the cars in the camera coverage.
- One of the devices of the system may be configured to eliminate redundancy by removing duplicate counts of cars that are detected by multiple cameras. This enables returning the result without any additional processing.
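A minimal sketch of such redundancy elimination, assuming a pre-populated overlap table and illustrative detection tuples (none of these names come from the disclosure):

```python
# Two detections from different cameras are treated as the same car when one
# camera's region maps to the other's in the VMDB-derived overlap table.

# (camera, tiles) -> set of (camera, tiles) known to overlap
overlap = {
    ("SDC1", (4, 5)): {("SDC2", (5,))},
    ("SDC2", (5,)): {("SDC1", (4, 5))},
}

detections = [
    ("SDC1", (4, 5)),   # car seen by SDC1
    ("SDC2", (5,)),     # same car seen by SDC2
    ("SDC2", (9,)),     # a second car, only in SDC2
]

unique_cars = 0
seen = set()
for det in detections:
    if det in seen:
        continue
    unique_cars += 1
    seen.add(det)
    # mark every region known to show the same car as already counted
    seen.update(overlap.get(det, ()))

print(unique_cars)  # 2
```

No image data is touched here: the count follows purely from regions and their stored overlaps.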
- Figs. 4a-b illustrate, by way of example, an aerial view of two cameras.
- the aerial view is covered by cameras SDC1 and SDC2.
- An object 40, which is illustrated by a circle, is captured by both cameras at the same time instance.
- the camera view illustrated in Fig. 4a is covered by the cameras SDC1 and SDC2 at the time instance t1.
- Camera view regions, or FOVs of cameras SDC1 and SDC2 both contain the object 40.
- the FOVs of the cameras SDC1 and SDC2 map to each other.
- the aerial view is covered by the cameras SDC1 and SDC2 at a time instance t2, which is later than t1 of Fig. 4a.
- Figs. 4a-b show two aerial views, by two cameras SDC1, SDC2, at two sequential time instances t1, t2.
- Figs. 4c-d illustrate, by way of example, an aerial view of a camera.
- Fig. 4c illustrates a view of SDC1 at the time instance t1.
- Fig. 4d illustrates a view of SDC2 at the time instance t1.
- Figs. 4a-d enable implementing spatial calibration.
- Figs. 5a-d illustrate, by way of examples, a region of a camera view.
- Fig. 5a illustrates a region of the SDC1 FOV at the time instance t1.
- the object is located in the tile 4 of the region of the SDC1 FOV.
- Fig. 5b illustrates a region of the SDC1 FOV at the time instance t2.
- the object is located in the tile 6 of the region of the SDC1 FOV.
- Fig. 5c illustrates a region of the SDC2 FOV at the time instance t1.
- Fig. 5d illustrates a region of the SDC2 FOV at the time instance t2.
- the object is located in the tile 5 of the SDC2 FOV in both instances of time, t1 and t2, as shown in Figs. 5c-d.
- the cameras SDC1 and SDC2 view object from different angles.
- when mapping the tile 5 of the SDC2 FOV to the SDC1 FOV, it would be straightforward to map it to the tiles 4 and 6 of the SDC1 FOV. However, this may be inaccurate and insufficient.
- the object detected in the tile 5 of the SDC2 FOV is mapped to regions {4}, {5}, {6}, {4, 5}, {5, 6} and {4, 5, 6} of the SDC1 FOV. Mapping regions may be dependent on the size of the object.
- Regions of commonly detected objects are updated in a VMDB during a population phase by multiple cameras.
- spatial calibration may enlarge a region to be mapped. The region of any one of the cameras may be enlarged.
- a continuous region is formed using spatial calibration.
- a spatially calibrated region comprises a list of regions that are contiguous subsets of the tiles lying between the tiles indicated as the detected locations. Not all subsets of the identified tiles are included; rather, a continuous region may be formed by adding neighbouring tiles. For example, in Figs. 5a-d the subset {4, 6} is not included, as it is not a contiguous region. The two tiles {4} and {6} indicated as the detected locations are not next to each other, so they would not form a contiguous region of adjacent tiles but two separate regions.
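Assuming, for illustration, that the detected tiles lie on a single row as in Figs. 5a-d, the contiguous candidate regions between two detected locations can be enumerated as follows (the `calibrated_regions` name is hypothetical):

```python
# Every contiguous run of tiles between the two detected locations is a
# candidate region; non-adjacent subsets such as {4, 6} are never produced.
def calibrated_regions(tile_a, tile_b):
    lo, hi = min(tile_a, tile_b), max(tile_a, tile_b)
    regions = []
    for start in range(lo, hi + 1):
        for end in range(start, hi + 1):
            regions.append(set(range(start, end + 1)))  # contiguous run
    return regions

regions = calibrated_regions(4, 6)
print(regions)
# [{4}, {4, 5}, {4, 5, 6}, {5}, {5, 6}, {6}]
```

This reproduces exactly the six regions {4}, {5}, {6}, {4, 5}, {5, 6} and {4, 5, 6} of the example above.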
- Figs. 6a-b illustrate, by way of examples, mapping a remote camera region to a local camera region.
- a local camera may identify an object and location of the object in the local camera FOV.
- a local region of the local camera FOV is used for mapping corresponding regions of other cameras of a multi-camera system.
- Fig. 6a illustrates a region (tiles 12, 13, 14, 22, 23, 24) in a remote camera FOV that is configured to map to a query including a local region of a local camera FOV.
- a query on mapping regions may be sent by a device including or hosting a local camera.
- a query on mapping regions may be responded by a device including or hosting a remote camera.
- a query is responded to by identifying a region.
- a mapping region may be determined based on the detected or identified object and/or overlapping regions of the cameras. Instead of using the mapping region of Fig. 6a, the region as illustrated in Fig. 6b is reported. The region of Fig. 6b comprises the mapping region of Fig. 6a and the tiles around it. One or more additional tiles contiguous with the mapping region are added to the tiles of the mapping region. Providing a larger region than the initial mapping region may lead to an increased number of ReID operations to identify objects in the reported region. However, this improves the accuracy of the response.
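The enlargement of the Fig. 6a region into the Fig. 6b region can be sketched as adding every tile adjacent to the mapped tiles; the 10-tile grid width and the `enlarge` helper are assumptions made for this illustration:

```python
# Every tile horizontally, vertically or diagonally adjacent to the mapped
# tiles is added to the reported region.
GRID_W = 10  # assumed tiles per row of the coverage grid

def enlarge(region):
    enlarged = set(region)
    for tile in region:
        row, col = divmod(tile - 1, GRID_W)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                r, c = row + dr, col + dc
                if r >= 0 and 0 <= c < GRID_W:
                    enlarged.add(r * GRID_W + c + 1)
    return enlarged

mapped = {12, 13, 14, 22, 23, 24}  # mapping region of Fig. 6a
print(sorted(enlarge(mapped)))
# [1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 21, 22, 23, 24, 25, 31, 32, 33, 34, 35]
```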
- Figs. 6a-b enable implementation of temporal calibration. Cameras of a multi-camera system may not be in synchronization. Complete synchronization cannot be guaranteed even if the cameras use a synchronization utility.
- a VMDB may include a time stamp of the cropped image, which includes a number of tiles.
- ReID (304-2 of Fig. 3) may use the image of the camera (301-2 of Fig. 3) that was created at a time, indicated by its timestamp, closest to the timestamp of the received cropped image.
- the two timestamps and/or time instances may differ. Inaccuracies may arise due to time differences; for example, an object may move beyond the region indicated by a VMDB.
- having the VMDB provide a larger region than direct mapping would provide, as illustrated in Figs. 6a-b, enables addressing this issue.
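Selecting the locally captured image closest in time to the received timestamp can be sketched as follows (the `closest_frame` name and plain-float second timestamps are illustrative):

```python
# Temporal calibration sketch: pick the local frame whose timestamp is
# nearest to the timestamp carried with the received cropped image.
def closest_frame(frames, query_ts):
    """frames: list of (timestamp, frame_data); returns the nearest one."""
    return min(frames, key=lambda f: abs(f[0] - query_ts))

frames = [(10.00, "frame-a"), (10.04, "frame-b"), (10.08, "frame-c")]
print(closest_frame(frames, 10.05))  # (10.04, 'frame-b')
```

Because the two clocks may still differ slightly, ReID then searches the enlarged region rather than the directly mapped tiles only.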
- VMDB has divided the full coverage of a camera into rectangular tiles.
- Field of view, FOV, of the camera is a subset of its coverage.
- a particular set of one or more tiles consistently refers to a specific region covered by the camera.
- VMDB is populated by providing identified object and its location. The location is determined as a region and expressed as a set of one or more tiles.
- the VMDB which may be deployed centrally and/or by local instances, stores information about regions that overlap among cameras of a multi-camera system.
- the VMDB and information stored in such enables concentrating on image parts, determined by the tiles, which have been found relevant. Relevance may be based on a detected object, object ID, an event, or camera configurations.
- Matching between regions among cameras of the multi-camera system may be created using ReID. Changes in captured images and/or camera settings affect updates and the number of queries, which in turn require efficiency in adapting and providing responses, preferably in real time.
- Use of the VMDB enables object detection to replace multiple runs of ReID. Further, relevant locations, i.e. tiles of a camera FOV, are identified and processed instead of the whole image data.
- Figs. 7a-c show, by way of example, mapping a local camera view.
- a view, or a field of view, FOV, of a camera comprises a subset of the camera coverage.
- a FOV available at a time instance may comprise a set of one or more tiles.
- as the camera is moved or camera settings are changed, the available FOV, as well as the available set of one or more tiles, changes correspondingly.
- Fig. 7a shows coverage of a camera, which is divided into tiles numbered from 1 to 50.
- The current FOV of the camera at the time instance t1 is shown as a rectangle between tiles 12 and 36.
- Fig. 7b shows the FOV after the camera has moved towards right, at a time instance t2.
- the FOV at the time t2 is shown as a rectangle between tiles 14 and 38. Every tile is configured to identify the same region in the camera’s coverage, even if the camera is moved or camera settings are changed.
- Fig. 7c shows the FOV after the camera has zoomed in, at a time instance t3. The size of the tiles is enlarged corresponding to the zoom, and a smaller number of tiles, 25-27, represents the FOV at the time t3.
- a VMDB is configured to receive camera configurations or settings, like zoom level and direction of the camera, from the camera. The VMDB is configured to detect corresponding tiles based on the camera configuration and/or settings.
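As a sketch of resolving a FOV to tiles of the fixed coverage grid of Figs. 7a-c (10 tiles per row, numbered 1 to 50), assuming the FOV is given by its top-left and bottom-right tile numbers (the `fov_tiles` helper is hypothetical):

```python
# Resolve a FOV rectangle, given by corner tile numbers, to the tiles of the
# camera's fixed coverage grid. Tiles keep their identity when the camera moves.
GRID_W = 10  # tiles per row of the camera coverage

def fov_tiles(top_left, bottom_right):
    r1, c1 = divmod(top_left - 1, GRID_W)
    r2, c2 = divmod(bottom_right - 1, GRID_W)
    return [r * GRID_W + c + 1
            for r in range(r1, r2 + 1)
            for c in range(c1, c2 + 1)]

# FOV between tiles 12 and 36, as in Fig. 7a at time t1.
tiles_t1 = fov_tiles(12, 36)
print(tiles_t1)
# [12, 13, 14, 15, 16, 22, 23, 24, 25, 26, 32, 33, 34, 35, 36]
```

After the camera moves right (Fig. 7b), the same computation with corners 14 and 38 yields the shifted tile set, while every tile number still denotes the same region of the coverage.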
- a camera may be configured to save camera settings and configuration, for example in a local memory.
- Figs. 7a-c enable implementation of PTZ calibration.
- PTZ calibration provides accuracy in a local PTZ camera, while spatial and temporal calibration relate to mapping regions of one or more cameras, which may be remote.
- Video analytics at the edge enables utilizing the computation power of the edge or cloud in order to run real-time video analytics consisting of multiple machine learning operations.
- the embodiments provide efficiency and enable avoiding linearly growing resource consumption, which is due to per-stream optimization as the number of video feeds increases. It has been possible to incorporate learning of the spatial and temporal relationship between video feeds, e.g. obtained from geo-distributed locations, when camera FOVs do not overlap. Where FOVs overlap and multiple cameras are viewing the same area concurrently, analysing all the frames places demands on resources.
- a collaborative cross-camera video analytics system may decide the next camera based on the usefulness of the investigated frame. Reducing network load while maintaining timely and accurate video analysis may be achieved using region-of-interest masks of the FOV during runtime.
- Masks enable cropping images so that, instead of the whole image data, only a subset of the image data is sent to a cloud server for processing. Still, the cloud server must identify common identical objects from multiple cameras. This is avoided by the approach presented above, where overlapping regions and identical objects are taken into account. Additional image processing is avoided; instead, spatial-temporal queries may be handled by processing information on the location of a common object or event among camera views.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/875,875 US20260025483A1 (en) | 2022-06-23 | 2022-06-23 | Multi-camera image data processing |
| PCT/EP2022/067160 WO2023247041A1 (en) | 2022-06-23 | 2022-06-23 | Multi-camera image data processing |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2022/067160 WO2023247041A1 (en) | 2022-06-23 | 2022-06-23 | Multi-camera image data processing |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023247041A1 true WO2023247041A1 (en) | 2023-12-28 |