WO2020248248A1

WO2020248248A1 - Systems and methods for object tracking

Info

Publication number: WO2020248248A1
Application number: PCT/CN2019/091357
Authority: WO
Inventors: Peilun Li; Guozhen Li; Meiqi LU; Zhangxi Yan; Youzeng LI
Original assignee: Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date: 2019-06-14
Filing date: 2019-06-14
Publication date: 2020-12-17
Also published as: CN111954886A

Abstract

A method may include obtaining a plurality of image collections captured by multiple cameras. The method may also include detecting one or more objects from each of at least a portion of a plurality of images included in the each of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects and obtaining a trajectory of the at least one of the one or more objects. The method may also include determining one or more trajectory features of the trajectory of the at least one of the one or more objects; and matching, based on the one or more trajectory features and the one or more image features, an object tracked in a camera with a second object tracked in one or more other cameras.

Description

SYSTEMS AND METHODS FOR OBJECT TRACKING

TECHNICAL FIELD

The present application generally relates to image processing technology, and more particularly, to systems, methods, and computer storage media for multi-camera tracking and/or single-camera tracking.

BACKGROUND

In recent years, a number of approaches have been adopted to improve vehicle tracking, which is critical in an intelligent transportation system (ITS) . Images captured by fixed cameras can be used for traffic feature estimation, traffic anomaly detection, multi-camera tracking, and other applications. However, features of vehicles in traffic environment are different from those of general objects (e.g., pedestrian) , making it more difficult to directly apply an object tracking model for general objects (e.g., pedestrian) , without significant modification, to analyzing images of vehicles in traffic. Moreover, the similarity between different vehicles, high traffic density, serious occlusion phenomenon, etc., present great challenges to multi-target and cross-camera vehicles tracking. Therefore, it is desirable to provide effective systems and methods for tracking vehicles in traffic, while methods and systems may also be used to track any object precisely and quickly.

SUMMARY

In a first aspect of the present disclosure, a system for object tracking across multiple cameras is provided. The system may include at least one storage device and at least one processor. The at least one storage medium may store executable instructions. The at least one processor may be configured to be in communication with the at least one storage device, wherein when executing the executable instructions, the system is configured to perform one or more of the following operations. The system may obtain a plurality of image collections captured by multiple cameras. For each of the plurality of image collections, the system may detect one or more objects from each of at least a portion of a plurality of images included in the each of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects. The system may track the at least one of the one or more objects from each of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects. The system may also determine one or more trajectory features of the trajectory of the at least one of the one or more object. The system may further match, based on the one or more trajectory features of the trajectory and the one or more image features associated with the at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections.

In some embodiments, the one or more trajectory features may include at least one of movement direction information associated with the trajectory of the at least one of the one or more objects, geographical location information associated with the trajectory, or time information associated with the trajectory.

In some embodiments, the trajectory may include a plurality of points, each of which denoting a geographical location of the at least one of the one or more objects corresponding to one of the plurality of images, and to determine one or more trajectory features of the trajectory of at least one of the one or more objects, the system may divide the trajectory into multiple slices, each of the multiple slices including several points corresponding to several images of the plurality of images. The system may also determine, based on the several points of each of the multiple slices, an average point for each of the multiple slices and determine, based on the average point for each of the multiple slices, the one or more trajectory features.

In some embodiments, to determine, based on the average point for each of the multiple slices, the one or more trajectory features, the system may designate a geographical location of the average point for each of the multiple slices as one of the one or more trajectory features.

In some embodiments, to determine, based on the average point for each of the multiple slices, the one or more trajectory features, the system may determine, based on average points corresponding to any two adjacent slices among the multiple slices, multiple movement directions; and designate the multiple movement directions as one of the one or more trajectory features.

In some embodiments, to match, based on the one or more trajectory features of the trajectory and the one or more image features associated with at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections, the system may determine a first similarity degree between the one or more image features associated with the first object and the second object, respectively. The system may also determine a second similarity degree between at least one of the one or more trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively. The system may further determine whether the first object is matched with the second object at least in part based on at least one of the first similarity degree or the second similarity degree.

In some embodiments, to determine a second similarity degree between at least one of the trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively, the system may determine first movement direction information associated with the first trajectory, the first movement direction information including one or more first movement directions of the first trajectory. The system may also determine second movement direction information associated with the second trajectory, the second movement direction information including one or more second movement directions of the second trajectory. The system may also determine a similarity degree between each of the one or more first movement directions and each of the one or more second movement directions to obtain one or more similarity degrees associated with the first trajectory and the second trajectory; and designate a maximum similarity degree among the one or more similarity degrees as the second similarity degree between at least one of the one or more trajectory features of the first trajectory and the second trajectory, respectively.

In some embodiments, to determine a second similarity degree between at least one of the trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively, the system may determine first geographical location information associated with the first trajectory, the first geographical location information including one or more first geographical locations on the first trajectory. The system may determine second geographical location information associated with the second trajectory, the second geographical location information including one or more second geographical locations on the second trajectory. The system may also determine a geographical distance between each of the one or more first geographical locations and each of the one or more second geographical locations to obtain one or more geographical distances associated with the first trajectory and the second trajectory. The system may further determine the second similarity degree between at least one of the one or more trajectory features of the first trajectory and the second trajectory, respectively, based on a minimum geographical distance among the one or more geographical distances.

In some embodiments, to determine a second similarity degree between at least one of the trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively, the system may determine first time information associated with the first trajectory, the first time information including a first time period of the first trajectory; and determine second time information associated with the second trajectory, the second time information including a second time period of the second trajectory. The system may further determine, based on an intersection over union between the first time period and the second time period, the second similarity degree between at least one of the one or more trajectory features of the first trajectory and the second trajectory, respectively.

In some embodiments, to determine whether the first object is matched with the second object at least in part based on at least one of the first similarity degree or the second similarity degree, the system may determine a trajectory accessibility based on the one or more trajectory features, the trajectory accessibility indicating a probability that the first trajectory of the first object can access the second trajectory of the second object; and determining whether the first object is matched with the second object at least in part based on the trajectory accessibility.

In some embodiments, to determining whether the first object is matched with the second object at least in part based on the first similarity and the second similarity, the system may determine that the first object is matched with the second object in response to a determination including at least one of the first similarity degree satisfying a first condition or the second similarity degree satisfying a second condition.

In some embodiments, at least one of the first condition or the second condition may be adjustable according to a scene captured by at least one of the multiple cameras.

In some embodiments, the system may unify the first trajectory and the second trajectory to determine a target trajectory of the first object or the second object in response to a matching of the first object and the second object.

In some embodiments, to determine one or more trajectory features of the trajectory of at least one of the one or more objects, the system may smooth the trajectory of the at least one of the one or more objects to obtain a smoothed trajectory of the at least one of the one or more objects; and determine, based on the smoothed trajectory of the at least one of the one or more objects, the one or more trajectory features.

In a second aspect of the present disclosure, a system for object tracking is provided. The system may include at least one storage device and at least one processor. The at least one storage medium may store executable instructions. The at least one processor may be configured to be in communication with the at least one storage device, wherein when executing the executable instructions, the system is configured to perform one or more of the following operations. The system may obtain an image collection collected by a camera and detect one or more objects from each of a plurality of images included in the image collection to extract one or more image features associated with at least one of the one or more objects using an object detection model. The system may determine a geographical location of at least one of the one or more objects corresponding to each of the plurality of images. The system may further match, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images.

In a third aspect of the present disclosure, a method for object tracking across multiple cameras is provided. The method may include obtaining a plurality of image collections captured by multiple cameras. For each of the plurality of image collections, the method may include detecting one or more objects from each of at least a portion of a plurality of images included in the each of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects. The method may include tracking the at least one of the one or more objects from each of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects. The method may also include determining one or more trajectory features of the trajectory of the at least one of the one or more object. The method may further include matching, based on the one or more trajectory features of the trajectory and the one or more image features associated with the at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections.

In a fourth aspect of the present disclosure, a method for object tracking is provided. The method may include obtaining an image collection collected by a camera and detecting one or more objects from each of a plurality of images included in the image collection to extract one or more image features associated with at least one of the one or more objects using an object detection model. The method may also include determining a geographical location of at least one of the one or more objects corresponding to each of the plurality of images. The method may further include matching, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images.

In a fifth aspect of the present disclosure, a non-transitory computer readable medium is provided. The non-transitory computer readable medium may include at least one set of instructions that, when executed by at least one processor, cause the at least one processor to effectuate a method. The method may include one or more of the following operations. The method may include obtaining a plurality of image collections captured by multiple cameras. For each of the plurality of image collections, the method may include detecting one or more objects from each of at least a portion of a plurality of images included in the each of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects. The method may include tracking the at least one of the one or more objects from each of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects. The method may also include determining one or more trajectory features of the trajectory of the at least one of the one or more object. The method may further include matching, based on the one or more trajectory features of the trajectory and the one or more image features associated with the at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections.

In a sixth aspect of the present disclosure, a non-transitory computer readable medium is provided. The non-transitory computer readable medium may include at least one set of instructions that, when executed by at least one processor, cause the at least one processor to effectuate a method. The method may include one or more of the following operations. The method may include obtaining an image collection collected by a camera and detecting one or more objects from each of a plurality of images included in the image collection to extract one or more image features associated with at least one of the one or more objects using an object detection model. The method may also include determining a geographical location of at least one of the one or more objects corresponding to each of the plurality of images. The method may further include matching, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images.

In a seventh aspect of the present disclosure, a system for object tracking is provided. The system may include an obtaining module configured to obtain an image collection collected by a camera; a detecting module configured to detect one or more objects from each of a plurality of images included in the image collection to extract one or more image features associated with at least one of the one or more objects using an object detection model; a determining module configured to determine a geographical location of at least one of the one or more objects corresponding to each of the plurality of images; and a matching module configured to match, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images.

In an eighth aspect of the present disclosure, a system for object tracking is provided. The system may include an obtaining module configured to obtain a plurality of image collections captured by multiple cameras; a single-camera tracking module configured to for each of the plurality of image collections, detect one or more objects from each of at least a portion of a plurality of images included in the each of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects; track the at least one of the one or more objects from each of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects; and a multi-camera tracking module configured to determine one or more trajectory features of the trajectory of the at least one of the one or more objects; and matching, based on the one or more trajectory features of the trajectory and the one or more image features associated with the at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections.

Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities, and combinations set forth in the detailed examples discussed below.

BRIEF DESCRIPTION OF THE DRAWINGS

The methods, systems, and/or programming described herein are further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:

FIG. 1 is a schematic diagram of an exemplary object tracking system according to some embodiments of the present disclosure;

FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of a computing device according to some embodiments of the present disclosure;

FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of a mobile device according to some embodiments of the present disclosure;

FIG. 4A is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure;

FIG. 4B is a block diagram illustrating another exemplary processing device according to some embodiments of the present disclosure;

FIG. 5 is a flowchart illustrating an exemplary process for single-camera object tracking according to some embodiments of the present disclosure;

FIG. 6 is a flowchart illustrating an exemplary process for matching two objects according to some embodiments of the present disclosure;

FIG. 7 is a flowchart illustrating an exemplary process for matching two tracks in single-camera tracking according to some embodiments of the present disclosure;

FIG. 8 is a flowchart illustrating an exemplary process for object tracking across multiple cameras according to some embodiments of the present disclosure;

FIG. 9 is a flowchart illustrating an exemplary process for multi-camera tracking according to some embodiments of the present disclosure;

FIG. 10 is a flowchart illustrating an exemplary process for determining a similarity degree between two trajectories according to some embodiments of the present disclosure;

FIG. 11 is a flowchart illustrating an exemplary process for determining a similarity degree between two trajectories according to some embodiments of the present disclosure;

FIG. 12A and 12B are diagrams showing speed changes of a vehicle according to some embodiments of the present disclosure;

FIG. 13A are diagrams showing a vehicle being tracked in across multiple cameras according to some embodiments of the present disclosure; and

FIG. 13B are diagrams showing trajectories of the vehicle as illustrated in FIG. 13A tracked in across multiple cameras according to some embodiments of the present disclosure.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well-known methods, procedures, systems, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but to be accorded the widest scope consistent with the claims.

It will be understood that the term “system, ” “engine, ” “unit, ” “module, ” and/or “block” used herein are one method to distinguish different components, elements, parts, section or assembly of different level in ascending order. However, the terms may be displaced by other expression if they may achieve the same purpose.

Generally, the word “module, ” “unit, ” or “block, ” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions. A module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device. In some embodiments, a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules/units/blocks configured for execution on computing devices (e.g., processor 201 as illustrated in FIG. 2) may be provided on a computer readable medium, such as a compact disc, a digital image collection disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution) . Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules (or units or blocks) may be included in connected logic components, such as gates and flip-flops, and/or can be included in programmable units, such as programmable gate arrays or processors. The modules (or units or blocks) or computing device functionality described herein may be implemented as software modules (or units or blocks) , but may be represented in hardware or firmware. In general, the modules (or units or blocks) described herein refer to logical modules (or units or blocks) that may be combined with other modules (or units or blocks) or divided into sub-modules (or sub-units or sub-blocks) despite their physical organization or storage.

It will be understood that when a unit, engine, module, or block is referred to as being “on, ” “connected to, ” or “coupled to” another unit, engine, module, or block, it may be directly on, connected or coupled to, or communicate with the other unit, engine, module, or block, or an intervening unit, engine, module, or block may be present, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for the purposes of describing particular examples and embodiments only and is not intended to be limiting. As used herein, the singular forms “a, ” “an, ” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “include” and/or “comprise, ” when used in this disclosure, specify the presence of integers, devices, behaviors, stated features, steps, elements, operations, and/or components, but do not exclude the presence or addition of one or more other integers, devices, behaviors, features, steps, elements, operations, components, and/or groups thereof.

An aspect of the present disclosure relates to systems and methods for object tracking. A system for multi-camera object tracking may obtain a plurality of image collections captured by multiple cameras. Each image collection may include multiple images captured by one of the multiple cameras. For each of at least a portion of the plurality of image collections, the system may also detect one or more objects from at least a portion of the multiple images and extract one or more image features associated with at least one of the one or more detected objects from each of the at least a portion of the multiple images. The system may also track the at least one of the one or more objects from each of the at least a portion of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects. The system may also determine one or more trajectory features of the trajectory of the at least one of the one or more object. The system may further match, based on the one or more trajectory features of the trajectory and the one or more image features associated with the at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections. Accordingly, the system may fuse the trajectory features and the image features to track one or more objects, which may improve the accuracy of multi-camera tracking. In addition, the system may determine a first similarity degree between the image features associated the first object and the second object and determine a second similarity degree between at least one of the trajectory features associated with the first object and the second object. The system may determine whether the first object is matched with the second object by determining whether the first similarity degree satisfies a first condition and the second similarity degree satisfies a second condition. In this way, the system may use a hierarchical matching technique to track one or more objects across multiple cameras which may further improve the accuracy of multi-camera tracking.

FIG. 1 illustrates a schematic diagram of an exemplary object tracking system according to some embodiments of the present disclosure. As shown, the object tracking system 100 may include a server 110, a network 120, a camera 130, a storage device 140, a service providing system 150. The object tracking system 100 may be applied to an intelligent transportation system, a security system, an online to offline service providing system, etc. For example, the object tracking system 100 may be applied in traffic feature estimation, traffic anomaly detection, vehicle anomaly detection, etc.

The server 110 may process information and/or data relating to the object tracking system 100 to perform one or more functions described in the present disclosure. For example, the server 110 may perform a single-camera tracking and/or multi-camera tracking. The server 110 may be a single server or a server group. The server group may be centralized, or distributed (e.g., the server 110 may be a distributed system) . In some embodiments, the server 110 may be local or remote. For example, the server 110 may access information and/or data stored in the camera 130, and/or the storage device 140 via the network 120. As another example, the server 110 may be directly connected to the camera 130 and/or the storage device 140 to access stored information and/or data. In some embodiments, the server 110 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof. In some embodiments, the server 110 (or a portion thereof) may be implemented on a computing device 200 having one or more components illustrated in FIG. 2 of the present disclosure.

In some embodiments, the server 110 may include a processing device 112. According to some embodiments of the present disclosure, the processing device 112 may process information and/or data related to the object tracking system 100 to perform one or more functions described in the present disclosure. For example, the processing device 112 may match an object (e.g., a vehicle) detected in one of multiple images included in an image collection with an object detected in one or more other images of the multiple images. As another example, the processing device 112 may match an object (e.g., a vehicle) tracked in one of multiple cameras image collections with an object tracked in one or more other cameras of the multiple cameras Additionally or alternatively, the processing device 112 may generate a trajectory of the tracked object.

In some embodiments, the processing device 112 may include one or more processing devices (e.g., single-core processing device (s) or multi-core processor (s) ) . Merely by way of example, the processing device 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field-programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction-set computer (RISC) , a microprocessor, or the like, or any combination thereof.

The network 120 may include any suitable network that can facilitate the exchange of information and/or data for the object tracking system 100. In some embodiments, one or more components in the object tracking system 100 (e.g., the server 110, the camera 130, and the storage device 140) may send information and/or data to another component (s) in the object tracking system 100 via the network 120. For example, the server 110 may obtain image collection data from the camera 130 via the network 120. In some embodiments, the network 120 may be any type of wired or wireless network, or combination thereof. Merely by way of example, the network 120 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, an Internet, a local area network (LAN) , a wide area network (WAN) , a wireless local area network (WLAN) , a metropolitan area network (MAN) , a wide area network (WAN) , a public telephone switched network (PSTN) , a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or any combination thereof.

The camera 130 may be and/or include any suitable device that is capable of acquiring image data. Exemplary camera 130 may include a camera (e.g., a digital camera, an analog camera, an IP camera (IPC) , etc. ) , an image collection recorder, a scanner, a mobile phone, a tablet computing device, a wearable computing device, an infrared imaging device (e.g., a thermal imaging device) , or the like. In some embodiments, the camera 130 may include a gun camera 130-1, a dome camera 130-2, an integrated camera 130-3, a binocular camera 130-4, a monocular camera, etc. In some embodiments, the camera 130 may include a charge-coupled device (CCD) , a complementary metal-oxide-semiconductor (CMOS) sensor, an N-type metal-oxide-semiconductor (NMOS) , a contact image sensor (CIS) , and/or any other suitable image sensor. The image data acquired by the camera 130 may include an image, or any data about an image, such as values of one or more pixels (or referred to as pixel values) of an image (e.g., luma, gray values, intensities, chrominance, contrast of one or more pixels of an image) , RGB data, image collection data, audio information, timing information, location data, etc.

The storage device 140 may store data and/or instructions. The data and/or instructions may be obtained from, for example, the server 110, the camera 130, and/or any other component of the object tracking system 100. For example, the storage device 140 may store image data acquired by the camera 130 and/or one or more trajectories generated by the processing device 112. In some embodiments, the storage device 140 may store data and/or instructions that the server 110 (e.g., the processing device 112) may execute or use to perform exemplary methods described in the present disclosure. In some embodiments, the storage device 140 may include a mass storage device, a removable storage device, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or any combination thereof. Exemplary mass storage devices may include a magnetic disk, an optical disk, solid-state drives, etc. Exemplary removable storage devices may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc. Exemplary volatile read-and-write memory may include a random access memory (RAM) . Exemplary RAM may include a dynamic RAM (DRAM) , a double date rate synchronous dynamic RAM (DDR SDRAM) , a static RAM (SRAM) , a thyristor RAM (T-RAM) , and a zero-capacitor RAM (Z-RAM) , etc. Exemplary ROM may include a mask ROM (MROM) , a programmable ROM (PROM) , an erasable programmable ROM (PEROM) , an electrically erasable programmable ROM (EEPROM) , a compact disk ROM (CD-ROM) , and a digital versatile disk ROM, etc. In some embodiments, the storage device 140 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.

In some embodiments, the storage device 140 may be connected to the network 120 to communicate with one or more components of the object tracking system 100 (e.g., the server 110, the camera 130, etc. ) . One or more components of the object tracking system 100 may access the data or instructions stored in the storage device 140 via the network 120. In some embodiments, the storage device 140 may be directly connected to or communicate with one or more components of the object tracking system 100 (e.g., the server 110, the camera 130, etc. ) . In some embodiments, the storage device 140 may be part of the server 110 or the camera 130. In some embodiments, one or more components of the object tracking system 100 (e.g., the server 110, the camera 130, etc. ) may have permission to access the storage device 140. In some embodiments, one or more components of the object tracking system 100 may read and/or modify information stored in the storage device 140 when one or more conditions are met.

In some embodiments, the service providing system 140 may be configured to provide services, such as an object tracking service, an anomaly detection service, an online to offline service (e.g., a taxi service, a carpooling service, a food delivery service, a party organization service, an express service, etc. ) , an unmanned driving service, a map-based service (e.g., a route planning service) , a live chatting service, a query service, a Q&Aservice, etc. The service providing system 140 may generate service responses, for example, by transmitting a service request for object tracking received from a user to the processing device 112 for object tracking. For example, the service providing system 140 may include an intelligent transportation system, a security system, an online to offline service providing system, etc.

In some embodiments, the service providing system 140 may be a device, a platform, or other entity interacting with the object tracking system. In some embodiments, the service providing system 140 may also be implemented in a device with data processing, such as a mobile device 140-1, a tablet computer 140-2, a laptop computer 140-3, and a server 140-4, or the like, or any combination thereof. In some embodiments, the mobile device 140-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart image collection camera, an interphone, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, a smart footgear, a smart glass, a smart helmet, a smartwatch, a smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, a virtual reality glass, a virtual reality patch, an augmented reality helmet, an augmented reality glass, an augmented reality patch, or the like, or any combination thereof. For example, the virtual reality device and/or the augmented reality device may include a Google Glass, an Oculus Rift, a HoloLens, a Gear VR, etc. In some embodiments, the server 140-4 may include a database server, a file server, a mail server, a web server, an application server, a computing server, a media server, a communication server, etc.

It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. For example, the object tracking system 100 may include one or more terminal devices. As another example, the processing device 112 may be integrated into the camera 130. However, those variations and modifications do not depart from the scope of the present disclosure.

FIG. 2 is a schematic diagram illustrating exemplary hardware and/or software components of a computing device 200 according to some embodiments of the present disclosure. The computing device 200 may be used to implement any component of the object tracking system 100 as described herein. For example, the server 110 (e.g., the processing device 112) and/or the camera 130 may be implemented on the computing device 200, via its hardware, software program, firmware, or a combination thereof. Although only one such computing device is shown, for convenience, the computer functions relating to the object tracking system 100 as described herein may be implemented in a distributed fashion on a number of similar platforms, to distribute the processing load.

As illustrated in FIG. 2, the computing device 200 may include a processor 201, a storage 203, an input/output (I/O) 205, and a communication port 207. The processor 201 may execute computer instructions (e.g., program code) and perform functions of the object tracking system 100 in accordance with techniques as described elsewhere in the present disclosure. The computer instructions may include, for example, routines, programs, objects, components, data structures, procedures, modules, and functions, which perform particular functions as described elsewhere in the present disclosure. For example, the processor 201 may determine a trajectory of an object based on image collection data acquired by the camera 130 In some embodiments, the processor 201 may include interface circuits and processing circuits therein. The interface circuits may be configured to receive electronic signals from the communication port 207, wherein the electronic signals encode structured data and/or instructions for the processing circuits to process. The processing circuits may conduct logic calculations, and then determine a conclusion, a result, and/or an instruction encoded as electronic signals. Then the interface circuits may send out the electronic signals from the processing circuits via the communication port 207.

In some embodiments, the processor 201 may include one or more hardware processors, such as a microcontroller, a microprocessor, a reduced instruction set computer (RISC) , an application specific integrated circuits (ASICs) , an application-specific instruction-set processor (ASIP) , a central processing unit (CPU) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a microcontroller unit, a digital signal processor (DSP) , a field-programmable gate array (FPGA) , an advanced RISC machine (ARM) , a programmable logic device (PLD) , any circuit or processor capable of executing one or more functions, or the like, or any combinations thereof.

Merely for illustration, only one processor may be described in the computing device 200. However, it should be noted that the computing device 200 of the present disclosure may also include multiple processors, and thus operations and/or method steps that are performed by one processor as described in the present disclosure may also be jointly or separately performed by the multiple processors. For example, if in the present disclosure the processor of the computing device 200 executes both operations A and operation B, it should be understood that operation A and operation B may also be performed by two or more different processors jointly or separately in the computing device 200 (e.g., a first processor executes operation A and a second processor executes operation B, or vice versa, or the first and second processors jointly execute operations A and B) .

The storage 203 may store data/information obtained from the server 110, the camera 130, and/or any other component of the object tracking system 100. In some embodiments, the storage 203 may include a mass storage device, a removable storage device, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or any combination thereof. For example, the mass storage device may include a magnetic disk, an optical disk, solid-state drives, etc. The removable storage device may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc. The volatile read-and-write memory may include a random access memory (RAM) . The RAM may include a dynamic RAM (DRAM) , a double date rate synchronous dynamic RAM (DDR SDRAM) , a static RAM (SRAM) , a thyristor RAM (T-RAM) , and a zero-capacitor RAM (Z-RAM) , etc. The ROM may include a mask ROM (MROM) , a programmable ROM (PROM) , an erasable programmable ROM (EPROM) , an electrically-erasable programmable ROM (EEPROM) , a compact disk ROM (CD-ROM) , and a digital versatile disk ROM, etc. In some embodiments, the storage 203 may store one or more programs and/or instructions to perform exemplary methods described in the present disclosure. For example, the storage 203 may store a program for object tracking

The I/O 205 may input and/or output signals, data, information, etc. In some embodiments, the I/O 205 may enable user interaction with the computing device 200. In some embodiments, the I/O 205 may include or communicate with an input device and an output device to facilitate communication between the computing device 200 and an input device or an output device. Examples of the input device may include a keyboard, a mouse, a touch screen, a microphone, or the like, or any combination thereof. Examples of the output device may include a display device, a loudspeaker, a printer, a projector, or the like, or any combination thereof. Examples of the display device may include a liquid crystal display (LCD) , a light-emitting diode (LED) -based display, a flat panel display, a curved screen, a television device, a cathode ray tube (CRT) , a touch screen, or the like, or any combination thereof.

The communication port 207 may be connected to a network (e.g., the network 120) to facilitate data communications. The communication port 207 may establish connections between the computing device 200 and one or more other components of the object tracking system 100 or an external source. The connection may be a wired connection, a wireless connection, any other communication connection that can enable data transmission and/or reception, and/or any combination of these connections. The wired connection may include, for example, an electrical cable, an optical cable, a telephone wire, or the like, or any combination thereof. The wireless connection may include, for example, a Bluetooth ^TM link, a Wi-Fi ^TM link, a WiMax ^TM link, a WLAN link, a ZigBee link, a mobile network link (e.g., 3G, 4G, 5G, etc. ) , or the like, or any combination thereof. In some embodiments, the communication port 207 may be and/or include a standardized communication port, such as RS232, RS485, etc. In some embodiments, the communication port 207 may be a specially designed communication port.

FIG. 3 is a schematic diagram illustrating exemplary hardware and/or software components of a mobile device 300 according to some embodiments of the present disclosure. In some embodiments, one or more components (e.g., a terminal device not shown in figures, the processing device 112, and/or the camera 130) of the object tracking system 100 may be implemented on the mobile device 300.

As illustrated in FIG. 3, the mobile device 300 may include a communication port 310, a display 320, a graphics processing unit (GPU) 330, a central processing unit (CPU) 340, an I/O 350, a memory 360, and a storage 390. In some embodiments, any other suitable component, including but not limited to a system bus or a controller (not shown) , may also be included in the mobile device 300. In some embodiments, a mobile operating system 370 (e.g., iOS ^TM, Android ^TM, Windows Phone ^TM, etc. ) and one or more applications 380 may be loaded into the memory 360 from the storage 390 in order to be executed by the CPU 340. The applications 380 may include a browser or any other suitable mobile apps for receiving and rendering information relating to the object tracking system 100. User interactions with the information stream may be achieved via the I/O 350 and provided to the object tracking system 100 via the network 120.

To implement various modules, units, and their functionalities described in the present disclosure, computer hardware platforms may be used as the hardware platform (s) for one or more of the elements described herein. A computer with user interface elements may be used to implement a personal computer (PC) or any other type of work station or terminal device. A computer may also act as a server if appropriately programmed.

FIG. 4A is a block diagram illustrating an exemplary processing device 112 according to some embodiments of the present disclosure. As shown in FIG. 4, the processing device 112 may include an obtaining module 410, a detecting module 420, a determining module 430, and a matching module 440.

The obtaining module 410 may be configured to data for objecting tracking. For example, the obtaining module 410 may obtain one or more image collections captured by one or more cameras. In some embodiments, the obtaining module 410 may obtain the one or more image collections from the camera 130, the storage device 140, the service providing system 150, or any other storage device. The obtaining module 410 may obtain the one or more image collections from time to time, e.g., periodically. As another example, the obtaining module 410 may be configured to obtain an object detection model. The object detection model may be configured to detect and/or locate one or more objects in an image.

The detecting module 420 may be configured to detect and/or locate one or more objects from each of at least a portion of multiple images included in the image collection. In some embodiments, the detecting module 420 may be configured to extract one or more image features associated with at least one of the detected one or more objects using, for example, an object detection model. In some embodiments, the detecting module 420 may be configured to determine a location of a detected object in an image. The determining module 430 may be configured to determine a geographical location of the at least one of the one or more objects corresponding to each of the multiple images. In some embodiments, the determining module 430 may determine the geographical location of an object corresponding to a specific image based on a location of the object in the specific image. For example, the geographical location of the object may be denoted by geographical coordinates in a geographic coordinate system. The location of the object in the specific image may be denoted by coordinates in an image coordinate system. The processing device 112 may convert the coordinates in the image coordinate system into the geographical coordinates in the geographic coordinate system based on a transform relationship between the image coordinate system and the geographic coordinate system.

The matching module 440 may be configured to determine two matched objects presented in different images. For example, the matching module 440 may match, based on one or more image features associated with at least one of one or more objects and a geographical location of the at least one of the one or more objects, a first object detected in one of multiple images with a second object detected in one or more other images of the multiple images. As used herein, two or more “matched objects” detected respectively in two or more different images may refer to that the two or more objects are the same object. In some embodiments, the matching module 440 may match the first object detected in one of the multiple images with the second object detected in one or more other images of the multiple images by matching the first object and the second object detected in two adjacent images of the multiple images. For example, the matching module 440 may match the first object presented in a first image with one of the one or more objects (i.e., the second object) presented in a second image adjacent to the first image. In some embodiments, the term “adjacent” means close (e.g. right next to, or within a short fixed range) in a temporal sequence. The one of the one or more objects matched with the first object may be determined to be the same as the first object and/or designated as the first object.

In some embodiments, the matching module 440 may determine a trajectory of the first object tracked in the camera. In some embodiments, the matching module 440 may obtain a geographical location of the first object detected and/or tracked corresponding to each of at least a portion of the multiple images determined by the determining module 430. The matching module 440 may determine the trajectory of the first object based on the geographical location of the first object detected and/or tracked from each of at least a portion of the multiple images.

FIG. 4B is a block diagram illustrating another exemplary processing device 112 according to some embodiments of the present disclosure. As shown in FIG. 4, the processing device 112 may include an obtaining module 450, a single-camera tracking module 420, and a multi-camera tracking module 430.

The obtaining module 450 may be configured to data for objecting tracking. For example, the obtaining module 450 may obtain a plurality of image collections captured by multiple cameras. In some embodiments, the obtaining module 450 may obtain at least one of the plurality of image collections from the camera 130, the storage device 140, the service providing system 150, or any other storage device. The obtaining module 450 may obtain the at least one of the plurality of image collections from time to time, e.g., periodically. As another example, the obtaining module 450 may be configured to obtain an object detection model. The object detection model may be configured to detect and/or locate one or more objects in an image in at least one of the plurality of image collections.

The single-camera tracking module 460 may track an object in multiple images captured by one single camera. For example, the single-camera tracking module 460 may detect and/or locate one or more objects from multiple images included in each of at least a portion of the plurality of image collections. As another example, the single-camera tracking module 460 may extract one or more image features associated with at least one of the one or more objects. As still another example, the single-camera tracking module 460 may match an object presented in one of the multiple images with another object presented in one or more other images of the multiple images to track the object. As a further example, the single-camera tracking module 460 may determine a trajectory of the tracked object.

The multi-camera tracking module 470 may be configured to track an object (a vehicle) across the multiple cameras. The multi-camera tracking module 470 may further include a feature extraction unit 471, a similarity determination unit 472, and a matching unit 473. The feature extraction unit 471 may be configured to extract one or more image features of an object from each of multiple images where the object is tracked. The feature extraction unit 471 may be configured to extract one or more trajectory features of the tracked object. The similarity determination unit 472 may be configured to determine a first similarity degree of the one or more image features associated with two objects tracked by two different cameras. The similarity determination unit 472 may be configured to determine a second similarity degree of at least one of the one or more trajectory features associated with the two objects. The matching unit 473 may be configured to determine two matched objects tracked by two or more different cameras. For example, the matching unit 473 may be configured to determine whether an object tracked by one camera is matched with another object tracked by one or more other cameras at least in part based on the first similarity degree and the second similarity degree. In some embodiments, the matching unit 473 may be configured to determine whether an object tracked by one camera is matched with another object tracked by one or more other cameras based on the first similarity degree, the second similarity degree, and a trajectory accessibility between the two objects. In some embodiments, the matching unit 473 may be configured to determine a target trajectory in response to that the two objects are matched based on the trajectories of the two objects.

The modules may be hardware circuits of all or part of the processing device 112. The modules may also be implemented as an application or set of instructions read and executed by the processing device 112. Further, the modules may be any combination of the hardware circuits and the application/instructions. For example, the modules may be the part of the processing device 112 when the processing device 112 is executing the application/set of instructions. The modules in the processing device 112 may be connected to or communicate with each other via a wired connection or a wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof. The wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth, a ZigBee, a Near Field Communication (NFC) , or the like, or any combination thereof.

It should be noted that the above descriptions of the processing devices 112 is provided for the purposes of illustration, and is not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, one or more of the modules mentioned above may be omitted. For example, the actuation module 420 may be omitted. In some embodiments, two or more of the modules may be combined into a single module, and any one of the modules may be divided into two or more units. For example, the determination module 410 and the actuation module 420 may be integrated into a single module. In some embodiments, the processing device 112 may further include one or more additional modules, such as a storage module.

FIG. 5 is a flowchart illustrating an exemplary process for single-camera object tracking according to some embodiments of the present disclosure. In some embodiments, process 500 may be executed by the object tracking system 100. For example, the process 500 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 140, the storage 203, and the storage 390) . In some embodiments, the processing device 112 (e.g., the processor 201 of the computing device 200, the CPU 340 of the mobile device 300, and/or one or more modules illustrated in FIG. 4) may execute the set of instructions and may accordingly be directed to perform the process 500.

In 501, the processing device 112 (e.g., the obtaining module 410) (e.g., the processing circuits of the processor 201) may obtain an image collection captured by a camera. In some embodiments, the image collection may be an image sequence including multiple images. In some embodiments, the image collection may be a video including the multiple images (also referred to as frames) . The multiple images may be arranged in the image collection in time sequence. Each of the multiple images may correspond to a timestamp. The image collection may correspond to a time period from a starting time to an ending time. The starting time may be an earliest timestamp among timestamps of the multiple images. The ending time may be a latest timestamp among the timestamps of the multiple images.

The camera may be and/or include any suitable device that is capable of acquiring an image collection of a scene as described elsewhere in the present disclosure (e.g., FIG. 1 and the descriptions thereof) . In some embodiments, the image collection may be generated by the camera (e.g., the camera 130) via monitoring an area within the scope (i.e., the field of view ) of the camera. The image collection may record a scene happened within the scope of the camera during a time period (e.g., one or more seconds, one or more minutes, one or more hours, etc. ) . For example, the image collection may record one or more vehicles or pedestrians moving within the field of view of the camera. In some embodiments, the processing device 112 (e.g., the obtaining module 410) may obtain the image collection from the camera 130, the storage device 140, the service providing system 150, or any other storage device. The processing device 112 may obtain the image collection from time to time, e.g., periodically. For example, the processing device 112 may obtain the image collection from the camera per second, per minute, or per hour, etc. As another example, the processing device 112 may obtain the image collection from the camera in real time. As still another example, the processing device 112 may obtain the image collection from the camera in response to receive a request for object tracking inputted by a user (e.g., a policeman) by the service providing system 150.

In 502, the processing device 112 (e.g., the detecting module 420 or the determining module 430) (e.g., the interface circuits of the processor 201) may detect one or more objects from each of at least a portion of multiple images included in the image collection to extract one or more image features associated with at least one of the one or more objects using an object detection model. The multiple images included in the image collection may be described as in the descriptions for operation 501. An object detected from an image may include a person, a vehicle, an animal, a physical subject, or the like, or a combination thereof. The one or more objects presented in different images among the at least a portion of the multiple images may be different or same. For example, a specific object may be just presented in only several consecutive images among the multiple images. The count or number of the one or more objects detected from different images may be different or the same.

In some embodiments, the object detection model may be configured to detect and/or locate one or more objects in an image. For example, the object detection model may be configured to mark and/or locate an object in an input image using a bounding box. The bounding box may refer to a box enclosing the detected object in the input image. The bounding box may have any shape and/or size. For example, the bounding box may have the shape of a square, a rectangle, a triangle, a polygon, a circle, an ellipse, an irregular shape, or the like. In some embodiments, the bounding box may be a minimum bounding box that has a preset shape (e.g., a rectangle, a square, a polygon, a circle, an ellipse) and encloses a detected object in the input image. The object detection model may be configured to output the image with the bounding box for marking the object or output the bounding box with the detected object. In some embodiments, the objection detection model may be configured to extract and/or output one or more image features associated with a detected object presented in an image. For example, the object detection model may be configured to segment a region marked by the bounding box and extract the image features of the segmented region. The one or more image features extracted and/or output by the object detection model may be also referred to as a feature map or vector. Exemplary image features may include a low-level feature (e.g., an edge feature, a textural feature) , a high-level feature (e.g., a semantic feature) , a complicated feature (e.g., a deep hierarchical feature) , etc. Exemplary object detection models may include a region-based convolutional network (R-CNN) , a spatial pyramid pooling network (SPP-Net) , a fast region-based convolutional network (Fast R-CNN) , a faster region-based convolutional network (Faster R-CNN) , etc.

The processing device 112 may input a specific image of the multiple images into the object detection model. The object detection model may detect one or more objects presented in the specific image and locate at least one of the one or more detected objects using a bounding box. The object detection model may further extract the one or more image features associated with the at least one of the one or more detected objects from a region marked by the bounding box. In some embodiments, the processing device 112 may designate a serial number or other signs or identifying the at least one of the one or more detected objects.

In 503, the processing device 112 (e.g., the detecting module 420 or the determining module 430) (e.g., the processing circuits of the processor 201) may determine a geographical location of the at least one of the one or more objects corresponding to each of the multiple images. As used herein, a geographical location of an object corresponding to an image may refer to where the object is located when the image is captured by a camera or at a timestamp of the image. In some embodiments, the processing device 112 may determine the geographical location of an object corresponding to a specific image based on a location of the object in the specific image. For example, the geographical location of the object may be denoted by geographical coordinates in a geographic coordinate system. The location of the object in the specific image may be denoted by coordinates in an image coordinate system. The processing device 112 may convert the coordinates in the image coordinate system into the geographical coordinates in the geographic coordinate system based on a transform relationship between the image coordinate system and the geographic coordinate system. The transform relationship may be a default setting of the object tracking system 100.

In some embodiments, the processing device 112 may determine the location of the object in the specific image based on the bounding box for marking and/or locating the object. For example, the processing device 112 may designate a center location of the bounding box in the specific image as the location of the object in the specific image. As another example, the processing device 112 may determine the location of the object in the specific image based on parameters of the bounding box (e.g., locations of vertexes of the bounding box in the image, a length of each side of the bounding box, etc. ) . The processing device 112 may denote a bounding box as b= (x, y, w, h) . The processing device 112 may determine coordinates of a point c for denoting a location of the object in the specific image according to Equation 1 as follows:

where x refers to a horizontal coordinate of a specific vertex (e.g., the top left corner) of the bounding box in an image coordinate system, y refers to a longitudinal coordinate of the specific vertex of the bounding box in an image coordinate system, w refers to a length of the bounding box along a horizontal direction (i.e., the horizontal axis) of the image coordinate system, h refers to a length (or width) of the bounding box along a longitudinal direction (i.e., the longitudinal axis) of the image coordinate system,

refers to a horizontal coordinate of point c in an image coordinate system,

refers to a longitudinal coordinate of point c in the image coordinate system. The processing device 112 may further determine the geographical location of the object by converting the coordinates of the point in the image coordinate system to geographical coordinates of the geographical location of the object in the geographic coordinate system. The processing device 112 may determine geographical coordinates of the geographical location of the object according to Equation (2) as follows:

(lon, lan, h) =M· (c _x, c _y, 1) ^T (2) ,

where lon refers to the longitude of the object, lan refers to the latitude of the object, h refers to the altitude of the object, M refers to a transform matrix (i.e., transform relationship) between the image coordinate system and the geographical coordinate system. Transform matrix M may be a default setting of the object tracking system 100.

In 504, the processing device 112 (e.g., the matching module 440) (e.g., the processing circuits of the processor 201) may match, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, a first object detected in one of the multiple images with a second object detected in one or more other images of the multiple images. As used herein, two or more “matched objects” detected respectively in two or more different images may refer to that the two or more objects are the same object.

In some embodiments, the processing device 112 may match the first object detected in one of the multiple images with the second object detected in one or more other images of the multiple images by matching the first object and the second object detected in two adjacent images of the multiple images. For example, the processing device 112 may match the first object presented in a first image with one of the one or more objects (i.e., the second object) presented in a second image adjacent to the first image. In some embodiments, the term “adjacent” means close (e.g. right next to, or within a short fixed range) in a temporal sequence. The one of the one or more objects matched with the first object may be determined to be the same as the first object and/or designated as the first object. The processing device 112 may further match the first object presented in the second image with one of the one or more objects (i.e., the second object) presented in a third image adjacent to the second image. The processing device 112 may match the first object with the second object detected in two adjacent images of the multiple images until the processing device 112 cannot determine a second object matching with the first object. Then a track of the first object or the second object may be obtained from the at least a portion of the multiple images or the image collection (e.g., a video) . In some embodiments, the processing device 112 may provide and unify serial numbers or signs for identifying the first object and the second object tracked from the at least a portion of the multiple images in response to matching the first object detected in one of the multiple images with the second object detected in one or more other images of the multiple images. The matched objects in the at least a portion of the multiple images may be designated a same serial number or sign. As used herein, the term “track” may refer to detecting, localizing, and/or identifying an object from multiple sequential (e.g. consecutive) images. A track of an object may include an identifier of the object (e.g., a serial number) , a location of the object in each of the multiple consecutive images, a geographical location corresponding to each of the multiple consecutive images, etc. As used herein, two adjacent images in the image collection may refer to that no other image is arranged between the two adjacent images. Two adjacent images may also be referred to as two consecutive images in the image collection.

In some embodiments, the processing device 112 may match the first object with the second object detected in two adjacent images of the multiple images at least in part based on a first similarity degree between the one or more image features associated with the first object and the second object and/or a second similarity degree between the geographical locations of the first object and the second object. In some embodiments, the processing device 112 may determine that the first object is matched with the second object in response to determine that the first similarity degree satisfies a first condition and/or the second similarity degree satisfies a second condition. For example, the processing device 112 may determine that the first object is matched with the second object in response to determine that the first similarity degree exceeds a first threshold and the second similarity degree exceeds a second threshold. In some embodiments, the processing device 112 may determine a target similarity degree between the first object and the second object based on the first similarity degree and the second similarity degree. For example, the processing device 112 may determine a sum of the first similarity degree and the second similarity degree as the target similarity degree. As another example, the processing device 112 may determine a weighted sum by weighting the first similarity degree and the second similarity degree as the target similarity degree. As still another example, the processing device 112 may determine an average value of the first similarity degree and the second similarity degree as the target similarity degree. The processing device 112 may determine that the first object is matched with the second object in response to determine that the total similarity degree exceeds a threshold.

In some embodiments, each of two adjacent images may include multiple objects. The processing device 112 may determine multiple pairs of matched objects from the multiple objects. Each pair of objects may include one object in one image of the two adjacent images and one object in another one image of the two adjacent images. The processing device 112 may determine one or more pairs of matched objects (e.g., the first object and the second object) from the multiple pairs of objects based on the first similarity degree and the second similarity degree corresponding to each pair of the multiple pairs of objects. For example, if a first image includes object A and B, and the second image adjacent to the first image includes object C, D, and F. The processing device 112 may determine six pairs of objects including (A, C) , (A, D) , (A, F) , (B, C) , (B, D) , and (B, F) . The processing device 112 may determine one or more pairs (e.g., (A, C) , (B, F) ) of matched objects from the six pairs of objects based on the first similarity degree and the second similarity degree corresponding to each pair of the six pairs of objects. In some embodiments, the processing device 112 may determine a first similarity degree between the one or more image features associated with each pair of the multiple pairs of objects. The processing device 112 may determine a second similarity degree between the geographical locations of each pair of the multiple pairs of objects. In some embodiments, the processing device 112 may determine one or more specific pairs of matched objects (i.e., the matched first object and the second object) from the multiple pairs of objects as the pairs of matched objects (e.g., the matched first object and second subject) based on the first similarity degree and the second similarity degree using a Hungarian algorithm. In some embodiments, the processing device 112 may determine, based on the first similarity degree and the second similarity degree, a target similarity degree of each pair of the multiple pairs of objects to obtain multiple target similarity degrees. The processing device 112 may classify the multiple target similarity degrees into several groups. Target similarity degrees in each of the several groups may correspond to different objects presented in the two adjacent images. For example, if a first image includes object A and B, and the second image adjacent to the first image includes object C, D, and F, the processing device 112 may determine six pairs of objects including (A, C) , (A, D) , (A, F) , (B, C) , (B, D) , and (B, F) . In addition, the processing device 112 may classify the target similarity degrees corresponding to (A, C) , (B, D) into a same group, (A, D) , (B, F) into another group, and (A, F) , (B, C) into another group. The processing device 112 may determine one or more specific pairs of matched objects (i.e., the matched first object and the second object) from the multiple pairs of objects as the pairs of matched objects (e.g., the matched first object and second subject) in response to determine that the target similarity degrees of one of several groups satisfies a condition. For example, if a sum of the target similarity degrees in a specific group of the several groups is the maximum among the several groups, the pairs of objects corresponding to the target similarity degrees in the specific group of the several groups may be designated as matched objects. As another example, if each of the target similarity degrees in a specific group of the several groups may be greater than a threshold, the pairs of objects corresponding to the target similarity degrees in the specific group of the several groups may be designated as matched objects.

In some embodiments, a similarity degree (e.g., the first similarity degree and/or the second similarity degree) associated with a pair of objects may be determined based on a distance associated with the pair of objects. The distance used herein may be also referred to as similarity distance. Exemplary similarity distances may include a Euclidean distance, a Manhattan distance, a Minkowski distance, etc. In some embodiments, the processing device 112 may determine a first distance between the one or more image features associated with each pair of the multiple pairs of objects. The processing device 112 may determine a second distance between the geographical locations of each pair of the multiple pairs of objects. The processing device 112 may determine, based on the first distance and the second distance, a target distance of each pair of the multiple pairs of objects to obtain multiple target distances. The processing device 112 may classify the multiple target distances into several groups. Target distances in each of the several groups may correspond to different objects presented in the two adjacent images. For example, if a first image includes object A and B, and the second image adjacent to the first image includes object C, D, and F, the processing device 112 may determine six pairs of objects including (A, C) , (A, D) , (A, F) , (B, C) , (B, D) , and (B, F) . In addition, the processing device 112 may classify the target distances corresponding to (A, C) , (B, D) into a same group, (A, D) , (B, F) into another group, and (A, F) , (B, C) into another group. The processing device 112 may determine one or more specific pairs of matched objects (i.e., the matched first object and the second object) from the multiple pairs of objects as the pairs of matched objects (e.g., the matched first object and second subject) in response to determine that the target distances of one of several groups satisfies a condition. For example, if a sum of the target distances in a specific group of the several groups is the minimum among sums of target distances of the several groups, the pairs of objects corresponding to the target distances in the specific group of the several groups may be designated as matched objects. As another example, if each of the target distances in a specific group of the several groups is smaller than a threshold, the pairs of objects corresponding to the target distances in the specific group of the several groups may be designated as matched objects.

In some embodiments, the processing device 112 may determine a loss function for object tracking. The loss function may include a first component configured to determine a first distance between one or more image features associated with each pair of multiple pairs of objects presented in the two adjacent images and a second component configured to determine a second distance between the geographical locations of each pair of the multiple pairs of objects. The processing device 112 may determine a value of the loss function corresponding to one or more pairs of the multiple pairs of objects. The one or more pairs of objects may include different objects presented in the two adjacent images. The processing device 112 may determine a specific pair of objects corresponding to a minimum value of the loss function as the matched first object and the second object. More descriptions for matching two objects may be found elsewhere in the present disclosure (e.g., FIG. 6 and the descriptions thereof) .

In 505, the processing device 112 (e.g., the determining module 430 or the matching module 440) (e.g., the processing circuits of the processor 201) may determine a trajectory of the first object tracked in the camera. In some embodiments, the processing device 112 may obtain a geographical location of the first object detected and/or tracked corresponding to each of at least a portion of the multiple images determined in operation 503. The processing device 112 may determine the trajectory of the first object based on the geographical location of the first object detected and/or tracked from each of at least a portion of the multiple images.

In some embodiments, the trajectory of the first object may include a travel route along which the first object moves within a field of view of the camera and leaves from the field of view of the camera. In some imaging scenarios (e.g., the vehicle being blocked) , the first object cannot be detected and/or tracked in one or more images, which may cause the trajectory of the first object to be uncompleted. For example, the processing device 112 may detect and/or track the first object in several images of the multiple images to obtain a first track of the first object. The processing device 112 may detect and/or track one or more candidate objects to obtain one or more candidate tracks (i.e., second tracks) of the one or more candidate objects. The processing device 112 may match the first track of the first object with the one or more candidate tracks to determine the trajectory of the first object. As used herein, two matched tracks determined from an image collection may refer to that the two matched tracks are belonged to the same object (e.g., a vehicle) and the two matched tracks are accessed. More descriptions for matching two tracks may be found elsewhere in the present disclosure (e.g., FIG. 6 and the descriptions thereof) .

It should be noted that the above description of the process 500 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations herein discussed. Additionally, the order of the process 500 may not intended to be limiting. For example, operation 505 may be omitted.

Operations

502 and 503 may be integrated into one single operation.

FIG. 6 is a flowchart illustrating an exemplary process for matching two objects according to some embodiments of the present disclosure. In some embodiments, process 600 may be executed by the object tracking system 100. For example, the process 600 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 140, the storage 203, and the storage 390) . In some embodiments, the processing device 112 (e.g., the processor 201 of the computing device 200, the CPU 340 of the mobile device 300, and/or one or more modules illustrated in FIG. 4) may execute the set of instructions and may accordingly be directed to perform the process 600. In some embodiments, one or more operations of the process 600 may be performed to achieve at least part of operation 504 as described in connection with FIG. 5.

In 601, the processing device 112 (e.g., the matching module 440) (e.g., the processing circuits of the processor 201) may determine a loss function. The loss function may include a first component configured to determine a first distance between one or more image features associated with each pair of multiple pairs of objects presented in two adjacent images, respectively. The loss function may also include a second component configured to determine a second distance between geographical locations of each pair of the multiple pairs of objects. A pair of objects may include one object detected in one of the two adjacent images and one object detected in another one image of the two adjacent images. For example, the two adjacent images may include a first image and a second image. If the processing device 112 detects M (e.g., 2) objects in the first image and N (e.g., 3) objects in the second image, the processing device 112 may determine M×N (e.g., 2×3) pairs of objects from the two adjacent images. Further, the processing device 112 may determine two objects and three objects in the two adjacent images, respectively, such as A and B detected in the first image, and C, D, and F detected in the second image. The processing device 112 may determine six pairs of objects including (A, C) , (A, D) , (A, F) , (B, C) , (B, D) , and (B, F) . As another example, the two adjacent images may include a first image and a second image. If the processing device 112 detects M (e.g., 2) objects in the first image and N (e.g., 3) objects in the second image, the processing device 112 may determine M (e.g., 2) or N (e.g., 3) pairs of objects from the two adjacent images. Further, the processing device 112 may determine two objects and three objects in the two adjacent images, respectively, such as A and B detected in the first image, and C, D, and F detected in the second image. The processing device 112 may determine 3 pairs of objects including (A, C) , (A, D) , (A, F) , or 3 pairs of objects including (B, C) , (B, D) , and (B, F) .

As used herein, a first distance between one or more image features associated with a pair of objects may be used to denote a similarity between the one or more image features associated with the pair of objects. The first distance may also be referred to as a similarity distance. In some embodiments, the greater the first distance between the one or more image features associated with the pair of objects is, the smaller the similarity between the one or more image features associated with the pair of objects may be. A second distance between geographical locations of a pair of objects may be used to denote a similarity between the geographical locations of the pair of objects. The second distance may also be referred to as a similarity distance. In some embodiments, the greater the second distance between the geographical locations of the pair of objects is, the smaller the similarity between the geographical locations of the pair of objects may be. The first distance and/or the second distance may include a Euclidean distance, a Manhattan distance, a Minkowski distance, etc.

In some embodiments, the loss function may be denoted by Equation (3) as follows:

loss _tracking (i, j) =||r _i-r _j|| ₂+λ _g||g _i-g _j|| ₂ (3) ,

where r _i refers to image features of a first object detected from one of the two adjacent images, r _j refers to image features of a second object detected from another one image of the two adjacent images, g _i refers to geographical coordinates of the first object, g _j refers to geographical coordinates of the second object, and λ _g refers to a weight for the second component. λ _g may be a default setting of the object tracking system 100. For example, λ _g may be a constant value in a range from 0 to 1, such as 0.05, 0.1, 0.2, etc. As another example, λ _g may be a constant value exceeding 1.

In 602, the processing device 112 (e.g., the matching module 440) (e.g., the processing circuits of the processor 201) may determine one or more specific pairs of objects from the multiple pairs of objects in response to determine that a value of the loss function corresponding to the one or more specific pairs of objects is the minimum.

In some embodiments, the processing device 112 may determine multiple values of the loss function. Each of the multiple values of the loss function may be determined based on the image features associated with one pair of the multiple pairs of objects and the geographical locations of the one pair of the multiple pairs of objects according to Equation (3) . The image features associated with an object may be extracted as described in connection with operation 502 illustrated in FIG. 5. The geographical location of an object may be determined as described in connection with operation 503 illustrated in FIG. 5. Each value of the multiple values of the loss function may correspond to one pair of the multiple pairs of objects in the two adjacent images. The processing device 112 may determine a minimum value among the multiple values of the loss function. The processing device 112 may determine the specific pair of objects corresponding to the minimum value as two matched objects (i.e., the matched first object and second object) .

In some embodiments, the processing device 112 may classify the multiple pairs of objects into several groups. Each group of the several groups may correspond to different objects presented in the two adjacent images. For example, if a first image includes object A and B, and the second image adjacent to the first image includes object C, D, and F, the processing device 112 may determine six pairs of objects including (A, C) , (A, D) , (A, F) , (B, C) , (B, D) , and (B, F) . In addition, the processing device 112 may classify the (A, C) , (B, D) into a same group, (A, D) , (B, F) into another group, and (A, F) , (B, C) into another group. In some embodiments, the processing device 112 may determine multiple values of the loss function based on each pair of objects in the several groups. Each of the multiple values of the loss function may correspond to one group of the several groups. The processing device 112 may determine a minimum among the multiple values. The processing device 112 may determine one or more pairs of objects in a specific group corresponding to the minimum as the one or more specific pairs of matched objects.

According to

operations

601 and 602, every two adjacent images among at least a portion of multiple images of an image collection may be processed to determine one or more pairs of matched objects. The processing device 112 may match a first object detected in one of the multiple images with one or more second objects detected in one or more other images of the multiple images based on the one or more pairs of matched objects. For example, if the processing device 112 determines that the first object detected in a first image is matched with a second object detected in a second image adjacent to the first image and the processing device 112 determines that the second object detected in the second image is matched with a third object detected in a third image adjacent to the second image, the processing device 112 may determine that the first object, the second object, and the third object are matched, i.e., the first object, the second object and the third object are a same object. Accordingly, the processing device 112 may track the first object in other images. The processing device 112 may match two objects detected in two adjacent images based on image features and geographical locations which may improve accuracy for single-camera tracking when multiple targets are detected by the camera and the similarity between the appearances of the multiple targets are high.

It should be noted that the above description of the process 600 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the process 600 may be accomplished with one or more additional operations not described and/or without one or more of the operations herein discussed. Additionally, the order of the process 600 may not be intended to be limiting. For example,

operations

601 and 602 may be integrated into a single operation.

FIG. 7 is a flowchart illustrating an exemplary process for matching two tracks in single-camera tracking according to some embodiments of the present disclosure. In some embodiments, process 700 may be executed by the object tracking system 100. For example, the process 700 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 140, the storage 203, and the storage 390) . In some embodiments, the processing device 112 (e.g., the processor 201 of the computing device 200, the CPU 340 of the mobile device 300, and/or one or more modules illustrated in FIG. 4) may execute the set of instructions and may accordingly be directed to perform the process 700. In some embodiments, one or more operations of the process 700 may be performed to achieve at least part of operation 504 or operation 506 as described in connection with FIG. 5.

In 701, the processing device 112 (e.g., the matching module 440) (e.g., the processing circuits of the processor 201) may obtain a first track of a first object from a first portion of multiple images included in an image collection. As used herein, a term “track” may refer to a detecting, a localization, and/or an identification of an object from multiple consecutive images. A track of an object may include an identifier of the object (e.g., a number) , a location of the object in each of the multiple consecutive images, a geographical location corresponding to each of the multiple consecutive images, etc.

In some embodiments, the first track of the first object may be determined according to at least a portion of process 500 (e.g., operations 501-504) as illustrated in FIG. 5. For example, the processing device 112 may track the first object from the image collection to obtain the first track based on image features and a geographical location of the second object associated with each of the first portion of the multiple images. In some embodiments, the processing device 112 may track multiple objects from the image collection based on image features and geographical locations of each of the multiple objects to obtain a plurality of tracks. Each of the plurality of tracks may correspond to one of the multiple objects. The processing device 112 may obtain one of the plurality of tracks as the first track and designate one of the multiple objects corresponding to the one of the plurality of tracks as the first object. The first portion of the multiple images included in the image collection may be a segment of the image collection. Images in the first portion may be consecutive. For example, if the image collection includes 2000 frames, the first portion may include 0-100 frames, or 200-300 frames, etc. The first object may be detected from each of the images. Each of the images in the first portion may correspond to a timestamp indicating the time when the each of the images is captured by a camera.

In 702, the processing device 112 (e.g., the matching module) (e.g., the processing circuits of the processor 201) may obtain a second track of a second object from a second portion of the multiple images included in the image collection. In some embodiments, the second track of the first object may be determined according to at least a portion of process 500 (e.g., operations 501-504) as illustrated in FIG. 5. For example, the processing device 112 may track the second object from the image collection to obtain the second track based on image features and geographical location of the second object associated with each of the second portion of the multiple images. In some embodiments, the processing device 112 may determine another one of the plurality of tracks as described in operation 701 as the second track and designate another one of the multiple objects corresponding to another one of the plurality of tracks as the second object. The first track may be different from the second track. The first object may be same or different from the second object. The second portion of the multiple images included in the image collection may be another segment of the image collection partially or entirely different from the first portion of the multiple images. For example, if the image collection includes 2000 frames, and the first portion includes 0-100 frames, the second portion may include 150-1000 frames, etc. Images in the second portion may be consecutive. The second object may be detected from each of the images in the second portion. Each of the images in the second portion may correspond to a timestamp indicating the time when the each of the images in the second portion is captured by a camera.

In 703, the processing device 112 (e.g., the matching module) (e.g., the processing circuits of the processor 201) may determine whether the first track and the second track belong a same object. As used herein, whether the first track and the second track belonged to the same object may also be referred to that whether the first object and the second object are the same object or whether the first track and the second track are matched. In some embodiments, the processing device 112 may determine whether the first track and the second track are belonged to the same object at least in part based on a similarity degree between the first track and the second track. For example, if the similarity degree between the first track and the second track is greater than a similarity threshold, the processing device may determine that the first track and the second track belong the same object (i.e., the first object or the second object) . In some embodiments, the processing device 112 may determine whether the similarity degree between the first track and the second track is greater than the similarity threshold by determining whether a distance between the first track and the second track is less than a distance threshold. As used herein, the distance between the first track and the second track may refer to a similarity distance as described elsewhere in the present disclosure.

In some embodiments, the processing device 112 may determine the similarity distance between the first track and the second track at least in part based on image features associated with the first object and the second object extracted, respectively from one or more images in the first portion and one or more images in the second portion. Further, assuming that the first portion includes M images and the second portion includes N images, the processing device 112 may extract the image features associated with the first object from each of the M images in the first portion and extract image features associated with the second object from each of the N images in the second portion as described elsewhere in the present disclosure (e.g., FIG. 5 and the descriptions thereof) . The processing device 112 may determine a candidate distance between the image features associated with the first object extracted from each of the M images and the image features associated with the second object extracted from each of the N images in the second portion to obtain a plurality of candidate distances (e.g., M×N candidate distances) . Each of the plurality of candidate distances may correspond to a pair of images. One image in a pair of images may be from the M images in the first portion and another one image in the pair of images may be from the N images in the second portion. In some embodiments, the processing device 112 may determine a minimum candidate distance among the plurality of candidate distances as the similarity distance between the first track and the second track. In some embodiments, the processing device 112 may determine an average distance of the plurality of candidate distances as the similarity distance between the first track and the second track. In some embodiments, the processing device 112 may determine one of the plurality of candidate distances as the similarity distance between the first track and the second track. For example, the processing device 112 may determine a specific candidate distance from the plurality of candidate distances corresponding a pair of images including a first frame in the second portion and a last frame in the first portion or including a first frame in the first portion and a last frame in the second portion. The processing device 112 may determine the specific candidate distance as the similarity distance between the first track and the second track.

In some embodiments, the processing device 112 may determine whether the first track and the second track are belonged to the same object at least in part based on time information associated with each of the first track and the second track in response to determine that the similarity distance between the first track and the second track satisfies a condition (e.g., exceeds the similarity threshold) . The time information may include a time period associated with each of the first track and the second track, a time point or time stamp associated with the first track and the second track, etc. For example, the processing device 112 may determine a first time period associated with the first track based on a starting time and an ending time of the first track. The starting time of the first time period may correspond an earliest timestamp among the first portion of images. The ending time of the first time period may correspond to a latest timestamp among the first portion of images. The processing device 112 may determine a second time period associated with the second track based on a starting time and an ending time of the second track. The starting time of the second time period may correspond an earliest timestamp among the second portion of images. The ending time of the second time period may correspond to a latest timestamp among the second portion of images.

In some embodiments, the processing device 112 may determine that the first track and the second track belong the same object if the first time period and the second time period are not overlapped after determining that the similarity degree between the first track and the second track is greater than the similarity threshold. For example, If the first ending time of the first track is earlier than the second starting time of the second track, the processing device 112 may determine that the first time period and the second time period are not overlapped. If the first starting time of the first track is earlier than the second ending time of the second track and the first ending time of the first track is later than the second starting time of the second track, the processing device 112 may determine that the first time period and the second time period are overlapped.

In some embodiments, the processing device 112 may determine that the first track and the second track belong the same object if a time difference between the first track and the second track is less than a time threshold (e.g., 50 milliseconds, 100 milliseconds, 150 milliseconds, 200 milliseconds, etc. ) , after determining that the similarity degree between the first track and the second track is greater than the similarity threshold and/or determining that the first time period and the second time period are not overlapped. The time difference between the first track and the second track may refer to a difference between the first starting time of the first track and the second ending time of the second track or a difference between the first ending time of the first track and the second starting time of the second track. For example, if the first starting time of the first track and the second ending time of the second track are 9: 15: 50 a.m. and 9: 15: 51 a.m., respectively, a time difference between the first track and the second track may be 1 second (i.e., 1000 milliseconds) .

In 704, the processing device 112 (e.g., the matching module) (e.g., the processing circuits of the processor 201) may determine a trajectory of the first object or the second object in response to determine that the second track is belonged to the first object or the second object. The processing device 112 may determine a geographical location of the first object based on the location of the first object detected in each of images in the first portion and a geographical location of the second object based on the location of the second object detected in each of images in the second portion. For example, the processing device 112 may determine a first trajectory corresponding to the first track based on geographical locations of the first object and a second trajectory corresponding to the second track based on geographical locations of the second object. The processing device 112 may connect the first trajectory and the second trajectory to determine a trajectory of the first object.

According to process 700, after a single-camera tracking is perform on an image collection, a post-processing may be performed to match two tracks based on image features and time information, which may improve tracking loss of an object for single-camera tracking caused by object occlusion, image loss, and multiple-target matching, etc.

It should be noted that the above description of the process 700 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the process 700 may be accomplished with one or more additional operations not described and/or without one or more of the operations herein discussed. Additionally, the order of the process 700 may not be intended to be limiting. For example,

operations

701 and 702 may be integrated into a single operation. Operation 704 may be omitted.

FIG. 8 is a flowchart illustrating an exemplary process for object tracking across multiple cameras according to some embodiments of the present disclosure. In some embodiments, process 800 may be executed by the object tracking system 100. For example, the process 800 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 140, the storage 203, and the storage 390) . In some embodiments, the processing device 112 (e.g., the processor 201 of the computing device 200, the CPU 340 of the mobile device 300, and/or one or more modules illustrated in FIG. 4) may execute the set of instructions and may accordingly be directed to perform the process 800.

In 801, the processing device 112 (e.g., the obtaining module 450) (e.g., the processing circuits of the processor 201) may obtain a plurality of image collections captured by multiple cameras. Each of the plurality of image collections may be captured by one of the multiple cameras (e.g., the camera 130) via monitoring an area within the scope (i.e., the field of view ) of the one of the multiple cameras. An image collection may record a scene happened within the scope of a camera during a time period (e.g., 10 minutes, 30 minutes, 1 hour, etc. ) . For example, the image collection may include one or more vehicles or pedestrians moving within the field of view of the camera and/or leaving the field of view of the camera. An image collection may include multiple images. Each of the multiple images may correspond to a timestamp. In some embodiments, the plurality of image collections may be captured by the multiple cameras simultaneously or approximately at the same time. In other words, a starting time and/or an ending time of each of the plurality of image collections may be close or same.

A camera among the multiple cameras may be and/or include any suitable device that is capable of acquiring an image collection of a scene as described elsewhere in the present disclosure (e.g., FIG. 1 and the descriptions thereof) . In some embodiments, the multiple cameras may be fixedly installed within an area. A distance between two adjacent cameras may be smaller than a threshold. For example, one of the two adjacent cameras may be installed within the field of view of another one of the two adjacent cameras. In some embodiments, the processing device 112 (e.g., the obtaining module 410) may obtain the plurality of image collections from the camera 130, the storage device 140, or any other storage device. The processing device 112 may obtain the plurality of image collections from time to time, e.g., periodically. For example, the processing device 112 may obtain the plurality of image collections from the multiple cameras or a storage device per week, or per month, or per quarter, etc. As another example, the processing device 112 may obtain the plurality of image collections from the multiple cameras in response to receive a request for object tracking from a terminal (e.g., the service providing system 150) inputted by a user (e.g., a police) .

In 802, the processing device 112 (e.g., the detecting module 420 or the single-camera tracking module 460) (e.g., the interface circuits of the processor 201) may detect one or more objects from multiple images included in each of at least a portion of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects.

An object may include a person, a vehicle, an animal, a physical subject, or the like, or a combination thereof. The detecting of one or more objects from multiple images included in each of at least a portion of the plurality of image collections may include detecting one or more objects from each of at least a portion of the multiple images. In some embodiments, some images in the multiple images may not include an object. The one or more objects presented in different images among the at least a portion of the multiple images may be different or same. The count or number of the one or more objects detected from different images among the at least a portion of the multiple images may be different or same.

For one of the plurality of image collections, the processing device 112 may detect and/or locate one or more objects presented in each image of the at least a portion of the multiple images using an object detection model as described elsewhere in the present disclosure (e.g., FIG. 5 and the descriptions thereof) . For example, the processing device 112 may input an image into the object detection model. The object detection model may be configured to detect one or more objects in the input image and output the image with one or more bounding boxes for locating and/or marking the one or more objects. Then the processing device 112 may use the objection detection model to extract and/or output one or more image features associated with each of the one or more detected objects from a region marked by the bounding box in the input image. The one or more image features extracted and/or output by the object detection model may be also referred to as a feature map or a feature vector. Exemplary image features may include a low-level feature (e.g., an edge feature, a textural feature) , a high-level feature (e.g., a semantic feature) , a complicated feature (e.g., a deep hierarchical feature) , etc. Exemplary object detection models may include a region-based convolutional network (R-CNN) , a spatial pyramid pooling network (SPP-Net) , a fast region-based convolutional network (Fast R-CNN) , a faster region-based convolutional network (Faster R-CNN) , etc. More descriptions for image feature extraction may be found elsewhere in the present disclosure (e.g., FIG. 5 and the descriptions thereof) .

In 803, the processing device 112 (e.g., the detecting module 420 or the single-camera tracking module 460) (e.g., the processing circuits of the processor 201) may track the at least one of one or more objects from each of the at least a portion of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects. A trajectory of an object may include a plurality of points. Each of the plurality of points may denote a geographical location of the object at a time corresponding to a timestamp of an image in the image collection from which the object is detected. The geographical location of the object at the time may be determined based on a location of the object in the image. More descriptions for determining a geographical location of an object based on an image may be found elsewhere in the present disclosure (e.g., operation 503 as described in FIG. 5) .

In some embodiments, the processing device 112 may track at least one of the one or more objects from an image collection of the plurality of image collections using a single-camera tracking technique. Exemplary single-camera tracking techniques may include applying a joint probabilistic data association algorithm, a deep neural network algorithm, an expectation over transformation algorithm, etc. In some embodiments, the processing device 112 may track the at least one of one or more objects from an image collection of the plurality of image collections according to at least a portion of process 500 as described in FIG. 5. For example, the processing device 112 may determine a geographical location of the at least one of the one or more objects corresponding to each of at least a portion of the multiple images included in an image collection. The processing device 112 may match, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, an object detected in one of the at least a portion of the multiple images with an object detected in one or more other images of the at least a portion of the multiple images. The processing device 112 may determine a trajectory of the tracked object based on the geographical location of the object corresponding to each of the at least a portion of the multiple images in which the object is detected and tracked. More descriptions for matching an object detected in one of the at least a portion of the multiple images with an object detected in one or more other images of the at least a portion of the multiple images may be found elsewhere in the present disclosure (e.g., FIG. 6 and the descriptions thereof) . More descriptions for determining a trajectory of a track object may be found elsewhere in the present disclosure (e.g., FIG. 7 and the descriptions thereof) .

In 804, the processing device 112 (e.g., the multi-camera tracking module 470 or the feature extract unit 471) (e.g., the processing circuits of the processor 201) may determine one or more trajectory features of the trajectory of the at least one of the one or more objects. The one or more trajectory features may include movement direction information associated with the trajectory of each of the one or more objects, geographical location information associated with the trajectory, time information associated with the trajectory, etc. The movement direction information associated with the trajectory may be denoted by a motion vector associated with two points (e.g., two adjacent points) of the trajectory or additional points determined based on a plurality of points of the trajectory. For example, a trajectory may be denoted as Traj _j= {p ₁, p ₂, p ₃, ..., p _n} , wherein p ₁, p ₂, p ₃, ..., p _n refer to points of the trajectory Traj _j. A point of the trajectory Traj _j may be denoted as p _i= {id, lon _i, lat _i, t _i} , wherein id identifies the trajectory, t _i is a timestamp of each point p _i of the trajectory Traj _j, lon _i and lat _i refer to the longitude and latitude of point p _i in t _i. A motion vector

associated with point p _i-1 and point p _i of the object may be denoted as (lon _i-lon _i-1, lat _i-lat _i-1) . Motion vectors associated with any two adjacent points of the trajectory may be formed a motion vector sequence of the trajectory. The geographical location information associated with the trajectory may include geographical locations of the plurality of points of the trajectory, or one or more additional geographical locations determined based on the geographical locations of the plurality of points of the trajectory. For example, the additional geographical locations may be determined by averaging the geographical locations of at least a portion of the plurality of points of the trajectory. The time information associated with the trajectory may include a timestamp of each of the plurality of points of the trajectory, a duration of the trajectory, a starting time and/or ending time of the trajectory, etc.

In some embodiments, the processing device 112 may divide the trajectory into multiple slices, determine an average point for each of the multiple slices, based on the several points of each of the multiple slices, and determine the one or more trajectory features based on the average point for each of the multiple slices. Each of the multiple slices may include several points corresponding to several images of the multiple images. The count or number of the several points in a slice may also be referred to as a length of the slice. The count or number of the several points in a slice may be a default setting of the object tracking system 100. For example, the processing device 112 may set the length of a slice based on a total length (i.e., a total count of the plurality of points) of the trajectory. The greater the total length (i.e., a total count of the plurality of points) of the trajectory is, the greater the length of a slice may be. The count or number of the several points in each slice may be the same or different. In some embodiments, the processing device 112 may determine the average point for each of the multiple slices by averaging coordinates of the several points of each of the multiple slices. In some embodiments, the processing device 112 may determine the average point for each of the multiple slices by averaging coordinates of any two points (e.g., a starting point and an ending point) of each of the multiple slices. In some embodiments, the processing device 112 may designate one (e.g., a middle point) of the several points in each of the multiple slices as the average point. The average point may be also referred to as an additional point. The average point may be one of the plurality of points of the trajectory or different with each of the plurality of points. The processing device 112 may determine, based on the average point for each of the multiple slices, the one or more trajectory features. For example, the processing device 112 may determine a motion vector between any two adjacent average points of any two adjacent slices. As another example, the processing device 112 may determine the location information of a trajectory based on geographical locations of the average point of each slice. More descriptions for trajectory features may be found elsewhere in the present disclosure (e.g., FIGs. 10 and 11, and the descriptions thereof) .

In some embodiments, before extracting the one or more trajectory features of the trajectory of an object, the processing device 112 may perform a smoothing operation on the trajectory to delete or adjust one or more outliers of the trajectory. The one or more outliers of the trajectory may be identified and/or determined based on a motion speed of an object associated the trajectory. Since the motion speed of an object (e.g., vehicles) cannot change abruptly, and the speed between adjacent points (e.g., two adjacent points, three adjacent points) may be stable. The processing device 112 may compare the speed between adjacent points to identify and/or determine one or more outliers. For example, a trajectory may be denoted as Traj _j= {p ₁, p ₂, p ₃, ..., p _n} , wherein p ₁, p ₂, p ₃, ..., p _n are points of the trajectory Traj _j. A point of the trajectory Traj _j may be denoted as p _i= {id, lon _i, lat _i, t _i} , wherein id identifies the trajectory, t _i is a timestamp of each point p _i of the trajectory Traj _j, lon _i and lat _i refer to the longitude and latitude of point p _i in t _i. The processing device 112 may compare the speed among p _i, p _i-1, and p _i-2 with λ _speed to determine whether point p _i is an outlier. The speed may be determined according to Equation (4) as follows:

where,

is a motion vector of the object denoted as (lon _i-lon _i-1, lat _i-lat _i-1) . The are three propositions to represent judgements, Q: s _i, i-1＞λ _speed, o: s _i-1, i-2＞λ _speed, and T: s _i, i-2＞λ _speed. The processing device 112 may determine that point p _i is an outlier if point p _i satisfies Equation (5) as follows:

where ∨ is an “OR” operation, ∧ is an “AND” operation, and

is a “NOR” operation. According to Equation (9) , if s _i, i-1＞λ _speed, or if s _i, i-1≤λ _speed, and s _i-1, i-2＞λ _speed, and s _i, i-2＞λ _speed, the processing device 112 may determine that point p _i is an outlier.

In 805, the processing device 112 (e.g., the multi-camera tracking module 470) (e.g., the processing circuits of the processor 201) may match, based on the one or more trajectory features of the trajectory and the one or more image features associated with each of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections.

In some embodiments, the processing device 112 may determine a first similarity degree between the one or more image features associated with the first object and the second object, respectively. The processing device 112 may determine a second similarity degree between at least one of the one or more trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively. The processing device 112 may determine whether the first object is matched with the second object at least in part based on at least one of the first similarity degree or the second similarity degree. In some embodiments, the processing device 112 may determine whether the first object is matched with the second object by determining whether the first similarity degree exceeds a first threshold and the second similarity degree exceeds a second threshold. If the first similarity degree exceeds the first threshold and the second similarity degree exceeds the second threshold, the processing device 112 may determine that the first object is matched with the second object. In some embodiments, the processing device 112 may determine a target similarity degree based on the first similarity degree and the second similarity degree. For example, the processing device 112 may determine a sum of the first similarity degree and the second similarity as the target similarity. As another example, the processing device 112 may determine a weight for each of the first similarity degree and the second similarity degree. The processing device 112 may weighting the first similarity degree and the second similarity degree and determine a sum between the weighted first similarity degree and the weighted second similarity degree as the target similarity degree. If the target similarity degree exceeds a target threshold, the processing device 112 may determine that the first object is matched with the second object. The first threshold, the second threshold, the weight for each of the first similarity degree and the second similarity degree, and/or the target threshold may be a default setting of the object tracking system 100. The first threshold, the second threshold, the weight for each of the first similarity degree and the second similarity degree, and/or the target threshold may be adjustable according to, for example, an imaging scenario. For example, for an arterial road scene, the first threshold may be greater than that for a crossroad scene. For a crossroad scene, the second threshold may be greater than that for an arterial road scene.

In 806, the processing device 112 (e.g., the multi-camera tracking module 470) (e.g., the processing circuits of the processor 201) may determine a trajectory of the first object or the second object tracked in the multiple cameras. If the first trajectory and the second trajectory are matched, the processing device 112 may unify the first trajectory and the second trajectory to determine the trajectory of the first object and the second object. For example, the processing device 112 may connect a starting location of the first trajectory (or the second trajectory) with an ending location of the second trajectory (or the first trajectory) .

It should be noted that the above description of the process 800 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the process 800 may be accomplished with one or more additional operations not described and/or without one or more of the operations herein discussed. Additionally, the order of the process 800 may not be intended to be limiting. For example,

operations

802 and 803 may be integrated into a single operation. Operation 806 may be omitted.

FIG. 9 is a flowchart illustrating an exemplary process for multi-camera tracking according to some embodiments of the present disclosure. In some embodiments, process 900 may be executed by the object tracking system 100. For example, the process 900 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 140, the storage 203, and the storage 390) . In some embodiments, the processing device 112 (e.g., the processor 201 of the computing device 200, the CPU 340 of the mobile device 300, and/or one or more modules illustrated in FIG. 4) may execute the set of instructions and may accordingly be directed to perform the process 900. In some embodiments, one or more operations of the process 900 may be performed to achieve at least part of operation 805 as described in connection with FIG. 8.

In 901, the processing device 112 (e.g., the similarity determination unit 472) (e.g., the processing circuits of the processor 201) may determine a first similarity degree between one or more image features associated with a first object and a second object, respectively. The one or more features associated the first object may be extracted from each of multiple first images (i.e., at least a portion of images in one of the multiple image collections) from which the first object is tracked using an object detection model as described elsewhere in the present disclosure. The one or more features associated the first object extracted from each of multiple first images may be also referred to as first image features. The one or more features associated the second object may be extracted from each of multiple second images (i.e., at least a portion of images in another one of the multiple image collections) from which the first object is tracked using the object detection model as described elsewhere in the present disclosure. The one or more features associated the second object extracted from each of multiple second images may be also referred to as second image features. The first similarity degree may be determined based on the first image features corresponding to each of the multiple first images and the second image features corresponding to each of the multiple second images.

In some embodiments, the processing device 112 may determine a candidate similarity degree between the first image features and the second image features extracted from each of the multiple first images and each of the multiple second images to obtain a plurality of candidate similarity degrees. For example, the processing device 112 may determine multiple pairs of images based on the multiple first images and the multiple second images. Each pair of the multiple pairs of images may include one first image and one second image. The processing device 112 may determine a candidate similarity degree between the first image features and the second image features extracted from a first image and a second image, respectively, in each pair of the multiple pairs of images to obtain the plurality of candidate similarity degrees. The processing device 112 may determine the first similarity degree based on the plurality of candidate similarity degrees. For example, the processing device 112 may determine a maximum candidate similarity degree among the plurality of candidate similarity degrees as the first similarity degree. As another example, the processing device 112 may determine an average value of the plurality of candidate similarity degrees as the first similarity degree.

In 902, the processing device 112 (e.g., the similarity determination unit 472) (e.g., the processing circuits of the processor 201) may determine a second similarity degree between at least one of one or more trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively. The first trajectory of the first object and/or the second trajectory of the second object may be determined according to process 500 and/or at least a portion of process 800. The second similarity degree between at least one of the trajectory features may include a movement direction similarity degree, a time similarity degree, a location similarity degree, etc.

The movement direction similarity degree between the first trajectory and the second trajectory may refer to a similarity degree between motion vectors or movement directions associated with the first trajectory and the second trajectory. The movement direction similarity degree may be determined based on a cosine similarity degree. In some embodiments, the first trajectory may include a first motion vector sequence including a plurality of first motion vectors determined as described in operation 804. The second trajectory may include a second motion vector sequence including a plurality of second motion vectors determined as described in operation 804. The processing device 112 may determine a cosine similarity degree between each of the plurality of first motion vectors and each of the plurality of second motion vectors to obtain a plurality of cosine similarity degrees. Each of the plurality of cosine similarity degrees may correspond to a pair of a first motion vector and a second motion vector. For example, if the first motion vector sequence includes 2 first motion vectors, including A1 and B1, the second motion vector sequence includes 2 second motion vectors, including A2 and B2, a pair of a first motion vector and a second motion vector may be one of (A1, A2) , (A1, B2) , (B1, A2) , and (B1, B2) . In some embodiments, the processing device 112 may determine a maximum value (i.e., the maximum similarity) among the plurality of cosine similarity degrees as the movement direction similarity degree. In some embodiments, the processing device 112 may determine an average value of the plurality of cosine similarity degrees as the movement direction similarity degree. More descriptions for determining the movement direction similarity degree may be found elsewhere in the present disclosure (e.g., FIG. 10 and the description thereof) .

The location similarity degree may be associated with a distance between geographical locations associated with the first trajectory and the second trajectory. For example, the location similarity degree may be associated with a distance between geographical locations of a starting point of the first trajectory and an ending point of the second trajectory. As another example, the location similarity degree may be associated with a distance between geographical locations of an ending point of the first trajectory and a starting point of the second trajectory. As still another example, the location similarity degree may be associated with a distance between geographical locations of an average point of the first trajectory and an average point of the second trajectory. The average point of a trajectory may be determined based on geographical locations of a plurality of points of the trajectory. For example, coordinates of the average point may be an average value of coordinates of the plurality of points. In some embodiments, the processing device 112 may determine the location similarity degree based on a geographical distance between one point (e.g., a middle point) of the plurality of points of the first trajectory and one point (e.g., a middle point) of the plurality of points of the second trajectory.

In some embodiments, the first trajectory may include a first location sequence including a plurality of first geographical locations of points (e.g., the average points for each slice) associated with the first trajectory determined as described in operation 804. The second trajectory may include a second location sequence including a plurality of second geographical locations of points (e.g., the average points for each slice) associated with the first trajectory determined as described in operation 804. The processing device 112 may determine a geographical distance between each of the plurality of first geographical locations and each of the plurality of second geographical locations to obtain a plurality of geographical distances (i.e., a geographical distance sequence) . Each of the plurality of geographical distances may correspond to a pair of a first geographical location and a second geographical location. For example, if the first location sequence includes 2 first locations, including p1 and p2, the second location sequence includes 2 second locations, including q1 and q2, a pair of a first location and a second location may be one of (p1, q1) , (p1, q2) , (p2, q1) , and (p2, q2) . In some embodiments, the processing device 112 may determine a maximum value (i.e., the minimum distance) among the plurality of geographical distances as the location similarity degree. In some embodiments, the processing device 112 may determine an average value of the plurality of geographical distances as the location similarity degree. More descriptions for determining the location similarity degree may be found elsewhere in the present disclosure (e.g., FIG. 11 and the description thereof) .

The time similarity degree may refer to a similarity degree between a time period of the first trajectory and a time period of the second trajectory. The time period of a trajectory may be defined by a starting time and an ending time of the trajectory. The time similarity degree may be determined based on a Jaccard coefficient, also referred to as an intersection over union (IOU) . The greater the Jaccard coefficient is, the greater the time similarity degree may be. For example, the first trajectory may include a first starting time s ₀, and a first ending time s ₁, and a first time period of the first trajectory may be denoted as t ₁= (s ₀, s ₁) . The second trajectory may include a second starting time s ₀′ and a second ending time s ₁′, and a second time period of the second trajectory may be denoted as denoted as t ₂= (s ₀′, s ₁′) . The Jaccard coefficient between the time information of the first trajectory and the second trajectory may be determined according to Equation (6) as follows:

In some embodiments, due to noises in image collection transmission, one or more images may be skipped within some image collections, time fluctuations may be allowed. For the first time period and/or the second time period may be broadened to enhance fault tolerance. For example, the first time period t ₁= (s ₀, s ₁) may be broadened to be as t ₁₊= (s ₀-δ, s ₁+δ) , and/or the second time period t ₂= (s ₀′, s ₁′) may be broadened to be as t ₂₊= (s ₀′-δ, s ₁′+δ) . The Jaccard coefficient between the time information of the first trajectory and the second trajectory may be determined according to Equation (7) as follows:

In 903, the processing device 112 (e.g., the similarity determination unit 472) (e.g., the processing circuits of the processor 201) may determine a trajectory accessibility based on the one or more trajectory features. The trajectory accessibility may indicate a probability that the first trajectory of the first object can access the second trajectory of the second object. The trajectory accessibility between two trajectories may refer to accessibility from one trajectory to the other. In other words, the trajectory accessibility between two trajectories may refer to a probability that an object (e.g., a vehicle) can travel from an ending location of one trajectory to a starting location of another trajectory within a period of time. The trajectory accessibility between the first trajectory and the second trajectory may be determined based on the movement similarity degree, the location similarity degree, a motion vector between starting locations of the first trajectory and the second trajectory, an average motion vector of the first trajectory and/or the second trajectory, etc. For example, if the processing device 112 determines that the location similarity degree is less than a threshold or a distance (e.g., the minimum distance) between the first trajectory and the second trajectory exceeds a distance threshold, the processing device 112 may determine that the trajectory accessibility between the first trajectory and the second trajectory may be equal to 0, which means that the first trajectory and the second trajectory have some overlapping parts. If the processing device 112 determines that the location similarity degree exceeds the threshold or the distance (e.g., the minimum distance) between the first trajectory and the second trajectory is less than the distance threshold, and the movement direction similarity degree is less than a similarity threshold or a similarity degree between the motion vector between starting locations of the first trajectory and the second trajectory and an average motion vector of the first trajectory and/or the second trajectory is less than the similarity threshold, the processing device 112 may determine that the trajectory accessibility between the first trajectory and the second trajectory may be equal to -1, which means that the first trajectory and the second trajectory have no trajectory accessibility. If the processing device 112 determines that the movement direction similarity degree exceeds the similarity threshold or the similarity degree between the motion vector between starting locations of the first trajectory and the second trajectory and an average motion vector of the first trajectory and/or the second trajectory exceeds the similarity threshold, the processing device 112 may determine that the trajectory accessibility between the first trajectory and the second trajectory may be equal to 1, which means the first trajectory and the second trajectory have trajectory accessibility. For example, the trajectory accessibility may be determined according to Equation (8) as follows:

where, Dis _min refers to the minimum distance, λ _dis refers to the distance threshold, Sim _max refers to the maximum similarity, i.e., the movement direction similarity degree, λ _sim refers to the similarity threshold, V _p0, q0 refers to is the motion vector between starting locations of the first trajectory and the second trajectory, V _mean refers to the average motion vector of the first trajectory and/or the second trajectory, and S (V _p0, q0, V _mean) refers to a similarity degree between V _p0, q0, and V _mean.

In 904, the processing device 112 (e.g., the matching unit 473) (e.g., the processing circuits of the processor 201) may determine whether the first object is matched with the second object based on the first similarity degree, the second similarity degree and the trajectory accessibility. In some embodiments, the processing device 112 may determine that the first object is matched with the second object in response to determine that the first similarity degree satisfies a first condition, the second similarity degree satisfies a second condition, and the trajectory accessibility satisfies a third condition. For example, the processing device 112 may determine that the first object is matched with the second object in response to determine that the first similarity degree exceeds a first threshold, the second similarity degree exceeds a second threshold, and the trajectory accessibility is equal to 1 or 0. As another example, the processing device 112 may determine that the first object is matched with the second object in response to determine that the first similarity degree exceeds the first threshold, at least one of the movement direction similarity degree, the location similarity degree, or the time similarity degree exceeds a second threshold, and the trajectory accessibility is equal to 1 or 0. As still another example, the processing device 112 may determine that the first object is matched with the second object in response to determine that the first similarity degree exceeds the first threshold, the movement direction similarity degree (e.g., a maximum similarity degree) exceeds a third threshold, the minimum distance is smaller than a distance threshold, and the time similarity degree exceeds a time threshold, and/or the trajectory accessibility is equal to 1 or 0. In some embodiments, the first condition and/or the second condition may be adjustable according to different imaging scenes. For example, for an arterial road scene, the first threshold may be greater than that for a crossroad scene. For a crossroad scene, the second threshold may be greater than that for an arterial road scene.

It should be noted that the above description of the process 900 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the process 900 may be accomplished with one or more additional operations not described and/or without one or more of the operations herein discussed. Additionally, the order of the process 900 may not be intended to be limiting. For example,

operations

901, 902, and/or 903 may be integrated into a single operation. Operation 904 may be omitted.

FIG. 10 is a flowchart illustrating an exemplary process for determining a similarity degree between two trajectories according to some embodiments of the present disclosure. In some embodiments, process 1000 may be executed by the object tracking system 100. For example, the process 1000 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 140, the storage 203, and the storage 390) . In some embodiments, the processing device 112 (e.g., the processor 201 of the computing device 200, the CPU 340 of the mobile device 300, and/or one or more modules illustrated in FIG. 4) may execute the set of instructions and may accordingly be directed to perform the process 1000.

In 1001, the processing device 112 (e.g., the feature extraction unit 471) (e.g., the processing circuits of the processor 201) may divide a first trajectory into multiple first slices. The first trajectory may be determined by tracking a first object from a plurality of first images acquired by a first camera according to process 500 and/or at least a portion of process 800 (e.g., operations 801-803) . The first trajectory may include a plurality of first points. Each of the plurality of first points may denote a geographical location of the first object at a certain time. Each of the plurality of first points and the corresponding geographical location of the first object may correspond to one of the plurality of first images. Each of the multiple first slices may include several consecutive first points among the plurality of first points. Each of the multiple first slices may correspond to a first length. As used herein, the length of a slice may refer to a count or number of points in the slice. The first length of each of the multiple first slices may be the same or different. In some embodiments, the processing device 112 may divide the first trajectory into the multiple first slices based on a count or number of the plurality of first points (i.e., the total length of the first trajectory) and/or the total number of the multiple slices. For example, the processing device 112 may divide the first trajectory into the multiple first slices each of which includes a same count of first points (i.e., same length) .

In 1002, the processing device 112 (e.g., the feature extraction unit 471) (e.g., the processing circuits of the processor 201) may divide a second trajectory into multiple first slices. The second trajectory may be determined by tracking a second object from a plurality of second images acquired by a second camera according to process 500 and/or at least a portion of process 800 (e.g., operations 801-803) . The second trajectory may include a plurality of second points. Each of the plurality of second points may denote a geographical location of the second object at a certain time. Each of the plurality of second points and the corresponding geographical location of the second object may correspond to one of the plurality of second images. Each of the multiple second slices may include several consecutive second points among the plurality of second points. The several consecutive second points of each of the multiple second slices may be the same or different. Each of the multiple second slices may correspond to a second length. The second length of each of the multiple second slices may be the same or different. In some embodiments, the processing device 112 may divide the second trajectory into the multiple second slices based on a count or number of the plurality of second points (i.e., the total length of the second trajectory) For example, the processing device 112 may divide the second trajectory into the multiple second slices each of which includes a same count of second points.

In 1003, the processing device 112 (e.g., the feature extraction unit 471) (e.g., the processing circuits of the processor 201) may determine a first average point for each of the multiple first slices. In some embodiments, the first average point for a first slice may denote an average geographical location of geographical locations of several first points in the first slice. For example, a geographical location of a first point may be denoted by geographical coordinates (e.g., the longitude and latitude) of the first point. The average geographical location of geographical locations of several first points may be denoted by average geographical coordinates (e.g., an average longitude and an average latitude) of several first points. As a further example, the first average point for a first slice may be determined according to Equation (9) as follows:

where, p _j refers to an average point of the j-th slice, l refers to a length of each slice, lon _i refers to the longitude of each point of the j-th slice, and lon _i refers to the latitude of each point of the j-th slice. l may be a default setting of the object tracking system 100. For example, l may be equal to 10, 20, 30, etc.

In 1004, the processing device 112 (e.g., the feature extraction unit 431) (e.g., the processing circuits of the processor 201) may determine a second average point for each of the multiple second slices. The second average point for each of the multiple second slices may be determined as similar to the first average point for each of the multiple first slices. For example, the second average point for each of the multiple second slices may be determined according to Equation (9) .

In 1005, the processing device 112 (e.g., the feature extraction unit 431) (e.g., the processing circuits of the processor 201) may determine, based on first average points corresponding to any two adjacent slices among the multiple first slices, one or more first movement directions. A first movement direction may be determined based on geographical coordinates (e.g., the longitude and latitude) of the two adjacent first average points. For example, the first movement direction may be determined according to Equation (10) as follows:

where, lon _j refers to the longitude of an average point of the j-th slice, and lon _j refers to the latitude of the average point of the j-th slice, lon _j+1 refers to the longitude of an average point of the (j+1) -th slice, and lon _j+1 refers to the latitude of average point of the (j+1) -th slice. In some embodiments, the one or more first movement directions may be denoted as a first movement sequence.

In 1006, the processing device 112 (e.g., the feature extraction unit 471) (e.g., the processing circuits of the processor 201) may determine, based on second average points corresponding to any two adjacent slices among the multiple second slices, one or more second movement directions. A second movement direction may be determined based on geographical coordinates (e.g., the longitude and latitude) of the two adjacent second average points. For example, the second movement direction may be determined according to Equation (10) . In some embodiments, the one or more second movement directions may be denoted as a second movement sequence.

In 1007, the processing device 112 (e.g., the similarity determination unit 472) (e.g., the processing circuits of the processor 201) may determine a similarity degree between each of the one or more first movement directions and each of the one or more second movement directions to obtain one or more similarity degrees associated with the first trajectory and the second trajectory. Each of the one or more similarity degrees may be determined based on two movement directions. Each of the two movement directions may include one first movement direction and one second movement direction. For example, the processing device 112 may determine one or more pairs of movement directions. Each pair of the one or more pairs of movement directions may include a first movement direction and a second movement direction. The processing device 112 may determine a similarity degree between each pair of the one or more pairs of movement directions to obtain the one or more similarity degrees.

The similarity degree between one first movement direction and one second movement direction may be denoted by a cosine similarity between the one first movement direction and the one second movement direction. The greater the cosine similarity between one first movement direction and one second movement direction, the greater the similarity degree between the one first movement direction and the one second movement direction may be. The one or more similarity degrees between the first trajectory and the second trajectory may be denoted as a similarity sequence. In some embodiments, if the first trajectory includes s _p first slices and the second trajectory includes s _q second slices, the first movement directions of the first trajectory may be denoted as a first motion vector sequence

i=1, 2, ..., s _p-1 determined according to Equation (10) , and the second movement directions of the second trajectory may be denoted as a second motion vector sequence

j=1, 2, ..., s _q-1 determined according to Equation (10) . The similarity sequence of the first trajectory and the second trajectory may be determined according to Equation (11) as follows:

Where,

and

In 1008, the processing device 112 (e.g., the similarity determination unit 432) (e.g., the processing circuits of the processor 201) may designate a maximum similarity degree among the one or more similarity degrees as a second similarity degree between at least one of one or more trajectory features of the first trajectory and the second trajectory.

It should be noted that the above description of the process 1000 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the process 1000 may be accomplished with one or more additional operations not described and/or without one or more of the operations herein discussed. Additionally, the order of the process 1000 may not be intended to be limiting. For example,

operations

1001 and 1002 may be integrated into a single operation. Operation1004 may be omitted.

FIG. 11 is a flowchart illustrating an exemplary process for determining a similarity degree between two trajectories according to some embodiments of the present disclosure. In some embodiments, process 1100 may be executed by the object tracking system 100. For example, the process 1100 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 140, the storage 203, and the storage 390) . In some embodiments, the processing device 112 (e.g., the processor 201 of the computing device 200, the CPU 340 of the mobile device 300, and/or one or more modules illustrated in FIG. 4) may execute the set of instructions and may accordingly be directed to perform the process 1100. In some embodiments, one or more operations of the process 900 may be performed to achieve at least part of operation 902 as described in connection with FIG. 11.

In 1101, the processing device 112 (e.g., the feature extraction unit 471) (e.g., the processing circuits of the processor 201) may divide a first trajectory into multiple first slices.

In 1102, the processing device 112 (e.g., the feature extraction unit 471) (e.g., the processing circuits of the processor 201) may divide a second trajectory into multiple first slices.

In 1103, the processing device 112 (e.g., the feature extraction unit 471) (e.g., the processing circuits of the processor 201) may determine a first average point for each of the multiple first slices.

In 1104, the processing device 112 (e.g., the feature extraction unit 471) (e.g., the processing circuits of the processor 201) may determine a second average point for each of the multiple second slices. Operations 1101-1104 may be performed as described in connection with operations 1001-1004, respectively.

In 1105, the processing device 112 (e.g., the similarity determination unit 472 ) (e.g., the processing circuits of the processor 201) may determine a geographical distance between each first average point and each second average point to obtain one or more geographical distances associated with the first trajectory and the second trajectory. For example, the first trajectory may include one or more first average points, each of which corresponds to one of the multiple first slices. The second trajectory may include one or more second average points, each of which corresponds to one of the multiple second slices. The first average points may be denoted as p _p= (lon _p1, lat _p1) , (lon _p2, lat _p2) , ..., (lon _pm, lat _pm) . The second average points may be denoted as p _q= (lon _q1, lat _q1) , (lon _q2, lat _q2) , ..., (lon _qn, lat _qn) . The processing device 112 may determine the distance between each of the first average points and each of the second average points. For example, the processing device 112 may determine one or more pairs of average points, for example, { (lon _p1, lat _p1) , (lon _q1, lat _q1) } , { (lon _p1, lat _p1) , (lon _q2, lat _q2) } , ... { (lon _p1, lat _p1) , (lon _qn, lat _qn) } , ... { (lon _p2, lat _p2) , (lon _q1, lat _q1) } , ... { (lon _pm, lat _pm) , (lon _qn, lat _qn) } . Each pair of the one or more pairs of average points may include a first average point and a second average point. The processing device 112 may determine a geographical distance between each pair of the one or more pairs of average points to obtain a distance sequence including the one or more geographical distances associated with the first trajectory and the second trajectory. The distance sequence may be determined according to Equation (12) as follows:

D _p, q= {|| (lon _pi-lon _pj) , (lat _pi-lat _pj) || ₂} (12) ,

Where i from 0 to m, and j from 0 to n. In some embodiments, the geographical distance between each first average point and each second average point may be also referred to as a Euclidean distance between the each first average point and each second average point.

In 1106, the processing device 112 (e.g., the similarity determination unit 472 ) (e.g., the processing circuits of the processor 201) may determine a second similarity degree between at least one of one or more trajectory features of the first trajectory and the second trajectory based on a minimum geographical distance among the one or more geographical distances. For example, the greater the minimum geographical distance among the one or more geographical distances is, the smaller the second similarity degree between at least one of one or more trajectory features of the first trajectory and the second trajectory may be.

It should be noted that the above description of the process 1100 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the process 1100 may be accomplished with one or more additional operations not described and/or without one or more of the operations herein discussed. Additionally, the order of the process 1100 may not be intended to be limiting. For example,

operations

1101 and 1102 may be integrated into a single operation. Operation 1108 may be omitted.

FIG. 12A and 12B are diagrams showing speed changes of a vehicle according to some embodiments of the present disclosure. FIG. 12A shows speeds of a vehicle determined based on a trajectory of the vehicle determined as described in FIG. 8. As shown in FIG. 12A, the speeds of the vehicle determined based on the trajectory of the vehicle changes abruptly for example, at timestamps corresponding to 40-120 frames in a frame sequence (i.e., an image collection) , which means the trajectory of the vehicle determined by tracking the vehicle from the frame sequence has multiple outliers. However, the speed should be stable between adjacent points on the trajectory of the vehicle. FIG. 12B shows speeds of the vehicle determined based on a smoothed trajectory of the vehicle determined as described in FIG. 8 by performing a smoothing operation on the trajectory illustrated in FIG. 12A. As shown in FIG. 12B, the speeds of the vehicle determined based on the trajectory of the vehicle are stable.

FIG. 13A are diagrams showing a vehicle being tracked in across multiple cameras according to some embodiments of the present disclosure. As shown in images 1 to 4 in FIG. 13A, a vehicle marked by bounding boxes is recorded by four cameras. The vehicle recorded in different images or cameras may have different appearance features, which may bring challenges to the cross-camera vehicles tracking.

FIG. 13B are diagrams showing trajectories of the vehicle as illustrated in FIG. 13A tracked in across multiple cameras according to some embodiments of the present disclosure. The vehicle detected in across multiple cameras is tracked based on image features and trajectory features as described elsewhere in the present disclosure (e.g., FIGs. 8 and 9) . Images 1 to 4 in FIG. 13B show the trajectory determined by tracking the vehicle from each of the multiple cameras as shown in Images 1 to 4 in FIG. 13A. Image 5 in FIG. 13B show the trajectory in each of Images 1 to 4 in FIG. 13B are matched according to process 800 and process 900.

Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure and are within the spirit and scope of the exemplary embodiments of this disclosure.

Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment, ” “an embodiment, ” and/or “some embodiments” mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment” or “one embodiment” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.

Further, it will be appreciated by one skilled in the art, aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or context including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely hardware, entirely software (including firmware, resident software, micro-code, etc. ) or combining software and hardware implementation that may all generally be referred to herein as a “unit, ” “module, ” or “system. ” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, or the like, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB. NET, Python or the like, conventional procedural programming languages, such as the “C” programming language, Visual Basic, Fortran 2103, Perl, COBOL 2102, PHP, ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user’s computer, partly on the user’s computer, as a stand-alone software package, partly on the user’s computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user’s computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS) .

Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations, therefore, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software only solution, for example, an installation on an existing server or mobile device.

Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the various inventive embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, inventive embodiments lie in less than all features of a single foregoing disclosed embodiment.

In some embodiments, the numbers expressing quantities or properties used to describe and claim certain embodiments of the application are to be understood as being modified in some instances by the term “about, ” “approximate, ” or “substantially. ” For example, “about, ” “approximate, ” or “substantially” may indicate a certain variation (e.g., ±1%, ±5%, ±10%, or ±20%) of the value it describes, unless otherwise stated. Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that may vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the application are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable.

Each of the patents, patent applications, publications of patent applications, and other material, such as articles, books, specifications, publications, documents, things, and/or the like, referenced herein is hereby incorporated herein by this reference in its entirety for all purposes, excepting any prosecution file history associated with same, any of same that is inconsistent with or in conflict with the present document, or any of same that may have a limiting affect as to the broadest scope of the claims now or later associated with the present document. By way of example, should there be any inconsistency or conflict between the description, definition, and/or the use of a term associated with any of the incorporated material and that associated with the present document, the description, definition, and/or the use of the term in the present document shall prevail.

In closing, it is to be understood that the embodiments of the application disclosed herein are illustrative of the principles of the embodiments of the application. Other modifications that may be employed may be within the scope of the application. Thus, by way of example, but not of limitation, alternative configurations of the embodiments of the application may be utilized in accordance with the teachings herein. Accordingly, embodiments of the present application are not limited to that precisely as shown and described.

Claims

A system for object tracking across multiple cameras, comprising:

at least one storage medium including a set of instructions;

at least one processor in communication with the at least one storage medium, wherein when executing the set of instructions, the at least one processor is directed to cause the system to perform operations including:

obtaining a plurality of image collections captured by multiple cameras;

for each of the plurality of image collections, detecting one or more objects from each of at least a portion of a plurality of images included in the each of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects;

tracking the at least one of the one or more objects from each of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects;

determining one or more trajectory features of the trajectory of the at least one of the one or more objects; and

matching, based on the one or more trajectory features of the trajectory and the one or more image features associated with the at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections.
The system of claim 1, wherein the one or more trajectory features include at least one of:

movement direction information associated with the trajectory of the at least one of the one or more objects,

geographical location information associated with the trajectory, or

time information associated with the trajectory.
The system of claim 1 or claim 2, wherein the trajectory includes a plurality of points, each of which denoting a geographical location of the at least one of the one or more objects corresponding to one of the plurality of images, and to determine one or more trajectory features of the trajectory of at least one of the one or more objects, the at least one processor is directed to cause the system to perform operations including:

dividing the trajectory into multiple slices, each of the multiple slices including several points corresponding to several images of the plurality of images;

determining, based on the several points of each of the multiple slices, an average point for each of the multiple slices; and

determining, based on the average point for each of the multiple slices, the one or more trajectory features.
The system of claim 3, wherein to determine, based on the average point for each of the multiple slices, the one or more trajectory features, the at least one processor is directed to cause the system to perform operations including:

designating a geographical location of the average point for each of the multiple slices as one of the one or more trajectory features.
The system of claim 3 or claim 4, wherein to determine, based on the average point for each of the multiple slices, the one or more trajectory features, the at least one processor is directed to cause the system to perform operations including:

determining, based on average points corresponding to any two adjacent slices among the multiple slices, multiple movement directions; and

designating the multiple movement directions as one of the one or more trajectory features.
The system of any one of claims 1 to 5, wherein to match, based on the one or more trajectory features of the trajectory and the one or more image features associated with at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections, the at least one processor is directed to cause the system to perform operations including:

determining a first similarity degree between the one or more image features associated with the first object and the second object, respectively;

determining a second similarity degree between at least one of the one or more trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively; and

determining whether the first object is matched with the second object at least in part based on at least one of the first similarity degree or the second similarity degree.
The system of claim 6, wherein to determine a second similarity degree between at least one of the trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively, the at least one processor is directed to cause the system to perform operations including:

determining first movement direction information associated with the first trajectory, the first movement direction information including one or more first movement directions of the first trajectory;

determining second movement direction information associated with the second trajectory, the second movement direction information including one or more second movement directions of the second trajectory;

determining a similarity degree between each of the one or more first movement directions and each of the one or more second movement directions to obtain one or more similarity degrees associated with the first trajectory and the second trajectory; and

designating a maximum similarity degree among the one or more similarity degrees as the second similarity degree between at least one of the one or more trajectory features of the first trajectory and the second trajectory, respectively.
The system of claim 6 or claim 7, wherein to determine a second similarity degree between at least one of the trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively, the at least one processor is directed to cause the system to perform operations including:

determining first geographical location information associated with the first trajectory, the first geographical location information including one or more first geographical locations on the first trajectory;

determining second geographical location information associated with the second trajectory, the second geographical location information including one or more second geographical locations on the second trajectory;

determining a geographical distance between each of the one or more first geographical locations and each of the one or more second geographical locations to obtain one or more geographical distances associated with the first trajectory and the second trajectory; and

determining the second similarity degree between at least one of the one or more trajectory features of the first trajectory and the second trajectory, respectively, based on a minimum geographical distance among the one or more geographical distances.
The system of any one of claims 6 to 8, wherein to determine a second similarity degree between at least one of the trajectory features of a first trajectory of the first object and a second trajectory of the second object, respectively, the at least one processor is directed to cause the system to perform operations including:

determining first time information associated with the first trajectory, the first time information including a first time period of the first trajectory;

determining second time information associated with the second trajectory, the second time information including a second time period of the second trajectory; and

determining, based on an intersection over union between the first time period and the second time period, the second similarity degree between at least one of the one or more trajectory features of the first trajectory and the second trajectory, respectively.
The system of any one of claims 1 to 9, wherein to determine whether the first object is matched with the second object at least in part based on at least one of the first similarity degree or the second similarity degree, the at least one processor is directed to cause the system to perform operations including:

determining a trajectory accessibility based on the one or more trajectory features, the trajectory accessibility indicating a probability that the first trajectory of the first object can access the second trajectory of the second object; and

determining whether the first object is matched with the second object at least in part based on the trajectory accessibility.
The system of any one of claims 6 to 10, wherein to determining whether the first object is matched with the second object at least in part based on the first similarity and the second similarity, the at least one processor is directed to cause the system to perform operations including:

determining that the first object is matched with the second object in response to a determination including at least one of the first similarity degree satisfying a first condition or the second similarity degree satisfying a second condition.
The system of claim 11, wherein at least one of the first condition or the second condition is adjustable according to a scene captured by at least one of the multiple cameras.
The system of any one of claims 6 to 12, wherein the at least one processor is directed to cause the system to perform operations including:

unifying the first trajectory and the second trajectory to determine a target trajectory of the first object or the second object in response to a matching of the first object and the second object.
The system of any one of claims 1 to 13, wherein to determine one or more trajectory features of the trajectory of at least one of the one or more objects, the at least one processor is directed to cause the system to perform operations including:

smoothing the trajectory of the at least one of the one or more objects to obtain a smoothed trajectory of the at least one of the one or more objects; and

determining, based on the smoothed trajectory of the at least one of the one or more objects, the one or more trajectory features.
A system for object tracking, comprising:

at least one storage medium including a set of instructions;

at least one processor in communication with the at least one storage medium, wherein when executing the set of instructions, the at least one processor is directed to cause the system to perform operations including:

obtaining an image collection collected by a camera;

detecting one or more objects from each of a plurality of images included in the image collection to extract one or more image features associated with at least one of the one or more objects using an object detection model;

determining a geographical location of at least one of the one or more objects corresponding to each of the plurality of images; and

matching, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images.
The system of claim 15, wherein to determine a geographical location of at least one of the one or more objects corresponding to each of the at least a portion of the plurality of images, the at least one processor is directed to cause the system to perform operations including:

locating the one or more objects in each of the at least a portion of the plurality of images using the object detection model; and

determining, based on a location of at least one of the one or more objects in the each of the at least a portion of the plurality of images, the geographical location of at least one of the one or more objects.
The system of claim 15 or claim 16, wherein to match, based on the one or more image features associated with at least one of the one or more objects and the geographical location of at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images, the at least one processor is directed to cause the system to perform operations including:

for each two adjacent images among at least a portion of the plurality of images,

determining a first similarity degree between the one or more image features associated with the first object and the second object presented in the two adjacent images, respectively;

determining a second similarity degree between geographical locations of the first object and the second object, respectively; and

determining whether the first object is matched with the second object at least in part based on at least one of the first similarity degree or the second similarity degree.
The system of claim 15 or claim 16, wherein to match, based on the one or more image features associated with at least one of the one or more objects and the geographical location of at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images, the at least one processor is directed to cause the system to perform operations including:

for each two adjacent images among at least a portion of the plurality of images,

determining a first similarity degree between the one or more image features associated with each pair of a plurality of pairs of objects, each pair of the plurality of pairs of objects including one of the one or more objects detected in one of the two adjacent images and one of the one or more objects detected in another one of the two adjacent images;

determining a second similarity degree between the geographical locations of each pair of the plurality of pairs of objects;

determining, based on the first similarity degree and the second similarity degree, a specific pair of objects from the plurality of pairs of objects as the matched first object and the second subject using a Hungary algorithm.
The system of claim 17, wherein to match, based on the one or more image features associated with at least one of the one or more objects and the geographical location of at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images, the at least one processor is directed to cause the system to perform operations including:

for each two adjacent images among at least a portion of the plurality of images,

determining a first distance between the one or more image features associated with each pair of a plurality of pairs of objects, each pair of the plurality of pairs of objects including one of the one or more objects detected in one of the two adjacent images and one of the one or more objects detected in another one of the two adjacent images;

determining a second distance between the geographical locations of each pair of the plurality of pairs of objects;

determining a specific pair of objects from the plurality of pairs of objects as the matched first object and the second subject using a Hungary algorithm.
The system of claim 17, wherein to match, based on the one or more image features associated with at least one of the one or more objects and the geographical location of at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images, the at least one processor is directed to cause the system to perform operations including:

for each two adjacent images among at least a portion of the plurality of images,

determining a loss function including a first component configured to determine a first distance between one or more image features associated with each pair of a plurality of pairs of objects and a second component configured to determine a second distance between the geographical locations of each pair of the plurality of pairs of objects, each pair of the plurality of pairs of objects including one of the one or more objects detected in one of the two adjacent images and one of the one or more objects detected in another one of the two adjacent images; and

determining one or more specific pairs of objects presented in the two adjacent images in response to determine that a value of the loss function corresponding to the one or more specific pairs of objects is minimum, one of the one or more specific pairs of objects being the matched first object and the second object.
The system of any one of claims 15 to 20, wherein the at least one processor is directed to cause the system to perform additional operations including:

determining a first track of the first object or the second object in response to match the first object with the second object;

obtaining a second track of a candidate object;

determining whether the second track belongs the first object or the second object based on at least one of trajectory features between the first track and the second track, image features associated the first object or the second object and the candidate object, or time information associated with the first track and the second track.
The system of claim 21, wherein the first track is detected from a first portion of the plurality of images, and the second track is detected from a second portion of the plurality of images, to determine whether the second track is belonged to the first object or the second object, the at least one processor is directed to cause the system to perform operations including:

for any two images in the first portion and the second portion, respectively, determining a distance between image features associated the first object or the second object and the candidate object presented in the any two images in the first portion and the second portion, respectively, to obtain a plurality of distances; and

determining a minimum distance among the plurality of distances; and

determine, at least in part based on the minimum distance, whether the second track belongs the first object or the second object.
The system of claim 21 or claim 22, wherein the time information associated with the first track and the second track includes at least one of:

a time interval between the first track and the second track or a time period of each of the first track and the second track.
The system of any one of claims 21 to 23, wherein the at least one processor is directed to cause the system to perform operations including:

unifying the first track and the second track to determine a trajectory of the first object or the second object in response to determine that the second track belongs the first object or the second object.
A method for object tracking across multiple cameras implemented on a computing apparatus, the computing apparatus including at least one processor and at least one storage device, the method comprising:

obtaining a plurality of image collections captured by multiple cameras;

for each of the plurality of image collections, detecting one or more objects from each of at least a portion of a plurality of images included in the each of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects;

tracking the at least one of the one or more objects from each of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects;

determining one or more trajectory features of the trajectory of the at least one of the one or more objects; and

matching, based on the one or more trajectory features of the trajectory and the one or more image features associated with the at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections.
A non-transitory computer-readable medium storing at least one set of instructions, wherein when executed by at least one processor, the at least one set of instructions directs the at least one processor to perform acts of:

obtaining a plurality of image collections captured by multiple cameras;

for each of the plurality of image collections, detecting one or more objects from each of at least a portion of a plurality of images included in the each of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects;

tracking the at least one of the one or more objects from each of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects;

determining one or more trajectory features of the trajectory of the at least one of the one or more objects; and

matching, based on the one or more trajectory features of the trajectory and the one or more image features associated with the at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections.
A method for object tracking implemented on a computing apparatus, the computing apparatus including at least one processor and at least one storage device, the method comprising:

obtaining an image collection collected by a camera;

detecting one or more objects from each of a plurality of images included in the image collection to extract one or more image features associated with at least one of the one or more objects using an object detection model;

determining a geographical location of at least one of the one or more objects corresponding to each of the plurality of images; and

matching, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images.
A non-transitory computer-readable medium storing at least one set of instructions, wherein when executed by at least one processor, the at least one set of instructions directs the at least one processor to perform acts of:

obtaining an image collection collected by a camera;

detecting one or more objects from each of a plurality of images included in the image collection to extract one or more image features associated with at least one of the one or more objects using an object detection model;

determining a geographical location of at least one of the one or more objects corresponding to each of the plurality of images; and

matching, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images.
A system for object tracking across multiple cameras, comprising:

an obtaining module configured to obtain an image collection collected by a camera;

a detecting module configured to detect one or more objects from each of a plurality of images included in the image collection to extract one or more image features associated with at least one of the one or more objects using an object detection model

a determining module configured to determine a geographical location of at least one of the one or more objects corresponding to each of the plurality of images; and

a matching module configured to match, based on the one or more image features associated with the at least one of the one or more objects and the geographical location of the at least one of the one or more objects, a first object detected in one of the plurality of images with a second object detected in one or more other images of the plurality of images.
A system for object tracking, comprising:

an obtaining module configured to obtain a plurality of image collections captured by multiple cameras;

a single-camera tracking module configured to for each of the plurality of image collections, detect one or more objects from each of at least a portion of a plurality of images included in the each of the plurality of image collections to extract one or more image features associated with at least one of the one or more objects; track the at least one of the one or more objects from each of the plurality of image collections to obtain a trajectory of the at least one of the one or more objects; and

a multi-camera tracking module configured to determine one or more trajectory features of the trajectory of the at least one of the one or more objects; and matching, based on the one or more trajectory features of the trajectory and the one or more image features associated with the at least one of the one or more objects, a first object tracked in one of the plurality of image collections with a second object tracked in one or more other image collections of the plurality of image collections.