WO2020145882A1 - Object tracking systems and methods for tracking a target object - Google Patents

Object tracking systems and methods for tracking a target object

Info

Publication number
WO2020145882A1
Authority
WO
WIPO (PCT)
Prior art keywords
accompanying
target object
digital images
series
accompanying object
Prior art date
Application number
PCT/SG2019/050012
Other languages
French (fr)
Inventor
Bondan Setiawan
Yoriko Kazama
Hiroike Atsushi
Tamura Masato
Tarui Toshiaki
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd. filed Critical Hitachi, Ltd.
Priority to PCT/SG2019/050012
Publication of WO2020145882A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • Various embodiments relate to object tracking systems and methods for tracking objects.
  • a method for tracking a target object including: identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and comparing the extracted features with features in a further plurality of digital images.
  • an object tracking system including: an image identifier configured to retrieve from a database, a series of digital images that capture a target object, and further configured to identify an accompanying object from the series of digital images based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; a feature extractor configured to extract features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and a comparison processor configured to compare the extracted features with features in a further plurality of digital images.
  • a non-transitory computer readable medium including instructions, which when executed, performs a method for tracking a target object, the method including: identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and comparing the extracted features with features in a further plurality of digital images.
  • FIG. 1 illustrates a conceptual diagram of a surveillance system according to various embodiments
  • FIG. 2 illustrates a conceptual diagram of an object tracking system according to various embodiments.
  • FIG. 3 illustrates a schematic hardware diagram of a surveillance system according to various embodiments.
  • FIG. 4 illustrates a conceptual flow chart of an object tracking system in operation according to various embodiments.
  • FIG. 5 illustrates an example of an image table according to various embodiments.
  • FIG. 6 illustrates an example of an object table according to various embodiments.
  • FIG. 7 illustrates an example of an accompanying object table according to various embodiments.
  • FIG. 8 illustrates a diagram of a surveillance system according to various embodiments, being in operation.
  • FIGS. 9 to 14 illustrate example screens of a Graphical User Interface (GUI) of a surveillance system according to various embodiments.
  • GUI Graphical User Interface
  • FIG. 15 illustrates a conceptual diagram of a surveillance system according to various embodiments.
  • the system as described in this description may include a memory which is for example used in the processing carried out in the system.
  • a memory used in the embodiments may be a volatile memory, for example a DRAM (Dynamic Random Access Memory) or a non-volatile memory, for example a PROM (Programmable Read Only Memory), an EPROM (Erasable PROM), EEPROM (Electrically Erasable PROM), or a flash memory, e.g., a floating gate memory, a charge trapping memory, an MRAM (Magnetoresistive Random Access Memory) or a PCRAM (Phase Change Random Access Memory).
  • DRAM Dynamic Random Access Memory
  • PROM Programmable Read Only Memory
  • EPROM Erasable PROM
  • EEPROM Electrically Erasable PROM
  • flash memory e.g., a floating gate memory, a charge trapping memory, an MRAM (Magnetoresistive Random Access Memory) or a PCRAM (Phase Change Random Access Memory).
  • FIG. 1 illustrates a conceptual diagram of a surveillance system 100 according to various embodiments.
  • the surveillance system 100 may include a plurality of image capturing devices 150, a recorder 152, an object tracking system 160, a storage device 154 and a communication device 156.
  • the image capturing devices 150 may be cameras, including day cameras and/or infrared cameras. Each image capturing device 150 may be positioned at a respective location such that each image capturing device 150 may survey a substantially different area. The field of view of some image capturing devices 150 may partially overlap.
  • the recorder 152 may record the images or video captured by the image capturing devices 150 into the storage device 154, or a separate data storage device.
  • the object tracking system 160 may detect objects that are of interest to a user based on search commands input by the user.
  • the search commands may include information on the objects of interest, for example, images of the objects of interest, object type, last seen location and time, and descriptors such as color, size and shape.
  • the object tracking system 160 may also detect other objects that may be related to the objects of interest based on at least one criterion. Objects that are related to the objects of interest are referred herein as accompanying objects.
  • the object tracking system 160 may extract features of the objects of interest, as well as features of the accompanying objects, and store these extracted features in the storage device 154.
  • the extracted features may include visual characteristics of the objects, for example imagery features.
  • the extracted features may include information on the objects, for example, a time of appearance of the object, a size of the object, a description of the object.
  • the extracted features, e.g. the visual characteristics of the objects may be encoded in the form of matrices or vectors.
  • the extracted features, e.g. the information on the objects may be encoded in the form of a string of text.
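  • As a rough illustration of how such extracted features might be organised, the following Python sketch pairs a numeric feature vector with textual metadata for one detected object. All names (FeatureRecord, feature_vector, and so on) and values are illustrative assumptions and do not appear in the embodiments above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FeatureRecord:
    """Hypothetical container for the extracted features of one detected object."""
    object_id: str
    feature_vector: List[float]   # visual characteristics encoded as a vector (or flattened matrix)
    description: str              # textual information, e.g. time of appearance, size, colour

# Example: a record for a detected suitcase
record = FeatureRecord(
    object_id="obj-0001",
    feature_vector=[0.12, 0.87, 0.05, 0.33],   # placeholder values
    description="first seen 2019-01-04T10:15:00; approx. 60 cm; black suitcase",
)
print(record.object_id, len(record.feature_vector), record.description)
```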
  • the object tracking system 160 may search for the target object in surveillance videos captured by the image capturing devices, using the extracted features.
  • the object tracking system 160 may search for the target object in surveillance videos stored by the recorder 152, or in real-time surveillance videos as they are being streamed to the object tracking system 160.
  • the communication device 156 may enable the transfer of data between the object tracking system 160 and at least one of the recorder 152, the storage device 154 and the plurality of image capturing devices 150.
  • the communication device 156 may be a transceiver for wireless communication such as WiFi, Bluetooth, or proprietary radio channels.
  • the communication device 156 may also be a wired data connection.
  • FIG. 2 illustrates a conceptual diagram of an object tracking system 160 according to various embodiments.
  • the object tracking system 160 may be configured to execute a method for tracking an object. The method may include a registration process.
  • the registration process may include retrieving videos captured by the image capturing devices 150.
  • the videos may include a series of sequential images also referred herein as image frames.
  • the object tracking system 160 may include an image processor 252.
  • the image processor 252 may identify a target object in at least one of the videos using image recognition techniques.
  • the image processor 252 may also be referred herein as an image identifier.
  • the image processor 252 may identify the accompanying object(s) that is associated with the target object. The method of identifying the accompanying object will be described in subsequent paragraphs.
  • the object tracking system 160 may include a feature extraction module 258 that may extract features of the accompanying object from the surveillance footage, and encode the features to create a digital signature of the accompanying object.
  • the feature extraction module 258 may also be referred herein as a feature extractor.
  • the feature extraction module 258 may also do likewise for the target object, as well as a composite of the target object and the accompanying object.
  • the composite refers to an arrangement of the target object together with the accompanying object.
  • the features of the composite may significantly differ from the features of the individual target object and the accompanying object, for example, because the target object could be partially hidden by the accompanying object.
  • the feature extraction module 258 may also create digital signatures of the target object and the composite.
  • the feature extraction module 258 may transmit the digital signatures of the target object, the accompanying object and the composite to a digital signature database.
  • the digital signature database may be hosted on the storage device 154.
  • the digital signature database may also store an association tag that defines the accompanying object as being associated with the target object.
  • the method for tracking an object may further include a search process.
  • the search process may include receiving a query for the target object.
  • a user may enter the query via a graphical user interface on a client console, for example a computer or a mobile computing device.
  • the user may upload an image, for example a photograph, of the target object.
  • the user may select the target object from an image frame captured by one of the image capturing devices.
  • the image of the target object, either uploaded by the user or cropped from the image frame, may be referred herein as a key image.
  • the query may also include information on a location and/or time of search.
  • the client console may transmit the query to a search processor 254 of the object tracking system 160.
  • the search processor 254 may retrieve the video that corresponds to the location and/or time in the query.
  • the search processor 254 may also transmit the key image to the digital signature database.
  • the search processor 254 may extract the features of the key image and may compare the extracted features against the digital signatures stored in the digital signature database.
  • the search processor 254 may thereby identify the target object and read its association tag, to determine its accompanying object.
  • the search processor 254 may retrieve the digital signatures of at least one of the target object, the accompanying object, or the composite.
  • the search processor 254 may transfer the digital signatures to the image processor 252 which then performs image recognition on the retrieved surveillance footage.
  • the search processor 254 may estimate the location of the target object based on the location of the camera that captured the matching image.
  • the search processor 254 may also predict a trajectory of the accompanying object or the target object using a plurality of matching images, and refine the estimated location based on the predicted trajectory.
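  • A minimal sketch of how a trajectory prediction might refine the estimated location is shown below, assuming each matching image yields a timestamped (x, y) position for the object. The simple linear extrapolation and all names are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

def predict_next_position(track: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Linearly extrapolate the next (x, y) position from timestamped observations.

    Each observation is (t, x, y); at least two observations are required.
    """
    (t0, x0, y0), (t1, x1, y1) = track[-2], track[-1]
    dt = t1 - t0 or 1.0                      # avoid division by zero for identical timestamps
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt  # estimated velocity between the last two matches
    step = dt                                # predict one interval ahead
    return x1 + vx * step, y1 + vy * step

# Positions of the accompanying object derived from matching images (t, x, y)
observed = [(0.0, 10.0, 5.0), (1.0, 12.0, 6.0), (2.0, 14.0, 7.0)]
print(predict_next_position(observed))  # -> (16.0, 8.0)
```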
  • the search process may take place before the registration process.
  • the registration process may proceed after the object tracking system 160 receives the query.
  • the search processor 254 may search for the target object in videos captured by the image capturing devices 150. Based on the videos, the image processor 252 may identify the accompanying object(s). The image processor 252 may then extract features of the accompanying object(s) and the composite of the accompanying object(s) with the target object, and store these features in the digital signature database. The search processor 254 may initiate another round of search process, using the extracted features of the accompanying object(s) and the composite.
  • the image processor 252 may be configured to identify the accompanying object(s) of any target object based on at least one criterion.
  • the criteria may include a time duration that a candidate object appears in proximity of the target object.
  • the image processor 252 may compute the number of times that the candidate object appears in the same image as the target object, in other words, the quantity of image frames that capture both the candidate object and the target object.
  • the image processor 252 may compare the quantity against a quantity threshold. If the quantity exceeds the quantity threshold, the image processor 252 may determine the candidate object to be an accompanying object of the target object, or at least increase a probability count that the candidate object is an accompanying object of the target object.
  • the criteria may include a distance between the candidate object and the target object in each image frame that captures both the candidate object and the target object.
  • the image processor 252 may compare the distance between the candidate object and the target object against a distance threshold. If the distance is less than the distance threshold, the image processor 252 may increase the probability count that the candidate object is the accompanying object; conversely, if the distance exceeds the distance threshold, it may decrease the probability count.
  • the image processor 252 may combine the time duration criterion and the distance criterion, to determine if the candidate object is less than the distance threshold away from the target object for a time duration longer than a time threshold represented by the quantity threshold.
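  • For illustration, the combined time-duration and distance criteria could be evaluated roughly as in the sketch below, where a probability count grows for every frame in which the candidate lies within a distance threshold of the target object. The function name, thresholds and data layout are assumptions, not part of the embodiments.

```python
from typing import List, Optional, Tuple

Position = Optional[Tuple[float, float]]

def is_accompanying(
    frames: List[Tuple[Position, Position]],
    distance_threshold: float,
    quantity_threshold: int,
) -> bool:
    """Each element of `frames` holds the (x, y) centre of the target object and of the
    candidate object in one image frame, or None if the object is not captured."""
    probability_count = 0
    for target_pos, candidate_pos in frames:
        if target_pos is None or candidate_pos is None:
            continue                              # frame does not capture both objects
        dx = target_pos[0] - candidate_pos[0]
        dy = target_pos[1] - candidate_pos[1]
        if (dx * dx + dy * dy) ** 0.5 < distance_threshold:
            probability_count += 1                # close enough in this frame
        else:
            probability_count = max(0, probability_count - 1)
    return probability_count >= quantity_threshold

frames = [((0, 0), (1, 1)), ((1, 0), (2, 1)), ((2, 0), None), ((3, 0), (3, 1))]
print(is_accompanying(frames, distance_threshold=2.0, quantity_threshold=3))  # True
```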
  • the criteria may include whether the candidate object moves substantially together with the target object.
  • the image processor 252 may chart the trajectory of the candidate object and the target object.
  • the image processor 252 may disregard the candidate object, or decrease the probability count that the candidate object is the accompanying object, if the distance between the target object and the candidate object increases with successive image frames, thereby indicating that the candidate object is moving away from the target object.
  • the criteria may include whether the candidate object interacts with the target object according to a predefined movement.
  • the image processor 252 may use a movement-trained machine learning model, to determine if the candidate object is performing a certain movement relative to the target object. For example, if the candidate object is a person, whether the person is picking up the target object.
  • the criteria may include object categories of the target object and the candidate object.
  • the image processor 252 may recognize the object category of the target object and then determine the accompanying object category associated with the target object. For example, the image processor 252 may recognize the target object as luggage and determine that the accompanying object category is a trolley. The image processor 252 may determine the accompanying object category by looking it up in a database. The image processor 252 may also define the rest of the criteria based on the accompanying object category. For example, if the accompanying object category is a trolley, the interaction with the target object may be a movement of transporting the target object.
  • the appropriate distance threshold may also vary depending on the accompanying object category. For example, the distance threshold between a trolley and luggage that it carries may be much shorter, as compared to a person and a trolley.
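  • Such a category lookup might be modelled, for example, as a mapping from a target object category to its candidate accompanying object categories together with a per-pair distance threshold. The table contents and names below are hypothetical.

```python
# Hypothetical lookup table relating a target object category to the categories of
# objects that may accompany it, with a per-pair distance threshold (arbitrary units)
# used when evaluating the criteria.
ACCOMPANYING_CATEGORIES = {
    "luggage": {"trolley": 0.5, "person": 2.0},
    "bicycle": {"person": 1.5},
}

def candidate_categories(target_category: str) -> dict:
    """Return the accompanying object categories (and thresholds) for a target category."""
    return ACCOMPANYING_CATEGORIES.get(target_category, {})

print(candidate_categories("luggage"))  # {'trolley': 0.5, 'person': 2.0}
```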
  • the image processor 252 may further detect decoupling of the accompanying object from the target object when the abovementioned criteria are no longer met.
  • the image processor 252 may also determine that decoupling has occurred when the distance between the accompanying object and the target object is increasing. When the image processor 252 detects a decoupling, it may proceed to identify a replacement accompanying object for the target object.
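  • Decoupling could be detected, for instance, by checking whether the frame-to-frame distance between the target object and the accompanying object is strictly increasing over a recent window of frames, as in this hypothetical sketch.

```python
from typing import List

def is_decoupling(recent_distances: List[float], window: int = 3) -> bool:
    """Return True if the last `window` distances are strictly increasing,
    suggesting the accompanying object is moving away from the target object."""
    if len(recent_distances) < window:
        return False
    tail = recent_distances[-window:]
    return all(later > earlier for earlier, later in zip(tail, tail[1:]))

print(is_decoupling([1.0, 1.1, 0.9, 1.4, 2.0, 2.7]))  # True: 1.4 < 2.0 < 2.7
print(is_decoupling([1.0, 1.2, 1.1]))                 # False
```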
  • FIG. 3 illustrates a schematic hardware diagram 300 of a surveillance system 100 according to various embodiments.
  • the surveillance system 100 may include an object tracking system 310.
  • the object tracking system 310 may include a Central Processing Unit (CPU) 302, a network device 304, an input/output (I/O) interface 306 and a storage device 308.
  • the CPU 302 may be configured to perform image and video analysis.
  • the CPU 302 may alternatively be a Graphical Processing Unit (GPU).
  • the storage device 308 may be a logical unit capable of storing programs 322 and databases 324. The storage device 308 may also temporarily store processing data when the CPU 302 is running programs 322.
  • the storage device 308 may be an internal memory such as Random Access Memory (RAM), Solid State Drive (SSD), or Hard Disk Drive (HDD).
  • the storage device 308 may also be a partially separated physical storage system as Network Attached Storage (NAS), or Storage Array Network (SAN).
  • the program 322 may execute steps of the method for tracking objects.
  • the database 324 may store information, such as location, description, time relating to the query, the target object, and the accompanying object, as well as features extracted from the image and videos.
  • the network device 304 may send and receive data with external devices connected to the same network.
  • the network device 304 may be a wired Local Area Network (LAN) device, such as a connected Ethernet device, or a wireless network device, etc.
  • LAN Local Area Network
  • the I/O interface 306 may send and receive data with the input device 334 and the display 336.
  • the I/O interface 306 may be a serial or parallel data interface such as Universal Serial Bus (USB), or High Definition Multimedia Interface (HDMI).
  • the I/O interface 306 may also be a wireless connection such as Bluetooth, or Wireless LAN.
  • the object tracking system 310 may be connected via the network device 304 and the I/O interface 306 to the image capture device 332, the input device 334 and the display 336.
  • the image capture device 332 may provide image frames to the object tracking system 310.
  • the image capture device 332 may include cameras or video management system (VMS).
  • VMS video management system
  • the display 336 may be a Liquid Crystal Display (LCD), a Plasma Display, a Cathode Ray Tube (CRT) Display, or a projector display, etc.
  • the input device 334 may be a keyboard, a mouse, or a touch screen.
  • FIG. 4 illustrates a conceptual flow chart 400 of an object tracking system 310 in operation according to various embodiments.
  • the object tracking system 310 may receive videos.
  • the videos may be received from image capture devices 332.
  • the videos may include surveillance videos or images captured by a plurality of cameras at a corresponding plurality of locations.
  • the object tracking system 310 may receive the videos from the image capture device 332 through a network, a direct connection or wireless transmissions over data protocol such as Real Time Transport Protocol (RTSP).
  • RTSP Real Time Transport Protocol
  • the object tracking system 310 may extract every image frame from the received videos and pass the image frame to 404.
  • each image frame may be processed to detect objects, including the target object.
  • 404 may also include recognizing an object category of each detected object.
  • Detection of the target object may include utilizing an object detection neural network model trained to localize object positions within an image frame and to classify the target object.
  • the process of localizing and classifying the target object may include evaluating the image signature or features extracted from each image frame.
  • the image signature may be represented in an array of numbers.
  • the image signature of the detected object may be registered on a database 324.
  • other objects that were accompanying the target object also referred herein as accompanying objects, may be identified. More than one accompanying object may be identified.
  • the target object may appear together with more than one accompanying object at any time, or may be associated with different accompanying objects over time.
  • a first accompanying object may be a first person who delivers the target object to a second accompanying object which is another person.
  • 406 may include determining an accompanying object category associated with the target object, and identifying the accompanying object at least partially based on the determined accompanying object category. 406 may include looking up, in the database 324, a matching accompanying object category for the object category of the target object as recognized in 404. 406 may be triggered by the object detection process or may be periodically executed inside the object tracking system 310. The result of the identification of the accompanying objects, for example the image signature of the accompanying objects, may also be registered into the database 324. Thus, the database 324 may store information of the target objects and information of their corresponding accompanying objects. Whenever a user, for example a security officer, wants to perform an object search or tracking, the user may input a search command into the object tracking system 310 using an input device 334.
  • the object tracking system 310 may accept the search command.
  • the search command may include selecting a key image for the search or identification, where the key image portrays the target object to be searched.
  • the object tracking system 310 may present a Graphical User Interface (GUI).
  • GUI Graphical User Interface
  • the GUI may present an interface for the user to enter further information for the search.
  • the object tracking system 310 may generate a query based on the key image. In other words, the object tracking system 310 may initiate searching or tracking of an object that resembles the key image.
  • the query may be a combination of search time duration, camera selection and information on the accompanying object.
  • the query may relate the key image to its corresponding signature from the database 324, or it may also extract the signature from the selected key image.
  • the object tracking system 310 may compare the signature of the key image to image signatures stored in the database 324 to find a match, i.e. to identify the target object.
  • the object tracking system 310 may also retrieve information of the accompanying object based on the matching target object.
  • the image signature of the accompanying object may be used as a substitute for the target object, or used in combination with the initial search key image signature, for matching with images captured by the image capture device 332.
  • the matching process can be performed by calculating the similarity of object signatures, for example by calculating the vector distance between the key object signatures and the signatures of detected objects.
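  • As one illustrative possibility, the similarity calculation could use the Euclidean distance between signature vectors, as sketched below; a smaller distance means a closer match. The signatures and identifiers are placeholder values, not data from the embodiments.

```python
import math
from typing import List

def signature_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two object signatures of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

key_signature = [0.10, 0.80, 0.05]
detected_signatures = {
    "frame-17/obj-3": [0.12, 0.78, 0.06],
    "frame-21/obj-1": [0.90, 0.10, 0.40],
}

# The detection whose signature lies closest to the key signature is the best match.
best = min(detected_signatures,
           key=lambda k: signature_distance(key_signature, detected_signatures[k]))
print(best, signature_distance(key_signature, detected_signatures[best]))
```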
  • the matching process may yield more than one matching image, for example using different key images. Each key image may include a different accompanying object.
  • the object tracking system 310 may search for all of the accompanying objects, the last associated accompanying object, or any one of the accompanying objects. After finding images with signatures matching the signatures of the key images, the location of the target object may be estimated based on the location of the camera that captured the matching images. The matching process may further include determining the respective accuracies of the estimated locations and selecting the estimated location with the highest determined accuracy. Determining the accuracies of the estimated locations may include comparing the similarity level of the signatures and whether the signature matches the target object or the accompanying object. For example, the accuracy level may be determined to be higher if the signature matches the target object.
  • FIG. 5 illustrates an example of an image table 500 according to various embodiments.
  • the image table 500 may be generated in 404.
  • the image table 500 may contain information on images where objects are detected. These images may be image frames of videos captured by the image capturing devices 332.
  • the image table 500 may be stored on the database 324.
  • Column 502 may store image identifiers (ID). Each image of the same object in a different image frame may be assigned a unique image ID.
  • Column 504 may store the image data.
  • the image data may be the numerical representation of the image, for example, an array of binary code or hexadecimal code. Alternatively, column 504 may store a pointer or a web link to an archived image file.
  • Column 506 may contain information on the image type, which may refer to the type of object within the detected image, in other words, the classification of the object that is detected within the image.
  • Column 508 may contain the signature of the image, in other words, extracted features of the image. The signature may include a matrix or a vector that encodes the image.
  • Column 510 may store a unique camera identifier that indicates the camera that captured the image.
  • Column 512 may store the frame identifier that indicates a frame within the video captured by the camera.
  • the frame ID may represent time information, since the sequence of the frame corresponds to the time that the image frame was captured.
  • the frame ID may be an encoding of time and date, for example in the epoch format.
  • Column 514 may store information on the bounding box.
  • the bounding box information may contain the x, y coordinates of the object's position within the frame and the width and height of the object in x and y, which may be used to draw a bounding box of the object on top of the frame image.
  • the distance between objects within an image frame may be calculated from the bounding box of each object.
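  • For example, the distance between two objects in a frame might be computed from their bounding boxes as the distance between box centres, as in the sketch below. The (x, y, width, height) layout mirrors the bounding-box information described for column 514, but the function names and values are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height within the frame

def box_centre(box: Box) -> Tuple[float, float]:
    """Centre point of a bounding box."""
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0

def box_distance(box_a: Box, box_b: Box) -> float:
    """Distance between the centres of two bounding boxes in the same frame."""
    (ax, ay), (bx, by) = box_centre(box_a), box_centre(box_b)
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

luggage_box = (100.0, 200.0, 40.0, 30.0)   # placeholder pixel values
trolley_box = (130.0, 210.0, 60.0, 80.0)
print(box_distance(luggage_box, trolley_box))
```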
  • FIG. 6 illustrates an example of an object table 600 according to various embodiments.
  • the object table 600 may be stored in the database 324.
  • the object table 600 may store information of objects detected in the images captured by the image capturing devices 332.
  • the object table 600 may be generated in 404.
  • Column 602 may store object IDs.
  • the object IDs may be unique identification codes for every detected object.
  • Column 604 may store information on the object type, i.e. category or classification that the detected object belongs to.
  • the detected object may be a person, a bicycle, a bag, an animal or any other type of object; these values may be taken from column 506.
  • Column 606 may include a list of image IDs from image frames where the object appears.
  • Motion estimation, combined with the similarity of object images between consecutive frames, may be used to determine whether object images from a group of frames actually show the same object. The similarity of object images in frames from different cameras, narrowed down by the time, location and movement direction of the objects, may be used to determine whether object images from frames of different cameras actually show the same object.
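  • A hypothetical sketch of how motion estimation and appearance similarity might be combined to decide whether two detections are the same object is given below; the weights, thresholds and names are illustrative assumptions only.

```python
from typing import Sequence, Tuple

def same_object(
    predicted_pos: Tuple[float, float],     # position predicted by motion estimation
    observed_pos: Tuple[float, float],      # position detected in the next frame
    signature_a: Sequence[float],
    signature_b: Sequence[float],
    max_position_error: float = 20.0,
    max_signature_distance: float = 0.5,
) -> bool:
    """Declare two detections the same object if the new detection lies close to the
    motion-predicted position AND its appearance signature is sufficiently similar."""
    pos_err = ((predicted_pos[0] - observed_pos[0]) ** 2 +
               (predicted_pos[1] - observed_pos[1]) ** 2) ** 0.5
    sig_dist = sum((a - b) ** 2 for a, b in zip(signature_a, signature_b)) ** 0.5
    return pos_err <= max_position_error and sig_dist <= max_signature_distance

print(same_object((105.0, 60.0), (110.0, 62.0), [0.2, 0.7], [0.25, 0.68]))  # True
```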
  • FIG. 7 illustrates an example of an accompanying object table 700 according to various embodiments.
  • the accompanying object table 700 may include information on all identified accompanying objects.
  • the accompanying object table 700 may be stored in the database 324, and may be generated in 406.
  • the accompanying object table 700 may store data on the association between an object and an accompanying object.
  • Column 702 may indicate object ID.
  • Column 704 may indicate accompanying object ID.
  • Column 706 may indicate the ID of the camera that first captures the target object together with the accompanying object.
  • Column 708 may indicate the ID of the image frame that first captures the target object together with the accompanying object.
  • Column 710 may indicate the ID of the camera that captures a last image of the target object together with the accompanying object.
  • Column 712 may indicate the ID of the last image where the target object appears together with the accompanying object.
  • Each target object may be associated with more than one accompanying object such that the Table 700 may include more than one row having the same object ID in column 702 but different accompanying object IDs in column 704.
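  • The association data of table 700 could be represented as records like the following sketch; the field names parallel columns 702 to 712, but the class, identifiers and values are otherwise hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AccompanyingRecord:
    """One row of a hypothetical accompanying object table (cf. columns 702 to 712)."""
    object_id: str               # target object (column 702)
    accompanying_object_id: str  # accompanying object (column 704)
    first_camera_id: str         # camera that first captured both together (column 706)
    first_frame_id: int          # first frame capturing both together (column 708)
    last_camera_id: str          # camera that captured the last joint image (column 710)
    last_frame_id: int           # last frame capturing both together (column 712)

# A target object may have several rows, one per accompanying object.
rows = [
    AccompanyingRecord("obj-7", "obj-12", "cam-03", 1540, "cam-05", 1720),
    AccompanyingRecord("obj-7", "obj-19", "cam-05", 1725, "cam-05", 1890),
]
for row in rows:
    print(row.object_id, "accompanied by", row.accompanying_object_id)
```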
  • a surveillance system may be employed at an airport where cameras are arranged to monitor the baggage carousels.
  • the cameras may be located in the vicinity of baggage carousels.
  • the object tracking system may process every image (or image frame) captured by each camera, to identify luggage, for example suitcases, bags and other objects that are delivered by the conveyor belt of the baggage carousel.
  • the object tracking system may register each piece of luggage as an individual object.
  • the object tracking system may retrieve from an object database, information on candidate accompanying objects for luggage.
  • the object database may include accompanying object categories such as humans, automobiles, trolleys, etc.
  • the object database may store information that limits the accompanying object categories for luggage to trolleys and people.
  • the object tracking system may proceed to identify trolleys and humans in the surveillance videos captured by the cameras.
  • the object tracking system may identify accompanying object(s) of the luggage, from at least one of the categories of trolleys and humans, using a series of sequential image frames from the surveillance videos.
  • the object tracking system may identify the accompanying object(s) based on the abovementioned criteria, either alone or in combination. For example, the object tracking system may determine that there is a trolley and a person that are in the vicinity of the luggage for longer than a time threshold. The object tracking system may alternatively, or additionally, determine that both the trolley and the person are closer to the luggage than a distance threshold.
  • the object tracking system may also determine that the person performed a movement of lifting the luggage off the conveyor belt and loading the luggage onto the trolley. Upon recognizing the motion of the luggage being picked up, the object tracking system may pair the luggage with at least one of the traveler and the trolley. The object tracking system may also determine that the distance between the person and the luggage increases over successive frames, whereas the distance between the trolley and the luggage remains constant over successive frames, indicating that the person has handed over the luggage to the trolley. As such, the object tracking system may disregard the person, and may identify the trolley as the accompanying object of the luggage. In searching for the luggage, the object tracking system may extract the digital characteristics, i.e. digital signature of the luggage, the trolley and the composite image of the luggage arranged on the trolley.
  • the object tracking system may compare image frames in various surveillance videos, against the extracted digital characteristics, to find the luggage, the trolley or a combination of the luggage arranged on the trolley.
  • the object tracking system may search for the accompanying object, independently from the target object.
  • the object tracking system may compare image frames in the surveillance videos against the extracted digital characteristics of the accompanying object, to locate the accompanying object.
  • the object tracking system may switch from searching for the target object to searching for the accompanying object, either by user input or automatically when the target object cannot be found. Therefore, even if the target object cannot be found or identified in any of the surveillance videos, the object tracking system may still be able to locate the accompanying object which may provide clues to the location of the target object.
  • FIG. 8 illustrates a diagram 800 of a surveillance system 102 according to various embodiments, being in operation.
  • the surveillance system 102 may include an object tracking system according to various embodiments.
  • the surveillance system 102 may be part of the surveillance system 100.
  • the surveillance system 102 may be connected to a plurality of cameras 104 that are situated at various locations, such that the surveillance system 102 may be able to receive surveillance videos recorded by the cameras 104.
  • the plurality of cameras 104 may collectively survey a wide area, also referred herein as areas under camera surveillance 112.
  • the surveillance system 102 may be used to track a target object, for example, a missing object such as a bicycle 120.
  • the target object may be any other object, such as a suitcase, a box, or a vehicle.
  • the owner of the bicycle 120 may report the last time and location where he has seen the bicycle 120.
  • a security officer may enter a query into the surveillance system 102.
  • the query may include the reported timing and location.
  • the surveillance system 102 may retrieve the surveillance video according to the reported timing and location, also referred herein as the search key determination area 110.
  • the search key determination area 110 may be part of the areas under camera surveillance 112.
  • the bicycle 120 may be identified from the retrieved surveillance video, either by image recognition techniques performed by the surveillance system 102, or by a user manually marking out the bicycle 120 from an image frame of the retrieved surveillance video as part of the query.
  • the surveillance system 102 may extract features of the image of the bicycle 120 and may store the extracted features in a database.
  • the surveillance system 102 may attempt to match the extracted features against image features in the surveillance videos of the areas under camera surveillance 112. When a match is found, the surveillance system 102 may analyse the image frame in which the match is found. In analyzing the image frame, the surveillance system 102 may determine if there is an accompanying object in proximity of the bicycle 120. The surveillance system 102 may determine the presence of the accompanying object based on at least one of time during which the accompanying object is near to the bicycle 120, the object type of the accompanying object, the interaction between the accompanying object and the bicycle 120, the trajectory of the accompanying object in relation to the trajectory of the bicycle 120. For example, the accompanying object may be a person 122.
  • the person 122 may need to appear close to the bicycle 120 for at least a predefined time threshold, for example 50% of the time since the bicycle 120 was last seen, to be considered the accompanying object of the bicycle 120.
  • the bicycle 120 may be identified as a vehicle, and the surveillance system 102 may associate vehicles with people.
  • the surveillance system 102 may consider people near the bicycle 120 to be potentially the accompanying object.
  • the surveillance system 102 may analyse motion of the person 122, to determine if the person 122 is causing the bicycle 120 to move, for example riding or pushing the bicycle 120. If the person 122 appears to be causing the bicycle 120 to move, the surveillance system 102 may determine the person 122 to be the accompanying object.
  • the surveillance system 102 may compute velocities, i.e. the direction and speed, of the person 122, as well as velocities of the bicycle 120 in the surveillance videos of the areas under camera surveillance 112.
  • the surveillance system 102 may plot the trajectories of each of the bicycle 120 and the person 122. If the bicycle 120 and the person 122 are moving at least substantially along the same trajectory, and in tandem, the surveillance system 102 may determine the person 122 to be the accompanying object of the bicycle 120.
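  • Whether two objects move substantially along the same trajectory and in tandem could be approximated, for instance, by comparing their per-frame velocity vectors, as in the sketch below; the cosine-similarity threshold and the example velocities are assumptions for illustration only.

```python
import math
from typing import List, Tuple

def moving_in_tandem(vel_a: List[Tuple[float, float]],
                     vel_b: List[Tuple[float, float]],
                     min_cosine: float = 0.9) -> bool:
    """Return True if the two objects' per-frame velocity vectors point in nearly
    the same direction (cosine similarity above `min_cosine`) in every frame."""
    for (ax, ay), (bx, by) in zip(vel_a, vel_b):
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        if norm == 0.0:
            continue                       # one object is stationary in this frame
        if (ax * bx + ay * by) / norm < min_cosine:
            return False
    return True

bicycle_velocity = [(1.0, 0.1), (1.1, 0.0), (0.9, 0.05)]
person_velocity  = [(1.0, 0.08), (1.0, 0.02), (0.95, 0.0)]
print(moving_in_tandem(bicycle_velocity, person_velocity))  # True
```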
  • the surveillance system 102 may extract features of the person 122.
  • the surveillance system 102 may also extract additional features of images of the bicycle 120 juxtaposed with the person 122, for example images of the person 122 riding the bicycle 120, or the person 122 pushing the bicycle 120 such that parts of the bicycle 120 are obscured by the person 122.
  • the surveillance system 102 may use the extracted additional features for comparison against features in the surveillance videos, to find more appearances of the bicycle 120 or the person 122. This new round of matching the extracted additional features may yield more search results for the bicycle 120, as the bicycle 120 may sometimes be at least partially obscured in the surveillance videos; by also looking out for the person 122, the chances of finding the bicycle 120 may be increased.
  • the surveillance system 102 may be able to locate the bicycle 120 even without the information on the last time and location where the bicycle 120 has been seen. Most urban venues have defined pathways or exits. For example, the bicycle may most likely travel along a bicycle path in a park and then leave the park via an exit of the park. The surveillance system 102 may monitor the exits of the park. Areas in vicinity of the exits may be referred herein as identification areas 114. When the bicycle 120 appears in a vicinity of any exit, the surveillance system 102 may identify the bicycle 120 and may trigger an alert to a user of the surveillance system 102, for example a security officer stationed near the exit. Similar to Use Case 1, the surveillance system 102 may also identify the accompanying object which is the person 122 in this example. The surveillance system 102 may further monitor the identification area 114, for the person 122, or for a combination of the person 122 and the bicycle 120.
  • FIG. 9 illustrates an example screen 900 of a GUI of a surveillance system according to various embodiments.
  • the GUI may display representations of a plurality of cameras, using for example icons or buttons 902.
  • the user may choose to view images from one of the cameras by selecting its corresponding button 902.
  • FIG. 10 illustrates an example screen 1000 of a GUI of a surveillance system according to various embodiments.
  • the GUI may display the video 1002 captured by the selected camera.
  • the user may play, rewind, fast forward, or fast rewind the video 1002.
  • the user may select or crop the target object 1004.
  • a key image 1006 of the target object 1004 may be extracted from the video 1002.
  • the key image 1006 may be used as a baseline for comparison, for identifying the target object 1004.
  • the user may click on a button 1008 to initiate the search process.
  • FIG. 11 illustrates an example screen 1100 of a GUI of a surveillance system according to various embodiments.
  • the screen 1100 may display results 1102 of the search process that may be initiated from the screen 1000.
  • the object tracking system may search for the target object 1004 in all of the videos captured by the cameras, using the key image 1006.
  • the object tracking system may compare the signature of the key image 1006 against the signatures of each image frame in the videos.
  • the most similar image frames may be displayed in the results 1102.
  • the time and camera ID may also be displayed alongside the most similar image frames.
  • FIG. 12 illustrates an example screen 1200 of a GUI of a surveillance system according to various embodiments.
  • the screen 1200 may also display a live video feed 1206.
  • the object tracking system may perform a matching check to identify whether the target object 1004 appears in a current image frame of the live video feed 1206.
  • the screen 1200 may display a notification or alert 1202 when the target object 1004 appears in the current image frame.
  • FIG. 13 illustrates an example screen 1300 of a GUI of a surveillance system according to various embodiments.
  • the target object 1004 may not be found in the live video feed 1206 but a person 1302 who is identified as the accompanying object may be found on a current image frame of the live video feed 1206. Consequently, the screen 1300 may display a notification or alert 1202.
  • the person 1302 may be a suspected thief.
  • a user of the object tracking system for example, a security officer, may stop the person 1302 in time upon receiving the alert 1202.
  • the target object 1004 may have more than one accompanying object, or may be associated with different accompanying objects over time. For example, the person 1302 may hand the target object 1004 to another person after a time duration.
  • the surveillance system may search for all of the accompanying objects, or the last associated accompanying object, or any one of the accompanying objects, in order to track the target object 1004.
  • FIG. 14 illustrates an example screen 1400 of a Graphical User Interface (GUI) of a surveillance system according to various embodiments.
  • GUI Graphical User Interface
  • the target object 1004 and the accompanying object may not be found in the live video feed 1206 but a composite image of the accompanying object and the target object 1004 may be found on a current image frame of the live video feed 1206. Consequently, the screen 1400 may display a notification or alert 1202.
  • FIG. 15 illustrates a conceptual diagram of a surveillance system 1500 according to various embodiments.
  • the surveillance system 1500 may include the surveillance system 100 or 102, or the object tracking system 160 or 310.
  • the surveillance system 1500 may include an object detector 1502, an accompanying object identification unit 1504, a query generator 1506 and an object search unit 1508.
  • the object detector 1502 may be configured to detect an object of interest, also referred herein as a target object, in the field-of-view of a camera.
  • the object detector 1502 may be further configured to extract the unique signature of the object of interest.
  • the accompanying object identification unit 1504 may be configured to identify another object that appears in the images captured by the camera, as an object that accompanies the object of interest.
  • the query generator 1506 may be configured to generate a query using the signatures of the object of interest and the other object.
  • the object search unit 1508 may be configured to perform search of matching objects using the query.
  • the surveillance system 1500 may optionally include an object transfer identification unit 1510.
  • the object transfer identification unit 1510 may be configured to identify changes of accompanying object from one object to another. In other words, the object transfer identification unit 1510 may be configured to detect that the object of interest is accompanied by a different object.
  • a method for tracking a target object may include identifying an accompanying object from a series of digital images that capture the target object.
  • the identification of the accompanying object may include, or may be part of, 406.
  • the identification of the accompanying object may be based on a set of criteria applied to the series of digital images.
  • the set of criteria may include a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object.
  • the set of criteria may also include at least one of a maximum distance between the target object and the accompanying object, a common direction of movement between the target object and the accompanying object, and a recognized interaction type of an interaction between the target object and the accompanying object.
  • the method may include detecting the interaction and recognizing the interaction type of the interaction from a predetermined plurality of interaction types.
  • the method may further include extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images.
  • the extracted features may be referred herein as digital signatures, and may be stored in the database 324.
  • the method may further include comparing the extracted features with features in a set of surveillance images.
  • the comparison of the extracted features with features in the set of surveillance images may include, or may be part of, 422.
  • the set of surveillance images may be captured by a plurality of cameras at a corresponding plurality of locations.
  • the method may further include determining an accompanying object category associated with the target object, for example, by recognising an object category of the target object and looking up in a database for a matching accompanying object category.
  • the method may also include defining the set of criteria based on the determined accompanying object category. Identifying the accompanying object may be further based on the accompanying object category.
  • the method may further include providing an estimated location of the target object based on the corresponding location of the camera which captured the surveillance image where the extracted features are found.
  • the method may further include identifying a second accompanying object from the series of digital images, extracting further features of at least one of the second accompanying object and a composite of the target object and the second accompanying object from the series of digital images, comparing the further extracted features with features in the set of surveillance images, and providing a further estimated location of the target object further based on the corresponding location of the camera which captured the surveillance image where the further extracted features are found.
  • the identification of the second accompanying object may include, or may be part of, 406.
  • the comparison of the further extracted features with features in the set of surveillance images may include, or may be part of, 422.
  • the method may further include determining an accuracy of the estimated location based on at least one of the amount of extracted features found in the set of surveillance images, and whether the extracted features found correspond to the accompanying object or correspond to the composite of the target object and the accompanying object.
  • the method may also include determining an accuracy of the further estimated location based on at least one of the amount of further extracted features found in the set of surveillance images, and whether the further extracted features found correspond to the second accompanying object or correspond to the composite of the target object and the second accompanying object, and selecting one of the estimated location and the further estimated location based on the determined accuracies.
  • an object tracking system may include an image identifier, a feature extractor and a comparison processor.
  • the object tracking system may be the object tracking system 160 or 310, or the surveillance system 1500.
  • the image identifier may include, or may be part of, the image processor 252, the object detector 1502, or the accompanying object identification unit 1504.
  • the image identifier may be configured to retrieve a series of digital images that capture a target object, from a database.
  • the image identifier may be further configured to identify an accompanying object from the series of digital images based on a set of criteria applied to the series of digital images.
  • the set of criteria may include a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object.
  • the feature extractor may include, or may be part of, the feature extraction module 258, or the object detector 1502.
  • the feature extractor may be configured to extract features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images.
  • the comparison processor may include, or may be part of, the image processor 252, the search processor 254, or the object search unit 1508.
  • the comparison processor may be configured to compare the extracted features with features in a set of surveillance images.
  • a non-transitory computer readable medium may be provided.
  • the non-transitory computer readable medium may include instructions, which when executed, may perform a method for tracking a target object.
  • Example 1 is a method for tracking a target object, the method including: identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and comparing the extracted features with features in a further plurality of digital images.
  • the subject-matter of example 1 can optionally include: defining the set of criteria based on the determined accompanying object category.
  • the subject-matter of any one of examples 1 to 2 can optionally include that the set of criteria further includes: a maximum distance between the target object and the accompanying object.
  • the subject-matter of any one of examples 1 to 3 can optionally include that the set of criteria further includes: a common direction of movement between the target object and the accompanying object.
  • the subject-matter of any one of examples 1 to 4 can optionally include: detecting an interaction between the target object and the accompanying object; and recognizing an interaction type of the interaction from a predetermined plurality of interaction types.
  • the subject-matter of example 5 can optionally include that the set of criteria further includes: the recognized interaction type of the interaction.
  • the subject-matter of any one of examples 1 to 6 can optionally include: detecting a decoupling of the accompanying object from the target object; and upon detecting the decoupling of the accompanying object, identifying another accompanying object based on the set of criteria.
  • the subject-matter of any one of examples 1 to 7 can optionally include: determining an accompanying object category associated with the target object; and identifying the accompanying object further based on the accompanying object category.
  • the subject-matter of example 8 can optionally include that determining the accompanying object category includes: recognizing an object category of the target object; and looking up, in a database, a matching accompanying object category.
  • the subject-matter of any one of examples 1 to 9 can optionally include that the further plurality of digital images are surveillance images captured by a plurality of cameras at a corresponding plurality of locations.
  • the subject-matter of example 10 can optionally include: providing an estimated location of the target object based on the corresponding location of the camera which captured the surveillance image where the extracted features are found.
  • the subject-matter of example 11 can optionally include: identifying a second accompanying object from the series of digital images; extracting further features of at least one of the second accompanying object and a composite of the target object and the second accompanying object, from the series of digital images; comparing the further extracted features with features in the further plurality of digital images; and providing a further estimated location of the target object further based on the corresponding location of the camera which captured the surveillance image where the further extracted features are found.
  • the subject-matter of example 12 can optionally include: determining an accuracy of the estimated location based on at least one of the amount of extracted features found in the surveillance images, and whether the extracted features found correspond to the accompanying object or correspond to the composite of the target object and the accompanying object; determining an accuracy of the further estimated location based on at least one of the amount of further extracted features found in the surveillance images, and whether the further extracted features found correspond to the second accompanying object or correspond to the composite of the target object and the second accompanying object; and selecting one of the estimated location and the further estimated location based on the determined accuracies.
  • the subject-matter of any one of examples 1 to 13 can optionally include: receiving the plurality of surveillance images from a plurality of cameras through a network.
  • the subject-matter of any one of examples 1 to 14 can optionally include that the series of digital images includes sequential image frames from a video.
  • the subject-matter of any one of examples 1 to 15 can optionally include that providing an estimated location of the target object includes predicting a trajectory of the accompanying object.
  • Example 17 is an object tracking system including: an image identifier configured to retrieve from a database, a series of digital images that capture a target object, and further configured to identify an accompanying object from the series of digital images based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; a feature extractor configured to extract features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and a comparison processor configured to compare the extracted features with features in a further plurality of digital images.
  • Example 18 is a non-transitory computer readable medium including instructions, which when executed, performs a method for tracking a target object, the method including: identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and comparing the extracted features with features in a further plurality of digital images.
  • Combinations such as "at least one of A, B, or C," "one or more of A, B, or C," "at least one of A, B, and C," "one or more of A, B, and C," and "A, B, C, or any combination thereof" include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C.
  • Combinations such as "at least one of A, B, or C," "one or more of A, B, or C," "at least one of A, B, and C," "one or more of A, B, and C," and "A, B, C, or any combination thereof" may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C.

Abstract

According to various embodiments, there is provided a method for tracking a target object, the method including: identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and comparing the extracted features with features in a further plurality of digital images.

Description

OBJECT TRACKING SYSTEMS AND
METHODS FOR TRACKING A TARGET OBJECT
TECHNICAL FIELD
[0001] Various embodiments relate to object tracking systems and methods for tracking objects.
BACKGROUND
[0002] Many cities are equipped with surveillance cameras to monitor the cities around the clock. While these surveillance cameras may be well-positioned to cover a wide geographical area, it may be highly challenging to locate or track an object using surveillance footage captured by these surveillance cameras. The amount of surveillance data generated by these surveillance cameras would require a large amount of manpower to review. Also, there may inevitably be blind spots in the combined field of view of the surveillance cameras. The objects to be located or tracked may sometimes be disguised, or obscured, such that the objects cannot be accurately detected in the surveillance footage.
SUMMARY
[0003] According to various embodiments, there may be provided a method for tracking a target object, the method including: identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and comparing the extracted features with features in a further plurality of digital images.
[0004] According to various embodiments, there may be provided an object tracking system including: an image identifier configured to retrieve from a database, a series of digital images that capture a target object, and further configured to identify an accompanying object from the series of digital images based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; a feature extractor configured to extract features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and a comparison processor configured to compare the extracted features with features in a further plurality of digital images.
[0005] According to various embodiments, there may be provided a non-transitory computer readable medium including instructions, which when executed, performs a method for tracking a target object, the method including: identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and comparing the extracted features with features in a further plurality of digital images.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] In the drawings, like reference characters generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments are described with reference to the following drawings, in which:
[0007] FIG. 1 illustrates a conceptual diagram of a surveillance system according to various embodiments.
[0008] FIG. 2 illustrates a conceptual diagram of an object tracking system according to various embodiments.
[0009] FIG. 3 illustrates a schematic hardware diagram of a surveillance system according to various embodiments.
[0010] FIG. 4 illustrates a conceptual flow chart of an object tracking system in operation according to various embodiments.
[0011] FIG. 5 illustrates an example of an image table according to various embodiments.
[0012] FIG. 6 illustrates an example of an object table according to various embodiments.
[0013] FIG. 7 illustrates an example of an accompanying object table according to various embodiments.
[0014] FIG. 8 illustrates a diagram of a surveillance system according to various embodiments, in operation.
[0015] FIGS. 9 to 14 illustrate example screens of a Graphical User Interface (GUI) of a surveillance system according to various embodiments.
[0016] FIG. 15 illustrates a conceptual diagram of a surveillance system according to various embodiments.
DESCRIPTION
[0017] Embodiments described below in context of the systems are analogously valid for the respective methods, and vice versa. Furthermore, it will be understood that the embodiments described below may be combined, for example, a part of one embodiment may be combined with a part of another embodiment.
[0018] It will be understood that any property described herein for a specific system may also hold for any system described herein. It will be understood that any property described herein for a specific method may also hold for any method described herein. Furthermore, it will be understood that for any system or method described herein, not all of the described components or steps must necessarily be included; the system or method may include only some (but not all) of the described components or steps.
[0019] In this context, the system as described in this description may include a memory which is for example used in the processing carried out in the system. A memory used in the embodiments may be a volatile memory, for example a DRAM (Dynamic Random Access Memory) or a non-volatile memory, for example a PROM (Programmable Read Only Memory), an EPROM (Erasable PROM), EEPROM (Electrically Erasable PROM), or a flash memory, e.g., a floating gate memory, a charge trapping memory, an MRAM (Magnetoresistive Random Access Memory) or a PCRAM (Phase Change Random Access Memory).
[0020] In order that the invention may be readily understood and put into practical effect, various embodiments will now be described by way of examples and not limitations, and with reference to the figures.
[0021] FIG. 1 illustrates a conceptual diagram of a surveillance system 100 according to various embodiments. The surveillance system 100 may include a plurality of image capturing devices 150, a recorder 152, an object tracking system 160, a storage device 154 and a communication device 156. The image capturing devices 150 may be cameras, including day cameras and/or infrared cameras. Each image capturing device 150 may be positioned at a respective location such that each image capturing device 150 may survey a substantially different area. The field of view of some image capturing devices 150 may partially overlap. The recorder 152 may record the images or video captured by the image capturing devices 150 into the storage device 154, or a separate data storage device. The object tracking system 160 may detect objects that are of interest to a user based on search commands input by the user. The search commands may include information on the objects of interest, for example, images of the objects of interest, object type, last seen location and time, and descriptors such as color, size and shape. The object tracking system 160 may also detect other objects that may be related to the objects of interest based on at least one criterion. Objects that are related to the objects of interest are referred herein as accompanying objects. The object tracking system 160 may extract features of the objects of interest, as well as features of the accompanying objects, and store these extracted features in the storage device 154. The extracted features may include visual characteristics of the objects, for example imagery features. Alternatively or additionally, the extracted features may include information on the objects, for example, a time of appearance of the object, a size of the object, or a description of the object. The extracted features, e.g. the visual characteristics of the objects, may be encoded in the form of matrices or vectors. Alternatively, or additionally, the extracted features, e.g. the information on the objects, may be encoded in the form of a string of text. The object tracking system 160 may search for the target object in surveillance videos captured by the image capturing devices, using the extracted features. The object tracking system 160 may search for the target object in surveillance videos stored by the recorder 152, or in real-time surveillance videos as they are being streamed to the object tracking system 160. The communication device 156 may enable the transfer of data between the object tracking system 160 and at least one of the recorder 152, the storage device 154, and the plurality of image capturing devices 150. The communication device 156 may be a transceiver for wireless communication such as WiFi, Bluetooth, or proprietary radio channels. The communication device 156 may also be a wired data connection.
[0022] FIG. 2 illustrates a conceptual diagram of an object tracking system 160 according to various embodiments. The object tracking system 160 may be configured to execute a method for tracking an object. The method may include a registration process. The registration process may include retrieving videos captured by the image capturing devices 150.
The videos may include a series of sequential images also referred herein as image frames. The object tracking system 160 may include an image processor 252. The image processor 252 may identify a target object in at least one of the videos using image recognition techniques. The image processor 252 may also be referred herein as an image identifier. The image processor 252 may identify the accompanying object(s) that is associated with the target object. The method of identifying the accompanying object will be described in subsequent paragraphs. The object tracking system 160 may include a feature extraction module 258 that may extract features of the accompanying object from the surveillance footage, and encode the features to create a digital signature of the accompanying object. The feature extraction module 258 may also be referred herein as a feature extractor. The feature extraction module 258 may also do likewise for the target object, as well as a composite of the target object and the accompanying object. The composite refers to an arrangement of the target object together with the accompanying object. The features of the composite may significantly differ from the features of the individual target object and the accompanying object, for example, because the target object could be partially hidden by the accompanying object. The feature extraction module 258 may also create digital signatures of the target object and the composite. The feature extraction module 258 may transmit the digital signatures of the target object, the accompanying object and the composite to a digital signature database. The digital signature database may be hosted on the storage device 154. The digital signature database may also store an association tag that defines the accompanying object as being associated with the target object.
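For illustration only, the following Python sketch shows one possible shape of this registration step: crops of the target object, the accompanying object and their composite are encoded into fixed-length signature vectors and stored together with an association tag. The colour-histogram embedding, the in-memory dictionaries and all identifiers are assumptions made for the example; the described system would typically use a learned embedding and the digital signature database hosted on the storage device 154.

```python
import numpy as np

def embed(crop: np.ndarray, bins: int = 8) -> np.ndarray:
    """Placeholder signature: a normalized colour histogram of an H x W x 3 crop.
    A real deployment would use a learned (e.g. CNN) embedding instead."""
    hist, _ = np.histogramdd(
        crop.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3
    )
    vec = hist.flatten().astype(np.float64)
    return vec / (np.linalg.norm(vec) + 1e-9)

# Hypothetical in-memory stand-ins for the digital signature database.
signature_db = {}          # object id -> signature vector
association_tags = {}      # target id -> list of accompanying object ids

def register(target_id, target_crop, accomp_id, accomp_crop, composite_crop):
    """Register target, accompanying-object and composite signatures,
    and tag the accompanying object as associated with the target."""
    signature_db[target_id] = embed(target_crop)
    signature_db[accomp_id] = embed(accomp_crop)
    signature_db[f"{target_id}+{accomp_id}"] = embed(composite_crop)
    association_tags.setdefault(target_id, []).append(accomp_id)

# Example usage with random stand-in "image crops".
rng = np.random.default_rng(0)
register("luggage-01", rng.integers(0, 256, (64, 64, 3)),
         "trolley-07", rng.integers(0, 256, (64, 64, 3)),
         rng.integers(0, 256, (96, 64, 3)))
```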
[0023] The method for tracking an object may further include a search process. The search process may include receiving a query for the target object. A user may enter the query via a graphical user interface on a client console, for example a computer or a mobile computing device. For example, the user may upload an image, for example a photograph, of the target object. Alternatively, the user may select the target object from an image frame captured by one of the image capturing devices. The image of the target object, either uploaded by the user or cropped from the image frame, may be referred herein as a key image. The query may also include information on a location and/or time of search. The client console may transmit the query to a search processor 254 of the object tracking system 160. The search processor 254 may retrieve the video that corresponds to the location and/or time in the query. The search processor 254 may also transmit the key image to the digital signature database. The search processor 254 may extract the features of the key image and may compare the extracted features against the digital signatures stored in the digital signature database. The search processor 254 may thereby identify the target object and read its association tag, to determine its accompanying object. The search processor 254 may retrieve the digital signatures of at least one of the target object, the accompanying object, or the composite. The search processor 254 may transfer the digital signatures to the image processor 252 which then performs image recognition on the retrieved surveillance footage. When the image processor 252 finds a matching image from the surveillance footage that matches the digital signature of at least one of the target object, the accompanying object, or the composite, the search processor 254 may estimate the location of the target object based on the location of the camera that captured the matching image. The search processor 254 may also predict a trajectory of the accompanying object or the target object using a plurality of matching images, and refine the estimated location based on the predicted trajectory.
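Continuing the illustration under the same assumptions as the previous sketch (signatures stored as vectors in an in-memory signature_db with association_tags), the query side might look roughly as follows: the key image's signature is matched to the closest registered object, and the association tag is then followed to collect the signatures that the search will subsequently look for. All names are hypothetical.

```python
import numpy as np

def closest_registered_object(key_signature, signature_db):
    """Return the id of the registered object whose stored signature is
    closest (by Euclidean distance) to the signature of the key image."""
    return min(signature_db,
               key=lambda obj_id: np.linalg.norm(signature_db[obj_id] - key_signature))

def signatures_to_search(target_id, signature_db, association_tags):
    """Gather the signatures to look for in the footage: the target itself,
    each accompanying object, and each target+accompanying composite."""
    wanted = {target_id: signature_db[target_id]}
    for accomp_id in association_tags.get(target_id, []):
        wanted[accomp_id] = signature_db[accomp_id]
        composite_key = f"{target_id}+{accomp_id}"
        if composite_key in signature_db:
            wanted[composite_key] = signature_db[composite_key]
    return wanted
```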
[0024] According to various embodiments, the search process may take place before the registration process. In other words, the registration process may proceed after the object tracking system 160 receives the query. The search processor 254 may search for the target object in videos captured by the image capturing devices 150. Based on the videos, the image processor 252 may identify the accompanying object(s). The image processor 252 may then extract features of the accompanying object(s) and the composite of the accompanying object(s) with the target object, and store these features in the digital signature database. The search processor 254 may initiate another round of search process, using the extracted features of the accompanying object(s) and the composite.
[0025] Identification of the accompanying object
[0026] The image processor 252 may be configured to identify the accompanying object(s) of any target object based on at least one criterion. The criteria may include a time duration that a candidate object appears in proximity of the target object. The image processor 252 may compute the number of times that the candidate object appears in the same image frame as the target object, in other words, the quantity of image frames that capture both the candidate object and the target object. The image processor 252 may compare the quantity against a quantity threshold. If the quantity exceeds the quantity threshold, the image processor 252 may determine the candidate object to be an accompanying object of the target object, or at least increase a probability count that the candidate object is an accompanying object of the target object.
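A minimal sketch of this quantity (or proportion) criterion, assuming per-frame detections are available as sets of object identifiers; the function names and threshold values are illustrative, not taken from the embodiments:

```python
def cooccurrence_counts(frames, target_id):
    """Count, per candidate object, the frames in which it appears together
    with the target object. `frames` is a list of sets of detected object ids."""
    counts = {}
    target_frames = 0
    for detected in frames:
        if target_id not in detected:
            continue
        target_frames += 1
        for obj in detected - {target_id}:
            counts[obj] = counts.get(obj, 0) + 1
    return counts, target_frames

def accompanying_candidates(frames, target_id, min_count=10, min_proportion=0.5):
    """Candidates satisfying the minimum quantity or minimum proportion criterion."""
    counts, target_frames = cooccurrence_counts(frames, target_id)
    return [obj for obj, n in counts.items()
            if n >= min_count
            or (target_frames and n / target_frames >= min_proportion)]
```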
[0027] The criteria may include a distance between the candidate object and the target object in each image frame that captures both the candidate object and the target object. The image processor 252 may compare the distance between the candidate object and the target object against a distance threshold. If the distance is less than the distance threshold, the image processor 252 may increase the probability count that the candidate object is the accompanying object; otherwise, it may decrease the probability count. The image processor 252 may combine the time duration criterion and the distance criterion, to determine if the candidate object is less than the distance threshold away from the target object for a time duration longer than the time threshold represented by the quantity threshold.
[0028] The criteria may include whether the candidate object moves substantially together with the target object. The image processor 252 may chart the trajectories of the candidate object and the target object. The image processor 252 may disregard the candidate object, or decrease the probability count that the candidate object is the accompanying object, if the distance between the target object and the candidate object increases with successive image frames, thereby indicating that the candidate object is moving away from the target object.
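The distance and common-movement criteria of paragraphs [0027] and [0028] could be scored per candidate roughly as sketched below; the bounding-box centre distance, the threshold value and the simple increasing-distance test are illustrative assumptions only:

```python
def centre(box):
    """Centre of a bounding box given as (x, y, width, height)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def distance(box_a, box_b):
    (ax, ay), (bx, by) = centre(box_a), centre(box_b)
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def probability_count(target_track, candidate_track, distance_threshold=150.0):
    """Accumulate a score over frames in which both objects are detected.

    Each track maps frame_id -> bounding box. The score goes up when the
    candidate stays within the distance threshold and is reduced when the
    candidate drifts away from the target over successive frames.
    """
    score = 0
    previous = None
    for frame_id in sorted(set(target_track) & set(candidate_track)):
        d = distance(target_track[frame_id], candidate_track[frame_id])
        score += 1 if d < distance_threshold else -1
        if previous is not None and d > previous:
            score -= 1          # moving apart: less likely to be accompanying
        previous = d
    return score
```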
[0029] The criteria may include whether the candidate object interacts with the target object according to a predefined movement. The image processor 252 may use a movement-trained machine learning model to determine if the candidate object is performing a certain movement relative to the target object. For example, if the candidate object is a person, the model may determine whether the person is picking up the target object.
[0030] The criteria may include object categories of the target object and the candidate object. The image processor 252 may recognize the object category of the target object and then determine the accompanying object category associated with the target object. For example, the image processor 252 may recognize the target object as a piece of luggage and determine that the accompanying object category is a trolley. The image processor 252 may determine the accompanying object category by looking up in a database. The image processor 252 may also define the rest of the criteria based on the accompanying object category. For example, if the accompanying object category is a trolley, the interaction with the target object may be a movement of transporting the target object. The appropriate distance threshold may also vary depending on the accompanying object category. For example, the distance threshold between a trolley and the luggage that it carries may be much shorter than that between a person and a trolley.
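A possible shape of such a category lookup is sketched below; the table entries and thresholds are illustrative assumptions, although the luggage/trolley pairing follows the example above. In the described system this information would be looked up in a database rather than hard-coded.

```python
# Hypothetical accompanying-object category table.
ACCOMPANYING_CATEGORIES = {
    "luggage": {"categories": ["trolley", "person"],
                "distance_threshold": {"trolley": 50.0, "person": 150.0}},
    "bicycle": {"categories": ["person"],
                "distance_threshold": {"person": 120.0}},
}

def criteria_for(target_category, candidate_category):
    """Return the distance threshold if the candidate's category is a valid
    accompanying-object category for the target, otherwise None."""
    entry = ACCOMPANYING_CATEGORIES.get(target_category)
    if entry is None or candidate_category not in entry["categories"]:
        return None
    return entry["distance_threshold"][candidate_category]

print(criteria_for("luggage", "trolley"))   # 50.0
print(criteria_for("luggage", "car"))       # None
```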
[0031] The image processor 252 may further detect decoupling of the accompanying object from the target object when the abovementioned criteria are no longer met. The image processor 252 may also determine that decoupling has occurred when the distance between the accompanying object and the target object is increasing. When the image processor 252 detects a decoupling, it may proceed to identify a replacement accompanying object for the target object.
[0032] FIG. 3 illustrates a schematic hardware diagram 300 of a surveillance system 100 according to various embodiments. The surveillance system 100 may include an object tracking system 310. The object tracking system 310 may include a Central Processing Unit (CPU) 302, a network device 304, an input/output (I/O) interface 306 and a storage device 308. The CPU 302 may be configured to perform image and video analysis. The CPU 302 may alternatively be a Graphics Processing Unit (GPU). The storage device 308 may be a logical unit capable of storing programs 322 and databases 324. The storage device 308 may also temporarily store processing data when the CPU 302 is running programs 322. The storage device 308 may be an internal memory such as Random Access Memory (RAM), Solid State Drive (SSD), or Hard Disk Drive (HDD). The storage device 308 may also be a partially separated physical storage system such as Network Attached Storage (NAS) or a Storage Area Network (SAN). The program 322 may execute steps of the method for tracking objects. The database 324 may store information, such as location, description, and time relating to the query, the target object, and the accompanying object, as well as features extracted from the images and videos. The network device 304 may send data to and receive data from external devices that are connected to the same network. The network device 304 may be a wired Local Area Network (LAN) connected Ethernet device, or a wireless network connected device, etc. The I/O interface 306 may send data to and receive data from the input device 334 and the display 336. The I/O interface 306 may be a serial or parallel data interface such as Universal Serial Bus (USB), or High Definition Multimedia Interface (HDMI). The I/O interface 306 may also be a wireless connection such as Bluetooth, or Wireless LAN. The object tracking system 310 may be connected via the network device 304 and the I/O interface 306 to the image capture device 332, the input device 334 and the display 336. The image capture device 332 may provide image frames to the object tracking system 310. The image capture device 332 may include cameras or a video management system (VMS). The display 336 may be a Liquid Crystal Display (LCD), a Plasma Display, a Cathode Ray Tube (CRT) Display, or a projector display, etc. The input device 334 may be a keyboard, a mouse, or a touch screen.
[0033] FIG. 4 illustrates a conceptual flow chart 400 of an object tracking system 310 in operation according to various embodiments. In 402, the object tracking system 310 may receive videos. The videos may be received from image capture devices 332. The videos may include surveillance videos or images captured by a plurality of cameras at a corresponding plurality of locations. The object tracking system 310 may receive the videos from the image capture device 332 through a network, a direct connection or wireless transmissions over data protocol such as Real Time Transport Protocol (RTSP). The object tracking system 310 may extract every image frame from the received videos and pass the image frame to 404. In 404, each image frame may be processed to detect objects, including the target object. 404 may also include recognizing an object category of each detected object. Detection of the target object may include utilizing a trained object detection neural network model trained to localize object positions within an image frame and to classify the target object. The process of localizing and classifying the target object may include evaluating the image signature or features extracted from each image frame. The image signature may be represented in an array of numbers. The image signature of the detected object may be registered on a database 324. In 406, other objects that were accompanying the target object, also referred herein as accompanying objects, may be identified. More than one accompanying object may be identified. The target object may appear together with more than one accompanying object at any time, or may be associated with different accompanying objects over time. For example, a first accompanying object may be a first person who delivers the target object to a second accompanying object which is another person. 406 may include determining an accompanying object category associated with the target object, and identifying the accompanying object at least partially based on the determined accompanying object category. 406 may include looking up in the database 324, for a matching accompanying object category to the object category of the target object as recognized in 404. 406 may be triggered by object detection process or periodically executed inside the object tracking system 310. The result of the identification of the accompanying objects, for example the image signature of the accompanying objects, may also be registered into the database 324. Thus, the database 324 may store information of the target objects and information of their corresponding accompanying objects. Whenever a user, for example a security officer, wants to perform an object search or tracking, the user may input a search command into the object tracking system 310 using an input device 334. In 408, the object tracking system 310 may accept the search command. The search command may include selecting a key image for the search or identification, where the key image portrays the target object to be searched. On performing definition of the key image, the object tracking system 310 may present a Graphical User Interface (GUI). The GUI may present an interface for the user to enter further information for the search. In 420, the object tracking system 310 may generate a query based on the key image. In other words, the object tracking system 310 may initiate searching or tracking of an object that resembles the key image. 
The query may be a combination of search time duration, camera selection and information on the accompanying object. The query may relate the key image with its corresponding signature from the database 324, or it may also extract the signature from the selected key image. In 422, the object tracking system 310 may compare the signature of the key image to image signatures stored in the database 324 to find a match, i.e. to identify the target object. The object tracking system 310 may also retrieve information of the accompanying object based on the matching target object. The image signature of the accompanying object may be used as a substitute for the target object, or used in combination with the initial search key image signature, for matching with images captured by the image capture device 332. The matching process can be performed by calculating the similarity of object signatures, for example by calculating the vector distance between key object signatures and the signatures of the object. The matching process may yield more than one matching image, for example using different key images. Each key image may include a different accompanying object. If the target object has more than one accompanying object, the object tracking system 310 may search for all of the accompanying objects, the last associated accompanying object, or any one of the accompanying objects. After finding images with a matching signature to the signatures of the key images, the location of the target object may be estimated based on the location of the camera that captured the matching images. The matching process may further include determining the respective accuracies of the estimated locations and selecting the estimated location with the highest determined accuracy. Determining the accuracies of the estimated locations may include comparing the similarity level of the signatures and whether the signature matches the target object or the accompanying object. For example, the accuracy level may be determined to be higher if the signature matches the target object.
[0034] FIG. 5 illustrates an example of an image table 500 according to various embodiments. The image table 500 may be generated in 404. The image table 500 may contain information on images where objects are detected. These images may be image frames of videos captured by the image capturing devices 332. The image table 500 may be stored in the database 324. Column 502 may store image identifiers (ID). Each image of the same object in a different image frame may be assigned a unique image ID. Column 504 may store the image data. The image data may be the numerical representation of the image, for example, an array of binary code or hexadecimal code. Alternatively, column 504 may store a pointer or a web link to an archived image file. Column 506 may contain information on the image type, which may refer to the type of object within the detected image, in other words, the classification of the object that is detected within the image. Column 508 may contain the signature of the image, in other words, extracted features of the image. The signature may include a matrix or a vector that encodes the image. Column 510 may store a unique camera identifier that indicates the camera that captured the image. Column 512 may store the frame identifier that indicates a frame within the video captured by the camera. The frame ID may represent time information, since the sequence of the frame corresponds to the time that the image frame was captured.
The frame ID may be an encoding of time and date, for example in the epoch format. Column 514 may store information on the bounding box. The bounding box may contain the x, y coordinates of the object's position within the frame and the x width and y width of the object image, as information to be used to draw a bounding box of the object on top of the frame image. The distance between objects within an image frame may be calculated from the bounding box of each object.
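The matching step of paragraph [0033] compares signatures by vector distance. The sketch below scores rows shaped like the image table of FIG. 5 against a set of key signatures using cosine similarity; the vector representation, the dictionary row format and the similarity threshold are assumptions made for illustration:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def find_matches(key_signatures, image_rows, threshold=0.8):
    """Return (image_id, camera_id, frame_id, score) for rows whose stored
    signature is sufficiently similar to any of the key signatures.

    `image_rows` mimics FIG. 5: an iterable of dicts with keys
    'image_id', 'signature', 'camera_id', 'frame_id'.
    """
    matches = []
    for row in image_rows:
        best = max(cosine_similarity(key, row["signature"]) for key in key_signatures)
        if best >= threshold:
            matches.append((row["image_id"], row["camera_id"], row["frame_id"], best))
    return sorted(matches, key=lambda m: m[-1], reverse=True)
```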
[0035] FIG. 6 illustrates an example of an object table 600 according to various embodiments. The object table 600 may be stored in the database 324. The object table 600 may store information of objects detected in the images captured by the image capturing devices 332. The object table 600 may be generated in 404. Column 602 may store object IDs. The object IDs may be unique identification codes for every detected object. Column 604 may store information on the object type, i.e. the category or classification that the detected object belongs to. For example, the detected object may be a person, a bicycle, a bag, an animal or any other type of object; these values may be taken from column 506. Column 606 may include a list of image IDs from image frames where the object appears. Motion estimation combined with similarity of object images between consecutive frames may be used for determining whether object images from a group of frames are actually the same object. Similarity of object images within frames from different cameras, narrowed down by time, location and movement direction of the objects, may be used for determining whether object images from frames at different cameras are actually the same object.
[0036] FIG. 7 illustrates an example of an accompanying object table 700 according to various embodiments. The accompanying object table 700 may include information on all identified accompanying objects. The accompanying object table 700 may be stored in the database 324, and may be generated in 406. The accompanying object table 700 may store data on the association between an object and an accompanying object. Column 702 may indicate the object ID. Column 704 may indicate the accompanying object ID. Column 706 may indicate the ID of the camera that first captures the target object together with the accompanying object. Column 708 may indicate the ID of the image frame that first captures the target object together with the accompanying object. Column 710 may indicate the ID of the camera that captures a last image of the target object together with the accompanying object. Column 712 may indicate the ID of the last image where the target object appears together with the accompanying object. Each target object may be associated with more than one accompanying object, such that the table 700 may include more than one row having the same object ID in column 702 but different accompanying object IDs in column 704.
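For illustration, the tables of FIGS. 5 to 7 could be realised as relational tables roughly as follows; the column names follow the figures, while the SQL types, the use of SQLite and the sample row are assumptions made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image_table (            -- FIG. 5
    image_id     TEXT PRIMARY KEY,
    image_data   BLOB,                -- or a pointer/URL to an archived file
    image_type   TEXT,                -- classification of the detected object
    signature    BLOB,                -- encoded feature vector
    camera_id    TEXT,
    frame_id     INTEGER,             -- also encodes capture time
    bounding_box TEXT                 -- x, y, width, height
);
CREATE TABLE object_table (           -- FIG. 6
    object_id    TEXT PRIMARY KEY,
    object_type  TEXT,
    image_ids    TEXT                 -- image_table rows where the object appears
);
CREATE TABLE accompanying_object_table (  -- FIG. 7
    object_id              TEXT,
    accompanying_object_id TEXT,
    first_camera_id        TEXT,
    first_frame_id         INTEGER,
    last_camera_id         TEXT,
    last_frame_id          INTEGER
);
""")

# Register a hypothetical association and look it up again.
conn.execute(
    "INSERT INTO accompanying_object_table VALUES (?, ?, ?, ?, ?, ?)",
    ("luggage-01", "trolley-07", "cam-02", 1518, "cam-05", 1763),
)
rows = conn.execute(
    "SELECT accompanying_object_id FROM accompanying_object_table WHERE object_id = ?",
    ("luggage-01",),
).fetchall()
print(rows)   # [('trolley-07',)]
```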
[0037] In the following, a surveillance system according to various embodiments is described with respect to specific examples of applications, although the surveillance system is not limited to being deployed in these examples.
[0038] First example
[0039] A surveillance system may be employed at an airport where cameras are arranged to monitor the baggage carousels. The cameras may be located in the vicinity of baggage carousels. The object tracking system may process every image (or image frame) captured by each camera, to identify luggage, for example suitcases, bags and other objects that are delivered by the conveyor belt of the baggage carousel. The object tracking system may register each piece of luggage as an individual object. The object tracking system may retrieve from an object database, information on candidate accompanying objects for luggage. For example, the object database may include accompanying object categories such as humans, automobiles, trolleys, etc. The object database may store information that limits the accompanying object categories for luggage to trolleys and people. On retrieving information on the accompanying object categories, the object tracking system may proceed to identify trolleys and humans in the surveillance videos captured by the cameras. The object tracking system may identify accompanying object(s) of the luggage, from at least one of the categories of trolleys and humans, using a series of sequential image frames from the surveillance videos. The object tracking system may identify the accompanying object(s) based on the abovementioned criteria, either alone or in combination. For example, the object tracking system may determine that there is a trolley and a person that are in the vicinity of the luggage for longer than a time threshold. The object tracking system may alternatively, or additionally, determine that both the trolley and the person are closer to the luggage than a distance threshold. The object tracking system may also determine that the person performed a movement of lifting the luggage off the conveyor belt and loading the luggage onto the trolley. Upon recognizing the motion of the luggage being picked up, the object tracking system may pair the luggage with at least one of the person and the trolley. The object tracking system may also determine that the distance between the person and the luggage increases over successive frames, whereas the distance between the trolley and the luggage remains constant over successive frames, indicating that the person has handed over the luggage to the trolley. As such, the object tracking system may disregard the person, and may identify the trolley as the accompanying object of the luggage. In searching for the luggage, the object tracking system may extract the digital characteristics, i.e. the digital signatures, of the luggage, the trolley and the composite image of the luggage arranged on the trolley. The object tracking system may compare image frames in various surveillance videos, against the extracted digital characteristics, to find the luggage, the trolley or a combination of the luggage arranged on the trolley. The object tracking system may search for the accompanying object, independently from the target object. The object tracking system may compare image frames in the surveillance videos against the extracted digital characteristics of the accompanying object, to locate the accompanying object. The object tracking system may switch from searching for the target object to searching for the accompanying object, either by user input or automatically when the target object cannot be found.
Therefore, even if the target object cannot be found or identified in any of the surveillance videos, the object tracking system may still be able to locate the accompanying object which may provide clues to the location of the target object.
[0040] Second example
[0041] A second example of the surveillance system in operation is described with respect to FIGS. 8 to 14.
[0042] FIG. 8 illustrates a diagram 800 of a surveillance system 102 according to various embodiments, in operation. The surveillance system 102 may include an object tracking system according to various embodiments. The surveillance system 102 may be part of the surveillance system 100. The surveillance system 102 may be connected to a plurality of cameras 104 that are situated at various locations, such that the surveillance system 102 may be able to receive surveillance videos recorded by the cameras 104. The plurality of cameras 104 may collectively survey a wide area, also referred herein as areas under camera surveillance 112. In the example application, the surveillance system 102 may be used to track a target object, for example, a missing object such as a bicycle 120. The target object may be any other object, such as a suitcase, a box, or a vehicle.
[0043] Use Case 1
[0044] The owner of the bicycle 120 may report the last time and location where he has seen the bicycle 120. A security officer may enter a query into the surveillance system 102. The query may include the reported timing and location. The surveillance system 102 may retrieve the surveillance video according to the reported timing and location; the reported location is also referred herein as the search key determination area 110. The search key determination area 110 may be part of the areas under camera surveillance 112. The bicycle 120 may be identified from the retrieved surveillance video, either by image recognition techniques performed by the surveillance system 102, or by a user manually marking out the bicycle 120 from an image frame of the retrieved surveillance video as part of the query. The surveillance system 102 may extract features of the image of the bicycle 120 and may store the extracted features in a database. The surveillance system 102 may attempt to match the extracted features against image features in the surveillance videos of the areas under camera surveillance 112. When a match is found, the surveillance system 102 may analyse the image frame in which the match is found. In analyzing the image frame, the surveillance system 102 may determine if there is an accompanying object in proximity of the bicycle 120. The surveillance system 102 may determine the presence of the accompanying object based on at least one of: the time during which the accompanying object is near to the bicycle 120; the object type of the accompanying object; the interaction between the accompanying object and the bicycle 120; and the trajectory of the accompanying object in relation to the trajectory of the bicycle 120. For example, the accompanying object may be a person 122. For example, the person 122 may need to appear close to the bicycle 120 for at least an arbitrary time threshold, such as 50% of the time from the time that the bicycle 120 was last seen, to be considered the accompanying object of the bicycle. For example, the bicycle 120 may be identified as a vehicle, and the surveillance system 102 may associate vehicles with people. As such, the surveillance system 102 may consider people near the bicycle 120 to be potentially the accompanying object. For example, the surveillance system 102 may analyse the motion of the person 122 to determine if the person 122 is causing the bicycle 120 to move, for example riding or pushing the bicycle 120. If the person 122 appears to be causing the bicycle 120 to move, the surveillance system 102 may determine the person 122 to be the accompanying object. For example, the surveillance system 102 may compute velocities, i.e. direction and speed, of the person 122, as well as velocities of the bicycle 120 in the surveillance videos of the areas under camera surveillance 112. The surveillance system 102 may plot the trajectories of each of the bicycle 120 and the person 122. If the bicycle 120 and the person 122 are moving at least substantially along the same trajectory, and in tandem, the surveillance system 102 may determine the person 122 to be the accompanying object of the bicycle 120.
[0045] If an accompanying object is identified, the surveillance system 102 may extract features of the person 122. The surveillance system 102 may also extract additional features of images of the bicycle 120 juxtaposed together with the person 122, for example images of the person 122 riding the bicycle 120, or the person 122 pushing the bicycle 120 such that parts of the bicycle 120 are obscured by the person 122. The surveillance system 102 may use the extracted additional features, for comparison against features in the surveillance videos, to find more appearances of the bicycle 120 or the person 122. This new round of matching the extracted additional features may yield more search results for the bicycle 120: the bicycle 120 may sometimes be at least partially obscured in the surveillance videos, but by also looking out for the person 122, the chances of finding the bicycle 120 may be increased.
[0046] Use Case 2
[0047] The surveillance system 102 may be able to locate the bicycle 120 even without the information on the last time and location where the bicycle 120 has been seen. Most urban venues have defined pathways or exits. For example, the bicycle may most likely travel along a bicycle path in a park and then leave the park via an exit of the park. The surveillance system 102 may monitor the exits of the park. Areas in vicinity of the exits may be referred herein as identification areas 114. When the bicycle 120 appears in a vicinity of any exit, the surveillance system 102 may identify the bicycle 120 and may trigger an alert to a user of the surveillance system 102, for example a security officer stationed near the exit. Similar to Use Case 1, the surveillance system 102 may also identify the accompanying object which is the person 122 in this example. The surveillance system 102 may further monitor the identification area 114, for the person 122, or for a combination of the person 122 and the bicycle 120.
[0048] FIG. 9 illustrates an example screen 900 of a GUI of a surveillance system according to various embodiments. The GUI may display representations of a plurality of cameras, using for example icons or buttons 902. The user may choose to view images from one of the cameras by selecting its corresponding button 902.
[0049] FIG. 10 illustrates an example screen 1000 of a GUI of a surveillance system according to various embodiments. After the user has selected a camera in the screen 900, the GUI may display the video 1002 captured by the selected camera. The user may play, rewind, fast forward, or fast rewind the video 1002. When the user spots a target object 1004 that he wishes to track, the user may select or crop the target object 1004. A key image 1006 of the target object 1004 may be extracted from the video 1002. The key image 1006 may be used as a baseline for comparison, for identifying the target object 1004. The user may click on a button 1008 to initiate the search process.
[0050] FIG. 11 illustrates an example screen 1100 of a GUI of a surveillance system according to various embodiments. The screen 1100 may display results 1102 of the search process that may be initiated from the screen 1000. The object tracking system may search for the target object 1004 in all of the videos captured by the cameras, using the key image 1006. The object tracking system may compare the signature of the key image 1006 against the signatures of each image frame in the videos. The most similar image frames may be displayed in the results 1102. The time and camera ID may also be displayed alongside the most similar image frames.
[0051] FIG. 12 illustrates an example screen 1200 of a GUI of a surveillance system according to various embodiments. The screen 1200 may also display a live video feed 1206. The object tracking system may perform a matching check to identify whether the target object 1004 appears in a current image frame of the live video feed 1206. The screen 1200 may display a notification or alert 1202 when the target object 1004 appears in the current image frame.
[0052] FIG. 13 illustrates an example screen 1300 of a GUI of a surveillance system according to various embodiments. In this example, the target object 1004 may not be found in the live video feed 1206 but a person 1302 who is identified as the accompanying object may be found on a current image frame of the live video feed 1206. Consequently, the screen 1300 may display a notification or alert 1202. The person 1302 may be a suspected thief. A user of the object tracking system, for example, a security officer, may stop the person 1302 in time upon receiving the alert 1202. The target object 1004 may have more than one accompanying object, or may be associated with different accompanying objects over time. For example, the person 1302 may hand the target object 1004 to another person after a time duration. The surveillance system may search for all of the accompanying objects, or the last associated accompanying object, or any one of the accompanying objects, in order to track the target object 1004.
[0053] FIG. 14 illustrates an example screen 1400 of a Graphical User Interface (GUI) of a surveillance system according to various embodiments. In this example, the target object 1004 and the accompanying object may not be found in the live video feed 1206 but a composite image of the accompanying object and the target object 1004 may be found on a current image frame of the live video feed 1206. Consequently, the screen 1400 may display a notification or alert 1202.
[0054] FIG. 15 illustrates a conceptual diagram of a surveillance system 1500 according to various embodiments. The surveillance system 1500 may include the surveillance system 100 or 102, or the object tracking system 160 or 310. The surveillance system 1500 may include an object detector 1502, an accompanying object identification unit 1504, a query generator 1506 and an object search unit 1508. The object detector 1502 may be configured to detect an object of interest, also referred herein as a target object, in the field-of-view of a camera. The object detector 1502 may be further configured to extract the unique signature of the object of interest. The accompanying object identification unit 1504 may be configured to identify another object that appears in the images captured by the camera, as an object that accompanies the object of interest. The query generator 1506 may be configured to generate a query using the signatures of the object of interest and the other object. The object search unit 1508 may be configured to perform a search for matching objects using the query. The surveillance system 1500 may optionally include an object transfer identification unit 1510. The object transfer identification unit 1510 may be configured to identify a change of the accompanying object from one object to another. In other words, the object transfer identification unit 1510 may be configured to detect that the object of interest is accompanied by a different object.
[0055] According to various embodiments, a method for tracking a target object may include identifying an accompanying object from a series of digital images that capture the target object. The identification of the accompanying object may include, or may be part of, 406. The identification of the accompanying object may be based on a set of criteria applied to the series of digital images. The set of criteria may include a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object. The set of criteria may also include at least one of a maximum distance between the target object and the accompanying object, a common direction of movement between the target object and the accompanying object, and a recognized interaction type of an interaction between the target object and the accompanying object. The method may include detecting the interaction and recognizing the interaction type of the interaction from a predetermined plurality of interaction types. The method may further include extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images. The extracted features may be referred herein as digital signatures, and may be stored in the database 324. The method may further include comparing the extracted features with features in a set of surveillance images. The comparison of the extracted features with features in the set of surveillance images may include, or may be part of, 422. The set of surveillance images may be captured by a plurality of cameras at a corresponding plurality of locations. The method may further include determining an accompanying object category associated with the target object, for example, by recognising an object category of the target object and looking up in a database for a matching accompanying object category. The method may also include defining the set of criteria based on the determined accompanying object category. Identifying the accompanying object may be further based on the accompanying object category. The method may further include providing an estimated location of the target object based on the corresponding location of the camera which captured the surveillance image where the extracted features are found. The method may further include identifying a second accompanying object from the series of digital images, extracting further features of at least one of the second accompanying object and a composite of the target object and the second accompanying object from the series of digital images, comparing the further extracted features with features in the set of surveillance images, and providing a further estimated location of the target object further based on the corresponding location of the camera which captured the surveillance image where the further extracted features are found. The identification of the second accompanying object may include, or may be part of, 406. The comparison of the further extracted features with features in the set of surveillance images may include, or may be part of, 422. The method may further include determining an accuracy of the estimated location based on at least one of the amount of extracted features found in the set of surveillance images, and whether the extracted features found correspond to the accompanying object or correspond to the composite of the target object and the accompanying object. 
The method may also include determining an accuracy of the further estimated location based on at least one of the amount of further extracted features found in the set of surveillance images, and whether the further extracted features found correspond to the second accompanying object or correspond to the composite of the target object and the second accompanying object, and selecting one of the estimated location and the further estimated location based on the determined accuracies.
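A sketch of how the two estimated locations might be scored and one of them selected is given below; the particular weighting (matches on the target trusted most, accompanying-object-only matches least, scaled by the number of matching features) is an illustrative assumption consistent with the accuracy considerations described above:

```python
def location_accuracy(num_features_found, match_kind):
    """Score an estimated location.

    `match_kind` is 'target', 'composite' or 'accompanying'; matches on the
    target itself are weighted most, accompanying-object-only matches least.
    """
    kind_weight = {"target": 1.0, "composite": 0.8, "accompanying": 0.5}[match_kind]
    return kind_weight * num_features_found

def select_location(estimates):
    """`estimates` is a list of (location, num_features_found, match_kind);
    return the location with the highest determined accuracy."""
    best = max(estimates, key=lambda e: location_accuracy(e[1], e[2]))
    return best[0]

print(select_location([("camera-3", 12, "accompanying"),
                       ("camera-7", 9, "composite")]))   # camera-7
```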
[0056] According to various embodiments, an object tracking system may include an image identifier, a feature extractor and a comparison processor. The object tracking system may be the object tracking system 160 or 310, or the surveillance system 1500. The image identifier may include, or may be part of, the image processor 252, the object detector 1502, or the accompanying object identification unit 1504. The image identifier may be configured to retrieve a series of digital images that capture a target object, from a database. The image identifier may be further configured to identify an accompanying object from the series of digital images based on a set of criteria applied to the series of digital images. The set of criteria may include a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object. The feature extractor may include, or may be part of, the feature extraction module 258, or the object detector 1502. The feature extractor may be configured to extract features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images. The comparison processor may include, or may be part of, the image processor 252, the search processor 254, or the object search unit 1508. The comparison processor may be configured to compare the extracted features with features in a set of surveillance images.
[0057] According to various embodiments, a non-transitory computer readable medium may be provided. The non-transitory computer readable medium may include instructions, which when executed, may perform a method for tracking a target object.
[0058] The following examples pertain to further embodiments.
[0059] Example 1 is a method for tracking a target object, the method including: identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and comparing the extracted features with features in a further plurality of digital images.
[0060] In example 2, the subject-matter of example 1 can optionally include: defining the set of criteria based on the determined accompanying object category.
[0061] In example 3, the subject-matter of any one of examples 1 to 2 can optionally include that the set of criteria further includes: a maximum distance between the target object and the accompanying object.
[0062] In example 4, the subject-matter of any one of examples 1 to 3 can optionally include that the set of criteria further includes: a common direction of movement between the target object and the accompanying object.
[0063] In example 5, the subject-matter of any one of examples 1 to 4 can optionally include: detecting an interaction between the target object and the accompanying object; and recognizing an interaction type of the interaction from a predetermined plurality of interaction types.
[0064] In example 6, the subject-matter of example 5 can optionally include that the set of criteria further includes: the recognized interaction type of the interaction.
[0065] In example 7, the subject-matter of any one of examples 1 to 6 can optionally include: detecting a decoupling of the accompanying object from the target object; and upon detecting the decoupling of the accompanying object, identifying another accompanying object based on the set of criteria.
[0066] In example 8, the subject-matter of any one of examples 1 to 7 can optionally include: determining an accompanying object category associated with the target object; and identifying the accompanying object further based on the accompanying object category.
[0067] In example 9, the subject-matter of example 8 can optionally include that determining the accompanying object category includes: recognizing an object category of the target object; and looking up in a database, for a matching accompanying object category.
[0068] In example 10, the subject-matter of any one of examples 1 to 9 can optionally include that the further plurality of digital images are surveillance images captured by a plurality of cameras at a corresponding plurality of locations.
[0069] In example 11, the subject-matter of example 10 can optionally include: providing an estimated location of the target object based on the corresponding location of the camera which captured the surveillance image where the extracted features are found.
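By way of non-limiting illustration of Examples 10 and 11, the estimated location may be taken from the camera associated with the best-scoring feature match, as sketched below. The camera-to-location mapping and the match record layout are assumptions for this sketch.

    CAMERA_LOCATIONS = {"cam_01": "North entrance", "cam_02": "Platform 3"}

    def estimate_location(feature_matches):
        """feature_matches: list of dicts such as {"camera_id": ..., "score": ...};
        returns the location of the best-scoring camera, or None if there
        were no matches."""
        if not feature_matches:
            return None
        best = max(feature_matches, key=lambda m: m["score"])
        return CAMERA_LOCATIONS.get(best["camera_id"])

    print(estimate_location([{"camera_id": "cam_02", "score": 0.83},
                             {"camera_id": "cam_01", "score": 0.41}]))  # Platform 3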
[0070] In example 12, the subject-matter of example 11 can optionally include: identifying a second accompanying object from the series of digital images; extracting further features of at least one of the second accompanying object and a composite of the target object and the second accompanying object, from the series of digital images; comparing the further extracted features with features in the further plurality of digital images; and providing a further estimated location of the target object further based on the corresponding location of the camera which captured the surveillance image where the further extracted features are found.
[0071] In example 13, the subject-matter of example 12 can optionally include: determining an accuracy of the estimated location based on at least one of the amount of extracted features found in the surveillance images, and whether the extracted features found correspond to the accompanying object or correspond to the composite of the target object and the accompanying object; determining an accuracy of the further estimated location based on at least one of the amount of further extracted features found in the surveillance images, and whether the further extracted features found correspond to the second accompanying object or correspond to the composite of the target object and the second accompanying object; and selecting one of the estimated location and the further estimated location based on the determined accuracies.
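One possible, purely illustrative way to realise the accuracy comparison of Example 13 is sketched below; treating composite matches as more reliable, and the specific weight of 2.0, are assumptions and not the disclosed method.

    def estimate_accuracy(num_features_found, matched_composite, composite_weight=2.0):
        # For this sketch, matches against the composite of target and
        # accompanying object are assumed more reliable than matches against
        # the accompanying object alone.
        weight = composite_weight if matched_composite else 1.0
        return num_features_found * weight

    def select_estimate(est_a, est_b):
        """Each estimate: {'location': ..., 'num_found': ..., 'matched_composite': ...};
        return the location whose estimate has the higher accuracy."""
        acc_a = estimate_accuracy(est_a["num_found"], est_a["matched_composite"])
        acc_b = estimate_accuracy(est_b["num_found"], est_b["matched_composite"])
        return est_a["location"] if acc_a >= acc_b else est_b["location"]

    print(select_estimate(
        {"location": "Platform 3", "num_found": 4, "matched_composite": True},
        {"location": "North entrance", "num_found": 6, "matched_composite": False}))
    # Platform 3 (accuracy 4 * 2.0 = 8.0 exceeds 6 * 1.0 = 6.0)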
[0072] In example 14, the subject-matter of any one of examples 1 to 13 can optionally include: receiving the further plurality of digital images from a plurality of cameras through a network.
[0073] In example 15, the subject-matter of any one of examples 1 to 14 can optionally include that the series of digital images includes sequential image frames from a video.
[0074] In example 16, the subject-matter of any one of examples 11 to 15 can optionally include that providing the estimated location of the target object includes predicting a trajectory of the accompanying object.
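Example 16 may be illustrated, under a simple constant-velocity assumption that does not form part of the disclosure, by extrapolating the accompanying object's last two observed centroids:

    def predict_next_position(track):
        """track: list of (x, y) centroids in temporal order, length >= 2."""
        (x1, y1), (x2, y2) = track[-2], track[-1]
        return (2 * x2 - x1, 2 * y2 - y1)

    print(predict_next_position([(0, 0), (5, 2), (10, 4)]))  # (15, 6)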
[0075] Example 17 is an object tracking system including: an image identifier configured to retrieve from a database, a series of digital images that capture a target object, and further configured to identify an accompanying object from the series of digital images based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; a feature extractor configured to extract features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and a comparison processor configured to compare the extracted features with features in a further plurality of digital images.
[0076] Example 18 is a non-transitory computer readable medium including instructions, which when executed, perform a method for tracking a target object, the method including: identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images; wherein the set of criteria includes a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object; extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and comparing the extracted features with features in a further plurality of digital images.
[0077] While the invention has been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced. It will be appreciated that common numerals, used in the relevant drawings, refer to components that serve a similar or the same purpose.
[0078] It will be appreciated to a person skilled in the art that the terminology used herein is for the purpose of describing various embodiments only and is not intended to be limiting of the present invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0079] It is understood that the specific order or hierarchy of blocks in the processes / flowcharts disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of blocks in the processes / flowcharts may be rearranged. Further, some blocks may be combined or omitted. The accompanying method claims present elements of the various blocks in a sample order, and are not meant to be limited to the specific order or hierarchy presented.
[0080] The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean "one and only one" unless specifically so stated, but rather "one or more." The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any aspect described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects. Unless specifically stated otherwise, the term "some" refers to one or more. Combinations such as "at least one of A, B, or C," "one or more of A, B, or C," "at least one of A, B, and C," "one or more of A, B, and C," and "A, B, C, or any combination thereof" include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C. Specifically, combinations such as "at least one of A, B, or C," "one or more of A, B, or C," "at least one of A, B, and C," "one or more of A, B, and C," and "A, B, C, or any combination thereof" may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. The words "module," "mechanism," "element," "device," and the like may not be a substitute for the word "means." As such, no claim element is to be construed as a means plus function unless the element is expressly recited using the phrase "means for."

Claims

1. A method for tracking a target object, the method comprising:
identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images;
wherein the set of criteria comprises a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object;
extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and
comparing the extracted features with features in a further plurality of digital images.
2. The method of claim 1, further comprising:
defining the set of criteria based on an accompanying object category associated with the target object.
3. The method of claim 1, wherein the set of criteria further comprises:
a maximum distance between the target object and the accompanying object.
4. The method of claim 1, wherein the set of criteria further comprises:
a common direction of movement between the target object and the accompanying object.
5. The method of claim 1, further comprising:
detecting an interaction between the target object and the accompanying object; and recognizing an interaction type of the interaction from a predetermined plurality of interaction types.
6. The method of claim 5, wherein the set of criteria further comprises:
the recognized interaction type of the interaction.
7. The method of claim 1, further comprising:
detecting a decoupling of the accompanying object from the target object; and upon detecting the decoupling of the accompanying object, identifying another accompanying object based on the set of criteria.
8. The method of claim 1, further comprising:
determining an accompanying object category associated with the target object; and identifying the accompanying object further based on the accompanying object category.
9. The method of claim 8, wherein determining the accompanying object category comprises:
recognizing an object category of the target object; and
looking up, in a database, a matching accompanying object category.
10. The method of claim 1, wherein the further plurality of digital images are surveillance images captured by a plurality of cameras at a corresponding plurality of locations.
11. The method of claim 10, further comprising:
providing an estimated location of the target object based on the corresponding location of the camera which captured the surveillance image where the extracted features are found.
12. The method of claim 11, further comprising:
identifying a second accompanying object from the series of digital images;
extracting further features of at least one of the second accompanying object and a composite of the target object and the second accompanying object, from the series of digital images;
comparing the further extracted features with features in the further plurality of digital images; and
providing a further estimated location of the target object further based on the corresponding location of the camera which captured the surveillance image where the further extracted features are found.
13. The method of claim 12, further comprising:
determining an accuracy of the estimated location based on at least one of the amount of extracted features found in the surveillance images, and whether the extracted features found correspond to the accompanying object or correspond to the composite of the target object and the accompanying object;
determining an accuracy of the further estimated location based on at least one of the amount of further extracted features found in the surveillance images, and whether the further extracted features found correspond to the second accompanying object or correspond to the composite of the target object and the second accompanying object; and
selecting one of the estimated location and the further estimated location based on the determined accuracies.
14. The method of claim 1, further comprising:
receiving the further plurality of digital images from a plurality of cameras through a network.
15. The method of claim 1, wherein the series of digital images includes sequential image frames from a video.
16. The method of claim 11, wherein providing the estimated location of the target object comprises predicting a trajectory of the accompanying object.
17. An object tracking system comprising:
an image identifier configured to retrieve from a database, a series of digital images that capture a target object, and further configured to identify an accompanying object from the series of digital images based on a set of criteria applied to the series of digital images; wherein the set of criteria comprises a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object;
a feature extractor configured to extract features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and
a comparison processor configured to compare the extracted features with features in a further plurality of digital images.
18. A non-transitory computer readable medium comprising instructions, which when executed, perform a method for tracking a target object, the method comprising:
identifying an accompanying object from a series of digital images that capture the target object, based on a set of criteria applied to the series of digital images;
wherein the set of criteria comprises a minimum quantity or proportion of digital images in the series of digital images that captures both the target object and the accompanying object;
extracting features of at least one of the accompanying object and a composite of the target object and the accompanying object, from the series of digital images; and
comparing the extracted features with features in a further plurality of digital images.
PCT/SG2019/050012 2019-01-09 2019-01-09 Object tracking systems and methods for tracking a target object WO2020145882A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/SG2019/050012 WO2020145882A1 (en) 2019-01-09 2019-01-09 Object tracking systems and methods for tracking a target object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SG2019/050012 WO2020145882A1 (en) 2019-01-09 2019-01-09 Object tracking systems and methods for tracking a target object

Publications (1)

Publication Number Publication Date
WO2020145882A1 true WO2020145882A1 (en) 2020-07-16

Family

ID=71520793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2019/050012 WO2020145882A1 (en) 2019-01-09 2019-01-09 Object tracking systems and methods for tracking a target object

Country Status (1)

Country Link
WO (1) WO2020145882A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1758714A (en) * 2004-10-10 2006-04-12 倚强科技股份有限公司 Intelligent image process closed circuit TV camera device and its operation method
US20140023233A1 (en) * 2011-02-18 2014-01-23 Hella Kgaa Hueck & Co. Method and System for Determining a Number of Transfer Objects
US20180101732A1 (en) * 2015-03-16 2018-04-12 Canon Kabushiki Kaisha Image processing apparatus, image processing system, method for image processing, and computer program
US20170345162A1 (en) * 2016-05-25 2017-11-30 Canon Kabushiki Kaisha Video processing apparatus, video processing method, and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112118427A (en) * 2020-10-29 2020-12-22 上海擎感智能科技有限公司 Monitoring method, system, server and computer storage medium
CN114067270A (en) * 2021-11-18 2022-02-18 华南理工大学 Vehicle tracking method and device, computer equipment and storage medium
CN114067270B (en) * 2021-11-18 2022-09-09 华南理工大学 Vehicle tracking method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
Beery et al. Context r-cnn: Long term temporal context for per-camera object detection
EP3654285B1 (en) Object tracking using object attributes
Bertini et al. Multi-scale and real-time non-parametric approach for anomaly detection and localization
CN106327526B (en) Image target tracking method and system
US20160191856A1 (en) Method and System for Metadata Extraction From Master-Slave Cameras Tracking System
Xu et al. Dual-mode vehicle motion pattern learning for high performance road traffic anomaly detection
CN111814510B (en) Method and device for detecting legacy host
US20200401853A1 (en) Smart video surveillance system using a neural network engine
US10970823B2 (en) System and method for detecting motion anomalies in video
US11172168B2 (en) Movement or topology prediction for a camera network
US11620335B2 (en) Method for generating video synopsis through scene understanding and system therefor
WO2020079877A1 (en) System and method for video anomaly detection and storage medium
US20230386319A1 (en) Theft prediction and tracking system
US11256945B2 (en) Automatic extraction of attributes of an object within a set of digital images
WO2020145882A1 (en) Object tracking systems and methods for tracking a target object
CN112347909A (en) Retail store entrance and exit passenger flow statistical method
Gerónimo et al. Unsupervised surveillance video retrieval based on human action and appearance
KR20180015101A (en) Method and apparatus of extracting region-of-interest video in source video
Hampapur et al. Searching surveillance video
WO2020145883A1 (en) Object tracking systems and methods for tracking an object
CN111260685A (en) Video processing method and device and electronic equipment
Špaňhel et al. Vehicle fine-grained recognition based on convolutional neural networks for real-world applications
EP3244344A1 (en) Ground object tracking system
Patel et al. Vehicle tracking and monitoring in surveillance video
KR101170676B1 (en) Face searching system and method based on face recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19908231

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19908231

Country of ref document: EP

Kind code of ref document: A1