CN114004864A - Object tracking method, related device and computer program product - Google Patents

Object tracking method, related device and computer program product

Info

Publication number
CN114004864A
CN114004864A (application CN202111267978.0A)
Authority
CN
China
Prior art keywords: current, history, pair, historical, objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111267978.0A
Other languages
Chinese (zh)
Inventor
路金诚 (Lu Jincheng)
张伟 (Zhang Wei)
谭啸 (Tan Xiao)
孙昊 (Sun Hao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111267978.0A
Publication of CN114004864A
Legal status: Pending

Classifications

    • G06T 7/246 Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N 3/045 Neural networks; architecture; combinations of networks
    • G06N 3/08 Neural networks; learning methods
    • G06T 2207/10016 Image acquisition modality: video; image sequence
    • G06T 2207/20081 Special algorithmic details: training; learning
    • G06T 2207/20084 Special algorithmic details: artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an object tracking method, an object tracking apparatus, an electronic device, a computer-readable storage medium, and a computer program product, relating to fields of artificial intelligence technology such as computer vision and deep learning. One embodiment of the method comprises: extracting historical objects from a historical image and current objects from a current image, where both images come from the same continuous frame image set; obtaining object pairs, each consisting of one historical object and one current object; generating a comprehensive feature distance for each object pair based on the optical flow feature distance and the pixel-change feature distance between the historical object and the current object in the pair; and finally determining the current object corresponding to each historical object based on the maximum optimized combination result of the comprehensive feature distances. This embodiment tracks objects based on features of multiple dimensions and improves the tracking quality of multi-target tracking jobs.

Description

Object tracking method, related device and computer program product
Technical Field
The present disclosure relates to the field of image processing technologies, in particular to fields of artificial intelligence technology such as computer vision and deep learning, and more particularly to an object tracking method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
The main task of multi-target tracking is to locate multiple targets in a given video simultaneously and to record the corresponding trajectories while maintaining each target's identity (ID). It can be widely applied to fields such as robot navigation, intelligent video surveillance, industrial inspection, aerospace, and autonomous driving.
Disclosure of Invention
The embodiment of the disclosure provides an object tracking method, an object tracking device, an electronic device, a computer-readable storage medium and a computer program product.
In a first aspect, an embodiment of the present disclosure provides an object tracking method, including: extracting historical objects and current objects from a historical image and a current image, respectively, and obtaining object pairs each consisting of one historical object and one current object, wherein the historical image and the current image come from the same continuous frame image set; generating a comprehensive feature distance of each object pair based on the optical flow feature distance and the pixel-change feature distance of the historical object and the current object in the pair; and determining the current object corresponding to each historical object based on the maximum optimized combination result of the comprehensive feature distances.
In a second aspect, an embodiment of the present disclosure provides an object tracking apparatus, including: an object pair generating unit configured to extract a history object and a current object from a history image and a current image, respectively, and obtain an object pair consisting of any one of the history object and any one of the current object, wherein the history image and the current image are from the same continuous frame image set; a comprehensive feature generating unit configured to generate a comprehensive feature distance of the pair of objects based on optical flow feature distances and pixel point change feature distances of a current object and a history object in the pair of objects; and the object tracking unit is configured to determine a current object corresponding to each historical object based on the maximum optimization combination result of each comprehensive characteristic distance.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions, when executed, causing the at least one processor to perform the object tracking method described in any implementation of the first aspect.
In a fourth aspect, the disclosed embodiments provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the object tracking method described in any implementation of the first aspect.
In a fifth aspect, the embodiments of the present disclosure provide a computer program product comprising a computer program, which when executed by a processor is capable of implementing the object tracking method as described in any implementation manner of the first aspect.
The object tracking method and apparatus, electronic device, computer-readable storage medium, and computer program product provided by the embodiments of the present disclosure first extract historical objects and current objects from a historical image and a current image, respectively, and obtain object pairs each consisting of one historical object and one current object, where the historical image and the current image come from the same continuous frame image set; then generate a comprehensive feature distance for each object pair based on the optical flow feature distance and the pixel-change feature distance of the historical object and the current object in the pair; and finally determine the current object corresponding to each historical object based on the maximum optimized combination result of the comprehensive feature distances.
After acquiring pixel-level content of the objects contained in the frames of a continuous frame image set, the method and apparatus track objects based on features of multiple dimensions simultaneously, which improves the tracking quality of multi-target tracking jobs.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture to which the present disclosure may be applied;
FIG. 2 is a flowchart of an object tracking method provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of another object tracking method provided by an embodiment of the present disclosure;
FIGS. 4-1 and 4-2 are schematic diagrams illustrating the effect of the object tracking method in an application scenario according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an object tracking apparatus provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for executing the object tracking method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict.
In addition, in the technical solutions of the present disclosure, the acquisition, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information (for example, images containing human face objects involved later in this disclosure) all comply with relevant laws and regulations and do not violate public order and good customs.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the object tracking methods, apparatus, electronic devices, and computer-readable storage media of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 and the server 105 may be installed with various applications for implementing information communication between the two, such as an object tracking application, an automatic driving application, an instant messaging application, and the like.
The terminal apparatuses 101, 102, 103 and the server 105 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above, and they may be implemented as multiple software or software modules, or may be implemented as a single software or software module, and are not limited in this respect. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server; when the server is software, the server may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not limited herein.
The server 105 may provide various services through various built-in applications. Taking a target tracking application that provides an object tracking service as an example, the server 105 may achieve the following effects when running that application: first, historical images and current images from the same continuous frame image set are acquired from the terminal devices 101, 102, 103 through the network 104; then, the server 105 extracts historical objects and current objects from the historical and current images, respectively, and obtains object pairs each consisting of one historical object and one current object; next, the server 105 generates a comprehensive feature distance for each object pair based on the optical flow feature distance and the pixel-change feature distance of the historical object and the current object in the pair; finally, the server 105 determines the current object corresponding to each historical object based on the maximum optimized combination result of the comprehensive feature distances.
Note that, besides being acquired from the terminal devices 101, 102, 103 via the network 104, the historical image and current image from the same continuous frame image set may also be stored locally in the server 105 in advance in various ways. Thus, when the server 105 detects that such data is already stored locally (e.g., an object tracking task left over from earlier processing), it may choose to retrieve the data directly from local storage, in which case the exemplary system architecture 100 may omit the terminal devices 101, 102, 103 and the network 104.
Since extracting image features, generating the maximum optimized combination result of the comprehensive feature distances, and similar operations require considerable computing resources and strong computing power, the object tracking method provided in the following embodiments of the present disclosure is generally executed by the server 105, which has the stronger computing power and more computing resources, and accordingly the object tracking apparatus is generally disposed in the server 105. However, when the terminal devices 101, 102, and 103 also have computing capabilities and resources that meet the requirements, they may complete the above operations, normally performed by the server 105, through the object tracking application installed on them and output the same results as the server 105. In particular, when multiple terminal devices with different computing capabilities exist at the same time and the target tracking application judges that a given terminal device has strong computing power and a large amount of idle computing resources, that terminal device may perform the above computation, appropriately relieving the computing pressure on the server 105; accordingly, the object tracking apparatus may also be installed in the terminal devices 101, 102, and 103. In such a case, the exemplary system architecture 100 may omit the server 105 and the network 104.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 is a flowchart of an object tracking method according to an embodiment of the disclosure, wherein the process 200 includes the following steps:
Step 201: extracting historical objects and current objects from the historical image and the current image, respectively, and obtaining object pairs each consisting of one historical object and one current object.
In this embodiment, upon receiving a historical image and a current image from the same continuous frame image set, the execution subject of the object tracking method (for example, the server 105 shown in fig. 1) extracts the historical objects contained in the historical image and the current objects contained in the current image. The historical and current objects are usually the movable objects present in the images, as opposed to the static environment. Preferably, all historical objects and all current objects present in the two images are extracted so that multiple objects can be tracked simultaneously. After extraction, any historical object and any current object are combined to form an object pair; that is, each object pair contains one historical object and one current object.
It should be understood that when there are multiple historical objects and/or multiple current objects, there are correspondingly multiple object pairs, each consisting of one historical object and one current object, as in the sketch below.
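For illustration, a minimal Python sketch of this pairing step follows, assuming each frame's detections are available as per-object binary masks; the `Detection` record and helper names are hypothetical and not part of the disclosed method.

```python
# Minimal sketch of step 201, assuming each frame's detections arrive as a
# list of per-object binary masks; `Detection` and its fields are hypothetical.
from dataclasses import dataclass
from itertools import product

import numpy as np


@dataclass
class Detection:
    object_id: int    # identity assigned by the detector
    mask: np.ndarray  # pixel-level binary instance mask


def build_object_pairs(historical, current):
    """Pair every historical object with every current object."""
    return list(product(historical, current))
```

With m historical and n current detections this yields the m x n candidate pairs that the later steps score and screen.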
It should be noted that the historical image and the current image may be acquired by the execution subject directly from a local storage device, or from a non-local storage device (e.g., the terminal devices 101, 102, 103 shown in fig. 1). The local storage device may be a data storage module disposed within the execution subject, such as a server hard disk, in which case the historical and current images can be read quickly from local storage. The non-local storage device may be any other electronic device configured to store data, such as certain user terminals, in which case the execution subject can obtain the required historical and current images by sending an acquisition command to that electronic device.
Step 202: generating a comprehensive feature distance of the object pair based on the optical flow feature distance and the pixel-change feature distance of the historical object and the current object in the pair.
In this embodiment, after an object pair consisting of any historical object and any current object is obtained, the optical flow feature distance and the pixel-change feature distance between the historical object and the current object in the pair are extracted, and the two are combined to obtain the comprehensive feature distance.
For example, optical flow prediction may be implemented with Recurrent All-Pairs Field Transforms (RAFT), a network capable of dense optical flow prediction, to determine the optical flow feature distance between the historical object and the current object. Specifically, using the Intersection over Union (IoU), the ratio of the intersection area to the union area of two regions, the optical flow feature distance d1 between the historical object and the current object in the object pair can be calculated with the following formula:
d1 = 1 - IoU(f(mask(a)), mask(b))
wherein f is the optical flow transformation (warping the mask by the predicted flow), mask(a) is the mask of the historical object, and mask(b) is the mask of the current object.
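As a hedged illustration of the formula above, the sketch below warps the historical mask with a dense flow field and measures the IoU against the current mask. The flow layout (H, W, 2) and the helper names are assumptions; a real implementation would obtain the flow from a RAFT-style model.

```python
# Sketch of d1 = 1 - IoU(f(mask(a)), mask(b)); assumes flow has shape
# (H, W, 2) holding per-pixel (dx, dy) displacements, e.g. from RAFT.
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union > 0 else 0.0


def warp_mask(mask: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Move each foreground pixel along its predicted flow vector."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    nx = np.clip(np.round(xs + flow[ys, xs, 0]).astype(int), 0, w - 1)
    ny = np.clip(np.round(ys + flow[ys, xs, 1]).astype(int), 0, h - 1)
    warped = np.zeros_like(mask)
    warped[ny, nx] = 1
    return warped


def optical_flow_distance(hist_mask, cur_mask, flow) -> float:
    return 1.0 - mask_iou(warp_mask(hist_mask, flow), cur_mask)
```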
Further, in this example, the historical object and the current object may be input into a neural network such as a residual network (ResNet) for feature extraction. Through convolution, the network extracts a feature code in a high-dimensional space, namely a person re-identification (ReID) feature, which is typically a 128-dimensional vector. Taking the ReID feature as the pixel-change feature, the pixel-change feature distance d2 between the features corresponding to the historical object and the current object can be determined with the following formula:
d2 = cos(ReID(a), ReID(b))
wherein ReID(a) is the re-identification feature corresponding to the historical object and ReID(b) is the re-identification feature corresponding to the current object.
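A minimal sketch of d2 follows; the embeddings are assumed to come from a ResNet-style ReID encoder (not shown), and the cosine formula is taken exactly as stated in the disclosure.

```python
# Sketch of d2 = cos(ReID(a), ReID(b)); the 128-dimensional embeddings are
# assumed to be produced by a ResNet-style ReID encoder (not shown here).
import numpy as np


def reid_distance(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    a = feat_a / (np.linalg.norm(feat_a) + 1e-12)
    b = feat_b / (np.linalg.norm(feat_b) + 1e-12)
    return float(np.dot(a, b))
```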
Finally, d1 and d2 are added to obtain the comprehensive feature distance.
Step 203: determining the current object corresponding to each historical object based on the maximum optimized combination result of the comprehensive feature distances.
In this embodiment, after the comprehensive feature distance of each object pair has been obtained in step 202 above, the distances are processed with a combinatorial optimization algorithm. Combinatorial optimization is the problem of finding an extreme value over a discrete state space: discrete objects are arranged under determined constraints, and among the arrangements that satisfy the constraints, the one that maximizes (or minimizes) a given optimization criterion is sought. Here, the algorithm determines a unique combination of historical objects and current objects such that they correspond one to one; that is, in the selected combination each historical object appears in exactly one object pair, and that unique pair indicates the current object uniquely corresponding to it. The combination optimizes the sum of the comprehensive feature distances of the selected pairs, yielding the maximum optimized combination result. Based on the object pairs present in this result, the current object corresponding to each historical object is determined, and the historical object and current object in such a pair are judged to point to the same physical target (such as a moving person or a car), completing the association between historical and current objects and hence the object tracking.
It should be understood that when the total number of extracted historical objects is greater than or equal to the total number of current objects, the current object uniquely corresponding to each historical object is determined as described above. When the total number of historical objects is smaller than that of current objects, the same procedure is applied with the current objects as the reference, determining the historical object uniquely corresponding to each current object. This avoids tracking failures caused by unequal numbers of historical and current objects and improves the robustness of the object tracking method provided by the present disclosure.
According to the object tracking method provided by this embodiment of the disclosure, after pixel-level content of the objects contained in the frames of a continuous frame image set is acquired, objects are tracked based on features of multiple dimensions, which improves the tracking quality of multi-target tracking jobs.
In some optional implementations of this embodiment, the object tracking method further includes: determining object pairs whose comprehensive feature distance meets a preset threshold requirement as associated object pairs. Correspondingly, determining the current object corresponding to each historical object based on the maximum optimized combination result of the comprehensive feature distances includes: determining the current object corresponding to each historical object based on the maximum optimized combination result of the comprehensive feature distances of the associated object pairs.
Specifically, after the comprehensive feature distance of every object pair is determined, the distances and pairs are screened against the preset threshold requirement so that pairs whose comprehensive feature distance fails the requirement are removed. This reduces the workload of the subsequent computation of the maximum optimized combination result and prevents object pairs formed from weakly associated historical and current objects from entering the combination and tracking process, which improves both the processing efficiency of the object tracking method and the quality of the tracking.
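A minimal sketch of this screening follows. The threshold value and the assumption that a smaller comprehensive distance means a stronger association are illustrative; the disclosure only requires that the distance "meet a preset threshold requirement".

```python
# Sketch of the associated-pair screening; the 1.5 threshold and the
# "smaller distance = stronger association" orientation are assumptions.
def filter_associated_pairs(pairs_with_distance, threshold=1.5):
    """Keep only pairs whose comprehensive feature distance passes the gate."""
    return [(pair, dist) for pair, dist in pairs_with_distance
            if dist <= threshold]
```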
In some optional implementations of this embodiment, determining the current object corresponding to each historical object based on the maximum optimized combination result of the comprehensive feature distances includes: processing the comprehensive feature distances with the Hungarian matching algorithm to generate the maximum optimized combination result; and determining the current object corresponding to each historical object according to the correspondence between historical and current objects indicated by the maximum optimized combination result.
Specifically, the Hungarian algorithm is a combinatorial optimization algorithm that solves the task assignment problem in polynomial time. In its basic form it regards every candidate match as having equal status and solves for a maximum matching on that premise; since the candidates are in fact not of equal status, and the best match for each real object is what we want to find, real correspondences should carry higher weight. By weighting the candidates, here with the comprehensive feature distances, the matching result can be brought closer to the real situation. The Hungarian matching algorithm can therefore be used to determine an ideal assignment between historical and current objects based on the comprehensive feature distances, obtaining a maximum optimized combination result close to the real situation.
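A minimal sketch using SciPy's Hungarian solver follows; `cost[i, j]` holds the comprehensive feature distance of (historical object i, current object j). Whether to minimize or maximize depends on how the distances are oriented, so the `maximize` flag is exposed as an assumption rather than a fixed choice.

```python
# Sketch of the matching step with SciPy's Hungarian solver; cost[i, j] is
# the comprehensive feature distance of (historical object i, current j).
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_objects(cost: np.ndarray, maximize: bool = False):
    """Return one-to-one (historical_idx, current_idx) assignments."""
    rows, cols = linear_sum_assignment(cost, maximize=maximize)
    return list(zip(rows.tolist(), cols.tolist()))
```

With the 5 x 3 cost matrix of the scenario in FIGS. 4-1 and 4-2 below, for instance, the solver returns three one-to-one assignments and leaves two historical objects unmatched.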
Referring to fig. 3, fig. 3 is a flowchart of another object tracking method according to an embodiment of the disclosure, in which the process 300 includes the following steps:
Step 301: extracting historical objects and current objects from the historical image and the current image, respectively, and obtaining object pairs each consisting of one historical object and one current object.
Step 302: generating a comprehensive feature distance of the object pair based on the optical flow feature distance, the pixel-change feature distance, and the center offset distance of the historical object and the current object in the pair.
In this embodiment, after an object pair consisting of any historical object and any current object is obtained, the optical flow feature distance, the pixel-change feature distance, and the center offset distance between the historical object and the current object in the pair are extracted, and the three are combined to obtain the comprehensive feature distance. The center offset may be predicted by a center-offset network (e.g., CenterTrack); it is a two-dimensional vector representing the change of the object's center point in the x and y directions. The center offset distance d3 may be determined with the following formula:
d3 = 1 - IoU(g(mask(a)), mask(b))
where g is the transformation that shifts a mask by the predicted center point offset, mask(a) is the mask of the historical object, and mask(b) is the mask of the current object. Accordingly, d1, d2, and d3 may be added to obtain the comprehensive feature distance.
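A hedged sketch of d3 follows; `offset` is assumed to be the (dx, dy) center offset predicted by a CenterTrack-style network, and the sketch reuses `mask_iou` from the optical-flow sketch above.

```python
# Sketch of d3 = 1 - IoU(g(mask(a)), mask(b)); `offset` is the 2-D centre
# offset (dx, dy) from a CenterTrack-style network, an assumption here.
# Reuses mask_iou from the optical-flow sketch earlier in this document.
import numpy as np


def shift_mask(mask: np.ndarray, offset) -> np.ndarray:
    """Translate a binary mask by the predicted (dx, dy) centre offset."""
    dx, dy = int(round(offset[0])), int(round(offset[1]))
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    nx = np.clip(xs + dx, 0, w - 1)
    ny = np.clip(ys + dy, 0, h - 1)
    shifted = np.zeros_like(mask)
    shifted[ny, nx] = 1
    return shifted


def center_offset_distance(hist_mask, cur_mask, offset) -> float:
    return 1.0 - mask_iou(shift_mask(hist_mask, offset), cur_mask)
```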
Step 303: determining the current object corresponding to each historical object based on the maximum optimized combination result of the comprehensive feature distances.
Steps 301 and 303 above are consistent with steps 201 and 203 shown in fig. 2; for the shared content, please refer to the corresponding parts of the previous embodiment, which are not repeated here. On the basis of the embodiment shown in fig. 2, this embodiment additionally merges the center offset distance into the generation of the comprehensive feature distance, which further improves robustness in the object tracking process and improves the tracking effect.
In some optional implementations of this embodiment, in order to further improve the quality of the obtained object pairs, pairs whose offset fails a preset requirement (i.e., obviously violates objective laws of motion) may be eliminated by judging whether the offset between the historical object and the current object in a pair is plausible. The object tracking method then further includes: acquiring the shooting time difference between the historical image and the current image; determining an offset speed parameter of the object pair according to the center offset distance and the shooting time difference; and rejecting the object pair in response to the offset speed parameter exceeding a preset speed threshold.
Specifically, after the shooting times corresponding to the historical object and the current object are obtained, the shooting time difference between them is determined, and the offset speed parameter of the object pair is derived from the center offset distance and the shooting time difference. The offset speed parameter comprises an offset speed direction and an offset speed value; the object pair is removed when the offset speed value exceeds the preset speed threshold, and the threshold may be set separately for each offset speed direction.
For example, when the tracked object is a car traveling along a straight road, a threshold of 30 km/h may be set for the lateral offset speed. If a candidate pairing implies a lateral speed of 40 km/h, the speed is judged abnormal (objectively, a car traveling along the road cannot shift laterally at 40 km/h), so the probability that the historical car and the current car belong to the same physical target is low, and the pair is rejected.
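A minimal sketch of this speed-based culling follows; the units (metres for the offset, seconds for the time gap) and the default threshold (8.3 m/s, roughly the 30 km/h of the example) are assumptions for illustration.

```python
# Sketch of the speed-based pair rejection; units and the default threshold
# (8.3 m/s, roughly 30 km/h as in the example above) are assumptions.
def should_reject(center_offset_m: float, time_diff_s: float,
                  speed_threshold_mps: float = 8.3) -> bool:
    """True if the implied offset speed makes the pair physically implausible."""
    if time_diff_s <= 0:
        return False  # no positive time gap, speed cannot be judged
    return center_offset_m / time_diff_s > speed_threshold_mps
```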
On the basis of any of the above embodiments, in order to limit the generation of object pairs with low reference value and to avoid the resource waste caused by excessive pairs, which would harm the efficiency of the subsequent comprehensive feature distance generation and object tracking, a pre-screening step may be applied: after the historical image and the current image are acquired and the historical and current objects are extracted from them, a first object feature is extracted for each historical object and a second object feature for each current object; first and second object features whose similarity exceeds a preset confidence threshold are associated, and only the historical and current objects behind associated features are combined into object pairs, improving the quality of the object pairs (see the sketch below).
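The sketch below illustrates this confidence-gated pair generation; cosine similarity and the 0.5 threshold are illustrative stand-ins for the preset confidence gate, not values taken from the disclosure.

```python
# Sketch of the confidence-gated pair generation; cosine similarity and the
# 0.5 threshold are illustrative stand-ins for the preset confidence gate.
import numpy as np


def gate_pairs(hist_feats, cur_feats, threshold=0.5):
    """Form (i, j) index pairs only when feature similarity clears the gate."""
    pairs = []
    for i, fa in enumerate(hist_feats):
        for j, fb in enumerate(cur_feats):
            sim = float(np.dot(fa, fb) /
                        (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-12))
            if sim > threshold:
                pairs.append((i, j))
    return pairs
```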
To deepen understanding, the disclosure further provides a specific implementation in combination with an application scenario, as follows:
the history object and the current object are extracted from the history image (fig. 4-1) and the current image (fig. 4-2) from the same continuous frame image set, respectively, the history object A, B, C, D, E is extracted from the history image, and the current object A, B, C is extracted from the current image, respectively.
The following object pairs are obtained: (historical object A - current object A), (historical object A - current object B), (historical object A - current object C), (historical object B - current object A), (historical object B - current object B), (historical object B - current object C), (historical object C - current object A), (historical object C - current object B), (historical object C - current object C), (historical object D - current object A), (historical object D - current object B), (historical object D - current object C), (historical object E - current object A), (historical object E - current object B), and (historical object E - current object C).
The comprehensive feature distance of each object pair is then determined based on the optical flow feature distance, the pixel-change feature distance, and the center offset distance of the historical object and the current object in the pair.
According to the obtained comprehensive feature distances, the object pairs whose distances meet the preset threshold requirement are determined to be the associated object pairs: (historical object C - current object A), (historical object D - current object B), (historical object D - current object C), (historical object E - current object A), (historical object E - current object B), and (historical object E - current object C).
Based on the comprehensive feature distances of these associated object pairs, the combination forming the maximum optimized combination result is determined to be (historical object C - current object A), (historical object D - current object B), and (historical object E - current object C); thus the current object corresponding to historical object C is current object A, the current object corresponding to historical object D is current object B, and the current object corresponding to historical object E is current object C.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an object tracking apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the object tracking apparatus 500 of the present embodiment may include: an object pair generation unit 501, an integrated feature generation unit 502, and an object tracking unit 503. The object pair generating unit 501 is configured to extract a history object and a current object from a history image and a current image, respectively, and obtain an object pair consisting of any history object and any current object, where the history image and the current image are from the same continuous frame image set; a comprehensive feature generating unit 502 configured to generate a comprehensive feature distance of the pair of objects based on optical flow feature distances and pixel point change feature distances of the current object and the historical object in the pair of objects; an object tracking unit 503 configured to determine a current object corresponding to each of the historical objects based on a maximum optimized combination result of each of the integrated feature distances.
In this embodiment, for the specific processing of the object pair generation unit 501, the comprehensive feature generation unit 502, and the object tracking unit 503 of the object tracking apparatus 500, and the technical effects thereof, reference may be made to the descriptions of steps 201 to 203 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of the present embodiment, the object tracking apparatus 500 further includes: the association object pair screening unit is configured to determine an object pair with the comprehensive characteristic distance meeting the requirement of a preset threshold value as an association object pair; and the object tracking unit is further configured to determine a current object corresponding to each of the historical objects based on a maximum optimized combination result of the integrated feature distances of each of the associated object pairs.
In some optional implementations of this embodiment, the object tracking unit 503 includes: the maximum optimization combination result generation subunit is configured to process each comprehensive characteristic distance by using a Hungarian matching algorithm to generate a maximum optimization combination result; and the object tracking result generating subunit is configured to determine a current object corresponding to each history object according to the corresponding relationship between the history object and the current object indicated by the maximum optimization combination result.
In some optional implementations of the present embodiment, the integrated feature generating unit 502 is further configured to generate the integrated feature distance of the object pair based on the optical flow feature distances, pixel point change feature distances, and center offset distances of the current object and the historical object in the object pair.
In some optional implementations of the present embodiment, the object tracking apparatus 500 further includes: a time difference acquisition unit configured to acquire a capturing time difference between the history image and the current image; an offset speed determination unit configured to determine an offset speed parameter of the pair of objects according to the center offset distance and the photographing time difference; an object pair optimization unit configured to cull the object pair in response to the offset speed parameter exceeding a preset speed threshold.
In some optional implementations of this embodiment, the object pair generating unit 501 includes: an object extraction subunit configured to extract a history object and a current object from the history image and the current image, respectively; an object feature extraction subunit configured to extract a first object feature corresponding to each of the history objects and a second object feature of each of the current objects, respectively; and the object pair generation subunit is configured to combine the history object and the current object corresponding to the first object feature and the second object feature with the similarity exceeding a preset confidence threshold into an object pair.
This embodiment is the apparatus counterpart of the method embodiments. The object tracking apparatus provided here tracks objects based on features of multiple dimensions simultaneously after acquiring pixel-level content of the objects contained in the frames of a continuous frame image set, which improves the tracking quality of multi-target tracking jobs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store the various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 601 performs the methods and processes described above, such as the object tracking method. For example, in some embodiments, the object tracking method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the object tracking method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the object tracking method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that remedies the defects of high management difficulty and weak service extensibility in conventional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
According to the technical solution of the embodiments of the present disclosure, after pixel-level content of the objects contained in the frames of a continuous frame image set is acquired, objects are tracked based on features of multiple dimensions, which improves the tracking quality of multi-target tracking jobs.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel or sequentially or in a different order, as long as the desired results of the technical solutions provided by this disclosure can be achieved, and are not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. An object tracking method, comprising:
respectively extracting a history object and a current object from a history image and a current image, and obtaining an object pair consisting of any history object and any current object, wherein the history image and the current image are from the same continuous frame image set;
generating a comprehensive characteristic distance of the object pair based on optical flow characteristic distances and pixel point change characteristic distances of a historical object and a current object in the object pair;
and determining a current object corresponding to each historical object based on the maximum optimization combination result of each comprehensive characteristic distance.
2. The method of claim 1, further comprising:
determining the object pairs whose comprehensive feature distance meets a preset threshold requirement as associated object pairs; and
the determining a current object corresponding to each of the historical objects based on the maximum optimized combination result of each of the synthetic feature distances includes:
and determining the current object corresponding to each historical object based on the maximum optimization combination result of the comprehensive characteristic distance of each associated object pair.
3. The method of claim 1, wherein said determining a current object corresponding to each of said historical objects based on a maximum optimized combined result for each of said composite feature distances comprises:
processing each comprehensive characteristic distance by using a Hungarian matching algorithm to generate a maximum optimization combination result;
and determining the current object corresponding to each history object according to the corresponding relation between the history object and the current object indicated by the maximum optimization combination result.
4. The method of any of claims 1-3, wherein the generating a composite feature distance for the pair of objects based on optical flow feature distances and pixel point variation feature distances for a current object and a historical object in the pair of objects comprises:
and generating a comprehensive characteristic distance of the object pair based on the optical flow characteristic distance, the pixel point change characteristic distance and the center offset distance of the historical object and the current object in the object pair.
5. The method of claim 4, further comprising:
acquiring a shooting time difference between the historical image and the current image;
determining the offset speed parameter of the object pair according to the center offset distance and the shooting time difference;
and rejecting the object pair in response to the offset speed parameter exceeding a preset speed threshold.
6. The method of claim 1, wherein said extracting historical objects and current objects from the historical image and current image, respectively, and obtaining an object pair consisting of any one of the historical objects and any one of the current objects comprises:
extracting a history object and a current object from the history image and the current image respectively;
respectively extracting a first object feature corresponding to each history object and a second object feature of each current object;
and forming object pairs by the historical objects and the current objects corresponding to the first object features and the second object features with the similarity exceeding a preset confidence threshold.
7. An object tracking device, comprising:
an object pair generation unit configured to extract a history object and a current object from a history image and a current image, respectively, and obtain an object pair composed of any one of the history objects and any one of the current objects, wherein the history image and the current image are from the same continuous frame image set;
a comprehensive feature generating unit configured to generate a comprehensive feature distance of the object pair based on optical flow feature distances and pixel point change feature distances of a current object and a history object in the object pair;
and the object tracking unit is configured to determine a current object corresponding to each historical object based on the maximum optimization combination result of each comprehensive characteristic distance.
8. The apparatus of claim 7, further comprising:
the association object pair screening unit is configured to determine an object pair with the comprehensive characteristic distance meeting a preset threshold value requirement as an association object pair; and
the object tracking unit is further configured to determine a current object corresponding to each of the historical objects based on a maximum optimized combination result of the integrated feature distances of each of the associated object pairs.
9. The apparatus of claim 7, wherein the object tracking unit comprises:
the maximum optimization combination result generation subunit is configured to process each comprehensive characteristic distance by using a Hungarian matching algorithm to generate a maximum optimization combination result;
an object tracking result generating subunit configured to determine a current object corresponding to each of the history objects according to the correspondence between the history object and the current object indicated by the maximum optimization combining result.
10. The apparatus of any of claims 7-9, wherein the integrated feature generation unit is further configured to generate an integrated feature distance for the pair of objects based on optical flow feature distances, pixel point change feature distances, and center offset distances of a current object and a historical object in the pair of objects.
11. The apparatus of claim 10, further comprising:
a time difference acquisition unit configured to acquire a shooting time difference between the history image and the current image;
a shift speed determination unit configured to determine a shift speed parameter of the object pair from the center shift distance and the photographing time difference;
an object pair optimization unit configured to cull the object pairs in response to the offset speed parameter exceeding a preset speed threshold.
12. The apparatus of claim 7, wherein the object pair generation unit comprises:
an object extraction subunit configured to extract a history object and a current object from the history image and the current image, respectively;
an object feature extraction subunit configured to extract a first object feature corresponding to each of the history objects and a second object feature of each of the current objects, respectively;
and the object pair generation subunit is configured to combine the history object and the current object corresponding to the first object feature and the second object feature with the similarity exceeding a preset confidence threshold into an object pair.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the object tracking method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the object tracking method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements an object tracking method according to any one of claims 1-6.
CN202111267978.0A 2021-10-29 2021-10-29 Object tracking method, related device and computer program product Pending CN114004864A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111267978.0A CN114004864A (en) 2021-10-29 2021-10-29 Object tracking method, related device and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111267978.0A CN114004864A (en) 2021-10-29 2021-10-29 Object tracking method, related device and computer program product

Publications (1)

Publication Number Publication Date
CN114004864A (en), published 2022-02-01

Family

ID=79924870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111267978.0A Pending CN114004864A (en) 2021-10-29 2021-10-29 Object tracking method, related device and computer program product

Country Status (1)

Country Link
CN (1) CN114004864A (en)

Similar Documents

Publication Publication Date Title
CN113920307A (en) Model training method, device, equipment, storage medium and image detection method
CN112597837B (en) Image detection method, apparatus, device, storage medium, and computer program product
CN114677565B (en) Training method and image processing method and device for feature extraction network
CN113436100B (en) Method, apparatus, device, medium, and article for repairing video
EP4020387A2 (en) Target tracking method and device, and electronic apparatus
CN114550177A (en) Image processing method, text recognition method and text recognition device
CN113392794B (en) Vehicle line crossing identification method and device, electronic equipment and storage medium
CN112949767A (en) Sample image increment, image detection model training and image detection method
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
EP4145408A1 (en) Obstacle detection method and apparatus, autonomous vehicle, device and storage medium
CN113901911B (en) Image recognition method, image recognition device, model training method, model training device, electronic equipment and storage medium
CN112528995B (en) Method for training target detection model, target detection method and device
CN113627298A (en) Training method of target detection model and method and device for detecting target object
CN113378857A (en) Target detection method and device, electronic equipment and storage medium
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN114119990B (en) Method, apparatus and computer program product for image feature point matching
CN116152702A (en) Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle
CN113657596B (en) Method and device for training model and image recognition
EP3940587A1 (en) Method and apparatus for detecting face synthetic image, electronic device, and storage medium
CN115861755A (en) Feature fusion method and device, electronic equipment and automatic driving vehicle
CN114004864A (en) Object tracking method, related device and computer program product
CN114581711A (en) Target object detection method, apparatus, device, storage medium, and program product
CN113870428A (en) Scene map generation method, related device and computer program product
CN112966606B (en) Image recognition method, related device and computer program product
CN115223374B (en) Vehicle tracking method and device and electronic equipment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination