WO2022033306A1

WO2022033306A1 - Target tracking method and apparatus

Info

Publication number: WO2022033306A1
Application number: PCT/CN2021/108893
Authority: WO
Inventors: 李亚学
Original assignee: 深圳市道通智能航空技术股份有限公司
Priority date: 2020-08-12
Filing date: 2021-07-28
Publication date: 2022-02-17
Also published as: CN112037255A

Abstract

The embodiments of the present invention relate to a target tracking method and apparatus. The embodiments comprise: according to an appearance feature of a tracking target and by means of a preset tracking algorithm, determining, from an image frame, a target area in which the tracking target is located; obtaining multiple object areas from the image frame by means of a preset deep learning algorithm; selecting, from among the multiple object areas, an object area with the same object attribute as the target area, and using same as a target object area; and adjusting the position and size of the target area according to the target object area, so as to generate an optimized target area. In the present invention, on the basis of an original tracking algorithm, optimization and adjustment are performed in combination with a detection result of deep learning, such that interference in a complicated environment can be better adapted to and resisted, thereby effectively improving the overall performance of target tracking.

Description

Target tracking method and device

This application claims the priority of the Chinese patent application with the application number 202010805592X and the application title "Target Tracking Method and Device" filed with the China Patent Office on August 12, 2020, the entire contents of which are incorporated into this application by reference.

【Technical field】

The invention relates to the technical field of machine vision, in particular to a target tracking method, device, image processing chip and unmanned aerial vehicle.

【Background technique】

"Object tracking" is a technique of predicting the size and position of a target object in subsequent image frames of a video sequence given the target size and position of the initial image frame of the video sequence. It has a wide range of applications in many fields such as video surveillance, human-computer interaction and multimedia analysis.

However, in the actual application process, the tracked target is prone to change in shape due to non-rigid motion, and is subject to illumination transformation and background environment interference.

Therefore, how to avoid the interference of irrelevant factors in the video sequence to the tracking target and improve the performance of target tracking to meet and adapt to the complex and changeable practical application situation is an urgent problem to be solved.

[Content of the invention]

The embodiments of the present invention aim to provide a target tracking method, a device, an image processing chip and an unmanned aerial vehicle, which can solve the defects of the existing target tracking method.

To solve the above technical problems, the embodiments of the present invention provide the following technical solutions: a target tracking method. The method includes:

According to the appearance characteristics of the tracking target, the target area where the tracking target is located is determined in the image frame through a preset tracking algorithm;

In the image frame, several object regions are obtained through a preset deep learning algorithm;

In the several object regions, select the object region with the same object attribute as the target region as the target object region;

According to the target object area, the position and size of the target area are adjusted to generate an optimized target area.

Optionally, in the initial image frame, through the deep learning algorithm, several optional object regions are identified and obtained, and each of the object regions is marked with a corresponding object label;

Determine the object area selected by the user as the tracking target.

Optionally, in the several object regions, selecting an object region with the same object attribute as the target region as the target object region, specifically including:

Selecting the same object region as the object label of the tracking target as the candidate object region;

Among the candidate object regions, the candidate object region with the largest overlap with the tracking result of the previous image frame is selected as the target object region.

Optionally, the selection of the candidate object region with the largest overlap with the target region as the target object region specifically includes:

Calculate the intersection area and union area of the candidate object area and the target area;

The ratio of the intersection area and the union area is used as the degree of overlap between the candidate object area and the target object area.

Optionally, the representational features include: a gradient direction histogram, a local binary pattern, and a color feature.

Optionally, generating an optimized target area by adjusting the position and size of the target area according to the target object area specifically includes:

setting the first weight of the target object area and the second weight of the target area;

According to the first weight and the second weight, the center point of the target object area and the center point of the target area are weighted and summed to obtain the center point of the optimized target area; the target object The position of the area is represented by the center point of the target object area, the position of the target area is represented by the center point of the target area, the position of the optimized target area is represented by the center point of the optimized target area, and

According to the first weight and the second weight, weighted summation is performed on the size of the target object area and the target area to obtain the size of the optimized target area.

Optionally, the center point of the optimized target area is calculated and obtained by the following formula:

center_x_opt=λ*center_x_track+(1-λ)*center_x_detect;

center_y_opt=λ*center_y_track+(1-λ)*center_y_detect;

Among them, center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame, center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame; center_x_track is the center point of the target area in the image frame The coordinates in the horizontal direction, center_y_track is the coordinates of the center point of the target area in the vertical direction of the image frame, center_x_detect is the coordinates of the center point of the target object area in the horizontal direction of the image frame, center_y_detect is the center point of the target object area Coordinates in the vertical direction of the image frame, λ is the second weight.

Optionally, the size of the optimized target area is calculated and obtained by the following formula:

width_opt=λ*width_track+(1-λ)*width_detect;

height_opt=λ*height_track+(1-λ)*height_detect;

Among them, width_opt is the width of the optimized target area, width_track is the width of the target area, width_detect is the width of the target object area, height_opt is the height of the optimized target area, height_track is the height of the target area, height_detect is the height of the target object area, λ is the second weight.

In order to solve the above technical problems, the embodiments of the present invention also provide the following technical solutions: a target tracking device, comprising:

a target tracking module, used for determining the target area where the tracking target is located in the image frame through a preset tracking algorithm according to the apparent feature of the tracking target;

A deep learning recognition module, used to obtain several object regions in the image frame through a preset deep learning algorithm;

a selection module, configured to select, among the several object regions, an object region with the same object attribute as the target region as the target object region;

The optimization module is configured to adjust the position and size of the target area according to the target object area to generate an optimized target area.

In order to solve the above technical problems, the embodiments of the present invention further provide the following technical solutions: an image processing chip, comprising: a processor and a memory communicatively connected to the processor; the memory stores computer program instructions, and the computer Program instructions, when invoked by the processor, cause the processor to perform the object tracking method described above.

In order to solve the above-mentioned technical problems, the embodiments of the present invention also provide the following technical solutions: an unmanned aerial vehicle, comprising: an unmanned aerial vehicle main body, an image acquisition device and an image processing chip installed on the gimbal of the unmanned aerial vehicle main body;

The image acquisition device is used to continuously collect multiple frames of images; the image processing chip is used to receive the multiple frames of images continuously collected by the image acquisition device, and to perform the above-mentioned target tracking on the received multiple frames of images method to realize the tracking of the tracking target.

Compared with the prior art, the target tracking method of the embodiment of the present invention is optimized and adjusted based on the original tracking algorithm and combined with the detection results of deep learning, which can better adapt to and resist interference in complex environments, and is effective. Improves the overall performance of object tracking.

【Description of drawings】

One or more embodiments are exemplified by the pictures in the corresponding drawings, and these exemplifications do not constitute limitations of the embodiments, and elements with the same reference numerals in the drawings are denoted as similar elements, Unless otherwise stated, the figures in the accompanying drawings do not constitute a scale limitation.

1 is a schematic diagram of an application scenario of a target tracking method according to an embodiment of the present invention;

2 is a schematic diagram of a target tracking apparatus provided by an embodiment of the present invention;

3 is a method flowchart of a target tracking method provided by an embodiment of the present invention;

4 is a method flowchart of a method for selecting a target object region provided by an embodiment of the present invention;

5 is a schematic diagram of an application example of a target tracking method provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an image processing chip provided by an embodiment of the present invention.

【detailed description】

In order to facilitate understanding of the present invention, the present invention will be described in more detail below with reference to the accompanying drawings and specific embodiments. It should be noted that when an element is referred to as being "fixed to" another element, it can be directly on the other element, or one or more intervening elements may be present therebetween. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or one or more intervening elements may be present therebetween. The terms "upper", "lower", "inner", "outer", "bottom" and other terms used in this specification indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, and are only for the convenience of describing the present invention. The invention and simplified description do not indicate or imply that the device or element referred to must have a particular orientation, be constructed and operate in a particular orientation, and therefore should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," etc. are used for descriptive purposes only and should not be construed to indicate or imply relative importance.

Unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as commonly understood by one of ordinary skill in the technical field of the present invention. The terms used in the description of the present invention in this specification are only for the purpose of describing specific embodiments, and are not used to limit the present invention. As used in this specification, the term "and/or" includes any and all combinations of one or more of the associated listed items.

In addition, the technical features involved in the different embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.

In a video sequence consisting of a series of consecutive image frames, the traditional object tracking process includes generating candidate samples, feature extraction, scoring candidate samples using the observation model, updating the observation model to adapt to changes in the target, and fusion to obtain the final decision result and so on.

Among them, "feature extraction" refers to the process of extracting discriminative features to represent the target. Extracting and obtaining discriminative features is the basis of candidate sample scoring and is the key to determining the performance of target tracking. Most of the existing improvements on the performance of target tracking methods focus on how to select appropriate features.

However, the target tracking method provided by the embodiment of the present invention can overcome the interference of occlusion, deformation, background noise, scale transformation, etc. target tracking performance.

FIG. 1 is an application scenario of a target tracking method provided by an embodiment of the present invention. As shown in FIG. 1 , in this application scenario, a drone 10 equipped with an aerial camera, an intelligent terminal 20 and a wireless network 30 are included.

The drone 10 may be any type of powered unmanned aerial vehicle, including but not limited to quadcopter drones, fixed-wing aircraft, and helicopter models. It can have the corresponding volume or power according to the needs of the actual situation, so as to provide the load capacity, flight speed and flight cruising range that can meet the needs of use.

The UAV 10 may be equipped with any type of image capture device, including a motion camera, a high-definition camera, or a wide-angle camera. As one of the functional modules carried on the UAV, it can be installed and fixed on the UAV by installing a fixed bracket such as a gimbal, and is controlled by the UAV 10 to perform the task of image acquisition.

Of course, one or more functional modules can also be added to the UAV, so that the UAV can realize the corresponding functions, such as the built-in main control chip, which is used as the control core of UAV flight and data transmission, or a picture transmission device. , and upload the acquired image information to a device (such as a server or an intelligent terminal) that establishes a connection with the drone.

The smart terminal 20 may be any type of smart device used to establish a communication connection with the drone, such as a mobile phone, a tablet computer, or a smart remote control. The smart terminal 20 may be equipped with one or more different user interaction devices to collect user instructions or display and feed back information to the user.

These interactive devices include, but are not limited to: buttons, display screens, touch screens, speakers, and remote control joysticks. For example, the smart terminal 20 may be equipped with a touch display screen, through which the user's remote control instructions for the drone are received, and the image information obtained by the aerial camera is displayed to the user through the touch display screen. The touch screen switches the image information currently displayed on the display.

In some embodiments, the existing image vision processing technology may also be integrated between the drone 10 and the intelligent terminal 20 to further provide more intelligent services. For example, the drone 10 can collect images through an aerial camera, and then the intelligent terminal 20 executes the target tracking method provided by the embodiment of the present invention to track a specific face in the video, and finally realizes the communication between the user and the drone. Human-computer interaction.

In other embodiments, the target tracking method can also be executed by the drone 10 or an external server, and the final data result can be directly provided to the intelligent terminal 20 .

The wireless network 30 can be a wireless communication network based on any type of data transmission principle for establishing a data transmission channel between two nodes, such as a Bluetooth network, a WiFi network, a wireless cellular network or a combination thereof located in a specific signal frequency band, to achieve Data transmission between the drone 10, the smart terminal 20 and/or the server.

FIG. 2 is a structural block diagram of a target tracking apparatus provided by an embodiment of the present invention. The target tracking device can be executed by any suitable type of electronic computing platform, such as an image processing chip built in the drone, a server or an intelligent terminal that establishes a wireless communication connection with the drone. In this embodiment, the composition of the target tracking device is described in the form of functional modules.

Those skilled in the art can understand that the functional modules shown in FIG. 2 can be selectively implemented by software, hardware or a combination of software and hardware according to actual needs. For example, it may be implemented by the processor calling an associated software application stored in memory.

As shown in FIG. 2 , the target tracking device 200 includes: a target tracking module 210 , a deep learning identification module 220 , a selection module 230 and an optimization module 240 .

The target tracking module 210 is configured to determine the target area where the tracking target is located in the image frame by using a preset tracking algorithm according to the appearance characteristics of the tracking target.

"Representational features" refers to some hand-designed discriminative features used in traditional object tracking methods. It has the characteristics of fast operation speed and small delay. Specifically, the representation features include gradient direction histograms, local binary patterns and color features. The target tracking module 210 is a functional module for executing traditional target tracking methods, and specifically, any suitable type of tracking algorithm can be selected.

The deep learning identification module 220 is configured to obtain several object regions in the image frame through a preset deep learning algorithm.

"Deep learning" is a method of image recognition using deep neural networks trained on sample data. Through deep learning, multiple different object regions can be identified in the image frame. Each object area represents a specific object.

The selection module 230 is configured to select, among the several object regions, an object region with the same object attribute as the target region as the target object region.

"Object attribute" refers to the metric used to judge whether the object area and the target area belong to the same object. Specifically, it may be composed of one or more indicators or conditions, and basically, object regions belonging to the same target object can be determined or selected.

In some embodiments, the selection module 230 is specifically configured to select an object region that is the same as the object label of the tracking target as a candidate object region; and in the candidate object region, select the tracking result of the previous image frame The candidate object region with the largest overlap degree is used as the target object region.

The optimization module 240 is configured to adjust the position and size of the target area according to the target object area to generate an optimized target area.

The "target object area" is an area obtained based on deep learning, which has a relatively good resistance to interference factors in complex environments and is not easily interfered. Therefore, on the basis of the traditional tracking algorithm, the target object region is introduced as a reference, so that the effect of the tracking result can be optimized, and an optimized target region can be generated.

In some embodiments, the optimization module 240 is specifically configured to: set a first weight of the target object area and a second weight of the target area;

According to the first weight and the second weight, the center point of the target object area and the center point of the target area are weighted and summed to obtain the center point of the optimized target area; and according to the The first weight and the second weight are weighted summation of the size of the target object area and the target area to obtain the size of the optimized target area.

The position of the target object area is represented by the center point of the target object area, the position of the target area is represented by the center point of the target area, and the position of the optimized target area is represented by the optimized target area. The center point of the area is represented.

Specifically, on the one hand, the center point of the optimized target area can be calculated by the following formula:

center_x_opt=λ*center_x_track+(1-λ)*center_x_detect;

center_y_opt=λ*center_y_track+(1-λ)*center_y_detect;

On the other hand, the size of the optimized target area can be calculated by the following formula:

width_opt=λ*width_track+(1-λ)*width_detect;

height_opt=λ*height_track+(1-λ)*height_detect;

Although, in the application scenario shown in Figure 1, the image acquisition device applied to the UAV is taken as an example. However, those skilled in the art can understand that the target tracking method can also be used in other types of scenarios and devices to improve the performance of the target tracking algorithm. The target tracking method disclosed in the embodiment of the present invention is not limited to be applied to the UAV shown in FIG. 1 .

In some embodiments, as shown in FIG. 2 , the target tracking device 200 may further include a marking module 250 . The marking module 250 is used to identify and obtain several selectable object regions through the deep learning algorithm in the initial image frame, and determine the object region selected by the user as the tracking target. Wherein, each of the object regions is marked with a corresponding object label.

Among them, the "object label" (label) is the output of the deep learning algorithm, which is used to mark the object corresponding to the object area. The specific form of the object label depends on the deep learning algorithm used and its training data.

Based on the characteristics of deep learning algorithms, multiple object regions with the same object label may be identified in an image frame. Therefore, the target object to be tracked can be selected by the user as a candidate object.

FIG. 3 is a method flowchart of a target tracking method provided by an embodiment of the present invention. As shown in Figure 3, the target tracking method may include the following steps:

310. Determine, in the image frame, a target area where the tracking target is located by using a preset tracking algorithm according to the apparent feature of the tracking target.

The "image frame" refers to a certain frame of image being processed in the video sequence. The tracking algorithm takes the video sequence composed of continuous image frames as the processing object, and predicts and tracks the position of the target in the image frame by frame.

The tracking algorithm can use any type of fast tracking algorithm in the prior art, which takes the appearance feature as the discriminating feature of the tracking target. Specifically, the appearance feature includes but is not limited to histogram of gradient orientation (HOG), local binary pattern (LBP), color feature, and the like.

Different representational features have their own unique advantages and disadvantages. For example, the gradient direction histogram is easily disturbed by the non-rigid body motion of the target (such as the movement of the person who is the tracking target from standing to squatting) and occlusion. The color features are easily affected by changes in the lighting environment.

The target area refers to a rectangular frame with a specific size in the image frame that is calculated and output by the "tracking algorithm" and contains the tracking target. Specifically, it can be calculated and obtained by any type of tracking algorithm.

320. In the image frame, obtain several object regions through a preset deep learning algorithm.

Among them, the "deep learning algorithm" can be of any type, using sample data to realize the image processing method of the neural network model. Through deep learning algorithms, multiple objects present in an image frame can be obtained with high confidence.

Similar to the output form of the target area, the output of the deep learning algorithm is also a rectangular box containing recognizable objects. Of course, the deep learning algorithm also outputs the object label corresponding to each object area, which is used to mark the target object (such as a face, an airplane, etc.) corresponding to the object area.

330. Among the several object regions, select an object region with the same object attribute as the target region as the target object region.

Among them, deep learning algorithms often identify more than one object region in an image frame. Therefore, it is necessary to adopt an appropriate screening method to select one of the most reliable object regions to introduce into the tracking results.

The specific filtering method used depends on the actual object properties used. The technical personnel can choose to use one or more measurement standards as the actual object attributes according to the needs of the actual situation.

In some embodiments, the steps shown in FIG. 4 may be specifically adopted to select and obtain a target object region from a plurality of object regions as a reference for adjustment and optimization:

331. Select an object region that is the same as the object label of the tracking target as a candidate object region.

Among them, the "object label" (label) is output by the deep learning algorithm and used to mark the object corresponding to the object area. The exact form of object labels depends on the deep learning algorithm used and its training data.

332. From the candidate object regions, select the candidate object region with the largest overlap with the tracking result of the previous image frame as the target object region.

Among them, due to the characteristics of the deep learning algorithm, multiple object regions with the same object label may be identified in the image frame. Therefore, after the screening of object labels, there may still be more than one candidate object region.

At this time, the degree of overlap between the candidate region and the tracking result of the previous image frame can be used as a criterion, and a target object region that can be used as an adjustment reference and standard can be further selected.

In this embodiment, the object label and the degree of overlap are used to judge the attributes of the object, so that the real tracking target can be found from the output result of the deep learning algorithm with certainty, and this can be used as the basis for adjusting and optimizing the tracking result. Reference.

340. According to the target object area, adjust the position and size of the target area to generate an optimized target area.

Among them, "adjustment" refers to using any suitable type of function mapping method to integrate the target object area and the target area, and by adjusting the position and size, generate and output an optimization result, that is, the optimized target area. Forms of adjustment include changing and optimizing the position and size of the rectangular box representing the target area in the image frame.

In practice, the tracking target can be tracked by linking the optimized target areas of each image frame in a series of continuous image frame sequences, and the changes of the position and size of the tracking target in the image frame sequence can be determined.

In this embodiment, on the basis of the original tracking algorithm, through a reasonable screening method, combined with the detection results of deep learning to optimize and adjust, it can better resist the interference of complex environments on the tracking results, and effectively improve the The overall performance of object tracking.

The specific process of the target tracking will be described in detail below with reference to the schematic diagram of the actual application of the target tracking method in the video sequence shown in FIG. 5 .

As shown in FIG. 5 , it is assumed that the video sequence is composed of n consecutive image frames (the target tracking method provided by the embodiment of the present invention is sequentially performed in the 2nd to the nth image frame, and the predicted tracking target is in each image frame. position.) The tracking result, the object area and the target area can all be represented by the smallest enclosing rectangle containing the target object.

First, in the initial image frame, the target area where the tracking target is located is determined according to the user's instruction.

The initial image frame refers to the starting point of target tracking. For simplicity of presentation, in this embodiment, the initial image frame is the first image frame in the video sequence. Of course, according to the actual needs of the user, any image frame in the video sequence can also be randomly selected as the initial image frame, which is used as the starting point of target tracking.

As shown in Figure 5, in the actual use process, in order to facilitate interaction with the user and subsequent processing, a deep learning algorithm can be used to identify and detect in the initial image frame, and several optional object regions (each of the object regions) can be obtained. are marked with corresponding object labels, denoted by L1 to L4 in Figure 5) as optional tracking targets. The user can issue a corresponding user selection instruction according to his own needs, and select one of these optional object areas as the tracking target (eg L4).

Then, in the subsequent image frames, from the multiple object regions detected by the deep learning algorithm, a target object region with the same object attribute as the tracking target is selected.

Specifically, for the object region obtained by detection, first select the object region that is the same as the object label of the tracking target of the initial image frame as the candidate object region. Then, the object area with the highest degree of overlap with the target area D of the previous image frame is used as the target object area.

Among them, the degree of overlap between the two rectangular boxes can be represented by the intersection ratio between the object area and the target area. As shown in Figure 5, the "intersection over union ratio" (IoU) refers to the ratio between the intersection and union of two regions of interest. The calculation process is specifically: calculating the intersection area and union area of the candidate object area and the target area, and using the ratio of the intersection area and the union area as the candidate object area and the target object. The overlap of the regions.

Finally, by combining the selected target object area with the target area detected by the tracking algorithm, the optimized target area in the image frame can be obtained.

As long as the above steps are repeatedly performed in subsequent image frames, the target tracking result in a video sequence composed of a series of consecutive image frames can be obtained. To sum up, the complete target tracking method can be expressed by the following formula (1):

Among them, Bbox represents the tracking target, and the subscripts track and detect represent the target area obtained by the tracking algorithm and the object area obtained by the deep learning algorithm, respectively. The object area may contain multiple, use them separately

Representation (j is the serial number of the object area).

When i=0, it represents the tracking target selected in the initial image frame; when i>0, it represents the tracking result in the subsequent image frame. in,

Represents the target area in the i-th image frame,

represents the optimized tracking result of the i-th image frame,

Represents the target object area (that is, the object area with the same object label as the target area of the previous frame and the largest overlap).

The specific integration method for the tracking algorithm and the deep learning algorithm is as follows: first, the first weight of the target object area and the second weight of the target area are set. Then, according to the first weight and the second weight, the center point of the target object area and the center point of the target area are weighted and summed to obtain the optimized center point of the target area and according to the The first weight and the second weight are weighted and summed to the size of the target object area and the target area to obtain the size of the optimized target area.

The position of the target object area is represented by the center point of the target object area, the position of the target area is represented by the center point of the target area, and the position of the optimized target area is represented by the optimized target area. The center point of the area indicates that the first weight and the second weight of the target object area and the target area can be preset by technicians according to the actual situation, and are constant values that can be determined experimentally or empirically. In this embodiment, the confidence of the target area may be used as the weight (second weight) occupied by the tracking algorithm. Correspondingly, the weight occupied by the deep learning algorithm (ie, the first weight) is 1 minus the confidence of the target area.

Specifically, the position of the optimized target area is obtained by calculating the following formulas (2) and (3):

center_x_opt=λ*center_x_track+(1-λ)*center_x_detect (2);

center_y_opt=λ*center_y_track+(1-λ)*center_y_detect (3);

where λ is the confidence level. Moreover, in this embodiment, the optimized target area, the object area, and the target area are all represented by a rectangular frame. Therefore, the optimized target area, the object area, and the position of the target area can be represented by the position coordinates of the center of the rectangular frame in the image frame. That is, the position of the optimized target area can be expressed as (center_x_opt, center_y_opt), the position of the object area can be expressed as (center_x_detect, center_y_detect), and the position of the target area can be expressed as (center_x_track, center_y_track).

The size of the optimized target area can be calculated by the following formulas (4) and (5):

width_opt=λ*width_track+(1-λ)*width_detect (4);

height_opt=λ*height_track+(1-λ)*height_detect (5);

Among them, width_opt is the width of the optimized target area, width_track is the width of the target area, width_detect is the width of the target object area, height_opt is the height of the optimized target area, height_track is the height of the target area, height_detect is the height of the target object area, λ is the confidence level.

As described above, on the basis of the original tracking algorithm, the target tracking method provided by the embodiment of the present invention is optimized and adjusted in combination with the detection results of deep learning, which can better adapt to and resist interference in complex environments, and effectively improve the overall performance of target tracking.

An embodiment of the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores at least one executable instruction, and the computer-executable instruction can execute the target tracking method in any of the foregoing method embodiments.

FIG. 6 shows a schematic structural diagram of an image processing chip according to an embodiment of the present invention. The specific embodiment of the present invention does not limit the specific implementation of the image processing chip.

As shown in FIG. 6 , the image processing chip may include: a processor (processor) 602 , a communication interface (Communications Interface) 604 , a memory (memory) 606 , and a communication bus 608 .

The processor 602 , the communication interface 604 , and the memory 606 communicate with each other through the communication bus 608 . The communication interface 604 is used to communicate with network elements of other devices such as clients or other servers. The processor 602 is configured to execute the program 610, and specifically may execute the relevant steps in the above-mentioned embodiments of the target tracking method.

Specifically, the program 610 may include program code including computer operation instructions.

The processor 602 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the network slicing device may be the same type of processors, such as one or more CPUs; or may be different types of processors, such as one or more CPUs and one or more ASICs.

The memory 606 is used to store the program 610 . Memory 606 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.

Program 610 may be used to cause processor 602 to perform the following steps:

First, according to the appearance characteristics of the tracking target, through a preset tracking algorithm, the target area where the tracking target is located is determined in the image frame. Secondly, in the image frame, several object regions are obtained through a preset deep learning algorithm. Thirdly, among the several object regions, an object region with the same object attribute as the target region is selected as the target object region. Finally, according to the target object area, an optimized target area is generated by adjusting the position and size of the target area.

In some embodiments, the program 610 is further configured to cause the processor 602 to perform the following steps before executing the step of determining the target area where the tracking target is located in the image frame by using a preset tracking algorithm: first, in the initial image In the frame, several selectable object regions are identified and obtained through the deep learning algorithm, and each of the object regions is marked with a corresponding object label. Then, the object area selected by the user is determined as the tracking target.

In some embodiments, the program 610 may be used to cause the processor 602 to perform the step of selecting an object region with the same object attribute as the target region among the several object regions as the target object region, specifically Used for:

Select the object area with the same object label as the tracking target as the candidate object area; in the candidate object area, select the candidate object area with the largest overlap with the tracking result of the previous image frame as the target object area.

In some embodiments, the program 610 may be configured to cause the processor 602 to, when performing the step of selecting the candidate object region with the largest overlap with the target region as the target object region, be specifically configured to: calculate the candidate object region The intersection area and the union area of the object area and the tracking result; the ratio of the intersection area and the union area is taken as the degree of overlap between the candidate object area and the tracking result.

In some embodiments, the program 610 may be used to cause the processor 602 to perform the step of generating an optimized target area by adjusting the position and size of the target area according to the target object area, specifically using At:

Set the first weight of the target object area and the second weight of the target area; according to the first weight and the second weight, the center point of the target object area and the center point of the target area are Perform weighted summation to obtain the center point of the optimized target area; the position of the target object area is represented by the center point of the target object area, and the position of the target area is represented by the center point of the target area, The position of the optimized target area is represented by the center point of the optimized target area, and according to the first weight and the second weight, a weighted calculation is performed on the size of the target object area and the target area. and, to obtain the size of the optimized target area.

Specifically, the processor 602 can obtain the center point of the optimized target area by calculating the following formula:

center_x_opt=λ*center_x_track+(1-λ)*center_x_detect;

center_y_opt=λ*center_y_track+(1-λ)*center_y_detect;

The processor 602 can also obtain the size of the optimized target area by calculating the following formula:

width_opt=λ*width_track+(1-λ)*width_detect;

height_opt=λ*height_track+(1-λ)*height_detect;

Those skilled in the art should further realize that each step of the exemplary target tracking method described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software or a combination of the two, for the purpose of clear illustration Interchangeability of hardware and software, the above description has generally described the components and steps of each example in terms of functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution.

Skilled artisans may use different methods of implementing the described functionality for each particular application, but such implementations should not be considered beyond the scope of the present invention. The computer software can be stored in a computer-readable storage medium, and when the program is executed, it can include the processes of the above-mentioned method embodiments. Wherein, the storage medium can be a magnetic disk, an optical disk, a read-only storage memory, or a random storage memory, and the like.

Based on the image processing chip provided by the embodiment of the present invention, the embodiment of the present invention further provides an unmanned aerial vehicle. Wherein, the unmanned aerial vehicle comprises: an unmanned aerial vehicle main body, an image acquisition device and an image acquisition chip installed on the gimbal of the unmanned aerial vehicle main body.

The image acquisition device is used to continuously collect multiple frames of images; the image processing chip is used to receive the multiple frames of images continuously collected by the image acquisition device, and to perform the following steps on the received multiple frames of images:

Based on the optimized target area determined by the image acquisition chip of the UAV in multiple consecutive image frames, the UAV can track the tracking target.

In some embodiments, before the step of determining the target area where the tracking target is located in the image frame, the image processing chip further performs the following steps:

First, in the initial image frame, through the deep learning algorithm, several selectable object regions are identified and obtained, and each of the object regions is marked with a corresponding object label. Then, the object area selected by the user is determined as the tracking target.

In some embodiments, when the image processing chip performs the step of selecting an object region with the same object attribute as the target region from among the several object regions as the target object region, the image processing chip is specifically configured to:

In some embodiments, when the image processing chip performs the step of selecting the candidate object region with the largest overlap with the target region as the target object region, the image processing chip is specifically configured to: calculate the difference between the candidate object region and the target object region. The intersection area and the union area of the tracking results; the ratio of the intersection area and the union area is used as the overlap between the candidate object area and the tracking result.

In some embodiments, when the image processing chip performs the step of generating an optimized target area by adjusting the position and size of the target area according to the target object area, the image processing chip is specifically configured to:

Specifically, the image processing chip can obtain the center point of the optimized target area by calculating the following formula:

center_x_opt=λ*center_x_track+(1-λ)*center_x_detect;

center_y_opt=λ*center_y_track+(1-λ)*center_y_detect;

The image processing chip can also obtain the size of the optimized target area by calculating the following formula:

width_opt=λ*width_track+(1-λ)*width_detect;

height_opt=λ*height_track+(1-λ)*height_detect;

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; under the idea of the present invention, the technical features in the above embodiments or different embodiments can also be combined, The steps may be carried out in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity; although the invention has been The skilled person should understand that it is still possible to modify the technical solutions recorded in the foregoing embodiments, or to perform equivalent replacements on some of the technical features; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the implementation of the present invention. scope of technical solutions.

Claims

A target tracking method, comprising:

According to the appearance characteristics of the tracking target, the target area where the tracking target is located is determined in the image frame through a preset tracking algorithm;

In the image frame, several object regions are obtained through a preset deep learning algorithm;

In the several object regions, select the object region with the same object attribute as the target region as the target object region;

According to the target object area, an optimized target area is generated by adjusting the position and size of the target area.
The method according to claim 1, characterized in that, before the step of determining the target area where the tracking target is located in the image frame by using a preset tracking algorithm according to the appearance characteristics of the tracking target, the method further comprises: :

In the initial image frame, through the deep learning algorithm, several selectable object regions are identified and obtained, and each of the object regions is marked with a corresponding object label;

Determine the object area selected by the user as the tracking target.
The method according to claim 2, wherein, in the several object regions, selecting an object region with the same object attribute as the target region as the target object region, specifically comprising:

Screening and selecting the same object region as the object label of the tracking target as the candidate object region;

Among the candidate object regions, the candidate object region with the largest overlap with the tracking result of the previous image frame is selected as the target object region.
The method according to claim 3, wherein the selecting the candidate object region with the largest overlap with the target region as the target object region specifically includes:

Calculate the intersection area and union area of the candidate object area and the tracking result;

The ratio of the intersection area and the union area is taken as the degree of overlap between the candidate object area and the tracking result.
The method according to claim 1, wherein the appearance features include: gradient direction histogram, local binary pattern and color feature.
The method according to any one of claims 1-5, wherein, according to the target object area, by adjusting the position and size of the target area, an optimized target area is generated, specifically including:

setting the first weight of the target object area and the second weight of the target area;

According to the first weight and the second weight, the center point of the target object area and the center point of the target area are weighted and summed to obtain the center point of the optimized target area; the target object The position of the area is represented by the center point of the target object area, the position of the target area is represented by the center point of the target area, the position of the optimized target area is represented by the center point of the optimized target area, and

According to the first weight and the second weight, weighted summation is performed on the size of the target object area and the target area to obtain the optimized size of the target area.
The method according to claim 6, wherein the center point of the optimized target area is calculated and obtained by the following formula:

center_x_opt=λ*center_x_track+(1-λ)*center_x_detect;

center_y_opt=λ*center_y_track+(1-λ)*center_y_detect;

Among them, center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame, center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame; center_x_track is the center point of the target area in the image frame The coordinates in the horizontal direction, center_y_track is the coordinates of the center point of the target area in the vertical direction of the image frame, center_x_detect is the coordinates of the center point of the target object area in the horizontal direction of the image frame, center_y_detect is the center point of the target object area Coordinates in the vertical direction of the image frame, λ is the second weight.
The method according to claim 6, wherein the size of the optimized target area is obtained by calculating the following formula:

width_opt=λ*width_track+(1-λ)*width_detect;

height_opt=λ*height_track+(1-λ)*height_detect;

Among them, width_opt is the width of the optimized target area, width_track is the width of the target area, width_detect is the width of the target object area, height_opt is the height of the optimized target area, height_track is the height of the target area, height_detect is the height of the target object area, λ is the second weight.
A target tracking device, comprising:

a target tracking module, used for determining the target area where the tracking target is located in the image frame through a preset tracking algorithm according to the apparent feature of the tracking target;

A deep learning recognition module, used to obtain several object regions in the image frame through a preset deep learning algorithm;

a selection module, configured to select, among the several object regions, an object region with the same object attribute as the target region as the target object region;

The optimization module is configured to adjust the position and size of the target area according to the target object area to generate an optimized target area.
An image processing chip, comprising: a processor and a memory communicatively connected to the processor;

Computer program instructions are stored in the memory, and when called by the processor, the computer program instructions cause the processor to execute the target tracking method according to any one of claims 1-8.
An unmanned aerial vehicle, characterized in that it comprises: an unmanned aerial vehicle body, an image acquisition device and an image processing chip installed on the gimbal of the unmanned aerial vehicle body;

The image acquisition device is used to continuously collect multiple frames of images; the image processing chip is used to receive the multiple frames of images continuously collected by the image acquisition device, and to perform the steps of claims 1-8 on the received multiple frames of images. The target tracking method described in any one of them realizes the tracking of the tracking target.