WO2022033306A1 - Target tracking method and apparatus - Google Patents

Target tracking method and apparatus

Info

Publication number
WO2022033306A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
area
tracking
target area
center
Application number
PCT/CN2021/108893
Other languages
French (fr)
Chinese (zh)
Inventor
李亚学
Original Assignee
深圳市道通智能航空技术股份有限公司
Application filed by 深圳市道通智能航空技术股份有限公司
Publication of WO2022033306A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Definitions

  • the invention relates to the technical field of machine vision, in particular to a target tracking method, device, image processing chip and unmanned aerial vehicle.
  • Object tracking is a technique of predicting the size and position of a target object in subsequent image frames of a video sequence given the target size and position of the initial image frame of the video sequence. It has a wide range of applications in many fields such as video surveillance, human-computer interaction and multimedia analysis.
  • in practical applications, however, the tracked target is prone to shape changes due to non-rigid motion and is subject to illumination changes and background interference, so tracking failures often occur and the tracking task cannot be completed.
  • the embodiments of the present invention aim to provide a target tracking method, a device, an image processing chip and an unmanned aerial vehicle, which can solve the defects of the existing target tracking method.
  • a target tracking method includes:
  • according to the appearance features of the tracking target, the target area where the tracking target is located is determined in the image frame through a preset tracking algorithm;
  • in the image frame, several object areas are obtained through a preset deep learning algorithm;
  • among the several object areas, an object area with the same object attribute as the target area is selected as the target object area;
  • according to the target object area, the position and size of the target area are adjusted to generate an optimized target area.
  • in the initial image frame, several selectable object regions are identified and obtained through the deep learning algorithm, each of the object regions is marked with a corresponding object label, and the object region selected by the user is determined as the tracking target;
  • selecting an object region with the same object attribute as the target region as the target object region specifically includes: selecting object regions with the same object label as the tracking target as candidate object regions; and, among the candidate object regions, selecting the candidate object region with the largest overlap with the tracking result of the previous image frame as the target object region.
  • selecting the candidate object region with the largest overlap with the target region as the target object region specifically includes: calculating the intersection area and the union area of the candidate object region and the target region; and using the ratio of the intersection area to the union area as the degree of overlap between the candidate object region and the target region.
  • the representational features include: a gradient direction histogram, a local binary pattern, and a color feature.
  • generating an optimized target area by adjusting the position and size of the target area according to the target object area specifically includes:
  • setting a first weight for the target object area and a second weight for the target area;
  • according to the first weight and the second weight, the center point of the target object area and the center point of the target area are weighted and summed to obtain the center point of the optimized target area; the position of the target object area is represented by the center point of the target object area, the position of the target area is represented by the center point of the target area, and the position of the optimized target area is represented by the center point of the optimized target area; and
  • according to the first weight and the second weight, the sizes of the target object area and the target area are weighted and summed to obtain the size of the optimized target area.
  • the center point of the optimized target area is calculated and obtained by the following formulas:
  • center_x_opt = λ*center_x_track + (1-λ)*center_x_detect
  • center_y_opt = λ*center_y_track + (1-λ)*center_y_detect
  • where center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame; center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame; center_x_track is the coordinate of the center point of the target area in the horizontal direction of the image frame; center_y_track is the coordinate of the center point of the target area in the vertical direction of the image frame; center_x_detect is the coordinate of the center point of the target object area in the horizontal direction of the image frame; center_y_detect is the coordinate of the center point of the target object area in the vertical direction of the image frame; and λ is the second weight.
  • the size of the optimized target area is calculated and obtained by the following formulas:
  • width_opt = λ*width_track + (1-λ)*width_detect
  • height_opt = λ*height_track + (1-λ)*height_detect
  • where width_opt is the width of the optimized target area, width_track is the width of the target area, width_detect is the width of the target object area, height_opt is the height of the optimized target area, height_track is the height of the target area, height_detect is the height of the target object area, and λ is the second weight.
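  • in standard notation, the two weighted sums above can be written compactly as follows (our transcription, not notation given by the patent, with λ the second weight):

```latex
\begin{aligned}
(x, y)_{\text{opt}} &= \lambda\,(x, y)_{\text{track}} + (1-\lambda)\,(x, y)_{\text{detect}},\\
(w, h)_{\text{opt}} &= \lambda\,(w, h)_{\text{track}} + (1-\lambda)\,(w, h)_{\text{detect}}.
\end{aligned}
```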
  • a target tracking device comprising:
  • a target tracking module used for determining the target area where the tracking target is located in the image frame through a preset tracking algorithm according to the apparent feature of the tracking target;
  • a deep learning recognition module used to obtain several object regions in the image frame through a preset deep learning algorithm
  • a selection module configured to select, among the several object regions, an object region with the same object attribute as the target region as the target object region;
  • the optimization module is configured to adjust the position and size of the target area according to the target object area to generate an optimized target area.
  • an image processing chip comprising: a processor and a memory communicatively connected to the processor; the memory stores computer program instructions which, when invoked by the processor, cause the processor to perform the target tracking method described above.
  • an unmanned aerial vehicle comprising: an unmanned aerial vehicle main body, an image acquisition device and an image processing chip installed on the gimbal of the unmanned aerial vehicle main body;
  • the image acquisition device is used to continuously collect multiple frames of images;
  • the image processing chip is used to receive the multiple frames of images continuously collected by the image acquisition device, and to perform the above-mentioned target tracking method on the received multiple frames of images, so as to realize tracking of the tracking target.
  • the target tracking method of the embodiment of the present invention is optimized and adjusted on the basis of the original tracking algorithm in combination with the detection results of deep learning, which can better adapt to and resist interference in complex environments and effectively improves the overall performance of target tracking.
  • FIG. 1 is a schematic diagram of an application scenario of a target tracking method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a target tracking apparatus provided by an embodiment of the present invention.
  • FIG. 3 is a method flowchart of a target tracking method provided by an embodiment of the present invention.
  • FIG. 4 is a method flowchart of a method for selecting a target object region provided by an embodiment of the present invention
  • FIG. 5 is a schematic diagram of an application example of a target tracking method provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an image processing chip provided by an embodiment of the present invention.
  • in a video sequence composed of a series of consecutive image frames, the traditional object tracking process includes generating candidate samples, extracting features, scoring the candidate samples with an observation model, updating the observation model to adapt to changes in the target, and fusing the scores to obtain the final decision result.
  • feature extraction refers to the process of extracting discriminative features to represent the target. Extracting and obtaining discriminative features is the basis of candidate sample scoring and is the key to determining the performance of target tracking. Most of the existing improvements on the performance of target tracking methods focus on how to select appropriate features.
  • by combining the two approaches and optimizing the result, the target tracking method provided by the embodiment of the present invention can overcome the interference of occlusion, deformation, background clutter, scale changes, and the like without introducing excessive delay or computation, thereby improving target tracking performance.
  • FIG. 1 is an application scenario of a target tracking method provided by an embodiment of the present invention.
  • a drone 10 equipped with an aerial camera, an intelligent terminal 20 and a wireless network 30 are included.
  • the drone 10 may be any type of powered unmanned aerial vehicle, including but not limited to quadcopter drones, fixed-wing aircraft, and helicopter models. Its size and power can be chosen according to actual needs to provide the load capacity, flight speed, and cruising range required by the application.
  • the UAV 10 may be equipped with any type of image capture device, including a motion camera, a high-definition camera, or a wide-angle camera.
  • as one of the functional modules carried on the UAV, the image capture device can be mounted and fixed on the UAV through a fixed bracket such as a gimbal, and is controlled by the UAV 10 to perform the task of image acquisition.
  • one or more functional modules can also be added to the UAV so that it can realize corresponding functions, such as a built-in main control chip serving as the control core for UAV flight and data transmission, or an image transmission device that uploads acquired image information to a device (such as a server or an intelligent terminal) that establishes a connection with the drone.
  • the smart terminal 20 may be any type of smart device used to establish a communication connection with the drone, such as a mobile phone, a tablet computer, or a smart remote control.
  • the smart terminal 20 may be equipped with one or more different user interaction devices to collect user instructions or display and feed back information to the user.
  • these interaction devices include, but are not limited to, buttons, display screens, touch screens, speakers, and remote-control joysticks.
  • for example, the smart terminal 20 may be equipped with a touch display screen, through which the user's remote-control instructions for the drone are received and the image information obtained by the aerial camera is displayed to the user; the user can also switch the image information currently displayed on the screen through the touch display.
  • the existing image vision processing technology may also be integrated between the drone 10 and the intelligent terminal 20 to further provide more intelligent services.
  • for example, the drone 10 can collect images through the aerial camera, and the intelligent terminal 20 can then execute the target tracking method provided by the embodiment of the present invention to track a specific face in the video, finally realizing human-computer interaction between the user and the drone.
  • the target tracking method can also be executed by the drone 10 or an external server, and the final data result can be directly provided to the intelligent terminal 20 .
  • the wireless network 30 can be a wireless communication network based on any type of data transmission principle for establishing a data transmission channel between two nodes, such as a Bluetooth network, a WiFi network, a wireless cellular network, or a combination thereof located in a specific signal frequency band, to achieve data transmission between the drone 10, the smart terminal 20, and/or the server.
  • FIG. 2 is a structural block diagram of a target tracking apparatus provided by an embodiment of the present invention.
  • the target tracking apparatus can be implemented on any suitable type of electronic computing platform, such as an image processing chip built into the drone, or a server or intelligent terminal that establishes a wireless communication connection with the drone.
  • the composition of the target tracking device is described in the form of functional modules.
  • the functional modules shown in FIG. 2 can be selectively implemented by software, hardware, or a combination of the two according to actual needs; for example, they may be implemented by the processor calling an associated software application stored in memory.
  • the target tracking device 200 includes: a target tracking module 210 , a deep learning identification module 220 , a selection module 230 and an optimization module 240 .
  • the target tracking module 210 is configured to determine the target area where the tracking target is located in the image frame by using a preset tracking algorithm according to the appearance characteristics of the tracking target.
  • "representational features" refers to the hand-designed discriminative features used in traditional object tracking methods; they are fast to compute and introduce little delay. Specifically, the representational features include gradient direction histograms, local binary patterns, and color features.
  • the target tracking module 210 is a functional module for executing traditional target tracking methods, and specifically, any suitable type of tracking algorithm can be selected.
  • the deep learning identification module 220 is configured to obtain several object regions in the image frame through a preset deep learning algorithm.
  • Deep learning is a method of image recognition using deep neural networks trained on sample data. Through deep learning, multiple different object regions can be identified in the image frame. Each object area represents a specific object.
  • the selection module 230 is configured to select, among the several object regions, an object region with the same object attribute as the target region as the target object region.
  • "object attribute" refers to the metric used to judge whether an object area and the target area belong to the same object. It may consist of one or more indicators or conditions, provided that object regions belonging to the same target object can be determined or selected by it.
  • the selection module 230 is specifically configured to: select object regions with the same object label as the tracking target as candidate object regions; and, among the candidate object regions, select the candidate object region with the largest degree of overlap with the tracking result of the previous image frame as the target object region.
  • the optimization module 240 is configured to adjust the position and size of the target area according to the target object area to generate an optimized target area.
  • the "target object area" is an area obtained through deep learning, which is relatively resistant to interference factors in complex environments and not easily disturbed. Therefore, introducing the target object area as a reference on the basis of the traditional tracking algorithm optimizes the tracking result and yields an optimized target area.
  • the optimization module 240 is specifically configured to: set a first weight for the target object area and a second weight for the target area;
  • perform, according to the first weight and the second weight, a weighted summation of the center point of the target object area and the center point of the target area to obtain the center point of the optimized target area; and perform, according to the first weight and the second weight, a weighted summation of the sizes of the target object area and the target area to obtain the size of the optimized target area.
  • the position of the target object area is represented by the center point of the target object area
  • the position of the target area is represented by the center point of the target area
  • the position of the optimized target area is represented by the center point of the optimized target area.
  • the center point of the optimized target area can be calculated by the following formulas:
  • center_x_opt = λ*center_x_track + (1-λ)*center_x_detect
  • center_y_opt = λ*center_y_track + (1-λ)*center_y_detect
  • where center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame; center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame; center_x_track is the coordinate of the center point of the target area in the horizontal direction of the image frame; center_y_track is the coordinate of the center point of the target area in the vertical direction of the image frame; center_x_detect is the coordinate of the center point of the target object area in the horizontal direction of the image frame; center_y_detect is the coordinate of the center point of the target object area in the vertical direction of the image frame; and λ is the second weight.
  • the size of the optimized target area can be calculated by the following formulas:
  • width_opt = λ*width_track + (1-λ)*width_detect
  • height_opt = λ*height_track + (1-λ)*height_detect
  • where width_opt is the width of the optimized target area, width_track is the width of the target area, width_detect is the width of the target object area, height_opt is the height of the optimized target area, height_track is the height of the target area, height_detect is the height of the target object area, and λ is the second weight.
  • the image acquisition device applied to the UAV is taken as an example.
  • the target tracking method can also be used in other types of scenarios and devices to improve the performance of the target tracking algorithm.
  • the target tracking method disclosed in the embodiment of the present invention is not limited to be applied to the UAV shown in FIG. 1 .
  • the target tracking device 200 may further include a marking module 250 .
  • the marking module 250 is used to identify and obtain several selectable object regions through the deep learning algorithm in the initial image frame, and determine the object region selected by the user as the tracking target. Wherein, each of the object regions is marked with a corresponding object label.
  • the "object label” (label) is the output of the deep learning algorithm, which is used to mark the object corresponding to the object area.
  • the specific form of the object label depends on the deep learning algorithm used and its training data.
  • the target object to be tracked can be selected by the user from among these candidate objects.
  • FIG. 3 is a method flowchart of a target tracking method provided by an embodiment of the present invention. As shown in Figure 3, the target tracking method may include the following steps:
  • the "image frame” refers to a certain frame of image being processed in the video sequence.
  • the tracking algorithm takes the video sequence composed of continuous image frames as the processing object, and predicts and tracks the position of the target in the image frame by frame.
  • the tracking algorithm can use any type of fast tracking algorithm in the prior art, which takes the appearance feature as the discriminating feature of the tracking target.
  • the appearance feature includes but is not limited to histogram of gradient orientation (HOG), local binary pattern (LBP), color feature, and the like.
  • the gradient direction histogram is easily disturbed by the non-rigid body motion of the target (such as the movement of the person who is the tracking target from standing to squatting) and occlusion.
  • the color features are easily affected by changes in the lighting environment.
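  • for concreteness, such hand-crafted appearance features are available off the shelf. The sketch below uses scikit-image as one possible implementation (an assumption on our part, not a library named by the patent); a gray-level intensity histogram stands in for the color feature.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern

def appearance_features(gray_patch):
    """Concatenate HOG, an LBP histogram, and an intensity histogram.

    Illustrative feature choices only; the patent merely lists HOG, LBP and
    color features as examples of representational features.
    """
    hog_vec = hog(gray_patch, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2))
    # "uniform" LBP with P=8 yields integer codes in [0, 9]
    lbp = local_binary_pattern(gray_patch, P=8, R=1.0, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    gray_hist, _ = np.histogram(gray_patch, bins=16, range=(0, 256), density=True)
    return np.concatenate([hog_vec, lbp_hist, gray_hist])
```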
  • the target area refers to a rectangular frame with a specific size in the image frame that is calculated and output by the "tracking algorithm" and contains the tracking target. Specifically, it can be calculated and obtained by any type of tracking algorithm.
  • the "deep learning algorithm" can be any type of image processing method that uses a neural network model trained on sample data. Through deep learning algorithms, the multiple objects present in an image frame can be obtained with high confidence.
  • the output of the deep learning algorithm is also a rectangular box containing recognizable objects.
  • the deep learning algorithm also outputs the object label corresponding to each object area, which is used to mark the target object (such as a face, an airplane, etc.) corresponding to the object area.
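  • as an illustration (not part of the patent itself), the detector output described above can be modeled as a list of labeled rectangles. The following is a minimal Python sketch; the type and field names (Region, center_x, label, score) are our own assumptions, since the embodiment only requires that each object area be a rectangular box carrying an object label.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """An axis-aligned rectangle with a semantic label, as a detector might output it.

    Field names are illustrative: the embodiment only requires a bounding box
    plus an object label for each object area.
    """
    center_x: float     # center coordinate, horizontal direction of the image frame
    center_y: float     # center coordinate, vertical direction of the image frame
    width: float
    height: float
    label: str = ""     # object label output by the deep learning algorithm
    score: float = 1.0  # detection confidence, if the detector provides one
```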
  • the specific filtering method depends on the object attributes actually used; technicians can choose one or more metrics as the object attributes according to the needs of the actual situation.
  • the steps shown in FIG. 4 may be specifically adopted to select and obtain a target object region from a plurality of object regions as a reference for adjustment and optimization:
  • the "object label” (label) is output by the deep learning algorithm and used to mark the object corresponding to the object area.
  • the exact form of object labels depends on the deep learning algorithm used and its training data.
  • the degree of overlap between a candidate region and the tracking result of the previous image frame can be used as a criterion to further select a target object region that can serve as the reference and standard for adjustment.
  • the object label and the degree of overlap are used to judge the attributes of the object, so that the real tracking target can be found from the output result of the deep learning algorithm with certainty, and this can be used as the basis for adjusting and optimizing the tracking result.
  • according to the target object area, adjust the position and size of the target area to generate an optimized target area.
  • adjustment refers to using any suitable type of function mapping method to integrate the target object area and the target area, and by adjusting the position and size, generate and output an optimization result, that is, the optimized target area.
  • Forms of adjustment include changing and optimizing the position and size of the rectangular box representing the target area in the image frame.
  • the tracking target can be tracked by linking the optimized target areas of each image frame in a series of continuous image frame sequences, and the changes of the position and size of the tracking target in the image frame sequence can be determined.
  • assume the video sequence is composed of n consecutive image frames; the target tracking method provided by the embodiment of the present invention is performed sequentially on the 2nd to the n-th image frames to predict the position of the tracking target in each image frame.
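  • read end to end, the method runs one track-detect-select-fuse cycle per frame. The sketch below shows that loop under stated assumptions: tracker.update, detector.detect, select_target_object_region and fuse are placeholder names (the last two are sketched after the overlap and weighted-sum discussions below), not names taken from the patent, and the fallback to the raw tracker output when no same-label detection exists is our own assumption.

```python
def track_video(frames, tracker, detector, target_label, initial_region, lam=0.5):
    """Run the per-frame track -> detect -> select -> fuse cycle on frames 2..n.

    `tracker` wraps the preset tracking algorithm, `detector` the deep learning
    algorithm; `lam` is the second weight from formulas (2)-(5).
    """
    previous = initial_region          # tracking result of the initial image frame
    results = [initial_region]
    for frame in frames[1:]:
        target_area = tracker.update(frame)    # target area from the tracking algorithm
        object_areas = detector.detect(frame)  # object areas from deep learning
        target_object_area = select_target_object_region(
            object_areas, target_label, previous)
        if target_object_area is not None:
            optimized = fuse(target_area, target_object_area, lam)
        else:
            optimized = target_area  # assumption: keep tracker output if nothing matches
        results.append(optimized)
        previous = optimized          # linked frame to frame to realize tracking
    return results
```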
  • the tracking result, the object area and the target area can all be represented by the smallest enclosing rectangle containing the target object.
  • the target area where the tracking target is located is determined according to the user's instruction.
  • the initial image frame refers to the starting point of target tracking.
  • the initial image frame is the first image frame in the video sequence.
  • any image frame in the video sequence can also be randomly selected as the initial image frame, which is used as the starting point of target tracking.
  • a deep learning algorithm can be used for identification and detection in the initial image frame to obtain several optional object regions (each object region is marked with a corresponding object label, denoted by L1 to L4 in Figure 5) as optional tracking targets.
  • the user can issue a corresponding user selection instruction according to his own needs, and select one of these optional object areas as the tracking target (eg L4).
  • a target object region with the same object attribute as the tracking target is selected.
  • among the object regions obtained by detection, first select the object regions whose object label is the same as that of the tracking target of the initial image frame as candidate object regions; then, the candidate object region with the highest degree of overlap with the target area D of the previous image frame is used as the target object area.
  • the degree of overlap between the two rectangular boxes can be represented by the intersection ratio between the object area and the target area.
  • the "intersection over union ratio” (IoU) refers to the ratio between the intersection and union of two regions of interest.
  • the calculation process is specifically: calculate the intersection area and the union area of the candidate object area and the target area, and use the ratio of the intersection area to the union area as the degree of overlap between the candidate object area and the target area.
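  • a minimal sketch of this selection step, assuming the Region representation sketched earlier: candidates are filtered by object label, the overlap is computed as intersection-over-union of the two rectangles, and the candidate with the largest overlap against the previous frame's tracking result is returned. The function names are illustrative, not taken from the patent.

```python
def iou(a, b):
    """Intersection over union of two center/size rectangles (Regions)."""
    ax1, ay1 = a.center_x - a.width / 2, a.center_y - a.height / 2
    ax2, ay2 = a.center_x + a.width / 2, a.center_y + a.height / 2
    bx1, by1 = b.center_x - b.width / 2, b.center_y - b.height / 2
    bx2, by2 = b.center_x + b.width / 2, b.center_y + b.height / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h                                # intersection area
    union = a.width * a.height + b.width * b.height - inter  # union area
    return inter / union if union > 0 else 0.0

def select_target_object_region(object_areas, target_label, previous_result):
    """Keep detections whose label matches the tracking target, then pick the
    one with the largest overlap with the previous frame's tracking result."""
    candidates = [r for r in object_areas if r.label == target_label]
    if not candidates:
        return None  # assumption: no same-label detection in this frame
    return max(candidates, key=lambda r: iou(r, previous_result))
```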
  • the optimized target area in the image frame can be obtained.
  • the target tracking result in a video sequence composed of a series of consecutive image frames can be obtained.
  • the complete target tracking method can be expressed by formula (1), in which Bbox denotes a bounding box: the subscripts track and detect indicate, respectively, the target area obtained by the tracking algorithm and an object area obtained by the deep learning algorithm, the superscript i indexes the image frame, and j is the serial number of the object area (a frame may contain multiple object areas). The optimized tracking result of the i-th image frame is obtained from the target area of the i-th image frame and the target object area, that is, the object area that has the same object label as the target area of the previous frame and the largest overlap with it.
  • the specific way of integrating the tracking algorithm and the deep learning algorithm is as follows: first, a first weight is set for the target object area and a second weight for the target area; then, according to the first weight and the second weight, the center point of the target object area and the center point of the target area are weighted and summed to obtain the center point of the optimized target area; and, according to the first weight and the second weight, the sizes of the target object area and the target area are weighted and summed to obtain the size of the optimized target area.
  • the position of the target object area is represented by the center point of the target object area, the position of the target area is represented by the center point of the target area, and the position of the optimized target area is represented by the center point of the optimized target area. The first weight and the second weight can be preset by technicians according to the actual situation and are constant values that can be determined experimentally or empirically.
  • the confidence of the target area may be used as the weight (second weight) occupied by the tracking algorithm.
  • the weight occupied by the deep learning algorithm, i.e., the first weight, is then 1 minus the confidence of the target area.
  • the position of the optimized target area is obtained by calculating the following formulas (2) and (3):
  • center_x_opt = λ*center_x_track + (1-λ)*center_x_detect (2);
  • center_y_opt = λ*center_y_track + (1-λ)*center_y_detect (3);
  • the optimized target area, the object area, and the target area are all represented by a rectangular frame. Therefore, the optimized target area, the object area, and the position of the target area can be represented by the position coordinates of the center of the rectangular frame in the image frame. That is, the position of the optimized target area can be expressed as (center_x_opt, center_y_opt), the position of the object area can be expressed as (center_x_detect, center_y_detect), and the position of the target area can be expressed as (center_x_track, center_y_track).
  • the size of the optimized target area can be calculated by the following formulas (4) and (5):
  • width_opt = λ*width_track + (1-λ)*width_detect (4);
  • height_opt = λ*height_track + (1-λ)*height_detect (5);
  • where width_opt is the width of the optimized target area, width_track is the width of the target area, width_detect is the width of the target object area, height_opt is the height of the optimized target area, height_track is the height of the target area, height_detect is the height of the target object area, and λ is the second weight, i.e., the confidence of the target area.
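  • the weighted summation of formulas (2) to (5) translates directly into code. Below is a sketch under the same Region assumption; setting lam to the tracker's confidence for the target area reproduces the confidence-weighted variant described above. The helper name fuse is our own.

```python
def fuse(target_area, target_object_area, lam):
    """Weighted sum of the tracker output and the detection output.

    lam is the second weight (e.g. the confidence of the target area); the
    target object area obtained by deep learning receives weight (1 - lam).
    """
    return Region(
        center_x=lam * target_area.center_x + (1 - lam) * target_object_area.center_x,  # (2)
        center_y=lam * target_area.center_y + (1 - lam) * target_object_area.center_y,  # (3)
        width=lam * target_area.width + (1 - lam) * target_object_area.width,            # (4)
        height=lam * target_area.height + (1 - lam) * target_object_area.height,         # (5)
        label=target_area.label,
    )
```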
  • in this way, the target tracking method provided by the embodiment of the present invention is optimized and adjusted, on the basis of the original tracking algorithm, in combination with the detection results of deep learning, which can better adapt to and resist interference in complex environments and effectively improves the overall performance of target tracking.
  • An embodiment of the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores at least one executable instruction, and the computer-executable instruction can execute the target tracking method in any of the foregoing method embodiments.
  • FIG. 6 shows a schematic structural diagram of an image processing chip according to an embodiment of the present invention.
  • the specific embodiment of the present invention does not limit the specific implementation of the image processing chip.
  • the image processing chip may include: a processor (processor) 602 , a communication interface (Communications Interface) 604 , a memory (memory) 606 , and a communication bus 608 .
  • the processor 602 , the communication interface 604 , and the memory 606 communicate with each other through the communication bus 608 .
  • the communication interface 604 is used to communicate with network elements of other devices such as clients or other servers.
  • the processor 602 is configured to execute the program 610, and specifically may execute the relevant steps in the above-mentioned embodiments of the target tracking method.
  • the program 610 may include program code including computer operation instructions.
  • the processor 602 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
  • the one or more processors included in the image processing chip may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
  • the memory 606 is used to store the program 610 .
  • Memory 606 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
  • Program 610 may be used to cause processor 602 to perform the following steps:
  • the target area where the tracking target is located is determined in the image frame.
  • several object regions are obtained through a preset deep learning algorithm.
  • an object region with the same object attribute as the target region is selected as the target object region.
  • an optimized target area is generated by adjusting the position and size of the target area.
  • the program 610 is further configured to cause the processor 602 to perform the following steps before executing the step of determining the target area where the tracking target is located in the image frame by using a preset tracking algorithm: first, in the initial image In the frame, several selectable object regions are identified and obtained through the deep learning algorithm, and each of the object regions is marked with a corresponding object label. Then, the object area selected by the user is determined as the tracking target.
  • the program 610 may be used to cause the processor 602, when performing the step of selecting, among the several object regions, an object region with the same object attribute as the target region as the target object region, to specifically: select object regions with the same object label as the tracking target as candidate object regions; and, among the candidate object regions, select the candidate object region with the largest overlap with the tracking result of the previous image frame as the target object region.
  • the program 610 may be configured to cause the processor 602, when performing the step of selecting the candidate object region with the largest overlap with the target region as the target object region, to specifically: calculate the intersection area and the union area of the candidate object region and the tracking result; and use the ratio of the intersection area to the union area as the degree of overlap between the candidate object region and the tracking result.
  • the program 610 may be used to cause the processor 602, when performing the step of generating an optimized target area by adjusting the position and size of the target area according to the target object area, to specifically: set a first weight for the target object area and a second weight for the target area; perform, according to the first weight and the second weight, a weighted summation of the center points of the target object area and the target area to obtain the center point of the optimized target area; and perform, according to the first weight and the second weight, a weighted summation of the sizes of the target object area and the target area to obtain the size of the optimized target area.
  • the processor 602 can obtain the center point of the optimized target area by calculating the following formulas:
  • center_x_opt = λ*center_x_track + (1-λ)*center_x_detect
  • center_y_opt = λ*center_y_track + (1-λ)*center_y_detect
  • where center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame; center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame; center_x_track is the coordinate of the center point of the target area in the horizontal direction of the image frame; center_y_track is the coordinate of the center point of the target area in the vertical direction of the image frame; center_x_detect is the coordinate of the center point of the target object area in the horizontal direction of the image frame; center_y_detect is the coordinate of the center point of the target object area in the vertical direction of the image frame; and λ is the second weight.
  • the processor 602 can also obtain the size of the optimized target area by calculating the following formulas:
  • width_opt = λ*width_track + (1-λ)*width_detect
  • height_opt = λ*height_track + (1-λ)*height_detect
  • where width_opt is the width of the optimized target area, width_track is the width of the target area, width_detect is the width of the target object area, height_opt is the height of the optimized target area, height_track is the height of the target area, height_detect is the height of the target object area, and λ is the second weight.
  • each step of the exemplary target tracking method described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. For the purpose of clearly illustrating the interchangeability of hardware and software, the above description has generally described the components and steps of each example in terms of functions; whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution.
  • the computer software can be stored in a computer-readable storage medium; when the program is executed, the processes of the above method embodiments can be performed.
  • the storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
  • the embodiment of the present invention further provides an unmanned aerial vehicle.
  • the unmanned aerial vehicle comprises: an unmanned aerial vehicle main body, and an image acquisition device and an image processing chip installed on the gimbal of the unmanned aerial vehicle main body.
  • the image acquisition device is used to continuously collect multiple frames of images;
  • the image processing chip is used to receive the multiple frames of images continuously collected by the image acquisition device, and to perform the following steps on the received multiple frames of images:
  • the target area where the tracking target is located is determined in the image frame.
  • several object regions are obtained through a preset deep learning algorithm.
  • an object region with the same object attribute as the target region is selected as the target object region.
  • an optimized target area is generated by adjusting the position and size of the target area.
  • the UAV can track the tracking target.
  • before the step of determining the target area where the tracking target is located in the image frame, the image processing chip further performs the following steps: in the initial image frame, several selectable object regions are identified through the deep learning algorithm, each marked with a corresponding object label; and the object region selected by the user is determined as the tracking target.
  • when the image processing chip performs the step of selecting, among the several object regions, an object region with the same object attribute as the target region as the target object region, the image processing chip is specifically configured to: select object regions with the same object label as the tracking target as candidate object regions; and, among the candidate object regions, select the candidate object region with the largest overlap with the tracking result of the previous image frame as the target object region.
  • when the image processing chip performs the step of selecting the candidate object region with the largest overlap with the target region as the target object region, the image processing chip is specifically configured to: calculate the intersection area and the union area of the candidate object region and the tracking result; and use the ratio of the intersection area to the union area as the degree of overlap between the candidate object region and the tracking result.
  • when the image processing chip performs the step of generating an optimized target area by adjusting the position and size of the target area according to the target object area, the image processing chip is specifically configured to: set a first weight for the target object area and a second weight for the target area; perform a weighted summation of the center points of the target object area and the target area according to the first weight and the second weight to obtain the center point of the optimized target area; and perform a weighted summation of the sizes of the target object area and the target area according to the first weight and the second weight to obtain the size of the optimized target area.
  • the image processing chip can obtain the center point of the optimized target area by calculating the following formulas:
  • center_x_opt = λ*center_x_track + (1-λ)*center_x_detect
  • center_y_opt = λ*center_y_track + (1-λ)*center_y_detect
  • where center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame; center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame; center_x_track is the coordinate of the center point of the target area in the horizontal direction of the image frame; center_y_track is the coordinate of the center point of the target area in the vertical direction of the image frame; center_x_detect is the coordinate of the center point of the target object area in the horizontal direction of the image frame; center_y_detect is the coordinate of the center point of the target object area in the vertical direction of the image frame; and λ is the second weight.
  • the image processing chip can also obtain the size of the optimized target area by calculating the following formulas:
  • width_opt = λ*width_track + (1-λ)*width_detect
  • height_opt = λ*height_track + (1-λ)*height_detect
  • where width_opt is the width of the optimized target area, width_track is the width of the target area, width_detect is the width of the target object area, height_opt is the height of the optimized target area, height_track is the height of the target area, height_detect is the height of the target object area, and λ is the second weight.

Abstract

The embodiments of the present invention relate to a target tracking method and apparatus. The embodiments comprise: according to an appearance feature of a tracking target and by means of a preset tracking algorithm, determining, from an image frame, a target area in which the tracking target is located; obtaining multiple object areas from the image frame by means of a preset deep learning algorithm; selecting, from among the multiple object areas, an object area with the same object attribute as the target area, and using same as a target object area; and adjusting the position and size of the target area according to the target object area, so as to generate an optimized target area. In the present invention, on the basis of an original tracking algorithm, optimization and adjustment are performed in combination with a detection result of deep learning, such that interference in a complicated environment can be better adapted to and resisted, thereby effectively improving the overall performance of target tracking.

Description

Target tracking method and apparatus
This application claims the priority of the Chinese patent application with the application number 202010805592X, entitled "Target Tracking Method and Apparatus", filed with the China Patent Office on August 12, 2020, the entire contents of which are incorporated into this application by reference.
[Technical Field]
The present invention relates to the technical field of machine vision, and in particular to a target tracking method, apparatus, image processing chip, and unmanned aerial vehicle.
[Background]
"Object tracking" is a technique for predicting the size and position of a target object in the subsequent image frames of a video sequence, given the size and position of the target in the initial image frame of the sequence. It is widely applied in many fields such as video surveillance, human-computer interaction, and multimedia analysis.
In practical applications, however, the tracked target is prone to shape changes due to non-rigid motion and is subject to illumination changes and background interference, so tracking often fails and the tracking task cannot be completed.
Therefore, how to avoid the interference of irrelevant factors in the video sequence on the tracked target and improve tracking performance, so as to meet and adapt to complex and changeable practical applications, is an urgent problem to be solved.
[Summary of the Invention]
The embodiments of the present invention aim to provide a target tracking method, apparatus, image processing chip, and unmanned aerial vehicle that can overcome the defects of existing target tracking methods.
To solve the above technical problem, the embodiments of the present invention provide the following technical solution: a target tracking method. The method includes:
determining, according to the appearance features of the tracking target and through a preset tracking algorithm, the target area where the tracking target is located in an image frame;
obtaining several object areas in the image frame through a preset deep learning algorithm;
selecting, among the several object areas, an object area with the same object attribute as the target area as the target object area; and
adjusting the position and size of the target area according to the target object area to generate an optimized target area.
Optionally, in the initial image frame, several selectable object areas are identified through the deep learning algorithm, and each object area is marked with a corresponding object label;
the object area selected by the user is determined as the tracking target.
Optionally, selecting, among the several object areas, an object area with the same object attribute as the target area as the target object area specifically includes:
selecting object areas with the same object label as the tracking target as candidate object areas;
among the candidate object areas, selecting the candidate object area with the largest overlap with the tracking result of the previous image frame as the target object area.
Optionally, selecting the candidate object area with the largest overlap with the target area as the target object area specifically includes:
calculating the intersection area and the union area of the candidate object area and the target area;
using the ratio of the intersection area to the union area as the degree of overlap between the candidate object area and the target area.
Optionally, the appearance features include: a gradient direction histogram, a local binary pattern, and a color feature.
Optionally, generating an optimized target area by adjusting the position and size of the target area according to the target object area specifically includes:
setting a first weight for the target object area and a second weight for the target area;
performing, according to the first weight and the second weight, a weighted summation of the center point of the target object area and the center point of the target area to obtain the center point of the optimized target area, where the position of the target object area is represented by its center point, the position of the target area is represented by its center point, and the position of the optimized target area is represented by its center point; and
performing, according to the first weight and the second weight, a weighted summation of the sizes of the target object area and the target area to obtain the size of the optimized target area.
Optionally, the center point of the optimized target area is calculated by the following formulas:
center_x_opt = λ*center_x_track + (1-λ)*center_x_detect;
center_y_opt = λ*center_y_track + (1-λ)*center_y_detect;
where center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame, and center_y_opt is its coordinate in the vertical direction; center_x_track is the coordinate of the center point of the target area in the horizontal direction of the image frame, and center_y_track is its coordinate in the vertical direction; center_x_detect is the coordinate of the center point of the target object area in the horizontal direction of the image frame, and center_y_detect is its coordinate in the vertical direction; and λ is the second weight.
Optionally, the size of the optimized target area is calculated by the following formulas:
width_opt = λ*width_track + (1-λ)*width_detect;
height_opt = λ*height_track + (1-λ)*height_detect;
where width_opt is the width of the optimized target area, width_track is the width of the target area, width_detect is the width of the target object area, height_opt is the height of the optimized target area, height_track is the height of the target area, height_detect is the height of the target object area, and λ is the second weight.
To solve the above technical problem, the embodiments of the present invention further provide the following technical solution: a target tracking apparatus, including:
a target tracking module, configured to determine, according to the appearance features of the tracking target and through a preset tracking algorithm, the target area where the tracking target is located in an image frame;
a deep learning recognition module, configured to obtain several object areas in the image frame through a preset deep learning algorithm;
a selection module, configured to select, among the several object areas, an object area with the same object attribute as the target area as the target object area;
an optimization module, configured to adjust the position and size of the target area according to the target object area to generate an optimized target area.
To solve the above technical problem, the embodiments of the present invention further provide the following technical solution: an image processing chip, including a processor and a memory communicatively connected to the processor; the memory stores computer program instructions which, when invoked by the processor, cause the processor to perform the target tracking method described above.
To solve the above technical problem, the embodiments of the present invention further provide the following technical solution: an unmanned aerial vehicle, including an unmanned aerial vehicle main body, and an image acquisition device and an image processing chip mounted on the gimbal of the main body;
the image acquisition device is configured to continuously capture multiple frames of images; the image processing chip is configured to receive the multiple frames of images continuously captured by the image acquisition device and perform the target tracking method described above on the received frames, so as to track the tracking target.
Compared with the prior art, the target tracking method of the embodiments of the present invention is optimized and adjusted, on the basis of the original tracking algorithm, in combination with the detection results of deep learning, so that it can better adapt to and resist interference in complex environments and effectively improve the overall performance of target tracking.
[Description of the Drawings]
One or more embodiments are exemplified by the figures in the corresponding drawings. These illustrations do not limit the embodiments; elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures are not drawn to scale.
FIG. 1 is a schematic diagram of an application scenario of a target tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a target tracking apparatus provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a target tracking method provided by an embodiment of the present invention;
FIG. 4 is a flowchart of a method for selecting a target object area provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of an application example of a target tracking method provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an image processing chip provided by an embodiment of the present invention.
[Detailed Description]
To facilitate understanding of the present invention, the present invention is described in more detail below with reference to the accompanying drawings and specific embodiments. It should be noted that when an element is described as being "fixed to" another element, it can be directly on the other element, or one or more intervening elements may be present between them. When an element is described as being "connected to" another element, it can be directly connected to the other element, or one or more intervening elements may be present between them. The terms "upper", "lower", "inner", "outer", "bottom", etc. used in this specification indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience and simplification of description, do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be construed as limiting the invention. Furthermore, the terms "first", "second", "third", etc. are used for descriptive purposes only and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all technical and scientific terms used in this specification have the same meanings as commonly understood by those skilled in the technical field of the present invention. The terms used in the specification of the present invention are for the purpose of describing specific embodiments only and are not intended to limit the present invention. As used in this specification, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In addition, the technical features involved in the different embodiments of the present invention described below can be combined with each other as long as they do not conflict.
In a video sequence composed of a series of consecutive image frames, a conventional target tracking pipeline includes steps such as generating candidate samples, extracting features, scoring the candidate samples with an observation model, updating the observation model to adapt to changes in the target, and fusing the scores to obtain the final decision result.

Here, "feature extraction" refers to the process of extracting discriminative features to represent the target. Obtaining discriminative features is the basis of candidate-sample scoring and the key factor determining tracking performance. Most existing improvements to target tracking methods therefore focus on how to select suitable features.

In contrast, the target tracking method provided by the embodiments of the present invention combines a conventional tracker with detection results and adjusts the tracking output accordingly. Without incurring excessive latency or computation, it can overcome interference such as occlusion, deformation, background clutter and scale changes, thereby improving target tracking performance.
FIG. 1 shows an application scenario of the target tracking method provided by an embodiment of the present invention. As shown in FIG. 1, the scenario includes a drone 10 equipped with an aerial camera, a smart terminal 20 and a wireless network 30.

The drone 10 may be any type of powered unmanned aerial vehicle, including but not limited to a quadcopter, a fixed-wing aircraft and a helicopter model. It may be provided with a volume or power matched to actual needs, so as to offer a payload capacity, flight speed and cruising range that satisfy the intended use.

Any type of image acquisition device may be mounted on the drone 10, including a motion camera, a high-definition camera or a wide-angle camera. As one of the functional modules carried by the drone, the image acquisition device may be mounted and fixed on the drone through a mounting bracket such as a gimbal, and is controlled by the drone 10 to perform image acquisition tasks.

Of course, one or more additional functional modules may be added to the drone so that it can perform corresponding functions, for example a built-in main control chip serving as the control core of flight and data transmission, or an image transmission device that uploads the acquired image information to a device connected to the drone (such as a server or a smart terminal).

The smart terminal 20 may be any type of smart device used to establish a communication connection with the drone, such as a mobile phone, a tablet computer or a smart remote controller. The smart terminal 20 may be equipped with one or more different user interaction devices for collecting user instructions or for presenting and feeding back information to the user.

These interaction devices include, but are not limited to, buttons, display screens, touch screens, speakers and remote control sticks. For example, the smart terminal 20 may be equipped with a touch display screen through which it receives the user's remote control instructions for the drone and displays to the user the image information obtained by the aerial camera; the user may also switch the image information currently shown on the display through the touch screen.

In some embodiments, existing image vision processing technology may further be integrated between the drone 10 and the smart terminal 20 to provide more intelligent services. For example, the drone 10 may acquire images through the aerial camera, and the smart terminal 20 may then execute the target tracking method provided by the embodiments of the present invention to track a specific face in the video, finally realizing human-computer interaction between the user and the drone.

In other embodiments, the target tracking method may also be executed by the drone 10 or by an external server, with the final data result provided directly to the smart terminal 20.

The wireless network 30 may be a wireless communication network based on any type of data transmission principle for establishing a data transmission channel between two nodes, for example a Bluetooth network, a WiFi network, a wireless cellular network in a specific frequency band, or a combination thereof, realizing data transmission among the drone 10, the smart terminal 20 and/or the server.
FIG. 2 is a structural block diagram of a target tracking apparatus provided by an embodiment of the present invention. The target tracking apparatus may be executed by any suitable type of electronic computing platform, for example an image processing chip built into the drone, or a server or smart terminal that establishes a wireless communication connection with the drone. In this embodiment, the composition of the target tracking apparatus is described in terms of functional modules.

Those skilled in the art will understand that the functional modules shown in FIG. 2 may be selectively implemented in software, hardware or a combination of both according to actual needs, for example by a processor invoking a related software application stored in a memory.

As shown in FIG. 2, the target tracking apparatus 200 includes a target tracking module 210, a deep learning recognition module 220, a selection module 230 and an optimization module 240.
The target tracking module 210 is configured to determine, according to the appearance features of a tracking target and through a preset tracking algorithm, the target region where the tracking target is located in an image frame.

"Appearance features" refers to hand-crafted discriminative features used in conventional target tracking methods, which are fast to compute and incur little latency. Specifically, the appearance features include histograms of oriented gradients, local binary patterns and color features. The target tracking module 210 is the functional module that executes a conventional target tracking method, and any suitable type of tracking algorithm may be selected for it.

The deep learning recognition module 220 is configured to obtain several object regions in the image frame through a preset deep learning algorithm.

"Deep learning" here refers to image recognition performed by a deep neural network trained on sample data. Through deep learning, multiple different object regions can be identified in the image frame, each object region representing a specific object.

The selection module 230 is configured to select, from among the several object regions, an object region whose object attributes are the same as those of the target region as the target object region.

"Object attributes" refers to the criteria used to judge whether an object region and the target region belong to the same object. They may consist of one or more indicators or conditions that essentially allow the object region belonging to the same target object to be determined or selected.

In some embodiments, the selection module 230 is specifically configured to select object regions carrying the same object label as the tracking target as candidate object regions, and to select, among the candidate object regions, the candidate object region with the largest overlap with the tracking result of the previous image frame as the target object region.

The optimization module 240 is configured to adjust the position and size of the target region according to the target object region to generate an optimized target region.

The "target object region" is a region obtained by deep learning; it offers good resistance to the interfering factors of complex environments and is not easily disturbed. Therefore, introducing the target object region as a reference on top of the conventional tracking algorithm allows the tracking result to be refined into an optimized target region.
In some embodiments, the optimization module 240 is specifically configured to: set a first weight for the target object region and a second weight for the target region;

perform, according to the first weight and the second weight, a weighted summation of the center point of the target object region and the center point of the target region to obtain the center point of the optimized target region; and perform, according to the first weight and the second weight, a weighted summation of the sizes of the target object region and the target region to obtain the size of the optimized target region.

Here, the position of the target object region is represented by its center point, the position of the target region is represented by its center point, and the position of the optimized target region is represented by the center point of the optimized target region.
Specifically, on the one hand, the center point of the optimized target region can be calculated by the following formulas:

center_x_opt = λ*center_x_track + (1-λ)*center_x_detect;
center_y_opt = λ*center_y_track + (1-λ)*center_y_detect;

where center_x_opt and center_y_opt are the coordinates of the center point of the optimized target region in the horizontal and vertical directions of the image frame; center_x_track and center_y_track are the coordinates of the center point of the target region in the horizontal and vertical directions of the image frame; center_x_detect and center_y_detect are the coordinates of the center point of the target object region in the horizontal and vertical directions of the image frame; and λ is the second weight.
On the other hand, the size of the optimized target region can be calculated by the following formulas:

width_opt = λ*width_track + (1-λ)*width_detect;
height_opt = λ*height_track + (1-λ)*height_detect;

where width_opt and height_opt are the width and height of the optimized target region, width_track and height_track are the width and height of the target region, width_detect and height_detect are the width and height of the target object region, and λ is the second weight.
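The weighted fusion above is simple enough to express directly in code. The following is a minimal sketch (in Python, which the patent does not prescribe); the box representation as a (center_x, center_y, width, height) tuple and the function name fuse_boxes are illustrative assumptions:

```python
def fuse_boxes(track_box, detect_box, lam):
    """Weighted fusion of the tracker's target region and the detector's
    target object region; lam is the second weight (the tracker's share).
    Boxes are (center_x, center_y, width, height) tuples -- an assumed layout."""
    cx = lam * track_box[0] + (1 - lam) * detect_box[0]
    cy = lam * track_box[1] + (1 - lam) * detect_box[1]
    w = lam * track_box[2] + (1 - lam) * detect_box[2]
    h = lam * track_box[3] + (1 - lam) * detect_box[3]
    return (cx, cy, w, h)
```

For example, fuse_boxes((100, 80, 40, 60), (110, 84, 44, 64), 0.7) yields a box whose center and size lie 70% of the way toward the tracker's estimate, matching the formulas above.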
Although the application scenario shown in FIG. 1 takes an image acquisition device carried by a drone as an example, those skilled in the art will understand that the target tracking method may also be used in other types of scenarios and devices to improve the performance of target tracking algorithms. The target tracking method disclosed in the embodiments of the present invention is not limited to the drone shown in FIG. 1.

In some embodiments, as shown in FIG. 2, the target tracking apparatus 200 may further include a marking module 250. The marking module 250 is configured to identify several selectable object regions in an initial image frame through the deep learning algorithm, and to determine the object region selected by the user as the tracking target. Each of the object regions is marked with a corresponding object label.

The "object label" is output by the deep learning algorithm and marks the object to which an object region corresponds. The specific form of the object label depends on the deep learning algorithm used and its training data.

Owing to the characteristics of deep learning algorithms, multiple object regions with the same object label may be identified in an image frame. These regions can accordingly be offered to the user as candidates from which to select the target object to be tracked.
FIG. 3 is a flowchart of a target tracking method provided by an embodiment of the present invention. As shown in FIG. 3, the target tracking method may include the following steps:

310: Determine, according to the appearance features of a tracking target and through a preset tracking algorithm, the target region where the tracking target is located in an image frame.

Here, the "image frame" refers to the frame of the video sequence currently being processed. The tracking algorithm takes a video sequence composed of consecutive image frames as its input and predicts the position of the tracking target frame by frame.

The tracking algorithm may be any type of fast tracking algorithm in the prior art that uses appearance features as the discriminative features of the tracking target. Specifically, the appearance features include, but are not limited to, histograms of oriented gradients (HOG), local binary patterns (LBP) and color features.

Different appearance features have their own advantages and drawbacks. For example, the histogram of oriented gradients is easily disturbed by non-rigid motion of the target (such as a tracked person moving from standing to squatting) and by occlusion, while color features are easily affected by changes in the lighting environment.

The target region is the rectangular box of a specific size, output by the tracking algorithm, that encloses the tracking target in the image frame. It may be computed by any type of tracking algorithm.
320: Obtain several object regions in the image frame through a preset deep learning algorithm.

The "deep learning algorithm" may be any type of image processing method that realizes a neural network model trained on sample data. Through a deep learning algorithm, the multiple objects present in an image frame can be detected with high confidence.

Similar to the output form of the target region, the result output by the deep learning algorithm is also a set of rectangular boxes containing recognizable objects. In addition, the deep learning algorithm outputs the object label corresponding to each object region, which marks the target object (such as a face or an airplane) to which the object region corresponds.
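As an illustration only (the patent does not mandate any particular network), an off-the-shelf detector such as torchvision's Faster R-CNN already returns exactly this kind of output: per-image boxes, labels and confidence scores.

```python
import torch
import torchvision

# A pretrained detector stands in for the "preset deep learning algorithm".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

frame = torch.rand(3, 480, 640)        # placeholder for a decoded video frame
with torch.no_grad():
    detections = model([frame])[0]     # dict with 'boxes', 'labels', 'scores'

# Each row of detections['boxes'] is an (x1, y1, x2, y2) object region,
# and detections['labels'][i] is the object label of that region.
```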
330: Select, from among the several object regions, an object region whose object attributes are the same as those of the target region as the target object region.

A deep learning algorithm usually identifies more than one object region in an image frame. A suitable screening method is therefore needed to select the single most reliable object region to introduce into the tracking result.

The screening method actually used depends on the object attributes adopted. Technicians may choose one or more criteria as the object attributes according to the needs of the actual situation.

In some embodiments, the steps shown in FIG. 4 may be used to select, from the multiple object regions, the target object region that serves as the reference for adjustment and optimization:
331: Select object regions carrying the same object label as the tracking target as candidate object regions.

The "object label" is output by the deep learning algorithm and marks the object to which an object region corresponds. The specific form of the object label depends on the deep learning algorithm used and its training data.

332: Among the candidate object regions, select the candidate object region with the largest overlap with the tracking result of the previous image frame as the target object region.

Owing to the characteristics of deep learning algorithms, multiple object regions with the same object label may be identified in an image frame. Therefore, even after screening by object label, more than one candidate object region may remain.

In that case, the overlap between each candidate region and the tracking result of the previous image frame can be used as the criterion to further select the target object region that serves as the adjustment reference and standard.

In this embodiment, two criteria, the object label and the degree of overlap, are used to judge the object attributes, so that the true tracking target can be reliably found among the outputs of the deep learning algorithm and used as the reference for adjusting and optimizing the tracking result. A code sketch of this two-stage selection follows.
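The sketch below illustrates steps 331 and 332; the detection record layout and the helper iou() (an intersection-over-union function like the one described with FIG. 5 below) are assumptions for illustration, not part of the patent:

```python
def select_target_object(detections, target_label, prev_result, iou):
    """detections: list of (box, label) pairs from the detector.
    target_label: object label of the tracking target (step 331).
    prev_result: tracking result of the previous image frame (step 332).
    iou: callable computing the overlap between two boxes."""
    candidates = [box for box, label in detections if label == target_label]
    if not candidates:
        return None  # no object region matches the tracking target's label
    # Keep the candidate with the largest overlap with the previous result.
    return max(candidates, key=lambda box: iou(box, prev_result))
```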
340: Adjust the position and size of the target region according to the target object region to generate an optimized target region.

Here, "adjusting" means integrating the target object region and the target region through any suitable type of function mapping, and generating and outputting an optimized result, namely the optimized target region, by adjusting position and size. The adjustment consists in changing and optimizing the position and size, within the image frame, of the rectangular box representing the target region.

In practice, the tracking target is tracked by linking the optimized target regions of the individual image frames in a series of consecutive frames, thereby determining how the position, size and so on of the tracking target change over the image frame sequence.

In this embodiment, on top of the original tracking algorithm, the detection results of deep learning are introduced through a reasonable screening method for optimization and adjustment. This better resists the interference of complex environments on the tracking results and effectively improves the overall performance of target tracking. Assembled from the pieces above, the per-frame processing can be sketched as shown below.
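A minimal per-frame loop combining the preceding steps might look like this; run_tracker and run_detector stand for the preset tracking and deep learning algorithms and are assumed interfaces, as are the helpers from the earlier sketches:

```python
def track_video(frames, init_box, target_label,
                run_tracker, run_detector, iou, fuse_boxes):
    """frames: iterable of image frames following the initial frame.
    init_box: tracking target selected in the initial frame."""
    prev = init_box
    results = []
    for frame in frames:
        track_box = run_tracker(frame, prev)              # step 310
        detections = run_detector(frame)                  # step 320
        target_obj = select_target_object(                # step 330
            detections, target_label, prev, iou)
        if target_obj is None:
            opt = track_box    # fall back to the tracker if nothing matches
        else:
            lam = 0.7          # second weight; e.g. the tracker's confidence
            opt = fuse_boxes(track_box, target_obj, lam)  # step 340
        results.append(opt)
        prev = opt
    return results
```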
The specific process of target tracking is described in detail below with reference to FIG. 5, a schematic diagram of the practical application of the target tracking method to a video sequence.

As shown in FIG. 5, suppose the video sequence consists of n consecutive image frames (the target tracking method provided by the embodiment of the present invention is executed in turn on the 2nd to the n-th image frames, predicting the position of the tracking target in each frame). The tracking result, the object regions and the target region can all be represented by the smallest bounding rectangle containing the target object.

First, in the initial image frame, the target region where the tracking target is located is determined according to a user instruction.

The initial image frame is the starting point of target tracking. For simplicity of presentation, in this embodiment the initial image frame is the first image frame of the video sequence. Of course, according to the user's actual needs, any image frame in the video sequence may be chosen as the initial image frame and used as the starting point of target tracking.

As shown in FIG. 5, in actual use, to facilitate user interaction and subsequent processing, a deep learning algorithm may be used to perform recognition and detection in the initial image frame, obtaining several selectable object regions (each marked with a corresponding object label, denoted L1 to L4 in FIG. 5) as optional tracking targets. The user may issue a corresponding selection instruction according to his or her needs and choose one of these selectable object regions as the tracking target (e.g., L4).
Then, in subsequent image frames, the target object region whose object attributes are the same as those of the tracking target is selected from among the multiple object regions detected by the deep learning algorithm.

Specifically, among the detected object regions, those with the same object label as the tracking target of the initial image frame are first selected as candidate object regions. Then, the object region with the highest overlap with the target region D of the previous image frame is used as the target object region.

The overlap between two rectangular boxes can be expressed by the intersection-over-union between the object region and the target region. As shown in FIG. 5, the "intersection over union" (IoU) is the ratio of the intersection to the union of two regions of interest. It is calculated as follows: compute the intersection area and the union area of the candidate object region and the target region, and take the ratio of the intersection area to the union area as the overlap between the candidate object region and the target object region.
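The IoU just described is straightforward to compute for axis-aligned rectangles; the sketch below uses corner-format (x1, y1, x2, y2) boxes, a representation chosen here for convenience rather than mandated by the patent:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes in
    (x1, y1, x2, y2) corner format."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                    # union area
    return inter / union if union > 0 else 0.0
```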
Finally, the selected target object region is combined with the target region obtained by the tracking algorithm, yielding the optimized target region in the image frame.

By repeating the above steps in turn on the subsequent image frames, the target tracking result for a video sequence composed of a series of consecutive image frames is obtained. In summary, the complete target tracking method can be expressed by the following formula (1):
    Bbox_opt^i = Bbox_select^0,                               i = 0
    Bbox_opt^i = merge(Bbox_track^i, Bbox_detect^(i,j*)),     i > 0        (1)

where Bbox denotes the tracking target, and the subscripts track and detect denote, respectively, the target region obtained by the tracking algorithm and the object region obtained by the deep learning algorithm. Since there may be more than one object region, the object regions are written Bbox_detect^(i,j), where j is the serial number of the object region.

When i = 0, the formula denotes the tracking target selected in the initial image frame (Bbox_select^0); when i > 0, it denotes the tracking result in a subsequent image frame. Here Bbox_track^i denotes the target region in the i-th image frame, Bbox_opt^i denotes the optimized tracking result of the i-th image frame, and Bbox_detect^(i,j*) denotes the target object region (that is, the object region with the same object label as the target region of the previous frame and the largest overlap with it); the merge operation is the weighted combination given by formulas (2) to (5) below.
The specific method of integrating the tracking algorithm and the deep learning algorithm is as follows. First, a first weight is set for the target object region and a second weight for the target region. Then, according to the first weight and the second weight, a weighted summation of the center point of the target object region and the center point of the target region is performed to obtain the center point of the optimized target region; and, according to the first weight and the second weight, a weighted summation of the sizes of the target object region and the target region is performed to obtain the size of the optimized target region.

The position of the target object region is represented by its center point, the position of the target region by its center point, and the position of the optimized target region by the center point of the optimized target region. The first weight of the target object region and the second weight of the target region may be preset by technicians according to the actual situation; they are constant values that can be determined experimentally or empirically. In this embodiment, the confidence of the target region may be used as the weight assigned to the tracking algorithm (the second weight). Correspondingly, the weight assigned to the deep learning algorithm (the first weight) is 1 minus the confidence of the target region.
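In code, this weighting choice is a one-liner on top of the earlier fusion sketch; taking the tracker's confidence as λ follows this embodiment, while the attribute name confidence is an assumption:

```python
lam = track_result.confidence        # second weight: tracker's confidence
opt_box = fuse_boxes(track_box, target_obj_box, lam)
# The detector's share of the result is implicitly (1 - lam).
```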
Specifically, the position of the optimized target region is calculated by the following formulas (2) and (3):

center_x_opt = λ*center_x_track + (1-λ)*center_x_detect   (2);
center_y_opt = λ*center_y_track + (1-λ)*center_y_detect   (3);

where λ is the confidence. Moreover, since in this embodiment the optimized target region, the object region and the target region are all represented by rectangular boxes, their positions can be represented by the position coordinates of the centers of the rectangular boxes in the image frame. That is, the position of the optimized target region can be written as (center_x_opt, center_y_opt), the position of the object region as (center_x_detect, center_y_detect), and the position of the target region as (center_x_track, center_y_track).
The size of the optimized target region is calculated by the following formulas (4) and (5):

width_opt = λ*width_track + (1-λ)*width_detect   (4);
height_opt = λ*height_track + (1-λ)*height_detect   (5);

where width_opt and height_opt are the width and height of the optimized target region, width_track and height_track are the width and height of the target region, width_detect and height_detect are the width and height of the target object region, and λ is the confidence.
As described above, the target tracking method provided by the embodiments of the present invention builds on the original tracking algorithm and optimizes and adjusts it in combination with the detection results of deep learning, so that it can better adapt to and resist interference in complex environments, effectively improving the overall performance of target tracking.

An embodiment of the present invention further provides a non-volatile computer storage medium storing at least one executable instruction; the computer-executable instruction can execute the target tracking method of any of the above method embodiments.
FIG. 6 is a schematic structural diagram of an image processing chip according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the image processing chip.

As shown in FIG. 6, the image processing chip may include: a processor 602, a communications interface 604, a memory 606 and a communication bus 608.

The processor 602, the communications interface 604 and the memory 606 communicate with one another through the communication bus 608. The communications interface 604 is used to communicate with network elements of other devices such as clients or other servers. The processor 602 is configured to execute a program 610, and may specifically perform the relevant steps of the above target tracking method embodiments.

Specifically, the program 610 may include program code, and the program code includes computer operation instructions.

The processor 602 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the network slicing device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.

The memory 606 is used to store the program 610. The memory 606 may include a high-speed RAM memory, and may further include a non-volatile memory, for example at least one disk memory.
The program 610 may be used to cause the processor 602 to perform the following steps:

First, determine, according to the appearance features of a tracking target and through a preset tracking algorithm, the target region where the tracking target is located in an image frame. Second, obtain several object regions in the image frame through a preset deep learning algorithm. Third, select, from among the several object regions, an object region whose object attributes are the same as those of the target region as the target object region. Finally, generate an optimized target region according to the target object region by adjusting the position and size of the target region.

In some embodiments, the program 610 is further used to cause the processor 602 to perform the following steps before the step of determining, through the preset tracking algorithm, the target region where the tracking target is located in the image frame: first, identify several selectable object regions in an initial image frame through the deep learning algorithm, each of the object regions being marked with a corresponding object label; then, determine the object region selected by the user as the tracking target.

In some embodiments, the program 610 may be used so that, when performing the step of selecting, from among the several object regions, an object region whose object attributes are the same as those of the target region as the target object region, the processor 602 is specifically configured to:

select object regions carrying the same object label as the tracking target as candidate object regions; and, among the candidate object regions, select the candidate object region with the largest overlap with the tracking result of the previous image frame as the target object region.

In some embodiments, the program 610 may be used so that, when performing the step of selecting the candidate object region with the largest overlap with the target region as the target object region, the processor 602 is specifically configured to: compute the intersection area and the union area of the candidate object region and the tracking result, and take the ratio of the intersection area to the union area as the overlap between the candidate object region and the tracking result.

In some embodiments, the program 610 may be used so that, when performing the step of generating an optimized target region according to the target object region by adjusting the position and size of the target region, the processor 602 is specifically configured to:

set a first weight for the target object region and a second weight for the target region; perform, according to the first weight and the second weight, a weighted summation of the center point of the target object region and the center point of the target region to obtain the center point of the optimized target region, the position of the target object region being represented by its center point, the position of the target region by its center point, and the position of the optimized target region by the center point of the optimized target region; and perform, according to the first weight and the second weight, a weighted summation of the sizes of the target object region and the target region to obtain the size of the optimized target region.
Specifically, the processor 602 may obtain the center point of the optimized target region through the following formulas:

center_x_opt = λ*center_x_track + (1-λ)*center_x_detect;
center_y_opt = λ*center_y_track + (1-λ)*center_y_detect;

where center_x_opt and center_y_opt are the coordinates of the center point of the optimized target region in the horizontal and vertical directions of the image frame; center_x_track and center_y_track are the coordinates of the center point of the target region in the horizontal and vertical directions of the image frame; center_x_detect and center_y_detect are the coordinates of the center point of the target object region in the horizontal and vertical directions of the image frame; and λ is the second weight.

The processor 602 may further obtain the size of the optimized target region through the following formulas:

width_opt = λ*width_track + (1-λ)*width_detect;
height_opt = λ*height_track + (1-λ)*height_detect;

where width_opt and height_opt are the width and height of the optimized target region, width_track and height_track are the width and height of the target region, width_detect and height_detect are the width and height of the target object region, and λ is the second weight.
Those skilled in the art will further appreciate that the steps of the exemplary target tracking method described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution.

Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations shall not be considered beyond the scope of the present invention. The computer software may be stored in a computer-readable storage medium, and the program, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
Based on the image processing chip provided by the embodiments of the present invention, an embodiment of the present invention further provides a drone. The drone includes a drone body, an image acquisition device mounted on a gimbal of the drone body, and an image processing chip.

The image acquisition device is used to continuously acquire multiple frames of images; the image processing chip is used to receive the multiple frames of images continuously acquired by the image acquisition device and to perform the following steps on the received frames:

First, determine, according to the appearance features of a tracking target and through a preset tracking algorithm, the target region where the tracking target is located in an image frame. Second, obtain several object regions in the image frame through a preset deep learning algorithm. Third, select, from among the several object regions, an object region whose object attributes are the same as those of the target region as the target object region. Finally, generate an optimized target region according to the target object region by adjusting the position and size of the target region.

Based on the optimized target regions determined by the image processing chip in multiple consecutive image frames, the drone tracks the tracking target.
In some embodiments, before the step of determining the target region where the tracking target is located in the image frame, the image processing chip further performs the following steps:

First, identify several selectable object regions in an initial image frame through the deep learning algorithm, each of the object regions being marked with a corresponding object label. Then, determine the object region selected by the user as the tracking target.

In some embodiments, when performing the step of selecting, from among the several object regions, an object region whose object attributes are the same as those of the target region as the target object region, the image processing chip is specifically configured to:

select object regions carrying the same object label as the tracking target as candidate object regions; and, among the candidate object regions, select the candidate object region with the largest overlap with the tracking result of the previous image frame as the target object region.

In some embodiments, when performing the step of selecting the candidate object region with the largest overlap with the target region as the target object region, the image processing chip is specifically configured to: compute the intersection area and the union area of the candidate object region and the tracking result, and take the ratio of the intersection area to the union area as the overlap between the candidate object region and the tracking result.

In some embodiments, when performing the step of generating an optimized target region according to the target object region by adjusting the position and size of the target region, the image processing chip is specifically configured to:

set a first weight for the target object region and a second weight for the target region; perform, according to the first weight and the second weight, a weighted summation of the center point of the target object region and the center point of the target region to obtain the center point of the optimized target region, the position of the target object region being represented by its center point, the position of the target region by its center point, and the position of the optimized target region by the center point of the optimized target region; and perform, according to the first weight and the second weight, a weighted summation of the sizes of the target object region and the target region to obtain the size of the optimized target region.
Specifically, the image processing chip may obtain the center point of the optimized target region through the following formulas:

center_x_opt = λ*center_x_track + (1-λ)*center_x_detect;
center_y_opt = λ*center_y_track + (1-λ)*center_y_detect;

where center_x_opt and center_y_opt are the coordinates of the center point of the optimized target region in the horizontal and vertical directions of the image frame; center_x_track and center_y_track are the coordinates of the center point of the target region in the horizontal and vertical directions of the image frame; center_x_detect and center_y_detect are the coordinates of the center point of the target object region in the horizontal and vertical directions of the image frame; and λ is the second weight.

The image processing chip may further obtain the size of the optimized target region through the following formulas:

width_opt = λ*width_track + (1-λ)*width_detect;
height_opt = λ*height_track + (1-λ)*height_detect;

where width_opt and height_opt are the width and height of the optimized target region, width_track and height_track are the width and height of the target region, width_detect and height_detect are the width and height of the target object region, and λ is the second weight.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Within the idea of the present invention, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be carried out in any order, and many other variations of the different aspects of the present invention as described above exist, which are not presented in detail for the sake of brevity. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or replace some of the technical features therein with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

  1. A target tracking method, characterized in that it comprises:
    determining, according to appearance features of a tracking target and through a preset tracking algorithm, a target region where the tracking target is located in an image frame;
    obtaining several object regions in the image frame through a preset deep learning algorithm;
    selecting, from among the several object regions, an object region whose object attributes are the same as those of the target region as a target object region;
    generating an optimized target region according to the target object region by adjusting the position and size of the target region.
  2. The method according to claim 1, characterized in that, before the step of determining, according to the appearance features of the tracking target and through the preset tracking algorithm, the target region where the tracking target is located in the image frame, the method further comprises:
    identifying several selectable object regions in an initial image frame through the deep learning algorithm, each of the object regions being marked with a corresponding object label;
    determining the object region selected by a user as the tracking target.
  3. The method according to claim 2, characterized in that the selecting, from among the several object regions, an object region whose object attributes are the same as those of the target region as the target object region specifically comprises:
    selecting, by screening, object regions carrying the same object label as the tracking target as candidate object regions;
    selecting, among the candidate object regions, the candidate object region with the largest overlap with a tracking result of a previous image frame as the target object region.
  4. The method according to claim 3, characterized in that the selecting the candidate object region with the largest overlap with the target region as the target object region specifically comprises:
    computing an intersection area and a union area of the candidate object region and the tracking result;
    taking the ratio of the intersection area to the union area as the overlap between the candidate object region and the tracking result.
  5. The method according to claim 1, characterized in that the appearance features comprise: histograms of oriented gradients, local binary patterns and color features.
  6. The method according to any one of claims 1-5, characterized in that the generating an optimized target region according to the target object region by adjusting the position and size of the target region specifically comprises:
    setting a first weight for the target object region and a second weight for the target region;
    performing, according to the first weight and the second weight, a weighted summation of the center point of the target object region and the center point of the target region to obtain the center point of the optimized target region, the position of the target object region being represented by the center point of the target object region, the position of the target region being represented by the center point of the target region, and the position of the optimized target region being represented by the center point of the optimized target region; and
    performing, according to the first weight and the second weight, a weighted summation of the sizes of the target object region and the target region to obtain the size of the optimized target region.
  7. The method according to claim 6, characterized in that the center point of the optimized target region is calculated by the following formulas:
    center_x_opt = λ*center_x_track + (1-λ)*center_x_detect;
    center_y_opt = λ*center_y_track + (1-λ)*center_y_detect;
    wherein center_x_opt is the coordinate of the center point of the optimized target region in the horizontal direction of the image frame, and center_y_opt is the coordinate of the center point of the optimized target region in the vertical direction of the image frame; center_x_track is the coordinate of the center point of the target region in the horizontal direction of the image frame, and center_y_track is the coordinate of the center point of the target region in the vertical direction of the image frame; center_x_detect is the coordinate of the center point of the target object region in the horizontal direction of the image frame, and center_y_detect is the coordinate of the center point of the target object region in the vertical direction of the image frame; and λ is the second weight.
  8. The method according to claim 6, wherein the size of the optimized target area is calculated by the following formulas:
    width_opt = λ*width_track + (1-λ)*width_detect;
    height_opt = λ*height_track + (1-λ)*height_detect;
    where width_opt and height_opt are the width and height of the optimized target area, width_track and height_track are the width and height of the target area, width_detect and height_detect are the width and height of the target object region, and λ is the second weight.
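Claims 7 and 8 apply the same linear blend, first to the center coordinates and then to the width and height. A minimal sketch with boxes in (center_x, center_y, width, height) format; the default λ of 0.5 is an illustrative choice, not a value given in the patent:

```python
def fuse_boxes(track_box, detect_box, lam=0.5):
    """Blend the tracker box with the detector box as in claims 7 and 8.

    Boxes are (center_x, center_y, width, height) tuples. lam is the
    second weight λ applied to the tracking result; (1 - lam) is the
    first weight applied to the detected target object region.
    """
    cx = lam * track_box[0] + (1.0 - lam) * detect_box[0]
    cy = lam * track_box[1] + (1.0 - lam) * detect_box[1]
    w = lam * track_box[2] + (1.0 - lam) * detect_box[2]
    h = lam * track_box[3] + (1.0 - lam) * detect_box[3]
    return (cx, cy, w, h)
```

Setting λ = 1 keeps the tracker output unchanged, while λ = 0 replaces it with the detection, so λ trades the tracker's temporal stability against the detector's localization accuracy.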
  9. A target tracking apparatus, comprising:
    a target tracking module, configured to determine, in an image frame, the target area where a tracking target is located through a preset tracking algorithm according to appearance features of the tracking target;
    a deep learning recognition module, configured to obtain several object regions in the image frame through a preset deep learning algorithm;
    a selection module, configured to select, from the several object regions, an object region with the same object attribute as the target area as the target object region; and
    an optimization module, configured to adjust the position and size of the target area according to the target object region, to generate an optimized target area.
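To show how the four modules of claim 9 might interact in a single update step, here is a skeleton that reuses the iou and fuse_boxes helpers sketched above; the Box type, its fields, and the largest-overlap selection rule folded in from claims 3 and 4 are illustrative assumptions, not the patent's prescribed implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Box:
    """Center-format bounding box with an object label (illustrative)."""
    cx: float
    cy: float
    w: float
    h: float
    label: str

    @property
    def xywh(self):
        # Top-left (x, y, width, height) layout expected by iou() above.
        return (self.cx - self.w / 2, self.cy - self.h / 2, self.w, self.h)

def refine(target: Box, detections: List[Box], lam: float = 0.5) -> Box:
    """One frame's update: selection module followed by optimization module."""
    # Selection module: keep detections with the same object attribute,
    # then pick the one overlapping the tracking result the most.
    same = [d for d in detections if d.label == target.label]
    if not same:
        return target  # no matching detection; keep the tracker's output
    best = max(same, key=lambda d: iou(d.xywh, target.xywh))

    # Optimization module: fuse position and size (claims 7 and 8).
    cx, cy, w, h = fuse_boxes((target.cx, target.cy, target.w, target.h),
                              (best.cx, best.cy, best.w, best.h), lam)
    return Box(cx, cy, w, h, target.label)
```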
  10. An image processing chip, comprising a processor and a memory communicatively connected to the processor;
    wherein the memory stores computer program instructions which, when invoked by the processor, cause the processor to execute the target tracking method according to any one of claims 1-8.
  11. An unmanned aerial vehicle, comprising an unmanned aerial vehicle body, an image acquisition device mounted on a gimbal of the unmanned aerial vehicle body, and an image processing chip;
    wherein the image acquisition device is configured to continuously acquire multiple frames of images, and the image processing chip is configured to receive the multiple frames of images continuously acquired by the image acquisition device and to perform the target tracking method according to any one of claims 1-8 on the received images, thereby tracking the tracking target.
PCT/CN2021/108893 2020-08-12 2021-07-28 Target tracking method and apparatus WO2022033306A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010805592.XA CN112037255A (en) 2020-08-12 2020-08-12 Target tracking method and device
CN202010805592.X 2020-08-12

Publications (1)

Publication Number Publication Date
WO2022033306A1 (en)

Family

ID: 73577165

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/108893 WO2022033306A1 (en) 2020-08-12 2021-07-28 Target tracking method and apparatus

Country Status (2)

Country Link
CN (1) CN112037255A (en)
WO (1) WO2022033306A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112037255A (en) * 2020-08-12 2020-12-04 深圳市道通智能航空技术有限公司 Target tracking method and device
CN112560651B * 2020-12-09 2023-02-03 燕山大学 Target tracking method and device based on the combination of a deep network and target segmentation


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100231977A1 (en) * 2006-08-08 2010-09-16 Kimoto Co., Ltd Screening apparatus and screening method
CN107341817B * 2017-06-16 2019-05-21 哈尔滨工业大学(威海) Adaptive visual tracking algorithm based on online metric learning
CN109284673B * 2018-08-07 2022-02-22 北京市商汤科技开发有限公司 Object tracking method and device, electronic equipment and storage medium
CN109461207A * 2018-11-05 2019-03-12 胡翰 Building singulation method and device based on point cloud data
CN110189333B * 2019-05-22 2022-03-15 湖北亿咖通科技有限公司 Semi-automatic annotation method and device for image semantic segmentation
CN110222686B (en) * 2019-05-27 2021-05-07 腾讯科技(深圳)有限公司 Object detection method, object detection device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409283A * 2018-10-24 2019-03-01 深圳市锦润防务科技有限公司 Method, system and storage medium for surface vessel tracking and monitoring
CN109785385A * 2019-01-22 2019-05-21 中国科学院自动化研究所 Visual target tracking method and system
CN109993769A * 2019-03-07 2019-07-09 安徽创世科技股份有限公司 Multi-target tracking system combining the deep learning SSD algorithm with the KCF algorithm
CN111098815A * 2019-11-11 2020-05-05 武汉市众向科技有限公司 ADAS front-vehicle collision early warning method based on the fusion of monocular vision and millimeter waves
CN112037255A * 2020-08-12 2020-12-04 深圳市道通智能航空技术有限公司 Target tracking method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LU XIANKAI: "Research on Object Tracking Based on Deep Learning", CHINESE DOCTORAL DISSERTATIONS FULL-TEXT DATABASE, UNIVERSITY OF CHINESE ACADEMY OF SCIENCES, CN, 15 January 2020 (2020-01-15), XP055899976, ISSN: 1674-022X *

Also Published As

Publication number Publication date
CN112037255A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN107808143B (en) Dynamic gesture recognition method based on computer vision
CN105830062B (en) System, method and apparatus for coded object formation
CN105830009B Video processing method and equipment
WO2019128507A1 (en) Image processing method and apparatus, storage medium and electronic device
WO2022033306A1 (en) Target tracking method and apparatus
US9924107B2 (en) Determination of exposure time for an image frame
CN108198130B (en) Image processing method, image processing device, storage medium and electronic equipment
EP3001354A1 (en) Object detection method and device for online training
WO2021175071A1 (en) Image processing method and apparatus, storage medium, and electronic device
EP4174716A1 (en) Pedestrian tracking method and device, and computer readable storage medium
CN112817755A (en) Edge cloud cooperative deep learning target detection method based on target tracking acceleration
CN110069125B (en) Virtual object control method and device
WO2024060978A1 (en) Key point detection model training method and apparatus and virtual character driving method and apparatus
CN108983968A Image big data intersection control system and method based on virtual reality
CN115699082A (en) Defect detection method and device, storage medium and electronic equipment
WO2021047492A1 (en) Target tracking method, device, and computer system
CN110245609A Pedestrian trajectory generation method, device and readable storage medium
CN113050860A (en) Control identification method and related device
CN106980372B Unmanned aerial vehicle control method and system without a ground control terminal
CN106777071B (en) Method and device for acquiring reference information by image recognition
CN114092920B (en) Model training method, image classification method, device and storage medium
CN114722937A (en) Abnormal data detection method and device, electronic equipment and storage medium
JP2009123150A (en) Object detection apparatus and method, object detection system and program
WO2021217403A1 (en) Method and apparatus for controlling movable platform, and device and storage medium
CN110197459B (en) Image stylization generation method and device and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21855359
    Country of ref document: EP
    Kind code of ref document: A1

NENP Non-entry into the national phase
    Ref country code: DE

122 Ep: pct application non-entry in european phase
    Ref document number: 21855359
    Country of ref document: EP
    Kind code of ref document: A1