WO2022033306A1 - Target tracking method and apparatus

Target tracking method and apparatus

Info

Publication number
WO2022033306A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
area
tracking
target area
center
Prior art date
Application number
PCT/CN2021/108893
Other languages
English (en)
Chinese (zh)
Inventor
李亚学
Original Assignee
深圳市道通智能航空技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市道通智能航空技术股份有限公司 filed Critical 深圳市道通智能航空技术股份有限公司
Publication of WO2022033306A1 publication Critical patent/WO2022033306A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • The present invention relates to the technical field of machine vision, and in particular to a target tracking method, a target tracking apparatus, an image processing chip, and an unmanned aerial vehicle.
  • Object tracking is a technique of predicting the size and position of a target object in subsequent image frames of a video sequence given the target size and position of the initial image frame of the video sequence. It has a wide range of applications in many fields such as video surveillance, human-computer interaction and multimedia analysis.
  • The tracked target is prone to change in shape due to non-rigid motion, and is also subject to illumination changes and interference from the background environment.
  • The embodiments of the present invention aim to provide a target tracking method, a target tracking apparatus, an image processing chip, and an unmanned aerial vehicle that can overcome the shortcomings of existing target tracking methods.
  • a target tracking method includes:
  • the target area where the tracking target is located is determined in the image frame through a preset tracking algorithm
  • the position and size of the target area are adjusted to generate an optimized target area.
  • In the initial image frame, several selectable object regions are identified through the deep learning algorithm, and each of the object regions is marked with a corresponding object label;
  • selecting an object region with the same object attribute as the target region as the target object region specifically includes: selecting the object regions with the same object label as the tracking target as candidate object regions;
  • among the candidate object regions, the candidate object region with the largest overlap with the tracking result of the previous image frame is selected as the target object region.
  • the selection of the candidate object region with the largest overlap as the target object region specifically includes: calculating the intersection area and the union area of the candidate object region and the tracking result of the previous image frame;
  • the ratio of the intersection area to the union area is used as the degree of overlap between the candidate object region and the tracking result.
  • the representational features include: a gradient direction histogram, a local binary pattern, and a color feature.
  • generating an optimized target area by adjusting the position and size of the target area according to the target object area specifically includes:
  • setting a first weight for the target object area and a second weight for the target area; according to the first weight and the second weight, the center point of the target object area and the center point of the target area are weighted and summed to obtain the center point of the optimized target area;
  • the position of the target object area is represented by the center point of the target object area,
  • the position of the target area is represented by the center point of the target area,
  • and the position of the optimized target area is represented by the center point of the optimized target area;
  • according to the first weight and the second weight, the sizes of the target object area and the target area are weighted and summed to obtain the size of the optimized target area.
  • the center point of the optimized target area is calculated and obtained by the following formula:
  • center_x_opt = α*center_x_track + (1-α)*center_x_detect
  • center_y_opt = α*center_y_track + (1-α)*center_y_detect
  • center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame
  • center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame
  • center_x_track is the coordinate of the center point of the target area in the horizontal direction of the image frame, and center_y_track is the coordinate of the center point of the target area in the vertical direction of the image frame
  • center_x_detect is the coordinate of the center point of the target object area in the horizontal direction of the image frame
  • center_y_detect is the coordinate of the center point of the target object area in the vertical direction of the image frame
  • α is the second weight.
  • the size of the optimized target area is calculated and obtained by the following formula:
  • width_opt = α*width_track + (1-α)*width_detect
  • height_opt = α*height_track + (1-α)*height_detect
  • width_opt is the width of the optimized target area
  • width_track is the width of the target area
  • width_detect is the width of the target object area
  • height_opt is the height of the optimized target area
  • height_track is the height of the target area
  • height_detect is the height of the target object area
  • α is the second weight.
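For illustration only, the following is a minimal Python sketch of the weighted summation described above. It assumes that α (the second weight) lies in [0, 1] and that boxes are given as (center_x, center_y, width, height); the function and variable names are illustrative and are not taken from the patent.

```python
# Minimal sketch of the weighted fusion described above (illustrative names, not from the patent).
# Assumes alpha is the second weight (e.g. the tracker's confidence in [0, 1]) and that both
# boxes are given as (center_x, center_y, width, height) in image coordinates.

def fuse_boxes(track_box, detect_box, alpha):
    """Blend the tracker's target area with the matched detection (target object area)."""
    cx_t, cy_t, w_t, h_t = track_box      # target area from the tracking algorithm
    cx_d, cy_d, w_d, h_d = detect_box     # target object area from the deep learning detector

    cx_opt = alpha * cx_t + (1 - alpha) * cx_d   # center_x_opt
    cy_opt = alpha * cy_t + (1 - alpha) * cy_d   # center_y_opt
    w_opt = alpha * w_t + (1 - alpha) * w_d      # width_opt
    h_opt = alpha * h_t + (1 - alpha) * h_d      # height_opt
    return (cx_opt, cy_opt, w_opt, h_opt)

# Example: a confident tracker (alpha = 0.8) keeps the optimized box close to its own estimate.
print(fuse_boxes((100, 80, 40, 60), (110, 84, 44, 58), alpha=0.8))
```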
  • a target tracking device comprising:
  • a target tracking module used for determining the target area where the tracking target is located in the image frame through a preset tracking algorithm according to the apparent feature of the tracking target;
  • a deep learning recognition module used to obtain several object regions in the image frame through a preset deep learning algorithm
  • a selection module configured to select, among the several object regions, an object region with the same object attribute as the target region as the target object region;
  • the optimization module is configured to adjust the position and size of the target area according to the target object area to generate an optimized target area.
  • an image processing chip comprising a processor and a memory communicatively connected to the processor; the memory stores computer program instructions which, when invoked by the processor, cause the processor to perform the target tracking method described above.
  • an unmanned aerial vehicle comprising: an unmanned aerial vehicle main body, an image acquisition device and an image processing chip installed on the gimbal of the unmanned aerial vehicle main body;
  • the image acquisition device is used to continuously collect multiple frames of images;
  • the image processing chip is used to receive the multiple frames of images continuously collected by the image acquisition device, and to perform the above-mentioned target tracking method on the received multiple frames of images to realize tracking of the tracking target.
  • The target tracking method of the embodiment of the present invention performs optimization and adjustment on the basis of the original tracking algorithm in combination with the detection results of deep learning, which allows it to better adapt to and resist interference in complex environments and effectively improves the overall performance of target tracking.
  • FIG. 1 is a schematic diagram of an application scenario of a target tracking method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a target tracking apparatus provided by an embodiment of the present invention.
  • FIG. 3 is a method flowchart of a target tracking method provided by an embodiment of the present invention.
  • FIG. 4 is a method flowchart of a method for selecting a target object region provided by an embodiment of the present invention
  • FIG. 5 is a schematic diagram of an application example of a target tracking method provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an image processing chip provided by an embodiment of the present invention.
  • the traditional object tracking process includes generating candidate samples, feature extraction, scoring candidate samples using the observation model, updating the observation model to adapt to changes in the target, and fusion to obtain the final decision result and so on.
  • feature extraction refers to the process of extracting discriminative features to represent the target. Extracting and obtaining discriminative features is the basis of candidate sample scoring and is the key to determining the performance of target tracking. Most of the existing improvements on the performance of target tracking methods focus on how to select appropriate features.
  • The target tracking method provided by the embodiment of the present invention can overcome the interference of occlusion, deformation, background noise, scale changes, etc., and thereby improve target tracking performance.
  • FIG. 1 is an application scenario of a target tracking method provided by an embodiment of the present invention.
  • a drone 10 equipped with an aerial camera, an intelligent terminal 20 and a wireless network 30 are included.
  • The drone 10 may be any type of powered unmanned aerial vehicle, including but not limited to quadcopter drones, fixed-wing aircraft, and helicopter models. Its size and power can be chosen according to actual requirements so as to provide the load capacity, flight speed, and cruising range needed for the intended use.
  • the UAV 10 may be equipped with any type of image capture device, including a motion camera, a high-definition camera, or a wide-angle camera.
  • As one of the functional modules carried on the UAV, the image capture device can be installed and fixed on the UAV by a mounting bracket such as a gimbal, and is controlled by the UAV 10 to perform the task of image acquisition.
  • One or more functional modules can also be added to the UAV so that it can realize corresponding functions, such as a built-in main control chip serving as the control core for UAV flight and data transmission, or an image transmission device that uploads the acquired image information to a device (such as a server or an intelligent terminal) that establishes a connection with the drone.
  • the smart terminal 20 may be any type of smart device used to establish a communication connection with the drone, such as a mobile phone, a tablet computer, or a smart remote control.
  • the smart terminal 20 may be equipped with one or more different user interaction devices to collect user instructions or display and feed back information to the user.
  • For example, the smart terminal 20 may include buttons, a display screen, a touch screen, a speaker, and remote-control joysticks.
  • the smart terminal 20 may be equipped with a touch display screen, through which the user's remote control instructions for the drone are received, and the image information obtained by the aerial camera is displayed to the user through the touch display screen.
  • The touch screen can also be used to switch the image information currently displayed on the display.
  • the existing image vision processing technology may also be integrated between the drone 10 and the intelligent terminal 20 to further provide more intelligent services.
  • The drone 10 can collect images through the aerial camera, and the intelligent terminal 20 then executes the target tracking method provided by the embodiment of the present invention to track a specific face in the video, finally realizing human-computer interaction between the user and the drone.
  • the target tracking method can also be executed by the drone 10 or an external server, and the final data result can be directly provided to the intelligent terminal 20 .
  • The wireless network 30 can be a wireless communication network based on any type of data transmission principle for establishing a data transmission channel between two nodes, such as a Bluetooth network, a WiFi network, a wireless cellular network, or a combination thereof located in a specific signal frequency band, so as to achieve data transmission between the drone 10, the smart terminal 20 and/or the server.
  • FIG. 2 is a structural block diagram of a target tracking apparatus provided by an embodiment of the present invention.
  • the target tracking device can be executed by any suitable type of electronic computing platform, such as an image processing chip built in the drone, a server or an intelligent terminal that establishes a wireless communication connection with the drone.
  • the composition of the target tracking device is described in the form of functional modules.
  • The functional modules shown in FIG. 2 can be selectively implemented by software, hardware or a combination of software and hardware according to actual needs. For example, they may be implemented by the processor calling an associated software application stored in memory.
  • the target tracking device 200 includes: a target tracking module 210 , a deep learning identification module 220 , a selection module 230 and an optimization module 240 .
  • the target tracking module 210 is configured to determine the target area where the tracking target is located in the image frame by using a preset tracking algorithm according to the appearance characteristics of the tracking target.
  • Representational features refer to hand-designed discriminative features used in traditional object tracking methods; they are fast to compute and introduce little delay. Specifically, the representational features include gradient direction histograms, local binary patterns and color features.
  • the target tracking module 210 is a functional module for executing traditional target tracking methods, and specifically, any suitable type of tracking algorithm can be selected.
  • the deep learning identification module 220 is configured to obtain several object regions in the image frame through a preset deep learning algorithm.
  • Deep learning is a method of image recognition using deep neural networks trained on sample data. Through deep learning, multiple different object regions can be identified in the image frame. Each object area represents a specific object.
  • the selection module 230 is configured to select, among the several object regions, an object region with the same object attribute as the target region as the target object region.
  • Object attribute refers to the metric used to judge whether an object area and the target area belong to the same object. Specifically, it may consist of one or more indicators or conditions, on the basis of which object regions belonging to the same target object can be determined or selected.
  • The selection module 230 is specifically configured to select the object regions with the same object label as the tracking target as candidate object regions; and, among the candidate object regions, to select the candidate object region with the largest degree of overlap with the tracking result of the previous image frame as the target object region.
  • the optimization module 240 is configured to adjust the position and size of the target area according to the target object area to generate an optimized target area.
  • The "target object area" is an area obtained based on deep learning, which has relatively good resistance to interference factors in complex environments and is not easily disturbed. Therefore, on the basis of the traditional tracking algorithm, the target object region is introduced as a reference, so that the tracking result can be optimized and an optimized target region can be generated.
  • the optimization module 240 is specifically configured to: set a first weight of the target object area and a second weight of the target area;
  • According to the first weight and the second weight, the center point of the target object area and the center point of the target area are weighted and summed to obtain the center point of the optimized target area; and according to the first weight and the second weight, the sizes of the target object area and the target area are weighted and summed to obtain the size of the optimized target area.
  • The position of the target object area is represented by the center point of the target object area, the position of the target area is represented by the center point of the target area, and the position of the optimized target area is represented by the center point of the optimized target area.
  • the center point of the optimized target area can be calculated by the following formula:
  • center_x_opt = α*center_x_track + (1-α)*center_x_detect
  • center_y_opt = α*center_y_track + (1-α)*center_y_detect
  • center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame
  • center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame
  • center_x_track is the coordinate of the center point of the target area in the horizontal direction of the image frame, and center_y_track is the coordinate of the center point of the target area in the vertical direction of the image frame
  • center_x_detect is the coordinate of the center point of the target object area in the horizontal direction of the image frame
  • center_y_detect is the coordinate of the center point of the target object area in the vertical direction of the image frame
  • α is the second weight.
  • the size of the optimized target area can be calculated by the following formula:
  • width_opt = α*width_track + (1-α)*width_detect
  • height_opt = α*height_track + (1-α)*height_detect
  • width_opt is the width of the optimized target area
  • width_track is the width of the target area
  • width_detect is the width of the target object area
  • height_opt is the height of the optimized target area
  • height_track is the height of the target area
  • height_detect is the height of the target object area
  • α is the second weight.
  • the image acquisition device applied to the UAV is taken as an example.
  • the target tracking method can also be used in other types of scenarios and devices to improve the performance of the target tracking algorithm.
  • the target tracking method disclosed in the embodiment of the present invention is not limited to be applied to the UAV shown in FIG. 1 .
  • the target tracking device 200 may further include a marking module 250 .
  • the marking module 250 is used to identify and obtain several selectable object regions through the deep learning algorithm in the initial image frame, and determine the object region selected by the user as the tracking target. Wherein, each of the object regions is marked with a corresponding object label.
  • the "object label” (label) is the output of the deep learning algorithm, which is used to mark the object corresponding to the object area.
  • the specific form of the object label depends on the deep learning algorithm used and its training data.
  • the target object to be tracked can be selected by the user as a candidate object.
  • FIG. 3 is a method flowchart of a target tracking method provided by an embodiment of the present invention. As shown in Figure 3, the target tracking method may include the following steps:
  • the "image frame” refers to a certain frame of image being processed in the video sequence.
  • the tracking algorithm takes the video sequence composed of continuous image frames as the processing object, and predicts and tracks the position of the target in the image frame by frame.
  • the tracking algorithm can use any type of fast tracking algorithm in the prior art, which takes the appearance feature as the discriminating feature of the tracking target.
  • the appearance feature includes but is not limited to histogram of gradient orientation (HOG), local binary pattern (LBP), color feature, and the like.
  • the gradient direction histogram is easily disturbed by the non-rigid body motion of the target (such as the movement of the person who is the tracking target from standing to squatting) and occlusion.
  • the color features are easily affected by changes in the lighting environment.
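As a non-authoritative illustration of such hand-designed representational features, the following sketch computes a HOG descriptor, an LBP histogram and a per-channel color histogram for an image patch using scikit-image and NumPy; the patch shape and histogram bin counts are arbitrary choices for illustration.

```python
# Minimal sketch of computing the representational features mentioned above
# (HOG, LBP, color histogram) for an image patch, using scikit-image and NumPy.
import numpy as np
from skimage.feature import hog, local_binary_pattern

def appearance_features(patch_gray, patch_rgb):
    """patch_gray: 2-D grayscale patch; patch_rgb: matching H x W x 3 color patch (uint8)."""
    # Histogram of oriented gradients (sensitive to non-rigid deformation and occlusion).
    hog_vec = hog(patch_gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

    # Local binary pattern, summarized as a histogram of pattern codes (P=8 uniform codes fall in 0..9).
    lbp = local_binary_pattern(patch_gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    # Simple per-channel color histogram (sensitive to illumination changes).
    color_hist = np.concatenate(
        [np.histogram(patch_rgb[..., c], bins=16, range=(0, 255), density=True)[0] for c in range(3)]
    )
    return np.concatenate([hog_vec, lbp_hist, color_hist])
```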
  • the target area refers to a rectangular frame with a specific size in the image frame that is calculated and output by the "tracking algorithm" and contains the tracking target. Specifically, it can be calculated and obtained by any type of tracking algorithm.
  • the "deep learning algorithm” can be of any type, using sample data to realize the image processing method of the neural network model. Through deep learning algorithms, multiple objects present in an image frame can be obtained with high confidence.
  • the output of the deep learning algorithm is also a rectangular box containing recognizable objects.
  • the deep learning algorithm also outputs the object label corresponding to each object area, which is used to mark the target object (such as a face, an airplane, etc.) corresponding to the object area.
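For the sketches in this description, the detector output can be assumed to take roughly the following shape: each object area is a rectangular box together with its object label and a confidence score. The structure below is an assumption for illustration only; the patent does not prescribe a particular detector or data layout.

```python
# Illustrative shape of a detector's per-frame output: each object area is a rectangular box
# plus an object label and a confidence score. The detector itself is only a placeholder here.
from dataclasses import dataclass

@dataclass
class ObjectRegion:
    label: str          # object label output by the deep learning algorithm, e.g. "person"
    box: tuple          # (center_x, center_y, width, height) in image coordinates
    score: float        # detection confidence

detections = [
    ObjectRegion("person", (120, 90, 40, 80), 0.93),
    ObjectRegion("car", (300, 200, 120, 60), 0.88),
]
```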
  • the specific filtering method used depends on the actual object properties used.
  • the technical personnel can choose to use one or more measurement standards as the actual object attributes according to the needs of the actual situation.
  • the steps shown in FIG. 4 may be specifically adopted to select and obtain a target object region from a plurality of object regions as a reference for adjustment and optimization:
  • the "object label” (label) is output by the deep learning algorithm and used to mark the object corresponding to the object area.
  • the exact form of object labels depends on the deep learning algorithm used and its training data.
  • the degree of overlap between the candidate region and the tracking result of the previous image frame can be used as a criterion, and a target object region that can be used as an adjustment reference and standard can be further selected.
  • the object label and the degree of overlap are used to judge the attributes of the object, so that the real tracking target can be found from the output result of the deep learning algorithm with certainty, and this can be used as the basis for adjusting and optimizing the tracking result.
  • According to the target object area, adjust the position and size of the target area to generate an optimized target area.
  • adjustment refers to using any suitable type of function mapping method to integrate the target object area and the target area, and by adjusting the position and size, generate and output an optimization result, that is, the optimized target area.
  • Forms of adjustment include changing and optimizing the position and size of the rectangular box representing the target area in the image frame.
  • the tracking target can be tracked by linking the optimized target areas of each image frame in a series of continuous image frame sequences, and the changes of the position and size of the tracking target in the image frame sequence can be determined.
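A skeleton of such a per-frame loop is sketched below under the following assumptions: tracker, detector, select_target_region and fuse_boxes are caller-supplied callables with illustrative names, and falling back to the raw tracking result when no detection matches is an added assumption rather than part of the described method.

```python
# Skeleton of the per-frame loop implied above, with hypothetical tracker/detector callables
# standing in for whatever concrete algorithms are used; only the fusion logic mirrors the text.
def track_sequence(frames, init_box, target_label, tracker, detector,
                   select_target_region, fuse_boxes, alpha=0.7):
    """frames: iterable of images; init_box: user-selected box in the initial frame."""
    results = [init_box]                      # optimized target area per frame; frame 0 is user-selected
    for frame in frames[1:]:
        track_box = tracker.update(frame)     # target area from the fast tracking algorithm
        regions = detector(frame)             # labeled object areas from the deep learning detector
        matched = select_target_region(regions, results[-1], target_label)
        if matched is None:                   # no detection matches: keep the raw tracking result
            results.append(track_box)
        else:
            results.append(fuse_boxes(track_box, matched.box, alpha))
    return results
```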
  • The video sequence is composed of n consecutive image frames; the target tracking method provided by the embodiment of the present invention is performed sequentially on the 2nd to the n-th image frames to predict the position of the tracking target in each image frame.
  • the tracking result, the object area and the target area can all be represented by the smallest enclosing rectangle containing the target object.
  • the target area where the tracking target is located is determined according to the user's instruction.
  • the initial image frame refers to the starting point of target tracking.
  • the initial image frame is the first image frame in the video sequence.
  • any image frame in the video sequence can also be randomly selected as the initial image frame, which is used as the starting point of target tracking.
  • A deep learning algorithm can be used for identification and detection in the initial image frame to obtain several selectable object regions (each of the object regions is marked with a corresponding object label, denoted by L1 to L4 in FIG. 5) as optional tracking targets.
  • the user can issue a corresponding user selection instruction according to his own needs, and select one of these optional object areas as the tracking target (eg L4).
  • a target object region with the same object attribute as the tracking target is selected.
  • Among the object regions obtained by detection, first select the object regions whose object label is the same as that of the tracking target of the initial image frame as candidate object regions. Then, the candidate object region with the highest degree of overlap with the target area D of the previous image frame is used as the target object region.
  • the degree of overlap between the two rectangular boxes can be represented by the intersection ratio between the object area and the target area.
  • the "intersection over union ratio” (IoU) refers to the ratio between the intersection and union of two regions of interest.
  • The calculation process is specifically: calculating the intersection area and the union area of the candidate object region and the target area, and using the ratio of the intersection area to the union area as the degree of overlap between the candidate object region and the target area.
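A minimal sketch of this selection step follows, assuming boxes in (center_x, center_y, width, height) form and detection objects with label and box attributes (illustrative names): detections whose label matches the tracking target are kept as candidates, and the candidate with the highest intersection-over-union against the previous frame's tracking result is returned.

```python
# Sketch of the selection step: keep detections whose label matches the tracking target,
# then pick the one with the highest IoU against the previous frame's tracking result.
def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Convert center form to corner coordinates.
    ax1, ay1, ax2, ay2 = ax - aw / 2, ay - ah / 2, ax + aw / 2, ay + ah / 2
    bx1, by1, bx2, by2 = bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def select_target_region(regions, prev_result, target_label):
    """Return the candidate object region with the largest overlap, or None if no label matches."""
    candidates = [r for r in regions if r.label == target_label]
    return max(candidates, key=lambda r: iou(r.box, prev_result), default=None)
```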
  • the optimized target area in the image frame can be obtained.
  • the target tracking result in a video sequence composed of a series of consecutive image frames can be obtained.
  • The complete target tracking method can be expressed by formula (1), in which:
  • Bbox represents a bounding box of the tracking target,
  • and the subscripts track and detect represent, respectively, the target area obtained by the tracking algorithm and the object areas obtained by the deep learning algorithm.
  • Since an image frame may contain multiple object areas, they are denoted Bbox_detect^(i,j), where j is the serial number of the object area.
  • Bbox_track^i represents the target area in the i-th image frame, Bbox_opt^i represents the optimized tracking result of the i-th image frame, and Bbox_detect^(i,*) represents the target object area (that is, the object area with the same object label as the target area of the previous frame and the largest overlap).
  • The specific integration method for the tracking algorithm and the deep learning algorithm is as follows: first, the first weight of the target object area and the second weight of the target area are set. Then, according to the first weight and the second weight, the center point of the target object area and the center point of the target area are weighted and summed to obtain the center point of the optimized target area; and according to the first weight and the second weight, the sizes of the target object area and the target area are weighted and summed to obtain the size of the optimized target area.
  • The position of the target object area is represented by the center point of the target object area,
  • the position of the target area is represented by the center point of the target area,
  • and the position of the optimized target area is represented by the center point of the optimized target area.
  • The first weight and the second weight of the target object area and the target area can be preset by technicians according to the actual situation, and are constant values that can be determined experimentally or empirically.
  • For example, the confidence of the target area may be used as the weight occupied by the tracking algorithm (i.e., the second weight), and the weight occupied by the deep learning algorithm (i.e., the first weight) is 1 minus the confidence of the target area, as sketched below.
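A short sketch of this weighting choice, with the clamping of the confidence to [0, 1] added as an assumption:

```python
# Sketch of one way to set the weights described above: the tracker's confidence (clamped to
# [0, 1]) serves as the second weight, and the detector's share is the remainder.
def weights_from_confidence(track_confidence):
    alpha = min(max(track_confidence, 0.0), 1.0)  # second weight (tracking algorithm)
    return alpha, 1.0 - alpha                     # (second weight, first weight)
```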
  • the position of the optimized target area is obtained by calculating the following formulas (2) and (3):
  • center_x_opt = α*center_x_track + (1-α)*center_x_detect (2);
  • center_y_opt = α*center_y_track + (1-α)*center_y_detect (3);
  • the optimized target area, the object area, and the target area are all represented by a rectangular frame. Therefore, the optimized target area, the object area, and the position of the target area can be represented by the position coordinates of the center of the rectangular frame in the image frame. That is, the position of the optimized target area can be expressed as (center_x_opt, center_y_opt), the position of the object area can be expressed as (center_x_detect, center_y_detect), and the position of the target area can be expressed as (center_x_track, center_y_track).
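Where boxes are instead stored with a top-left corner, a small helper can convert them into the center form used by formulas (2) to (5); the corner-based input convention below is an assumption for illustration, not something stated in the text.

```python
# Helper sketch: convert a rectangle given as (x, y, width, height) with a top-left corner
# into the (center_x, center_y, width, height) form used by formulas (2)-(5).
def to_center_form(x, y, width, height):
    return (x + width / 2.0, y + height / 2.0, width, height)
```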
  • the size of the optimized target area can be calculated by the following formulas (4) and (5):
  • width_opt = α*width_track + (1-α)*width_detect (4);
  • height_opt = α*height_track + (1-α)*height_detect (5);
  • width_opt is the width of the optimized target area
  • width_track is the width of the target area
  • width_detect is the width of the target object area
  • height_opt is the height of the optimized target area
  • height_track is the height of the target area
  • height_detect is the height of the target object area
  • α is the confidence level (i.e., the second weight).
  • the target tracking method provided by the embodiment of the present invention is optimized and adjusted in combination with the detection results of deep learning, which can better adapt to and resist interference in complex environments, and effectively improve the overall performance of target tracking.
  • An embodiment of the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores at least one executable instruction, and the computer-executable instruction can execute the target tracking method in any of the foregoing method embodiments.
  • FIG. 6 shows a schematic structural diagram of an image processing chip according to an embodiment of the present invention.
  • the specific embodiment of the present invention does not limit the specific implementation of the image processing chip.
  • the image processing chip may include: a processor (processor) 602 , a communication interface (Communications Interface) 604 , a memory (memory) 606 , and a communication bus 608 .
  • the processor 602 , the communication interface 604 , and the memory 606 communicate with each other through the communication bus 608 .
  • the communication interface 604 is used to communicate with network elements of other devices such as clients or other servers.
  • the processor 602 is configured to execute the program 610, and specifically may execute the relevant steps in the above-mentioned embodiments of the target tracking method.
  • the program 610 may include program code including computer operation instructions.
  • The processor 602 may be a central processing unit (CPU), an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
  • The one or more processors included in the image processing chip may be processors of the same type, such as one or more CPUs, or may be processors of different types, such as one or more CPUs and one or more ASICs.
  • the memory 606 is used to store the program 610 .
  • Memory 606 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
  • Program 610 may be used to cause processor 602 to perform the following steps:
  • the target area where the tracking target is located is determined in the image frame.
  • several object regions are obtained through a preset deep learning algorithm.
  • an object region with the same object attribute as the target region is selected as the target object region.
  • an optimized target area is generated by adjusting the position and size of the target area.
  • The program 610 is further configured to cause the processor 602 to perform the following steps before executing the step of determining the target area where the tracking target is located in the image frame by using a preset tracking algorithm: first, in the initial image frame, several selectable object regions are identified through the deep learning algorithm, and each of the object regions is marked with a corresponding object label. Then, the object region selected by the user is determined as the tracking target.
  • The program 610 may be used to cause the processor 602, when performing the step of selecting, among the several object regions, an object region with the same object attribute as the target region as the target object region, to be specifically used for: selecting the object regions with the same object label as the tracking target as candidate object regions; and, among the candidate object regions, selecting the candidate object region with the largest degree of overlap with the tracking result of the previous image frame as the target object region.
  • The program 610 may be configured to cause the processor 602, when performing the step of selecting the candidate object region with the largest overlap as the target object region, to be specifically configured to: calculate the intersection area and the union area of the candidate object region and the tracking result; and use the ratio of the intersection area to the union area as the degree of overlap between the candidate object region and the tracking result.
  • The program 610 may be used to cause the processor 602, when performing the step of generating an optimized target area by adjusting the position and size of the target area according to the target object area, to be specifically used for: setting a first weight of the target object area and a second weight of the target area; according to the first weight and the second weight, performing weighted summation on the center point of the target object area and the center point of the target area to obtain the center point of the optimized target area; and according to the first weight and the second weight, performing weighted summation on the sizes of the target object area and the target area to obtain the size of the optimized target area.
  • the processor 602 can obtain the center point of the optimized target area by calculating the following formula:
  • center_x_opt = α*center_x_track + (1-α)*center_x_detect
  • center_y_opt = α*center_y_track + (1-α)*center_y_detect
  • center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame
  • center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame
  • center_x_track is the coordinate of the center point of the target area in the horizontal direction of the image frame, and center_y_track is the coordinate of the center point of the target area in the vertical direction of the image frame
  • center_x_detect is the coordinate of the center point of the target object area in the horizontal direction of the image frame
  • center_y_detect is the coordinate of the center point of the target object area in the vertical direction of the image frame
  • α is the second weight.
  • the processor 602 can also obtain the size of the optimized target area by calculating the following formula:
  • width_opt = α*width_track + (1-α)*width_detect
  • height_opt = α*height_track + (1-α)*height_detect
  • width_opt is the width of the optimized target area
  • width_track is the width of the target area
  • width_detect is the width of the target object area
  • height_opt is the height of the optimized target area
  • height_track is the height of the target area
  • height_detect is the height of the target object area
  • α is the second weight.
  • Each step of the exemplary target tracking method described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the above description has generally described the components and steps of each example in terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution.
  • the computer software can be stored in a computer-readable storage medium, and when the program is executed, it can include the processes of the above-mentioned method embodiments.
  • The storage medium can be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
  • the embodiment of the present invention further provides an unmanned aerial vehicle.
  • The unmanned aerial vehicle comprises: an unmanned aerial vehicle main body, an image acquisition device installed on the gimbal of the unmanned aerial vehicle main body, and an image processing chip.
  • the image acquisition device is used to continuously collect multiple frames of images;
  • the image processing chip is used to receive the multiple frames of images continuously collected by the image acquisition device, and to perform the following steps on the received multiple frames of images:
  • the target area where the tracking target is located is determined in the image frame.
  • several object regions are obtained through a preset deep learning algorithm.
  • an object region with the same object attribute as the target region is selected as the target object region.
  • an optimized target area is generated by adjusting the position and size of the target area.
  • the UAV can track the tracking target.
  • Before the step of determining the target area where the tracking target is located in the image frame, the image processing chip further performs the following steps: in the initial image frame, several selectable object regions are identified through the deep learning algorithm, and each of the object regions is marked with a corresponding object label; then, the object region selected by the user is determined as the tracking target.
  • When the image processing chip performs the step of selecting, among the several object regions, an object region with the same object attribute as the target region as the target object region, the image processing chip is specifically configured to: select the object regions with the same object label as the tracking target as candidate object regions; and, among the candidate object regions, select the candidate object region with the largest degree of overlap with the tracking result of the previous image frame as the target object region.
  • When the image processing chip performs the step of selecting the candidate object region with the largest overlap as the target object region, the image processing chip is specifically configured to: calculate the intersection area and the union area of the candidate object region and the tracking result; and use the ratio of the intersection area to the union area as the degree of overlap between the candidate object region and the tracking result.
  • When the image processing chip performs the step of generating an optimized target area by adjusting the position and size of the target area according to the target object area, the image processing chip is specifically configured to: set a first weight of the target object area and a second weight of the target area; according to the first weight and the second weight, perform weighted summation on the center point of the target object area and the center point of the target area to obtain the center point of the optimized target area; and according to the first weight and the second weight, perform weighted summation on the sizes of the target object area and the target area to obtain the size of the optimized target area.
  • the image processing chip can obtain the center point of the optimized target area by calculating the following formula:
  • center_x_opt = α*center_x_track + (1-α)*center_x_detect
  • center_y_opt = α*center_y_track + (1-α)*center_y_detect
  • center_x_opt is the coordinate of the center point of the optimized target area in the horizontal direction of the image frame
  • center_y_opt is the coordinate of the center point of the optimized target area in the vertical direction of the image frame
  • center_x_track is the coordinate of the center point of the target area in the horizontal direction of the image frame, and center_y_track is the coordinate of the center point of the target area in the vertical direction of the image frame
  • center_x_detect is the coordinate of the center point of the target object area in the horizontal direction of the image frame
  • center_y_detect is the coordinate of the center point of the target object area in the vertical direction of the image frame
  • α is the second weight.
  • the image processing chip can also obtain the size of the optimized target area by calculating the following formula:
  • width_opt = α*width_track + (1-α)*width_detect
  • height_opt = α*height_track + (1-α)*height_detect
  • width_opt is the width of the optimized target area
  • width_track is the width of the target area
  • width_detect is the width of the target object area
  • height_opt is the height of the optimized target area
  • height_track is the height of the target area
  • height_detect is the height of the target object area
  • α is the second weight.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention relate to a target tracking method and apparatus. The embodiments comprise: determining, in an image frame, a target area where a tracking target is located according to an appearance feature of the tracking target and by means of a preset tracking algorithm; obtaining multiple object areas from the image frame by means of a preset deep learning algorithm; selecting, among the multiple object areas, an object area having the same object attribute as the target area, and using it as a target object area; and adjusting the position and size of the target area according to the target object area so as to generate an optimized target area. In the present invention, on the basis of the original tracking algorithm, optimization and adjustment are performed in combination with a deep learning detection result, so that interference in a complex environment can be better adapted to and resisted, thereby effectively improving the overall performance of target tracking.
PCT/CN2021/108893 2020-08-12 2021-07-28 Procédé et appareil de suivi de cible WO2022033306A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010805592.XA CN112037255B (zh) 2020-08-12 2020-08-12 目标跟踪方法和装置
CN202010805592.X 2020-08-12

Publications (1)

Publication Number Publication Date
WO2022033306A1 true WO2022033306A1 (fr) 2022-02-17

Family

ID=73577165

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/108893 WO2022033306A1 (fr) 2020-08-12 2021-07-28 Procédé et appareil de suivi de cible

Country Status (2)

Country Link
CN (1) CN112037255B (fr)
WO (1) WO2022033306A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112037255B (zh) * 2020-08-12 2024-08-02 深圳市道通智能航空技术股份有限公司 目标跟踪方法和装置
CN112560651B (zh) * 2020-12-09 2023-02-03 燕山大学 基于深度网络和目标分割结合的目标跟踪方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409283A (zh) * 2018-10-24 2019-03-01 深圳市锦润防务科技有限公司 一种海面舰船跟踪和监控的方法、系统和存储介质
CN109785385A (zh) * 2019-01-22 2019-05-21 中国科学院自动化研究所 视觉目标跟踪方法及系统
CN109993769A (zh) * 2019-03-07 2019-07-09 安徽创世科技股份有限公司 一种深度学习ssd算法结合kcf算法的多目标跟踪系统
CN111098815A (zh) * 2019-11-11 2020-05-05 武汉市众向科技有限公司 一种基于单目视觉融合毫米波的adas前车碰撞预警方法
CN112037255A (zh) * 2020-08-12 2020-12-04 深圳市道通智能航空技术有限公司 目标跟踪方法和装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2660453A1 (fr) * 2006-08-08 2008-02-14 Kimoto Co., Ltd. Dispositif et procede de tramage
CN107341817B (zh) * 2017-06-16 2019-05-21 哈尔滨工业大学(威海) 基于在线度量学习的自适应视觉跟踪算法
CN109284673B (zh) * 2018-08-07 2022-02-22 北京市商汤科技开发有限公司 对象跟踪方法及装置、电子设备及存储介质
CN109461207A (zh) * 2018-11-05 2019-03-12 胡翰 一种点云数据建筑物单体化方法及装置
CN110189333B (zh) * 2019-05-22 2022-03-15 湖北亿咖通科技有限公司 一种图片语义分割半自动标注方法及装置
CN110222686B (zh) * 2019-05-27 2021-05-07 腾讯科技(深圳)有限公司 物体检测方法、装置、计算机设备和存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409283A (zh) * 2018-10-24 2019-03-01 深圳市锦润防务科技有限公司 一种海面舰船跟踪和监控的方法、系统和存储介质
CN109785385A (zh) * 2019-01-22 2019-05-21 中国科学院自动化研究所 视觉目标跟踪方法及系统
CN109993769A (zh) * 2019-03-07 2019-07-09 安徽创世科技股份有限公司 一种深度学习ssd算法结合kcf算法的多目标跟踪系统
CN111098815A (zh) * 2019-11-11 2020-05-05 武汉市众向科技有限公司 一种基于单目视觉融合毫米波的adas前车碰撞预警方法
CN112037255A (zh) * 2020-08-12 2020-12-04 深圳市道通智能航空技术有限公司 目标跟踪方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LU XIANKAI: "Research on Object Tracking Based on Deep Learning", CHINESE DOCTORAL DISSERTATIONS FULL-TEXT DATABASE, UNIVERSITY OF CHINESE ACADEMY OF SCIENCES, CN, 15 January 2020 (2020-01-15), CN , XP055899976, ISSN: 1674-022X *

Also Published As

Publication number Publication date
CN112037255A (zh) 2020-12-04
CN112037255B (zh) 2024-08-02


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21855359

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21855359

Country of ref document: EP

Kind code of ref document: A1