WO2021047492A1 - Target tracking method, device, and computer system - Google Patents


Publication number
WO2021047492A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
zoom
image sequence
trend
scale
Prior art date
Application number
PCT/CN2020/113895
Other languages
English (en)
Chinese (zh)
Inventor
孙海洋
吕思霖
陈颖
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2021047492A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/223 Analysis of motion using block-matching
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Definitions

  • This application relates to the technical field of target tracking, in particular to target tracking methods, devices and computer systems.
  • Visual object tracking is an important research direction in computer vision, and has a wide range of applications, such as: video surveillance, human-computer interaction (for example, interactive games based on gestures, facial expressions, etc.), unmanned driving, and so on.
  • the visual target (single target) tracking task is to predict the size and position of the target in subsequent frames given the target size and position in the initial frame of a certain image sequence.
  • This basic task process can be divided into the following framework: input the initial target frame, generate many candidate frames in the next frame, extract the features of these candidate frames, then score these candidate frames, and finally take the candidate frame with the highest score as the predicted target, or merge multiple predicted values to obtain a better predicted target.
  • Visual moving target tracking is a very challenging task, because for moving targets, the moving scene is very complex and often changes, or the target itself will also change constantly. So how to identify and track the ever-changing target in a complex scene becomes a challenging task.
  • Scale Variation is a phenomenon in which the scale changes from far to near or from near to far during the movement of a target. This makes predicting the size of the target frame one of the challenges in target tracking. How to predict the scale change coefficient of the target quickly and accurately directly affects the accuracy of tracking.
  • the usual practice in the prior art is: when the motion model generates candidate samples, a large number of candidate frames of different scales are generated, or target tracking is performed on multiple targets of different scales, multiple prediction results are produced, and the best one is selected as the final prediction target.
  • This application provides a target tracking method, device, and computer system, which can reduce the time complexity of target tracking.
  • a target tracking method includes: in the process of target tracking of an image sequence, predicting the scale change trend of the target in the current frame relative to the previous frame; and, when generating zoom search areas for the current frame, reducing the number of zoom search areas corresponding to the opposite trend according to the prediction result.
  • a further target tracking method includes: in the process of target tracking of an image sequence, predicting the scale change trend of the target in the current frame relative to the previous frame;
  • obtaining maximum matching degree information for a plurality of sample groups, where the plurality of sample groups include sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search areas, and the zoom search areas include a plurality of zoom search areas with different zoom directions;
  • and, according to the prediction result, assigning a larger weighting factor to the sample group corresponding to the zoom search area in the zoom direction consistent with the change trend, so as to determine the best matching degree.
  • a target tracking method in a traffic scene includes: obtaining an image sequence collected by a camera mounted on roadside equipment;
  • during target tracking of the image sequence, predicting the scale change trend of the target in the current frame relative to the previous frame according to the vertical motion direction information of the target in the multi-frame images that have been tracked;
  • and, when generating zoom search areas for the current frame, reducing the number of zoom search areas corresponding to the opposite trend according to the prediction result.
  • a further target tracking method in a traffic scene includes: obtaining an image sequence collected by a camera mounted on roadside equipment;
  • during target tracking of the image sequence, predicting the scale change trend of the target in the current frame relative to the previous frame according to the vertical motion direction information of the target in the multi-frame images that have been tracked;
  • obtaining maximum matching degree information for a plurality of sample groups, where the plurality of sample groups include sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search areas, and the zoom search areas include a plurality of zoom search areas with different zoom directions;
  • and, according to the prediction result, assigning a larger weighting factor to the sample group corresponding to the zoom search area in the zoom direction consistent with the change trend, so as to determine the best matching degree.
  • a target tracking method in a human-computer interaction scene includes: obtaining an image sequence collected in real time by a terminal device during the human-computer interaction process;
  • during target tracking of the image sequence, predicting the scale change trend of the target in the current frame relative to the previous frame according to the size change information of the target in the multi-frame images that have been tracked;
  • and, when generating zoom search areas for the current frame, reducing the number of zoom search areas corresponding to the opposite trend according to the prediction result.
  • a further target tracking method in a human-computer interaction scene includes: obtaining an image sequence collected in real time by a terminal device during the human-computer interaction process;
  • during target tracking of the image sequence, predicting the scale change trend of the target in the current frame relative to the previous frame according to the size change information of the target in the multi-frame images that have been tracked;
  • obtaining maximum matching degree information for a plurality of sample groups, where the plurality of sample groups include sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search areas, and the zoom search areas include a plurality of zoom search areas with different zoom directions;
  • and, according to the prediction result, assigning a larger weighting factor to the sample group corresponding to the zoom search area in the zoom direction consistent with the change trend, so as to determine the best matching degree.
  • a target tracking device includes:
  • the scale change trend prediction unit is used to predict the scale change trend of the target in the current frame relative to the previous frame during the target tracking of the image sequence;
  • the zoom search area quantity control unit is configured to reduce the number of zoom search areas corresponding to the opposite trend according to the prediction result when the zoom search area is generated for the current frame.
  • a target tracking device includes:
  • the scale change trend prediction unit is used to predict the scale change trend of the target in the current frame relative to the previous frame during the target tracking of the image sequence;
  • a maximum matching degree obtaining unit in a group is configured to obtain maximum matching degree information for a plurality of sample groups, the plurality of sample groups including sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search areas, and the zoom search areas including multiple zoom search areas with different zoom directions;
  • the weight factor assignment unit is used to assign a larger weight factor to the sample group corresponding to the zoom search area in the zoom direction with the same change trend according to the prediction result, so as to determine the best matching degree.
  • a target tracking device in a traffic scene including:
  • the first image sequence obtaining unit is used to obtain the image sequence collected by the camera equipped with the roadside equipment;
  • the scale change trend prediction unit is used to perform target tracking on the image sequence and, according to the vertical motion direction information of the target in the multi-frame images that have been tracked, predict the scale change trend of the target in the current frame relative to the previous frame;
  • the zoom search area quantity control unit is configured to reduce the number of zoom search areas corresponding to the opposite trend according to the prediction result when the zoom search area is generated for the current frame.
  • a target tracking device in a traffic scene including:
  • the first image sequence obtaining unit is used to obtain the image sequence collected by the camera equipped with the roadside equipment;
  • the scale change trend prediction unit is used to perform target tracking on the image sequence and, according to the vertical motion direction information of the target in the multi-frame images that have been tracked, predict the scale change trend of the target in the current frame relative to the previous frame;
  • a maximum matching degree obtaining unit in a group is configured to obtain maximum matching degree information for a plurality of sample groups, the plurality of sample groups including sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search areas, and the zoom search areas including multiple zoom search areas with different zoom directions;
  • the weight factor assignment unit is used to assign a larger weight factor to the sample group corresponding to the zoom search area in the zoom direction with the same change trend according to the prediction result, so as to determine the best matching degree.
  • a target tracking device in a human-computer interaction scene includes:
  • the second image sequence obtaining unit is used to obtain the image sequence collected in real time by the terminal device during the human-computer interaction process;
  • the scale change trend prediction unit is used to perform target tracking on the image sequence and, according to the size change information of the target in the multi-frame images that have been tracked, predict the scale change trend of the target in the current frame relative to the previous frame;
  • the zoom search area quantity control unit is configured to reduce the number of zoom search areas corresponding to the opposite trend according to the prediction result when the zoom search area is generated for the current frame.
  • a target tracking device in a human-computer interaction scene includes:
  • the second image sequence obtaining unit is used to obtain the image sequence collected in real time by the terminal device during the human-computer interaction process;
  • the scale change trend prediction unit is used to perform target tracking on the image sequence and, according to the size change information of the target in the multi-frame images that have been tracked, predict the scale change trend of the target in the current frame relative to the previous frame;
  • a maximum matching degree obtaining unit in a group is configured to obtain maximum matching degree information for a plurality of sample groups, the plurality of sample groups including sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search areas, and the zoom search areas including multiple zoom search areas with different zoom directions;
  • the weight factor assignment unit is used to assign a larger weight factor to the sample group corresponding to the zoom search area in the zoom direction with the same change trend according to the prediction result, so as to determine the best matching degree.
  • a computer system including:
  • one or more processors;
  • a memory associated with the one or more processors, where the memory is used to store program instructions which, when read and executed by the one or more processors, perform the following operations: in the process of target tracking of an image sequence, predicting the scale change trend of the target in the current frame relative to the previous frame; and, when generating zoom search areas for the current frame, reducing the number of zoom search areas corresponding to the opposite trend according to the prediction result.
  • a computer system including:
  • one or more processors;
  • a memory associated with the one or more processors, where the memory is used to store program instructions which, when read and executed by the one or more processors, perform the following operations: in the process of target tracking of an image sequence, predicting the scale change trend of the target in the current frame relative to the previous frame;
  • obtaining maximum matching degree information for a plurality of sample groups, where the plurality of sample groups include sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search areas, and the zoom search areas include a plurality of zoom search areas with different zoom directions;
  • and, according to the prediction result, assigning a larger weighting factor to the sample group corresponding to the zoom search area in the zoom direction consistent with the change trend, so as to determine the best matching degree.
  • the scale change trend of the target in the current frame relative to the previous frame can be predicted.
  • when generating the zoom search areas for the current frame, the number of zoom search areas corresponding to the opposite trend can be reduced according to the prediction result. In this way, by eliminating scaled ROIs of invalid scales, the number of feature extractions and the number of detections in the target-frame detection step can be reduced, thereby reducing the time complexity of tracking.
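As a rough illustration (hypothetical helper names, not part of the application), dropping scaled-ROI indices that point opposite to the predicted trend might be sketched as follows; negative indices denote smaller scaled ROIs (which cover a growing target) and positive indices bigger scaled ROIs (which cover a shrinking target):

```python
def scaled_roi_indices(k, trend=None, reduce_by=1):
    """Exponent indices i of the scaled ROIs to generate (i != 0).

    k: total number of scaled ROIs when no trend is known (even).
    trend: 'increasing', 'decreasing', or None (no prediction).
    """
    half = k // 2
    smaller = list(range(-half, 0))      # cover target scale increasing
    bigger = list(range(1, half + 1))    # cover target scale decreasing
    if trend == 'increasing':
        # target growing: bigger scaled ROIs are the less credible direction
        bigger = bigger[:half - reduce_by]
    elif trend == 'decreasing':
        # target shrinking: smaller scaled ROIs are the less credible direction
        smaller = smaller[reduce_by:]
    return smaller + bigger
```

With k = 4 and a predicted increasing trend, one bigger scaled ROI is dropped, so one fewer feature extraction and detection pass is needed per frame.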
  • Figure 1 is a schematic diagram of determining the basic search area in the current frame
  • Figure 2 is a schematic diagram of determining the zoom search area in the current frame
  • Figures 3-1 and 3-2 are schematic diagrams of generating basic samples and offset samples based on different search areas
  • Figure 4 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 5 is a flowchart of the first method provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a second method provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of a third method provided by an embodiment of the present application.
  • FIG. 8 is a flowchart of a fourth method provided by an embodiment of the present application.
  • FIG. 9 is a flowchart of a fifth method provided by an embodiment of the present application.
  • FIG. 10 is a flowchart of a sixth method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a first device provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a second device provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a third device provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a fourth device provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a fifth device provided by an embodiment of the present application.
  • FIG. 16 is a schematic diagram of a sixth device provided by an embodiment of the present application.
  • Fig. 17 is a schematic diagram of a computer system provided by an embodiment of the present application.
  • the visual target tracking algorithm can be divided into a generative method and a discriminative method according to whether the observation model is a generative model or a discriminative model.
  • the discriminant methods represented by Correlation Filter and Deep Learning have achieved satisfactory results.
  • Correlation filtering originates from the field of signal processing. Correlation is used to express the degree of similarity between two signals, and convolution is usually used to express correlation operations. Then the basic idea of the tracking method based on correlation filtering is to find a filter template, let the image of the next frame be convolved with the filter template, and the area with the largest response is the target of prediction.
  • the embodiment of the present application is an improvement based on the target tracking algorithm based on correlation filtering.
  • the correlation filter is introduced in the target tracking field, by generating several (region) samples in a certain area of the current frame, and selecting the (region) sample with the greatest correlation with the filtering template from the samples, as the tracking result of the current frame.
  • the specific steps mainly include:
  • determine the search region (search region, a.k.a. ROI) in the current frame
  • the premise of determining the search area in the current frame is that in the previous frame of the current frame, the position and size of the circumscribed rectangular frame of the target (referred to as the "target frame" for short) have been determined.
  • the position and size of the target frame in the previous frame are shown.
  • the target frame of the same size can be drawn in the current frame according to the position of the target frame in the previous frame.
  • the search area can be expanded.
  • a region centered on the target frame position and larger than the target frame size (for example, 2.5 times the target frame size) can be used as the search range,
  • and the region so determined is recorded as the based ROI.
  • assume the size of the based ROI is M×N; K scaled ROIs (K is generally an even number, such as 2) are generated according to the scale factor s (a number greater than 1.0, such as 1.05).
  • the scaled ROI with index i has size (s^i M)×(s^i N), where i is the number of the ROI and takes values in {-K/2, ..., -1, 1, ..., K/2}. When i is less than 0 (with s greater than 1.0), the generated scaled ROI is smaller than the based ROI and is recorded as a smaller scaled ROI.
  • the proportion of the target in the scaled ROI is greater than the proportion of the target in the based ROI.
  • the smaller scaled ROI can cover the situation where the target scale becomes larger.
  • the scaled ROI generated when i is greater than 0 is larger than the based ROI and is recorded as a bigger scaled ROI.
  • bigger scaled ROI can cover the situation where the target scale becomes smaller.
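Under the notation above (based ROI of size M×N, scale factor s, indices i from -K/2 to K/2), the ROI sizes can be sketched in a few lines of Python; the function name is illustrative only:

```python
def generate_rois(m, n, k=4, s=1.05):
    """Sizes of the based ROI (i == 0) and K scaled ROIs.

    The ROI with index i has size (s**i * M) x (s**i * N):
    i < 0 yields a smaller scaled ROI, i > 0 a bigger scaled ROI.
    """
    half = k // 2
    return {i: (m * s ** i, n * s ** i) for i in range((-half), half + 1)}
```

For K = 2 this yields one smaller scaled ROI, the based ROI, and one bigger scaled ROI, matching the example values in the text.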
  • the characteristics that characterize the current ROI can then be obtained. For example, keeping the aspect ratio unchanged, the ROI is scaled to the specified template size to obtain the template image (the template can be generated according to the characteristics of the target marked in the first frame, etc.; in the subsequent tracking process, the template can also be updated).
  • the template image can be obtained by down-sampling the ROI image.
  • the target position of the previous frame can be used to generate a based ROI and several scaled ROIs.
  • the detection result of the ROI (including the maximum matching degree y_max and the corresponding sample x_max) can be obtained according to the following steps:
  • a detection base sample can be obtained after a certain feature extraction.
  • the based ROI and each scaled ROI generated in step 1 can each be used as a basic sample.
  • if the based ROI is used as the basic sample, each offset sample can be as shown in Figure 3-1; if one of the bigger scaled ROIs shown in Figure 2 is used as the basic sample, each offset sample can be as shown in Figure 3-2.
  • if a total of one based ROI and four scaled ROIs as shown in Figure 2 are generated, a total of five sample groups can be obtained; assuming that each group includes one basic sample and four offset samples, a total of 25 samples can be obtained.
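The cyclic shifts used to produce offset samples can be sketched in pure Python (a toy stand-in for the dense circulant sampling used by correlation-filter trackers; the names and the row-wise-only shift are simplifying assumptions):

```python
def cyclic_shift_rows(image, dy):
    """Cyclically shift a 2-D image (a list of rows) down by dy rows."""
    dy %= len(image)
    if dy == 0:
        return list(image)
    return image[-dy:] + image[:-dy]

def sample_group(base, shifts=(1, -1)):
    """One basic sample plus offset samples from row-wise cyclic shifts."""
    return [base] + [cyclic_shift_rows(base, d) for d in shifts]
```

With one basic sample and four shifts per ROI, five ROIs give the 25 samples mentioned above.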
  • the existing classifier can be used to classify (match) all the samples, obtaining the maximum matching value (y_max,i) and the corresponding sample (x_max,i).
  • the maximum matching degree and corresponding sample of each sample group can thus be obtained, namely (y_max,i, x_max,i) for i = -K/2, ..., K/2, where K is the number of scaled ROIs (generally an even number) and i = 0 indicates the detection result of the basic sample.
  • the best matching degree (that is, y_best) and the corresponding sample (that is, x_best) among the y_max of all ROIs can be selected according to certain rules, and the corresponding ROI is recorded as the best ROI.
  • specifically, the maximum matching degree y_max in each sample group can be determined first, and then the best matching degree y_best is selected from the maximum matching degrees obtained from the sample groups.
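The two-stage selection (per-group maximum, then best across groups) might look like the following sketch, where group_maxima maps each ROI index to its (y_max, x_max) pair (a hypothetical data layout, not from the application):

```python
def best_match(group_maxima):
    """Pick the best matching degree across sample groups.

    group_maxima: {roi_index: (y_max, x_max)}, where y_max is the maximum
    matching degree within that group and x_max the corresponding sample.
    Returns the index of the best ROI and its (y_best, x_best) pair.
    """
    best_i = max(group_maxima, key=lambda i: group_maxima[i][0])
    return best_i, group_maxima[best_i]
```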
  • the position and scale of the current frame target can be obtained according to the following method:
  • the offset of the target relative to the center point of the best ROI in the current frame is calculated from the offset of the sample x_best relative to the basic sample obtained based on the best ROI, and this position is recorded as the target tracking position.
  • the change of the target scale of the current frame relative to the target scale of the previous frame is obtained.
  • the size of the based ROI is M ⁇ N
  • the size of the best ROI is (s^k M)×(s^k N), where s is the scale factor used when the scaled ROIs are generated and k is the serial number of the best ROI among the scaled ROIs; k = 0 means that the based ROI is the best ROI.
  • the size of the target in the previous frame is m ⁇ n
  • the size of the target in the current frame is (m/s^k)×(n/s^k).
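The size update follows directly from this notation. A minimal sketch, assuming the best ROI has index k and scale factor s (since every ROI is rescaled to the same template size, a bigger best ROI implies a proportionally smaller target):

```python
def update_target_size(m, n, s, k):
    """Target size in the current frame when the best ROI has index k.

    The previous-frame target size is m x n; the best ROI has size
    (s**k * M) x (s**k * N), so the target size scales inversely.
    """
    return m / s ** k, n / s ** k
```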
  • the tracking algorithm in the prior art has at least the following problems:
  • when selecting the best matching degree (i.e., y_best), the sample groups corresponding to the smaller scaled ROIs and the bigger scaled ROIs are assigned the same weighting factor.
  • in practice, however, the distance between the target and the camera increases or decreases in a definite direction; that is, the scale change trend of the target from the previous frame to the current frame is actually determinate. Therefore, the credibility of one of the two directions, smaller scaled ROI or bigger scaled ROI, must be higher.
  • since it is not known in advance which direction of scaled ROI is more credible, the prior art can only assign the same weighting factor to the sample groups generated from the smaller scaled ROIs and the bigger scaled ROIs. However, this may cause the wrong y_best to be selected, which may result in insufficient target tracking accuracy.
  • in the embodiments of this application, when it is judged that the target scale becomes larger, the number of bigger scaled ROIs among the scaled ROIs is reduced, and when it is judged that the target scale becomes smaller, the number of smaller scaled ROIs is reduced. In this way, the tracking time complexity caused by ROIs of invalid scales can be reduced.
  • in addition, a larger weighting factor can be assigned to the sample group corresponding to the scaled ROI whose zoom direction is consistent with the change trend, which is conducive to improving the accuracy of target tracking.
  • the solutions provided in the embodiments of the present application can be used in a variety of specific application scenarios.
  • an image sequence is collected by a camera device installed on a roadside device, and then a target in the image sequence needs to be tracked.
  • the solution provided in the embodiments of this application can be used to reduce the time complexity of the algorithm and to improve tracking accuracy.
  • a specific target tracking program/module can be provided, and the method provided in the embodiment of the present application can be implemented in the program/module.
  • the program/module can run on a cloud server; the specific camera device 401 can be installed on the roadside device 402, collecting images of vehicles and other targets 403 on the road and of the surrounding environment, and the collected image sequence can be uploaded to the cloud server, where target tracking is performed on the specific image sequence.
  • the target tracking solution provided by the embodiments of the present application can also be applied to other application scenarios, including human-computer interaction, unmanned driving, and so on.
  • the first embodiment provides a target tracking method from the perspective of a server-side target tracking program/module.
  • the method may specifically include:
  • S501: In the process of target tracking of the image sequence, predict the scale change trend of the target in the current frame relative to the previous frame;
  • the "target” refers to the object to be specifically tracked.
  • the circumscribed rectangular frame of the target that is, the target frame
  • the specific target tracking process is to determine the position and size of the target in each subsequent image frame and to mark the corresponding target frame in each of those frames.
  • the change trend of the scale of the target in the current frame relative to the previous frame can be predicted.
  • the so-called scale of the target in the current frame refers to how many pixels the target occupies in the current frame; because the total size of every image frame is the same, the more pixels a target occupies in a given frame, the larger its scale.
  • if the number of pixels occupied by the same target differs between the previous frame and the current frame, for example if it increases, the scale of the target in the current frame has an increasing trend relative to the previous frame, and vice versa a decreasing trend.
  • a certain method may be used to predict the change trend of the scale of the target in the current frame relative to the previous frame.
  • the scale change trend of the same target between different frames is usually continuous; that is, if the scale of the target gradually increased over the previous frames, it usually continues to increase in the current frame relative to the previous frame, and so on. Therefore, prediction can be achieved by using the historical tracking results as a priori information.
  • the change trend of the scale can be predicted according to the change of the position and/or size of the target in the multiple frames of images that have been tracked.
  • the scale change trend can be predicted according to the position change of the target in the multi-frame images that have been tracked. If the target moves toward the bottom of the image, it can be determined that the scale of the target has an increasing trend; if the target moves toward the top of the image, it can be determined that the scale of the target has a decreasing trend.
  • the scale change trend can be predicted according to the size change of the target in the multi-frame images that have been tracked.
  • when the target gradually becomes larger in the images that have been tracked, the scale of the target has an increasing trend; if the target gradually becomes smaller in the images that have been tracked, the scale of the target has a decreasing trend, and so on.
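Both cues (vertical position and box size) amount to checking the sign of a change over the tracked history. A minimal sketch, with hypothetical field names ('bottom' for the vertical position of the target box, larger meaning lower in the image, and 'area' for the box size):

```python
def predict_trend(history, key='bottom'):
    """Predict the scale-change trend from already-tracked frames.

    history: per-frame dicts holding the chosen cue under `key`.
    Moving down / growing -> 'increasing'; moving up / shrinking
    -> 'decreasing'; None when no prediction is possible.
    """
    if len(history) < 2:
        return None
    delta = history[-1][key] - history[0][key]
    if delta > 0:
        return 'increasing'
    if delta < 0:
        return 'decreasing'
    return None
```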
  • the image sequence includes: an image sequence captured by a camera installed on roadside equipment in the traffic scene. In this case, from when the target enters the camera's field of view until it leaves, the target's direction of motion in the vertical direction of the image does not change. Therefore, the movement direction of the target in the vertical direction of the image can be determined according to the position of the target in the multi-frame images that have been tracked; then, according to that vertical movement direction, it can be determined whether the target is entering or leaving the camera's field of view, and the scale change trend of the target can be determined accordingly. For example, if the target is entering the camera's field of view, its scale will gradually increase; if it is leaving the camera's field of view, its scale will gradually decrease, and so on.
  • the change trend of the scale can also be predicted based on the profile information of the camera.
  • the camera may have some specific configurations. For example, in a traffic scene, a camera at a certain angle may only capture targets in the oncoming-vehicle direction, while cameras at other angles may only capture targets in the departing-vehicle direction, and so on. This information is written in the camera's configuration file, so the scale change trend can be predicted based on whether the configuration indicates the oncoming or the departing direction.
  • if the camera only captures the oncoming direction, the scale of the target in the current frame is usually larger than in the previous frame; if it only captures the departing direction, the scale of the target in the current frame is usually smaller than in the previous frame, and so on.
  • the scale change trend of the target in the current frame relative to the previous frame can be predicted in a variety of ways.
  • the number of zoom search areas corresponding to the opposite trend can be reduced according to the prediction result. That is, when the search area is determined in step 1 of the aforementioned algorithm flow introduction, the number of larger scaled ROI and smaller scaled ROI is no longer the same number, but the number of ROIs in one direction is reduced.
  • for example, when it is judged that the target scale becomes larger, the number of bigger scaled ROIs among the scaled ROIs is reduced.
  • the size of the based ROI is M ⁇ N
  • each ROI needs to be cyclically shifted to generate sample groups, which are then matched against the template features. Therefore, by reducing the scaled ROIs of invalid scales, the number of feature extractions and the number of detections in the target-frame detection step can be reduced, further reducing the time complexity of tracking.
  • the specific target tracking algorithm can have many different application scenarios, for example, it can include traffic scenarios, monitoring scenarios, human-computer interaction scenarios, and so on.
  • the requirements for security in different application scenarios may be different. For example, in traffic scenarios and surveillance scenarios, the requirements for security are relatively high, while in human-computer interaction scenarios, it is usually for entertainment and other purposes. The security requirements are relatively low.
  • the accuracy of the target tracking algorithm can also be adjusted according to the different security requirements of different scenarios in which the target tracking is located. For example, in scenarios with higher security requirements, the accuracy of the algorithm can be improved to improve the accuracy of target tracking, and so on.
  • the second embodiment also provides a target tracking method from the perspective of the target tracking program or module in the server.
  • the method may specifically include:
  • S601: In the process of target tracking of the image sequence, predict the scale change trend of the target in the current frame relative to the previous frame;
  • step S601 can be the same as step S501 in the first embodiment, therefore, it can be executed by reference, and will not be repeated here.
  • S602: Obtain maximum matching degree information for multiple sample groups respectively, where the multiple sample groups include sample groups obtained by performing cyclic shifts respectively according to multiple zoom search areas, and the zoom search areas include multiple zoom search areas with different zoom directions;
  • specifically, a sample group can be generated for each search area, each group including a basic sample and multiple offset samples; then, feature extraction is performed on each sample in each group, and the feature matching degree between each sample feature and the template feature is calculated. Next, within each sample group, the matching degrees of the samples are compared to obtain the maximum matching degree within the group.
  • the respective maximum matching degrees can be compared between different sample groups in the subsequent to obtain the best matching degrees.
  • the offset of the target relative to the center point of the best ROI in the current frame is calculated from the offset of the sample corresponding to the best matching degree relative to the basic sample obtained based on the best ROI, and this position can be recorded as the target tracking position.
  • after the scale change trend of the target in the current frame relative to the previous frame has been predicted, a more suitable weighting factor can be assigned to each sample group on the basis of this prior information.
  • the sample group corresponding to the zoom search area in the zoom direction that is the same as the change trend will have a higher degree of confidence, and therefore, a larger weighting factor can be assigned.
• the sample group corresponding to the zoom search area in the zoom direction opposite to the change trend will have a lower confidence, and therefore, a smaller weight factor can be assigned. That is, the larger weighting factor mentioned in the embodiment of the present application refers to a weighting factor greater than the weighting factor assigned to the sample group corresponding to the zoom search area in the zoom direction opposite to the change trend.
• the weighting factor assigned to the maximum matching degree of the sample group corresponding to the bigger-scaled ROI is a;
• the weighting factor assigned to the maximum matching degree of the sample group corresponding to the smaller-scaled ROI is b.
• the weight factor corresponding to the base ROI (denoted as the base factor) should be greater than the weight factor corresponding to the bigger-scaled ROI (denoted as the bigger factor) and the weight factor corresponding to the smaller-scaled ROI (denoted as the smaller factor).
• the value of the base factor is 1.0;
• the values of the bigger factor and the smaller factor are both less than 1.0; on this basis, the difference between the bigger factor and the smaller factor reflects the predicted trend.
  • the maximum matching degree corresponding to the specific sample group can be adjusted by the weighting factor, and then the maximum matching degree is compared between multiple sample groups to obtain the best matching degree .
  • the sample corresponding to the best matching degree can be used to determine the position and size of the specific target in the current frame.
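The trend-aware weighting described above can be sketched as follows. The concrete factor values (1.0 for the base group, and illustrative 0.95 / 0.90 for the favored and disfavored zoom directions) are assumptions, not values fixed by the application:

```python
def best_match_across_groups(group_scores, trend):
    """Pick the overall best match after weighting each sample group's
    maximum matching degree by a trend-dependent factor.

    group_scores: dict mapping a zoom direction ('base', 'bigger',
    'smaller') to that group's maximum matching degree.
    trend: predicted scale-change trend ('bigger', 'smaller', or None).
    """
    if trend == 'bigger':
        factors = {'base': 1.0, 'bigger': 0.95, 'smaller': 0.90}
    elif trend == 'smaller':
        factors = {'base': 1.0, 'bigger': 0.90, 'smaller': 0.95}
    else:  # no clear trend: treat both zoom directions equally
        factors = {'base': 1.0, 'bigger': 0.95, 'smaller': 0.95}
    weighted = {d: s * factors[d] for d, s in group_scores.items()}
    # best matching degree decides both position and scale of the target
    return max(weighted, key=weighted.get), weighted
```

Note how the same raw scores can yield different scale decisions depending on the predicted trend, which is exactly the prior-information effect the embodiment describes.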
  • the third embodiment combines the specific technical solution provided in the first embodiment with the traffic scene, and provides a target tracking method in the traffic scene.
  • the method may specifically include:
• S701 Obtain an image sequence collected by a camera installed on a roadside device
  • Roadside equipment can be pre-deployed on one or both sides of the road.
• Cameras can be installed on the roadside equipment to collect image sequences of the road and surrounding environment. The collected results can be submitted to the cloud server, which performs target tracking on the image sequence.
• S702 In the process of tracking the target in the image sequence, predict the scale change trend of the target in the current frame relative to the previous frame according to the movement direction information of the target in the vertical direction in the multiple frames of images that have been tracked;
• the movement direction information of the target in the vertical direction in the multi-frame images that have been tracked can be used directly to predict the scale change trend of the target in the current frame relative to the previous frame. For example, if the target moves downward in the vertical direction of the image, the target scale tends to become larger; conversely, if the target moves upward in the vertical direction of the image, the target scale tends to become smaller.
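A minimal sketch of this prediction rule, assuming image y-coordinates grow downward (the usual image convention) and using the net vertical displacement of the tracked center points:

```python
def predict_scale_trend(center_ys):
    """Predict the scale-change trend of a target from its vertical image
    positions over the frames tracked so far (image y grows downward).
    Returns 'bigger' if the target has been moving down, 'smaller' if it
    has been moving up, and None when there is no clear vertical motion."""
    if len(center_ys) < 2:
        return None
    dy = center_ys[-1] - center_ys[0]   # net vertical displacement
    if dy > 0:
        return 'bigger'    # moving down: approaching the roadside camera
    if dy < 0:
        return 'smaller'   # moving up: receding from the camera
    return None
```

A real implementation would likely smooth the trajectory over several frames rather than comparing only the endpoints; this endpoint comparison is a simplification.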
• real-time target tracking is usually performed on an image sequence collected in real time, so the requirements on hardware and computing resources are usually relatively high. In practice, the complexity of target tracking differs between situations, so allocating the same computing resources to every situation may waste computing resources to some degree. For example, when traffic is relatively congested, there are many cars on the road, the number of targets that need to be tracked may be larger, and because the image is more cluttered, the tracking complexity will be higher; when traffic is smooth, there are fewer cars on the road, and the number of targets to be tracked and the complexity are correspondingly reduced. In addition, for some road sections, the road conditions may change over time.
  • the computing resources allocated to the target tracking algorithm can also be dynamically adjusted during the target tracking process. For example, in one manner, computing resources for target tracking can be dynamically allocated according to the number of targets to be tracked in the image sequence. Or, in another manner, it is also possible to dynamically allocate computing resources for target tracking according to whether the acquisition time of the image sequence belongs to the peak traffic flow period, and so on.
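One possible allocation policy combining both signals — target count and peak traffic period — might look like the sketch below. All thresholds, thread counts, and peak hours are illustrative assumptions, not values specified by the application:

```python
def tracker_thread_budget(num_targets, capture_hour,
                          peak_hours=range(7, 10), max_threads=8):
    """Toy policy for dynamically sizing the tracker's compute budget:
    more worker threads when more targets are tracked, or when the
    capture time falls in an (assumed) peak traffic period."""
    threads = 1 + num_targets // 5          # scale with target count
    if capture_hour in peak_hours:
        threads += 2                        # reserve headroom at rush hour
    return min(threads, max_threads)        # cap at the hardware budget
```

In a deployment, the scheduler would re-evaluate such a budget periodically as the detected target count and time of day change.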
  • the fourth embodiment combines the specific technical solution provided in the second embodiment with the traffic scene, and provides a target tracking method in the traffic scene.
  • the method may specifically include:
• S802 In the process of tracking the target in the image sequence, predict the scale change trend of the target in the current frame relative to the previous frame according to the movement direction information of the target in the vertical direction in the multiple frames of images that have been tracked;
• S803 Obtain maximum matching degree information for a plurality of sample groups respectively, the plurality of sample groups including sample groups obtained by performing cyclic shifts according to a plurality of zoom search areas, and the zoom search areas include multiple zoom search areas with different zoom directions;
  • the fifth embodiment combines the specific technical solution provided in the first embodiment with the human-computer interaction scenario to provide a target tracking method in the human-computer interaction scenario.
  • the method may specifically include:
  • S901 Obtain the image sequence collected by the terminal device in real time during the human-computer interaction process
• the specific human-computer interaction may include interaction performed by collecting and tracking a specific target image, the specific target including a gesture image or a face image, and so on — for example, tracking gestures of certain shapes to drive interactive content. This kind of target tracking is therefore based on an image sequence acquired in real time.
• the specific target tracking program or module can run directly on the client of the terminal device, or, of course, on the server side; in the latter case, real-time data transmission can be carried out between the client and the server.
• because the camera usually shoots the target from a parallel perspective when collecting gesture or face images during human-computer interaction, the size change information of the target in the multi-frame images that have been tracked can be used to predict the target's scale change trend in the current frame relative to the previous frame.
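This size-based prediction can be sketched as follows. The 5% hysteresis ratio is an assumed value used to suppress frame-to-frame jitter, not a value from the application:

```python
def predict_scale_trend_from_size(areas, ratio=1.05):
    """In a human-computer interaction scene the camera faces the target,
    so the scale trend is read from the size change itself: if the tracked
    target region has grown over the tracked frames, predict 'bigger';
    if it has shrunk, predict 'smaller'; otherwise no clear trend."""
    if len(areas) < 2:
        return None
    if areas[-1] > areas[0] * ratio:       # grew by more than 5%
        return 'bigger'
    if areas[-1] < areas[0] / ratio:       # shrank by more than 5%
        return 'smaller'
    return None                            # within the hysteresis band
```

The hysteresis band keeps small bounding-box fluctuations from flipping the prediction, which would otherwise defeat the point of using it as prior information.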
• the sixth embodiment combines the specific technical solution provided in the second embodiment with the human-computer interaction scenario to provide a target tracking method in the human-computer interaction scenario.
  • the method may specifically include:
• S1003 Obtain maximum matching degree information for a plurality of sample groups respectively, the plurality of sample groups including sample groups obtained by performing cyclic shifts according to a plurality of zoom search regions, and the zoom search regions include multiple zoom search areas with different zoom directions;
  • this embodiment of the present application also provides a target tracking device.
  • the device may include:
  • the scale change trend prediction unit 1101 is used to predict the scale change trend of the target in the current frame relative to the previous frame during the target tracking process of the image sequence;
  • the zoom search area quantity control unit 1102 is configured to reduce the number of zoom search areas corresponding to the opposite trend according to the prediction result when the zoom search area is generated for the current frame.
• the scale change trend prediction unit may be specifically used for:
• if the target moves toward the bottom of the image, the scale of the target tends to become larger; if the target moves toward the top of the image, the scale of the target tends to become smaller.
  • the image sequence includes: an image sequence captured by a camera installed on a roadside device in a traffic scene;
  • the scale change trend prediction unit may be specifically used for:
  • the scale change trend of the target is determined according to the situation of entering or leaving the camera field of view.
• the scale change trend prediction unit may also be used for:
  • the change trend of the scale is predicted.
• the change trend of the scale is predicted according to the configuration information, in the camera's configuration file, indicating that the camera only captures the direction of coming or going vehicles.
  • the zoom search area quantity control unit may be specifically used for:
• the zoom search area quantity control unit may also be used for:
  • the device may also include:
  • the accuracy adjustment unit is used to adjust the accuracy of the target tracking algorithm according to the different security requirements of the different scenes where the target tracking is located.
  • the embodiment of the present application also provides a target tracking device.
  • the device may include:
  • the scale change trend prediction unit 1201 is used to predict the scale change trend of the target in the current frame relative to the previous frame in the process of target tracking of the image sequence;
• the maximum matching degree within a group obtaining unit 1202 is configured to obtain maximum matching degree information for a plurality of sample groups, the plurality of sample groups including sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search regions, the zoom search regions including multiple zoom search areas with different zoom directions;
  • the weight factor assigning unit 1203 is configured to assign a larger weight factor to the sample group corresponding to the zoom search area in the zoom direction with the same change trend according to the prediction result, so as to determine the best matching degree.
  • the larger weighting factor is greater than the weighting factor assigned to the sample group corresponding to the zoomed search area in the zooming direction opposite to the changing trend.
  • the weighting factors assigned to the sample group corresponding to the zoomed search area are all smaller than the weighting factors assigned to the sample group corresponding to the basic search area.
  • the weight factor allocation unit may also be used for:
  • the embodiment of the present application also provides a target tracking device in a traffic scene.
  • the device may include:
  • the first image sequence obtaining unit 1301 is configured to obtain the image sequence collected by the camera equipped with the roadside equipment;
• the scale change trend prediction unit 1302 is used to predict, in the process of tracking the target in the image sequence, the scale change trend of the target in the current frame relative to the previous frame according to the vertical movement direction information of the target in the multi-frame images that have been tracked;
  • the zoom search area quantity control unit 1303 is configured to reduce the number of zoom search areas corresponding to the opposite trend according to the prediction result when the zoom search area is generated for the current frame.
  • the device may also include:
  • the first computing resource allocation unit is configured to dynamically allocate computing resources for target tracking according to the number of targets to be tracked in the image sequence.
  • the first computing resource allocation unit is configured to dynamically allocate computing resources for target tracking according to whether the acquisition time of the image sequence belongs to the peak traffic flow period.
  • the embodiment of the present application also provides a target tracking device in a traffic scene.
  • the device may include:
  • the first image sequence obtaining unit 1401 is configured to obtain the image sequence collected by the camera equipped with the roadside equipment;
• the scale change trend prediction unit 1402 is configured to predict, in the process of target tracking of the image sequence, the scale change trend of the target in the current frame relative to the previous frame according to the vertical movement direction information of the target in the multi-frame images that have been tracked;
• the maximum matching degree within a group obtaining unit 1403 is configured to obtain maximum matching degree information for a plurality of sample groups, the plurality of sample groups including sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search regions, the zoom search regions including multiple zoom search areas with different zoom directions;
  • the weight factor assignment unit 1404 is configured to assign a larger weight factor to the sample group corresponding to the zoom search area in the zoom direction with the same change trend according to the prediction result, so as to determine the best matching degree.
  • this embodiment of the present application also provides a target tracking device in a human-computer interaction scenario.
  • the device may include:
  • the second image sequence obtaining unit 1501 is configured to obtain the image sequence collected by the terminal device in real time during the human-computer interaction process
• the scale change trend prediction unit 1502 is configured to predict, in the process of tracking the target in the image sequence, the scale change trend of the target in the current frame relative to the previous frame according to the size change information of the target in the multi-frame images that have been tracked;
  • the zoom search area quantity control unit 1503 is configured to reduce the number of zoom search areas corresponding to the opposite trend according to the prediction result when the zoom search area is generated for the current frame.
  • the human-computer interaction includes interaction performed by collecting and tracking a specific target image, and the specific target includes a gesture image or a face image.
  • the embodiment of the present application also provides a target tracking device in a human-computer interaction scenario.
  • the device may include:
  • the second image sequence obtaining unit 1601 is configured to obtain the image sequence collected by the terminal device in real time during the human-computer interaction process
• the scale change trend prediction unit 1602 is used to predict, in the process of target tracking of the image sequence, the scale change trend of the target in the current frame relative to the previous frame according to the size change information of the target in the multi-frame images that have been tracked;
• the maximum matching degree within a group obtaining unit 1603 is configured to obtain maximum matching degree information for a plurality of sample groups, the plurality of sample groups including sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search regions, the zoom search regions including multiple zoom search areas with different zoom directions;
  • the weight factor assignment unit 1604 is configured to assign a larger weight factor to the sample group corresponding to the zoom search area in the zoom direction with the same change trend according to the prediction result, so as to determine the best matching degree.
  • an embodiment of the present application also provides a computer system, including:
• one or more processors;
  • a memory associated with the one or more processors where the memory is used to store program instructions, and when the program instructions are read and executed by the one or more processors, perform the following operations:
• one or more processors;
  • a memory associated with the one or more processors where the memory is used to store program instructions, and when the program instructions are read and executed by the one or more processors, perform the following operations:
• the plurality of sample groups include sample groups obtained by performing cyclic shifts respectively according to a plurality of zoom search areas, and the zoom search areas include a plurality of zoom search areas with different zoom directions;
  • a larger weighting factor is assigned to the sample group corresponding to the zoom search area in the zoom direction with the same change trend, so as to determine the best matching degree.
  • FIG. 17 exemplarily shows the architecture of a computer system, which may specifically include a processor 1710, a video display adapter 1711, a disk drive 1712, an input/output interface 1713, a network interface 1714, and a memory 1720.
  • the processor 1710, the video display adapter 1711, the disk drive 1712, the input/output interface 1713, the network interface 1714, and the memory 1720 may be communicatively connected through the communication bus 1730.
  • the processor 1710 may be implemented by a general CPU (Central Processing Unit, central processing unit), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc., for Perform relevant procedures to realize the technical solutions provided in this application.
  • the memory 1720 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory, random access memory), static storage device, dynamic storage device, etc.
  • the memory 1720 may store an operating system 1721 for controlling the operation of the electronic device 1700, and a basic input output system (BIOS) 1722 for controlling the low-level operation of the electronic device 1700.
  • a web browser 1723, a data storage management system 1724, and a target tracking processing system 1725 can also be stored.
  • the above-mentioned target tracking processing system 1725 may be an application program that specifically implements the operations of the foregoing steps in the embodiment of the present application. In short, when the technical solution provided by the present application is implemented through software or firmware, the related program code is stored in the memory 1720 and is called and executed by the processor 1710.
  • the input/output interface 1713 is used to connect input/output modules to realize information input and output.
• the input/output module can be configured in the device as a component (not shown in the figure), or it can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and an output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the network interface 1714 is used to connect a communication module (not shown in the figure) to realize the communication interaction between the device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.), or through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • the bus 1730 includes a path to transmit information between various components of the device (for example, the processor 1710, the video display adapter 1711, the disk drive 1712, the input/output interface 1713, the network interface 1714, and the memory 1720).
  • the electronic device 1700 can also obtain information about specific receiving conditions from the virtual resource object receiving condition information database 1741 for condition determination, and so on.
• although the above device only shows the processor 1710, the video display adapter 1711, the disk drive 1712, the input/output interface 1713, the network interface 1714, the memory 1720, the bus 1730, etc., in the specific implementation process the device may also include other components necessary for normal operation.
  • the above-mentioned device may also include only the components necessary to implement the solution of the present application, and not necessarily include all the components shown in the figure.

Abstract

Provided are a target tracking method, a device, and a computer system. The method comprises: during target tracking of an image sequence, predicting the scale change trend of a target in the current frame relative to a previous frame (S501); and when zoom search areas are generated for the current frame, reducing, according to the prediction result, the number of zoom search areas corresponding to the opposite trend (S502). The method reduces the time complexity of target tracking.
PCT/CN2020/113895 2019-09-12 2020-09-08 Procédé de suivi de cible, dispositif, et système informatique WO2021047492A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910865797.4A CN112489077A (zh) 2019-09-12 2019-09-12 目标跟踪方法、装置及计算机系统
CN201910865797.4 2019-09-12

Publications (1)

Publication Number Publication Date
WO2021047492A1 true WO2021047492A1 (fr) 2021-03-18

Family

ID=74866853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/113895 WO2021047492A1 (fr) 2019-09-12 2020-09-08 Procédé de suivi de cible, dispositif, et système informatique

Country Status (2)

Country Link
CN (1) CN112489077A (fr)
WO (1) WO2021047492A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642413A (zh) * 2021-07-16 2021-11-12 新线科技有限公司 控制方法、装置、设备及介质
CN114663462A (zh) * 2022-04-07 2022-06-24 北京远度互联科技有限公司 目标跟踪方法、装置、电子设备及存储介质
CN115690767B (zh) * 2022-10-26 2023-08-22 北京远度互联科技有限公司 车牌识别方法、装置、无人机及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117487A (zh) * 2011-02-25 2011-07-06 南京大学 一种针对视频运动目标的尺度方向自适应Mean-shift跟踪方法
CN103927508A (zh) * 2013-01-11 2014-07-16 浙江大华技术股份有限公司 一种目标车辆跟踪方法及装置
US20180268559A1 (en) * 2017-03-16 2018-09-20 Electronics And Telecommunications Research Institute Method for tracking object in video in real time in consideration of both color and shape and apparatus therefor
CN109087333A (zh) * 2018-06-14 2018-12-25 中国科学院福建物质结构研究所 基于相关性滤波跟踪算法的目标尺度估计方法及其装置

Also Published As

Publication number Publication date
CN112489077A (zh) 2021-03-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20862537

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20862537

Country of ref document: EP

Kind code of ref document: A1