WO2022222227A1 - Target detection network-based target tracking method and apparatus, device, and medium


Info

Publication number
WO2022222227A1
Authority
WO
WIPO (PCT)
Prior art keywords
detected
image
position information
layer
target
Application number
PCT/CN2021/096757
Other languages
French (fr)
Chinese (zh)
Inventor
赵娅琳
陆进
刘玉宇
肖京
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022222227A1 publication Critical patent/WO2022222227A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters

Definitions

  • the present application relates to the technical field of detection models, and in particular, to a target tracking method, apparatus, device and medium based on a target detection network.
  • Multi-target tracking technology is used in many application fields, such as motion correction, security monitoring, unmanned driving, etc.
  • In a security monitoring system, it is a common task to accurately locate and track the target.
  • Embodiments of the present application provide a target tracking method, apparatus, device, and medium based on a target detection network, so as to solve the problem of low accuracy of multi-target tracking.
  • A target tracking method based on a target detection network, comprising:
  • predicting, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determining a first ROI region corresponding to the first predicted position information;
  • performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
  • determining, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
  • A target tracking apparatus based on a target detection network, comprising:
  • a first position information acquisition module, configured to acquire first position information of a target object in a first image to be detected, and second position information of the target object in a second image to be detected, where the second image to be detected refers to an image in a video to be detected that is temporally adjacent to and located after the first image to be detected;
  • a first position information prediction module, configured to predict, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determine a first ROI region corresponding to the first predicted position information;
  • a first ROI region extraction module, configured to perform ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
  • a first position coincidence degree determination module, configured to determine a first minimum cosine distance between the first ROI region and the second ROI region, and simultaneously determine a first position coincidence degree between the second position information and the first predicted position information;
  • a first tracking matching module, configured to determine, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
  • a computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
  • predicting, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determining a first ROI region corresponding to the first predicted position information;
  • performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
  • determining, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
  • One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • predicting, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determining a first ROI region corresponding to the first predicted position information;
  • performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
  • determining, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
  • The above target tracking method, apparatus, device, and medium based on a target detection network acquire first position information of a target object in a first image to be detected and second position information of the target object in a second image to be detected, where the second image to be detected refers to an image in the video to be detected that is temporally adjacent to and located after the first image to be detected; predict, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determine a first ROI region corresponding to the first predicted position information; perform ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region; determine a first minimum cosine distance between the first ROI region and the second ROI region and a first position coincidence degree between the second position information and the first predicted position information; and determine, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
  • In this way, the present application can better use the shallow features in the target detection network as the appearance features for target tracking, and combine these appearance features with the Kalman filter model so that the position information of the target object in the next frame to be detected can be predicted. The calculation is fast, and the accuracy of target tracking is improved.
  • FIG. 1 is a schematic diagram of an application environment of a target tracking method based on a target detection network in an embodiment of the present application
  • FIG. 2 is a flowchart of a target tracking method based on a target detection network in an embodiment of the present application
  • FIG. 3 is a flowchart of a target tracking method based on a target detection network in an embodiment of the present application
  • FIG. 4 is a flowchart of a target tracking method based on a target detection network in an embodiment of the present application
  • FIG. 5 is a schematic block diagram of a target tracking device based on a target detection network according to an embodiment of the present application
  • FIG. 6 is a schematic block diagram of a first location information acquisition module in a target tracking device based on a target detection network according to an embodiment of the present application
  • FIG. 7 is a schematic block diagram of a target detection sub-module in a target tracking device based on a target detection network according to an embodiment of the present application
  • FIG. 8 is a schematic diagram of a computer device in an embodiment of the present application.
  • the target tracking method based on the target detection network can be applied in the application environment shown in FIG. 1 .
  • the target tracking method based on the target detection network is applied in the target tracking system based on the target detection network.
  • The target tracking system based on the target detection network includes a client and a server, as shown in FIG. 1; the client and the server communicate through a network and are used to solve the problem of low multi-target tracking accuracy.
  • The client, also known as the user side, refers to the program that corresponds to the server and provides local services for the user.
  • Clients can be installed on, but not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a target tracking method based on a target detection network is provided, and the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • S10 Acquire first position information of the target object in the first image to be detected, and second position information of the target object in the second image to be detected;
  • The second image to be detected refers to an image in the video to be detected that is temporally adjacent to and located after the first image to be detected.
  • The first image to be detected and the second image to be detected can be any two consecutive frames in the video to be detected (for example, if the first image to be detected is the first frame of the video to be detected, the second image to be detected is its second frame), and the video to be detected can be selected according to the specific application scenario; for example, the first image to be detected and the second image to be detected can be two consecutive frames selected from a surveillance video.
  • The first position information refers to the position where the target object appears in the first image to be detected (exemplarily, the first position information may also include information such as the movement direction and movement speed of the target object); after target detection is performed on the image to be detected, this position can be represented by the set of all image blocks corresponding to the target object. Similarly, the second position information refers to the position where the target object appears in the second image to be detected. Optionally, the target object can be one target individual or multiple target individuals.
  • step S10 includes:
  • S101 Acquire a video to be detected, where the video to be detected includes multiple frames of images to be detected.
  • S102 Record the image to be detected in any frame of the video to be detected as the first image to be detected.
  • The video to be detected contains multiple frames of images to be detected. Since the frame rate of a video to be detected is generally above 25 frames per second, two consecutive frames may be very close and the target object is unlikely to change much between them. Therefore, when performing target detection on the video to be detected, the images to be detected can be acquired at intervals, that is, one image to be detected can be taken from the video every n frames.
  • The acquired images to be detected then form a composite video in chronological order. In the composite video, any image to be detected can be selected as the first image to be detected; at this time, the second image to be detected is the image in the composite video that is adjacent to and after the selected first image to be detected (in the original video to be detected, the second image is not adjacent to the first image, but is the image n frames after it).
  • n may be selected according to the specific application scenario; for example, n may be 4, 5, etc.
  • In this way, the efficiency of target detection on the images to be detected can be improved, and the computational pressure on the computer reduced. Further, if the computational pressure is not a concern, target detection can also be performed on every image to be detected in the video.
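  • As a brief illustration of the interval acquisition described above, the following is a minimal sketch assuming Python with OpenCV; the function name, the interval n, and the video path are illustrative placeholders, not part of the application:

```python
# A minimal sketch of interval frame sampling (every n-th frame), assuming
# OpenCV; sample_frames and n = 5 are illustrative assumptions.
import cv2

def sample_frames(video_path: str, n: int = 5):
    """Yield every n-th frame of the video as an image to be detected."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % n == 0:      # keep one frame out of every n
            yield index, frame
        index += 1
    cap.release()
```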
  • S103 Perform target detection on the first image to be detected through a target detection network to obtain the first position information; meanwhile, perform target detection on the second image to be detected through a target detection network to obtain the second position information .
  • Specifically, after the second image to be detected is determined according to the first image to be detected, the first image to be detected and the second image to be detected are sequentially input into the target detection network; the target detection network then performs target detection on the first image to be detected to obtain the first position information, and performs target detection on the second image to be detected to obtain the second position information.
  • In step S103, target detection is performed on the first image to be detected through a target detection network to obtain the first position information, including:
  • S1031 Input the first image to be detected into a backbone network in the target detection network, so as to perform downsampling processing on the first image to be detected, to obtain a plurality of images to be detected corresponding to the first image to be detected Detect feature layers.
  • the backbone network may adopt a Darknet network, a Resnet network, or the like.
  • the down-sampling process refers to reducing the first image to be detected, that is, the size of the image obtained after down-sampling is smaller than the size of the first image to be detected.
  • Specifically, the first image to be detected is input into the target detection network and down-sampled through the backbone network of the target detection network; for example, the first image to be detected is down-sampled five times through the backbone network to obtain five feature layers of different sizes to be detected.
  • Exemplarily, for the feature layer to be detected obtained after the first down-sampling of the first image to be detected, its size is half the size of the first image to be detected (that is, the length and width are both halved), but its number of channels is twice the number of channels of the first image to be detected.
  • In this embodiment, after the first image to be detected is down-sampled five times through the backbone network, the feature layer obtained by the first down-sampling is discarded, so the total number of feature layers to be detected finally adopted in this embodiment is four.
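  • A minimal sketch of such a down-sampling stage follows, assuming PyTorch; the stride-2 3*3 convolution, the channel counts, and the input size are assumptions for illustration, not the application's actual backbone:

```python
# Each stage halves the spatial size and doubles the channel count,
# matching the description above; the concrete backbone (Darknet, ResNet)
# may differ. Weights are random, so this is a shape sketch only.
import torch
import torch.nn as nn

class DownsampleStage(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels * 2,
                              kernel_size=3, stride=2, padding=1)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.conv(x))

x = torch.randn(1, 32, 416, 416)                       # illustrative input
stages = [DownsampleStage(32 * 2 ** i) for i in range(5)]
features = []
for stage in stages:
    x = stage(x)
    features.append(x)          # five feature layers of shrinking size
f1, f2, f3, f4 = features[1:]   # the first (largest) layer is discarded
```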
  • S1032 Perform layer processing on each of the feature layers to be detected in sequence to obtain a target feature layer corresponding to the first image to be detected.
  • the feature layers to be detected include a first feature layer, a second feature layer, a third feature layer, and a fourth feature layer; step S1032 includes:
  • Convolution processing is performed on the fourth feature layer, and the fourth feature layer after convolution is up-sampled to obtain a fifth feature layer having the same dimension as the third feature layer.
  • Understandably, the fourth feature layer is the layer obtained by the last of the five down-samplings of the image to be detected, that is, the fourth feature layer has the smallest size. After the first image to be detected is input into the backbone network of the target detection network for down-sampling and the multiple feature layers to be detected corresponding to the first image to be detected are obtained, convolution processing is performed on the fourth feature layer.
  • Specifically, convolution processing is performed on the fourth feature layer through a convolutional network with a 3*3 convolution kernel, so that the fourth feature layer has the same number of channels as the third feature layer (in step S1031 it is pointed out that, compared with the layer before down-sampling, the feature layer obtained after each down-sampling has its size halved and its number of channels doubled). The convolved fourth feature layer is then up-sampled so that it has the same size and the same number of channels as the third feature layer, thereby obtaining the fifth feature layer with the same dimension as the third feature layer.
  • The fifth feature layer and the third feature layer are dimensionally superimposed to obtain a first superimposed layer; convolution processing is performed on the first superimposed layer, and the convolved first superimposed layer is up-sampled to obtain a sixth feature layer with the same dimension as the second feature layer.
  • Specifically, the first superimposed layer is convolved so that its number of channels matches that of the second feature layer, and the convolved first superimposed layer is up-sampled so that it has the same size and the same number of channels as the second feature layer, thereby obtaining the sixth feature layer with the same dimension as the second feature layer.
  • The sixth feature layer and the second feature layer are dimensionally superimposed to obtain a second superimposed layer; convolution processing is performed on the second superimposed layer, and the convolved second superimposed layer is up-sampled to obtain a seventh feature layer with the same dimension as the first feature layer.
  • Specifically, the second superimposed layer is convolved so that its number of channels matches that of the first feature layer, and the convolved second superimposed layer is up-sampled so that it has the same size and the same number of channels as the first feature layer, thereby obtaining the seventh feature layer with the same dimension as the first feature layer.
  • The seventh feature layer and the first feature layer are dimensionally superimposed to obtain a third superimposed layer; convolution processing is performed on the third superimposed layer, and the convolved third superimposed layer is up-sampled to obtain the target feature layer.
  • Specifically, after the seventh feature layer with the same dimension as the first feature layer is superimposed with the first feature layer, convolution processing is performed on the third superimposed layer so that its number of channels is doubled, and the convolved third superimposed layer is up-sampled so that its size is doubled, thereby obtaining the target feature layer.
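  • The layer processing above can be sketched as follows, assuming PyTorch and interpreting "dimensionally superimposed" as channel concatenation (an assumption); the convolution-then-2x-upsample block is schematic (untrained weights, shapes only), not the application's exact network:

```python
# Top-down layer processing: convolve, upsample to the next layer's
# dimension, superimpose (concatenate), and repeat until the target layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

def up_block(x, out_channels):
    """3x3 convolution to adjust channels, then 2x nearest upsampling."""
    conv = nn.Conv2d(x.shape[1], out_channels, kernel_size=3, padding=1)
    return F.interpolate(conv(x), scale_factor=2, mode="nearest")

def layer_processing(f1, f2, f3, f4):
    f5 = up_block(f4, f3.shape[1])        # fifth layer, same dimension as f3
    s1 = torch.cat([f5, f3], dim=1)       # first superimposed layer
    f6 = up_block(s1, f2.shape[1])        # sixth layer, same dimension as f2
    s2 = torch.cat([f6, f2], dim=1)       # second superimposed layer
    f7 = up_block(s2, f1.shape[1])        # seventh layer, same dimension as f1
    s3 = torch.cat([f7, f1], dim=1)       # third superimposed layer
    return up_block(s3, s3.shape[1] * 2)  # target feature layer, channels doubled
```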
  • S1033 Perform location feature extraction on the target feature layer to obtain the first location information.
  • Specifically, location feature extraction is performed on the target feature layer, that is, the pixel frames associated with the target object are extracted from the target feature layer, thereby obtaining the first position information of the target object in the first image to be detected.
  • S20 According to the first position information, use a Kalman filter model to predict the first predicted position information of the target object in the second to-be-detected image, and determine a first ROI area corresponding to the first predicted position information.
  • Understandably, the Kalman filter model is a state estimation model using a Kalman filter; it predicts the position information of the target object in the next frame to be detected (e.g., the second image to be detected) from the position information of the target object in the previous frame (e.g., the first position information in the first image to be detected). Further, the Kalman filter model needs to be trained with the images to be detected of the first k frames of the video to be detected, so that in step S20 it can better predict the position information of the moving target object, thereby improving target tracking accuracy.
  • Specifically, according to the first position information, the Kalman filter model is used to predict the position information of the target object in the second image to be detected, that is, the first predicted position information, and an area associated with the first predicted position information, that is, the first ROI region, is extracted from the second image to be detected.
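  • The prediction step can be sketched as follows, assuming a constant-velocity Kalman filter over a box state [cx, cy, w, h, vx, vy, vw, vh]; this state design is an assumption for illustration, and the application's actual filter may differ:

```python
# Kalman prediction of the next-frame box (the first predicted position
# information); x is the 8-dim state, P its covariance, Q process noise.
import numpy as np

dt = 1.0                          # one frame step
F = np.eye(8)
F[:4, 4:] = dt * np.eye(4)        # position components += velocity * dt

def kalman_predict(x: np.ndarray, P: np.ndarray, Q: np.ndarray):
    x_pred = F @ x                # state extrapolation
    P_pred = F @ P @ F.T + Q      # covariance extrapolation
    return x_pred, P_pred

# x_pred[:4] gives the predicted box, which defines the first ROI region
# to extract from the second image to be detected.
```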
  • S40 Determine a first minimum cosine distance between the first ROI region and the second ROI region, and simultaneously determine a first position coincidence degree between the second position information and the first predicted position information.
  • the first minimum cosine distance is used to characterize the feature similarity between the first ROI area and the second ROI area; the first position coincidence is used to characterize the position between the second position information and the first predicted position information. similarity.
  • The value range of the first minimum cosine distance can be 0 to 1; in the convention used here, the larger the first minimum cosine distance, the higher the feature similarity between the first ROI region and the second ROI region. At the same time, the first position coincidence degree between the second position information and the first predicted position information is determined.
  • The value range of the first position coincidence degree can also be 0 to 1; the higher the first position coincidence degree, the greater the degree of coincidence between the second position information and the first predicted position information.
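  • Under the convention above (larger value = more similar), the appearance score between two ROI regions can be sketched as below, assuming each ROI is represented by one or more feature vectors taken from the detection network; classical formulations instead report 1 minus this value as the distance:

```python
# Appearance score between two ROI feature sets; for non-negative
# (post-ReLU) features the score lies in [0, 1], larger = more alike.
import numpy as np

def roi_similarity(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    a = np.atleast_2d(feats_a)
    b = np.atleast_2d(feats_b)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return float((a @ b.T).max())   # best cosine match over all pairs
```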
  • In step S40, determining the first position coincidence degree between the second position information and the first predicted position information includes:
  • determining the intersection position information between the second position information and the first predicted position information, and simultaneously determining the union position information between the second position information and the first predicted position information.
  • The intersection position information refers to the position information shared by the second position information and the first predicted position information; the union position information refers to all position information of the second position information and the first predicted position information, that is, including both the shared position information and the uniquely owned position information.
  • the position coincidence degree is determined according to the intersection position information and the union position information.
  • The position coincidence degree can be determined according to the following expression:
  • C = (A∩B) / (A∪B)
  • where C is the position coincidence degree, A is the second position information, B is the first predicted position information, A∪B is the union position information, and A∩B is the intersection position information.
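  • For axis-aligned boxes this expression is the ordinary intersection-over-union; a minimal sketch follows, where the (x1, y1, x2, y2) box layout is an assumption:

```python
# Position coincidence degree C = (A ∩ B) / (A ∪ B) for two boxes.
def coincidence_degree(a, b) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # intersection area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter                     # union area
    return inter / union if union > 0 else 0.0
```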
  • S50 According to the first minimum cosine distance and the first position coincidence degree, determine the first tracking matching result of the target object in the second to-be-detected image by using the Hungarian algorithm.
  • The first tracking matching result may indicate a successful match, that is, that the first predicted position information matches the second position information (for example, both the first predicted position information and the second position information contain the features of the target object); it may also indicate a matching failure, that is, that the first predicted position information does not match the second position information (for example, the first predicted position information contains the features of the target object while the second position information does not, or vice versa).
  • the first minimum cosine distance between the first ROI area and the second ROI area is determined, and at the same time, the first position coincidence between the second position information and the first predicted position information is determined.
  • the first tracking matching result of the target object in the second to-be-detected image is determined by the Hungarian algorithm.
  • Specifically, if the first minimum cosine distance is greater than or equal to a preset cosine threshold (for example, set to 0.9, 0.95, etc.) and the first position coincidence degree is greater than or equal to a preset position coincidence threshold, the first tracking matching result is determined to be a successful matching result; if the first minimum cosine distance is smaller than the preset cosine threshold, and/or the first position coincidence degree is smaller than the preset position coincidence threshold, the first tracking matching result is determined to be a matching failure result.
  • In this way, the matching result can be determined.
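  • The matching step can be sketched as follows, assuming SciPy's Hungarian solver; combining the two scores into one cost and the position threshold value of 0.5 are assumptions for illustration, while the 0.9/0.95-style cosine threshold follows the example above:

```python
# Hungarian assignment between predicted ROIs (rows) and detected ROIs
# (columns), gated by the two thresholds described in the text.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(sim: np.ndarray, iou: np.ndarray,
          sim_thresh: float = 0.9, iou_thresh: float = 0.5):
    cost = -(sim + iou)                     # Hungarian minimises cost
    rows, cols = linear_sum_assignment(cost)
    matches, failures = [], []
    for i, j in zip(rows, cols):
        if sim[i, j] >= sim_thresh and iou[i, j] >= iou_thresh:
            matches.append((i, j))          # successful matching result
        else:
            failures.append((i, j))         # matching failure result
    return matches, failures
```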
  • Optionally, after the first tracking matching result is obtained, the corresponding first image to be detected and second image to be detected can be used as training samples for training the Kalman filter model and improving its prediction accuracy.
  • Optionally, the following loss function can be used to constrain the target detection network:
  • L = L₁ + λ₁L₂ + λ₂L₃
  • where L is the loss function of the target detection network, L₁ is the focal loss function of the target detection network, L₂ is the position loss function, L₃ is the pixel offset loss function, λ₁ is the weight of the position loss function (the value can be 1), and λ₂ is the weight of the pixel offset loss function (the value can be 0.1).
  • L₁ can be characterized by a focal loss expression of the standard form L₁ = −α(1−p_t)^γ·log(p_t), where p_t is the predicted confidence for the true class, α is a balancing weight, and γ is the focusing factor.
  • L₂ can be characterized by the following expression:
  • L₂ = 1 − [ (A∩B)/(A∪B) − (E − A∪B)/E ]
  • where A is the second position information, B is the first predicted position information, A∪B is the union position information, A∩B is the intersection position information, and E is the minimum closure area (the smallest region enclosing both A and B).
  • L₃ refers to the loss on the offset values of the pixels produced during the down-sampling of the first image to be detected in step S1031 (similarly, down-sampling also needs to be performed on the second image to be detected).
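  • A minimal sketch of the combined loss follows, reading L₂ as a GIoU-style position loss (the "minimum closure area" reading is an interpretation) and using λ₁ = 1 and λ₂ = 0.1 as stated above; the focal term L₁ and offset term L₃ are taken as precomputed scalars:

```python
# L = L1 + λ1·L2 + λ2·L3; boxes use an assumed (x1, y1, x2, y2) layout.
def position_loss(a, b) -> float:
    """GIoU-style position loss L2 between predicted and detected boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)          # A ∩ B
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)          # A ∪ B
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    closure = (ex2 - ex1) * (ey2 - ey1)                        # minimum closure area E
    iou = inter / union if union > 0 else 0.0
    giou = iou - (closure - union) / closure if closure > 0 else iou
    return 1.0 - giou

def total_loss(l1_focal: float, l2_position: float, l3_offset: float,
               lambda1: float = 1.0, lambda2: float = 0.1) -> float:
    """Weighted sum L = L1 + λ1·L2 + λ2·L3 with the weights stated above."""
    return l1_focal + lambda1 * l2_position + lambda2 * l3_offset
```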
  • In this way, the shallow features in the target detection network can be better used as the appearance features for target tracking, and these appearance features are then combined with the Kalman filter model so that the position information of the target object in the next frame to be detected can be predicted. The calculation is fast, which improves the accuracy of target tracking.
  • In an embodiment, after step S50, that is, after determining the first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm, the method includes:
  • when the first tracking matching result is a matching failure result, incrementing the total number of matching failures by one.
  • The total number of matching failures refers to the total number of times the first tracking matching result is a matching failure result.
  • the third to-be-detected image refers to an image that is time-adjacent to the second to-be-detected image and located after the second to-be-detected image.
  • the preset failure threshold can be 3 times, 4 times, etc.
  • The preset detection time can be 2 minutes, 5 minutes, etc. It can be understood that the second image to be detected and the third image to be detected can be any two consecutive frames in the video to be detected; for example, the second image to be detected is the second frame of the video to be detected, and the third image to be detected is its third frame.
  • Specifically, when the total number of matching failures within the preset detection time is less than the preset failure threshold, it indicates that the matching failure may be caused by the target object being temporarily occluded, and target tracking continues on the images to be detected of subsequent frames: the third image to be detected is acquired from the video to be detected and input into the target detection network, and target detection is performed on it through the target detection network to obtain the third position information of the target object in the third image to be detected.
  • a Kalman filter model is used to predict the second predicted position information of the target object in the third to-be-detected image, and a third ROI region corresponding to the second predicted position information is determined.
  • Specifically, according to the second position information, the Kalman filter model is used to predict the position information of the target object in the third image to be detected, that is, the second predicted position information, and an area associated with the second predicted position information, that is, the third ROI region, is extracted from the third image to be detected.
  • According to the third position information, ROI region extraction is performed on the third image to be detected to obtain a fourth ROI region.
  • Specifically, an area associated with the third position information is extracted from the third image to be detected to obtain the fourth ROI region.
  • A second minimum cosine distance between the third ROI region and the fourth ROI region is determined, and at the same time a second position coincidence degree between the third position information and the second predicted position information is determined.
  • The value range of the second minimum cosine distance can be 0 to 1; the larger the second minimum cosine distance, the higher the feature similarity between the third ROI region and the fourth ROI region. At the same time, the second position coincidence degree between the third position information and the second predicted position information is determined.
  • The value range of the second position coincidence degree can also be 0 to 1; the higher the second position coincidence degree, the greater the degree of coincidence between the third position information and the second predicted position information.
  • According to the second minimum cosine distance and the second position coincidence degree, the second tracking matching result of the target object in the third image to be detected is determined by the Hungarian algorithm.
  • The second tracking matching result may indicate a successful match, that is, that the second predicted position information matches the third position information (for example, both the second predicted position information and the third position information contain the features of the target object); it may also indicate a matching failure, that is, that the second predicted position information does not match the third position information (for example, the second predicted position information contains the features of the target object while the third position information does not, or vice versa).
  • the second minimum cosine distance between the third ROI area and the fourth ROI area is determined, and at the same time, the second position coincidence between the third position information and the second predicted position information is determined.
  • the second tracking matching result of the target object in the third to-be-detected image is determined by the Hungarian algorithm.
  • Specifically, if the second minimum cosine distance is greater than or equal to a preset cosine threshold (for example, set to 0.9, 0.95, etc.) and the second position coincidence degree is greater than or equal to the preset position coincidence threshold, the second tracking matching result is determined to be a successful matching result; if the second minimum cosine distance is less than the preset cosine threshold, and/or the second position coincidence degree is less than the preset position coincidence threshold, the second tracking matching result is determined to be a matching failure result.
  • In an embodiment, after the second tracking matching result is determined, the method further includes:
  • when the second tracking matching result is a matching failure result, incrementing the total number of matching failures by one;
  • when the total number of matching failures is greater than or equal to the preset failure threshold, deleting the tracking ID associated with the target object and confirming that tracking of the target object has ended.
  • the tracking ID is a unique ID assigned to each target object before the target object is tracked. If the target object contains multiple target individuals, each target individual can be assigned a tracking ID.
  • Specifically, when the total number of matching failures within the preset detection time is greater than or equal to the preset failure threshold (for example, the tracking matching results of three consecutive target trackings are all matching failures), it can be concluded that the matching failures were not caused by the target object being temporarily occluded for a short time, but by the target object leaving the detection area. Therefore, the tracking ID associated with the target object can be deleted, and in subsequent target tracking there is no need to continue tracking the target object; that is, the end of tracking the target object is confirmed, which reduces the computational load on the computer. Further, when the total number of matching failures within the preset detection time is greater than or equal to the preset failure threshold, the total number of matching failures may be cleared.
  • In this way, by judging whether the total number of matching failures within the preset detection time is greater than or equal to the preset failure threshold, it can be further determined whether the target object is temporarily occluded for a short time or has left the detection area, which improves target tracking accuracy.
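  • The failure bookkeeping described above can be sketched as follows; the Track structure, the threshold of 3, and the reset-on-match behaviour are illustrative assumptions drawn from the examples in the text:

```python
# Per-target tracking ID with a matching-failure counter; once the counter
# reaches the preset failure threshold, the tracking ID is deleted.
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    failures: int = 0               # total number of matching failures

FAILURE_THRESHOLD = 3               # e.g. 3 or 4 failures within the window

def update_track(track: Track, matched: bool, active: dict) -> None:
    if matched:
        track.failures = 0          # occlusion was only temporary
    else:
        track.failures += 1
        if track.failures >= FAILURE_THRESHOLD:
            del active[track.track_id]   # target has left the detection area
```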
  • a target tracking apparatus based on a target detection network corresponds to the target tracking method based on the target detection network in the above embodiment.
  • As shown in FIG. 5, the target tracking apparatus based on the target detection network includes a first position information acquisition module 10, a first position information prediction module 20, a first ROI region extraction module 30, a first position coincidence degree determination module 40, and a first tracking matching module 50. The detailed description of each functional module is as follows:
  • The first position information acquisition module 10 is used to acquire first position information of the target object in the first image to be detected, and second position information of the target object in the second image to be detected; the second image to be detected refers to an image in the video to be detected that is temporally adjacent to and located after the first image to be detected.
  • The first position information prediction module 20 is configured to predict, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determine the first ROI region corresponding to the first predicted position information.
  • The first ROI region extraction module 30 is configured to perform ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region.
  • The first position coincidence degree determination module 40 is configured to determine the first minimum cosine distance between the first ROI region and the second ROI region, and simultaneously determine the first position coincidence degree between the second position information and the first predicted position information.
  • The first tracking matching module 50 is configured to determine, according to the first minimum cosine distance and the first position coincidence degree, the first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
  • the first location information acquisition module 10 includes:
  • The to-be-detected video acquisition sub-module 101 is configured to acquire a video to be detected, where the video to be detected includes multiple frames of images to be detected.
  • The to-be-detected image recording sub-module 102 is configured to record any frame of the video to be detected as the first image to be detected.
  • The target detection sub-module 103 is configured to perform target detection on the selected first image to be detected through a target detection network to obtain the first position information, and at the same time perform target detection on the second image to be detected through the target detection network to obtain the second position information.
  • the target detection sub-module 103 includes:
  • The down-sampling processing unit 1031 is configured to input the first image to be detected into the backbone network of the target detection network to perform down-sampling processing on the first image to be detected, and obtain multiple feature layers to be detected corresponding to the first image to be detected.
  • a layer processing unit 1032 configured to sequentially perform layer processing on each of the feature layers to be detected to obtain a target feature layer corresponding to the first image to be detected;
  • the location feature extraction unit 1033 is configured to perform location feature extraction on the target feature layer to obtain the first location information.
  • the feature layers to be detected include a first feature layer, a second feature layer, a third feature layer and a fourth feature layer;
  • the layer processing unit includes:
  • The first layer processing sub-unit is used to perform convolution processing on the fourth feature layer, and up-sample the convolved fourth feature layer to obtain a fifth feature layer with the same dimension as the third feature layer.
  • The second layer processing sub-unit is configured to dimensionally superimpose the fifth feature layer and the third feature layer to obtain a first superimposed layer, perform convolution processing on the first superimposed layer, and up-sample the convolved first superimposed layer to obtain a sixth feature layer with the same dimension as the second feature layer.
  • The third layer processing sub-unit is configured to dimensionally superimpose the sixth feature layer and the second feature layer to obtain a second superimposed layer, perform convolution processing on the second superimposed layer, and up-sample the convolved second superimposed layer to obtain a seventh feature layer with the same dimension as the first feature layer.
  • The fourth layer processing sub-unit is configured to dimensionally superimpose the seventh feature layer and the first feature layer to obtain a third superimposed layer, perform convolution processing on the third superimposed layer, and up-sample the convolved third superimposed layer to obtain the target feature layer.
  • the first position coincidence degree determination module 40 includes:
  • an intersection position information determination sub-module, configured to determine the intersection position information between the second position information and the first predicted position information, and simultaneously determine the union position information between the second position information and the first predicted position information;
  • the position coincidence degree determination submodule is configured to determine the position coincidence degree according to the intersection position information and the union position information.
  • the target tracking device based on the target detection network further includes:
  • The first matching failure total accumulation module is configured to increment the total number of matching failures by one when the first tracking matching result is a matching failure result.
  • The second position information acquisition module is configured to, when the total number of matching failures is less than the preset failure threshold, acquire the third image to be detected in the video to be detected, and third position information of the target object in the third image to be detected; the third image to be detected refers to an image that is temporally adjacent to and located after the second image to be detected.
  • The second position information prediction module is configured to predict, according to the second position information, second predicted position information of the target object in the third image to be detected by using the Kalman filter model, and determine the third ROI region corresponding to the second predicted position information.
  • The second ROI region extraction module is configured to perform ROI region extraction on the third image to be detected according to the third position information to obtain a fourth ROI region.
  • The second position coincidence degree determination module is configured to determine the second minimum cosine distance between the third ROI region and the fourth ROI region, and simultaneously determine the second position coincidence degree between the third position information and the second predicted position information.
  • The second tracking matching module is configured to determine, according to the second minimum cosine distance and the second position coincidence degree, the second tracking matching result of the target object in the third image to be detected by using the Hungarian algorithm.
  • the target tracking device based on the target detection network includes:
  • a second matching failure total accumulation module, configured to increment the total number of matching failures by one when the second tracking matching result is a matching failure result;
  • a tracking end confirmation module configured to delete the tracking ID associated with the target object when the total number of matching failures is greater than or equal to the preset failure threshold, and confirm that the tracking of the target object is ended.
  • Each module in the above-mentioned target tracking device based on target detection network can be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8 .
  • the computer device includes a processor, memory, a network interface, and a database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a readable storage medium, an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for the execution of the operating system and computer-readable instructions in the readable storage medium.
  • the database of the computer device is used to store the data used by the target tracking method based on the target detection network in the above embodiment.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer readable instructions when executed by a processor, implement a target tracking method based on a target detection network.
  • the readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • a computer apparatus comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, the processor executing the computer readable instructions Implement the following steps when instructing:
  • the Kalman filter model is used to predict the first predicted position information of the target object in the second to-be-detected image, and the first ROI area corresponding to the first predicted position information is determined;
  • the second position information extract the ROI area on the second to-be-detected image to obtain a second ROI area
  • the first tracking matching result of the target object in the second image to be detected is determined by the Hungarian algorithm.
  • one or more readable storage media are provided that store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to execute follows the steps below:
  • the Kalman filter model is used to predict the first predicted position information of the target object in the second to-be-detected image, and the first ROI area corresponding to the first predicted position information is determined;
  • the second position information extract the ROI area on the second to-be-detected image to obtain a second ROI area
  • the first tracking matching result of the target object in the second image to be detected is determined by the Hungarian algorithm.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

A target detection network-based target tracking method and apparatus, a device, and a medium. The method is performed by means of obtaining first position information of a target object in a first image to undergo detection, and second position information of the target object in a second image to undergo detection (S10); utilizing a Kalman filter model to predict first predicted position information of the target object in the second image to undergo detection according to the first position information, and determining a first ROI region corresponding to the first predicted position information (S20); performing ROI region extraction on the second image to undergo detection according to the second position information, and obtaining a second ROI region (S30); determining a first least cosine distance between the first ROI region and the second ROI region, and simultaneously determining a first positional coincidence degree between the second position information and the first predicted position information (S40); and determining a first tracking matching result by means of the Hungarian algorithm and according to the first least cosine distance and the first positional coincidence degree (S50). The efficiency and accuracy of target tracking are improved.

Description

Target tracking method, apparatus, device, and medium based on a target detection network

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 22, 2021, with application number 202110434628.2 and entitled "Target tracking method, apparatus, device and medium based on target detection network", the entire content of which is incorporated herein by reference.
Technical Field

The present application relates to the technical field of detection models, and in particular to a target tracking method, apparatus, device, and medium based on a target detection network.
Background

Multi-target tracking technology is used in many application fields, such as motion correction, security monitoring, and unmanned driving. In security monitoring systems, accurately locating and tracking targets is a common task.

The inventor realized that, since there are often multiple objects to be tracked at the same time and the appearance similarity between tracked objects is close, the security monitoring technology in the prior art cannot determine a target's identity from appearance features alone, causing detected targets and tracking trajectories to be mismatched and affecting the accuracy of multi-target tracking. Further, during target tracking there may be occlusion and scale changes between tracked objects, and it cannot be determined whether a tracked object has temporarily disappeared due to occlusion or has left the detection area, resulting in low multi-target tracking accuracy.
Summary

Embodiments of the present application provide a target tracking method, apparatus, device, and medium based on a target detection network, so as to solve the problem of low multi-target tracking accuracy.
A target tracking method based on a target detection network, comprising:

acquiring first position information of a target object in a first image to be detected, and second position information of the target object in a second image to be detected, where the second image to be detected refers to an image in a video to be detected that is temporally adjacent to and located after the first image to be detected;

predicting, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determining a first ROI region corresponding to the first predicted position information;

performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;

determining a first minimum cosine distance between the first ROI region and the second ROI region, and simultaneously determining a first position coincidence degree between the second position information and the first predicted position information; and

determining, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
A target tracking apparatus based on a target detection network, comprising:

a first position information acquisition module, configured to acquire first position information of a target object in a first image to be detected, and second position information of the target object in a second image to be detected, where the second image to be detected refers to an image in a video to be detected that is temporally adjacent to and located after the first image to be detected;

a first position information prediction module, configured to predict, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determine a first ROI region corresponding to the first predicted position information;

a first ROI region extraction module, configured to perform ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;

a first position coincidence degree determination module, configured to determine a first minimum cosine distance between the first ROI region and the second ROI region, and simultaneously determine a first position coincidence degree between the second position information and the first predicted position information; and

a first tracking matching module, configured to determine, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:

acquiring first position information of a target object in a first image to be detected, and second position information of the target object in a second image to be detected, wherein the second image to be detected is an image in a video to be detected that is temporally adjacent to the first image to be detected and comes after the first image to be detected;

predicting, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determining a first ROI region corresponding to the first predicted position information;

performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;

determining a first minimum cosine distance between the first ROI region and the second ROI region, and determining a first position coincidence degree between the second position information and the first predicted position information; and

determining, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

acquiring first position information of a target object in a first image to be detected, and second position information of the target object in a second image to be detected, wherein the second image to be detected is an image in a video to be detected that is temporally adjacent to the first image to be detected and comes after the first image to be detected;

predicting, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determining a first ROI region corresponding to the first predicted position information;

performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;

determining a first minimum cosine distance between the first ROI region and the second ROI region, and determining a first position coincidence degree between the second position information and the first predicted position information; and

determining, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
In the above target detection network-based target tracking method, apparatus, device, and medium, the method acquires first position information of a target object in a first image to be detected and second position information of the target object in a second image to be detected, the second image to be detected being an image in a video to be detected that is temporally adjacent to and comes after the first image to be detected; predicts, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and determines a first ROI region corresponding to the first predicted position information; performs ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region; determines a first minimum cosine distance between the first ROI region and the second ROI region, and determines a first position coincidence degree between the second position information and the first predicted position information; and determines, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.

By introducing the target detection network and the Kalman filter model, the present application can use shallow features of the target detection network as appearance features for target tracking; the Kalman filter model then predicts, from the appearance features determined by the target detection network, the position of the target object in the next image to be detected. The computation is fast, and the accuracy of target tracking is improved.

The details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below; other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Description of the drawings

In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of a target detection network-based target tracking method in an embodiment of the present application;

FIG. 2 is a flowchart of a target detection network-based target tracking method in an embodiment of the present application;

FIG. 3 is a flowchart of a target detection network-based target tracking method in an embodiment of the present application;

FIG. 4 is a flowchart of a target detection network-based target tracking method in an embodiment of the present application;

FIG. 5 is a schematic block diagram of a target detection network-based target tracking apparatus in an embodiment of the present application;

FIG. 6 is a schematic block diagram of a first position information acquisition module in a target detection network-based target tracking apparatus in an embodiment of the present application;

FIG. 7 is a schematic block diagram of a target detection sub-module in a target detection network-based target tracking apparatus in an embodiment of the present application;

FIG. 8 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed description of the embodiments

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

The target detection network-based target tracking method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in a target detection network-based target tracking system that includes a client and a server, as shown in FIG. 1; the client communicates with the server over a network, and the system is used to solve the problem of low accuracy in multi-target tracking. The client, also called the user terminal, is a program that corresponds to the server and provides local services to the user; it can be installed on, but is not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a target detection network-based target tracking method is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:

S10: Acquire first position information of a target object in a first image to be detected, and second position information of the target object in a second image to be detected; the second image to be detected is an image in a video to be detected that is temporally adjacent to the first image to be detected and comes after it.

Understandably, the first and second images to be detected can be any two consecutive frames of the video to be detected (for example, if the first image to be detected is the first frame of the video, the second image to be detected is its second frame). The video to be detected can be chosen according to the specific application scenario; in the field of intelligent surveillance, for instance, the two images can be two consecutive frames selected from a surveillance video. Further, the first position information is the position at which the target object appears in the first image to be detected (illustratively, it may also include the target object's direction and speed of movement); after target detection is performed on the image, this position can be represented by the set of all image blocks corresponding to the target object. Likewise, the second position information is the position at which the target object appears in the second image to be detected. Optionally, the target object may be a single target individual or multiple target individuals.
In an embodiment, as shown in FIG. 3, step S10 includes:

S101: Acquire a video to be detected, the video containing multiple frames of images to be detected.

S102: Record any one frame of the video to be detected as the first image to be detected.

Understandably, the video to be detected contains multiple frames. Since its frame rate is generally above 25 frames per second, two consecutive frames tend to be very similar, and the target object is unlikely to change between them. Therefore, when performing target detection on the frames of the video, the images to be detected can be sampled at intervals: one frame is taken every n frames of the video, and the sampled frames are arranged in chronological order to form a composite video. In this composite video, any frame can be selected as the first image to be detected, and the second image to be detected is then the frame in the composite video that is adjacent to and comes after the selected first image (in the original video, this second image is not the frame immediately following the first image, but the frame n frames after it). Here n can be chosen according to the specific application scenario; illustratively, n can be 4 or 5. Sampling the frames at intervals improves the efficiency of target detection and reduces the computational load on the computer. Further, if computational load is not a concern, target detection can also be performed on every frame of the video.
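Illustratively, the interval sampling described above can be sketched in Python as follows; the OpenCV-based reader and the choice n = 4 are assumptions made for illustration only, not part of the disclosed method:

```python
import cv2

def sample_frames(video_path, n=4):
    """Yield every n-th frame of the video as an image to be detected."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % n == 0:
            # consecutive yielded frames act as the "adjacent" images
            # of the composite video described above
            yield frame
        index += 1
    cap.release()
```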
S103: Perform target detection on the first image to be detected through the target detection network to obtain the first position information, and perform target detection on the second image to be detected through the target detection network to obtain the second position information.

Specifically, after any frame of the video to be detected is recorded as the first image to be detected, the second image to be detected is determined from it, and the two images are input to the target detection network in turn. The network performs target detection on the first image to obtain the first position information, and on the second image to obtain the second position information.
In a specific embodiment, as shown in FIG. 4, performing target detection on the first image to be detected through the target detection network in step S103 to obtain the first position information includes:

S1031: Input the first image to be detected into the backbone network of the target detection network, and downsample the first image to obtain multiple feature layers to be detected corresponding to it.

Optionally, the backbone network may be a Darknet network, a Resnet network, or the like. Downsampling means shrinking the first image to be detected; that is, each downsampled image is smaller than the first image to be detected.

Specifically, after the first and second images to be detected are determined, the first image is input to the target detection network, and the backbone network downsamples it. In this embodiment, the backbone network downsamples the first image five times, yielding feature layers of five different sizes. Illustratively, compared with the first image to be detected, the feature layer obtained after the first downsampling is half its size (both length and width are halved) but has twice its number of channels.

Further, since the feature layer obtained from the first downsampling carries little semantic information, in this embodiment it is discarded after the five downsampling passes, so the total number of feature layers finally used is four.
S1032: Perform layer processing on each of the feature layers to be detected in turn to obtain a target feature layer corresponding to the first image to be detected.

In a specific implementation, the feature layers to be detected include a first feature layer, a second feature layer, a third feature layer, and a fourth feature layer, and step S1032 includes the following four fusion steps (a code sketch of the full sequence is given after these steps):

Convolving the fourth feature layer and upsampling the convolved fourth feature layer to obtain a fifth feature layer with the same dimensions as the third feature layer.

Understandably, the fourth feature layer is produced by the last of the five downsampling passes applied to the image to be detected, so it is the smallest. Specifically, after the first image to be detected is input to the backbone network of the target detection network and downsampled into the multiple feature layers to be detected, the fourth feature layer is convolved, for example through a convolutional network with a 3x3 kernel, so that its channel count matches that of the third feature layer (as noted in step S1031, each downsampling halves the size of the feature layer and doubles its channel count), and the convolved fourth feature layer is then upsampled so that it has the same size and channel count as the third feature layer, yielding a fifth feature layer with the same dimensions as the third feature layer.

Dimensionally superimposing the fifth feature layer and the third feature layer to obtain a first overlay layer, then convolving the first overlay layer and upsampling the convolved first overlay layer to obtain a sixth feature layer with the same dimensions as the second feature layer.

Specifically, after the fifth feature layer is obtained, it is dimensionally superimposed with the third feature layer to form the first overlay layer; the first overlay layer is convolved so that its channel count matches that of the second feature layer, and the convolved first overlay layer is upsampled so that it has the same size and channel count as the second feature layer, yielding a sixth feature layer with the same dimensions as the second feature layer.

Dimensionally superimposing the sixth feature layer and the second feature layer to obtain a second overlay layer, then convolving the second overlay layer and upsampling the convolved second overlay layer to obtain a seventh feature layer with the same dimensions as the first feature layer.

Specifically, after the sixth feature layer is obtained, it is dimensionally superimposed with the second feature layer to form the second overlay layer; the second overlay layer is convolved so that its channel count matches that of the first feature layer, and the convolved second overlay layer is upsampled so that it has the same size and channel count as the first feature layer, yielding a seventh feature layer with the same dimensions as the first feature layer.

Dimensionally superimposing the seventh feature layer and the first feature layer to obtain a third overlay layer, then convolving the third overlay layer and upsampling the convolved third overlay layer to obtain the target feature layer.

Specifically, after the seventh feature layer is obtained, it is dimensionally superimposed with the first feature layer to form the third overlay layer; the third overlay layer is convolved so that its channel count is doubled, and the convolved third overlay layer is upsampled so that its size is doubled, yielding the target feature layer.
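Illustratively, the four fusion steps above can be sketched in PyTorch as follows. The module name, the interpretation of "dimensional superimposition" as channel-wise concatenation, the nearest-neighbour upsampling mode, and the channel counts c1 through c4 are assumptions made for illustration only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Top-down fusion of the four backbone feature layers described above."""
    def __init__(self, c1, c2, c3, c4):
        super().__init__()
        # 3x3 convolutions that align channel counts before each upsampling
        self.conv4 = nn.Conv2d(c4, c3, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(2 * c3, c2, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(2 * c2, c1, kernel_size=3, padding=1)
        # final convolution doubles the third overlay's channel count
        self.conv1 = nn.Conv2d(2 * c1, 4 * c1, kernel_size=3, padding=1)

    def forward(self, p1, p2, p3, p4):
        # fifth layer: conv + 2x upsample of the fourth layer, matching p3
        p5 = F.interpolate(self.conv4(p4), scale_factor=2, mode="nearest")
        o1 = torch.cat([p5, p3], dim=1)   # first overlay layer
        p6 = F.interpolate(self.conv3(o1), scale_factor=2, mode="nearest")
        o2 = torch.cat([p6, p2], dim=1)   # second overlay layer
        p7 = F.interpolate(self.conv2(o2), scale_factor=2, mode="nearest")
        o3 = torch.cat([p7, p1], dim=1)   # third overlay layer
        # conv doubles channels, upsample doubles size: the target feature layer
        return F.interpolate(self.conv1(o3), scale_factor=2, mode="nearest")
```

In this reading, each convolution aligns the channel count with the next shallower layer before the 2x upsampling, matching the size-and-channel bookkeeping described in step S1031.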
S1033: Perform position feature extraction on the target feature layer to obtain the first position information.

Specifically, after the layer processing yields the target feature layer corresponding to the first image to be detected, position features are extracted from it: the pixel boxes associated with the target object are extracted on the target feature layer, giving the first position information of the target object in the first image to be detected.
S20: According to the first position information, predict first predicted position information of the target object in the second image to be detected by using the Kalman filter model, and determine a first ROI region corresponding to the first predicted position information.

Understandably, the Kalman filter model is a state estimation model based on Kalman filtering: given the target object's position information in the previous image to be detected (such as the first position information in the first image to be detected), it predicts the target object's position information in the next image to be detected (such as the second image to be detected). Further, the Kalman filter model needs to be trained on the first k frames of the video to be detected so that, when run in step S20, it can predict the target object's position information well, improving the accuracy of target tracking.

Specifically, after the first position information of the target object in the first image to be detected is acquired, the Kalman filter model predicts from it the position of the target object in the second image to be detected, that is, the first predicted position information; the region associated with this first predicted position information is then extracted from the second image, giving the first ROI region.
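Illustratively, the prediction step of a Kalman filter over a bounding-box state can be sketched as follows; the constant-velocity state layout and the noise settings are assumptions for illustration, not the trained model of this embodiment:

```python
import numpy as np

class BoxKalmanPredictor:
    """Minimal constant-velocity Kalman prediction over a bounding-box state.

    State: [cx, cy, w, h, vx, vy, vw, vh] (box centre, size, and velocities).
    """
    def __init__(self, dt=1.0):
        self.F = np.eye(8)               # state transition matrix
        self.F[:4, 4:] = dt * np.eye(4)  # position/size advance by velocity
        self.P = np.eye(8)               # state covariance
        self.Q = 0.01 * np.eye(8)        # process noise (assumed value)

    def predict(self, x):
        """Propagate the state one frame ahead; return the predicted box."""
        x_pred = self.F @ x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return x_pred[:4]                # predicted [cx, cy, w, h]
```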
S30: Perform ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region.

Specifically, after the second position information of the target object in the second image to be detected is acquired, the region associated with the second position information is extracted from the second image, giving the second ROI region.
S40: Determine a first minimum cosine distance between the first ROI region and the second ROI region, and determine a first position coincidence degree between the second position information and the first predicted position information.

Understandably, the first minimum cosine distance characterizes the feature similarity between the first ROI region and the second ROI region, and the first position coincidence degree characterizes the positional similarity between the second position information and the first predicted position information.

Specifically, after the second ROI region is obtained, the first minimum cosine distance between the first and second ROI regions is determined; its value can range from 0 to 1, and in the convention used here a larger first minimum cosine distance indicates a higher degree of feature similarity between the two ROI regions. At the same time, the first position coincidence degree between the second position information and the first predicted position information is determined; its value can also range from 0 to 1, and a higher first position coincidence degree indicates a greater degree of correlation between the second position information and the first predicted position information.
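Illustratively, the minimum cosine distance between the appearance features of two ROI regions can be computed as below; the (n, d) feature-matrix layout is an assumption, and the comment maps the classical distance onto this document's larger-means-more-similar convention:

```python
import numpy as np

def min_cosine_distance(feats_a, feats_b):
    """Smallest pairwise cosine distance between two ROI feature sets.

    feats_a: (n, d) features from the first ROI region;
    feats_b: (m, d) features from the second ROI region.
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    distances = 1.0 - a @ b.T  # cosine distance = 1 - cosine similarity
    d_min = distances.min()
    # In this document's convention (larger = more similar, range 0 to 1),
    # the reported score corresponds to 1.0 - d_min.
    return d_min
```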
In an embodiment, determining the first position coincidence degree between the second position information and the first predicted position information in step S40 includes:

Determining intersection position information between the second position information and the first predicted position information, and determining union position information between the second position information and the first predicted position information.

Understandably, the intersection position information is the position information shared by the second position information and the first predicted position information; the union position information is all of their position information, that is, both the shared position information and the position information held by each alone.

The position coincidence degree is determined from the intersection position information and the union position information.

Specifically, the position coincidence degree can be determined by the following expression:
C = \frac{|A \cap B|}{|A \cup B|}
where C is the position coincidence degree; A is the second position information; B is the first predicted position information; |A ∪ B| is the union position information; and |A ∩ B| is the intersection position information.
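Illustratively, this expression is the intersection-over-union of two regions; a minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format (an illustrative choice):

```python
def position_coincidence(box_a, box_b):
    """Coincidence degree C = |A∩B| / |A∪B| for corner-format boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```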
S50: According to the first minimum cosine distance and the first position coincidence degree, determine a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.

Understandably, the first tracking matching result may indicate a matching success, meaning that the first predicted position information matches the second position information (for example, both contain the features of the target object); or it may indicate a matching failure, meaning that the first predicted position information does not match the second position information (for example, the first predicted position information contains the features of the target object while the second position information does not, or the first predicted position information does not contain the features of the target object while the second position information does).

Specifically, after the first minimum cosine distance and the first position coincidence degree are determined, they are used together as the tracking cost, and the Hungarian algorithm determines the first tracking matching result of the target object in the second image to be detected. Illustratively, the first minimum cosine distance is compared with a preset cosine threshold (set to, e.g., 0.9 or 0.95); if the first minimum cosine distance is greater than or equal to the preset cosine threshold, the first position coincidence degree is compared with a preset position coincidence threshold (set to, e.g., 0.9 or 0.95), and if the first position coincidence degree is greater than or equal to that threshold, the first tracking matching result is determined to be a matching success. If the first minimum cosine distance is less than the preset cosine threshold, and/or the first position coincidence degree is less than the preset position coincidence threshold, the first tracking matching result is determined to be a matching failure.
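Illustratively, the matching step can be sketched with SciPy's implementation of the Hungarian algorithm; the equal weighting of the two cues and the function and parameter names are assumptions, while the thresholds and the scores-in-[0, 1], larger-means-more-similar convention follow the description above:

```python
from scipy.optimize import linear_sum_assignment

def match_tracks(cos_score, coincidence, cos_thresh=0.9, iou_thresh=0.9):
    """Hungarian matching over predicted tracks (rows) and detections (cols).

    cos_score and coincidence are (num_tracks, num_detections) matrices of
    scores in [0, 1]; both are used together as the tracking cost.
    """
    cost = (1.0 - cos_score) + (1.0 - coincidence)  # lower cost = better match
    rows, cols = linear_sum_assignment(cost)
    successes, failures = [], []
    for r, c in zip(rows, cols):
        # gate each assignment with the preset thresholds described above
        if cos_score[r, c] >= cos_thresh and coincidence[r, c] >= iou_thresh:
            successes.append((r, c))   # matching success result
        else:
            failures.append((r, c))    # matching failure result
    return successes, failures
```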
Further, after the Hungarian algorithm determines the first tracking matching result of the target object in the second image to be detected, the result, whether a matching success or a matching failure, can be used together with the corresponding first and second images to be detected as a training sample for the Kalman filter model, improving the model's prediction accuracy.
Further, to improve the accuracy of the target detection network, the following loss function can be used to constrain it:
L = L_1 + \lambda_1 L_2 + \lambda_2 L_3
where L is the loss function of the target detection network; L_1 is the focal loss function of the target detection network; L_2 is the position loss function; L_3 is the pixel offset loss function; \lambda_1 is the weight of the position loss function (it can take the value 1); and \lambda_2 is the weight of the pixel offset loss function (it can take the value 0.1).
Further, L_1 can be characterized by the following expression:
L_1 = -\frac{1}{N} \sum_{m} \begin{cases} \left(1 - \hat{Y}_m\right)^{\alpha} \log \hat{Y}_m, & \text{if } Y_m = 1 \\ \left(1 - Y_m\right)^{\beta} \, \hat{Y}_m^{\alpha} \log\left(1 - \hat{Y}_m\right), & \text{otherwise} \end{cases}
where N is the total number of target individuals in the target object; \hat{Y}_m is the first predicted position information; Y_m is the second position information; and \alpha and \beta are detection parameters of the target detection network, which can be set according to the specific application scenario. "if Y_m = 1" indicates that the second position information contains the m-th target individual of the target object, and "otherwise" indicates that the second position information does not contain the m-th target individual of the target object.
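Illustratively, a loss of this piecewise form can be sketched in PyTorch as follows; the heatmap reading of Y_m and \hat{Y}_m and the values alpha = 2 and beta = 4 are assumptions made for illustration:

```python
import torch

def focal_loss(y_hat, y, alpha=2.0, beta=4.0, eps=1e-6):
    """Piecewise focal loss matching the expression above.

    y_hat: predicted heatmap in (0, 1); y: ground-truth heatmap equal to 1
    exactly at target-individual positions.
    """
    y_hat = y_hat.clamp(eps, 1.0 - eps)    # keep log() finite
    pos = y.eq(1).float()                  # the "if Y_m == 1" branch
    neg = 1.0 - pos                        # the "otherwise" branch
    pos_term = pos * (1 - y_hat).pow(alpha) * torch.log(y_hat)
    neg_term = neg * (1 - y).pow(beta) * y_hat.pow(alpha) * torch.log(1 - y_hat)
    n = pos.sum().clamp(min=1.0)           # N: number of target individuals
    return -(pos_term + neg_term).sum() / n
```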
Further, L_2 can be characterized by the following expression:
L_2 = 1 - \frac{|A \cap B|}{|A \cup B|} + \frac{\left|A^c \setminus (A \cup B)\right|}{|A^c|}
where A is the second position information; B is the first predicted position information; |A ∪ B| is the union position information; |A ∩ B| is the intersection position information; and |A^c| is the minimum closure area, that is, the area of the smallest region enclosing both A and B.
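Illustratively, under the corner-format box assumption used earlier, this position loss can be sketched as below; the box layout and the reading of the minimum closure as the smallest enclosing box are assumptions:

```python
def position_loss(box_a, box_b):
    """Sketch of L2 = 1 - IoU + |A^c \\ (A∪B)| / |A^c| for corner-format boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter          # assumed nonzero
    # A^c: the minimum closure, i.e. the smallest box enclosing both A and B
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    closure = (cx2 - cx1) * (cy2 - cy1)
    return 1.0 - inter / union + (closure - union) / closure
```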
Further, L_3 refers to the pixel offset incurred while downsampling the first image to be detected in step S1031 (likewise, the second image to be detected also needs to be downsampled).
In this embodiment, by introducing the target detection network and the Kalman filter model, the shallow features of the target detection network can serve well as appearance features for target tracking; the Kalman filter model then uses the appearance features determined by the target detection network to predict the position of the target object in the next image to be detected. The computation is fast, and the accuracy of target tracking is improved.
In an embodiment, after step S50, that is, after the Hungarian algorithm determines the first tracking matching result of the target object in the second image to be detected, the method includes:

When the first tracking matching result is a matching failure, incrementing the total number of matching failures by one.

Understandably, the total number of matching failures is the total number of times the first tracking matching result has been a matching failure.

When, within a preset detection time, the total number of matching failures is less than a preset failure threshold, acquiring a third image to be detected in the video to be detected, and third position information of the target object in the third image to be detected; the third image to be detected is the image that is temporally adjacent to the second image to be detected and comes after it.

Optionally, the preset failure threshold can be 3 or 4 times, and the preset detection time can be 2 minutes, 5 minutes, and so on. Understandably, the second and third images to be detected can be any two consecutive frames of the video to be detected; for example, if the first image to be detected is the first frame of the video, then the second image to be detected is its second frame and the third image to be detected its third frame.

When, within the preset detection time, the total number of matching failures is less than the preset failure threshold, the matching error was likely caused by the target object being temporarily occluded, so target tracking continues on the images corresponding to subsequent frames: the third image to be detected is acquired from the video and input to the target detection network, which performs target detection on it to obtain the third position information of the target object in the third image to be detected.
According to the second position information, predicting second predicted position information of the target object in the third image to be detected by using the Kalman filter model, and determining a third ROI region corresponding to the second predicted position information.

Specifically, after the second position information of the target object in the second image to be detected is acquired, the Kalman filter model predicts from it the position of the target object in the third image to be detected, that is, the second predicted position information; the region associated with this second predicted position information is then extracted from the third image, giving the third ROI region.

According to the third position information, performing ROI region extraction on the third image to be detected to obtain a fourth ROI region.

Specifically, after the third position information of the target object in the third image to be detected is acquired, the region associated with the third position information is extracted from the third image, giving the fourth ROI region.

Determining a second minimum cosine distance between the third ROI region and the fourth ROI region, and determining a second position coincidence degree between the third position information and the second predicted position information.

Specifically, after the fourth ROI region is obtained, the second minimum cosine distance between the third and fourth ROI regions is determined; its value can range from 0 to 1, and a larger second minimum cosine distance indicates a higher degree of feature similarity between the third and fourth ROI regions. At the same time, the second position coincidence degree between the third position information and the second predicted position information is determined; its value can also range from 0 to 1, and a higher second position coincidence degree indicates a greater degree of correlation between the third position information and the second predicted position information.

According to the second minimum cosine distance and the second position coincidence degree, determining a second tracking matching result of the target object in the third image to be detected by using the Hungarian algorithm.

Understandably, the second tracking matching result may indicate a matching success, meaning that the second predicted position information matches the third position information (for example, both contain the features of the target object); or it may indicate a matching failure, meaning that the second predicted position information does not match the third position information (for example, the second predicted position information contains the features of the target object while the third position information does not, or vice versa).

Specifically, after the second minimum cosine distance and the second position coincidence degree are determined, they are used together as the tracking cost, and the Hungarian algorithm determines the second tracking matching result of the target object in the third image to be detected. Illustratively, the second minimum cosine distance is compared with the preset cosine threshold (set to, e.g., 0.9 or 0.95); if it is greater than or equal to that threshold, the second position coincidence degree is compared with the preset position coincidence threshold (set to, e.g., 0.9 or 0.95), and if it is greater than or equal to that threshold, the second tracking matching result is determined to be a matching success. If the second minimum cosine distance is less than the preset cosine threshold, and/or the second position coincidence degree is less than the preset position coincidence threshold, the second tracking matching result is determined to be a matching failure.
In an embodiment, after the Hungarian algorithm determines the second tracking matching result of the target object in the third image to be detected, the method includes:

When the second tracking matching result is a matching failure, incrementing the total number of matching failures by one.

Specifically, after the Hungarian algorithm determines the second tracking matching result of the target object in the third image to be detected, if that result is a matching failure, the total number of matching failures is incremented by one.

When, within the preset detection time, the total number of matching failures is greater than or equal to the preset failure threshold, deleting the tracking ID associated with the target object and confirming that tracking of the target object has ended.

Understandably, the tracking ID is a unique ID assigned to each target object before target tracking begins; if the target object comprises multiple target individuals, each target individual can be assigned its own tracking ID.

When, within the preset detection time, the total number of matching failures is greater than or equal to the preset failure threshold, that is, when the tracking matching results of three consecutive tracking attempts have all been failures, the failures were not caused by the target object being briefly occluded; rather, the target object has left the detection area, causing the matching to fail. The tracking ID associated with the target object can therefore be deleted, and subsequent tracking need not continue for this object, that is, tracking of the target object is confirmed to have ended, which reduces the computational complexity. Further, when the total number of matching failures within the preset detection time is greater than or equal to the preset failure threshold, the total can be reset to zero.
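Illustratively, the occlusion-versus-departure bookkeeping described above can be sketched as follows; the Track class, the dictionary of active tracks, and the threshold of 3 are assumptions made for illustration:

```python
class Track:
    """Bookkeeping for one tracked individual."""
    def __init__(self, track_id):
        self.track_id = track_id
        self.fail_count = 0  # total number of matching failures

def update_track(track, matched, active_tracks, fail_threshold=3):
    """Apply the occlusion-versus-departure rule to one track.

    Returns True while the track should keep being followed.
    """
    if matched:
        track.fail_count = 0       # a matching success clears the count
        return True
    track.fail_count += 1          # accumulate one matching failure
    if track.fail_count >= fail_threshold:
        # consecutive failures: treat the object as having left the
        # detection area and delete its tracking ID
        active_tracks.pop(track.track_id, None)
        track.fail_count = 0       # reset the total, as described above
        return False
    return True                    # likely a brief occlusion; keep tracking
```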
In this embodiment, by judging whether the total number of matching failures within the preset detection time is greater than or equal to the preset failure threshold, it can be determined whether the target object was briefly occluded or is no longer in the detection area, improving the accuracy of target tracking.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present application in any way.

In an embodiment, a target detection network-based target tracking apparatus is provided, corresponding one-to-one to the target detection network-based target tracking method in the above embodiments. As shown in FIG. 5, the apparatus includes a first position information acquisition module 10, a first position information prediction module 20, a first ROI region extraction module 30, a first position coincidence degree determination module 40, and a first tracking matching module 50. The functional modules are described in detail as follows:
The first position information acquisition module 10 is configured to acquire first position information of a target object in a first image to be detected, and second position information of the target object in a second image to be detected; the second image to be detected is an image in a video to be detected that is temporally adjacent to the first image to be detected and comes after it.

The first position information prediction module 20 is configured to predict, according to the first position information, first predicted position information of the target object in the second image to be detected by using a Kalman filter model, and to determine a first ROI region corresponding to the first predicted position information.

The first ROI region extraction module 30 is configured to perform ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region.

The first position coincidence degree determination module 40 is configured to determine a first minimum cosine distance between the first ROI region and the second ROI region, and to determine a first position coincidence degree between the second position information and the first predicted position information.

The first tracking matching module 50 is configured to determine, according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected by using the Hungarian algorithm.
Preferably, as shown in FIG. 6, the first position information acquisition module 10 includes:

a video acquisition sub-module 101, configured to acquire a video to be detected, the video containing multiple frames of images to be detected;

an image recording sub-module 102, configured to record any one frame of the video to be detected as the first image to be detected;

a target detection sub-module 103, configured to perform target detection on the selected first image to be detected through the target detection network to obtain the first position information, and to perform target detection on the second image to be detected through the target detection network to obtain the second position information.
优选地,如图7所示,所述目标检测子模块103,包括:Preferably, as shown in FIG. 7 , the target detection sub-module 103 includes:
下采样处理单元1031,用于将所述第一待检测图像输入至所述目标检测网络中的骨干网络,以对所述第一待检测图像进行下采样处理,得到与所述第一待检测图像对应的多个待检测特征图层;The downsampling processing unit 1031 is configured to input the first image to be detected into the backbone network in the target detection network, so as to perform downsampling processing on the first image to be detected, and obtain a Multiple feature layers to be detected corresponding to the image;
图层处理单元1032,用于对各所述待检测特征图层依次进行图层处理,得到与所述第一待检测图像对应的目标特征图层;A layer processing unit 1032, configured to sequentially perform layer processing on each of the feature layers to be detected to obtain a target feature layer corresponding to the first image to be detected;
a position feature extraction unit 1033, configured to perform position feature extraction on the target feature layer to obtain the first position information. A sketch of the downsampling backbone follows.
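As an illustration of unit 1031, a toy backbone might downsample the input image in four strided stages, one per feature layer to be detected; the channel widths and strides below are assumptions, not values taken from this disclosure.

```python
# Toy backbone sketch: four stride-2 stages produce the first to fourth
# feature layers at 1/2, 1/4, 1/8 and 1/16 of the input resolution.
import torch.nn as nn

class ToyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [3, 64, 128, 256, 512]     # illustrative channel widths
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(4)
        )

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                # first..fourth feature layers
        return feats
```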
Preferably, the feature layers to be detected include a first feature layer, a second feature layer, a third feature layer, and a fourth feature layer, and the layer processing unit includes:
a first layer processing subunit, configured to perform convolution processing on the fourth feature layer and upsample the convolved fourth feature layer to obtain a fifth feature layer having the same dimensions as the third feature layer;
a second layer processing subunit, configured to dimensionally superimpose the fifth feature layer and the third feature layer to obtain a first superimposed layer, then perform convolution processing on the first superimposed layer and upsample the convolved first superimposed layer to obtain a sixth feature layer having the same dimensions as the second feature layer;
a third layer processing subunit, configured to dimensionally superimpose the sixth feature layer and the second feature layer to obtain a second superimposed layer, then perform convolution processing on the second superimposed layer and upsample the convolved second superimposed layer to obtain a seventh feature layer having the same dimensions as the first feature layer;
a fourth layer processing subunit, configured to dimensionally superimpose the seventh feature layer and the first feature layer to obtain a third superimposed layer, then perform convolution processing on the third superimposed layer and upsample the convolved third superimposed layer to obtain the target feature layer. One plausible reading of this fusion is sketched after this list.
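The four subunits describe a top-down fusion of the backbone outputs. The sketch below is one plausible reading, assuming that "dimension superposition" means channel-wise concatenation (element-wise addition would be another valid reading) and using 1x1 convolutions with nearest-neighbor upsampling as stand-ins for the unspecified convolution and upsampling operations.

```python
# Sketch of the layer processing subunits: convolve and upsample the
# deepest layer, superimpose it with the next shallower layer, and repeat
# until the target feature layer is produced.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFusion(nn.Module):
    def __init__(self, c1=64, c2=128, c3=256, c4=512):
        super().__init__()
        self.conv4 = nn.Conv2d(c4, c3, 1)       # fourth layer -> fifth layer
        self.conv3 = nn.Conv2d(c3 + c3, c2, 1)  # first superimposed layer
        self.conv2 = nn.Conv2d(c2 + c2, c1, 1)  # second superimposed layer
        self.conv1 = nn.Conv2d(c1 + c1, c1, 1)  # third superimposed layer

    def forward(self, f1, f2, f3, f4):
        up = lambda t: F.interpolate(t, scale_factor=2, mode="nearest")
        f5 = up(self.conv4(f4))                        # same dims as f3
        f6 = up(self.conv3(torch.cat([f5, f3], 1)))    # same dims as f2
        f7 = up(self.conv2(torch.cat([f6, f2], 1)))    # same dims as f1
        return up(self.conv1(torch.cat([f7, f1], 1)))  # target feature layer
```

With the toy backbone above, `ToyFusion()(*ToyBackbone()(img))` would produce the target feature layer on which position feature extraction is performed.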
Preferably, the first position coincidence degree determination module 40 includes:
an intersection and union position information determination submodule, configured to determine intersection position information between the second position information and the predicted position information, and to determine union position information between the second position information and the predicted position information;
a position coincidence degree determination submodule, configured to determine the position coincidence degree according to the intersection position information and the union position information. A worked example follows.
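As a worked example with assumed coordinates: if the detected box is [0, 0, 10, 10] and the predicted box is [5, 5, 15, 15], the intersection position information is the region [5, 5, 10, 10] with area 25, the union area is 100 + 100 - 25 = 175, and the position coincidence degree is therefore 25 / 175, approximately 0.14.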
Preferably, the target tracking apparatus based on the target detection network further includes:
a first matching failure total accumulation module, configured to increment the total number of matching failures by one when the first tracking matching result is a matching failure result;
a second position information acquisition module, configured to, when the total number of matching failures is less than a preset failure threshold, acquire a third image to be detected in the video to be detected and third position information of the target object in the third image to be detected, where the third image to be detected is an image temporally adjacent to, and later than, the second image to be detected;
a second position information prediction module, configured to predict, according to the second position information and using the Kalman filter model, second predicted position information of the target object in the third image to be detected, and to determine a third ROI region corresponding to the second predicted position information;
a second ROI region extraction module, configured to perform ROI region extraction on the third image to be detected according to the third position information to obtain a fourth ROI region;
a second position coincidence degree determination module, configured to determine a second minimum cosine distance between the third ROI region and the fourth ROI region, and to determine a second position coincidence degree between the second position information and the predicted position information;
a second tracking matching module, configured to determine, by means of the Hungarian algorithm and according to the second minimum cosine distance and the second position coincidence degree, a second tracking matching result of the target object in the third image to be detected. A sketch of the Kalman prediction step used by these modules follows this list.
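The prediction modules rely on a standard Kalman predict/update cycle. The sketch below assumes a constant-velocity state [cx, cy, vx, vy] observed through position-only measurements; the state layout and the noise magnitudes q and r are illustrative assumptions.

```python
# Minimal Kalman filter sketch for the position information prediction
# modules: predict the next position, then correct with the detection.
import numpy as np

def kalman_predict(x, P, dt=1.0, q=1.0):
    """Predict step: x' = F x, P' = F P F^T + Q (constant velocity)."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    Q = q * np.eye(4)
    return F @ x, F @ P @ F.T + Q

def kalman_update(x, P, z, r=1.0):
    """Update step with a position-only measurement z = [cx, cy]."""
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    R = r * np.eye(2)
    y = z - H @ x                     # innovation
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P
```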
Preferably, the target tracking apparatus based on the target detection network includes:
a second matching failure total accumulation module, configured to increment the total number of matching failures by one when the second tracking matching result is a matching failure result;
a tracking end confirmation module, configured to, when the total number of matching failures is greater than or equal to the preset failure threshold, delete the tracking ID associated with the target object and confirm that tracking of the target object has ended. This lifecycle logic is sketched below.
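The failure-counting and tracking-end behavior can be sketched as a small state machine; the threshold value and the choice to reset the counter after a successful match are assumptions, since the disclosure only specifies accumulation and deletion once the threshold is reached.

```python
# Sketch of the matching-failure counter and tracking-end confirmation.
class Track:
    def __init__(self, track_id, max_failures=3):
        self.track_id = track_id          # tracking ID associated with the target
        self.failures = 0                 # total number of matching failures
        self.max_failures = max_failures  # preset failure threshold (assumed value)
        self.active = True

    def report(self, matched):
        if matched:
            self.failures = 0             # reset on success (assumption)
        else:
            self.failures += 1            # accumulate one matching failure
            if self.failures >= self.max_failures:
                self.active = False       # delete tracking ID: tracking ends
```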
For the specific limitations of the target tracking apparatus based on the target detection network, reference may be made to the limitations of the target tracking method based on the target detection network described above, which are not repeated here. Each module in the above target tracking apparatus based on the target detection network may be implemented in whole or in part by software, by hardware, or by a combination of the two. The above modules may be embedded in, or independent of, a processor in a computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to each of the above modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the readable storage medium. The database of the computer device stores the data used by the target tracking method based on the target detection network in the above embodiments. The network interface of the computer device communicates with external terminals through a network connection. The computer-readable instructions, when executed by the processor, implement a target tracking method based on a target detection network. The readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
acquiring first position information of a target object in a first image to be detected and second position information of the target object in a second image to be detected, where the second image to be detected is an image in the video to be detected that is temporally adjacent to, and later than, the first image to be detected;
predicting, according to the first position information and using a Kalman filter model, first predicted position information of the target object in the second image to be detected, and determining a first ROI region corresponding to the first predicted position information;
performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
determining a first minimum cosine distance between the first ROI region and the second ROI region, and determining a first position coincidence degree between the second position information and the first predicted position information;
determining, by means of the Hungarian algorithm and according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring first position information of a target object in a first image to be detected and second position information of the target object in a second image to be detected, where the second image to be detected is an image in the video to be detected that is temporally adjacent to, and later than, the first image to be detected;
predicting, according to the first position information and using a Kalman filter model, first predicted position information of the target object in the second image to be detected, and determining a first ROI region corresponding to the first predicted position information;
performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
determining a first minimum cosine distance between the first ROI region and the second ROI region, and determining a first position coincidence degree between the second position information and the first predicted position information;
determining, by means of the Hungarian algorithm and according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is only used as an example. In practical applications, the above functions may be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.

Claims (20)

1. A target tracking method based on a target detection network, comprising:
    acquiring first position information of a target object in a first image to be detected and second position information of the target object in a second image to be detected, wherein the second image to be detected is an image in a video to be detected that is temporally adjacent to, and later than, the first image to be detected;
    predicting, according to the first position information and using a Kalman filter model, first predicted position information of the target object in the second image to be detected, and determining a first ROI region corresponding to the first predicted position information;
    performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
    determining a first minimum cosine distance between the first ROI region and the second ROI region, and determining a first position coincidence degree between the second position information and the first predicted position information;
    determining, by means of the Hungarian algorithm and according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected.
2. The target tracking method based on a target detection network according to claim 1, wherein acquiring the first position information of the target object in the first image to be detected and the second position information of the target object in the second image to be detected comprises:
    acquiring the video to be detected, the video to be detected containing multiple frames of images to be detected;
    recording any frame of the images to be detected in the video to be detected as the first image to be detected;
    performing target detection on the first image to be detected through the target detection network to obtain the first position information, and performing target detection on the second image to be detected through the target detection network to obtain the second position information.
3. The target tracking method based on a target detection network according to claim 2, wherein performing target detection on the selected first image to be detected through the target detection network to obtain the first position information comprises:
    inputting the first image to be detected into a backbone network of the target detection network so as to downsample the first image to be detected, obtaining multiple feature layers to be detected corresponding to the first image to be detected;
    performing layer processing on each of the feature layers to be detected in turn to obtain a target feature layer corresponding to the first image to be detected;
    performing position feature extraction on the target feature layer to obtain the first position information.
4. The target tracking method based on a target detection network according to claim 3, wherein the feature layers to be detected include a first feature layer, a second feature layer, a third feature layer, and a fourth feature layer, and performing layer processing on each of the feature layers to be detected in turn to obtain the target feature layer corresponding to the first image to be detected comprises:
    performing convolution processing on the fourth feature layer, and upsampling the convolved fourth feature layer to obtain a fifth feature layer having the same dimensions as the third feature layer;
    dimensionally superimposing the fifth feature layer and the third feature layer to obtain a first superimposed layer, then performing convolution processing on the first superimposed layer and upsampling the convolved first superimposed layer to obtain a sixth feature layer having the same dimensions as the second feature layer;
    dimensionally superimposing the sixth feature layer and the second feature layer to obtain a second superimposed layer, then performing convolution processing on the second superimposed layer and upsampling the convolved second superimposed layer to obtain a seventh feature layer having the same dimensions as the first feature layer;
    dimensionally superimposing the seventh feature layer and the first feature layer to obtain a third superimposed layer, then performing convolution processing on the third superimposed layer and upsampling the convolved third superimposed layer to obtain the target feature layer.
5. The target tracking method based on a target detection network according to claim 1, wherein determining the position coincidence degree between the second position information and the first predicted position information comprises:
    determining intersection position information between the second position information and the first predicted position information, and determining union position information between the second position information and the first predicted position information;
    determining the position coincidence degree according to the intersection position information and the union position information.
6. The target tracking method based on a target detection network according to claim 1, wherein after determining, by means of the Hungarian algorithm, the first tracking matching result of the target object in the second image to be detected, the method comprises:
    incrementing the total number of matching failures by one when the first tracking matching result is a matching failure result;
    when, within a preset detection time, the total number of matching failures is less than a preset failure threshold, acquiring a third image to be detected in the video to be detected and third position information of the target object in the third image to be detected, wherein the third image to be detected is an image temporally adjacent to, and later than, the second image to be detected;
    predicting, according to the second position information and using the Kalman filter model, second predicted position information of the target object in the third image to be detected, and determining a third ROI region corresponding to the second predicted position information;
    performing ROI region extraction on the third image to be detected according to the third position information to obtain a fourth ROI region;
    determining a second minimum cosine distance between the third ROI region and the fourth ROI region, and determining a second position coincidence degree between the second position information and the predicted position information;
    determining, by means of the Hungarian algorithm and according to the second minimum cosine distance and the second position coincidence degree, a second tracking matching result of the target object in the third image to be detected.
7. The target tracking method based on a target detection network according to claim 6, wherein after determining, by means of the Hungarian algorithm, the second tracking matching result of the target object in the third image to be detected, the method comprises:
    incrementing the total number of matching failures by one when, within the preset detection time, the second tracking matching result is a matching failure result;
    when the total number of matching failures is greater than or equal to the preset failure threshold, deleting the tracking ID associated with the target object and confirming that tracking of the target object has ended.
8. A target tracking apparatus based on a target detection network, comprising:
    a first position information acquisition module, configured to acquire first position information of a target object in a first image to be detected and second position information of the target object in a second image to be detected, wherein the second image to be detected is an image in a video to be detected that is temporally adjacent to, and later than, the first image to be detected;
    a first position information prediction module, configured to predict, according to the first position information and using a Kalman filter model, first predicted position information of the target object in the second image to be detected, and to determine a first ROI region corresponding to the first predicted position information;
    a first ROI region extraction module, configured to perform ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
    a first position coincidence degree determination module, configured to determine a first minimum cosine distance between the first ROI region and the second ROI region, and to determine a first position coincidence degree between the second position information and the first predicted position information;
    a first tracking matching module, configured to determine, by means of the Hungarian algorithm and according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected.
9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    acquiring first position information of a target object in a first image to be detected and second position information of the target object in a second image to be detected, wherein the second image to be detected is an image in a video to be detected that is temporally adjacent to, and later than, the first image to be detected;
    predicting, according to the first position information and using a Kalman filter model, first predicted position information of the target object in the second image to be detected, and determining a first ROI region corresponding to the first predicted position information;
    performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
    determining a first minimum cosine distance between the first ROI region and the second ROI region, and determining a first position coincidence degree between the second position information and the first predicted position information;
    determining, by means of the Hungarian algorithm and according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected.
10. The computer device according to claim 9, wherein acquiring the first position information of the target object in the first image to be detected and the second position information of the target object in the second image to be detected comprises:
    acquiring the video to be detected, the video to be detected containing multiple frames of images to be detected;
    recording any frame of the images to be detected in the video to be detected as the first image to be detected;
    performing target detection on the first image to be detected through the target detection network to obtain the first position information, and performing target detection on the second image to be detected through the target detection network to obtain the second position information.
11. The computer device according to claim 10, wherein performing target detection on the selected first image to be detected through the target detection network to obtain the first position information comprises:
    inputting the first image to be detected into a backbone network of the target detection network so as to downsample the first image to be detected, obtaining multiple feature layers to be detected corresponding to the first image to be detected;
    performing layer processing on each of the feature layers to be detected in turn to obtain a target feature layer corresponding to the first image to be detected;
    performing position feature extraction on the target feature layer to obtain the first position information.
12. The computer device according to claim 11, wherein the feature layers to be detected include a first feature layer, a second feature layer, a third feature layer, and a fourth feature layer, and performing layer processing on each of the feature layers to be detected in turn to obtain the target feature layer corresponding to the first image to be detected comprises:
    performing convolution processing on the fourth feature layer, and upsampling the convolved fourth feature layer to obtain a fifth feature layer having the same dimensions as the third feature layer;
    dimensionally superimposing the fifth feature layer and the third feature layer to obtain a first superimposed layer, then performing convolution processing on the first superimposed layer and upsampling the convolved first superimposed layer to obtain a sixth feature layer having the same dimensions as the second feature layer;
    dimensionally superimposing the sixth feature layer and the second feature layer to obtain a second superimposed layer, then performing convolution processing on the second superimposed layer and upsampling the convolved second superimposed layer to obtain a seventh feature layer having the same dimensions as the first feature layer;
    dimensionally superimposing the seventh feature layer and the first feature layer to obtain a third superimposed layer, then performing convolution processing on the third superimposed layer and upsampling the convolved third superimposed layer to obtain the target feature layer.
13. The computer device according to claim 9, wherein determining the position coincidence degree between the second position information and the first predicted position information comprises:
    determining intersection position information between the second position information and the first predicted position information, and determining union position information between the second position information and the first predicted position information;
    determining the position coincidence degree according to the intersection position information and the union position information.
14. The computer device according to claim 9, wherein after determining, by means of the Hungarian algorithm, the first tracking matching result of the target object in the second image to be detected, the steps comprise:
    incrementing the total number of matching failures by one when the first tracking matching result is a matching failure result;
    when, within a preset detection time, the total number of matching failures is less than a preset failure threshold, acquiring a third image to be detected in the video to be detected and third position information of the target object in the third image to be detected, wherein the third image to be detected is an image temporally adjacent to, and later than, the second image to be detected;
    predicting, according to the second position information and using the Kalman filter model, second predicted position information of the target object in the third image to be detected, and determining a third ROI region corresponding to the second predicted position information;
    performing ROI region extraction on the third image to be detected according to the third position information to obtain a fourth ROI region;
    determining a second minimum cosine distance between the third ROI region and the fourth ROI region, and determining a second position coincidence degree between the second position information and the predicted position information;
    determining, by means of the Hungarian algorithm and according to the second minimum cosine distance and the second position coincidence degree, a second tracking matching result of the target object in the third image to be detected.
15. One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring first position information of a target object in a first image to be detected and second position information of the target object in a second image to be detected, wherein the second image to be detected is an image in a video to be detected that is temporally adjacent to, and later than, the first image to be detected;
    predicting, according to the first position information and using a Kalman filter model, first predicted position information of the target object in the second image to be detected, and determining a first ROI region corresponding to the first predicted position information;
    performing ROI region extraction on the second image to be detected according to the second position information to obtain a second ROI region;
    determining a first minimum cosine distance between the first ROI region and the second ROI region, and determining a first position coincidence degree between the second position information and the first predicted position information;
    determining, by means of the Hungarian algorithm and according to the first minimum cosine distance and the first position coincidence degree, a first tracking matching result of the target object in the second image to be detected.
16. The readable storage medium according to claim 15, wherein acquiring the first position information of the target object in the first image to be detected and the second position information of the target object in the second image to be detected comprises:
    acquiring the video to be detected, the video to be detected containing multiple frames of images to be detected;
    recording any frame of the images to be detected in the video to be detected as the first image to be detected;
    performing target detection on the first image to be detected through the target detection network to obtain the first position information, and performing target detection on the second image to be detected through the target detection network to obtain the second position information.
17. The readable storage medium according to claim 16, wherein performing target detection on the selected first image to be detected through the target detection network to obtain the first position information comprises:
    inputting the first image to be detected into a backbone network of the target detection network so as to downsample the first image to be detected, obtaining multiple feature layers to be detected corresponding to the first image to be detected;
    performing layer processing on each of the feature layers to be detected in turn to obtain a target feature layer corresponding to the first image to be detected;
    performing position feature extraction on the target feature layer to obtain the first position information.
18. The readable storage medium according to claim 17, wherein the feature layers to be detected include a first feature layer, a second feature layer, a third feature layer, and a fourth feature layer, and performing layer processing on each of the feature layers to be detected in turn to obtain the target feature layer corresponding to the first image to be detected comprises:
    performing convolution processing on the fourth feature layer, and upsampling the convolved fourth feature layer to obtain a fifth feature layer having the same dimensions as the third feature layer;
    dimensionally superimposing the fifth feature layer and the third feature layer to obtain a first superimposed layer, then performing convolution processing on the first superimposed layer and upsampling the convolved first superimposed layer to obtain a sixth feature layer having the same dimensions as the second feature layer;
    dimensionally superimposing the sixth feature layer and the second feature layer to obtain a second superimposed layer, then performing convolution processing on the second superimposed layer and upsampling the convolved second superimposed layer to obtain a seventh feature layer having the same dimensions as the first feature layer;
    dimensionally superimposing the seventh feature layer and the first feature layer to obtain a third superimposed layer, then performing convolution processing on the third superimposed layer and upsampling the convolved third superimposed layer to obtain the target feature layer.
19. The readable storage medium according to claim 15, wherein determining the position coincidence degree between the second position information and the first predicted position information comprises:
    determining intersection position information between the second position information and the first predicted position information, and determining union position information between the second position information and the first predicted position information;
    determining the position coincidence degree according to the intersection position information and the union position information.
20. The readable storage medium according to claim 15, wherein after determining, by means of the Hungarian algorithm, the first tracking matching result of the target object in the second image to be detected, the steps comprise:
    incrementing the total number of matching failures by one when the first tracking matching result is a matching failure result;
    when, within a preset detection time, the total number of matching failures is less than a preset failure threshold, acquiring a third image to be detected in the video to be detected and third position information of the target object in the third image to be detected, wherein the third image to be detected is an image temporally adjacent to, and later than, the second image to be detected;
    predicting, according to the second position information and using the Kalman filter model, second predicted position information of the target object in the third image to be detected, and determining a third ROI region corresponding to the second predicted position information;
    performing ROI region extraction on the third image to be detected according to the third position information to obtain a fourth ROI region;
    determining a second minimum cosine distance between the third ROI region and the fourth ROI region, and determining a second position coincidence degree between the second position information and the predicted position information;
    determining, by means of the Hungarian algorithm and according to the second minimum cosine distance and the second position coincidence degree, a second tracking matching result of the target object in the third image to be detected.
PCT/CN2021/096757 2021-04-22 2021-05-28 Target detection network-based target tracking method and apparatus, device, and medium WO2022222227A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110434628.2 2021-04-22
CN202110434628.2A CN113159032B (en) 2021-04-22 2021-04-22 Target tracking method, device, equipment and medium based on target detection network

Publications (1)

Publication Number Publication Date
WO2022222227A1

Family

ID=76869290

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096757 WO2022222227A1 (en) 2021-04-22 2021-05-28 Target detection network-based target tracking method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN113159032B (en)
WO (1) WO2022222227A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688699B (en) * 2021-08-09 2024-03-08 平安科技(深圳)有限公司 Target object detection method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9811916B1 (en) * 2013-09-25 2017-11-07 Amazon Technologies, Inc. Approaches for head tracking
CN110796686A (en) * 2019-10-29 2020-02-14 浙江大华技术股份有限公司 Target tracking method and device and storage device
CN110866428A (en) * 2018-08-28 2020-03-06 杭州海康威视数字技术股份有限公司 Target tracking method and device, electronic equipment and storage medium
CN111127513A (en) * 2019-12-02 2020-05-08 北京交通大学 Multi-target tracking method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635657B (en) * 2018-11-12 2023-01-06 平安科技(深圳)有限公司 Target tracking method, device, equipment and storage medium
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
CN110517292A (en) * 2019-08-29 2019-11-29 京东方科技集团股份有限公司 Method for tracking target, device, system and computer readable storage medium
CN112651994A (en) * 2020-12-18 2021-04-13 零八一电子集团有限公司 Ground multi-target tracking method

Also Published As

Publication number Publication date
CN113159032B (en) 2023-06-30
CN113159032A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
CN109635657B (en) Target tracking method, device, equipment and storage medium
CN110569721B (en) Recognition model training method, image recognition method, device, equipment and medium
EP4148669A2 (en) Target tracking method for panoramic video, readable storage medium, and computer device
CN109344789B (en) Face tracking method and device
WO2019237516A1 (en) Target tracking method and apparatus, computer device, and storage medium
WO2015180100A1 (en) Facial landmark localization using coarse-to-fine cascaded neural networks
CN111402294B (en) Target tracking method, target tracking device, computer-readable storage medium and computer equipment
US10990829B2 (en) Stitching maps generated using simultaneous localization and mapping
CN111191533B (en) Pedestrian re-recognition processing method, device, computer equipment and storage medium
CN112989962B (en) Track generation method, track generation device, electronic equipment and storage medium
WO2021051547A1 (en) Violent behavior detection method and system
CN112749726B (en) Training method and device for target detection model, computer equipment and storage medium
WO2022222227A1 (en) Target detection network-based target tracking method and apparatus, device, and medium
WO2021227723A1 (en) Target detection method and apparatus, computer device and readable storage medium
CN111639513A (en) Ship shielding identification method and device and electronic equipment
CN112381071A (en) Behavior analysis method of target in video stream, terminal device and medium
US10963720B2 (en) Estimating grouped observations
WO2020024394A1 (en) Background elimination method and device, computer device and storage medium
CN113256683A (en) Target tracking method and related equipment
CN110706257B (en) Identification method of effective characteristic point pair, and camera state determination method and device
CN112070035A (en) Target tracking method and device based on video stream and storage medium
CN112927258A (en) Target tracking method and device
CN116091781B (en) Data processing method and device for image recognition
US20220122341A1 (en) Target detection method and apparatus, electronic device, and computer storage medium
CN114387296A (en) Target track tracking method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21937452

Country of ref document: EP

Kind code of ref document: A1