WO2023093086A1 - Target tracking method and apparatus, training method and apparatus for model related thereto, and device, medium and computer program product - Google Patents


Info

Publication number
WO2023093086A1
Authority
WO
WIPO (PCT)
Prior art keywords
matching
image
sample
information
mask image
Application number
PCT/CN2022/106523
Other languages
French (fr)
Chinese (zh)
Inventor
章国锋
鲍虎军
叶伟才
兰馨悦
Original Assignee
上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023093086A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Definitions

  • The embodiments of the present disclosure are based on the Chinese patent application with application number 202111424075.9, filed on November 26, 2021 and entitled "Target Tracking and Related Model Training Method and Related Devices, Equipment, and Media", and claim the priority of that Chinese patent application, the entire content of which is hereby incorporated into this disclosure by reference.
  • The present disclosure relates to, but is not limited to, the technical field of image processing, and in particular to a target tracking method and a training method for a related model, as well as related apparatuses, devices, media, and computer program products.
  • Object tracking technology is widely used in many application scenarios. Taking video panoptic segmentation (Video Panoptic Segmentation, VPS) as an example, it is required not only to generate consistent panoptic segmentation across frames, but also to achieve inter-frame tracking for all pixels, so as to improve the realization effect of many technologies such as autonomous driving, video surveillance, and video editing.
  • However, existing target tracking methods still face many problems in terms of tracking accuracy, such as tracking loss, which seriously affects the implementation effect of target tracking when applied to the above-mentioned technologies such as autonomous driving, video surveillance, and video editing. Therefore, how to improve target tracking accuracy has become an urgent problem to be solved.
  • Embodiments of the present disclosure provide a target tracking method and a training method for a related model, as well as corresponding apparatuses, devices, media, and computer program products.
  • A first aspect of the embodiments of the present disclosure provides a target tracking method, including: performing target segmentation on a first image and a second image respectively to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image; performing object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and performing object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information; and fusing the first matching information and the second matching information to obtain tracking information, where the tracking information includes whether the first object and the second object are the same object.
  • In this way, target segmentation is performed on the first image and the second image respectively to obtain the first mask image of the first object in the first image and the second mask image of the second object in the second image; object matching is performed on the first mask image and the second mask image in the feature dimension to obtain the first matching information, and object matching is performed in the spatial dimension based on the first mask image and the second mask image to obtain the second matching information; the first matching information and the second matching information are then fused to obtain the tracking information, which includes whether the first object and the second object are the same object. That is, in the process of target tracking, on the one hand, object matching between images in the feature dimension helps ensure the tracking effect for large-sized objects; on the other hand, object matching between images in the spatial dimension helps ensure the tracking effect for small-sized objects. Since the tracking information is obtained by fusing the matching information produced by the two matching methods, both large-sized objects and small-sized objects can be taken into account, which is conducive to improving the accuracy of target tracking.
  • A second aspect of the embodiments of the present disclosure provides a method for training a target tracking model, including: obtaining a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information, where the sample tracking information includes whether the first sample object and the second sample object are actually the same object; performing object matching in the feature dimension on the first sample mask image and the second sample mask image based on a first matching network of the target tracking model to obtain first predicted matching information, and performing object matching in the spatial dimension on the first sample mask image and the second sample mask image based on a second matching network of the target tracking model to obtain second predicted matching information; fusing the first predicted matching information and the second predicted matching information using an information fusion network of the target tracking model to obtain predicted tracking information, where the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object; and adjusting network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.
  • In this way, on the one hand, object matching between images in the feature dimension helps ensure the tracking effect for large-sized objects; on the other hand, object matching between images in the spatial dimension helps ensure the tracking effect for small-sized objects. Since the predicted tracking information is obtained by fusing the matching information produced by the two matching methods, both large-sized objects and small-sized objects can be taken into account, which is conducive to improving the accuracy of the target tracking model.
  • A third aspect of the embodiments of the present disclosure provides a target tracking apparatus, including a target segmentation part, an object matching part, and an information fusion part. The target segmentation part is configured to perform target segmentation on the first image and the second image respectively to obtain the first mask image of the first object in the first image and the second mask image of the second object in the second image; the object matching part is configured to perform object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and to perform object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information; the information fusion part is configured to fuse the first matching information and the second matching information to obtain tracking information, where the tracking information includes whether the first object and the second object are the same object.
  • A fourth aspect of the embodiments of the present disclosure provides a training apparatus for a target tracking model, including a sample acquisition part, a sample matching part, a sample fusion part, and a parameter adjustment part. The sample acquisition part is configured to obtain a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information, where the sample tracking information includes whether the first sample object and the second sample object are actually the same object; the sample matching part is configured to perform object matching in the feature dimension on the first sample mask image and the second sample mask image based on a first matching network of the target tracking model to obtain first predicted matching information, and to perform object matching in the spatial dimension on the first sample mask image and the second sample mask image based on a second matching network of the target tracking model to obtain second predicted matching information; the sample fusion part is configured to fuse the first predicted matching information and the second predicted matching information using an information fusion network of the target tracking model to obtain predicted tracking information, where the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object; and the parameter adjustment part is configured to adjust network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.
  • A fifth aspect of the embodiments of the present disclosure provides an electronic device, including a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory to implement the target tracking method in the first aspect above or the training method of the target tracking model in the second aspect above.
  • A sixth aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which program instructions are stored; when the program instructions are executed by a processor, the target tracking method in the first aspect above or the training method of the target tracking model in the second aspect above is implemented.
  • A seventh aspect of the embodiments of the present disclosure provides a computer program product including a computer program or instructions; when the computer program or instructions are run on an electronic device, the electronic device is caused to execute the target tracking method in the first aspect above or the training method of the target tracking model in the second aspect above.
  • FIG. 1 is a schematic flowchart of a target tracking method provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic framework diagram of a target tracking model provided by an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of an information fusion process provided by an embodiment of the present disclosure.
  • FIG. 4A is a schematic diagram of a panoptic segmentation image provided by an embodiment of the present disclosure.
  • FIG. 4B is another schematic diagram of a panoptic segmentation image provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic flow chart of object matching in the feature dimension provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a process of object matching in the feature dimension provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic flow chart of object matching in the spatial dimension provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a process of object matching in the spatial dimension provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic flowchart of a target tracking method provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of a time consistency constraint provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic flowchart of a method for training a target tracking model provided by an embodiment of the present disclosure.
  • FIG. 12 is a schematic framework diagram of a target tracking apparatus provided by an embodiment of the present disclosure.
  • FIG. 13 is a schematic framework diagram of a training apparatus for a target tracking model provided by an embodiment of the present disclosure.
  • FIG. 14 is a schematic framework diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 15 is a schematic diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
  • The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" can mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" herein generally indicates that the objects before and after it are in an "or" relationship. "Multiple" herein means two or more.
  • FIG. 1 is a schematic flowchart of a target tracking method provided by an embodiment of the present disclosure. Specifically, the method may include the following steps:
  • Step S11: Perform target segmentation on the first image and the second image respectively to obtain a first mask image of the first object in the first image and a second mask image of the second object in the second image.
  • The first image and the second image can be two consecutive frames in captured video data; alternatively, the first image and the second image can be separated by several frames in the video data, which is not limited here. The first image may be captured before the second image. For example, the first image can be marked as t−δ and the second image as t, where δ is 1 when the first image and the second image are two adjacent frames, δ is 2 when they are separated by one frame, and so on; no further examples are given here.
  • In addition, the first image and the second image can be captured by electronic devices integrated with cameras, such as smartphones and autonomous driving devices, and the frame rate of the camera and its movement rate can be combined to determine the number of frames between the first image and the second image. The faster the movement rate, the greater the change between adjacent images, and the fewer frames should separate the two images; the slower the movement rate, the smaller the change between adjacent images, and the more frames may separate them. Similarly, the higher the frame rate, the smaller the change between adjacent images, and the more frames may separate the two images; the lower the frame rate, the greater the change between adjacent images, and the fewer frames should separate them.
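  • As a purely illustrative aid (the formula, the constant k, and the units below are assumptions, not something specified by the disclosure), this trade-off between frame rate and movement rate could be captured by a heuristic of the following form:

```python
def frame_interval(frame_rate_hz: float, motion_rate: float, k: float = 1.0) -> int:
    """Illustrative heuristic only (not specified by the disclosure):
    pick a larger inter-frame gap delta when the camera moves slowly
    or the frame rate is high, and a smaller gap otherwise."""
    # Higher frame rate and slower motion -> adjacent frames change little,
    # so a larger delta is acceptable; clamp to at least 1 frame.
    delta = k * frame_rate_hz / max(motion_rate, 1e-6)
    return max(1, round(delta))

# e.g. a 30 fps camera moving at 2 units/s with k = 0.1 gives delta = 2
print(frame_interval(30.0, 2.0, k=0.1))
```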
  • The first object in the first image is not limited to one; for example, the first image may include one first object, two first objects, three first objects, and so on, which is not limited here. Similarly, the second object in the second image is not limited to one; for example, the second image may include one second object, two second objects, three second objects, and so on, which is not limited here. The aforementioned objects may include, but are not limited to, pedestrians, vehicles, street signs, and the like. It should be noted that in the embodiments of the present disclosure, multiple objects of the same type cannot be counted as the same object; that is, even if multiple objects are of the same type, they need to be counted as multiple objects. For example, an image may contain two pedestrians, such as pedestrian A and pedestrian B, in which case pedestrian A and pedestrian B are counted as two objects; or an image may contain three vehicles, such as vehicle A, vehicle B, and vehicle C, in which case they are counted as three objects, and so on; no more examples are given here.
  • It should be noted that the first object and the second object are foreground objects in the first image and the second image respectively, such as the aforementioned pedestrians, vehicles, and street signs. The images may also contain background objects, such as, but not limited to, roads, sky, and buildings. In an implementation scenario, the mask image of each first background object in the first image and the mask image of each second background object in the second image can also be obtained, so that each foreground object and background object can subsequently be marked on the image in combination with the mask images and the tracking information.
  • For example, pixel regions belonging to the same object may be marked with the same color. For instance, the pixel area of pedestrian A may be marked red in the first image, and the pixel area of pedestrian A may also be marked red in the second image. Other situations can be deduced by analogy; no more examples are given here.
  • In some embodiments, each first mask image has the same size as the first image, and each second mask image has the same size as the second image. For the first mask image of each first object, the pixel value of each pixel it contains indicates the possibility that the pixel at the corresponding position in the first image belongs to the first object: the greater the possibility, the larger the pixel value, and the smaller the possibility, the smaller the pixel value. Similarly, for the second mask image of each second object, the pixel value of each pixel it contains indicates the possibility that the pixel at the corresponding position in the second image belongs to the second object: the greater the possibility, the larger the pixel value, and the smaller the possibility, the smaller the pixel value. Here, "corresponding position" specifically means having the same pixel coordinates; for example, the pixel at pixel coordinate (i, j) in the first mask image corresponds to the pixel at pixel coordinate (i, j) in the first image, and the pixel at pixel coordinate (m, n) in the second mask image corresponds to the pixel at pixel coordinate (m, n) in the second image.
  • The preset threshold can be set according to the actual situation; for example, when pixel values have been normalized to the range of 0 to 1, the preset threshold can be set to 0.5, 0.6, etc., which is not limited here. When a pixel value is higher than the preset threshold, the pixel can be considered to belong to the object, and on this basis the pixel value can further be reset to a first value (e.g., 1); conversely, if the pixel value is not higher than the preset threshold, the pixel can be considered not to belong to the object, and the pixel value can further be reset to a second value (e.g., 0). That is, for the first mask image of each first object, it can be checked whether the pixel value of each pixel it contains is higher than the preset threshold; if so, the pixel value is reset to the first value, otherwise to the second value, so as to update the first mask image of each first object. Similarly, for the second mask image of each second object, it can be checked whether the pixel value of each pixel it contains is higher than the preset threshold; if so, the pixel value is reset to the first value, otherwise to the second value, so as to update the second mask image of each second object.
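  • A minimal sketch of this thresholding step in Python (the function name and the use of NumPy are illustrative assumptions):

```python
import numpy as np

def binarize_mask(mask: np.ndarray, threshold: float = 0.5,
                  first_value: float = 1.0, second_value: float = 0.0) -> np.ndarray:
    """Reset pixels above the preset threshold to the first value (e.g. 1)
    and all others to the second value (e.g. 0), as described above.
    The mask is assumed to be normalized to [0, 1]."""
    return np.where(mask > threshold, first_value, second_value)

soft_mask = np.array([[0.9, 0.4], [0.7, 0.2]])
print(binarize_mask(soft_mask))  # [[1., 0.], [1., 0.]]
```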
  • In some embodiments, a target tracking model may be pre-trained; please refer to FIG. 2, which is a schematic framework diagram of the target tracking model. The target tracking model may include a target segmentation network, and the first image and the second image may be respectively input into the target segmentation network to obtain the first mask image of each first object and the second mask image of each second object.
  • To train the target segmentation network, sample images can be collected in advance and the sample mask image of each sample object in the sample images can be annotated; the target segmentation network is then used to segment the sample images to obtain the predicted mask image of each sample object, so that the network parameters of the target segmentation network can be adjusted based on the difference between the sample mask image and the predicted mask image belonging to the same object. Loss functions such as the dice segmentation loss and a position loss can be used to measure the difference between the sample mask image and the predicted mask image belonging to the same object to obtain the loss value of the target segmentation network, and optimization methods such as gradient descent can be used to adjust the network parameters of the target segmentation network.
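  • For reference, a common formulation of the dice segmentation loss mentioned above can be sketched as follows; the disclosure does not spell out the exact variant it uses, so this formulation is an assumption:

```python
import torch

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """One common dice loss: 1 - 2*|intersection| / (|pred| + |target|).
    pred and target are (H, W) masks with values in [0, 1]."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = torch.tensor([[0.9, 0.1], [0.8, 0.2]])
target = torch.tensor([[1.0, 0.0], [1.0, 0.0]])
print(dice_loss(pred, target))  # close to 0 for a good prediction
```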
  • When performing instance segmentation, the target segmentation network may include, but is not limited to, instance segmentation networks such as Mask R-CNN, PointRend, and Instance-sensitive FCN; the network structure of the target segmentation network is not limited here. When performing panoptic segmentation, the target segmentation network may include, but is not limited to, panoptic segmentation networks such as PanopticFCN; again, the network structure of the target segmentation network is not limited here.
  • Step S12: Perform object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and perform object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information.
  • In some embodiments, the first feature representation of each first object can be extracted based on the first mask image of each first object, and the second feature representation of each second object can be extracted based on the second mask image of each second object; on this basis, the feature similarity between each first object and each second object is obtained using the first feature representations and the second feature representations, and the first matching information is obtained based on the feature similarities between each first object and each second object. The above method only needs to perform feature extraction on the mask image of each object and then measure the feature similarity, which can reduce the complexity of object matching between images in the feature dimension and is beneficial to improving tracking speed.
  • In some embodiments, in order to improve the efficiency of target tracking, a target tracking model including a first matching network can be pre-trained. The first matching network may include several feature extraction layers (such as convolutional layers and fully connected layers) and a multi-layer perceptron; after preprocessing, the first mask image of each first object and the second mask image of each second object can be input into the first matching network for processing. For ease of description, the first objects and the second objects can be collectively referred to as N objects, and the first mask images of the first objects and the second mask images of the second objects can be collectively referred to as N mask images. After feature extraction, N feature representations are obtained and further processed by the multi-layer perceptron, which outputs an N*N matrix: each row of the matrix represents one of the N objects, each column of the matrix represents one of the N objects, and the element in row i and column j represents the matching degree between the i-th object and the j-th object. The matching degrees between each first object and each second object can then be extracted from this matrix to obtain the first matching information. In this case, the extraction of the first feature representations and the second feature representations is performed by the first matching network, which may include only a small number of network layers such as convolutional layers and fully connected layers, thereby greatly reducing the parameter count.
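  • The following PyTorch sketch illustrates this kind of first matching network; the layer sizes and the pairwise scoring scheme are illustrative assumptions rather than the disclosure's exact architecture:

```python
import torch
import torch.nn as nn

class FirstMatchingNet(nn.Module):
    """A small feature extractor per mask image plus a multi-layer
    perceptron that scores every object pair, yielding an N x N
    matching matrix (a sketch; sizes are illustrative)."""

    def __init__(self, h: int = 64, w: int = 64, dim: int = 128):
        super().__init__()
        self.extract = nn.Sequential(nn.Flatten(), nn.Linear(h * w, dim), nn.ReLU())
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, masks: torch.Tensor) -> torch.Tensor:
        # masks: (N, H, W) -> feats: (N, dim)
        feats = self.extract(masks)
        n = feats.shape[0]
        # Concatenate every (i, j) pair of features and score the pair.
        pairs = torch.cat([feats.unsqueeze(1).expand(n, n, -1),
                           feats.unsqueeze(0).expand(n, n, -1)], dim=-1)
        return self.mlp(pairs).squeeze(-1)  # (N, N) matching degrees

net = FirstMatchingNet()
masks = torch.rand(5, 64, 64)  # e.g. 3 first objects + 2 second objects
print(net(masks).shape)        # torch.Size([5, 5])
```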
  • In some embodiments, the second image can be used to perform optical flow prediction on the first image to obtain the optical flow image of the first image; based on the optical flow image, the first mask image of the first object is shifted pixel by pixel to obtain the predicted mask image of the first object at the shooting moment of the second image, and the second matching information is obtained based on the degree of coincidence between the predicted mask image of each first object and the second mask image of each second object. For the pixel offset and the coincidence measurement, please refer to the relevant descriptions in the following disclosed embodiments. On the one hand, the above method realizes object matching based on pixel-level matching, which is conducive to greatly improving the tracking effect, especially for small-sized objects; on the other hand, after the pixel-by-pixel offset based on the optical flow image, only the image coincidence degree needs to be measured to obtain the matching information, which also reduces the complexity of object matching between images in the spatial dimension and is conducive to improving tracking speed.
  • In other embodiments, a first optimal displacement vector between the first mask image of each first object and the second mask image of each second object may first be obtained; after the first mask image is shifted pixel by pixel by the first optimal displacement vector, it has the maximum coincidence degree with the second mask image, and this maximum coincidence degree is recorded for each first mask image. In addition, a second optimal displacement vector between the first image and the second image can be obtained; after the first image is shifted pixel by pixel by the second optimal displacement vector, it has the maximum coincidence degree with the second image. On this basis, the vector similarity between each first optimal displacement vector and the second optimal displacement vector can be measured. It should be noted that the closer the first optimal displacement vector is to the second optimal displacement vector, the larger the vector similarity; conversely, the farther the first optimal displacement vector is from the second optimal displacement vector, the smaller the vector similarity. Based on this, for each first object and each second object, the corresponding vector similarity and maximum coincidence degree can be weighted to obtain the matching degree between the two, that is, the second matching information.
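  • A small sketch of this alternative, assuming an inverse-distance vector similarity and equal weights (both are assumptions; the disclosure does not fix either choice):

```python
import numpy as np

def second_matching_degree(d_pair: np.ndarray, d_global: np.ndarray,
                           max_overlap: float, w_sim: float = 0.5,
                           w_overlap: float = 0.5) -> float:
    """d_pair: first optimal displacement vector for an object pair;
    d_global: second optimal displacement vector between the two images;
    max_overlap: maximum coincidence degree reached at d_pair.
    The similarity definition and the weights are illustrative."""
    # Closer displacement vectors -> higher similarity (here: inverse distance).
    vec_sim = 1.0 / (1.0 + np.linalg.norm(d_pair - d_global))
    return w_sim * vec_sim + w_overlap * max_overlap

print(second_matching_degree(np.array([3.0, 1.0]), np.array([2.0, 1.0]), 0.8))
```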
  • Step S13: Fuse the first matching information and the second matching information to obtain tracking information.
  • In some embodiments, the first matching information may include the matching degree between each first object and each second object, which may be referred to as the first matching degree for ease of distinction; similarly, the second matching information may include the matching degree between each first object and each second object, which may be referred to as the second matching degree for ease of distinction.
  • In some embodiments, the first matching degree in the first matching information and the second matching degree in the second matching information can be weighted by a first preset weight and a second preset weight respectively to obtain first weighted matching information and second weighted matching information, where the first weighted matching information includes a first weighted matching degree between the first object and the second object, and the second weighted matching information includes a second weighted matching degree between the first object and the second object. The first weighted matching information and the second weighted matching information may then be fused to obtain final matching information, which includes a final matching degree between the first object and the second object. That is to say, during the fusion process, preset weights can be used directly to perform a weighted fusion of the matching degrees.
  • In other embodiments, adaptive weighting can be performed on the first matching degree in the first matching information to obtain the first weighted matching information, and on the second matching degree in the second matching information to obtain the second weighted matching information; on this basis, the first weighted matching information and the second weighted matching information are fused to obtain the final matching information, and the tracking information is obtained by analysis based on the final matching information. In this fusion process, by performing adaptive weighting on the first matching information and the second matching information respectively, the importance of the two can be measured adaptively according to the actual situation before fusion, which is conducive to greatly improving tracking accuracy.
  • In some embodiments, in order to improve the efficiency of target tracking, a target tracking model can be pre-trained to process the first image and the second image and obtain the tracking information, and the target tracking model can include an information fusion network. The information fusion network may further include a first weighting subnetwork and a second weighting subnetwork, where the first weighting subnetwork is used to adaptively weight the first matching information and the second weighting subnetwork is used to adaptively weight the second matching information. The first weighting subnetwork may include, but is not limited to, a 1*1 convolutional layer, and the second weighting subnetwork may likewise include, but is not limited to, a 1*1 convolutional layer.
  • In some embodiments, both the first matching information and the second matching information may be represented by a matrix. Taking M first objects and N second objects as an example, both can be represented by an M*N matrix: for the first matching information, the element in row i and column j of the matrix represents the first matching degree between the i-th first object and the j-th second object, and for the second matching information, the element in row i and column j represents the second matching degree between the i-th first object and the j-th second object. The first weighted matching information obtained after the first matching information is adaptively weighted and the second weighted matching information obtained after the second matching information is adaptively weighted can likewise each be represented by an M*N matrix, whose elements have the meanings described above. The element in row i and column j of the matrix representing the first weighted matching information can then be added directly to the element in row i and column j of the matrix representing the second weighted matching information to obtain the matrix representing the final matching information; that is, for each pair of a first object and a second object, the first weighted matching degree and the second weighted matching degree can be added directly to obtain the final matching degree.
  • For example, the first image may contain two first objects, first object A and first object B, and the second image may contain two second objects, second object A and second object B; the final matching information can then be represented by a 2*2 matrix. The first row of the matrix represents the final matching degrees between first object A and second object A and second object B respectively, and the second row represents the final matching degrees between first object B and second object A and second object B respectively; the first column of the matrix represents the final matching degrees between second object A and first object A and first object B respectively, and the second column represents the final matching degrees between second object B and first object A and first object B respectively.
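  • A minimal PyTorch sketch of such an information fusion network, with one 1*1 convolution per weighting subnetwork and a direct element-wise addition (dimensions and names are illustrative):

```python
import torch
import torch.nn as nn

class InfoFusion(nn.Module):
    """Each matching matrix is adaptively weighted by its own 1x1
    convolution, then the two weighted matrices are added element-wise
    to form the final matching matrix."""

    def __init__(self):
        super().__init__()
        self.weight1 = nn.Conv2d(1, 1, kernel_size=1)  # first weighting subnetwork
        self.weight2 = nn.Conv2d(1, 1, kernel_size=1)  # second weighting subnetwork

    def forward(self, match1: torch.Tensor, match2: torch.Tensor) -> torch.Tensor:
        # match1, match2: (M, N) matching matrices -> (1, 1, M, N) for conv.
        m1 = self.weight1(match1[None, None])
        m2 = self.weight2(match2[None, None])
        return (m1 + m2)[0, 0]  # (M, N) final matching degrees

fusion = InfoFusion()
print(fusion(torch.rand(2, 2), torch.rand(2, 2)))
```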
  • As described above, the tracking information may specifically include whether the first object and the second object are the same object.
  • In some embodiments, each first object can be combined with each second object to form a current object group, and based on at least one of the first reference information and the second reference information of the current object group, it is determined whether the current first object and the current second object are the same object, where the current first object is the first object in the current object group and the current second object is the second object in the current object group. The first reference information includes the final matching degrees between the current first object and each second object, and the second reference information includes the final matching degrees between the current second object and each first object. As described above, the final matching information can be represented by a matrix; in that case, the first reference information includes all elements of the matrix row representing the current first object, and the second reference information includes all elements of the matrix column representing the current second object.
  • In one implementation scenario, the final matching degree between the current first object and the current second object can be taken as the matching degree to be analyzed, and in response to the matching degree to be analyzed being the maximum value in the first reference information, it is determined that the current first object and the current second object are the same object. Taking the aforementioned final matching information represented by a 2*2 matrix as an example, if the element in the first row and the first column of the matrix is the maximum value in the first row of the matrix, first object A and second object A can be determined to be the same object. In this case, the determination operation can be completed only by searching for the maximum value in the first reference information, which is beneficial to reducing the determination complexity and increasing the determination speed.
  • In another implementation scenario, the final matching degree between the current first object and the current second object can be taken as the matching degree to be analyzed, and in response to the matching degree to be analyzed being the maximum value in the second reference information, it is determined that the current first object and the current second object are the same object. Taking the aforementioned final matching information represented by a 2*2 matrix as an example, if the element in the first row and the first column of the matrix is the maximum value in the first column of the matrix, first object A and second object A can be determined to be the same object. In this case, the determination operation can be completed only by searching for the maximum value in the second reference information, which is beneficial to reducing the determination complexity and increasing the determination speed.
  • In yet another implementation scenario, the final matching degree between the current first object and the current second object can be taken as the matching degree to be analyzed, and in response to the matching degree to be analyzed being the maximum value in both the first reference information and the second reference information, it is determined that the current first object and the current second object are the same object. In this case, the determination operation is completed by searching for the maximum value in the first reference information and the second reference information simultaneously, and collaborative verification can be realized on the basis of both, so as to realize a one-to-one matching constraint, which is beneficial to reducing the determination complexity and improving the determination accuracy.
  • In some embodiments, after determining that the matching degree to be analyzed is the maximum value, it can further be detected whether the matching degree to be analyzed is higher than a preset threshold; if so, the current first object and the current second object are determined to be the same object, and otherwise they can be considered not to be the same object.
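  • Putting the above decision rules together, the following sketch applies the one-to-one matching constraint with the threshold check (the function name and the example values are illustrative):

```python
import numpy as np

def same_object_pairs(final_match: np.ndarray, threshold: float = 0.5):
    """(i, j) are taken to be the same object when their final matching
    degree is the maximum of both row i (first reference information)
    and column j (second reference information), and additionally
    exceeds the preset threshold."""
    pairs = []
    for i in range(final_match.shape[0]):
        for j in range(final_match.shape[1]):
            v = final_match[i, j]
            if v == final_match[i].max() and v == final_match[:, j].max() and v > threshold:
                pairs.append((i, j))
    return pairs

final_match = np.array([[0.9, 0.2],
                        [0.3, 0.7]])
print(same_object_pairs(final_match))  # [(0, 0), (1, 1)]
```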
  • In some embodiments, when the requirements for tracking accuracy are relatively loose, analysis can be performed directly based on these feature representations to obtain the tracking information.
  • For each first object, the probability values that it and each second object are predicted to be the same object can be obtained based on the feature similarities between its first feature representation and the second feature representations of the second objects, and a second object that is the same object as the first object is obtained based on these probability values. In this way, the tracking information is analyzed and obtained directly based on the feature similarities between the first feature representations of the first objects and the second feature representations of the second objects, which is beneficial to reducing tracking complexity. Specifically, the feature similarities between the first feature representation and the second feature representations of the second objects can be normalized to obtain the probability values that the first object and each second object are predicted to be the same object.
  • For example, the first feature representation of the i-th first object can be denoted as M(i), and the second feature representation of the j-th second object can be denoted as N(j). The probability values that the i-th first object and each second object are predicted to be the same object can then be expressed as:

    p(i, j) = exp(M(i)^T N(j)) / Σ_{x∈t} exp(M(i)^T N(x))    (1)

where x∈t indicates that x ranges over the second objects in the second image t, and the superscript T indicates transposition.
  • Furthermore, each second object can be marked with a serial number value; for example, the first second object can be marked with the serial number value "1", the second second object with the serial number value "2", and so on. An expected value can then be obtained based on the serial number values of the second objects and the probability values corresponding to the second objects, and the value obtained after rounding the expected value is used as the target serial number value; the second object to which the target serial number value belongs is considered to be the same object as the first object. For the i-th first object, the target serial number value can be recorded as ĵ_{t−δ→t}(i) and expressed as:

    ĵ_{t−δ→t}(i) = Σ_j j · p(i, j)    (2)

Here, t−δ→t means that the first object in the first image t−δ is matched to a second object in the second image t. It should be noted that the rounding operation is not shown in formula (2); in the actual application process, since the expected value may be a decimal, the rounding operation can be applied directly to the expected value in order to determine the target serial number value.
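  • A sketch of formulas (1) and (2) as reconstructed above, assuming plain dot-product similarities and serial numbers starting from 1:

```python
import numpy as np

def match_by_expectation(feat_first: np.ndarray, feats_second: np.ndarray) -> int:
    """Normalize the feature similarities into probabilities (softmax,
    formula (1)), then round the expected serial number value
    (formula (2)) to pick the matching second object."""
    sims = feats_second @ feat_first                  # M(i)^T N(j) for each j
    probs = np.exp(sims - sims.max())
    probs /= probs.sum()                              # formula (1)
    serials = np.arange(1, len(feats_second) + 1)     # second objects numbered 1..N
    return int(round(float((serials * probs).sum()))) # formula (2) + rounding

rng = np.random.default_rng(0)
feats_second = rng.normal(size=(3, 8))
print(match_by_expectation(feats_second[1], feats_second))  # 2
```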
  • Please refer to FIG. 4A and FIG. 4B, which are two schematic diagrams of panoptic segmentation images: FIG. 4A represents the panoptic segmentation image corresponding to the first image in FIG. 2, and FIG. 4B represents the panoptic segmentation image corresponding to the second image in FIG. 2. In FIG. 4A and FIG. 4B, pixel areas corresponding to the same object in the two images can be represented with the same gray scale.
  • In the above solution, target segmentation is performed on the first image and the second image respectively to obtain the first mask image of the first object in the first image and the second mask image of the second object in the second image; object matching is performed on the first mask image and the second mask image in the feature dimension to obtain the first matching information, and object matching is performed in the spatial dimension based on the first mask image and the second mask image to obtain the second matching information; the first matching information and the second matching information are then fused to obtain the tracking information, which includes whether the first object and the second object are the same object. That is, in the process of target tracking, on the one hand, object matching between images in the feature dimension helps ensure the tracking effect for large-sized objects; on the other hand, object matching between images in the spatial dimension helps ensure the tracking effect for small-sized objects. Since the tracking information is obtained by fusing the matching information produced by the two matching methods, both large-sized objects and small-sized objects can be taken into consideration, which is beneficial to improving target tracking accuracy.
  • FIG. 5 is a schematic flow chart of object matching in the feature dimension, which may include the following steps:
  • Step S51: Extract the first feature representation of each first object based on the first mask image of each first object, and extract the second feature representation of each second object based on the second mask image of each second object.
  • In some embodiments, the object boundary can be determined based on the pixel values of the pixels in the mask image, where the object boundary is the boundary of the object to which the mask image belongs; a region image is then cut out from the mask image along the object boundary, and feature extraction is performed on the region image to obtain the feature representation of the object. When the mask image is a first mask image, the object is the first object and the feature representation is the first feature representation; when the mask image is a second mask image, the object is the second object and the feature representation is the second feature representation. This method eliminates, during the feature extraction process, the interference of pixels irrelevant to the object to which the mask image belongs, which is conducive to improving the accuracy of the feature representation. Since the pixels belonging to the object have a pixel value higher than a preset threshold (e.g., 0.5, 0.6, etc.), or have their pixel value directly set to the first value (e.g., 1), the pixels whose pixel value is higher than the preset threshold (or equal to the first value) can be taken as target pixels, and the rectangular box surrounding the target pixels serves as the object boundary.
  • Please refer to FIG. 6, which is a schematic diagram of the process of object matching in the feature dimension. The first mask images can collectively be expressed with size M*H*W, and the second mask images with size N*H*W, where H is the height of the mask images and W is their width. In some embodiments, in order to improve the efficiency of target tracking, a target tracking model can be pre-trained that includes a first matching network, and the first matching network can specifically include a first extraction sub-network used to extract the first feature representations and a second extraction sub-network used to extract the second feature representations. Both the first extraction sub-network and the second extraction sub-network can include several fully connected layers (FC); as shown in FIG. 6, each can include two fully connected layers (that is, 2*FC in FIG. 6), yielding 1024-dimensional first feature representations and 1024-dimensional second feature representations.
  • Step S52: Obtain the feature similarity between each first object and each second object by using the first feature representations and the second feature representations.
  • In some embodiments, the first feature representation of the first object may be multiplied by the second feature representation of the second object to obtain the feature similarity between the two; that is, the elements at corresponding positions of the two feature representations are multiplied and accumulated (an inner product) to obtain the feature similarity.
  • Step S53: Obtain the first matching information based on the feature similarity between each first object and each second object.
  • Still taking M first objects and N second objects as an example, the first matching information can finally be expressed as an M*N matrix, where the element in row i and column j represents the first matching degree between the i-th first object and the j-th second object.
  • In the above solution, the first feature representation of each first object is extracted based on the first mask image of each first object, and the second feature representation of each second object is extracted based on the second mask image of each second object.
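  • A minimal sketch of steps S52 and S53 with the inner-product similarity described above (the feature dimension of 1024 follows FIG. 6; the function name is an assumption):

```python
import torch

def feature_matching_matrix(first_feats: torch.Tensor,
                            second_feats: torch.Tensor) -> torch.Tensor:
    """The similarity of each (first, second) object pair is the inner
    product of their feature representations, collected into an M x N
    first-matching matrix."""
    # first_feats: (M, 1024), second_feats: (N, 1024) -> (M, N)
    return first_feats @ second_feats.T

first_feats = torch.rand(3, 1024)   # M = 3 first objects
second_feats = torch.rand(2, 1024)  # N = 2 second objects
print(feature_matching_matrix(first_feats, second_feats).shape)  # (3, 2)
```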
  • FIG. 7 is a schematic flow chart of object matching in the spatial dimension, which may include the following steps:
  • Step S71: Use the second image to perform optical flow prediction on the first image to obtain an optical flow image of the first image.
  • In some embodiments, the optical flow image can be a two-channel image, where one channel includes the offset value of each pixel in the first image in the horizontal direction and the other channel includes the offset value of each pixel in the first image in the vertical direction. That is, after a pixel in the first image is shifted according to its horizontal and vertical offset values, a pixel position is obtained, and the pixel located at that position in the second image is theoretically the same point. For example, if the topmost pixel of first object A is shifted according to its offset values, a pixel position is obtained, and the pixel found at that position in the second image is still the topmost pixel of first object A. Other situations can be deduced by analogy; no more examples are given here.
  • In some embodiments, in order to improve the efficiency of target tracking, a target tracking model can be pre-trained, and the target tracking model can include an optical flow prediction network; the optical flow prediction network can include, but is not limited to, RAFT (Recurrent All-Pairs Field Transforms for Optical Flow), and its network structure is not limited here. The first image and the second image can be input into the optical flow prediction network to obtain the optical flow image. It should be noted that for the working principle of the optical flow prediction network, reference may be made to the technical details of networks such as RAFT.
  • Step S72: Based on the optical flow image, shift the first mask image of the first object pixel by pixel to obtain a predicted mask image of the first object at the shooting moment of the second image.
  • In some embodiments, the optical flow image and the first mask image can be multiplied pixel by pixel to obtain the offset value of each pixel in the first mask image; the first pixel coordinate of the pixel in the first mask image is added to the offset value to obtain the second pixel coordinate of the pixel at the shooting moment of the second image (that is, the predicted pixel coordinate at that moment), and the predicted mask image is obtained based on the second pixel coordinates of the pixels in the first mask image. Specifically, the pixel value of each pixel in the first mask image is multiplied by the pixel value of the pixel at the corresponding position in the optical flow image to obtain the offset value of that pixel in the first mask image; for the meaning of "corresponding position", reference may be made to the relevant descriptions in the aforementioned disclosed embodiments.
  • Taking FIG. 8 as an example, each grid in the mask image represents one pixel. Suppose the pixel value of the gray-filled grids in the first mask image is 1 and the pixel value of the remaining grids is 0; the first mask image can then be expressed as a matrix containing 1 at the gray-filled positions and 0 elsewhere (the matrix itself is shown in the original figures). Suppose further that the pixel values of the pixels in the horizontal-channel optical flow image are all 0 and the pixel values of the pixels in the vertical-channel optical flow image are all 1; multiplying the first mask image by the optical flow image of each channel yields the offset value of each pixel in the first mask image in the horizontal direction and in the vertical direction respectively. Adding these offset values to the first pixel coordinates of the pixels in the first mask image gives the second pixel coordinates of the pixels at the shooting moment of the second image. For example, for the pixel at first pixel coordinate (1,1) in the first mask image, since its offset values in the horizontal and vertical directions are both 0, its second pixel coordinate is still (1,1); for the pixel at first pixel coordinate (1,2), since its offset value in the horizontal direction is 0 and its offset value in the vertical direction is 1, its second pixel coordinate at the shooting moment is (1,3). Other pixels can be deduced in the same way; no more examples are given here. On this basis, a predicted mask image such as the one shown in FIG. 8 can be obtained.
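  • A sketch of this pixel-by-pixel shift for a binary mask, assuming a two-channel flow array with horizontal offsets in the first channel and vertical offsets in the second (the nearest-pixel rounding is an illustrative choice):

```python
import numpy as np

def shift_mask_by_flow(mask: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Shift every pixel of a binary first mask image by the optical
    flow (multiplied pixel-wise by the mask, as described above) to
    build the predicted mask image at the second image's moment."""
    h, w = mask.shape
    predicted = np.zeros_like(mask)
    # flow[0] holds horizontal (column) offsets, flow[1] vertical (row) offsets.
    dx = (flow[0] * mask).round().astype(int)
    dy = (flow[1] * mask).round().astype(int)
    for i in range(h):
        for j in range(w):
            if mask[i, j] > 0:
                ni, nj = i + dy[i, j], j + dx[i, j]
                if 0 <= ni < h and 0 <= nj < w:
                    predicted[ni, nj] = 1
    return predicted

mask = np.array([[0, 1, 0], [0, 1, 0], [0, 0, 0]])
flow = np.stack([np.zeros((3, 3)), np.ones((3, 3))])  # shift down by one row
print(shift_mask_by_flow(mask, flow))
```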
  • Step S73: Obtain the second matching information based on the degree of overlap between the predicted mask image of each first object and the second mask image of each second object.
  • In some embodiments, the dice coefficient can be used to calculate the degree of overlap between the predicted mask image of the first object and the second mask image of the second object, and this degree of overlap is used as the second matching degree between the first object and the second object; after the second matching degree between every first object and every second object is obtained, the second matching information is obtained.
  • Specifically, the total number of pixels in the predicted mask image can be recorded as N (the second mask image then also contains N pixels), the pixel value of the i-th pixel in the predicted mask image can be recorded as p_i, and the pixel value of the i-th pixel in the second mask image can be recorded as g_i. The coincidence degree between the predicted mask image and the second mask image can then be expressed as:

    sim_pos = Σ_i p_i g_i / (Σ_i p_i + Σ_i g_i − Σ_i p_i g_i)    (6)

where sim_pos represents the coincidence degree. Taking the predicted mask image and the second mask image shown in FIG. 8 as an example, the coincidence degree between the two calculated by formula (6) is 3/8, which is the Intersection over Union (IoU) between the two mask images.
  • Similar to the first matching information, the second matching information may also be represented by a matrix; still taking M first objects and N second objects as an example, the second matching information can be represented by an M*N matrix, where the element in row i and column j represents the second matching degree between the i-th first object and the j-th second object.
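  • A sketch of the coincidence degree of formula (6) as reconstructed above, for binary masks:

```python
import numpy as np

def coincidence_degree(pred_mask: np.ndarray, second_mask: np.ndarray) -> float:
    """Overlap of two binary masks as intersection over union (IoU),
    matching the form of formula (6)."""
    inter = float((pred_mask * second_mask).sum())
    union = float(pred_mask.sum() + second_mask.sum() - inter)
    return inter / union if union > 0 else 0.0

pred_mask = np.array([[0, 1, 1], [0, 1, 1], [0, 1, 1]])
second_mask = np.array([[1, 1, 0], [1, 1, 0], [1, 1, 0]])
print(coincidence_degree(pred_mask, second_mask))  # 3 / 9 here
```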
  • In the above solution, the second image is used to perform optical flow prediction on the first image to obtain the optical flow image of the first image; based on the optical flow image, the first mask image of the first object is shifted pixel by pixel to obtain the predicted mask image of the first object at the shooting moment of the second image, and the second matching information is obtained based on the degree of overlap between the predicted mask image of each first object and the second mask image of each second object. On the one hand, object matching can thus be realized based on pixel-level matching, which is conducive to greatly improving the tracking effect, especially for small-sized objects; on the other hand, after the pixel-by-pixel offset, only the image coincidence needs to be measured to obtain the matching information, which also reduces the complexity of object matching between images in the spatial dimension and is conducive to improving tracking speed.
  • FIG. 9 is a schematic flowchart of a target tracking method provided by an embodiment of the present disclosure, which may include the following steps:
  • Step S91: Perform target segmentation on the first image and the second image respectively to obtain a first mask image of the first object in the first image and a second mask image of the second object in the second image.
  • Step S92: Perform object matching in the feature dimension based on the first mask image and the second mask image to obtain the first matching information, and perform object matching in the spatial dimension based on the first mask image and the second mask image to obtain the second matching information.
  • Step S93: Fuse the first matching information and the second matching information to obtain tracking information.
  • Here, the tracking information includes whether the first object and the second object are the same object; reference may be made to the relevant descriptions in the aforementioned disclosed embodiments.
  • Step S94: In response to the tracking information meeting a preset condition, use the tracking information as first tracking information and acquire a third image.
  • Here, the third image, the first image, and the second image are captured successively in that order; the third image can be recorded as t−δ, the first image as t, and the second image as t+δ.
  • In some embodiments, the preset condition may include: a target object exists in the second image, where the target object is not the same object as any first object. The target object may be a new object that appears in the second image, or it may be an object that was occluded in the first image and whose occlusion disappears in the second image, so that no match is obtained when it is matched against the first objects; further verification can therefore be performed through the following verification process. Through this timing consistency verification, the above method can greatly alleviate the impact of object disappearance and occlusion on tracking accuracy, which is conducive to improving tracking accuracy. Note that in this example, the subsequent verification is triggered only when a second object that is not successfully matched appears in the second image; as another example, in the actual application process, the preset condition may also be set to be empty, that is, no additional condition is set for triggering the verification, and the subsequent verification is triggered whenever tracking information is obtained.
  • Step S95: Perform target tracking based on the third image and the second image to obtain second tracking information.
  • the second tracking information includes whether the second object and the third object in the third image are the same object; for the target tracking process, reference may be made to any of the foregoing target tracking method embodiments.
  • Step S96: Perform a consistency check based on the first tracking information and the second tracking information to obtain a check result.
  • the same object in different images may have the same object identifier. In some embodiments, the target object may be analyzed based on the second tracking information to obtain an analysis result. If the analysis result includes that the target object and a reference object are the same object, the object identifier of the reference object is used as the object identifier of the target object, where the reference object is one of the third objects; that is, if there is an unmatched target object in the second image and a third object in the third image is successfully matched to it, that third object can be regarded as the reference object, and assigning its object identifier to the target object determines the two to be the same object. Conversely, if the analysis result includes that the target object is not the same object as any third object in the third image, a new object identifier can be marked for the target object; that is, if there is an unmatched target object in the second image and no third object in the third image matches it, the target object is treated as a newly appearing object.
  • Through timing consistency verification, the above method can deal with the complex situation in which an object disappears and then reappears due to occlusion, deformation, and other causes, verifying according to the actual situation, which helps improve the tracking effect in complex scenarios.
  • the above verification operation can be used to constrain the tracking consistency between multiple frames of images.
  • For example, a differentiable operation T_{s→t} can be used, where s and t represent time steps; this operation measures the similarity between an object p in the image x_s at time step s and the same object p in the image x_t at time step t.
  • Differentiable operations can thus be applied from image t−δ to image t, and from image t to image t+δ; from this, the timing consistency can be established as follows.
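  • The following is a hedged reconstruction of the constraint; the symbols $\mathcal{T}$, $\mathrm{sim}$, and $\circ$ are our own notation, chosen to match the surrounding definitions rather than taken from the filing:

```latex
% T measures cross-frame similarity of an object p; composing the hop
% t-delta -> t with the hop t -> t+delta should agree with the direct
% hop t-delta -> t+delta (our notation).
\mathcal{T}_{s \to t}(p) = \mathrm{sim}\bigl(p \in x_s,\; p \in x_t\bigr),
\qquad
\mathcal{T}_{t \to t+\delta} \circ \mathcal{T}_{t-\delta \to t}
  \approx \mathcal{T}_{t-\delta \to t+\delta}.
```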
  • FIG. 10 is a schematic diagram of an embodiment of a time consistency constraint.
  • In the example shown, the car in the dotted frame in the first image t is occluded by pedestrians and is mistakenly segmented as part of a pedestrian, resulting in the loss of its true segmentation. Therefore, when tracking is performed based on the third image t−δ and the first image t, or based on the first image t and the second image t+δ, object tracking of the car will fail.
  • this limitation can be solved by performing matching directly between the third image t−δ and the second image t+δ.
  • the matching information can be obtained by tracking between the second image t+δ and the third image t−δ, that is, the matching degree between each object in the second image t+δ and each object in the third image t−δ. On this basis, if the matching degree between the car in the second image t+δ and an object in the third image t−δ is higher than a preset threshold, the car in the second image t+δ and that object in the third image t−δ can be considered the same object, and the car in the second image t+δ is marked with the object identifier of that object in the third image t−δ; otherwise, a new object identifier can be marked for the car in the second image t+δ (see the sketch below).
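  • The following is a hedged sketch of this re-identification step, assuming `match_matrix` holds the matching degrees between t−δ objects (rows) and t+δ objects (columns), for example as computed by `second_matching_info` above; the function and parameter names are ours.

```python
def assign_identifier(col_idx, match_matrix, prev_ids, next_new_id, threshold=0.5):
    """Return (identifier for t+delta object col_idx, updated next_new_id)."""
    scores = match_matrix[:, col_idx]              # degrees against every t-delta object
    best = int(scores.argmax())
    if scores[best] > threshold:                   # the same object has re-appeared
        return prev_ids[best], next_new_id
    return next_new_id, next_new_id + 1            # otherwise mark a new identifier
```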
  • In the above scheme, the third image, the first image, and the second image are captured successively, and target tracking is performed based on the third image and the second image to obtain the second tracking information, which includes whether the second object and the third object in the third image are the same object; on this basis, the consistency check is performed based on the first tracking information and the second tracking information to obtain the check result, so timing inconsistencies in target tracking can be greatly reduced, which helps further improve tracking accuracy.
  • FIG. 11 is a schematic flowchart of a training method for a target tracking model in an embodiment of the present disclosure, which may include the following steps:
  • Step S111: Acquire a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information.
  • the sample tracking information includes whether the first sample object and the second sample object are actually the same object. For example, when the first sample object and the second sample object are actually the same object, this can be marked as a first value (for example, 1); otherwise, if they are not actually the same object, it may be marked as a second value (for example, 0).
  • For the meanings of the first sample mask image and the second sample mask image, reference may be made to the related descriptions of the first mask image and the second mask image in the aforementioned disclosed embodiments.
  • the target tracking model may include a target segmentation network, and its network structure may refer to the relevant descriptions in the aforementioned disclosed embodiments.
  • the target segmentation network can be used to perform target segmentation on the first sample image and the second sample image respectively to obtain the first sample mask image and the second sample mask image.
  • Before the overall training of the target tracking model, the target segmentation network can be trained to convergence.
  • For its implementation, reference may be made to the technical details of segmentation networks such as Mask R-CNN, PointRend, and Instance-sensitive FCN; an illustrative usage sketch follows.
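  • The following is an illustrative sketch of obtaining per-object mask images with an off-the-shelf Mask R-CNN from torchvision; the disclosure does not prescribe this library, so treat the model choice and score threshold as assumptions.

```python
import torch
import torchvision

# Off-the-shelf instance segmentation backbone (illustrative choice only).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

@torch.no_grad()
def segment(image: torch.Tensor, score_thresh: float = 0.5) -> torch.Tensor:
    """image: float tensor (3, H, W) in [0, 1]; returns one binary HxW mask per object."""
    out = model([image])[0]
    keep = out["scores"] > score_thresh            # drop low-confidence detections
    return out["masks"][keep, 0] > 0.5             # soft masks -> binary mask images
```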
  • Step S112: Perform object matching in the feature dimension on the first sample mask image and the second sample mask image based on the first matching network of the target tracking model to obtain first predicted matching information, and perform object matching in the spatial dimension on the first sample mask image and the second sample mask image based on the second matching network of the target tracking model to obtain second predicted matching information.
  • Before the overall training of the target tracking model, the first matching network may be trained to convergence; that is, the first matching network has completed training before the overall training of the target tracking model begins. It should be noted that, in this case, the aforementioned target segmentation network has already been trained before the first matching network is trained.
  • In some embodiments, feature extraction can be performed on the first sample mask image of each first sample object based on the first extraction sub-network of the first matching network to obtain the first sample feature representation of that object, and feature extraction can be performed on the second sample mask image of each second sample object based on the second extraction sub-network of the first matching network to obtain the second sample feature representation of that object. For each first sample object, based on the feature similarity between its first sample feature representation and each second sample feature representation, the predicted probability value that the first sample object and each second sample object are the same object is obtained; based on the expected value over these predicted probability values, the predicted matching object of the first sample object is determined, and the sub-loss corresponding to the first sample object is obtained based on the difference between the predicted matching object and the actual matching object of the first sample object. The predicted matching object is the second sample object predicted to be the same object as the first sample object, and the actual matching object is the second sample object that is actually the same object as the first sample object.
  • the feature similarity can be normalized to obtain a predicted probability value, and the normalization operation can be implemented through softmax. Further, the expected value can be obtained based on the serial number value of each second sample object and the predicted probability value corresponding to that second sample object, the value obtained by rounding up the expected value can be used as the target serial number value, and the second sample object to which the target serial number value belongs can be taken as the predicted matching object of the first sample object (a minimal sketch follows below).
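  • The following is a minimal sketch of this normalization-and-expectation step, assuming the feature similarities of one first sample object against all N second sample objects (serial numbers 1..N) are given as a 1-D tensor; the function name is ours.

```python
import torch

def target_serial_number(similarity: torch.Tensor) -> int:
    p = torch.softmax(similarity, dim=0)                 # predicted probability values
    serials = torch.arange(1, p.numel() + 1, dtype=p.dtype)
    expected = (serials * p).sum()                       # differentiable expected value
    return int(torch.ceil(expected))                     # round up: target serial number
```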
  • a loss function such as cross-entropy can be used to calculate the sub-loss.
  • In some embodiments, the sub-loss can be expressed as the binary cross-entropy ℓ = −[y·log p + (1 − y)·log(1 − p)], where y marks whether the predicted matching object is the same as the actual matching object of the first sample object: in the same case, y can be set to 1, and otherwise y can be set to 0; p represents the predicted probability value corresponding to the aforementioned predicted matching object.
  • the sub-losses corresponding to these M first sample objects can be averaged to obtain the total loss of the first matching network (a minimal sketch follows below).
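  • The following is a hedged sketch of that total loss, assuming the per-object predicted probabilities and labels are collected into tensors; the helper name and tensor layout are our assumptions.

```python
import torch
import torch.nn.functional as F

def first_matching_total_loss(p: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """p[i]: predicted probability for object i's predicted matching object;
    y[i]: 1.0 if it equals the actual matching object, else 0.0."""
    return F.binary_cross_entropy(p, y)            # mean over the M sub-losses
```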
  • optimization methods such as gradient descent can be used to adjust the network parameters of the first matching network.
  • the second matching network may include an optical flow prediction network configured to use the second sample image to perform optical flow prediction on the first sample image to obtain a sample optical flow image of the first sample image; the second predicted matching information is obtained based on the sample optical flow image, and reference may be made to the related descriptions of the optical flow image and the second matching information in the aforementioned disclosed embodiments.
  • Step S113: Use the information fusion network of the target tracking model to fuse the first predicted matching information and the second predicted matching information to obtain predicted tracking information.
  • the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object; for the process of information fusion, reference may be made to the relevant descriptions in the aforementioned disclosed embodiments.
  • Step S114: Adjust the network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.
  • loss functions such as cross-entropy can be used to process the difference between sample tracking information and predicted tracking information to obtain the total loss of the target tracking model, and then adjust the network parameters of the target tracking model based on optimization methods such as gradient descent.
  • In some embodiments, the above-mentioned target segmentation network, first matching network, and second matching network have all been trained to convergence, so during the process of adjusting the network parameters of the target tracking model, the network parameters of the target segmentation network, the first matching network, and the second matching network can be fixed, and only the network parameters of the information fusion network are adjusted.
  • Alternatively, the network parameters of all the networks can also be adjusted at the same time, which is not limited here; a sketch of the staged variant appears below.
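  • The following is an illustrative sketch of the staged variant: the converged segmentation and matching networks are frozen and only the information fusion network is updated; the module attribute names and the SGD learning rate are assumptions.

```python
import torch

def build_fusion_optimizer(tracker: torch.nn.Module) -> torch.optim.Optimizer:
    for net in (tracker.segmentation, tracker.first_matching, tracker.second_matching):
        for param in net.parameters():
            param.requires_grad = False            # keep converged weights fixed
    # Gradient descent over the information fusion network only.
    return torch.optim.SGD(tracker.fusion.parameters(), lr=1e-3)
```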
  • The above solution performs object matching between images in the feature dimension, which helps ensure the tracking effect for large-sized objects, and performs object matching between images in the spatial dimension, which helps ensure the tracking effect for small-sized objects; on this basis, the matching information obtained by the two matching methods is fused to obtain tracking information, so both large-sized and small-sized objects are taken into account, which helps improve the accuracy of the target tracking model.
  • FIG. 12 is a schematic frame diagram of the target tracking device 120 provided by an embodiment of the present disclosure.
  • the target tracking device 120 includes: a target segmentation part 121, an object matching part 122, and an information fusion part 123. The target segmentation part 121 is configured to perform target segmentation on the first image and the second image respectively to obtain a first mask image of the first object in the first image and a second mask image of the second object in the second image; the object matching part 122 is configured to perform object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and to perform object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information; the information fusion part 123 is configured to fuse the first matching information and the second matching information to obtain tracking information, where the tracking information includes whether the first object and the second object are the same object.
  • The above solution performs object matching between images in the feature dimension, which helps ensure the tracking effect for large-sized objects, and in the spatial dimension, which helps ensure the tracking effect for small-sized objects; the matching information obtained by the two matching methods is then fused to obtain tracking information, so both large-sized and small-sized objects are taken into account, which helps improve target tracking accuracy.
  • the object matching part 122 includes a feature extraction subsection configured to extract the first feature representation of each first object based on the first mask image of that first object, and to extract the second feature representation of each second object based on the second mask image of that second object; the object matching part 122 includes a similarity measure subsection configured to use the first feature representations and the second feature representations to obtain the feature similarity between each first object and each second object; and the object matching part 122 includes a first matching subsection configured to obtain the first matching information based on the feature similarities between each first object and each second object.
  • the feature extraction subsection includes a boundary determination part configured to determine the object boundary based on the pixel values of the pixels in the mask image, where the object boundary is the boundary of the object to which the mask image belongs; an image cropping part configured to cut out a region image from the mask image along the object boundary; and a representation extraction part configured to perform feature extraction based on the region image to obtain the feature representation of the object to which the mask image belongs. When the mask image is a first mask image, the object to which it belongs is a first object and the feature representation is a first feature representation; when the mask image is a second mask image, the object to which it belongs is a second object and the feature representation is a second feature representation (a minimal sketch follows).
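  • The following is a minimal sketch of the boundary determination, image cropping, and representation extraction parts, assuming the mask is a boolean HxW numpy array and `encoder` stands in for an arbitrary feature backbone (an assumption, not prescribed by the disclosure).

```python
import numpy as np

def feature_representation(mask: np.ndarray, encoder):
    ys, xs = np.nonzero(mask)                      # pixel values mark the object
    top, left = ys.min(), xs.min()                 # object boundary from the mask
    bottom, right = ys.max() + 1, xs.max() + 1
    region = mask[top:bottom, left:right]          # region image cut along the boundary
    return encoder(region)                         # first or second feature representation
```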
  • the object matching part 122 includes an optical flow prediction subsection configured to use the second image to perform optical flow prediction on the first image to obtain an optical flow image of the first image; a pixel offset subsection configured to shift the first mask image of each first object pixel by pixel based on the optical flow image to obtain a predicted mask image of that first object at the shooting moment of the second image; and a second matching subsection configured to obtain the second matching information based on the degree of overlap between the predicted mask image of each first object and the second mask image of each second object.
  • In this way, object matching is achieved based on pixel-level matching, which greatly improves the tracking effect, especially for small-sized objects; moreover, after the mask image is shifted pixel by pixel based on the optical flow, only the image overlap needs to be measured to obtain the matching information, which also reduces the complexity of object matching between images in the spatial dimension and helps improve the tracking speed.
  • the pixel offset subsection includes a pixel multiplication part configured to multiply the optical flow image and the first mask image pixel by pixel to obtain the offset values of the pixels in the first mask image; a pixel addition part configured to add the first pixel coordinates of the pixels in the first mask image to the offset values to obtain the second pixel coordinates of the pixels at the shooting moment; and an image acquisition part configured to obtain the predicted mask image based on the second pixel coordinates of the pixels in the first mask image.
  • the first matching information includes a first matching degree between the first object and the second object, and the second matching information includes a second matching degree between the first object and the second object.
  • the information fusion part 123 includes a weighting subsection configured to adaptively weight the first matching degree in the first matching information to obtain first weighted matching information, and to adaptively weight the second matching degree in the second matching information to obtain second weighted matching information, where the first weighted matching information includes a first weighted matching degree between the first object and the second object, and the second weighted matching information includes a second weighted matching degree between the first object and the second object; a fusion subsection configured to fuse the first weighted matching information and the second weighted matching information to obtain final matching information, where the final matching information includes a final matching degree between the first object and the second object; and an analysis subsection configured to perform analysis based on the final matching information to obtain the tracking information.
  • In this way, the importance of the two kinds of matching information can be adaptively measured according to the actual situation, and fusion is then performed on this basis, which helps greatly improve tracking accuracy.
  • In some embodiments, the tracking information is obtained by using a target tracking model to process the first image and the second image; the target tracking model includes an information fusion network, the information fusion network includes a first weighting sub-network and a second weighting sub-network, the first weighting sub-network is used to adaptively weight the first matching degree, and the second weighting sub-network is used to adaptively weight the second matching degree.
  • In this way, a neural network can be used to learn, according to the actual situation, the importance of the feature dimension and the spatial dimension to target tracking, which helps improve the efficiency and accuracy of the adaptive weighting; a hedged architecture sketch follows.
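  • The following is a hedged sketch of the information fusion network; the 1x1-convolution-plus-sigmoid weighting sub-networks are our assumption about one workable architecture, not the filing's.

```python
import torch
import torch.nn as nn

class InformationFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight1 = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Sigmoid())  # first weighting sub-network
        self.weight2 = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Sigmoid())  # second weighting sub-network

    def forward(self, m1: torch.Tensor, m2: torch.Tensor) -> torch.Tensor:
        m1 = m1[None, None]                        # (1, 1, M, N) first matching matrix
        m2 = m2[None, None]                        # (1, 1, M, N) second matching matrix
        fused = self.weight1(m1) * m1 + self.weight2(m2) * m2  # adaptive weighting, then fusion
        return fused[0, 0]                         # final matching information
```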
  • the analysis subsection includes a combination part configured to pair each first object with each second object, with each pair serving in turn as the current object group; and a determination part configured to determine, based on at least one of first reference information and second reference information of the current object group, whether the current first object and the current second object are the same object, where the current first object is the first object in the current object group, the current second object is the second object in the current object group, the first reference information includes the final matching degrees between the current first object and each of the second objects, and the second reference information includes the final matching degrees between the current second object and each of the first objects.
  • the analysis subsection further includes a selection part configured to use the final matching degree between the current first object and the current second object as the matching degree to be analyzed; the determination part is further configured to perform any of the following: in response to the matching degree to be analyzed being the maximum value in the first reference information, determine that the current first object and the current second object are the same object; in response to the matching degree to be analyzed being the maximum value in the second reference information, determine that the current first object and the current second object are the same object; or, in response to the matching degree to be analyzed being the maximum value of both the first reference information and the second reference information, determine that the current first object and the current second object are the same object.
  • On the one hand, the first two determination methods only need to search for the maximum value in the first reference information or in the second reference information to complete the determination, which helps reduce determination complexity and improve determination speed; on the other hand, the final determination method searches for the maximum value in the first reference information and the second reference information at the same time, realizing collaborative verification on the basis of both, which helps improve determination accuracy (a minimal sketch of this method follows below).
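  • The following is a minimal sketch of the final determination method: a pair (i, j) is accepted only when its final matching degree is the maximum of both row i and column j; the function name is ours.

```python
import numpy as np

def is_same_object(final: np.ndarray, i: int, j: int) -> bool:
    score = final[i, j]                            # the matching degree to be analysed
    return score == final[i, :].max() and score == final[:, j].max()
```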
  • the target tracking device 120 further includes a condition response part configured to, in response to the tracking information meeting a preset condition, use the tracking information as the first tracking information and acquire a third image, where the third image, the first image, and the second image are captured successively; a repeat tracking part configured to perform target tracking based on the third image and the second image to obtain second tracking information, where the second tracking information includes whether the second object and the third object in the third image are the same object; and an information checking part configured to perform a consistency check based on the first tracking information and the second tracking information to obtain a check result.
  • In this way, timing inconsistencies in target tracking can be greatly reduced, which helps further improve tracking accuracy.
  • the preset condition includes: a target object exists in the second image; wherein, the target object is not the same object as any first object.
  • Here, the preset condition is set such that a target object exists in the second image, where the target object is not the same object as any first object; through the timing consistency check, the impact of object disappearance, occlusion, and the like on tracking accuracy can be greatly reduced, which helps improve tracking accuracy.
  • In some embodiments, the same object in different images has the same object identifier; the information checking part includes an information analysis subsection configured to analyze the target object based on the second tracking information to obtain an analysis result; a first response subsection configured to, in response to the analysis result including that the target object and a reference object are the same object, use the object identifier of the reference object as the object identifier of the target object, where the reference object is one of the third objects; and a second response subsection configured to, in response to the analysis result including that the target object is not the same object as any third object in the third image, mark the target object with a new object identifier.
  • FIG. 13 is a schematic frame diagram of a training device 130 for a target tracking model provided by an embodiment of the present disclosure.
  • the training device 130 of the target tracking model includes: a sample acquisition part 131, a sample matching part 132, a sample fusion part 133, and a parameter adjustment part 134. The sample acquisition part 131 is configured to acquire a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information, where the sample tracking information includes whether the first sample object and the second sample object are actually the same object; the sample matching part 132 is configured to perform object matching in the feature dimension on the first sample mask image and the second sample mask image based on the first matching network of the target tracking model to obtain first predicted matching information, and to perform object matching in the spatial dimension on the first sample mask image and the second sample mask image based on the second matching network of the target tracking model to obtain second predicted matching information; the sample fusion part 133 is configured to use the information fusion network of the target tracking model to fuse the first predicted matching information and the second predicted matching information to obtain predicted tracking information, where the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object; and the parameter adjustment part 134 is configured to adjust the network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.
  • The above solution performs object matching between images in the feature dimension, which helps ensure the tracking effect for large-sized objects, and in the spatial dimension, which helps ensure the tracking effect for small-sized objects; on this basis, the matching information obtained by the two matching methods is fused to obtain tracking information, so both large-sized and small-sized objects are taken into account, which helps improve the accuracy of the target tracking model.
  • In some embodiments, the first matching network has completed training before the overall training of the target tracking model. The training device 130 further includes a sample feature extraction part configured to perform feature extraction on the first sample mask image of each first sample object based on the first extraction sub-network of the first matching network to obtain the first sample feature representation of that object, and to perform feature extraction on the second sample mask image of each second sample object based on the second extraction sub-network of the first matching network to obtain the second sample feature representation of that object. The training device 130 also includes a sub-loss calculation part configured to, for each first sample object, obtain, based on the feature similarity between the first sample feature representation of that object and each second sample feature representation, the predicted probability value that the first sample object and each second sample object are the same object, obtain the predicted matching object of the first sample object based on the expected value over these predicted probability values, and obtain the sub-loss corresponding to the first sample object based on the difference between the predicted matching object and the actual matching object.
  • In this way, the first matching network is trained before the overall training of the target tracking model, which helps improve training efficiency; moreover, constructing the sub-loss through the expectation-based prediction makes the matching differentiable, enabling the first matching network to learn feature representations during training.
  • the sub-loss calculation part includes a normalization subsection; or an expectation calculation subsection, a serial number determination subsection, and an object prediction subsection; or a normalization subsection, an expectation calculation subsection, a serial number determination subsection, and an object prediction subsection. The normalization subsection is configured to normalize the feature similarity to obtain the predicted probability values; the expectation calculation subsection is configured to obtain the expected value based on the serial number values of the second sample objects and their corresponding predicted probability values; the serial number determination subsection is configured to take the value obtained by rounding up the expected value as the target serial number value; and the object prediction subsection is configured to take the second sample object to which the target serial number value belongs as the predicted matching object of the first sample object.
  • In this way, the predicted matching object is determined through simple operations, which helps greatly reduce the complexity of determining the predicted matching object.
  • the target tracking model further includes a target segmentation network; or, the second matching network includes an optical flow prediction network; or, the target tracking model further includes a target segmentation network, and the second matching network includes an optical flow prediction network;
  • the first sample mask image and the second sample mask image are obtained by using the target segmentation network to perform target segmentation on the first sample image and the second sample image respectively, and the target segmentation network has been completed before training the first matching network Training;
  • the optical flow prediction network is used to perform optical flow prediction on the first sample image by using the second sample image to obtain a sample optical flow image of the first sample image, and the second predicted matching information is obtained based on the sample optical flow image.
  • In some embodiments, the target tracking model also includes a target segmentation network; the first sample mask image and the second sample mask image are obtained by using the target segmentation network to perform target segmentation on the first sample image and the second sample image respectively, and the target segmentation network has been trained before the first matching network is trained. By training the target segmentation network in stages in this way, the target tracking model can be trained step by step, which helps improve training efficiency and effect. Meanwhile, the second matching network includes an optical flow prediction network configured to perform optical flow prediction on the first sample image by using the second sample image to obtain a sample optical flow image of the first sample image, with the second predicted matching information obtained based on the sample optical flow image, which helps improve the accuracy and efficiency of optical flow prediction.
  • a "part" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a unit, a module or a non-modular one.
  • FIG. 14 is a schematic frame diagram of an electronic device 140 provided by an embodiment of the present disclosure.
  • the electronic device 140 includes a memory 141 and a processor 142 coupled to each other, and the processor 142 is configured to execute the program instructions stored in the memory 141, so as to implement the steps of any of the above target tracking method embodiments, or the steps of any of the above training method embodiments for the target tracking model.
  • the electronic device 140 may include, but is not limited to: a microcomputer and a server.
  • the electronic device 140 may also include mobile devices such as notebook computers and tablet computers, which are not limited here.
  • the processor 142 is configured to control itself and the memory 141 to implement the steps of any of the above object tracking method embodiments, or to implement the steps of any of the above object tracking model training method embodiments.
  • the processor 142 may also be called a CPU (Central Processing Unit, central processing unit).
  • the processor 142 may be an integrated circuit chip with signal processing capability.
  • the processor 142 can also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • In addition, the processor 142 may be jointly implemented by multiple integrated circuit chips.
  • The above solution performs object matching between images in the feature dimension, which helps ensure the tracking effect for large-sized objects, and in the spatial dimension, which helps ensure the tracking effect for small-sized objects; the matching information obtained by the two matching methods is then fused to obtain tracking information, so both large-sized and small-sized objects are taken into account, which helps improve target tracking accuracy.
  • FIG. 15 is a schematic frame diagram of a computer-readable storage medium 150 provided by an embodiment of the present disclosure.
  • the computer-readable storage medium 150 stores program instructions 151 executable by a processor, and the program instructions 151 are used to implement the steps of any of the above target tracking method embodiments, or the steps of any of the above training method embodiments for the target tracking model.
  • The above solution performs object matching between images in the feature dimension, which helps ensure the tracking effect for large-sized objects, and in the spatial dimension, which helps ensure the tracking effect for small-sized objects; the matching information obtained by the two matching methods is fused to obtain tracking information, so both large-sized and small-sized objects are taken into account, which helps improve target tracking accuracy.
  • An embodiment of the present disclosure also provides a computer program product; the computer program product includes a computer program or instructions, and when the computer program or instructions are run on an electronic device, the electronic device is caused to perform the target tracking method in any of the above embodiments, or the training method for the target tracking model in any of the above embodiments.
  • A computer program may be written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages) as a program, software, software module, script, or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • This disclosure relates to the field of augmented reality. By acquiring image information of a target object in the real environment and then using various vision-related algorithms to detect or identify the relevant features, states, and attributes of the target object, an AR effect combining the virtual and the real that matches the specific application can be obtained.
  • the target object may involve faces, limbs, gestures, actions, and the like related to the human body, or markers related to objects, or sand tables, display areas, or display items related to venues or places.
  • Vision-related algorithms can involve visual positioning, SLAM, 3D reconstruction, image registration, background segmentation, object key point extraction and tracking, object pose or depth detection, etc.
  • Specific applications can involve not only interactive scenes such as guided tours, navigation, explanation, reconstruction, virtual effect overlay, and display related to real scenes or objects, but also special effects processing related to people, such as makeup beautification, body beautification, special effect display, and interactive scenarios such as virtual model display.
  • the relevant features, states and attributes of the target object can be detected or identified through the convolutional neural network.
  • the above-mentioned convolutional neural network is a network model obtained by performing model training based on a deep learning framework.
  • the disclosed methods and devices may be implemented in other ways.
  • the device implementations described above are only illustrative; for example, the division of parts is only a logical function division, and there may be other division methods in actual implementation: units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separate, and a component shown as a unit may or may not be a physical unit; that is, it may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods in the various embodiments of the present disclosure.
  • the aforementioned storage medium may be a tangible device capable of holding and storing instructions used by the instruction execution device, and may be a volatile storage medium or a non-volatile storage medium.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punched cards or raised structures in grooves having instructions stored thereon, and any suitable combination of the above.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • Embodiments of the present disclosure provide a target tracking method and apparatus, a training method and apparatus for a related model, and a device, medium, and computer program product, wherein the target tracking method includes: performing target segmentation on a first image and a second image respectively to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image; performing object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and performing object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information; and fusing the first matching information and the second matching information to obtain tracking information, wherein the tracking information includes whether the first object and the second object are the same object.
  • the above solution can improve the target tracking accuracy.


Abstract

Disclosed in the embodiments of the present disclosure are a target tracking method and apparatus, a training method and apparatus for a model related thereto, and a device, a medium and a computer program product. The target tracking method comprises: respectively performing target segmentation on a first image and a second image, so as to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image; performing object matching in terms of a feature dimension on the basis of the first mask image and the second mask image, so as to obtain first matching information, and performing object matching in terms of a spatial dimension on the basis of the first mask image and the second mask image, so as to obtain second matching information; and fusing the first matching information and the second matching information, so as to obtain tracking information, wherein the tracking information comprises information regarding whether the first object and the second object are the same object.

Description

Target tracking and related model training method, device, equipment, medium, and computer program product
Cross-Reference to Related Applications
The embodiments of the present disclosure are based on the Chinese patent application with application number 202111424075.9, filed on November 26, 2021 and entitled "Target Tracking and Related Model Training Method and Related Devices, Equipment, and Media", and claim priority to that Chinese patent application, the entire content of which is hereby incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to, but is not limited to, the technical field of image processing, and in particular to a target tracking method and a training method for a related model, as well as a corresponding apparatus, device, medium, and computer program product.
Background
Target tracking technology is widely used in many application scenarios. Taking video panoptic segmentation (Video Panoptic Segmentation, VPS) as an example, it is required not only to generate frame-consistent panoptic segmentation, but also to track all pixels across frames, so as to improve the implementation of many technologies such as autonomous driving, video surveillance, and video editing.
At present, existing target tracking methods still face many problems in terms of tracking accuracy, such as tracking loss, which seriously affects the implementation of the above technologies, such as autonomous driving, video surveillance, and video editing, when target tracking is applied to them. In view of this, how to improve target tracking accuracy has become an urgent problem to be solved.
Summary
Embodiments of the present disclosure provide a target tracking method, a training method for a related model, and a corresponding apparatus, device, medium, and computer program product.
A first aspect of the embodiments of the present disclosure provides a target tracking method, including: performing target segmentation on a first image and a second image respectively to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image; performing object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and performing object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information; and fusing the first matching information and the second matching information to obtain tracking information, where the tracking information includes whether the first object and the second object are the same object.
In the above solution, target segmentation is performed on the first image and the second image respectively to obtain the first mask image of the first object in the first image and the second mask image of the second object in the second image; object matching is performed in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and object matching is performed in the spatial dimension based on the first mask image and the second mask image to obtain second matching information; on this basis, the first matching information and the second matching information are fused to obtain tracking information, where the tracking information includes whether the first object and the second object are the same object. That is, in the target tracking process, performing object matching between images in the feature dimension helps ensure the tracking effect for large-sized objects, while performing object matching between images in the spatial dimension helps ensure the tracking effect for small-sized objects; on this basis, fusing the matching information obtained by the two matching methods to obtain the tracking information takes both large-sized and small-sized objects into account, which helps improve target tracking accuracy.
A second aspect of the present disclosure provides a training method for a target tracking model, including: acquiring a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information, where the sample tracking information includes whether the first sample object and the second sample object are actually the same object; performing object matching in the feature dimension on the first sample mask image and the second sample mask image based on a first matching network of the target tracking model to obtain first predicted matching information, and performing object matching in the spatial dimension on the first sample mask image and the second sample mask image based on a second matching network of the target tracking model to obtain second predicted matching information; fusing the first predicted matching information and the second predicted matching information by using an information fusion network of the target tracking model to obtain predicted tracking information, where the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object; and adjusting network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.
In the above solution, performing object matching between images in the feature dimension helps ensure the tracking effect for large-sized objects, while performing object matching between images in the spatial dimension helps ensure the tracking effect for small-sized objects; on this basis, the matching information obtained by the two matching methods is fused to obtain the tracking information, so both large-sized and small-sized objects are taken into account, which helps improve the accuracy of the target tracking model.
A third aspect of the embodiments of the present disclosure provides a target tracking device, including a target segmentation part, an object matching part, and an information fusion part. The target segmentation part is configured to perform target segmentation on a first image and a second image respectively to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image; the object matching part is configured to perform object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and to perform object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information; the information fusion part is configured to fuse the first matching information and the second matching information to obtain tracking information, where the tracking information includes whether the first object and the second object are the same object.
A fourth aspect of the embodiments of the present disclosure provides a training device for a target tracking model, including a sample acquisition part, a sample matching part, a sample fusion part, and a parameter adjustment part. The sample acquisition part is configured to acquire a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information, where the sample tracking information includes whether the first sample object and the second sample object are actually the same object; the sample matching part is configured to perform object matching in the feature dimension on the first sample mask image and the second sample mask image based on a first matching network of the target tracking model to obtain first predicted matching information, and to perform object matching in the spatial dimension on the first sample mask image and the second sample mask image based on a second matching network of the target tracking model to obtain second predicted matching information; the sample fusion part is configured to use an information fusion network of the target tracking model to fuse the first predicted matching information and the second predicted matching information to obtain predicted tracking information, where the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object; the parameter adjustment part is configured to adjust network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.
A fifth aspect of the embodiments of the present disclosure provides an electronic device, including a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory to implement the target tracking method in the above first aspect, or the training method for the target tracking model in the above second aspect.
A sixth aspect of the embodiments of the present disclosure provides a computer-readable storage medium on which program instructions are stored, where the program instructions, when executed by a processor, implement the target tracking method in the above first aspect, or the training method for the target tracking model in the above second aspect.
A seventh aspect of the embodiments of the present disclosure provides a computer program product, the computer program product including a computer program or instructions, where the computer program or instructions, when run on an electronic device, cause the electronic device to perform the target tracking method in the above first aspect, or the training method for the target tracking model in the above second aspect.
Description of Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments of the present disclosure are described below.
FIG. 1 is a schematic flowchart of a target tracking method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic framework diagram of a target tracking model provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an information fusion process provided by an embodiment of the present disclosure;
FIG. 4A is a schematic diagram of a panoptic segmentation image provided by an embodiment of the present disclosure;
FIG. 4B is another schematic diagram of a panoptic segmentation image provided by an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of object matching in the feature dimension provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a process of object matching in the feature dimension provided by an embodiment of the present disclosure;
FIG. 7 is a schematic flowchart of object matching in the spatial dimension provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a process of object matching in the spatial dimension provided by an embodiment of the present disclosure;
FIG. 9 is a schematic flowchart of a target tracking method provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a time consistency constraint provided by an embodiment of the present disclosure;
FIG. 11 is a schematic flowchart of a training method for a target tracking model provided by an embodiment of the present disclosure;
FIG. 12 is a schematic frame diagram of a target tracking device provided by an embodiment of the present disclosure;
FIG. 13 is a schematic frame diagram of a training device for a target tracking model provided by an embodiment of the present disclosure;
FIG. 14 is a schematic frame diagram of an electronic device provided by an embodiment of the present disclosure;
FIG. 15 is a schematic frame diagram of a computer-readable storage medium provided by an embodiment of the present disclosure.
The accompanying drawings herein are incorporated into and constitute a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
Detailed Description
The solutions of the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
In the following description, specific details such as particular system structures, interfaces, and techniques are set forth for the purpose of illustration rather than limitation, in order to provide a thorough understanding of the present disclosure.
The terms "system" and "network" are used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects. Furthermore, "multiple" herein means two or more.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a target tracking method provided by an embodiment of the present disclosure. Specifically, the method may include the following steps:
Step S11: Perform target segmentation on a first image and a second image respectively to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image.
In an implementation scenario, the first image and the second image may be two consecutive frames in captured video data; alternatively, the first image and the second image may be separated by several frames in the video data, which is not limited here. It should be noted that the first image may be captured before the second image. For ease of description, the first image may be denoted as t-δ and the second image as t, where δ is 1 when the first image and the second image are two adjacent frames, δ is 2 when they are separated by one frame, and so on; examples are not enumerated here.
In an implementation scenario, in practical applications the first image and the second image may be captured by an electronic device with an integrated camera, such as a smartphone or an autonomous driving device, and the number of frames separating the first image and the second image may be determined based on the camera's frame rate and moving speed. Illustratively, the faster the moving speed, the greater the change between adjacent images, so the fewer frames the interval may span; conversely, the slower the moving speed, the smaller the change between adjacent images, so the more frames the interval may span. Alternatively, the higher the frame rate, the smaller the change between adjacent images, so the more frames the interval may span; conversely, the lower the frame rate, the greater the change between adjacent images, so the fewer frames the interval may span.
In an implementation scenario, the first image is not limited to containing one first object; for example, it may contain one, two, three, or more first objects, which is not limited here. Similarly, the second image is not limited to containing one second object; for example, it may contain one, two, three, or more second objects, which is not limited here. The above objects may include, but are not limited to, pedestrians, vehicles, street signs, and the like. It should be noted that, in the embodiments of the present disclosure, multiple objects of the same category are not counted as the same object; that is, even if multiple objects belong to the same category, they are counted as multiple objects. Illustratively, an image may contain two pedestrians, denoted pedestrian A and pedestrian B, which are counted as two objects; or an image may contain three vehicles, such as vehicle A, vehicle B, and vehicle C, which are counted as three objects, and so on; examples are not enumerated here.
In an implementation scenario, the first object and the second object are foreground objects in the first image and the second image respectively, such as the aforementioned pedestrians, vehicles, and street signs. In addition, the images may also contain background objects, including but not limited to roads, sky, and buildings. In order to realize video panoptic segmentation, after performing target segmentation on the first image and the second image respectively, a mask image of a first background object in the first image and a mask image of a second background object in the second image may also be obtained, so that each foreground object and background object can subsequently be marked on the images by combining the mask images and the tracking information. For example, pixel regions belonging to the same object (e.g., the same foreground object or the same background object) in different images may be marked with the same color. Illustratively, the pixel region of pedestrian A may be marked red in the first image, and the pixel region of pedestrian A may also be marked red in the second image. Other cases can be deduced by analogy; examples are not enumerated here.
In an implementation scenario, each first mask image has the same size as the first image, and similarly, each second mask image has the same size as the second image. Further, for the first mask image of each first object, the pixel value of each of its pixels indicates the possibility that the correspondingly located pixel in the first image belongs to that first object; illustratively, the greater the possibility, the larger the pixel value, and conversely, the smaller the possibility, the smaller the pixel value. Similarly, for the second mask image of each second object, the pixel value of each of its pixels indicates the possibility that the correspondingly located pixel in the second image belongs to that second object; illustratively, the greater the possibility, the larger the pixel value, and conversely, the smaller the possibility, the smaller the pixel value.
In an implementation scenario, "correspondingly located" may specifically mean having the same pixel coordinates. For example, the pixel at pixel coordinates (i, j) in the first mask image corresponds to the pixel at pixel coordinates (i, j) in the first image; or the pixel at pixel coordinates (m, n) in the second mask image corresponds to the pixel at pixel coordinates (m, n) in the second image.
In an implementation scenario, for the first mask image of each first object, when the pixel value of one of its pixels is higher than a preset threshold, the correspondingly located pixel in the first image may be regarded as belonging to that first object. Similarly, for the second mask image of each second object, when the pixel value of one of its pixels is higher than the preset threshold, the correspondingly located pixel in the second image may be regarded as belonging to that second object. It should be noted that the preset threshold may be set according to the actual situation; for example, when pixel values have been normalized to the range 0 to 1, the preset threshold may be set to 0.5, 0.6, etc., which is not limited here.
In an implementation scenario, as mentioned above, when a pixel value is higher than the preset threshold, the pixel may be regarded as belonging to the object, and on this basis the pixel value may further be reset to a first value (e.g., 1); conversely, when the pixel value is not higher than the preset threshold, the pixel may be regarded as not belonging to the object, and on this basis the pixel value may further be reset to a second value (e.g., 0). Illustratively, for the first mask image of each first object, it may be checked whether the pixel value of each of its pixels is higher than the preset threshold; if so, the pixel value may be reset to the first value, otherwise to the second value, so as to update the first mask image of each first object. Similarly, for the second mask image of each second object, it may be checked whether the pixel value of each of its pixels is higher than the preset threshold; if so, the pixel value may be reset to the first value, otherwise to the second value, so as to update the second mask image of each second object.
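As a concrete illustration of the thresholding described above, the following is a minimal sketch in Python/NumPy; the function name and the default threshold of 0.5 are assumptions for illustration, not part of the disclosure.

```python
import numpy as np

def binarize_mask(mask, threshold=0.5, first_value=1, second_value=0):
    """Reset mask pixels to the first value where they exceed the preset
    threshold (pixel regarded as belonging to the object), otherwise to
    the second value."""
    return np.where(mask > threshold, first_value, second_value).astype(np.uint8)
```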
In an implementation scenario, in order to improve target segmentation efficiency, a target tracking model may be trained in advance; please refer to FIG. 2, which is a schematic framework diagram of the target tracking model. As shown in FIG. 2, the target tracking model may include a target segmentation network, and the first image and the second image may be respectively input into the target segmentation network to obtain the first mask image of each first object and the second mask image of each second object. Specifically, several sample images may be collected in advance and the sample mask image of each sample object in the sample images obtained; the target segmentation network is then used to perform target segmentation on the sample images to obtain the predicted mask image of each sample object, so that the network parameters of the target segmentation network can be adjusted based on the difference between the sample mask image and the predicted mask image of the same object.
In an implementation scenario, illustratively, loss functions such as dice segmentation loss and position loss may be used to measure the difference between the sample mask image and the predicted mask image belonging to the same object to obtain the loss value of the target segmentation network, and optimization methods such as gradient descent may be used to adjust the network parameters of the target segmentation network. For the specific measurement of the difference, reference may be made to the technical details of loss functions such as dice segmentation loss and position loss; for the specific parameter adjustment, reference may be made to the technical details of optimization methods such as gradient descent.
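For reference, a common formulation of the dice segmentation loss mentioned above is sketched below in PyTorch; this is a generic implementation under the usual definition of the dice loss, and the smoothing constant eps is an assumption, since the disclosure does not specify the exact form.

```python
import torch

def dice_segmentation_loss(pred, target, eps=1.0):
    # pred: (B, H, W) predicted mask probabilities; target: (B, H, W) binary sample masks
    pred = pred.flatten(1)
    target = target.flatten(1)
    intersection = (pred * target).sum(dim=1)
    union = pred.sum(dim=1) + target.sum(dim=1)
    # dice coefficient in [0, 1]; the loss shrinks as predicted and sample masks overlap
    dice = (2 * intersection + eps) / (union + eps)
    return (1 - dice).mean()
```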
In an implementation scenario, in order to obtain the mask images of foreground objects such as the first object and the second object, the target segmentation network may include, but is not limited to, instance segmentation networks such as Mask R-CNN, PointRend, and Instance-sensitive FCN; the network structure of the target segmentation network is not limited here.
In an implementation scenario, in order to simultaneously obtain the mask images of foreground objects such as the first object and the second object and the mask images of background objects such as the aforementioned roads, sky, and buildings, the target segmentation network may include, but is not limited to, panoptic segmentation networks such as PanopticFCN; the network structure of the target segmentation network is not limited here.
Step S12: Perform object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and perform object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information.
In an implementation scenario, for object matching in the feature dimension, a first feature representation of each first object may be extracted based on the first mask image of that first object, and a second feature representation of each second object may be extracted based on the second mask image of that second object. On this basis, the first feature representations and the second feature representations are used to obtain the feature similarity between each first object and each second object, and the first matching information is obtained based on these feature similarities. For the processes of feature extraction and feature matching, reference may be made to the related disclosed embodiments below. In the above manner, it is only necessary to perform feature extraction on the mask image of each object and then measure the feature similarity, which can reduce the complexity of object matching between images in the feature dimension and help improve the tracking speed.
In an implementation scenario, different from the above staged execution of feature extraction and feature matching, in order to improve the efficiency of object matching in the feature dimension, as an optional implementation in practical applications, a target tracking model may also be trained in advance, where the target tracking model includes a first matching network. Specifically, the first matching network may include several feature extraction layers (e.g., convolutional layers, fully connected layers) and a multi-layer perceptron; the first mask image of each first object and the second mask image of each second object can be input into the first matching network for processing after being preprocessed. For the preprocessing, reference may be made to the related description in the disclosed embodiments below. For ease of description, the first objects and the second objects may be collectively referred to as N objects, and the first mask images of the first objects and the second mask images of the second objects may be collectively referred to as N mask images. In this process, after the N mask images are processed by the several feature extraction layers, N feature representations can be obtained, which are further processed by the multi-layer perceptron to output an N*N matrix, where each row of the matrix represents one of the N objects, each column of the matrix represents one of the N objects, and the element in the i-th row and j-th column represents the matching degree between the i-th object and the j-th object among the N objects; the matching degrees between each first object and each second object can then be extracted from the matrix to obtain the first matching information. Of course, in practical applications, in order to make the model as lightweight as possible for ease of training and deployment, the above staged execution of feature extraction and feature matching may be chosen, and to improve efficiency, the extraction of the first feature representations and the second feature representations may be performed by the first matching network; in this case the first matching network may include only a small number of network layers such as convolutional layers and fully connected layers, which can greatly reduce the number of parameters. For details, reference may be made to the related disclosed embodiments below, which are not repeated here.
In an implementation scenario, for object matching in the spatial dimension, optical flow prediction may be performed on the first image using the second image to obtain an optical flow image of the first image; based on the optical flow image, the first mask image of the first object is shifted pixel by pixel to obtain a predicted mask image of the first object at the shooting moment of the second image; and the second matching information is obtained based on the degree of coincidence between the predicted mask image of each first object and the second mask image of each second object. For the specific processes of optical flow prediction, pixel shifting, and coincidence measurement, reference may be made to the related description in the disclosed embodiments below. In the above manner, on the one hand, object matching can be realized based on pixel-level matching, which helps greatly improve the tracking effect, especially for small-sized objects; on the other hand, after the pixel-by-pixel shift based on the optical flow image, only the degree of image coincidence needs to be measured to obtain the matching information, which also reduces the complexity of object matching between images in the spatial dimension and helps improve the tracking speed.
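A minimal sketch of the coincidence measurement just described might look as follows; binary masks and an intersection-over-union measure are assumptions here, since the disclosure only speaks of a "degree of coincidence" without fixing its exact form.

```python
import numpy as np

def coincidence_matrix(pred_masks, second_masks):
    # pred_masks: (M, H, W) predicted masks of first objects at the second image's time
    # second_masks: (N, H, W) second mask images; all masks binary (0/1)
    M, N = len(pred_masks), len(second_masks)
    iou = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            inter = np.logical_and(pred_masks[i], second_masks[j]).sum()
            union = np.logical_or(pred_masks[i], second_masks[j]).sum()
            iou[i, j] = inter / union if union > 0 else 0.0
    return iou  # second matching information: coincidence of each first/second pair
```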
In an implementation scenario, different from the above manner, as an optional implementation in practical applications, a first optimal displacement vector between the first mask image of each first object and the second mask image of each second object may first be obtained. It should be noted that the first mask image, after being shifted pixel by pixel by this first optimal displacement vector, has the maximum degree of coincidence with the second mask image; the first optimal displacement vector between the first mask image of each first object and the second mask image of each second object, together with the corresponding maximum coincidence, is recorded. Meanwhile, a second optimal displacement vector between the first image and the second image may be obtained; similarly, the first image, after being shifted pixel by pixel by this second optimal displacement vector, has the maximum degree of coincidence with the second image. On this basis, the vector similarity between each first optimal displacement vector and the second optimal displacement vector can be measured. It should be noted that the closer the first optimal displacement vector is to the second optimal displacement vector, the greater the vector similarity; conversely, the farther apart they are, the smaller the vector similarity. Based on this, for each first object and each second object, the corresponding vector similarity and maximum coincidence may be weighted to obtain the matching degree between the two, thereby obtaining the second matching information.
Step S13: Fuse the first matching information and the second matching information to obtain tracking information.
In an implementation scenario, as mentioned above, the first matching information may include the matching degree between each first object and each second object, which may be called the first matching degree for ease of distinction; similarly, the second matching information may include the matching degree between each first object and each second object, which may be called the second matching degree. On this basis, a first preset weight and a second preset weight may be used to weight the first matching degrees in the first matching information and the second matching degrees in the second matching information respectively, to obtain first weighted matching information and second weighted matching information, where the first weighted matching information includes a first weighted matching degree between the first object and the second object, and the second weighted matching information includes a second weighted matching degree between the first object and the second object. Based on this, the first weighted matching information and the second weighted matching information may be fused to obtain the final matching information, which includes the final matching degree between the first object and the second object. That is to say, in the fusion process, preset weights may be used directly to perform weighted fusion of the matching degrees.
In an implementation scenario, in order to improve fusion accuracy, different from the aforementioned manner, adaptive weighting may be applied to the first matching degrees in the first matching information to obtain the first weighted matching information, and adaptive weighting may be applied to the second matching degrees in the second matching information to obtain the second weighted matching information; on this basis, the first weighted matching information and the second weighted matching information are fused to obtain the final matching information, which is then analyzed to obtain the tracking information. In the above manner, during the fusion of matching information, adaptively weighting the first matching information and the second matching information makes it possible to adaptively measure the importance of each according to the actual situation before fusion, which helps greatly improve the tracking accuracy.
In an implementation scenario, as mentioned above, in order to improve the efficiency of target tracking, a target tracking model may be trained in advance to process the first image and the second image and obtain the tracking information, and the target tracking model may include an information fusion network. Please refer to FIG. 3, which is a schematic diagram of the information fusion process. As shown in FIG. 3, the information fusion network may further include a first weighting sub-network and a second weighting sub-network, where the first weighting sub-network is used to adaptively weight the first matching information and the second weighting sub-network is used to adaptively weight the second matching information. Specifically, to make the target tracking model as lightweight as possible, the first weighting sub-network may include, but is not limited to, a 1*1 convolutional layer, and the second weighting sub-network may include, but is not limited to, a 1*1 convolutional layer. In the above manner, the neural network can learn, according to the actual situation, the respective importance of the feature dimension and the spatial dimension for target tracking, which helps improve the efficiency and accuracy of adaptive weighting.
In an implementation scenario, as shown in FIG. 2 and FIG. 3, both the first matching information and the second matching information may be represented by matrices. Taking the case where there are M first objects in the first image and N second objects in the second image as an example, both the first matching information and the second matching information may be represented by an M*N matrix; for the first matching information, the element in the i-th row and j-th column of the matrix represents the first matching degree between the i-th first object and the j-th second object, while for the second matching information, the element in the i-th row and j-th column represents the second matching degree between the i-th first object and the j-th second object. On this basis, the first weighted matching information obtained after adaptively weighting the first matching information may also be represented by an M*N matrix, and the second weighted matching information obtained after adaptively weighting the second matching information may also be represented by an M*N matrix; for the meaning of each element in these matrices, reference may be made to the foregoing description.
In an implementation scenario, in the process of fusing the first weighted matching information and the second weighted matching information, the element in the i-th row and j-th column of the matrix representing the first weighted matching information may be added directly to the element in the i-th row and j-th column of the matrix representing the second weighted matching information to obtain the matrix representing the final matching information. That is to say, for each pair of a first object and a second object, the first weighted matching degree and the second weighted matching degree may be added directly to obtain the final matching degree. Illustratively, the first image may contain two first objects, denoted first object a and first object b, and the second image may contain two second objects, denoted second object A and second object B; the final matching information may then be represented by a 2*2 matrix, where the first row represents the final matching degrees between first object a and second object A and second object B respectively, the second row represents the final matching degrees between first object b and second object A and second object B respectively, the first column represents the final matching degrees between second object A and first object a and first object b respectively, and the second column represents the final matching degrees between second object B and first object a and first object b respectively.
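The adaptive weighting and element-wise fusion described above could be sketched as follows in PyTorch; treating each M*N matching matrix as a one-channel image passed through a 1*1 convolution is one plausible reading of FIG. 3, and the module and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class InformationFusion(nn.Module):
    def __init__(self):
        super().__init__()
        # first/second weighting sub-networks: 1*1 convolutions over the matching matrices
        self.weight_first = nn.Conv2d(1, 1, kernel_size=1)
        self.weight_second = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, first_match, second_match):
        # first_match, second_match: (M, N) first/second matching-degree matrices
        w1 = self.weight_first(first_match.unsqueeze(0).unsqueeze(0))
        w2 = self.weight_second(second_match.unsqueeze(0).unsqueeze(0))
        # element-wise addition of the two weighted matrices gives the final matching info
        return (w1 + w2).squeeze(0).squeeze(0)
```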
In an implementation scenario, it should be noted that the tracking information may specifically include whether the first object and the second object are the same object. On this basis, each pairwise combination of a first object and a second object may be taken as the current object group, and whether the current first object and the current second object are the same object is determined based on at least one of first reference information and second reference information of the current object group, where the current first object is the first object in the current object group and the current second object is the second object in the current object group; the first reference information includes the final matching degrees between the current first object and each second object, and the second reference information includes the final matching degrees between the current second object and each first object. As mentioned above, the final matching degrees may also be represented by a matrix; the first reference information may then include all elements of the matrix row representing the current first object, and similarly, the second reference information may include all elements of the matrix column representing the current second object. The above manner can avoid omissions as much as possible, which helps improve tracking accuracy; moreover, combining at least one of the first reference information and the second reference information in the determination process also helps improve the determination accuracy.
Here, when only the first reference information is combined, the final matching degree between the current first object and the current second object may be taken as the matching degree to be analyzed, and in response to the matching degree to be analyzed being the maximum value in the first reference information, the current first object and the current second object are determined to be the same object. Taking the aforementioned final matching information represented by a 2*2 matrix as an example, when the current first object is first object a and the current second object is second object A, if the element in the first row and first column of the matrix is the maximum value in the first row, it can be determined that first object a and second object A are the same object. Other cases can be deduced by analogy; examples are not enumerated here. In the above manner, the determination can be completed simply by searching for the maximum value in the first reference information, which helps reduce the determination complexity and increase the determination speed.
Here, when only the second reference information is combined, the final matching degree between the current first object and the current second object may be taken as the matching degree to be analyzed, and in response to the matching degree to be analyzed being the maximum value in the second reference information, the current first object and the current second object are determined to be the same object. Taking the aforementioned final matching information represented by a 2*2 matrix as an example, when the current first object is first object a and the current second object is second object A, if the element in the first row and first column of the matrix is the maximum value in the first column, it can be determined that first object a and second object A are the same object. Other cases can be deduced by analogy; examples are not enumerated here. In the above manner, the determination can be completed simply by searching for the maximum value in the second reference information, which helps reduce the determination complexity and increase the determination speed.
Here, when the first reference information and the second reference information are combined simultaneously, the final matching degree between the current first object and the current second object may be taken as the matching degree to be analyzed, and in response to the matching degree to be analyzed being the maximum value in both the first reference information and the second reference information, the current first object and the current second object are determined to be the same object. Taking the aforementioned final matching information represented by a 2*2 matrix as an example, when the current first object is first object a and the current second object is second object A, if the element in the first row and first column of the matrix is both the maximum value in the first row and the maximum value in the first column, it can be determined that first object a and second object A are the same object. Other cases can be deduced by analogy; examples are not enumerated here. In the above manner, the determination is completed by simultaneously searching for the maximum values in the first reference information and the second reference information, which enables cooperative verification on the basis of both and enforces a one-to-one matching constraint between objects, helping reduce the determination complexity and improve the determination accuracy.
In addition, it should be noted that, in order to further improve the accuracy and robustness of target tracking, in the above process, if the matching degree to be analyzed is determined to be the maximum value, it may further be checked whether the matching degree to be analyzed is higher than a preset threshold; if so, the current first object and the current second object can be determined to be the same object, otherwise the current first object and the current second object may be regarded as not being the same object.
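Putting the row/column maximum check and the threshold check together, a sketch of the decision rule might read as follows; the mutual-maximum variant and a threshold of 0.5 are chosen here purely for illustration.

```python
import numpy as np

def decide_same_objects(final_match, threshold=0.5):
    # final_match: (M, N) final matching degrees; returns accepted (i, j) pairs
    pairs = []
    for i in range(final_match.shape[0]):
        j = int(np.argmax(final_match[i]))            # maximum of the first reference info
        if int(np.argmax(final_match[:, j])) == i \
                and final_match[i, j] > threshold:    # maximum of the second reference info, above threshold
            pairs.append((i, j))                      # one-to-one match: same object
    return pairs
```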
In an implementation scenario, when the requirement on tracking accuracy is relatively loose, it is also possible, during object matching in the feature dimension, to analyze the extracted first feature representations of the first objects and second feature representations of the second objects directly to obtain the tracking information. Specifically, for each first object, the probability values that it and each second object are predicted to be the same object may be obtained based on the feature similarities between its first feature representation and the second feature representations of the second objects, and the second object that is the same object as that first object is obtained based on these probability values. In the above manner, the tracking information is obtained by analysis directly based on the feature similarity between the first feature representation of the first object and the second feature representation of the second object, which helps reduce tracking complexity.
In an implementation scenario, the feature similarities between the first feature representation and the second feature representations of the second objects may be normalized to obtain the probability values that the first object and each second object are predicted to be the same object. Still taking the case where the first image contains M first objects and the second image contains N second objects as an example, when performing object matching for the i-th first object among the M first objects, its first feature representation may be denoted as M(i), and correspondingly the second feature representation of the j-th second object may be denoted as N(j). Taking the normalization realized by softmax as an example, the probability that the i-th first object and the j-th second object are predicted to be the same object can be expressed as:

$$P_{i \to j} = \frac{\exp\big(M(i)^{T} N(j)\big)}{\sum_{x \in t} \exp\big(M(i)^{T} N(x)\big)} \tag{1}$$

In the above formula (1), x ∈ t denotes each second object in the second image, and the superscript T denotes transposition.
In an implementation scenario, each second object is marked with a serial number value; for example, the first of the second objects may be marked with the serial number value "1", the second with the serial number value "2", and so on; examples are not enumerated here. On this basis, an expected value may be obtained based on the serial number values of the second objects and the corresponding probability values, the value obtained by rounding up the expected value is taken as the target serial number value, and the second object to which the target serial number value belongs is regarded as the same object as the first object. For ease of expression, the target serial number value may be denoted as $\hat{j}_{t-\delta \to t}$; the target serial number value can then be expressed as:

$$\hat{j}_{t-\delta \to t} = \sum_{x \in t} x \cdot P_{i \to x} \tag{2}$$
In the above formula (2), t-δ→t indicates that the first object in the first image t-δ is matched to a second object in the second image t. It should be noted that the rounding-up operation is not shown in formula (2); in practical applications, since the expected value may be a decimal, the expected value may be rounded up directly in order to determine the target serial number value.
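Formulas (1) and (2) could be realized together as in the following sketch; serial numbers are taken as 1..N, and the ceiling implements the rounding-up step that formula (2) leaves implicit.

```python
import numpy as np

def target_serial_number(feat_i, second_feats):
    # feat_i: (D,) first feature representation M(i); second_feats: (N, D) representations N(j)
    logits = second_feats @ feat_i                     # M(i)^T N(j), cf. formula (1)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                               # softmax over the second objects
    serials = np.arange(1, len(probs) + 1)             # serial number values 1..N
    expectation = float((serials * probs).sum())       # cf. formula (2)
    return int(np.ceil(expectation))                   # round up to the target serial number
```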
In an implementation scenario, as mentioned above, after the tracking information is obtained, pixel regions belonging to the same object (e.g., the same foreground object or the same background object) in different images may be marked with the same color. Please refer to FIG. 4A and FIG. 4B, which are two schematic diagrams of panoptic segmentation images. FIG. 4A shows the panoptic segmentation image corresponding to the first image in FIG. 2, and FIG. 4B shows the panoptic segmentation image corresponding to the second image in FIG. 2; pixel regions corresponding to the same object in the two images may be rendered with the same gray scale.
In the above solution, target segmentation is performed on the first image and the second image respectively to obtain the first mask image of the first object in the first image and the second mask image of the second object in the second image; object matching is performed in the feature dimension based on the first mask image and the second mask image to obtain the first matching information, and object matching is performed in the spatial dimension based on the first mask image and the second mask image to obtain the second matching information; on this basis, the first matching information and the second matching information are fused to obtain the tracking information, which includes whether the first object and the second object are the same object. That is, in the target tracking process, on the one hand, object matching between images in the feature dimension helps ensure the tracking effect for large-sized objects; on the other hand, object matching between images in the spatial dimension helps ensure the tracking effect for small-sized objects. Since the tracking information is obtained by fusing the matching information from both matching manners, large-sized and small-sized objects can be taken into account simultaneously, which helps improve target tracking accuracy.
Please refer to FIG. 5, which is a schematic flowchart of object matching in the feature dimension; the process may include the following steps:
Step S51: Extract the first feature representation of each first object based on the first mask image of that first object, and extract the second feature representation of each second object based on the second mask image of that second object.
Here, an object boundary may be determined based on the pixel values of the pixels in the mask image, where the object boundary is the boundary of the object to which the mask image belongs; a region image is cropped from the mask image along the object boundary, and feature extraction is performed based on the region image to obtain the feature representation of the object. When the mask image is a first mask image, the object is a first object and the feature representation is a first feature representation; when the mask image is a second mask image, the object is a second object and the feature representation is a second feature representation. The above manner can exclude, during feature extraction, the interference of pixels irrelevant to the object to which the mask image belongs, which helps improve the accuracy of the feature representation.
In an implementation scenario, as described in the foregoing disclosed embodiments, for the mask image of each object, the pixels belonging to that object have pixel values higher than the preset threshold (e.g., 0.5, 0.6, etc.), or their pixel values have been set directly to the first value (e.g., 1); the pixels whose pixel values are higher than the preset threshold (or equal to the first value) may therefore be taken as target pixels, and the rectangular box enclosing the target pixels taken as the object boundary.
In an implementation scenario, please refer to FIG. 6, which is a schematic diagram of a process of an embodiment of object matching in the feature dimension. As shown in FIG. 6, still taking the case where the first image contains M first objects and the second image contains N second objects as an example, the size of the first mask images may be expressed as M*H*W and the size of the second mask images as N*H*W, where H is the height and W is the width of a mask image. After the above cropping, the cropped result may further be adjusted to a preset size (e.g., 256*512) through an interpolation algorithm such as bilinear interpolation, with blank areas filled with 0, to obtain the region image.
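The crop-and-resize preprocessing could be sketched as follows with OpenCV; preserving the aspect ratio before zero-padding is one interpretation of "filling blank areas with 0", and the 256*512 size is the example value from the text.

```python
import cv2
import numpy as np

def crop_object_region(mask, out_h=256, out_w=512):
    # mask: (H, W) binary mask of a single object
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return np.zeros((out_h, out_w), dtype=np.float32)
    region = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(np.float32)
    # bilinear interpolation to fit the preset size while keeping the aspect ratio
    scale = min(out_h / region.shape[0], out_w / region.shape[1])
    new_h = max(1, int(region.shape[0] * scale))
    new_w = max(1, int(region.shape[1] * scale))
    resized = cv2.resize(region, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    canvas = np.zeros((out_h, out_w), dtype=np.float32)  # blank area filled with 0
    canvas[:new_h, :new_w] = resized
    return canvas
```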
In an implementation scenario, as mentioned above, in order to improve target tracking efficiency, a target tracking model may be trained in advance, where the target tracking model includes a first matching network, and the first matching network may specifically include a first extraction sub-network and a second extraction sub-network, the first extraction sub-network being used to extract the first feature representations and the second extraction sub-network being used to extract the second feature representations. To further make the network model as lightweight as possible, both the first extraction sub-network and the second extraction sub-network may include several fully connected layers (FC); as shown in FIG. 6, each may include two fully connected layers (i.e., 2*FC in FIG. 6), yielding 1024-dimensional first feature representations and 1024-dimensional second feature representations. It should be noted that, in practical applications, the network structures of the first extraction sub-network and the second extraction sub-network are not limited to this and may be set according to the actual situation; for example, they may also include convolutional layers, which is not limited here.
Step S52: Obtain the feature similarity between each first object and each second object using the first feature representations and the second feature representations.
Here, for any first object and any second object, the first feature representation of the first object may be multiplied by the second feature representation of the second object to obtain the feature similarity between the two. Taking the case where the first feature representation is a 1024-dimensional feature vector and the second feature representation is also a 1024-dimensional feature vector as an example, the elements at corresponding positions of the two vectors may be multiplied and then accumulated to obtain the feature similarity.
Step S53: Obtain the first matching information based on the feature similarities between each first object and each second object.
Here, after the feature similarities are obtained, a normalization operation may be performed on the computed feature similarities to obtain the first matching degrees. After the first matching degree between any first object and any second object is obtained, these first matching degrees may be regarded as the first matching information. In addition, referring to FIG. 6, still taking the case where the first image contains M first objects and the second image contains N second objects as an example, the first matching information may finally be represented as an M*N matrix, where the element in the i-th row and j-th column represents the first matching degree between the i-th first object and the j-th second object.
In the above solution, the first feature representation of each first object is extracted based on its first mask image, and the second feature representation of each second object is extracted based on its second mask image; on this basis, the first feature representations and the second feature representations are used to obtain the feature similarity between each first object and each second object, and the first matching information is obtained based on these feature similarities. That is, in the process of object matching between images in the feature dimension, it is only necessary to perform feature extraction on the mask image of each object and then measure the feature similarity, which can reduce the complexity of object matching between images in the feature dimension and help improve the tracking speed.
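End to end, steps S51-S53 admit a compact sketch; the dot product and row-wise softmax below follow the description above, while the feature extractor itself is abstracted away as precomputed vectors.

```python
import numpy as np

def first_matching_info(first_feats, second_feats):
    # first_feats: (M, 1024) first feature representations
    # second_feats: (N, 1024) second feature representations
    sim = first_feats @ second_feats.T                  # element-wise multiply-accumulate, (M, N)
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)             # normalized first matching degrees
```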
Please refer to FIG. 7, which is a schematic flowchart of object matching in the spatial dimension; the process may include the following steps:
Step S71: Perform optical flow prediction on the first image using the second image to obtain an optical flow image of the first image.
In an implementation scenario, please refer to FIG. 8, which is a schematic diagram of a process of object matching in the spatial dimension. As shown in FIG. 8, the optical flow image may be a two-channel image, where one channel image includes the offset value of each pixel of the first image in the horizontal direction, and the other channel image includes the offset value of each pixel of the first image in the vertical direction. It should be noted that, when the optical flow prediction is accurate, a pixel position can be obtained after a pixel in the first image is shifted by the offset values in the horizontal and vertical directions respectively, and the pixel at that position in the second image is theoretically still the pixel itself. Illustratively, after the topmost pixel of first object a in the first image is shifted by the offset values in the horizontal and vertical directions, a pixel position is obtained, and the pixel found at that position in the second image is still the topmost pixel of first object a. Other cases can be deduced by analogy; examples are not enumerated here.
In one implementation scenario, as mentioned above, in order to improve target tracking efficiency, a target tracking model may be trained in advance, and the target tracking model may include an optical flow prediction network. The optical flow prediction network may include, but is not limited to, RAFT (Recurrent All-Pairs Field Transforms for Optical Flow); the network structure of the optical flow prediction network is not limited here. On this basis, the first image and the second image can be input into the optical flow prediction network to obtain the optical flow image. For the working principle of the optical flow prediction network, refer to the technical details of optical flow prediction networks such as RAFT.
Step S72: Based on the optical flow image, shift the first mask image of the first object pixel by pixel to obtain a predicted mask image of the first object at the shooting moment of the second image.

Here, the optical flow image and the first mask image can be multiplied pixel by pixel to obtain the offset values of the pixels in the first mask image; the first pixel coordinates of the pixels in the first mask image are added to the offset values to obtain the second pixel coordinates of the pixels at the shooting moment of the second image (i.e., the predicted pixel coordinates at that shooting moment); and the predicted mask image is obtained based on the second pixel coordinates of the pixels in the first mask image. In this way, the pixel-by-pixel shifting requires only simple operations such as pixel-wise multiplication and addition, which greatly reduces the complexity of pixel shifting and helps further improve tracking speed.
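The pixel-by-pixel shift described here can be sketched as follows; this is a minimal numpy illustration assuming the optical flow image stores (horizontal, vertical) offsets in its two channels and that shifted pixels falling outside the image are discarded, with illustrative names throughout.

```python
import numpy as np

def warp_mask(mask, flow):
    """mask: (H, W) binary first mask image; flow: (H, W, 2) optical flow
    image with horizontal offsets in channel 0 and vertical offsets in
    channel 1. Returns the (H, W) predicted mask image."""
    H, W = mask.shape
    pred = np.zeros_like(mask)
    # Pixel-by-pixel multiplication of flow and mask gives the offset values.
    dx = np.rint(flow[..., 0] * mask).astype(int)
    dy = np.rint(flow[..., 1] * mask).astype(int)
    rows, cols = np.nonzero(mask)  # first pixel coordinates of mask pixels
    for r, c in zip(rows, cols):
        r2, c2 = r + dy[r, c], c + dx[r, c]  # second pixel coordinates
        if 0 <= r2 < H and 0 <= c2 < W:
            pred[r2, c2] = 1
    return pred
```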
In one implementation scenario, the pixel value of each pixel in the first mask image may be multiplied by the pixel value of the pixel at the corresponding position in the optical flow image to obtain the offset values of the pixels in the first mask image. For the meaning of corresponding positions, refer to the relevant descriptions in the aforementioned disclosed embodiments. Referring to the mask image example in FIG. 8, each grid cell in the mask image represents one pixel. For ease of description, the grayscale-filled grid cells in the first mask image have a pixel value of 1, so the first mask image can be expressed as a matrix:
[equation image: the first mask image expressed as a 0/1 matrix, with pixel value 1 in the grayscale-filled cells and 0 elsewhere]
In addition, the pixel values in the horizontal-channel optical flow image may all be 0, while the pixel values in the vertical-channel optical flow image may all be 1. After the above first mask image is multiplied by the horizontal-channel optical flow image, the offset value of each pixel of the first mask image in the horizontal direction is obtained:
[equation image (matrix (4)): the horizontal offset values, an all-zero matrix, since every pixel of the horizontal-channel optical flow image is 0]
Similarly, after the above first mask image is multiplied by the vertical-channel optical flow image, the offset value of each pixel of the first mask image in the vertical direction is obtained:
[equation image (matrix (5)): the vertical offset values, identical to the first mask image itself, since every pixel of the vertical-channel optical flow image is 1]
Therefore, combining the above matrix (4) and matrix (5), the offset values of each pixel of the first mask image in the horizontal and vertical directions are obtained; adding these to the first pixel coordinates of the pixels in the first mask image yields the second pixel coordinates of the pixels at the shooting moment. For example, for the pixel at first pixel coordinate (1,1) in the first mask image, since its offset values in both the horizontal and vertical directions are 0, its second pixel coordinate at the shooting moment is still (1,1); for the pixel at first pixel coordinate (1,2), since its offset value in the horizontal direction is 0 and its offset value in the vertical direction is 1, its second pixel coordinate at the shooting moment is (1,3). Other pixels can be deduced by analogy and are not enumerated here. After the pixel shifting operation has been performed on all pixels of the first mask image, the predicted mask image shown in the mask image example of FIG. 8 is obtained.
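Continuing the warp_mask sketch above, this worked example can be checked numerically: with a horizontal channel of all zeros and a vertical channel of all ones, every mask pixel moves one step in the vertical direction. The toy mask below is illustrative, not the one in FIG. 8.

```python
mask = np.zeros((4, 4), dtype=int)
mask[0, :2] = 1  # toy mask occupying two pixels of the first row
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0  # horizontal offsets all 0, vertical offsets all 1
pred = warp_mask(mask, flow)
assert pred[1, :2].tolist() == [1, 1]  # both mask pixels shifted down one row
```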
Step S73: Obtain second matching information based on the degree of overlap between the predicted mask image of each first object and the second mask image of each second object.

Here, the Dice coefficient can be used to calculate the degree of overlap between the predicted mask image of a first object and the second mask image of a second object, and this degree of overlap is taken as the second matching degree between the first object and the second object. Once the second matching degree between each first object and each second object has been obtained, the second matching information is considered obtained.
In one implementation scenario, for ease of description, the total number of pixels in the predicted mask image can be denoted as N (the total number of pixels in the second mask image is then also N), the pixel value of the i-th pixel in the predicted mask image can be denoted as p_i, and the pixel value of the i-th pixel in the second mask image can be denoted as g_i. The degree of overlap between the predicted mask image and the second mask image can then be expressed as:
$$\mathrm{sim}_{pos} = \frac{2\sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i} \qquad (6)$$
In the above formula (6), sim_pos represents the degree of overlap. Taking the predicted mask image and the second mask image shown in FIG. 8 as an example, the degree of overlap between the two, calculated by the above formula (6), is 3/8, i.e., the Intersection over Union (IoU) between the two mask images. Other cases can be deduced by analogy and are not enumerated here.
In one implementation scenario, as described in the aforementioned disclosed embodiments, the second matching information may also be represented by a matrix. Referring to FIG. 8, taking the case where there are M first objects in the first image and N second objects in the second image as an example, the second matching information can be represented by an M*N matrix, in which the element in row i and column j represents the second matching degree between the i-th first object and the j-th second object.
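As a sketch of step S73, the overlap of formula (6) and the resulting M*N second matching matrix could be computed as follows; the small epsilon guarding against empty masks is an illustrative addition, as are all names.

```python
import numpy as np

def dice(pred_mask, second_mask, eps=1e-8):
    """Degree of overlap of formula (6): Dice coefficient of two binary masks."""
    inter = float((pred_mask * second_mask).sum())
    return 2.0 * inter / (pred_mask.sum() + second_mask.sum() + eps)

def second_matching_matrix(pred_masks, second_masks):
    """pred_masks: M predicted mask images; second_masks: N second mask
    images. Returns the (M, N) second matching matrix."""
    return np.array([[dice(p, g) for g in second_masks] for p in pred_masks])
```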
In the above solution, optical flow prediction is performed on the first image using the second image to obtain the optical flow image of the first image; based on the optical flow image, the first mask image of the first object is shifted pixel by pixel to obtain the predicted mask image of the first object at the shooting moment of the second image; and the second matching information is obtained based on the degree of overlap between the predicted mask image of each first object and the second mask image of each second object. That is, when matching objects between images in the spatial dimension, on the one hand, object matching is realized on the basis of pixel-level matching, which greatly improves the tracking effect, especially for small-sized objects; on the other hand, after the pixel-by-pixel shift based on the optical flow image, the matching information is obtained simply by measuring image overlap, which also reduces the complexity of object matching between images in the spatial dimension and helps improve tracking speed.
Referring to FIG. 9, FIG. 9 is a schematic flowchart of a target tracking method provided by an embodiment of the present disclosure, which may include the following steps:

Step S91: Perform target segmentation on the first image and the second image respectively to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image.

Here, refer to the relevant descriptions in the aforementioned disclosed embodiments.
Step S92: Perform object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and perform object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information.

Here, refer to the relevant descriptions in the aforementioned disclosed embodiments.
Step S93: Fuse the first matching information and the second matching information to obtain tracking information.

In the embodiment of the present disclosure, the tracking information includes whether the first object and the second object are the same object; refer to the relevant descriptions in the aforementioned disclosed embodiments.
Step S94: In response to the tracking information satisfying a preset condition, take the tracking information as first tracking information and acquire a third image.

In the embodiment of the present disclosure, the third image, the first image, and the second image are captured successively. For example, the third image can be denoted as t−δ, the first image as t, and the second image as t+δ.
In one implementation scenario, the preset condition may include: a target object exists in the second image. It should be noted that the target object is not the same object as any first object. In this case, the target object may be an object that newly appears in the second image, or it may be an object that was occluded in the first image and whose occlusion disappears in the second image, so that no match for it is obtained when matching against the first objects; further verification can therefore be performed through the verification process described below. In this way, the temporal consistency check can greatly mitigate the impact of object disappearance, occlusion, and similar situations on tracking accuracy, which helps improve tracking accuracy.
In one implementation scenario, unlike the aforementioned approach of setting the preset condition to the presence of a target object in the second image, where the subsequent verification is triggered only when an unsuccessfully matched second object appears in the second image, as another possible implementation in practice the preset condition may also be set to empty, i.e., no additional condition is attached to triggering the verification, and the subsequent verification is triggered as soon as the tracking information is obtained.
Step S95: Perform target tracking based on the third image and the second image to obtain second tracking information.

In the embodiment of the present disclosure, the second tracking information includes whether the second object and a third object in the third image are the same object; for the target tracking process, refer to any of the aforementioned target tracking method embodiments.

Step S96: Perform a consistency check based on the first tracking information and the second tracking information to obtain a check result.
Here, the same object in different images can have the same object identifier. The target object can be analyzed based on the second tracking information to obtain an analysis result. In response to the analysis result including that the target object and a reference object are the same object, the object identifier of the reference object is taken as the object identifier of the target object, where the reference object is one of the third objects. That is, when there is an unsuccessfully matched target object in the second image, if a third object is successfully matched in the third image, that third object can be regarded as the reference object, and its object identifier is taken as the object identifier of the target object, i.e., the target object and the reference object are determined to be the same object. In addition, in response to the analysis result including that the target object is not the same object as any third object in the third image, a new object identifier can be assigned to the target object. That is, when there is an unsuccessfully matched target object in the second image, if no third object that is the same object as the target object can be matched in the third image either, the target object can be considered to have newly appeared in the second image, so a new object identifier can be assigned to it. In this way, the temporal consistency check can handle the complex situation in which an object reappears after disappearing due to occlusion, deformation, or other causes, and the check is performed according to the actual situation, which helps improve the tracking effect of target tracking in complex situations.
In one implementation scenario, the above verification operation can be used to constrain tracking consistency across multiple frames of images. Here, $\mathcal{T}$ may be used to denote the differentiable operation $\mathcal{T}_{s\rightarrow t}(x_s^p, x_t)$, where $s$ and $t$ denote time steps, and the differentiable operation $\mathcal{T}_{s\rightarrow t}$ is used to measure the similarity between an object $p$ in the image $x_s$ at time step $s$ (i.e., $x_s^p$) and the object $p$ in the image $x_t$ at time step $t$ (i.e., $x_t^p$). As mentioned above, in practice the differentiable operation $\mathcal{T}$ can be applied from image $t-\delta$ to image $t$ and from image $t$ to image $t+\delta$, from which the following temporal consistency can be established:

$$\mathcal{T}_{t\rightarrow t+\delta}\big(\mathcal{T}_{t-\delta\rightarrow t}(x_{t-\delta}^p,\ x_t),\ x_{t+\delta}\big) = \mathcal{T}_{t-\delta\rightarrow t+\delta}\big(x_{t-\delta}^p,\ x_{t+\delta}\big) \qquad (7)$$
In one implementation scenario, referring to FIG. 10, FIG. 10 is a schematic diagram of an embodiment of the temporal consistency constraint. As shown in FIG. 10, due to occlusion, the car inside the dashed box in the first image t is occluded by a pedestrian and is mistakenly segmented as the pedestrian, so its true segmentation is missing. Therefore, when tracking is performed based on the third image t−δ and the first image t, or on the first image t and the second image t+δ, tracking of the car fails. In this case, this limitation can be resolved by propagating the relationship between the third image t−δ and the second image t+δ. Since the car in the second image t+δ is not successfully matched in the first image t, tracking between the second image t+δ and the third image t−δ yields matching information, i.e., the matching degrees between each object in the second image t+δ and each object in the third image t−δ. On this basis, if the matching degree between the car in the second image t+δ and some object in the third image t−δ is higher than a preset threshold, the car in the second image t+δ and that object in the third image t−δ can be considered the same object, and the car in the second image t+δ is marked with the object identifier of that object in the third image t−δ; otherwise, the car in the second image t+δ can be marked with a new object identifier.
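The relabeling logic of this example can be sketched as follows, assuming the matching degrees between the second image t+δ and the third image t−δ are already available as a matrix; the threshold value and all names are illustrative.

```python
import numpy as np

def verify_unmatched(target_idx, match_2to3, third_ids, next_id, threshold=0.5):
    """target_idx: index of the second-image object that matched no first
    object; match_2to3: (N, K) matching degrees between the N second-image
    objects and the K third-image objects; third_ids: object identifiers of
    the K third-image objects. Returns the identifier for the target object."""
    row = match_2to3[target_idx]
    best = int(np.argmax(row))
    if row[best] > threshold:
        return third_ids[best]  # same object reappeared: reuse its identifier
    return next_id              # newly appeared object: assign a new identifier
```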
In the above solution, after the tracking information is obtained, further in response to the tracking information satisfying the preset condition, the tracking information is taken as the first tracking information and a third image is acquired, where the third image, the first image, and the second image are captured successively; target tracking is performed based on the third image and the second image to obtain second tracking information, where the second tracking information includes whether the second object and a third object in the third image are the same object; and on this basis, a consistency check is performed based on the first tracking information and the second tracking information to obtain a check result. This greatly reduces temporal inconsistencies in target tracking and helps further improve tracking accuracy.
Referring to FIG. 11, FIG. 11 is a schematic flowchart of a training method for a target tracking model according to an embodiment of the present disclosure, which may include the following steps:

Step S111: Acquire a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information.

In the embodiment of the present disclosure, the sample tracking information includes whether the first sample object and the second sample object are actually the same object. For example, when the first sample object and the second sample object are actually the same object, this can be marked with a first value (e.g., 1); conversely, when they are actually not the same object, it can be marked with a second value (e.g., 0). In addition, for the meanings of the first sample mask image and the second sample mask image, refer to the relevant descriptions of the first mask image and the second mask image in the aforementioned disclosed embodiments.
In one implementation scenario, as in the aforementioned disclosed embodiments, in order to improve the efficiency of acquiring mask images, the target tracking model may include a target segmentation network, whose network structure can be found in the relevant descriptions of the aforementioned disclosed embodiments. On this basis, the target segmentation network can be used to perform target segmentation on the first sample image and the second sample image respectively to obtain the first sample mask image and the second sample mask image. Here, refer to the relevant descriptions of target segmentation in the aforementioned disclosed embodiments.
In one implementation scenario, before training the target tracking model as a whole, the target segmentation network can first be trained to convergence. For the training process of the target segmentation network, refer to the technical details of segmentation networks such as Mask R-CNN, PointRend, and Instance-sensitive FCN.
Step S112: Perform object matching in the feature dimension on the first sample mask image and the second sample mask image based on a first matching network of the target tracking model to obtain first predicted matching information, and perform object matching in the spatial dimension on the first sample mask image and the second sample mask image based on a second matching network of the target tracking model to obtain second predicted matching information.

Here, refer to the relevant descriptions in the aforementioned disclosed embodiments about object matching in the feature dimension and object matching in the spatial dimension.
In one implementation scenario, before training the target tracking model as a whole, the first matching network can first be trained to convergence, i.e., the first matching network has completed training before the overall training of the target tracking model. It should be noted that, in this case, the aforementioned target segmentation network has completed training before the first matching network is trained.
In one implementation scenario, during the training of the first matching network, feature extraction can be performed on the first sample mask image of each first sample object based on a first extraction sub-network of the first matching network to obtain the first sample feature representation of that first sample object, and feature extraction can be performed on the second sample mask image of each second sample object based on a second extraction sub-network of the first matching network to obtain the second sample feature representation of that second sample object. On this basis, for each first sample object: based on the feature similarities between the first sample feature representation of the first sample object and each second sample feature representation, the predicted probability values that the first sample object and each second sample object are predicted to be the same object are obtained; based on the expected value over these predicted probability values, the predicted matching object of the first sample object is obtained; and based on the difference between the predicted matching object and the actual matching object of the first sample object, the sub-loss corresponding to the first sample object is obtained. Here, the predicted matching object is the second sample object predicted to be the same object as the first sample object, the actual matching object is the second sample object that is actually the same object as the first sample object, and the actual matching object is determined based on the sample tracking information. The sub-losses corresponding to the first sample objects are then aggregated to obtain the total loss value of the first matching network, and the network parameters of the first matching network are adjusted based on the total loss value. In this way, on the one hand, training the first matching network before the overall training of the target tracking model helps improve the training efficiency of the target tracking model; on the other hand, determining the predicted matching object through operations such as measuring feature similarity and then computing the loss on that basis enables the first matching network to learn feature representations during training via differentiable matching.
In one implementation scenario, for the feature extraction process, refer to the relevant descriptions in the aforementioned disclosed embodiments.
In one implementation scenario, the feature similarities can be normalized to obtain the predicted probability values, and the normalization operation can be implemented via softmax. Further, the expected value can be obtained based on the serial number values of the second sample objects and the predicted probability values corresponding to the second sample objects; the value obtained by rounding the expected value up is taken as a target serial number value, and the second sample object to which the target serial number value belongs is taken as the predicted matching object of the first sample object. In addition, for the calculation process of the predicted probability values and the determination process of the predicted matching object, refer to the relevant descriptions in the aforementioned disclosed embodiments about "based on the feature similarities between its first feature representation and the second feature representations of the second objects, obtaining the probability values that it and each second object are predicted to be the same object" and about "based on the probability values, obtaining the second object that is the same object as the first object".
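The expectation-and-ceiling step can be sketched as follows, assuming the second sample objects carry serial numbers 1..N; this mirrors the softmax normalization and rounding-up described above, with illustrative names.

```python
import numpy as np

def predicted_match(similarities):
    """similarities: (N,) feature similarities between one first sample object
    and the N second sample objects (serial numbers 1..N). Returns the target
    serial number value and the predicted probability values."""
    e = np.exp(similarities - similarities.max())
    probs = e / e.sum()  # softmax: predicted probability values
    expected = float((np.arange(1, len(probs) + 1) * probs).sum())
    return int(np.ceil(expected)), probs  # round the expected value up
```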
In one implementation scenario, the sub-loss can be calculated using a loss function such as cross-entropy. For example, the sub-loss can be expressed as:
$$\ell = -\big[y \log p + (1-y)\log(1-p)\big] \qquad (8)$$
In the above formula (8), $y$ marks whether the predicted matching object is the same as the actual matching object of the first sample object: when they are the same, $y$ can be set to 1; when they are not, $y$ can be set to 0. In addition, $p$ denotes the predicted probability value corresponding to the aforementioned predicted matching object. Further, taking the case where the first sample image contains M first sample objects as an example, the sub-losses corresponding to these M first sample objects can be averaged to obtain the total loss $\mathcal{L}$ of the first matching network:

$$\mathcal{L} = \frac{1}{M}\sum_{m=1}^{M} \ell_m \qquad (9)$$
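Formulas (8) and (9) together can be sketched as follows, assuming each first sample object contributes one (p, y) pair; the epsilon and all names are illustrative.

```python
import numpy as np

def first_matching_loss(probs, labels, eps=1e-12):
    """probs: predicted probability value p of each predicted matching object;
    labels: y = 1 if it equals the actual matching object, else 0.
    Returns the total loss of formula (9): the mean of the M sub-losses (8)."""
    p = np.asarray(probs, dtype=float)
    y = np.asarray(labels, dtype=float)
    sub = -(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    return float(sub.mean())
```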
In one implementation scenario, after the total loss of the first matching network is calculated, the network parameters of the first matching network can be adjusted using an optimization method such as gradient descent; for the adjustment process, refer to the technical details of optimization methods such as gradient descent.
In one implementation scenario, the second matching network may include an optical flow prediction network, which is used to perform optical flow prediction on the first sample image using the second sample image to obtain a sample optical flow image of the first sample image, and the second predicted matching information is obtained based on the sample optical flow image; refer to the relevant descriptions of the optical flow image and the second matching information in the aforementioned disclosed embodiments.
Step S113: Fuse the first predicted matching information and the second predicted matching information using an information fusion network of the target tracking model to obtain predicted tracking information.

In the embodiment of the present disclosure, the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object; for the information fusion process, refer to the relevant descriptions in the aforementioned disclosed embodiments.

Step S114: Adjust network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.
Here, a loss function such as cross-entropy can be used to process the difference between the sample tracking information and the predicted tracking information to obtain the total loss of the target tracking model, and the network parameters of the target tracking model can then be adjusted based on an optimization method such as gradient descent. It should be noted that for the loss calculation process, refer to the technical details of loss functions such as cross-entropy, and for the parameter adjustment process, refer to the technical details of optimization methods such as gradient descent. In addition, as mentioned above, before the overall training of the target tracking model, the target segmentation network, the first matching network, and the second matching network have all been trained to convergence; therefore, when adjusting the network parameters of the target tracking model, the network parameters of the target segmentation network, the first matching network, and the second matching network can be fixed and only the network parameters of the information fusion network adjusted. Of course, the network parameters of all the networks may also be adjusted simultaneously, which is not limited here.
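A minimal PyTorch-style sketch of this training step is given below, using toy stand-in modules; it only illustrates freezing the converged sub-networks and updating the information fusion network, and all module names are illustrative rather than part of the disclosure.

```python
import torch
import torch.nn as nn

first_match_net = nn.Linear(8, 2)   # stand-in for a converged sub-network
fusion_net = nn.Linear(2, 1)        # stand-in for the information fusion network

for p in first_match_net.parameters():
    p.requires_grad = False         # keep the converged sub-network fixed
optimizer = torch.optim.Adam(fusion_net.parameters(), lr=1e-4)

features = torch.randn(4, 8)
sample_tracking = torch.randint(0, 2, (4, 1)).float()  # toy same-object labels
pred_tracking = torch.sigmoid(fusion_net(first_match_net(features)))
loss = nn.functional.binary_cross_entropy(pred_tracking, sample_tracking)
optimizer.zero_grad()
loss.backward()                     # gradients update only the fusion network
optimizer.step()
```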
In the above solution, on the one hand, object matching between images in the feature dimension helps ensure the tracking effect for large-sized objects; on the other hand, object matching between images in the spatial dimension helps ensure the tracking effect for small-sized objects. The tracking information is obtained by fusing the matching information obtained by these two matching approaches, so both large-sized and small-sized objects can be taken into account simultaneously, which helps improve the accuracy of the target tracking model.
Referring to FIG. 12, FIG. 12 is a schematic framework diagram of a target tracking apparatus 120. The target tracking apparatus 120 includes a target segmentation part 121, an object matching part 122, and an information fusion part 123. The target segmentation part 121 is configured to perform target segmentation on a first image and a second image respectively to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image. The object matching part 122 is configured to perform object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and to perform object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information. The information fusion part 123 is configured to fuse the first matching information and the second matching information to obtain tracking information, where the tracking information includes whether the first object and the second object are the same object.
In the above solution, on the one hand, object matching between images in the feature dimension helps ensure the tracking effect for large-sized objects; on the other hand, object matching between images in the spatial dimension helps ensure the tracking effect for small-sized objects. The tracking information is obtained by fusing the matching information obtained by these two matching approaches, so both large-sized and small-sized objects can be taken into account simultaneously, which helps improve target tracking accuracy.
In some disclosed embodiments, the object matching part 122 includes a feature extraction sub-part configured to extract the first feature representation of each first object based on that first object's first mask image, and to extract the second feature representation of each second object based on that second object's second mask image; a similarity measurement sub-part configured to obtain the feature similarity between each first object and each second object using the first feature representations and the second feature representations; and a first matching sub-part configured to obtain the first matching information based on the feature similarities between the first objects and the second objects.

Therefore, when matching objects between images in the feature dimension, it is only necessary to extract features from each object's mask image and then measure feature similarity, which reduces the complexity of object matching between images in the feature dimension and helps improve tracking speed.
In some disclosed embodiments, the feature extraction sub-part includes a boundary determination part configured to determine an object boundary based on the pixel values of the pixels in a mask image, where the object boundary is the boundary of the object to which the mask image belongs; an image cropping part configured to crop a region image out of the mask image along the object boundary; and a representation extraction part configured to perform feature extraction based on the region image to obtain the feature representation of the object to which the mask image belongs. When the mask image is a first mask image, the object to which it belongs is a first object and the feature representation is a first feature representation; when the mask image is a second mask image, the object to which it belongs is a second object and the feature representation is a second feature representation.

Therefore, during feature extraction, interference from pixels irrelevant to the object to which the mask image belongs can be excluded, which helps improve the accuracy of the feature representation.
In some disclosed embodiments, the object matching part 122 includes an optical flow prediction sub-part configured to perform optical flow prediction on the first image using the second image to obtain an optical flow image of the first image; a pixel offset sub-part configured to shift the first mask image of the first object pixel by pixel based on the optical flow image to obtain a predicted mask image of the first object at the shooting moment of the second image; and a second matching sub-part configured to obtain the second matching information based on the degree of overlap between the predicted mask image of each first object and the second mask image of each second object.

Therefore, when matching objects between images in the spatial dimension, on the one hand, object matching is realized on the basis of pixel-level matching, which greatly improves the tracking effect, especially for small-sized objects; on the other hand, after the pixel-by-pixel shift based on the optical flow image, the matching information is obtained simply by measuring image overlap, which also reduces the complexity of object matching between images in the spatial dimension and helps improve tracking speed.
In some disclosed embodiments, the pixel offset sub-part includes a pixel multiplication part configured to multiply the optical flow image and the first mask image pixel by pixel to obtain the offset values of the pixels in the first mask image; a pixel addition part configured to add the first pixel coordinates of the pixels in the first mask image to the offset values to obtain the second pixel coordinates of the pixels at the shooting moment; and an image acquisition part configured to obtain the predicted mask image based on the second pixel coordinates of the pixels in the first mask image.

Therefore, the pixel-by-pixel shifting requires only simple operations such as pixel-wise multiplication and addition, which greatly reduces the complexity of pixel shifting and helps further improve tracking speed.
In some disclosed embodiments, the first matching information includes first matching degrees between the first objects and the second objects, and the second matching information includes second matching degrees between the first objects and the second objects. The information fusion part 123 includes a weighting sub-part configured to adaptively weight the first matching degrees in the first matching information to obtain first weighted matching information, and to adaptively weight the second matching degrees in the second matching information to obtain second weighted matching information, where the first weighted matching information includes first weighted matching degrees between the first objects and the second objects, and the second weighted matching information includes second weighted matching degrees between the first objects and the second objects; a fusion sub-part configured to fuse the first weighted matching information and the second weighted matching information to obtain final matching information, where the final matching information includes final matching degrees between the first objects and the second objects; and an analysis sub-part configured to perform analysis based on the final matching information to obtain the tracking information.

Therefore, in the process of fusing the matching information, adaptively weighting the first matching information and the second matching information makes it possible to adaptively measure the importance of each according to the actual situation before fusing them, which helps greatly improve tracking accuracy.
In some disclosed embodiments, the tracking information is obtained by detecting the first image and the second image with a target tracking model, the target tracking model includes an information fusion network, and the information fusion network includes a first weighting sub-network and a second weighting sub-network, where the first weighting sub-network is used to adaptively weight the first matching degrees and the second weighting sub-network is used to adaptively weight the second matching degrees.

Therefore, a neural network can learn, according to the actual situation, how important the feature dimension and the spatial dimension each are to target tracking, which helps improve the efficiency and accuracy of the adaptive weighting.
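Since the disclosure does not specify the internals of the weighting sub-networks, the following numpy sketch uses a deliberately simple stand-in that derives one normalized weight per matching matrix; it only illustrates the weight-then-fuse structure, not the actual sub-network design.

```python
import numpy as np

def adaptive_weights(m1, m2):
    """Illustrative stand-in for the two weighting sub-networks: one scalar
    weight per matching matrix, normalized with a softmax."""
    logits = np.array([m1.max(), m2.max()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def final_matching(m1, m2):
    """m1: (M, N) first matching matrix; m2: (M, N) second matching matrix.
    Returns the (M, N) final matching matrix."""
    w1, w2 = adaptive_weights(m1, m2)
    return w1 * m1 + w2 * m2
```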
In some disclosed embodiments, the analysis sub-part includes a combination part configured to take each pairwise combination of a first object and a second object as a current object group, and a determination part configured to determine, based on at least one of first reference information and second reference information of the current object group, whether the current first object and the current second object are the same object, where the current first object is the first object in the current object group, the current second object is the second object in the current object group, the first reference information includes the final matching degrees between the current first object and each second object, and the second reference information includes the final matching degrees between the current second object and each first object.

Therefore, on the one hand, it can be determined whether the two objects in every object group are the same object, so omissions are avoided as far as possible, which helps improve tracking accuracy; on the other hand, combining at least one of the first reference information and the second reference information in the determination process also helps improve the accuracy of the determination.
In some disclosed embodiments, the analysis sub-part includes a selection part configured to take the final matching degree between the current first object and the current second object as the matching degree to be analyzed, and the determination part is further configured to perform any one of the following: in response to the matching degree to be analyzed being the maximum value in the first reference information, determine that the current first object and the current second object are the same object; in response to the matching degree to be analyzed being the maximum value in the second reference information, determine that the current first object and the current second object are the same object; in response to the matching degree to be analyzed being the maximum value in both the first reference information and the second reference information, determine that the current first object and the current second object are the same object.

Therefore, on the one hand, the first two determination approaches require only searching for the maximum value in the first reference information or the second reference information to complete the determination, which helps reduce determination complexity and improve determination speed; on the other hand, the last determination approach completes the determination by simultaneously searching for the maximum value in both the first reference information and the second reference information, realizing collaborative verification on the basis of both, which helps reduce determination complexity and improve determination accuracy.
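The last determination approach (the mutual maximum over both reference sets) can be sketched as follows on the final matching matrix; names are illustrative.

```python
import numpy as np

def same_object_pairs(final):
    """final: (M, N) final matching matrix. Returns (i, j) pairs judged to be
    the same object: final[i, j] is the maximum of both row i and column j."""
    pairs = []
    for i in range(final.shape[0]):
        for j in range(final.shape[1]):
            if final[i, j] >= final[i].max() and final[i, j] >= final[:, j].max():
                pairs.append((i, j))
    return pairs
```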
In some disclosed embodiments, the target tracking apparatus 120 further includes a condition response part configured to, in response to the tracking information satisfying a preset condition, take the tracking information as first tracking information and acquire a third image, where the third image, the first image, and the second image are captured successively; a repeat tracking part configured to perform target tracking based on the third image and the second image to obtain second tracking information, where the second tracking information includes whether the second object and a third object in the third image are the same object; and an information verification part configured to perform a consistency check based on the first tracking information and the second tracking information to obtain a check result.

Therefore, temporal inconsistencies in target tracking can be greatly reduced, which helps further improve tracking accuracy.
In some disclosed embodiments, the preset condition includes: a target object exists in the second image, where the target object is not the same object as any first object.

Therefore, the preset condition is set such that a target object that is not the same object as any first object exists in the second image, so the temporal consistency check can greatly mitigate the impact of object disappearance, occlusion, and similar situations on tracking accuracy, which helps improve tracking accuracy.
In some disclosed embodiments, the same object in different images has the same object identifier. The information verification part includes an information analysis sub-part configured to analyze the target object based on the second tracking information to obtain an analysis result; a first response sub-part configured to, in response to the analysis result including that the target object and a reference object are the same object, take the object identifier of the reference object as the object identifier of the target object, where the reference object is one of the third objects; and a second response sub-part configured to, in response to the analysis result including that the target object is not the same object as any third object in the third image, mark the target object with a new object identifier.

Therefore, the temporal consistency check can handle the complex situation in which an object reappears after disappearing due to occlusion, deformation, or other causes, and the check is performed according to the actual situation, which helps improve the tracking effect of target tracking in complex situations.
Referring to FIG. 13, FIG. 13 is a schematic framework diagram of a training apparatus 130 for a target tracking model provided by an embodiment of the present disclosure. The training apparatus 130 for the target tracking model includes a sample acquisition part 131, a sample matching part 132, a sample fusion part 133, and a parameter adjustment part 134. The sample acquisition part 131 is configured to acquire a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information, where the sample tracking information includes whether the first sample object and the second sample object are actually the same object. The sample matching part 132 is configured to perform object matching in the feature dimension on the first sample mask image and the second sample mask image based on a first matching network of the target tracking model to obtain first predicted matching information, and to perform object matching in the spatial dimension on the first sample mask image and the second sample mask image based on a second matching network of the target tracking model to obtain second predicted matching information. The sample fusion part 133 is configured to fuse the first predicted matching information and the second predicted matching information using an information fusion network of the target tracking model to obtain predicted tracking information, where the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object. The parameter adjustment part 134 is configured to adjust network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.

In the above solution, on the one hand, object matching between images in the feature dimension helps ensure the tracking effect for large-sized objects; on the other hand, object matching between images in the spatial dimension helps ensure the tracking effect for small-sized objects. The tracking information is obtained by fusing the matching information obtained by these two matching approaches, so both large-sized and small-sized objects can be taken into account simultaneously, which helps improve the accuracy of the target tracking model.
在一些公开实施例中,第一匹配网络在整体训练目标跟踪模型之前已完成训练,目标跟踪模型的训练装置130还包括样本特征提取部分,被配置为基于第一匹配网络的第一提取子网络对第一样本对象的第一样本掩膜图像进行特征提取,得到第一样本对象的第一样本特征表示,并基于第一匹配网络的第二提取子网络对第二样本对象的第二样本掩膜图像进行特征提取,得到第二样本对象的第二样本特征表示;目标跟踪模型的训练装置130还包括子损失计算部分,被配置为对于各个第一样本对象,基于第一样本对象的第一样本特征表示分别与各个第二样本特征表示之间的特征相似度,得到第一样本对象分别与各个第二样本对象预测为同一对象的预测概率值,并基于各个预测概率值的期望值,得到第一样本对象的预测匹配对象,以及基于预测匹配对象与第一样本对象的实际匹配对象之间的差异,得到第一样本对象对应的子损失;其中,预测匹配对象为与第一样本对象预测为同一对象的第二样本对象,实际匹配对象为与第一样本对象实 际为同一对象的第二样本对象,实际匹配对象是基于样本跟踪信息确定的;目标跟踪模型的训练装置130还包括总损失计算部分,被配置为统计各个第一样本对象对应的子损失,得到第一匹配网络的总损失值;目标跟踪模型的训练装置130还包括网络优化部分,被配置为基于总损失值,调整第一匹配网络的网络参数。In some disclosed embodiments, the first matching network has completed training before the overall training of the target tracking model, and the training device 130 of the target tracking model further includes a sample feature extraction part configured as a first extraction sub-network based on the first matching network Feature extraction is performed on the first sample mask image of the first sample object to obtain the first sample feature representation of the first sample object, and based on the second extraction sub-network of the first matching network for the second sample object The second sample mask image is subjected to feature extraction to obtain the second sample feature representation of the second sample object; the training device 130 of the target tracking model also includes a sub-loss calculation part configured to, for each first sample object, based on the first The feature similarity between the first sample feature representation of the sample object and each second sample feature representation is obtained to obtain the predicted probability value that the first sample object and each second sample object are predicted to be the same object, and based on each The expected value of the predicted probability value, the predicted matching object of the first sample object is obtained, and the sub-loss corresponding to the first sample object is obtained based on the difference between the predicted matching object and the actual matching object of the first sample object; wherein, The predicted matching object is the second sample object that is predicted to be the same object as the first sample object, the actual matching object is the second sample object that is actually the same object as the first sample object, and the actual matching object is determined based on the sample tracking information The training device 130 of the target tracking model also includes a total loss calculation part, which is configured to count the corresponding sub-losses of each first sample object to obtain the total loss value of the first matching network; the training device 130 of the target tracking model also includes a network The optimization part is configured to adjust network parameters of the first matching network based on the total loss value.
Therefore, on the one hand, training the first matching network before the target tracking model is trained as a whole helps improve the training efficiency of the target tracking model; on the other hand, determining the predicted matching object through operations such as measuring feature similarity, and computing the loss on that basis, enables the first matching network to learn feature representations during training through differentiable matching.
In some disclosed embodiments, the sub-loss calculation part includes a normalization subsection; or it includes an expectation calculation subsection, a serial-number determination subsection, and an object prediction subsection; or it includes all four subsections. The normalization subsection is configured to normalize the feature similarities to obtain the predicted probability values. The expectation calculation subsection is configured to obtain the expected value based on the serial number values of the second sample objects and the predicted probability values corresponding to the second sample objects. The serial-number determination subsection is configured to take the expected value rounded up as a target serial number value. The object prediction subsection is configured to take the second sample object to which the target serial number value belongs as the predicted matching object of the first sample object.
Therefore, obtaining the predicted probability values by normalizing the feature similarities helps reduce the complexity of computing them. Obtaining the expected value from the serial number values of the second sample objects and their corresponding predicted probability values, rounding the expected value up to a target serial number value, and taking the second sample object with that serial number as the predicted matching object means the predicted matching object can be determined through simple operations such as mathematical expectation and ceiling, which greatly reduces the complexity of determining it.
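A minimal sketch of that selection rule follows; it assumes the second sample objects are serial-numbered starting from 1 (the starting value is an assumption, as is the function name).

```python
import torch

def predicted_matching_object(prob):
    """prob: (N1, N2) normalized similarities; column j holds the probability
    that a first object matches the second object with serial number j + 1."""
    serial = torch.arange(1, prob.shape[1] + 1, dtype=prob.dtype)
    expected = (prob * serial).sum(dim=1)   # expected serial number per first object
    target = torch.ceil(expected).long()    # round up to a concrete serial number
    return target - 1                       # 0-based index of the predicted match
```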
In some disclosed embodiments, the target tracking model further includes a target segmentation network; or the second matching network includes an optical flow prediction network; or both. The first sample mask image and the second sample mask image are obtained by performing target segmentation on the first sample image and the second sample image, respectively, using the target segmentation network, and the target segmentation network is trained before the first matching network is trained. The optical flow prediction network is configured to perform optical flow prediction on the first sample image using the second sample image to obtain a sample optical flow image of the first sample image, and the second sample matching information is obtained based on the sample optical flow image.
Therefore, because the first sample mask image and the second sample mask image are obtained by segmenting the first and second sample images with the target segmentation network, and the segmentation network is trained before the first matching network, training the segmentation network first in a staged manner allows the target tracking model to be trained progressively, which helps improve training efficiency and effectiveness. Moreover, since the second matching network includes an optical flow prediction network that predicts the optical flow of the first sample image using the second sample image to obtain a sample optical flow image, from which the second sample matching information is derived, the accuracy and efficiency of optical flow prediction can also be improved.
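One way to organize such a staged schedule is sketched below; the attribute names on `model` and the caller-supplied `train_one` loop are assumptions for illustration, not the disclosed structure.

```python
def freeze(module):
    """Stop gradient updates for a sub-network whose training is complete."""
    for p in module.parameters():
        p.requires_grad_(False)

def staged_training(model, seg_loader, match_loader, track_loader, train_one):
    """`train_one(module, loader)` is any ordinary training loop."""
    train_one(model.segmentation_net, seg_loader)       # 1) segmentation first
    freeze(model.segmentation_net)
    train_one(model.first_matching_net, match_loader)   # 2) then feature matching
    train_one(model, track_loader)                      # 3) finally the whole model
```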
In the embodiments of the present disclosure and other embodiments, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and may be modular or non-modular.
Please refer to FIG. 14, which is a schematic diagram of an electronic device 140 provided by an embodiment of the present disclosure. The electronic device 140 includes a memory 141 and a processor 142 coupled to each other. The processor 142 is configured to execute program instructions stored in the memory 141 to implement the steps of any of the above target tracking method embodiments, or the steps of any of the above training method embodiments for a target tracking model. In one specific implementation scenario, the electronic device 140 may include, but is not limited to, a microcomputer or a server; it may also include mobile devices such as a notebook computer or a tablet computer, which is not limited here.
Here, the processor 142 is configured to control itself and the memory 141 to implement the steps of any of the above target tracking method embodiments, or the steps of any of the above training method embodiments for a target tracking model. The processor 142 may also be called a CPU (Central Processing Unit). The processor 142 may be an integrated circuit chip with signal processing capability. It may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 142 may be jointly implemented by multiple integrated circuit chips.
In the above solution, object matching is performed between images in the feature dimension, which helps ensure the tracking effect for large objects, and object matching is also performed between images in the spatial dimension, which helps ensure the tracking effect for small objects. The matching information obtained by the two matching approaches is then fused to obtain the tracking information, so both large and small objects are accounted for, which helps improve target tracking accuracy.
Please refer to FIG. 15, which is a schematic diagram of a computer-readable storage medium 150 provided by an embodiment of the present disclosure. The computer-readable storage medium 150 stores program instructions 151 executable by a processor, and the program instructions 151 are used to implement the steps of any of the above target tracking method embodiments, or the steps of any of the above training method embodiments for a target tracking model.
In the above solution, object matching is performed between images in the feature dimension, which helps ensure the tracking effect for large objects, and object matching is also performed between images in the spatial dimension, which helps ensure the tracking effect for small objects. The matching information obtained by the two matching approaches is then fused to obtain the tracking information, so both large and small objects are accounted for, which helps improve target tracking accuracy.
An embodiment of the present disclosure further provides a computer program product. The computer program product includes a computer program or instructions that, when run on an electronic device, cause the electronic device to perform the steps of any of the above target tracking method embodiments, or the steps of any of the above training method embodiments for a target tracking model.
In some embodiments of the present disclosure, a computer program (computer instructions) may take the form of a program, software, a software module, a script, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
The present disclosure relates to the field of augmented reality. By acquiring image information of a target object in a real environment and then detecting or recognizing relevant features, states, and attributes of the target object with various vision-related algorithms, an AR effect combining the virtual and the real that matches a specific application can be obtained. For example, the target object may involve faces, limbs, gestures, and actions related to the human body; markers and landmarks related to objects; or sand tables, display areas, and display items related to venues or places. Vision-related algorithms may involve visual positioning, SLAM, 3D reconstruction, image registration, background segmentation, object keypoint extraction and tracking, and object pose or depth detection. Specific applications may involve not only interactive scenarios such as guided tours, navigation, explanation, reconstruction, and virtual-effect overlay related to real scenes or objects, but also people-related special-effects processing, such as makeup beautification, body beautification, special-effect display, and virtual model display in interactive scenarios.
The detection or recognition of the relevant features, states, and attributes of the target object can be implemented with a convolutional neural network, which is a network model obtained through model training based on a deep learning framework.
In the several embodiments provided by the present disclosure, it should be understood that the disclosed methods and apparatuses may be implemented in other ways. For example, the apparatus implementations described above are merely illustrative; the division into parts is only a logical functional division, and there may be other divisions in actual implementation. For example, units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this implementation.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods in the various embodiments of the present disclosure. The aforementioned storage medium may be a tangible device capable of holding and storing instructions used by an instruction execution device, and may be a volatile or non-volatile storage medium. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, or semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used here, is not to be construed as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
Industrial Applicability
Embodiments of the present disclosure provide a target tracking method and apparatus, a training method and apparatus for a related model, and a device, medium, and computer program product. The target tracking method includes: performing target segmentation on a first image and a second image, respectively, to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image; performing object matching in the feature dimension based on the first mask image and the second mask image to obtain first matching information, and performing object matching in the spatial dimension based on the first mask image and the second mask image to obtain second matching information; and fusing the first matching information and the second matching information to obtain tracking information, where the tracking information includes whether the first object and the second object are the same object. The above solution can improve target tracking accuracy.
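Before the claims, the overall flow can be summarized in a short orchestration sketch; the four callables are stand-ins for the segmentation, matching, and fusion components rather than the disclosed modules.

```python
def track_objects(first_image, second_image, segment, feature_match, spatial_match, fuse):
    """Returns, for each first object, the index of its best-matching second object."""
    first_masks = segment(first_image)       # one mask image per first object
    second_masks = segment(second_image)     # one mask image per second object
    m1 = feature_match(first_masks, second_masks)   # feature-dimension matching
    m2 = spatial_match(first_masks, second_masks)   # spatial-dimension matching
    final = fuse(m1, m2)                     # fused (N1, N2) final matching degrees
    return final.argmax(axis=1)
```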

Claims (21)

  1. A target tracking method, comprising:
    performing target segmentation on a first image and a second image, respectively, to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image;
    performing object matching in a feature dimension based on the first mask image and the second mask image to obtain first matching information, and performing object matching in a spatial dimension based on the first mask image and the second mask image to obtain second matching information; and
    fusing the first matching information and the second matching information to obtain tracking information; wherein the tracking information includes whether the first object and the second object are the same object.
  2. The method according to claim 1, wherein the performing object matching in the feature dimension based on the first mask image and the second mask image to obtain the first matching information comprises:
    extracting a first feature representation of each first object based on the first mask image of that first object, and extracting a second feature representation of each second object based on the second mask image of that second object;
    obtaining a feature similarity between each first object and each second object using the first feature representations and the second feature representations; and
    obtaining the first matching information based on the feature similarities between the first objects and the second objects.
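To illustrate the matching recited in claim 2 above, here is a minimal Python sketch assuming per-object feature vectors have already been extracted and are non-zero; cosine similarity is one plausible metric, since the claim does not fix a particular similarity measure.

```python
import numpy as np

def feature_similarity_matrix(first_feats, second_feats):
    """first_feats: (N1, D), second_feats: (N2, D) per-object feature vectors.
    Entry (i, j) is the similarity between first object i and second object j."""
    a = first_feats / np.linalg.norm(first_feats, axis=1, keepdims=True)
    b = second_feats / np.linalg.norm(second_feats, axis=1, keepdims=True)
    return a @ b.T  # this matrix serves as the first matching information
```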
  3. The method according to claim 2, wherein the step of extracting the first feature representation or the second feature representation comprises:
    determining an object boundary based on pixel values of the pixels in a mask image; wherein the object boundary is the boundary of the object to which the mask image belongs;
    cropping a region image from the mask image along the object boundary; and
    performing feature extraction based on the region image to obtain a feature representation of the object;
    wherein, in a case where the mask image is the first mask image, the object is the first object and the feature representation is the first feature representation; and in a case where the mask image is the second mask image, the object is the second object and the feature representation is the second feature representation.
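A minimal sketch of the cropping step in claim 3, assuming a non-empty 2-D mask whose non-zero pixels mark the object; the downstream feature extractor is left open.

```python
import numpy as np

def crop_region_image(mask):
    """mask: 2-D array whose non-zero pixels belong to the object (assumed
    non-empty). Returns the region image cropped along the object boundary."""
    ys, xs = np.nonzero(mask)  # pixel coordinates belonging to the object
    return mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```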
  4. The method according to any one of claims 1 to 3, wherein the performing object matching in the spatial dimension based on the first mask image and the second mask image to obtain the second matching information comprises:
    performing optical flow prediction on the first image using the second image to obtain an optical flow image of the first image;
    shifting the first mask image of the first object pixel by pixel based on the optical flow image to obtain a predicted mask image of the first object at the capture time of the second image; and
    obtaining the second matching information based on the degrees of overlap between the predicted mask image of each first object and the second mask image of each second object.
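The overlap computation in claim 4 could look like the following sketch; using IoU as the degree of overlap is an assumption, as the claim only requires some overlap measure.

```python
import numpy as np

def overlap_matrix(predicted_masks, second_masks):
    """predicted_masks: first-object masks already warped to the second image's
    capture time; entry (i, j) is the IoU used as the degree of overlap."""
    out = np.zeros((len(predicted_masks), len(second_masks)))
    for i, p in enumerate(predicted_masks):
        for j, s in enumerate(second_masks):
            inter = np.logical_and(p, s).sum()
            union = np.logical_or(p, s).sum()
            out[i, j] = inter / union if union else 0.0
    return out  # serves as the second matching information
```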
  5. The method according to claim 4, wherein the shifting the first mask image of the first object pixel by pixel based on the optical flow image to obtain the predicted mask image of the first object at the capture time of the second image comprises:
    multiplying the optical flow image and the first mask image pixel by pixel to obtain offset values for the pixels in the first mask image;
    adding the first pixel coordinates of the pixels in the first mask image to the offset values to obtain second pixel coordinates of the pixels at the capture time; and
    obtaining the predicted mask image based on the second pixel coordinates of the pixels in the first mask image.
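One way to realize the pixel-by-pixel shift of claim 5 is sketched below; the (dx, dy) channel order, the rounding, and the dropping of out-of-frame pixels are illustrative choices the claim leaves open.

```python
import numpy as np

def predict_mask(mask, flow):
    """mask: (H, W) binary first mask image; flow: (H, W, 2) per-pixel (dx, dy).
    Pixel-wise product, coordinate shift, then the predicted mask is rebuilt."""
    h, w = mask.shape
    offsets = flow * mask[..., None]          # zero offset outside the object
    ys, xs = np.nonzero(mask)                 # first pixel coordinates
    new_xs = np.round(xs + offsets[ys, xs, 0]).astype(int)
    new_ys = np.round(ys + offsets[ys, xs, 1]).astype(int)
    keep = (new_xs >= 0) & (new_xs < w) & (new_ys >= 0) & (new_ys < h)
    predicted = np.zeros_like(mask)
    predicted[new_ys[keep], new_xs[keep]] = 1  # set second pixel coordinates
    return predicted
```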
  6. The method according to any one of claims 1 to 5, wherein the first matching information includes a first matching degree between the first object and the second object, the second matching information includes a second matching degree between the first object and the second object, and the fusing the first matching information and the second matching information to obtain the tracking information comprises:
    adaptively weighting the first matching degrees in the first matching information to obtain first weighted matching information, and adaptively weighting the second matching degrees in the second matching information to obtain second weighted matching information; wherein the first weighted matching information includes a first weighted matching degree between the first object and the second object, and the second weighted matching information includes a second weighted matching degree between the first object and the second object;
    fusing the first weighted matching information and the second weighted matching information to obtain final matching information; wherein the final matching information includes a final matching degree between the first object and the second object; and
    performing analysis based on the final matching information to obtain the tracking information.
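The adaptive weighting of claim 6 might be realized as below; the subnetwork shapes and the sigmoid gating are guesses for illustration only, not the disclosed design.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """A guess at the fusion step: two small subnetworks produce per-entry
    adaptive weights before the weighted match matrices are summed."""
    def __init__(self):
        super().__init__()
        self.w1 = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
        self.w2 = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())

    def forward(self, m1, m2):
        # m1, m2: (N1, N2) first and second matching degrees.
        a = self.w1(m1.reshape(-1, 1)).reshape(m1.shape)
        b = self.w2(m2.reshape(-1, 1)).reshape(m2.shape)
        return a * m1 + b * m2  # (N1, N2) final matching degrees
```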
  7. The method according to claim 6, wherein the tracking information is obtained by detecting the first image and the second image with a target tracking model, the target tracking model includes an information fusion network, the information fusion network includes a first weighting subnetwork and a second weighting subnetwork, the first weighting subnetwork is configured to adaptively weight the first matching degrees, and the second weighting subnetwork is configured to adaptively weight the second matching degrees.
  8. The method according to claim 6 or 7, wherein the performing analysis based on the final matching information to obtain the tracking information comprises:
    taking each pairwise combination of a first object and a second object as a current object group; and
    determining, based on at least one of first reference information and second reference information of the current object group, whether a current first object and a current second object are the same object;
    wherein the current first object is the first object in the current object group, the current second object is the second object in the current object group, the first reference information includes the final matching degrees between the current first object and each of the second objects, and the second reference information includes the final matching degrees between the current second object and each of the first objects.
  9. The method according to claim 8, wherein, before the determining, based on at least one of the first reference information and the second reference information of the current object group, whether the current first object and the current second object are the same object, the method further comprises:
    taking the final matching degree between the current first object and the current second object as a matching degree to be analyzed;
    and the determining, based on at least one of the first reference information and the second reference information of the current object group, whether the current first object and the current second object are the same object comprises any one of the following:
    in response to the matching degree to be analyzed being the maximum value in the first reference information, determining that the current first object and the current second object are the same object;
    in response to the matching degree to be analyzed being the maximum value in the second reference information, determining that the current first object and the current second object are the same object; and
    in response to the matching degree to be analyzed being the maximum value in both the first reference information and the second reference information, determining that the current first object and the current second object are the same object.
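The three alternative decision tests of claim 9 can be summarized in a few lines; the `rule` switch and function name are illustrative.

```python
import numpy as np

def is_same_object(final, i, j, rule="both"):
    """final: (N1, N2) final matching degrees; (i, j) selects the current
    object group. `rule` picks one of the three alternative tests of claim 9."""
    row_max = final[i, j] >= final[i, :].max()   # max within the first reference information
    col_max = final[i, j] >= final[:, j].max()   # max within the second reference information
    if rule == "first":
        return bool(row_max)
    if rule == "second":
        return bool(col_max)
    return bool(row_max and col_max)             # max of both sets of reference information
```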
  10. The method according to any one of claims 1 to 9, wherein, after the fusing the first matching information and the second matching information to obtain the tracking information, the method further comprises:
    in response to the tracking information satisfying a preset condition, taking the tracking information as first tracking information and acquiring a third image; wherein the third image, the first image, and the second image are captured one after another;
    performing target tracking based on the third image and the second image to obtain second tracking information, wherein the second tracking information includes whether the second object and a third object in the third image are the same object; and
    performing a consistency check based on the first tracking information and the second tracking information to obtain a check result.
  11. The method according to claim 10, wherein the preset condition includes: a target object exists in the second image; wherein the target object is not the same object as any of the first objects.
  12. The method according to claim 11, wherein the same object in different images has the same object identifier, and the performing the consistency check based on the first tracking information and the second tracking information to obtain the check result comprises:
    analyzing the target object based on the second tracking information to obtain an analysis result;
    in response to the analysis result including that the target object and a reference object are the same object, taking the object identifier of the reference object as the object identifier of the target object; wherein the reference object is one of the third objects; and
    in response to the analysis result including that the target object is not the same object as any of the third objects in the third image, marking the target object with a new object identifier.
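The identifier handling of claim 12 reduces to a small branch; the counter used to mint new identifiers is hypothetical.

```python
import itertools

fresh_ids = itertools.count(1)  # hypothetical source of new object identifiers

def label_target(matched_reference_id=None):
    """Claim 12's two outcomes: reuse the matched reference object's identifier,
    or mint a new one when no third object matches."""
    if matched_reference_id is not None:
        return matched_reference_id
    return next(fresh_ids)
```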
  13. A training method for a target tracking model, comprising:
    acquiring a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information; wherein the sample tracking information includes whether the first sample object and the second sample object are actually the same object;
    performing object matching between the first sample mask image and the second sample mask image in a feature dimension based on a first matching network of the target tracking model to obtain first predicted matching information, and performing object matching between the first sample mask image and the second sample mask image in a spatial dimension based on a second matching network of the target tracking model to obtain second predicted matching information;
    fusing the first predicted matching information and the second predicted matching information using an information fusion network of the target tracking model to obtain predicted tracking information; wherein the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object; and
    adjusting network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.
  14. The method according to claim 13, wherein the first matching network is trained before the target tracking model is trained as a whole, and the training of the first matching network comprises:
    performing feature extraction on the first sample mask image of the first sample object based on a first extraction subnetwork of the first matching network to obtain a first sample feature representation of the first sample object, and performing feature extraction on the second sample mask image of the second sample object based on a second extraction subnetwork of the first matching network to obtain a second sample feature representation of the second sample object;
    for each first sample object, obtaining predicted probability values that the first sample object and each second sample object are predicted to be the same object based on the feature similarities between the first sample feature representation of the first sample object and each of the second sample feature representations, obtaining a predicted matching object of the first sample object based on an expected value computed over the predicted probability values, and obtaining a sub-loss corresponding to the first sample object based on the difference between the predicted matching object and an actual matching object of the first sample object; wherein the predicted matching object is the second sample object predicted to be the same object as the first sample object, the actual matching object is the second sample object that actually is the same object as the first sample object, and the actual matching object is determined based on the sample tracking information;
    aggregating the sub-losses corresponding to the first sample objects to obtain a total loss value of the first matching network; and
    adjusting network parameters of the first matching network based on the total loss value.
  15. The method according to claim 14, wherein the obtaining the predicted probability values that the first sample object and each second sample object are predicted to be the same object based on the feature similarities between the first sample feature representation of the first sample object and each of the second sample feature representations comprises:
    normalizing the feature similarities to obtain the predicted probability values;
    and/or, the obtaining the predicted matching object of the first sample object based on the expected value computed over the predicted probability values comprises:
    obtaining the expected value based on the serial number values of the second sample objects and the predicted probability values corresponding to the second sample objects; wherein each second sample object is marked with a serial number value;
    taking the expected value rounded up as a target serial number value; and
    taking the second sample object to which the target serial number value belongs as the predicted matching object of the first sample object.
  16. The method according to any one of claims 13 to 15, wherein the target tracking model further includes a target segmentation network, the first sample mask image and the second sample mask image are obtained by performing target segmentation on the first sample image and the second sample image, respectively, using the target segmentation network, and the target segmentation network is trained before the first matching network is trained;
    and/or, the second matching network includes an optical flow prediction network configured to perform optical flow prediction on the first sample image using the second sample image to obtain a sample optical flow image of the first sample image, and the second sample matching information is obtained based on the sample optical flow image.
  17. A target tracking apparatus, comprising:
    an object segmentation part, configured to perform target segmentation on a first image and a second image, respectively, to obtain a first mask image of a first object in the first image and a second mask image of a second object in the second image;
    an object matching part, configured to perform object matching in a feature dimension based on the first mask image and the second mask image to obtain first matching information, and to perform object matching in a spatial dimension based on the first mask image and the second mask image to obtain second matching information; and
    an information fusion part, configured to fuse the first matching information and the second matching information to obtain tracking information; wherein the tracking information includes whether the first object and the second object are the same object.
  18. A training apparatus for a target tracking model, comprising:
    a sample acquisition part, configured to acquire a first sample mask image of a first sample object in a first sample image, a second sample mask image of a second sample object in a second sample image, and sample tracking information; wherein the sample tracking information includes whether the first sample object and the second sample object are actually the same object;
    a sample matching part, configured to perform object matching between the first sample mask image and the second sample mask image in a feature dimension based on a first matching network of the target tracking model to obtain first predicted matching information, and to perform object matching between the first sample mask image and the second sample mask image in a spatial dimension based on a second matching network of the target tracking model to obtain second predicted matching information;
    a sample fusion part, configured to fuse the first predicted matching information and the second predicted matching information using an information fusion network of the target tracking model to obtain predicted tracking information; wherein the predicted tracking information includes whether the first sample object and the second sample object are predicted to be the same object; and
    a parameter adjustment part, configured to adjust network parameters of the target tracking model based on the difference between the sample tracking information and the predicted tracking information.
  19. An electronic device, comprising a memory and a processor coupled to each other, the processor being configured to execute program instructions stored in the memory to implement the target tracking method according to any one of claims 1 to 12, or the training method for a target tracking model according to any one of claims 13 to 16.
  20. A computer-readable storage medium, having program instructions stored thereon, wherein the program instructions, when executed by a processor, implement the target tracking method according to any one of claims 1 to 12, or the training method for a target tracking model according to any one of claims 13 to 16.
  21. A computer program product, comprising a computer program or instructions that, when run on an electronic device, cause the electronic device to perform the target tracking method according to any one of claims 1 to 12, or the training method for a target tracking model according to any one of claims 13 to 16.
PCT/CN2022/106523 2021-11-26 2022-07-19 Target tracking method and apparatus, training method and apparatus for model related thereto, and device, medium and computer program product WO2023093086A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111424075.9 2021-11-26
CN202111424075.9A CN114155278A (en) 2021-11-26 2021-11-26 Target tracking and related model training method, related device, equipment and medium

Publications (1)

Publication Number Publication Date
WO2023093086A1 true WO2023093086A1 (en) 2023-06-01

Family

ID=80458300

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/106523 WO2023093086A1 (en) 2021-11-26 2022-07-19 Target tracking method and apparatus, training method and apparatus for model related thereto, and device, medium and computer program product

Country Status (2)

Country Link
CN (1) CN114155278A (en)
WO (1) WO2023093086A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155278A (en) * 2021-11-26 2022-03-08 浙江商汤科技开发有限公司 Target tracking and related model training method, related device, equipment and medium
CN115147458B (en) * 2022-07-21 2023-04-07 北京远度互联科技有限公司 Target tracking method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7035431B2 (en) * 2002-02-22 2006-04-25 Microsoft Corporation System and method for probabilistic exemplar-based pattern tracking
CN108805900A (en) * 2017-05-03 2018-11-13 杭州海康威视数字技术股份有限公司 A kind of determination method and device of tracking target
CN109544590A (en) * 2018-11-27 2019-03-29 上海芯仑光电科技有限公司 A kind of method for tracking target and calculate equipment
CN110414443A (en) * 2019-07-31 2019-11-05 苏州市科远软件技术开发有限公司 A kind of method for tracking target, device and rifle ball link tracking
CN111709328A (en) * 2020-05-29 2020-09-25 北京百度网讯科技有限公司 Vehicle tracking method and device and electronic equipment
CN112070807A (en) * 2020-11-11 2020-12-11 湖北亿咖通科技有限公司 Multi-target tracking method and electronic device
CN113052019A (en) * 2021-03-10 2021-06-29 南京创维信息技术研究院有限公司 Target tracking method and device, intelligent equipment and computer storage medium
CN113205072A (en) * 2021-05-28 2021-08-03 上海高德威智能交通系统有限公司 Object association method and device and electronic equipment
CN114155278A (en) * 2021-11-26 2022-03-08 浙江商汤科技开发有限公司 Target tracking and related model training method, related device, equipment and medium

Also Published As

Publication number Publication date
CN114155278A (en) 2022-03-08

Similar Documents

Publication Publication Date Title
CN110472531B (en) Video processing method, device, electronic equipment and storage medium
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
US11475660B2 (en) Method and system for facilitating recognition of vehicle parts based on a neural network
Xiong et al. Spatiotemporal modeling for crowd counting in videos
WO2023093086A1 (en) Target tracking method and apparatus, training method and apparatus for model related thereto, and device, medium and computer program product
CN113963445B (en) Pedestrian falling action recognition method and equipment based on gesture estimation
CN110287826B (en) Video target detection method based on attention mechanism
CN111062263B (en) Method, apparatus, computer apparatus and storage medium for hand gesture estimation
WO2021218786A1 (en) Data processing system, object detection method and apparatus thereof
CN110163188B (en) Video processing and method, device and equipment for embedding target object in video
Shen et al. A convolutional neural‐network‐based pedestrian counting model for various crowded scenes
TW202026948A (en) Methods and devices for biological testing and storage medium thereof
WO2021249114A1 (en) Target tracking method and target tracking device
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN112613668A (en) Scenic spot dangerous area management and control method based on artificial intelligence
CN112541403B (en) Indoor personnel falling detection method by utilizing infrared camera
CN113192124A (en) Image target positioning method based on twin network
Wu et al. Real‐time running detection system for UAV imagery based on optical flow and deep convolutional networks
Guo et al. Gesture recognition of traffic police based on static and dynamic descriptor fusion
CN117949942B (en) Target tracking method and system based on fusion of radar data and video data
Gündüz et al. A new YOLO-based method for social distancing from real-time videos
CN115035158A (en) Target tracking method and device, electronic equipment and storage medium
CN114972182A (en) Object detection method and device
Khoshboresh-Masouleh et al. Robust building footprint extraction from big multi-sensor data using deep competition network
US20230281843A1 (en) Generating depth images for image data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897182

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE