WO2022152110A1 - Object tracking and ground feature tracking methods, devices, systems, and storage media - Google Patents

Object tracking and ground feature tracking methods, devices, systems, and storage media

Info

Publication number
WO2022152110A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
phase
pixel coordinates
candidate region
images
Prior art date
Application number
PCT/CN2022/071259
Other languages
English (en)
French (fr)
Inventor
高福杰 (Gao Fujie)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2022152110A1

Links

Images

Classifications

    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/08: Learning methods
    • G06T 7/11: Region-based segmentation
    • G06V 10/764: Image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/10032: Satellite or aerial image; remote sensing

Definitions

  • the present application relates to the field of computer vision technology, and in particular to object tracking and ground feature tracking methods, devices, systems, and storage media.
  • in remote sensing image analysis technology, changes can be detected and ground objects can be tracked in designated areas based on the rich data provided by multi-temporal remote sensing images.
  • Various aspects of the present application provide an object tracking and feature tracking method, device, system, and storage medium, so as to improve the accuracy of object tracking results.
  • An embodiment of the present application provides an object tracking method, including: acquiring two-phase images obtained by photographing a target environment; determining respective feature maps of at least one candidate region in the two-phase images according to image features of the two-phase images; performing instance segmentation on the two-phase images according to the feature map of the at least one candidate region to obtain at least one tracking object included in the two-phase images; performing change detection on a plurality of pixel coordinates corresponding to the two-phase images according to the feature map of the at least one candidate region to obtain the change states corresponding to the plurality of pixel coordinates; and determining the respective change states of the at least one tracking object according to the pixel coordinates corresponding to the at least one tracking object and the respective change states of the plurality of pixel coordinates.
  • An embodiment of the present application further provides a ground feature tracking method, including: acquiring two-phase remote sensing images obtained by photographing a target environment; determining respective feature maps of at least one candidate region in the two-phase remote sensing images according to image features of the two-phase remote sensing images; performing instance segmentation on the two-phase remote sensing images according to the feature map of the at least one candidate region to obtain at least one ground object included in the two-phase remote sensing images; performing change detection on a plurality of pixel coordinates corresponding to the two-phase remote sensing images according to the feature map of the at least one candidate region to obtain the change states corresponding to the plurality of pixel coordinates; and determining the respective change states of the at least one ground object according to the pixel coordinates corresponding to the at least one ground object and the respective change states of the plurality of pixel coordinates.
  • Embodiments of the present application further provide an electronic device, including a memory and a processor; the memory is used to store one or more computer instructions; the processor is used to execute the one or more computer instructions to perform the steps of the methods provided in the embodiments of the present application.
  • the embodiments of the present application further provide a computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the steps in the methods provided by the embodiments of the present application can be implemented.
  • by performing instance segmentation on the two-phase images, the tracking objects in the two-phase images can be detected in detail; by performing change detection on the pixel coordinates corresponding to the two-phase images, pixel-level change state detection results can be obtained. Based on the tracking objects obtained by segmentation and the pixel-level change state detection results, the change states of the tracking objects in the two-phase images can be accurately obtained, improving the accuracy and reliability of the object tracking results.
  • FIG. 1 is a schematic flowchart of an object tracking method provided by an exemplary embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a neural network model provided by an exemplary embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a neural network model provided by another exemplary embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a method for tracking ground objects provided by an exemplary embodiment of the present application
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present application.
  • FIG. 1 is a schematic flowchart of an object tracking method provided by an exemplary embodiment of the present application. As shown in FIG. 1, the method includes:
  • Step 101: Acquire two-phase images obtained by photographing the target environment.
  • Step 102: Determine respective feature maps of at least one candidate region in the two-phase images according to the image features of the two-phase images.
  • Step 103: Perform instance segmentation on the two-phase images according to the feature map of the at least one candidate region to obtain at least one tracking object included in the two-phase images.
  • Step 104: Perform change detection on a plurality of pixel coordinates corresponding to the two-phase images according to the feature map of the at least one candidate region to obtain the change states corresponding to the plurality of pixel coordinates.
  • Step 105: Determine the respective change states of the at least one tracking object according to the pixel coordinates corresponding to the at least one tracking object and the respective change states of the plurality of pixel coordinates.
  • the object tracking refers to tracking the change history of the same object in images acquired at different times.
  • the target environment can include indoor places such as sports venues, factories, museums, and supermarkets, and can also include any outdoor ground environment such as blocks, suburbs, fields, villages, and roads.
  • the two-phase images may include a group of images of the same target environment acquired at different times.
  • the image features of the two-phase images can be obtained by performing a feature extraction operation on the two-phase images.
  • the extracted image features may include features such as color, texture, shape, and spatial relationship of the two-phase image, and may also include high-level semantic features of the two-phase image. Among them, high-level semantic features contain more global information, which can provide rich information for object tracking.
  • the candidate region in the two-phase image refers to a region in the two-phase image that may contain a tracking object, and the at least one candidate region can be obtained by performing a region generation operation on the two-phase image.
  • the region generation operation refers to extracting a possible region of interest from the two-phase image according to the image features of the two-phase image, and the region of interest may contain a tracking object.
  • region proposals may be selected from the feature maps of the two-phase images based on basic detection boxes (anchors) predefined at fixed ratios. Next, the probability that a region selected by a basic detection box belongs to a tracking object is determined by a classification algorithm (such as softmax), and a bounding box regression algorithm is used to correct the basic detection box to obtain an accurate candidate box.
  • the feature map of the candidate region can be determined from the feature map of the two-phase image, and the tracking object corresponding to the candidate region can be further detected according to the region feature of the candidate region.
  • detecting the tracking object includes: segmenting the tracking object from the two-phase images, and detecting whether the tracking object changes at different shooting moments corresponding to the two-phase images.
  • instance segmentation refers to predicting the category label of each pixel in the image pixel by pixel through a segmentation algorithm, that is, the instance category to which each pixel belongs, while distinguishing different individuals of the same instance category in the input image.
  • instance segmentation can be performed on the two-phase images to obtain at least one tracking object included in the two-phase images.
  • the instance segmentation result not only includes the instance category of each tracking object, but also can segment different tracking object individuals of the same instance category.
  • for example, objects of different instance categories in a block, and the different individual objects of each instance category, can be segmented from an image of the block: each house, each road, each vehicle, and so on.
  • the change detection refers to analyzing the pixel points with the same pixel coordinates in the images captured at different times through the change detection algorithm, so as to determine the change state of each pixel coordinate at different times.
  • the two-phase images have the same size and resolution, so their pixel coordinates correspond; that is, the two-phase images correspond to a plurality of identical pixel coordinates, hereinafter referred to as the plurality of pixel coordinates corresponding to the two-phase images. For example, when matrix K1 represents the first image of the two-phase images and matrix K2 represents the second image, K1 and K2 are both m*n matrices, where m is the number of pixel rows, n is the number of pixel columns, and m and n are positive integers. K1 and K2 correspond to m*n pixel coordinates.
  • change detection can be performed on the multiple pixel coordinates corresponding to the two-phase images; that is, the differences at the same pixel coordinates in the images captured at different times can be detected, obtaining the change state corresponding to each of the multiple pixel coordinates.
  • Changes in pixel coordinates in images captured at different times can reflect changes in real space corresponding to pixel coordinates.
  • the change state of each tracking object in the two-phase images can be determined. Based on the change state of each tracking object and the instance category of each tracking object included in the instance segmentation result, the tracking object can be tracked.
  • by performing instance segmentation on the two-phase images, the tracking objects in the two-phase images can be detected in detail; by performing change detection on the pixel coordinates corresponding to the two-phase images, pixel-level change state detection results can be obtained. Based on the tracking objects obtained by segmentation and the pixel-level change state detection results, the change states of the tracking objects in the two-phase images can be accurately obtained, improving the accuracy and reliability of the object tracking results.
  • the object tracking method provided by the embodiments of the present application can be implemented by a neural network model, and the neural network model can be implemented as a multi-task network based on Mask-RCNN (Mask Region-based Convolutional Neural Network, a network that predicts segmentation masks for candidate regions).
  • FIG. 2 is a schematic structural diagram of a neural network model provided by an exemplary embodiment of the present application.
  • the neural network model includes a backbone network and a multi-task network connected to the backbone network, where the multi-task network includes an instance segmentation network and a change detection network.
  • the backbone network is used to output the regional feature map according to the input image.
  • the instance segmentation network and the change detection network can share the regional feature map output by the backbone network.
  • the image input to the neural network model may include two-phase images and difference feature maps of the two-phase images.
  • An exemplary description will be given below.
  • the two-phase images are described as the first image and the second image obtained by photographing the target environment at different times.
  • a difference operation of pixel values may be performed on pixel points with the same pixel coordinates in the first image and the second image to obtain a difference value feature map.
  • the first image, the second image and the difference feature map are concatenated to obtain a multi-channel image.
  • the first image, the second image, and the difference feature map respectively include three color channels, R (red), G (green), and B (blue), so the multi-channel image obtained by splicing is a 9-channel image.
  • this multi-channel image is fed into the neural network model.
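  • As a concrete illustration of this input construction, the following Python sketch builds the 9-channel array from two co-registered RGB phases. The function name is hypothetical, and the absolute difference is an assumption: the embodiment only specifies a pixel-wise difference operation followed by concatenation.

```python
import numpy as np

def build_multichannel_input(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """Stack two co-registered RGB phases and their pixel-wise difference into
    one (H, W, 9) array: channels 0-2 phase one, 3-5 phase two, 6-8 difference."""
    a = img_t1.astype(np.float32)
    b = img_t2.astype(np.float32)
    diff = np.abs(a - b)  # difference feature map (absolute difference assumed)
    return np.concatenate([a, b, diff], axis=-1)
```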
  • the feature extraction operation of the multi-channel image can be performed by using the backbone network in the neural network model to obtain the feature map of the multi-channel image;
  • the region generation operation is performed according to the feature map of the multi-channel image to obtain at least one candidate region, and obtain a feature map of the at least one candidate region through the region feature aggregation operation.
  • the backbone network may include: a feature extraction network, a region generation network, and a region feature aggregation layer.
  • the step of performing a feature extraction operation on the multi-channel image to obtain a feature map of the multi-channel image can be implemented based on a feature extraction network.
  • the feature extraction network may be implemented as a convolutional neural network (CNN), for example, may include but not limited to VGG, ResNet and other networks.
  • the step of performing the region generation operation according to the feature map of the two-phase image can be realized based on the RPN (Region Proposal Network, region generation network).
  • RPN selects region proposals in the feature maps of the two-phase images according to basic detection boxes (anchors) predefined at fixed ratios, judges via softmax the probability that a region selected by a basic detection box belongs to a foreground tracking object, and uses a bounding box regression algorithm to correct the basic detection box to obtain an accurate candidate box.
  • the feature map of the candidate region can be determined from the feature map of the two-phase image based on the regional feature aggregation layer.
  • the region feature aggregation layer may be implemented based on the ROIAlign algorithm, which will not be repeated in this embodiment.
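  • As a rough sketch of how such a backbone and multi-task heads could be wired together, the PyTorch skeleton below widens a ResNet trunk to accept the 9-channel input and attaches per-pixel segmentation and change-detection heads. It is a simplified sketch under stated assumptions: the RPN and ROIAlign stages described above are omitted for brevity, and every class name and layer size is an illustrative stand-in, not the patent's exact architecture.

```python
import torch
import torch.nn as nn
import torchvision

class TwoPhaseTracker(nn.Module):
    """Skeleton of the Fig. 2 multi-task network: shared backbone plus an
    instance-segmentation head and a change-detection head (simplified)."""

    def __init__(self, num_classes: int, num_change_states: int = 3):
        super().__init__()
        trunk = torchvision.models.resnet50(weights=None)
        # Widen the stem so the trunk accepts the 9-channel input built above.
        trunk.conv1 = nn.Conv2d(9, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Drop the average pool and classifier; keep the convolutional trunk.
        self.backbone = nn.Sequential(*list(trunk.children())[:-2])
        # Per-pixel instance-category logits (one channel per category).
        self.seg_head = nn.Conv2d(2048, num_classes, kernel_size=1)
        # Per-pixel change-state logits (e.g. added / disappeared / unchanged).
        self.change_head = nn.Conv2d(2048, num_change_states, kernel_size=1)

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)  # (B, 2048, H/32, W/32)
        size = x.shape[-2:]
        seg = nn.functional.interpolate(self.seg_head(feats), size=size,
                                        mode="bilinear", align_corners=False)
        change = nn.functional.interpolate(self.change_head(feats), size=size,
                                           mode="bilinear", align_corners=False)
        return seg, change
```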
  • the information contained in the two-phase images can be fully utilized.
  • the multi-channel image input to the neural network model contains a difference feature map, and the difference feature map is used to express the changed pixel coordinates in the first image and the second image. Further feature extraction of the difference feature map based on the neural network model can make full use of advanced features such as semantic features of the pixels that have changed, and better extract the change features, which is conducive to more accurate change detection.
  • the first image and the second image may be further preprocessed to reduce the interference of other factors on change detection.
  • the respective histograms of the first image and the second image are obtained; according to the histogram of the first image, the histogram of the second image is transformed to equalize the luminance information of the first image and the second image.
  • the operation of calculating the difference feature maps of the first image and the second image can be performed. Based on this implementation, the difference caused by the difference in illumination on the first image and the second image captured at different times can be excluded, and the accuracy of the change detection result can be further improved.
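  • One plausible realization of this equalization step is standard CDF-based histogram matching; the NumPy sketch below is offered under that assumption (8-bit single-channel inputs, applied per channel for RGB), not as the patent's exact procedure.

```python
import numpy as np

def match_histogram(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap `source` intensities so its histogram matches `reference`.
    Both inputs are single-channel uint8 arrays."""
    src_values, src_counts = np.unique(source.ravel(), return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Cumulative distribution function (CDF) of each image.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # Map each source intensity to the reference intensity with the closest CDF.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    lut = np.zeros(256, dtype=np.uint8)  # lookup table over 8-bit intensities
    lut[src_values] = np.round(mapped).astype(np.uint8)
    return lut[source]

# Example: equalize the first phase against the second, channel by channel
# (img_t1 and img_t2 are hypothetical (H, W, 3) uint8 arrays).
# img_t1_eq = np.stack([match_histogram(img_t1[..., c], img_t2[..., c])
#                       for c in range(3)], axis=-1)
```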
  • the instance segmentation network shown in FIG. 2 can perform instance segmentation on the two-phase image according to the feature map of at least one candidate region output by the backbone network to obtain at least one tracking object included in the two-phase image.
  • the instance segmentation network may include a classification branch and a segmentation branch.
  • an instance category of the tracking object corresponding to each of the at least one candidate region may be identified according to the feature map of the at least one candidate region.
  • the instance categories may include: a house category, a road category, a vehicle category, a pedestrian category, a plant category, and the like.
  • when the backbone network outputs feature maps of multiple candidate regions, some of the candidate regions are located in the first image of the two-phase images and the others are located in the second image of the two-phase images.
  • the classification branch can output the instance category of the tracking object corresponding to the candidate region included in the first image, and output the instance category of the tracking object corresponding to the candidate region included in the second image.
  • the first image includes candidate region 1 , candidate region 2 , and candidate region 3 .
  • the second image includes a candidate region 4 and a candidate region 5 .
  • according to the respective feature maps of candidate regions 1 to 3, the classification branch can output, for the first image, that the tracking object corresponding to candidate region 1 is a house, the tracking object corresponding to candidate region 2 is a vehicle, and the tracking object corresponding to candidate region 3 is a pedestrian.
  • the classification branch can output the tracking object corresponding to the candidate region 4 as a house and the tracking object corresponding to the candidate region 5 as a vehicle on the second image according to the respective feature maps of the candidate region 4 and the candidate region 5 .
  • the pixel coordinates of the tracking objects corresponding to each of the at least one candidate region may be calculated according to the feature map of the at least one candidate region.
  • when the backbone network outputs feature maps of multiple candidate regions, some of the candidate regions are located in the first image of the two-phase images and the others are located in the second image of the two-phase images.
  • the segmentation branch can respectively output the pixel coordinates of the tracking object included in the first image and the pixel coordinates of the tracking object included in the second image.
  • a target candidate region included in the first image may be determined from the at least one candidate region.
  • the target candidate area may include one candidate area or multiple candidate areas.
  • the probability that each pixel in the target candidate region belongs to the tracking object can be calculated according to the feature map of the target candidate region. According to the probability that each pixel in the target candidate region belongs to the tracking object, the pixel coordinates of the tracking object corresponding to the target candidate region can be determined. According to the pixel coordinates of the tracking object, the position information of the tracking object can be accurately determined.
  • the outline of the polygon may be used to segment the tracking object from the first image.
  • the first probability threshold may be set according to requirements, for example, may be set to 60%, 80%, 90%, or other optional values, which are not limited in this embodiment.
  • after the segmentation branch calculates the pixel coordinates of the tracking objects corresponding to each candidate region, the respective instance segmentation masks (Masks) of the first image and the second image can be output according to the pixel coordinates of the tracking objects contained in each image.
  • the segmentation branch can output instance segmentation masks corresponding to each of the multiple instance types.
  • each instance category corresponds to an instance segmentation mask; that is, when the segmentation branch segments tracking objects of M instance categories, it can output M instance segmentation masks for the image to be segmented, where M is a positive integer.
  • the value stored in each pixel coordinate is used to represent the probability that the pixel coordinate belongs to the tracking object of the instance category.
  • for example, when an image contains tracking objects of the house, vehicle, and road categories, the segmentation branch may output a three-channel segmentation mask for that image. In the first-channel segmentation mask, the value stored at each pixel coordinate represents the probability that the pixel coordinate belongs to the house category; in the second-channel mask, the probability that it belongs to the vehicle category; and in the third-channel mask, the probability that it belongs to the road category.
  • if the probability that a pixel coordinate belongs to the tracking object of an instance category is greater than the set first probability threshold, the value stored at that pixel coordinate in the instance segmentation mask is determined to be 1; otherwise, it is determined to be 0, yielding the binary instance segmentation mask corresponding to the instance category.
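  • The thresholding step maps directly onto array operations. A minimal sketch, assuming the probability maps arrive as one stacked NumPy array and using 0.8 as an illustrative first probability threshold (the same pattern applies to the change masks and the second probability threshold described below):

```python
import numpy as np

def binarize_masks(prob_masks: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Turn per-category probability maps of shape (M, H, W) into binary
    instance segmentation masks: 1 where the probability exceeds the
    threshold, 0 elsewhere."""
    return (prob_masks > threshold).astype(np.uint8)
```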
  • the change detection network shown in FIG. 2 can perform change detection on multiple pixel coordinates corresponding to the two-phase images according to the feature map of at least one candidate region output by the backbone network, and obtain the corresponding change states of the multiple pixel coordinates.
  • the probability that a plurality of pixel coordinates corresponding to the two-phase image respectively belong to at least one change state may be calculated according to the feature map of the at least one candidate region. According to the probability that the plurality of pixel coordinates respectively belong to at least one change state, the change state corresponding to each of the plurality of pixel coordinates can be output.
  • the change states corresponding to the plurality of pixel coordinates can be described by change masks.
  • the change detection network can detect N kinds of change states, it can output N change masks corresponding to the plurality of pixels, where N is a positive integer.
  • each change state corresponds to a change mask.
  • the value stored in each pixel coordinate is used to represent the probability that the pixel coordinate belongs to the change state.
  • the at least one changing state may include at least one of a newly added state, a disappearing state, and an unchanged state.
  • for these three change states, the change detection network can output a three-channel change mask, namely: change mask 1 corresponding to the newly added state, change mask 2 corresponding to the disappearing state, and change mask 3 corresponding to the unchanged state.
  • the value stored at each pixel coordinate in change mask 1 represents the probability that the pixel coordinate belongs to the newly added state; the value stored at each pixel coordinate in change mask 2 represents the probability that it belongs to the disappearing state; and the value stored at each pixel coordinate in change mask 3 represents the probability that it belongs to the unchanged state.
  • if the probability that a pixel coordinate belongs to a certain change state is greater than the set second probability threshold, the value stored at that pixel coordinate in the change mask corresponding to that change state is determined to be 1; otherwise, it is determined to be 0, yielding a binary change mask corresponding to the change state.
  • the second probability threshold may be set according to requirements, for example, may be set to 60%, 80%, 90%, or other optional values, which are not limited in this embodiment.
  • the neural network model can integrate segmentation tasks and change detection tasks, and based on an end-to-end neural network model, detailed change detection results and accurate object segmentation results can be obtained.
  • the respective instance categories of the at least one tracking object and their pixel coordinates in the two-phase images, as output by the instance segmentation network, can be obtained, together with the change states of the plurality of pixel coordinates corresponding to the two-phase images, as output by the change detection network.
  • the respective change states of the at least one tracking object can be determined by comparing the above-mentioned various output information.
  • the pixel coordinates of the target tracking object corresponding to the candidate region can be determined from the pixel coordinates of the tracking objects corresponding to the at least one candidate region.
  • according to the pixel coordinates of the target tracking object and the change states corresponding to the plurality of pixel coordinates, the change state of the pixel coordinates corresponding to the target tracking object, that is, the change state of the target tracking object, can be determined.
  • the instance type of the target tracking object can be determined from the instance types of the tracking objects corresponding to the at least one candidate region, and the change state and the instance type of the target tracking object can be output.
  • the change state and instance type of the tracking object corresponding to each candidate region can be output, so as to realize the object tracking operation based on the two-phase image.
  • for example, for two-phase images obtained by photographing a street, object tracking information such as house 1 in the disappearing state, house 2 in the newly added state, road 1 in the unchanged state, vehicle 1 in the newly added state, and vehicle 2 in the disappearing state can be output.
  • the instance segmentation network can output M instance segmentation masks for the first image and M instance segmentation masks for the second image, and the change detection network can output N change masks for the multiple pixel coordinates corresponding to the two-phase images.
  • the M instance segmentation masks of the first image are respectively superimposed with the N change masks, so that the respective change states of the tracking objects in the first image can be determined.
  • the respective change states of the tracking objects in the second image can be determined by superimposing the M instance segmentation masks of the second image and the N change masks respectively.
  • for example, in the segmentation mask of the first image, the pixel coordinates of tracking object 1 of the house category are {P1} and the pixel coordinates of tracking object 2 of the house category are {P2}; that is, in the segmentation mask of the first image, the pixel coordinates {P1} and {P2} store a value of 1 and all other pixel coordinates store 0.
  • in the segmentation mask of the second image, the pixel coordinates of tracking object 3 of the house category are {P3} and the pixel coordinates of tracking object 4 of the house category are {P4}; that is, in the segmentation mask of the second image, the pixel coordinates {P3} and {P4} store a value of 1 and all other pixel coordinates store 0.
  • assume the change states of the pixel coordinates {P1} and {P3} are the unchanged state; that is, in the first change mask, the pixel coordinates {P1} and {P3} store a value of 1 and all other pixel coordinates store 0. Then, by superimposing the segmentation mask of the first image with the first change mask, it can be determined that tracking object 1 of the house category is an unchanged tracking object.
  • assume the change state of the pixel coordinates {P2} is the disappearing state; that is, in the second change mask, the pixel coordinates {P2} store a value of 1 and all other pixel coordinates store 0. Then, by superimposing the segmentation mask of the first image with the second change mask, it can be determined that tracking object 2 of the house category is a disappeared tracking object.
  • assume the change state of the pixel coordinates {P4} is the newly added state; that is, in the third change mask, the pixel coordinates {P4} store a value of 1 and all other pixel coordinates store 0. Then, by superimposing the segmentation mask of the second image with the third change mask, it can be determined that tracking object 4 of the house category is a newly added tracking object.
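  • The mask-superposition logic of this example can be expressed compactly. The sketch below assigns each tracked object the change state whose binary change mask covers the largest share of the object's pixels; this majority-overlap rule is an assumption, since the text only states that the masks are superimposed.

```python
import numpy as np

def object_change_state(object_mask: np.ndarray, change_masks: dict) -> str:
    """Return the change state of one tracked object.

    object_mask  : (H, W) binary mask of the object's pixels, e.g. {P1}.
    change_masks : binary (H, W) masks keyed by state name, e.g.
                   {"added": ..., "disappeared": ..., "unchanged": ...}.
    """
    obj = object_mask.astype(bool)
    overlaps = {state: int((obj & mask.astype(bool)).sum())
                for state, mask in change_masks.items()}
    return max(overlaps, key=overlaps.get)  # state with the largest overlap
```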
  • each set of training data includes two-phase sample images, and each sample image is marked with a category label of a tracking object and a polygonal outline of the tracking object.
  • the multiple sets of training data may be input into the neural network model shown in FIG. 2 and FIG. 3 to perform iterative training on the neural network model.
  • the category label and polygon outline of the tracking object marked on the sample image can be directly used as the supervision signal of the instance segmentation network shown in Figure 2 and Figure 3.
  • the supervision signal of the change detection network can be calculated based on the labeling differences between the two-phase sample images and the shooting order of the two-phase sample images.
  • for a pixel coordinate whose labeling does not differ between the two-phase sample images, the pixel coordinate can be marked as an unchanged pixel coordinate.
  • for a pixel coordinate, if it lies outside the polygon outlines of the tracked objects in the first sample image but within the polygon outline of a tracked object in the second sample image, the pixel coordinate is marked as a newly added pixel coordinate. Conversely, if it lies within the polygon outline of a tracked object in the first sample image but outside the polygon outlines of the tracked objects in the second sample image, it is marked as a disappeared pixel coordinate.
  • in this way, the ground-truth change state of each pixel coordinate in the training data can be determined. Calculating the change-state ground truth from the existing label values avoids additional labeling workload and improves training efficiency.
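  • Under the labeling rules just described, the change-state ground truth reduces to boolean operations on rasterized polygon masks. A sketch, with the 0/1/2 label encoding chosen arbitrarily for illustration:

```python
import numpy as np

def change_state_ground_truth(inside_t1: np.ndarray,
                              inside_t2: np.ndarray) -> np.ndarray:
    """Per-pixel change-state labels derived from existing polygon labels.

    inside_t1 / inside_t2 : (H, W) booleans, True where a pixel lies inside
    a tracked object's polygon outline in the first / second sample image
    (e.g. produced by a polygon rasterization routine).
    """
    labels = np.zeros(inside_t1.shape, dtype=np.uint8)  # 0 = unchanged
    labels[~inside_t1 & inside_t2] = 1                   # 1 = newly added
    labels[inside_t1 & ~inside_t2] = 2                   # 2 = disappeared
    return labels
```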
  • the class label and polygon outline of the tracking object marked on the sample image and the respective change state true values of the multiple pixel coordinates can be used as supervision signals to iteratively train the neural network model.
  • the predicted category, predicted contour and predicted change state of the tracking object output by the neural network for the training data can be obtained.
  • according to the predicted category of the tracking object corresponding to the training data and the pre-marked category label, the classification loss of the neural network model can be calculated.
  • the segmentation loss of the neural network model can be calculated according to the predicted contour of the tracking object corresponding to the training data and the pre-marked polygon contour.
  • the change detection loss of the neural network model is calculated according to the predicted change state of the tracking objects corresponding to the training data and the true value of the change state of each tracked object calculated in the above manner.
  • the parameters of the neural network model can be optimized based on the combined loss of classification loss, segmentation loss, and change detection loss.
  • the above optimization process can be performed iteratively until the set number of iterations is satisfied or the above joint loss converges to a specified range.
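  • The joint objective can be sketched as a weighted sum of the three task losses. Cross-entropy for every task and equal default weights are assumptions here; the embodiments do not fix the loss functions.

```python
import torch.nn.functional as F

def joint_loss(cls_logits, cls_targets, seg_logits, seg_targets,
               change_logits, change_targets,
               w_cls=1.0, w_seg=1.0, w_change=1.0):
    """Combined loss for the multi-task network: classification +
    segmentation + change detection, optimized end to end."""
    loss_cls = F.cross_entropy(cls_logits, cls_targets)
    loss_seg = F.cross_entropy(seg_logits, seg_targets)           # per pixel
    loss_change = F.cross_entropy(change_logits, change_targets)  # per pixel
    return w_cls * loss_cls + w_seg * loss_seg + w_change * loss_change
```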
  • the neural network model is optimized based on the joint loss of the multi-task network, which can greatly improve the performance of the backbone network in the neural network model, so as to finally optimize the object tracking performance of the neural network model.
  • the object tracking method provided by the embodiments of the present application can be applied to various object tracking scenarios. For example, the scene of moving object tracking in sports competitions, the scene of ground object tracking based on remote sensing images, the scene of crowd tracking in a specific place, the scene of environmental change detection in a specific area, and so on.
  • the acquired two-phase images of the target environment may include: two-phase remote sensing images corresponding to the target environment.
  • the object tracking method provided by the embodiments of the present application may be implemented as a feature tracking method, which will be exemplarily described below.
  • FIG. 4 is a schematic flowchart of a method for tracking ground objects according to an exemplary embodiment of the present application. As shown in FIG. 4 , the method includes:
  • Step 401 Acquire a two-phase remote sensing image obtained by photographing the target environment.
  • Step 402 Determine respective feature maps of at least one candidate region in the two-phase remote sensing images according to the image features of the two-phase remote sensing images.
  • Step 403 Perform instance segmentation on the two-phase remote sensing image according to the feature map of the at least one candidate region to obtain at least one ground object included in the two-phase remote sensing image.
  • Step 404 Perform change detection on a plurality of pixel coordinates corresponding to the two-phase remote sensing image according to the feature map of the at least one candidate region, and obtain the corresponding change states of the plurality of pixel coordinates.
  • Step 405 Determine the respective change states of the at least one ground object according to the respective pixel coordinates of the at least one ground object and the respective change states of the plurality of pixel coordinates.
  • two-phase remote sensing images refer to the images collected by remote sensing satellites on the same target environment at two different times.
  • Ground object tracking refers to tracking the change history of the same ground object in remote sensing images obtained at different times, and the ground objects may include: buildings, roads, etc.
  • the ground object tracking method provided in this embodiment can be implemented based on the neural network model shown in FIG. 2 and FIG. 3 .
  • the input data of the neural network model is a multi-channel image
  • the multi-channel image includes a two-phase remote sensing image and a difference feature map of the two-phase remote sensing image.
  • the neural network model can make full use of the image information contained in the remote sensing images and attend to the high-level semantic features of the changed pixels in the difference feature map, which further enriches the image features used for ground object tracking and improves the accuracy of the subsequent ground feature tracking results.
  • the histogram of the image of the second phase can be used as a reference, and the histogram of the image of the first phase can be transformed to equalize the luminance information of the two phases, reducing the influence of shooting conditions on subsequent calculation results.
  • the backbone network shown in FIG. 2 and FIG. 3 can perform feature extraction on an input multi-channel image to obtain a feature map of the multi-channel image. Based on the feature map of the multi-channel image, the region generation operation and the region feature aggregation operation can be performed to obtain the respective feature maps of at least one candidate region contained in the two-phase remote sensing image.
  • the segmentation networks shown in FIG. 2 and FIG. 3 can perform instance segmentation on the two-phase remote sensing images according to the respective feature maps of the at least one candidate region included in the two-phase remote sensing images, and output the instance segmentation results of the two-phase remote sensing images.
  • the change detection network illustrated in FIG. 2 and FIG. 3 can output multi-channel change state detection results.
  • the multi-channel change state detection result may include the results of three change state classifications, including the newly added state, the unchanged state, and the disappearing state, of the pixel coordinates in the two-phase remote sensing image.
  • by comparing the output results of the segmentation network with the multi-channel change state detection results output by the change detection network, the newly added, unchanged, and disappeared ground object targets in the two-phase remote sensing images can be obtained.
  • the execution subject of each step of the method provided by the above embodiments may be the same device, or the method may also be executed by different devices.
  • for example, the execution subject of steps 401 to 404 may be device A; for another example, the execution subject of steps 401 and 402 may be device A, and the execution subject of step 403 may be device B; and so on.
  • FIG. 5 is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present application.
  • the electronic device can be implemented as a server, such as a conventional server, cloud server, cloud host, virtual center and other servers.
  • the electronic device may include a memory 501 and a processor 502 .
  • Memory 501 is used to store computer programs and may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, etc.
  • the memory 501 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • the processor 502, coupled with the memory 501, is configured to execute the computer program in the memory 501 to: acquire two-phase images obtained by photographing the target environment; determine the respective feature maps of at least one candidate region in the two-phase images according to the image features of the two-phase images; perform instance segmentation on the two-phase images according to the feature map of the at least one candidate region to obtain at least one tracking object included in the two-phase images; perform change detection on a plurality of pixel coordinates corresponding to the two-phase images according to the feature map of the at least one candidate region to obtain the change states corresponding to the plurality of pixel coordinates; and determine the respective change states of the at least one tracking object according to the pixel coordinates corresponding to the at least one tracking object and the respective change states of the plurality of pixel coordinates.
  • the processor 502 is further configured to: for the first image and the second image in the two-phase images, obtain their respective histograms; and, according to the histogram of the first image, transform the histogram of the second image to equalize the luminance information of the first image and the second image.
  • the processor 502 is specifically configured to: for the first image and the second image in the two-phase images, perform a pixel-value difference operation on pixel points with the same pixel coordinates in the first image and the second image to obtain a difference feature map; concatenate the first image, the second image, and the difference feature map to obtain a multi-channel image; and input the multi-channel image into the neural network model, so that the backbone network of the neural network model performs the feature extraction operation and the region generation operation on the multi-channel image to obtain the feature map of the at least one candidate region.
  • the neural network model further includes: a multi-task network respectively connected to the backbone network, and the multi-task network includes: an instance segmentation network and a change detection network.
  • the instance segmentation network includes a classification branch and a segmentation branch. When performing instance segmentation on the two-phase images according to the feature map of the at least one candidate region to obtain the at least one tracking object included in the two-phase images, the processor 502 is specifically configured to: identify, based on the classification branch and according to the feature map of the at least one candidate region, the instance category of the tracking object corresponding to each of the at least one candidate region; and calculate, based on the segmentation branch and according to the feature map of the at least one candidate region, the pixel coordinates of the tracking object corresponding to each of the at least one candidate region.
  • when calculating, based on the segmentation branch and according to the feature map of the at least one candidate region, the pixel coordinates of the tracking objects corresponding to the at least one candidate region, the processor 502 is specifically configured to: for any image in the two-phase images, determine the target candidate region included in the image; calculate, based on the segmentation branch and according to the feature map of the target candidate region, the probability that each pixel in the target candidate region belongs to the tracking object; and determine the pixel coordinates of the tracking object corresponding to the target candidate region according to that probability.
  • when performing change detection on the plurality of pixel coordinates corresponding to the two-phase images according to the feature map of the at least one candidate region to obtain their corresponding change states, the processor 502 is specifically configured to: calculate, based on the change detection network and according to the feature map of the at least one candidate region, the probability that each of the plurality of pixel coordinates belongs to at least one change state; and output the change state corresponding to each of the plurality of pixel coordinates according to those probabilities.
  • the at least one changing state includes at least one of a newly added state, a disappearing state, and an unchanged state.
  • when determining the respective change states of the at least one tracking object according to the pixel coordinates of the at least one tracking object and the respective change states of the plurality of pixel coordinates, the processor 502 is specifically configured to: for any candidate region in the at least one candidate region, determine, from the pixel coordinates of the tracking objects corresponding to the at least one candidate region, the pixel coordinates of the target tracking object corresponding to the candidate region; determine the change state of the target tracking object according to the pixel coordinates of the target tracking object and the change states corresponding to the plurality of pixel coordinates; determine the instance category of the target tracking object from the instance categories of the tracking objects corresponding to the at least one candidate region; and output the change state and instance category of the target tracking object.
  • the two-phase images include: two-phase remote sensing images corresponding to the target environment.
  • the processor 502 is further configured to: acquire multiple sets of training data, where each set of training data includes two-phase sample images, and each sample image is marked with a category label and a polygon outline of the tracking object; calculate, according to the labeling differences between the two-phase sample images and the shooting sequence of the two-phase sample images, the ground-truth change states of the multiple pixel coordinates corresponding to the two-phase sample images; input the multiple sets of training data into the neural network model; and iteratively train the neural network model using, as supervision signals, the category label and polygon outline of the tracking object marked on each sample image and the ground-truth change states of the plurality of pixel coordinates.
  • the electronic device further includes: a communication component 503 , a power supply component 504 and other components. Only some components are schematically shown in FIG. 5 , which does not mean that the electronic device only includes the components shown in FIG. 5 .
  • the communication component 503 is configured to facilitate wired or wireless communication between the device where the communication component is located and other devices.
  • the device where the communication component is located can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G or 5G, or a combination thereof.
  • the communication component receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component may be implemented based on Near Field Communication (NFC) technology, Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • the power supply component 504 provides power for various components of the equipment where the power supply component is located.
  • a power supply assembly may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the equipment in which the power supply assembly is located.
  • by performing instance segmentation on the two-phase images, the tracking objects in the two-phase images can be detected in detail; by performing change detection on the pixel coordinates corresponding to the two-phase images, pixel-level change state detection results can be obtained. Based on the tracking objects obtained by segmentation and the pixel-level change state detection results, the change states of the tracking objects in the two-phase images can be accurately obtained, improving the accuracy and reliability of the object tracking results.
  • the electronic device shown in FIG. 5 can also execute the following ground feature tracking logic, wherein the processor 502 is configured to: acquire two-phase remote sensing images obtained by photographing the target environment; determine the respective feature maps of at least one candidate region in the two-phase remote sensing images according to the image features of the two-phase remote sensing images; perform instance segmentation on the two-phase remote sensing images according to the feature map of the at least one candidate region to obtain at least one ground object included in the two-phase remote sensing images; perform change detection on a plurality of pixel coordinates corresponding to the two-phase remote sensing images according to the feature map of the at least one candidate region to obtain the respective change states of the plurality of pixel coordinates; and determine the respective change states of the at least one ground object according to the pixel coordinates corresponding to the at least one ground object and the respective change states of the plurality of pixel coordinates.
  • the embodiments of the present application further provide a computer-readable storage medium storing a computer program, and when the computer program is executed, each step that can be executed by an electronic device in the foregoing method embodiments can be implemented.
  • embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include forms of non-persistent memory, random access memory (RAM) and/or non-volatile memory in computer readable media, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media; information storage may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cartridges, magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide an object tracking method, a ground-feature tracking method, a device, a system, and a storage medium. In the object tracking method, instance segmentation is performed on two-phase images, so that tracked objects in the two-phase images can be detected in fine detail; change detection is performed on the pixel coordinates corresponding to the two-phase images, so that pixel-level change-state detection results can be obtained. Based on the segmented tracked objects and the pixel-level change-state detection results, the change states of the tracked objects in the two-phase images can be obtained accurately, improving the accuracy and reliability of the object tracking results.

Description

Object tracking and ground-feature tracking methods, device, system, and storage medium
This application claims priority to Chinese patent application No. 202110064800.X, filed on January 18, 2021 and entitled "Object tracking and ground-feature tracking methods, device, system, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of computer vision, and in particular to object tracking and ground-feature tracking methods, a device, a system, and a storage medium.
Background
Satellite remote sensing image analysis technology is developing continuously and has been widely applied in scenarios such as environmental monitoring, infrastructure development monitoring, and disaster response. In remote sensing image analysis, change detection and ground-feature tracking can be performed on a designated area based on the rich data provided by multi-temporal remote sensing images.
At present, there is a scheme that performs change detection on remote sensing images based on a CNN (Convolutional Neural Network). In this scheme, features are extracted from the remote sensing images by the CNN, and a binary change detection result for the remote sensing images is output based on the extracted features. However, this scheme fails to make full use of the information contained in the remote sensing images, which makes it difficult to improve the accuracy of the final change detection result. A new solution is therefore needed.
Summary
Various aspects of the present application provide object tracking and ground-feature tracking methods, a device, a system, and a storage medium, so as to improve the accuracy of object tracking results.
An embodiment of the present application provides an object tracking method, including: acquiring two-phase images captured of a target environment; determining, according to image features of the two-phase images, respective feature maps of at least one candidate region in the two-phase images; performing instance segmentation on the two-phase images according to the feature maps of the at least one candidate region to obtain at least one tracked object contained in the two-phase images; performing change detection on a plurality of pixel coordinates corresponding to the two-phase images according to the feature maps of the at least one candidate region to obtain the respective change states of the plurality of pixel coordinates; and determining the respective change states of the at least one tracked object according to the pixel coordinates corresponding to each of the at least one tracked object and the respective change states of the plurality of pixel coordinates.
An embodiment of the present application further provides a ground-feature tracking method, including: acquiring two-phase remote sensing images captured of a target environment; determining, according to image features of the two-phase remote sensing images, respective feature maps of at least one candidate region in the two-phase remote sensing images; performing instance segmentation on the two-phase remote sensing images according to the feature maps of the at least one candidate region to obtain at least one ground object contained in the two-phase remote sensing images; performing change detection on a plurality of pixel coordinates corresponding to the two-phase remote sensing images according to the feature maps of the at least one candidate region to obtain the respective change states of the plurality of pixel coordinates; and determining the respective change states of the at least one ground object according to the pixel coordinates corresponding to each of the at least one ground object and the respective change states of the plurality of pixel coordinates.
An embodiment of the present application further provides an electronic device, including a memory and a processor; the memory is configured to store one or more computer instructions; the processor is configured to execute the one or more computer instructions so as to perform the steps of the methods provided by the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the methods provided by the embodiments of the present application can be implemented.
In the object tracking method provided by the embodiments of the present application, instance segmentation is performed on the two-phase images, so that the tracked objects in the two-phase images can be detected in fine detail; change detection is performed on the pixel coordinates corresponding to the two-phase images, so that pixel-level change-state detection results can be obtained. Based on the segmented tracked objects and the pixel-level change-state detection results, the change states of the tracked objects in the two-phase images can be obtained accurately, improving the accuracy and reliability of the object tracking results.
Brief Description of the Drawings
The drawings described herein are provided for a further understanding of the present application and constitute a part of the present application. The exemplary embodiments of the present application and their descriptions are used to explain the present application and do not constitute an undue limitation on the present application. In the drawings:
FIG. 1 is a schematic flowchart of an object tracking method provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic structural diagram of a neural network model provided by an exemplary embodiment of the present application;
FIG. 3 is a schematic structural diagram of a neural network model provided by another exemplary embodiment of the present application;
FIG. 4 is a schematic flowchart of a ground-feature tracking method provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
To address the technical problem in the prior art that the information contained in remote sensing images cannot be fully utilized and the accuracy of the final change detection result is difficult to improve, some embodiments of the present application provide a solution. The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.
FIG. 1 is a schematic flowchart of an object tracking method provided by an exemplary embodiment of the present application. As shown in FIG. 1, the method includes:
Step 101: Acquire two-phase images captured of a target environment.
Step 102: Determine, according to image features of the two-phase images, respective feature maps of at least one candidate region in the two-phase images.
Step 103: Perform instance segmentation on the two-phase images according to the feature maps of the at least one candidate region to obtain at least one tracked object contained in the two-phase images.
Step 104: Perform change detection on a plurality of pixel coordinates corresponding to the two-phase images according to the feature maps of the at least one candidate region to obtain the respective change states of the plurality of pixel coordinates.
Step 105: Determine the respective change states of the at least one tracked object according to the pixel coordinates corresponding to each of the at least one tracked object and the respective change states of the plurality of pixel coordinates.
Here, object tracking refers to tracking the course of change of the same object across images acquired at different times.
The target environment may include indoor places such as sports venues, factories, museums, and supermarkets, or any outdoor ground environment such as a city block, suburb, field, village, or highway.
The two-phase images may include a set of images of the same target environment acquired at different times.
The image features of the two-phase images can be obtained by performing a feature extraction operation on the two-phase images. The extracted image features may include features such as the color, texture, shape, and spatial relationships of the two-phase images, and may also include high-level semantic features of the two-phase images. The high-level semantic features contain more global information and can provide rich information for object tracking.
A candidate region in the two-phase images refers to a region in the two-phase images that may contain a tracked object; the at least one candidate region can be obtained by performing a region generation operation on the two-phase images. The region generation operation extracts, according to the image features of the two-phase images, potential regions of interest from the two-phase images, where a region of interest may contain a tracked object.
In some embodiments, when candidate regions are extracted from the two-phase images according to their image features, proposed regions of interest (region proposals) may be selected in the feature maps of the two-phase images using base detection boxes (anchors) predefined at fixed scales. Next, a logistic regression algorithm (for example, softmax) is used to estimate the probability that the region selected by a base detection box belongs to a tracked object, and a bounding box regression algorithm is used to refine the base detection boxes to obtain accurate candidate boxes.
After the candidate boxes are determined, the feature maps of the candidate regions can be determined from the feature maps of the two-phase images, and the tracked objects corresponding to the candidate regions can be further detected according to the regional features of the candidate regions.
In this embodiment, detecting a tracked object includes: segmenting the tracked object out of the two-phase images, and detecting whether the tracked object changes between the different capture times corresponding to the two-phase images.
Instance segmentation refers to predicting, pixel by pixel through a segmentation algorithm, the class label of every pixel in an image, i.e., the instance class to which each pixel belongs, while simultaneously distinguishing different individuals of the same instance class in the input image.
In this embodiment, based on the feature maps of the at least one candidate region in the two-phase images, instance segmentation can be performed on the two-phase images to obtain the at least one tracked object contained therein. The instance segmentation result contains the instance class of each tracked object and also separates different individual tracked objects of the same instance class.
For example, when the target scene is a city block, instance segmentation can separate, from the images of the block, objects of different instance classes as well as different objects of each instance class; for example, every house, every road, and every vehicle can be segmented out.
Change detection refers to analyzing, through a change detection algorithm, pixels with the same pixel coordinates in images captured at different times, so as to determine the change state of each pixel coordinate across the different times.
In this embodiment, the two-phase images have the same size and resolution, so their pixel coordinates correspond to each other; that is, the two-phase images share a plurality of identical pixel coordinates, hereinafter referred to as the plurality of pixel coordinates corresponding to the two-phase images. For example, when the first image of the two-phase images is represented by a matrix K1 and the second image by a matrix K2, both K1 and K2 are m*n matrices, where m is the number of pixel rows, n is the number of pixel columns, and m and n are positive integers. K1 and K2 correspond to m*n pixel coordinates.
Based on the feature maps of the at least one candidate region in the two-phase images, change detection can be performed on the plurality of pixel coordinates corresponding to the two-phase images, i.e., the difference of the same pixel coordinate between the images captured at different times is detected, to obtain the change state of each of the plurality of pixel coordinates.
Continuing with the city-block example, for the first image and the second image of the two-phase block images, suppose the pixel at coordinate P0 belongs to the background region in the first image but to the foreground region in the second image; then P0 can be regarded as a changed pixel coordinate. If the first image was captured earlier than the second image, P0 can be regarded as having undergone an addition change; conversely, if the second image was captured earlier than the first, P0 can be regarded as having undergone a disappearance change. If the two pixels at coordinate P1 both belong to the background region, or both belong to the foreground region, in the first image and the second image, it can be determined that P1 has not changed.
The change of a pixel coordinate between images captured at different times can reflect the change of the real space corresponding to that pixel coordinate.
By combining the pixel coordinates of each segmented tracked object with the change states of the plurality of pixel coordinates corresponding to the two-phase images, the change state of each tracked object in the two-phase images can be determined. Based on the change state of each tracked object and the instance class of each tracked object contained in the instance segmentation result, tracking of the objects can be achieved.
In this embodiment, by performing instance segmentation on the two-phase images, the tracked objects in the two-phase images can be detected in fine detail; by performing change detection on the pixel coordinates corresponding to the two-phase images, pixel-level change-state detection results can be obtained. Based on the segmented tracked objects and the pixel-level change-state detection results, the change states of the tracked objects in the two-phase images can be obtained accurately, improving the accuracy and reliability of the object tracking results.
In some exemplary embodiments, the object tracking method provided by the embodiments of the present application can be implemented by a neural network model, which can be implemented as a multi-task network based on Mask-RCNN (Mask Region-CNN, a convolutional neural network that predicts masks for regions). This is described by way of example below.
FIG. 2 is a schematic structural diagram of a neural network model provided by an exemplary embodiment of the present application. As shown in FIG. 2, the neural network model includes a backbone network and a multi-task network connected to the backbone network, where the multi-task network includes an instance segmentation network and a change detection network. The backbone network is configured to output region feature maps according to the input images; the instance segmentation network and the change detection network can share the region feature maps output by the backbone network.
In some exemplary embodiments, the images input to the neural network model may include the two-phase images and a difference feature map of the two-phase images. This is described by way of example below.
For ease of description and distinction, the two-phase images are described as a first image and a second image captured of the target environment at different times.
Optionally, a pixel-value difference operation may be performed on pixels with the same pixel coordinates in the first image and the second image to obtain a difference feature map. When the first image is represented by a matrix K1 and the second image by a matrix K2, the computation of the difference feature map can be described as ΔK = K2 - K1, where K1, K2, and ΔK are all m*n matrices.
Next, the first image, the second image, and the difference feature map are concatenated (concat) to obtain a multi-channel image. Since the first image, the second image, and the difference feature map each contain three color channels, R (red), G (green), and B (blue), the concatenated multi-channel image is a 9-channel image.
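For illustration only, the following is a minimal sketch of how the 9-channel input described above could be assembled. The function name, the channels-last array layout (H x W x 3), and the float conversion are assumptions of this example and are not specified in the present application.

```python
import numpy as np

def build_nine_channel_input(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """Assemble the 9-channel input [K1, K2, delta-K] from two aligned RGB images.

    Both inputs are assumed to be H x W x 3 arrays of identical size and
    resolution, as required above for the pixel coordinates to correspond.
    """
    k1 = img_t1.astype(np.float32)
    k2 = img_t2.astype(np.float32)
    delta = k2 - k1  # difference feature map, delta-K = K2 - K1
    return np.concatenate([k1, k2, delta], axis=-1)  # shape: H x W x 9
```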
Next, the multi-channel image is input to the neural network model. In the neural network model, the backbone network can perform a feature extraction operation on the multi-channel image to obtain a feature map of the multi-channel image; a region generation operation is performed according to the feature map of the multi-channel image to obtain at least one candidate region, and a region feature aggregation operation is performed to obtain the feature maps of the at least one candidate region.
Optionally, as shown in FIG. 3, the backbone network may include a feature extraction network, a region proposal network, and a region feature aggregation layer. The step of performing feature extraction on the multi-channel image to obtain its feature map can be implemented by the feature extraction network. Optionally, the feature extraction network can be implemented as a convolutional neural network (CNN), including but not limited to networks such as VGG and ResNet.
The step of performing the region generation operation according to the feature maps of the two-phase images can be implemented by an RPN (Region Proposal Network). The RPN can select proposed regions of interest (region proposals) in the feature maps of the two-phase images using base detection boxes (anchors) predefined at fixed scales, estimate via softmax the probability that the region selected by a base detection box belongs to a foreground tracked object, and refine the base detection boxes with a bounding box regression algorithm to obtain accurate candidate boxes.
After the accurate candidate boxes are determined by the RPN, the feature maps of the candidate regions can be determined from the feature maps of the two-phase images by the region feature aggregation layer. The region feature aggregation layer can be implemented based on the ROIAlign algorithm, which is not elaborated in this embodiment.
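As a hedged illustration of the region feature aggregation step, the snippet below uses the ROIAlign implementation available in torchvision. The tensor shapes, output size, and spatial scale are assumptions chosen for the example rather than values specified in this application.

```python
import torch
from torchvision.ops import roi_align

# feature_map: backbone output of shape (batch, channels, H', W')
# boxes: candidate boxes as a (K, 5) tensor of (batch_index, x1, y1, x2, y2)
# given in input-image coordinates; spatial_scale maps them to the feature map.
feature_map = torch.randn(1, 256, 64, 64)
boxes = torch.tensor([[0, 10.0, 10.0, 200.0, 200.0]])
region_features = roi_align(
    feature_map, boxes,
    output_size=(7, 7),      # fixed-size feature map per candidate region
    spatial_scale=1.0 / 16,  # assumed backbone downsampling factor
    sampling_ratio=2,
)
```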
Based on the neural networks illustrated in FIG. 2 and FIG. 3, the information contained in the two-phase images can be fully used. Meanwhile, the multi-channel image input to the neural network model contains the difference feature map, which expresses the changed pixel coordinates between the first image and the second image. Performing further feature extraction on the difference feature map with the neural network model makes full use of high-level features, such as the semantic features of the changed pixels, extracts change features better, and facilitates more accurate change detection.
In some exemplary embodiments, before the respective feature maps of the at least one candidate region contained in the two-phase images are determined according to the image features of the two-phase images, the first image and the second image may be further preprocessed to reduce the interference of other factors with change detection.
Optionally, for the first image and the second image of the two-phase images, the respective histograms of the first image and the second image are acquired; the histogram of the second image is transformed according to the histogram of the first image, so as to equalize the brightness information of the first image and the second image. After this histogram transformation, the operation of computing the difference feature map of the first image and the second image can be performed. This implementation eliminates the differences caused by illumination differences between the first and second images captured at different times, further improving the accuracy of the change detection result.
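A minimal sketch of the histogram transformation is shown below, assuming scikit-image is used; match_histograms transforms the second image so that its per-channel histogram follows that of the first, which is one way to realize the brightness equalization described above. The random test images are placeholders.

```python
import numpy as np
from skimage.exposure import match_histograms

# img_t1, img_t2: aligned H x W x 3 RGB images captured at different times.
img_t1 = np.random.rand(128, 128, 3)
img_t2 = np.random.rand(128, 128, 3)

# Transform the second image's histogram toward the first image's histogram;
# channel_axis=-1 matches each color channel independently.
img_t2_matched = match_histograms(img_t2, img_t1, channel_axis=-1)
```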
The instance segmentation network shown in FIG. 2 can perform instance segmentation on the two-phase images according to the feature maps of the at least one candidate region output by the backbone network, to obtain the at least one tracked object contained in the two-phase images.
In some exemplary embodiments, as shown in FIG. 3, the instance segmentation network may include a classification branch and a segmentation branch.
Optionally, the classification branch can identify, according to the feature maps of the at least one candidate region, the instance class of the tracked object corresponding to each of the at least one candidate region. For example, when the two-phase images are images of a city block, the instance classes may include a house class, a road class, a vehicle class, a pedestrian class, a plant class, and so on.
In some cases, when the backbone network outputs feature maps of multiple candidate regions, some of the candidate regions are located in the first image of the two-phase images and others are located in the second image. The classification branch can output the instance classes of the tracked objects corresponding to the candidate regions contained in the first image, and output the instance classes of the tracked objects corresponding to the candidate regions contained in the second image.
For example, the first image contains candidate region 1, candidate region 2, and candidate region 3, and the second image contains candidate region 4 and candidate region 5. According to the respective feature maps of candidate regions 1, 2, and 3, the classification branch can output on the first image that the tracked object corresponding to candidate region 1 is a house, the tracked object corresponding to candidate region 2 is a vehicle, and the tracked object corresponding to candidate region 3 is a pedestrian. According to the respective feature maps of candidate regions 4 and 5, the classification branch can output on the second image that the tracked object corresponding to candidate region 4 is a house and the tracked object corresponding to candidate region 5 is a vehicle.
Optionally, the segmentation branch can compute, according to the feature maps of the at least one candidate region, the pixel coordinates of the tracked object corresponding to each of the at least one candidate region.
In some cases, when the backbone network outputs feature maps of multiple candidate regions, some of the candidate regions are located in the first image of the two-phase images and others in the second image. The segmentation branch can respectively output the pixel coordinates of the tracked objects contained in the first image and the pixel coordinates of the tracked objects contained in the second image.
The segmentation operation of the segmentation network is described below by way of example, taking the first image of the two-phase images as an example.
Optionally, a target candidate region contained in the first image can be determined from the at least one candidate region. The target candidate region may include one candidate region or multiple candidate regions. Based on the segmentation branch, the probability that each pixel in the target candidate region belongs to a tracked object can be computed according to the feature map of the target candidate region. According to this probability, the pixel coordinates of the tracked object corresponding to the target candidate region can be determined. According to the pixel coordinates of the tracked object, the position information of the tracked object can be determined precisely. Then, the tracked object can be segmented out of the first image using a polygonal contour.
Generally, for a pixel in the target candidate region, if the probability that the pixel belongs to a tracked object is greater than a set first probability threshold, it can be determined that the pixel belongs to the tracked object. The first probability threshold can be set as required, for example to 60%, 80%, 90%, or another optional value; this embodiment imposes no limitation.
Optionally, after the segmentation branch computes the pixel coordinates of the tracked object corresponding to each candidate region, it can output the respective instance segmentation masks of the first image and the second image according to the pixel coordinates of the tracked objects contained in each of them.
For each image, if the segmentation branch segments out tracked objects of multiple instance classes in that image, it can output an instance segmentation mask for each of these instance classes, one mask per instance class. That is, when the segmentation branch segments tracked objects of M instance classes, it can output M instance segmentation masks for the image to be segmented, where M is a positive integer. In the instance segmentation mask corresponding to each instance class, the value stored at each pixel coordinate represents the probability that the pixel coordinate belongs to a tracked object of that instance class.
The instance segmentation masks are illustrated below with a concrete example.
Suppose the segmentation branch can segment tracked objects of three instance classes, houses, vehicles, and roads, i.e., M=3. For either of the two-phase images, the segmentation branch can output a three-channel segmentation mask of that image. In the segmentation mask of the first channel, the value stored at each pixel coordinate represents the probability that the pixel coordinate belongs to the house class; in the segmentation mask of the second channel, the probability that it belongs to the vehicle class; and in the segmentation mask of the third channel, the probability that it belongs to the road class.
Optionally, if the probability that a pixel coordinate belongs to a tracked object of a certain instance class is greater than the set first probability threshold, the value stored at that pixel coordinate in the instance segmentation mask can be set to 1; otherwise, it is set to 0, yielding a binary instance segmentation mask for that instance class.
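For illustration, a binary instance segmentation mask could be derived from the per-class probability masks as follows; the (M, H, W) mask layout, the threshold value, and the random placeholder input are assumptions of this sketch.

```python
import numpy as np

def binarize_instance_masks(prob_masks: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """prob_masks: (M, H, W) array, one per-class probability mask per channel.

    Each pixel whose class probability exceeds the first probability
    threshold is set to 1, all others to 0.
    """
    return (prob_masks > threshold).astype(np.uint8)

binary_masks = binarize_instance_masks(np.random.rand(3, 128, 128))
```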
The change detection network illustrated in FIG. 2 can perform change detection on the plurality of pixel coordinates corresponding to the two-phase images according to the feature maps of the at least one candidate region output by the backbone network, to obtain the change state of each of the plurality of pixel coordinates.
Optionally, based on the change detection network, the probabilities that the plurality of pixel coordinates corresponding to the two-phase images belong to each of at least one change state can be computed according to the feature maps of the at least one candidate region. According to these probabilities, the change state of each of the plurality of pixel coordinates can be output.
In some embodiments, the change states of the plurality of pixel coordinates can be described by change masks. When the change detection network can detect N change states, it can output N change masks for the plurality of pixel coordinates, where N is a positive integer and each change state corresponds to one change mask. In the change mask corresponding to any change state, the value stored at each pixel coordinate represents the probability that the pixel coordinate belongs to that change state.
Optionally, the at least one change state may include at least one of an added state, a disappeared state, and an unchanged state.
When the at least one change state includes the added, disappeared, and unchanged states, N=3, and the change detection network can output a three-channel change mask, i.e., change mask 1 corresponding to the added state, change mask 2 corresponding to the disappeared state, and change mask 3 corresponding to the unchanged state.
The value stored at each pixel coordinate in change mask 1 represents the probability that the pixel coordinate belongs to the added state; in change mask 2, the probability that it belongs to the disappeared state; and in change mask 3, the probability that it belongs to the unchanged state.
Optionally, if the probability that a pixel coordinate belongs to a certain change state is greater than a set second probability threshold, the value stored at that pixel coordinate in the change mask corresponding to that change state can be set to 1; otherwise, it is set to 0, yielding a binary change mask for that change state. The second probability threshold can be set as required, for example to 60%, 80%, 90%, or another optional value; this embodiment imposes no limitation.
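Analogously, a hedged sketch of turning the change-detection head's raw outputs into the N binary change masks might look like this; the per-pixel softmax over states, the threshold value, and the random placeholder input are assumptions, not details given in the application.

```python
import torch
import torch.nn.functional as F

def binarize_change_masks(change_logits: torch.Tensor, threshold: float = 0.8) -> torch.Tensor:
    """change_logits: (N, H, W) raw scores, one channel per change state
    (e.g., added, disappeared, unchanged for N = 3)."""
    probs = F.softmax(change_logits, dim=0)     # per-pixel state probabilities
    return (probs > threshold).to(torch.uint8)  # one binary change mask per state

change_masks = binarize_change_masks(torch.randn(3, 128, 128))
```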
Based on the above implementations, the neural network model can integrate the segmentation task and the change detection task; with a single end-to-end neural network model, fine-grained change detection results and accurate object segmentation results can both be obtained.
Based on the above embodiments, the instance class and the pixel coordinates of each of the at least one tracked object in the two-phase images output by the instance segmentation network can be acquired, together with the respective change states of the plurality of pixel coordinates corresponding to the two-phase images output by the change detection network. Comparing these outputs determines the change state of each of the at least one tracked object.
Taking any one of the at least one candidate region as an example, the pixel coordinates of the target tracked object corresponding to that candidate region can be determined from the pixel coordinates of the tracked objects corresponding to the at least one candidate region. Next, according to the pixel coordinates of the target tracked object and the change states of the plurality of pixel coordinates corresponding to the two-phase images, the change state of the pixel coordinates corresponding to the target tracked object, i.e., the change state of the target tracked object, is determined. Meanwhile, the instance class of the target tracked object can be determined from the instance classes of the tracked objects corresponding to the at least one candidate region, and the change state and instance class of the target tracked object can be output.
Similarly, the change state and instance class of the tracked object corresponding to each candidate region can be output in the above manner, thereby achieving the object tracking operation based on the two-phase images.
For example, for two-phase images of a city block, object tracking information such as house 1 in the disappeared state, house 2 in the added state, road 1 in the unchanged state, vehicle 1 in the added state, and vehicle 2 in the disappeared state can be output.
When the outputs of the instance segmentation network and the change detection network are represented by masks, the instance segmentation network can output M instance segmentation masks of the first image and M instance segmentation masks of the second image, and the change detection network outputs N change masks of the plurality of pixel coordinates corresponding to the two-phase images.
Next, overlaying the M instance segmentation masks of the first image with the N change masks respectively determines the change state of each tracked object in the first image. Similarly, overlaying the M instance segmentation masks of the second image with the N change masks determines the change state of each tracked object in the second image.
An example follows; suppose M=1 and N=3.
Suppose that in the instance segmentation result of the first image, the pixel coordinates of tracked object 1 of the house class are {P1}, and those of tracked object 2 of the house class are {P2}. That is, in the segmentation mask of the first image, the values stored at the pixel coordinates {P1} and {P2} are 1, and the values at the remaining pixel coordinates are 0.
Suppose that in the instance segmentation result of the second image, the pixel coordinates of tracked object 3 of the house class are {P3}, and those of tracked object 4 of the house class are {P4}. That is, in the segmentation mask of the second image, the values stored at {P3} and {P4} are 1, and the values at the remaining pixel coordinates are 0.
Suppose that in the first change mask, the change state of the pixel coordinates {P1} and {P3} is the unchanged state; that is, in the first change mask, the values stored at {P1} and {P3} are 1 and the rest are 0. Then, overlaying the segmentation mask of the first image with the first change mask determines that tracked object 1 of the house class is an unchanged tracked object.
Suppose that in the second change mask, the change state of the pixel coordinates {P2} is the disappeared state; that is, in the second change mask, the values stored at {P2} are 1 and the rest are 0. Then, overlaying the segmentation mask of the first image with the second change mask determines that tracked object 2 of the house class is a disappeared tracked object.
Suppose that in the third change mask, the change state of the pixel coordinates {P4} is the added state; that is, in the third change mask, the values stored at {P4} are 1 and the rest are 0. Then, overlaying the segmentation mask of the second image with the third change mask determines that tracked object 4 of the house class is a newly added tracked object.
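A minimal sketch of the mask overlay described in this example follows. The majority-overlap rule used to pick a single state per object is an assumption of the sketch, since the application only specifies that the segmentation masks and change masks are overlaid.

```python
import numpy as np

def object_change_state(object_mask: np.ndarray, change_masks: np.ndarray,
                        states=("added", "disappeared", "unchanged")) -> str:
    """object_mask: (H, W) binary mask of one segmented tracked object.
    change_masks: (N, H, W) binary change masks, one per change state.

    Overlay the object mask with each change mask and return the state
    whose mask covers the largest part of the object.
    """
    overlaps = [int(np.logical_and(change_masks[i], object_mask).sum())
                for i in range(len(states))]
    return states[int(np.argmax(overlaps))]
```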
It should be noted that the neural network models provided by the above and following embodiments of the present application can be trained on multiple sets of training data, where each set of training data includes two-phase sample images, and each sample image is annotated with the class labels of the tracked objects and the polygonal contours of those tracked objects.
After the multiple sets of training data are acquired, they can be input to the neural network model shown in FIG. 2 and FIG. 3 for iterative training of the model.
The class labels and polygonal contours of the tracked objects annotated on the sample images can directly serve as the supervision signals of the instance segmentation network shown in FIG. 2 and FIG. 3.
The supervision signal of the change detection network can be computed from the annotation differences between the two-phase sample images and the capture order of the two-phase sample images.
Optionally, for the first sample image captured earlier and the second sample image captured later among the two-phase sample images, if the same pixel coordinate corresponds to the same annotated value in the first sample image and the second sample image, that pixel coordinate can be marked as an unchanged pixel coordinate.
For a pixel coordinate, if it lies outside the polygonal contour of any tracked object in the first sample image but inside the polygonal contour of a tracked object in the second sample image, the pixel coordinate is marked as having undergone an addition change. Conversely, if it lies inside the polygonal contour of a tracked object in the first sample image but outside in the second sample image, the pixel coordinate is marked as having undergone a disappearance change.
Based on this marking process, the ground-truth change state of each pixel coordinate in the training data can be determined. Computing the ground-truth change states from existing annotations in this way avoids extra annotation workload and improves training efficiency.
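The marking rules above could be realized, for two binary annotation rasters, roughly as follows; the 0/1/2 label encoding is an assumption chosen for this sketch.

```python
import numpy as np

def change_state_ground_truth(anno_t1: np.ndarray, anno_t2: np.ndarray) -> np.ndarray:
    """anno_t1, anno_t2: (H, W) binary rasters of the earlier and later sample
    images, with 1 inside a tracked object's polygonal contour, 0 outside.

    Returns per-pixel change-state labels: 0 = unchanged, 1 = added,
    2 = disappeared.
    """
    labels = np.zeros(anno_t1.shape, dtype=np.uint8)
    labels[(anno_t1 == 0) & (anno_t2 == 1)] = 1  # outside earlier, inside later
    labels[(anno_t1 == 1) & (anno_t2 == 0)] = 2  # inside earlier, outside later
    return labels
```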
After the multiple sets of training data are input to the neural network model, the class labels and polygonal contours of the tracked objects annotated on the sample images, together with the ground-truth change states of the plurality of pixel coordinates, can be used as supervision signals to iteratively train the neural network model.
In each training round, after the training data are input to the neural network model, the predicted classes, predicted contours, and predicted change states of the tracked objects output by the network for the training data can be obtained.
The classification loss of the neural network model can be computed from the predicted classes of the tracked objects corresponding to the training data and the pre-annotated class labels. The segmentation loss of the neural network model can be computed from the predicted contours of the tracked objects corresponding to the training data and the pre-annotated polygonal contours. The change detection loss of the neural network model can be computed from the predicted change states of the tracked objects corresponding to the training data and the ground-truth change states computed in the above manner.
Next, the parameters of the neural network model can be optimized according to the joint loss of the classification loss, the segmentation loss, and the change detection loss. The optimization can be performed iteratively until a set number of iterations is reached or the joint loss converges to a specified range.
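A hedged sketch of combining the three task losses is given below; the equal task weights are an assumption for illustration, as the application does not specify how the joint loss is weighted.

```python
import torch

def joint_loss(cls_loss: torch.Tensor, seg_loss: torch.Tensor,
               change_loss: torch.Tensor,
               weights: tuple = (1.0, 1.0, 1.0)) -> torch.Tensor:
    """Combine the classification, segmentation, and change detection losses."""
    return (weights[0] * cls_loss
            + weights[1] * seg_loss
            + weights[2] * change_loss)

# One optimization step, assuming the three task losses have been computed:
# loss = joint_loss(cls_loss, seg_loss, change_loss)
# loss.backward()
# optimizer.step()
```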
In this training manner, optimizing the neural network model based on the joint loss of the multi-task network greatly improves the performance of the backbone network in the neural network model, ultimately optimizing the object tracking performance of the neural network model.
The object tracking method provided by the embodiments of the present application can be applied in a variety of object tracking scenarios, for example, tracking of moving objects in sports competitions, ground-feature tracking based on remote sensing images, crowd tracking in specific places, and environmental change detection in specific areas.
In the ground-feature tracking scenario based on remote sensing images, the acquired two-phase images of the target environment may include two-phase remote sensing images corresponding to the target environment.
In this scenario, the object tracking method provided by the embodiments of the present application can be implemented as a ground-feature tracking method, as described by way of example below.
FIG. 4 is a schematic flowchart of a ground-feature tracking method provided by an exemplary embodiment of the present application. As shown in FIG. 4, the method includes:
Step 401: Acquire two-phase remote sensing images captured of a target environment.
Step 402: Determine, according to image features of the two-phase remote sensing images, respective feature maps of at least one candidate region in the two-phase remote sensing images.
Step 403: Perform instance segmentation on the two-phase remote sensing images according to the feature maps of the at least one candidate region to obtain at least one ground object contained in the two-phase remote sensing images.
Step 404: Perform change detection on a plurality of pixel coordinates corresponding to the two-phase remote sensing images according to the feature maps of the at least one candidate region to obtain the respective change states of the plurality of pixel coordinates.
Step 405: Determine the respective change states of the at least one ground object according to the pixel coordinates corresponding to each of the at least one ground object and the respective change states of the plurality of pixel coordinates.
Two-phase remote sensing images refer to images of the same target environment acquired by a remote sensing satellite at two different times. Ground-feature tracking refers to tracking the course of change of the same ground object across remote sensing images acquired at different times; the ground object may include buildings, roads, and the like.
The ground-feature tracking method provided by this embodiment can be implemented based on the neural network models illustrated in FIG. 2 and FIG. 3.
In some optional embodiments, the input data of the neural network model is a multi-channel image that includes the two-phase remote sensing images and their difference feature map. Based on the multi-channel image, the neural network model can make full use of the image information contained in the remote sensing images and attend to the high-level semantic features of the changed pixels in the difference feature map, further enriching the image features used for ground-feature tracking and improving the accuracy of subsequent ground-feature tracking results.
Before computing the difference feature map of the two-phase remote sensing images, the histogram of the image of the second phase can be taken as a reference to transform the histogram of the image of the first phase, so as to equalize the illumination of the images of the two phases and reduce the impact of capture conditions on subsequent computation results.
In some optional embodiments, the backbone network illustrated in FIG. 2 and FIG. 3 can perform feature extraction on the input multi-channel image to obtain its feature map. Based on the feature map of the multi-channel image, a region generation operation and a region feature aggregation operation can be performed to obtain the respective feature maps of the at least one candidate region contained in the two-phase remote sensing images.
In some optional embodiments, the segmentation network illustrated in FIG. 2 and FIG. 3 can perform instance segmentation on the two-phase remote sensing images according to the respective feature maps of the at least one candidate region, and output the classes of the ground objects in the two-phase remote sensing images and the pixel coordinates corresponding to the ground objects.
In some optional embodiments, the change detection network illustrated in FIG. 2 and FIG. 3 can output multi-channel change-state detection results. Optionally, the multi-channel change-state detection results may include the classification of the pixel coordinates in the two-phase remote sensing images into three change states: the added state, the unchanged state, and the disappeared state.
Next, comparing the output of the segmentation network with the multi-channel change-state detection results output by the change detection network yields the newly added ground targets, the unchanged ground targets, and the disappeared ground targets in the two-phase remote sensing images.
It should be noted that the steps of the methods provided by the above embodiments may all be executed by the same device, or the methods may be executed by different devices. For example, steps 401 to 404 may be executed by device A; or steps 401 and 402 may be executed by device A and step 403 by device B; and so on.
In addition, some of the flows described in the above embodiments and drawings contain multiple operations that appear in a specific order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein or in parallel. Sequence numbers such as 401 and 402 are merely used to distinguish different operations and do not themselves represent any execution order. These flows may also include more or fewer operations, which may be executed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they neither represent an order nor limit "first" and "second" to being of different types.
FIG. 5 is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present application. The electronic device can be implemented as a server, for example a conventional server, a cloud server, a cloud host, or a virtual center. As shown in FIG. 5, the electronic device may include a memory 501 and a processor 502.
The memory 501 is configured to store a computer program and can be configured to store various other data to support operations on the electronic device. Examples of such data include instructions of any application or method operating on the electronic device, contact data, phone book data, messages, pictures, videos, and the like.
The memory 501 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc.
The processor 502, coupled to the memory 501, is configured to execute the computer program in the memory 501 so as to: acquire two-phase images captured of a target environment; determine, according to image features of the two-phase images, respective feature maps of at least one candidate region in the two-phase images; perform instance segmentation on the two-phase images according to the feature maps of the at least one candidate region to obtain at least one tracked object contained in the two-phase images; perform change detection on a plurality of pixel coordinates corresponding to the two-phase images according to the feature maps of the at least one candidate region to obtain the respective change states of the plurality of pixel coordinates; and determine the respective change states of the at least one tracked object according to the pixel coordinates corresponding to each of the at least one tracked object and the respective change states of the plurality of pixel coordinates.
Further optionally, before determining, according to the image features of the two-phase images, the respective feature maps of the at least one candidate region in the two-phase images, the processor 502 is further configured to: for the first image and the second image of the two-phase images, acquire the respective histograms of the first image and the second image; and transform the histogram of the second image according to the histogram of the first image, so as to equalize the brightness information of the first image and the second image.
Further optionally, when determining, according to the image features of the two-phase images, the respective feature maps of the at least one candidate region in the two-phase images, the processor 502 is specifically configured to: for the first image and the second image of the two-phase images, perform a pixel-value difference operation on pixels with the same pixel coordinates in the first image and the second image to obtain a difference feature map; concatenate the first image, the second image, and the difference feature map to obtain a multi-channel image; and input the multi-channel image to a neural network model, so as to use the backbone network of the neural network model to perform a feature extraction operation and a region generation operation on the multi-channel image to obtain the feature maps of the at least one candidate region.
Further optionally, the neural network model further includes: a multi-task network connected to the backbone network, the multi-task network including an instance segmentation network and a change detection network.
Further optionally, the instance segmentation network includes a classification branch and a segmentation branch; when performing instance segmentation on the two-phase images according to the feature maps of the at least one candidate region to obtain the at least one tracked object contained in the two-phase images, the processor 502 is specifically configured to: based on the classification branch, identify, according to the feature maps of the at least one candidate region, the instance class of the tracked object corresponding to each of the at least one candidate region; and, based on the segmentation branch, compute, according to the feature maps of the at least one candidate region, the pixel coordinates of the tracked object corresponding to each of the at least one candidate region.
Further optionally, when computing, based on the segmentation branch and according to the feature maps of the at least one candidate region, the pixel coordinates of the tracked object corresponding to each of the at least one candidate region, the processor 502 is specifically configured to: for either image of the two-phase images, determine a target candidate region contained in the image; based on the segmentation branch, compute, according to the feature map of the target candidate region, the probability that each pixel in the target candidate region belongs to a tracked object; and determine, according to that probability, the pixel coordinates of the tracked object corresponding to the target candidate region.
Further optionally, when performing change detection on the plurality of pixel coordinates corresponding to the two-phase images according to the feature maps of the at least one candidate region to obtain the respective change states of the plurality of pixel coordinates, the processor 502 is specifically configured to: based on the change detection network, compute, according to the feature maps of the at least one candidate region, the probabilities that the plurality of pixel coordinates belong to each of at least one change state; and output, according to these probabilities, the change state of each of the plurality of pixel coordinates.
Further optionally, the at least one change state includes at least one of an added state, a disappeared state, and an unchanged state.
Further optionally, when determining the respective change states of the at least one tracked object according to the pixel coordinates corresponding to each of the at least one tracked object and the respective change states of the plurality of pixel coordinates, the processor 502 is specifically configured to: for any candidate region of the at least one candidate region, determine, from the pixel coordinates of the tracked objects corresponding to the at least one candidate region, the pixel coordinates of the target tracked object corresponding to the candidate region; determine the change state of the target tracked object according to the pixel coordinates of the target tracked object and the change states of the plurality of pixel coordinates; determine the instance class of the target tracked object from the instance classes of the tracked objects corresponding to the at least one candidate region; and output the change state and instance class of the target tracked object.
Further optionally, the two-phase images include two-phase remote sensing images corresponding to the target environment.
Further optionally, the processor 502 is further configured to: acquire multiple sets of training data, each set including two-phase sample images, where each sample image is annotated with the class labels and polygonal contours of tracked objects; compute, according to the annotation differences between the two-phase sample images and the capture order of the two-phase sample images, the ground-truth change states of the plurality of pixel coordinates corresponding to the two-phase sample images; input the multiple sets of training data to the neural network model; and iteratively train the neural network model using, as supervision signals, the class labels and polygonal contours of the tracked objects annotated on the sample images and the ground-truth change states of the plurality of pixel coordinates.
Further, as shown in FIG. 5, the electronic device also includes other components such as a communication component 503 and a power component 504. FIG. 5 only schematically shows some components, which does not mean that the electronic device includes only the components shown in FIG. 5.
The communication component 503 is configured to facilitate wired or wireless communication between the device where it is located and other devices. The device where the communication component is located can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, or 5G, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component can be implemented based on near field communication (NFC) technology, radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component 504 provides power for the various components of the device where it is located. The power component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device where it is located.
In this embodiment, by performing instance segmentation on the two-phase images, the tracked objects in the two-phase images can be detected in fine detail; by performing change detection on the pixel coordinates corresponding to the two-phase images, pixel-level change-state detection results can be obtained. Based on the segmented tracked objects and the pixel-level change-state detection results, the change states of the tracked objects in the two-phase images can be obtained accurately, improving the accuracy and reliability of the object tracking results.
In addition to the object tracking logic described in the foregoing embodiments, the electronic device shown in FIG. 5 can also execute the following ground-feature tracking logic, in which the processor 502 is configured to: acquire two-phase remote sensing images captured of a target environment; determine, according to image features of the two-phase remote sensing images, respective feature maps of at least one candidate region in the two-phase remote sensing images; perform instance segmentation on the two-phase remote sensing images according to the feature maps of the at least one candidate region to obtain at least one ground object contained in the two-phase remote sensing images; perform change detection on a plurality of pixel coordinates corresponding to the two-phase remote sensing images according to the feature maps of the at least one candidate region to obtain the respective change states of the plurality of pixel coordinates; and determine the respective change states of the at least one ground object according to the pixel coordinates corresponding to each of the at least one ground object and the respective change states of the plurality of pixel coordinates.
Correspondingly, an embodiment of the present application further provides a computer-readable storage medium storing a computer program; when executed, the computer program can implement the steps executable by the electronic device in the above method embodiments.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. Information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device including that element.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (14)

  1. An object tracking method, comprising:
    acquiring two-phase images captured of a target environment;
    determining, according to image features of the two-phase images, respective feature maps of at least one candidate region in the two-phase images;
    performing instance segmentation on the two-phase images according to the feature maps of the at least one candidate region to obtain at least one tracked object contained in the two-phase images; and
    performing change detection on a plurality of pixel coordinates corresponding to the two-phase images according to the feature maps of the at least one candidate region to obtain respective change states of the plurality of pixel coordinates;
    determining respective change states of the at least one tracked object according to the pixel coordinates corresponding to each of the at least one tracked object and the respective change states of the plurality of pixel coordinates.
  2. The method according to claim 1, wherein before determining, according to the image features of the two-phase images, the respective feature maps of the at least one candidate region in the two-phase images, the method further comprises:
    for a first image and a second image of the two-phase images, acquiring respective histograms of the first image and the second image;
    transforming the histogram of the second image according to the histogram of the first image, so as to equalize brightness information of the first image and the second image.
  3. The method according to claim 1, wherein determining, according to the image features of the two-phase images, the respective feature maps of the at least one candidate region in the two-phase images comprises:
    for a first image and a second image of the two-phase images, performing a pixel-value difference operation on pixels with the same pixel coordinates in the first image and the second image to obtain a difference feature map;
    concatenating the first image, the second image, and the difference feature map to obtain a multi-channel image;
    inputting the multi-channel image into a neural network model, so as to perform a feature extraction operation and a region generation operation on the multi-channel image using a backbone network of the neural network model to obtain the feature maps of the at least one candidate region.
  4. The method according to claim 3, wherein the neural network model further comprises: a multi-task network connected to the backbone network, the multi-task network comprising an instance segmentation network and a change detection network.
  5. The method according to claim 4, wherein the instance segmentation network comprises a classification branch and a segmentation branch; and performing instance segmentation on the two-phase images according to the feature maps of the at least one candidate region to obtain the at least one tracked object contained in the two-phase images comprises:
    identifying, based on the classification branch and according to the feature maps of the at least one candidate region, the instance class of the tracked object corresponding to each of the at least one candidate region; and
    computing, based on the segmentation branch and according to the feature maps of the at least one candidate region, the pixel coordinates of the tracked object corresponding to each of the at least one candidate region.
  6. The method according to claim 5, wherein computing, based on the segmentation branch and according to the feature maps of the at least one candidate region, the pixel coordinates of the tracked object corresponding to each of the at least one candidate region comprises:
    for either image of the two-phase images, determining a target candidate region contained in the image;
    computing, based on the segmentation branch and according to the feature map of the target candidate region, the probability that each pixel in the target candidate region belongs to a tracked object;
    determining, according to the probability that each pixel in the target candidate region belongs to a tracked object, the pixel coordinates of the tracked object corresponding to the target candidate region.
  7. The method according to claim 5, wherein performing change detection on the plurality of pixel coordinates corresponding to the two-phase images according to the feature maps of the at least one candidate region to obtain the respective change states of the plurality of pixel coordinates comprises:
    computing, based on the change detection network and according to the feature maps of the at least one candidate region, probabilities that the plurality of pixel coordinates belong to each of at least one change state;
    outputting, according to the probabilities that the plurality of pixel coordinates belong to each of the at least one change state, the change state of each of the plurality of pixel coordinates.
  8. The method according to claim 7, wherein the at least one change state comprises at least one of an added state, a disappeared state, and an unchanged state.
  9. The method according to claim 7, wherein determining the respective change states of the at least one tracked object according to the pixel coordinates corresponding to each of the at least one tracked object and the respective change states of the plurality of pixel coordinates comprises:
    for any candidate region of the at least one candidate region, determining, from the pixel coordinates of the tracked objects corresponding to the at least one candidate region, the pixel coordinates of a target tracked object corresponding to the candidate region;
    determining the change state of the target tracked object according to the pixel coordinates of the target tracked object and the change states of the plurality of pixel coordinates;
    determining the instance class of the target tracked object from the instance classes of the tracked objects corresponding to the at least one candidate region;
    outputting the change state and the instance class of the target tracked object.
  10. The method according to any one of claims 1-9, wherein the two-phase images comprise: two-phase remote sensing images corresponding to the target environment.
  11. The method according to any one of claims 3-9, further comprising:
    acquiring multiple sets of training data, each set comprising two-phase sample images, wherein each sample image is annotated with class labels and polygonal contours of tracked objects;
    computing, according to annotation differences between the two-phase sample images and the capture order of the two-phase sample images, ground-truth change states of the plurality of pixel coordinates corresponding to the two-phase sample images;
    inputting the multiple sets of training data into the neural network model;
    iteratively training the neural network model using, as supervision signals, the class labels and polygonal contours of the tracked objects annotated on the sample images and the ground-truth change states of the plurality of pixel coordinates.
  12. A ground-feature tracking method, comprising:
    acquiring two-phase remote sensing images captured of a target environment;
    determining, according to image features of the two-phase remote sensing images, respective feature maps of at least one candidate region in the two-phase remote sensing images;
    performing instance segmentation on the two-phase remote sensing images according to the feature maps of the at least one candidate region to obtain at least one ground object contained in the two-phase remote sensing images; and
    performing change detection on a plurality of pixel coordinates corresponding to the two-phase remote sensing images according to the feature maps of the at least one candidate region to obtain respective change states of the plurality of pixel coordinates;
    determining respective change states of the at least one ground object according to the pixel coordinates corresponding to each of the at least one ground object and the respective change states of the plurality of pixel coordinates.
  13. An electronic device, comprising: a memory and a processor;
    the memory is configured to store one or more computer instructions;
    the processor is configured to execute the one or more computer instructions so as to perform the steps of the method according to any one of claims 1-12.
  14. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the method according to any one of claims 1-12 can be implemented.
PCT/CN2022/071259 2021-01-18 2022-01-11 Object tracking and ground-feature tracking methods, device, system, and storage medium WO2022152110A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110064800.XA CN114820695A (zh) 2021-01-18 2021-01-18 Object tracking and ground-feature tracking methods, device, system, and storage medium
CN202110064800.X 2021-01-18

Publications (1)

Publication Number Publication Date
WO2022152110A1 true WO2022152110A1 (zh) 2022-07-21

Family

ID=82447948

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071259 WO2022152110A1 (zh) 2021-01-18 2022-01-11 对象追踪、地物追踪方法、设备、系统及存储介质

Country Status (2)

Country Link
CN (1) CN114820695A (zh)
WO (1) WO2022152110A1 (zh)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437091A (zh) * 2016-03-23 2017-12-05 Xidian University SAR image positive/negative change detection method based on multi-layer restricted Boltzmann machines
US10713794B1 * 2017-03-16 2020-07-14 Facebook, Inc. Method and system for using machine-learning for object instance segmentation
CN108287872A (zh) * 2017-12-28 2018-07-17 Baidu Online Network Technology (Beijing) Co., Ltd. Building change detection method, apparatus, server, and storage medium
CN108573276A (zh) * 2018-03-12 2018-09-25 Zhejiang University Change detection method based on high-resolution remote sensing images
CN110163207A (zh) * 2019-05-20 2019-08-23 Fujian Chuanzheng Communications College Mask-RCNN-based ship target positioning method and storage device
CN110969088A (zh) * 2019-11-01 2020-04-07 East China Normal University Remote sensing image change detection method based on saliency detection and a deep Siamese neural network

Also Published As

Publication number Publication date
CN114820695A (zh) 2022-07-29

Similar Documents

Publication Publication Date Title
Song et al. Vision-based vehicle detection and counting system using deep learning in highway scenes
Yang et al. Drivingstereo: A large-scale dataset for stereo matching in autonomous driving scenarios
US8620026B2 (en) Video-based detection of multiple object types under varying poses
WO2019218824A1 (zh) 一种移动轨迹获取方法及其设备、存储介质、终端
Vargas-Muñoz et al. Correcting rural building annotations in OpenStreetMap using convolutional neural networks
US20150063689A1 (en) Multi-cue object detection and analysis
Azevedo et al. Automatic vehicle trajectory extraction by aerial remote sensing
US9104919B2 (en) Multi-cue object association
Wang et al. YOLOv3-MT: A YOLOv3 using multi-target tracking for vehicle visual detection
Zhang et al. Semi-automatic road tracking by template matching and distance transformation in urban areas
US20150379371A1 (en) Object Detection Utilizing Geometric Information Fused With Image Data
Mihail et al. Sky segmentation in the wild: An empirical study
Wan et al. A novel neural network model for traffic sign detection and recognition under extreme conditions
US20170336215A1 (en) Classifying entities in digital maps using discrete non-trace positioning data
Dhatbale et al. Deep learning techniques for vehicle trajectory extraction in mixed traffic
CN113012215A (zh) 一种空间定位的方法、系统及设备
Haggui et al. Centroid human tracking via oriented detection in overhead fisheye sequences
Sarlin et al. Snap: Self-supervised neural maps for visual positioning and semantic understanding
Esfahani et al. DeepDSAIR: Deep 6-DOF camera relocalization using deblurred semantic-aware image representation for large-scale outdoor environments
Azimjonov et al. A vision-based real-time traffic flow monitoring system for road intersections
Bumanis et al. Multi-object Tracking for Urban and Multilane Traffic: Building Blocks for Real-World Application.
WO2022152110A1 (zh) Object tracking and ground-feature tracking methods, device, system, and storage medium
Cai et al. 3D vehicle detection based on LiDAR and camera fusion
Afshany et al. Parallel implementation of a video-based vehicle speed measurement system for municipal roadways
Hara et al. An initial study of automatic curb ramp detection with crowdsourced verification using google street view images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22738989

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22738989

Country of ref document: EP

Kind code of ref document: A1