WO2021238062A1 - Vehicle tracking method, apparatus, and electronic device - Google Patents

Vehicle tracking method, apparatus, and electronic device

Info

Publication number
WO2021238062A1
WO2021238062A1 PCT/CN2020/125446 CN2020125446W WO2021238062A1 WO 2021238062 A1 WO2021238062 A1 WO 2021238062A1 CN 2020125446 W CN2020125446 W CN 2020125446W WO 2021238062 A1 WO2021238062 A1 WO 2021238062A1
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
target image
pixel
image
detection frame
Prior art date
Application number
PCT/CN2020/125446
Other languages
English (en)
French (fr)
Inventor
张伟
谭啸
孙昊
文石磊
章宏武
丁二锐
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to KR1020227025961A (KR20220113829A, ko)
Priority to US17/995,752 (US20230186486A1, en)
Priority to EP20938232.4A (EP4116867A4, en)
Priority to JP2022545432A (JP7429796B2, ja)
Publication of WO2021238062A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/469 Contour-based spatial representations, e.g. vector-coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30241 Trajectory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Definitions

  • This application relates to the field of computer technology, in particular to the field of artificial intelligence computer vision and intelligent transportation technology, and proposes a vehicle tracking method, device and electronic equipment.
  • Structural analysis of road traffic video, identification of vehicles in images, and tracking of vehicles are important technical capabilities for intelligent traffic visual perception.
  • In the related art, a detection model is usually used to detect objects in an image frame and determine the detection frames it contains; features are then extracted from each detection frame to determine the vehicle features, and vehicles are tracked according to the matching degree between the vehicle features in the current image frame and the historical detection results.
  • However, because this tracking method needs two stages to determine the detection frame corresponding to a vehicle, it is time-consuming and has poor real-time performance.
  • A vehicle tracking method, apparatus, electronic device, and storage medium are provided.
  • A vehicle tracking method is provided, which includes: extracting a target image at the current moment from a video stream collected while a vehicle is driving; performing instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image; extracting a pixel point set corresponding to each vehicle from the detection frame corresponding to that vehicle; processing the image features of each pixel in the pixel point set corresponding to each vehicle to determine the features of each vehicle in the target image; and determining the running trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, where the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
  • A vehicle tracking device is provided, which includes: a first extraction module, configured to extract a target image at the current moment from a video stream collected while a vehicle is driving; an instance segmentation module, configured to perform instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image; a second extraction module, configured to extract a pixel point set corresponding to each vehicle from the detection frame corresponding to that vehicle; a first determination module, configured to process the image features of each pixel in the pixel point set corresponding to each vehicle to determine the features of each vehicle in the target image; and a second determination module, configured to determine the running trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, where the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
  • An electronic device is provided, which includes: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the vehicle tracking method described above.
  • A non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to make a computer execute the vehicle tracking method described above.
  • By performing instance segmentation on the target image, the detection frame corresponding to each vehicle in the target image is obtained directly, and the pixel point set corresponding to each vehicle is extracted from the detection frame corresponding to that vehicle. The image features of each pixel in the pixel point set corresponding to each vehicle are then processed to determine the features of each vehicle in the target image, and the running trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images.
  • In this way, the other objects contained in the target image are filtered out directly, and the detection frames corresponding to the vehicles in the target image are obtained in real time for subsequent processing, which improves the efficiency and real-time performance of vehicle tracking.
  • FIG. 1 is a schematic flowchart of a vehicle tracking method provided by an embodiment of the application.
  • FIG. 2 is a schematic diagram of marking each vehicle in a target image.
  • FIG. 3 is a schematic flowchart of another vehicle tracking method provided by an embodiment of the application.
  • FIG. 4 is a schematic flowchart of another vehicle tracking method provided by an embodiment of the application.
  • FIG. 5 is a schematic structural diagram of a vehicle tracking device provided by an embodiment of the application.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • To address the problem that, in the related art, vehicle tracking methods that need two stages to determine the detection frame corresponding to a vehicle are time-consuming and have poor real-time performance, the embodiments of the present application propose a vehicle tracking method.
  • FIG. 1 is a schematic flowchart of a vehicle tracking method provided by an embodiment of the application.
  • As shown in FIG. 1, the vehicle tracking method includes the following steps:
  • Step 101: Extract the target image at the current moment from the video stream collected while the vehicle is driving.
  • The vehicle tracking method of the embodiments of the present application may be executed by the vehicle tracking device of the embodiments of the present application.
  • The vehicle tracking device may be configured in any electronic device to execute the vehicle tracking method.
  • For example, the vehicle tracking device may be configured in a vehicle (such as an autonomous vehicle) to track the vehicles on the road on which that vehicle is traveling, so as to visually perceive the vehicle's surroundings and improve driving safety; alternatively, the vehicle tracking device may be configured in a server of a traffic management system and used to identify traffic violations at monitored intersections, collect traffic flow statistics, and so on.
  • How the video stream is acquired depends on the application scenario of the vehicle tracking method.
  • For example, when the method is applied to autonomous driving or assisted driving, the processor in the vehicle may establish a communication connection with the video capture device in the vehicle to obtain, in real time, the video stream collected by the video capture device.
  • When the method is applied to traffic management, the server of the traffic management system can obtain, in real time, the video stream collected by the monitoring equipment at a traffic intersection.
  • The target image may be the frame most recently captured by the video capture device.
  • For example, the video stream collected by the video capture device can be obtained in real time, and each time a new frame of the video stream is obtained, that frame can be determined as the target image at the current moment.
  • Alternatively, the target image at the current moment can be taken from every second frame of the captured video stream; that is, when the first, third, fifth, seventh, and other odd-numbered frames of the video stream are acquired, each odd-numbered frame is determined as the target image.
  • The vehicle tracking method can also be applied in non-real-time vehicle tracking scenarios, for example, to analyze given video data and determine the driving trajectory of a specific vehicle. In that case, the vehicle tracking device can directly obtain a piece of recorded video data, analyze it, and determine each frame it contains as the target image in turn; alternatively, it can sample frames, determining only some of the image frames in the video data as the target image in turn. For example, the odd-numbered frames of the video data can be determined as the target image one after another.
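  • The two ways of choosing target images described above (every frame, or only the odd-numbered frames) can be illustrated with a small sketch. The function name `iter_target_images` and the `stride` parameter are hypothetical and do not come from the application; frames are represented by placeholder strings.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def iter_target_images(frames: Iterable[Frame], stride: int = 1) -> Iterator[Tuple[int, Frame]]:
    """Yield (frame_number, frame) for frames 1, 1 + stride, 1 + 2 * stride, ...

    stride=1 treats every frame as a target image; stride=2 keeps only the
    odd-numbered frames (1st, 3rd, 5th, ...). Frame numbers start at 1.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    for number, frame in enumerate(frames, start=1):
        if (number - 1) % stride == 0:
            yield number, frame


# Placeholder frames stand in for decoded images (an assumption of this sketch).
frames = [f"frame-{i}" for i in range(1, 8)]
all_targets = [number for number, _ in iter_target_images(frames)]
odd_targets = [number for number, _ in iter_target_images(frames, stride=2)]
print(odd_targets)  # [1, 3, 5, 7]
```

  • Because the function is a generator, the same sketch would work for a live stream as well as for recorded video data.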
  • Step 102: Perform instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image.
  • Any instance segmentation algorithm may be used to perform instance segmentation on the target image, determine each vehicle included in the target image, and generate a detection frame corresponding to each vehicle.
  • Each vehicle in the target image lies entirely within its corresponding detection frame, or most of the vehicle's area lies within it.
  • An appropriate instance segmentation algorithm can be selected according to actual needs or the computing performance of the electronic device, which is not limited in the embodiments of the application.
  • For example, an instance segmentation algorithm based on spatial embedding, a K-means clustering algorithm, etc. can be used.
  • Step 103: Extract a pixel point set corresponding to each vehicle from the detection frame corresponding to that vehicle.
  • The pixel point set corresponding to a vehicle is a set of pixels extracted from the region of the target image inside the detection frame corresponding to the vehicle.
  • After instance segmentation is performed on the target image and the detection frame corresponding to each vehicle is determined, most of the pixels in each detection frame belong to the vehicle, so the pixels in a vehicle's detection frame can accurately describe the vehicle's features. Therefore, in the embodiments of the present application, the pixel point set corresponding to each vehicle can be extracted from the detection frame corresponding to that vehicle to describe the features of each vehicle.
  • When the pixel point set corresponding to a vehicle is extracted, the detection frame corresponding to the vehicle can be divided into multiple sub-regions (for example, N×N regions, where N is a positive integer greater than 1), and a certain number of pixels can be randomly extracted from each sub-region to form the pixel point set corresponding to the vehicle. For example, a preset number (such as 100) of pixels, or a preset ratio (such as 80%) of the pixels, can be randomly extracted from each sub-region.
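  • The sub-region sampling just described can be sketched as follows, working on pixel coordinates only. The function name, the box convention (x_min, y_min, x_max, y_max with exclusive maxima), and the fixed random seed are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max); max is exclusive
Pixel = Tuple[int, int]


def sample_grid_pixels(box: Box, n: int, per_cell: int, rng: random.Random) -> List[Pixel]:
    """Split the detection frame into n x n sub-regions and randomly draw up to
    per_cell pixel coordinates (without replacement) from each sub-region."""
    x0, y0, x1, y1 = box
    sampled: List[Pixel] = []
    for row in range(n):
        for col in range(n):
            # Integer boundaries chosen so that the cells tile the frame exactly.
            cx0 = x0 + (x1 - x0) * col // n
            cx1 = x0 + (x1 - x0) * (col + 1) // n
            cy0 = y0 + (y1 - y0) * row // n
            cy1 = y0 + (y1 - y0) * (row + 1) // n
            cell = [(x, y) for y in range(cy0, cy1) for x in range(cx0, cx1)]
            sampled.extend(rng.sample(cell, min(per_cell, len(cell))))
    return sampled


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
points = sample_grid_pixels((100, 50, 180, 130), n=4, per_cell=25, rng=rng)
print(len(points))  # 16 sub-regions x 25 pixels = 400
```

  • Sampling per sub-region rather than over the whole frame keeps the pixel point set spread evenly across the detection frame.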
  • Alternatively, the detection frame corresponding to the vehicle can be divided into a central area and an edge area, and a certain number of pixels can be randomly extracted from the central area of the detection frame to form the pixel point set corresponding to the vehicle.
  • For example, if the detection frame corresponding to vehicle A is 500×500 pixels, the middle 80% of the detection frame (measured along each side) can be determined as the central area; that is, the 400×400 pixel region in the middle of the detection frame, whose center point coincides with the center point of the detection frame, is the central area, and the rest of the detection frame is the edge area. Then 80% of the pixels are randomly extracted from the 400×400 central area to form the pixel point set corresponding to vehicle A.
  • When the detection frame corresponding to the vehicle is divided into a central area and an edge area, a certain number of pixels can also be randomly extracted from both the central area and the edge area to form the pixel point set corresponding to the vehicle. The pixel point set then includes not only pixels belonging to the vehicle but also pixels belonging to the background near the vehicle, which describes the vehicle's features better and improves the accuracy of vehicle tracking.
  • For example, a circular area centered on the center point of the detection frame with a radius of 400 pixels can be determined as the central area of the detection frame, and the other areas of the detection frame as the edge area; 80% of the pixels are then randomly extracted from the central area and 80% from the edge area to form the pixel point set corresponding to vehicle A.
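  • A minimal sketch of central/edge sampling follows. It assumes a 500×500 detection frame with a rectangular 400×400 central area (the circular variant is not shown), and it draws 80% of the pixels from each area. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int]


def split_center_edge(width: int, height: int, center_percent: int = 80) -> Tuple[List[Pixel], List[Pixel]]:
    """Return (central pixels, edge pixels) of a width x height detection frame.

    The central area is a rectangle spanning center_percent of each side and
    sharing its center point with the detection frame.
    """
    cw, ch = width * center_percent // 100, height * center_percent // 100
    left, top = (width - cw) // 2, (height - ch) // 2
    center: List[Pixel] = []
    edge: List[Pixel] = []
    for y in range(height):
        for x in range(width):
            inside = left <= x < left + cw and top <= y < top + ch
            (center if inside else edge).append((x, y))
    return center, edge


def sample_percent(pixels: List[Pixel], percent: int, rng: random.Random) -> List[Pixel]:
    """Randomly draw the given percentage of pixels without replacement."""
    return rng.sample(pixels, len(pixels) * percent // 100)


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
center, edge = split_center_edge(500, 500)
pixel_set = sample_percent(center, 80, rng) + sample_percent(edge, 80, rng)
print(len(center), len(edge), len(pixel_set))  # 160000 90000 200000
```

  • Dropping the second `sample_percent` call gives the variant that samples from the central area only.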
  • Step 104: Process the image features of each pixel in the pixel point set corresponding to each vehicle to determine the features of each vehicle in the target image.
  • The image features of a pixel may include its pixel value, the values of its neighboring pixels, its positional relationship to other pixels in the pixel point set, pixel value differences, and the like.
  • The image features of the pixels to be used can be selected according to actual needs, which is not limited in the embodiments of the present application.
  • The feature of a vehicle is a feature usable for target recognition, determined by calculating on or learning from the image features of each pixel in the pixel point set corresponding to the vehicle.
  • For example, the feature of the vehicle may be a ReID (re-identification) feature, a HOG (Histogram of Oriented Gradients) feature, a Haar-like feature, etc.
  • After the pixel point set corresponding to each vehicle is extracted, a preset algorithm can be used to calculate on or learn from the image features of each pixel in the set, so that each vehicle is described by the image features of the pixels in its set and the features of each vehicle in the target image are generated.
  • The feature type of the vehicle and the corresponding algorithm for determining it can be selected according to actual needs and the specific application scenario, which is not limited in the embodiments of the present application. For example, to improve real-time performance and computational efficiency, an efficient deep learning algorithm or image feature extraction algorithm can be chosen to determine the features of each vehicle in the target image.
  • Step 105: Determine the running trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images, where the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
  • The matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical image can be determined by means of metric learning. Specifically, for a vehicle in the target image, the distance between its features and the features of each vehicle in the historical image can be determined by metric learning. Because a smaller distance between features means the features are more similar, the reciprocal of the distance between the features of the vehicle and the features of each vehicle in the historical image can be taken as the matching degree between the vehicle and each vehicle in the historical image.
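  • The reciprocal-of-distance matching degree can be sketched as follows. This is only an illustration: plain Euclidean distance stands in for a learned metric, and the small `eps` term, added to avoid division by zero for identical features, is an assumption that does not appear in the application.

```python
import math
from typing import Sequence


def matching_degree(feature_a: Sequence[float], feature_b: Sequence[float], eps: float = 1e-6) -> float:
    """Return the reciprocal of the distance between two feature vectors.

    Euclidean distance is a stand-in for the distance produced by metric
    learning; eps (an assumption) keeps the result finite when the
    distance is zero.
    """
    distance = math.dist(feature_a, feature_b)
    return 1.0 / (distance + eps)


target = [0.0, 0.0]
near = matching_degree(target, [0.3, 0.4])  # distance 0.5
far = matching_degree(target, [3.0, 4.0])   # distance 5.0
print(near > far)  # True
```

  • A smaller distance yields a larger matching degree, so thresholding the matching degree is equivalent to thresholding the distance from the other side.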
  • In one possible implementation, n can be 1; that is, each vehicle in the target image is compared only with the previous frame adjacent to the target image in the video stream to determine the running trajectory of each vehicle in the target image. For a vehicle A in the target image, the vehicle in the historical image whose feature matching degree with vehicle A is greater than a threshold can be determined as vehicle A. The running trajectory of vehicle A in the target image is then determined from the running trajectory of vehicle A in the historical image and the collection position of the target image, the identifier of vehicle A in the historical image is taken as the identifier of vehicle A in the target image, and that identifier is displayed in the target image to mark vehicle A. For example, if the identifier of vehicle A in the historical image is "Car1", then "Car1" can be displayed above vehicle A. FIG. 2 is a schematic diagram of marking each vehicle in the target image.
  • If no vehicle in the historical image has a feature matching degree with vehicle A greater than the threshold, vehicle A can be determined to be a new vehicle appearing in the video stream for the first time. The collection position of the target image is then taken as the starting point of the running trajectory of vehicle A, a new vehicle identifier is assigned to vehicle A, and the identifier is displayed in the target image to mark vehicle A.
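  • The threshold rule for n = 1 (inherit the identifier of a sufficiently similar historical vehicle, otherwise start a new trajectory with a new identifier) can be sketched as below. The data layout, the function names, and the Euclidean stand-in for the learned distance are assumptions. The sketch also keeps every track it has seen rather than only those of the immediately preceding frame, and it matches each detection greedily, so two detections could in principle claim the same historical vehicle.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Sequence[float]
Position = Tuple[float, float]


def matching_degree(a: Feature, b: Feature, eps: float = 1e-6) -> float:
    # Reciprocal of Euclidean distance; eps is an assumption to avoid 1/0.
    return 1.0 / (math.dist(a, b) + eps)


def update_tracks(tracks: Dict[str, dict], detections: List[Tuple[Feature, Position]],
                  threshold: float, next_index: int) -> Tuple[List[str], int]:
    """Label each detection in the target image and extend its trajectory.

    tracks maps an identifier such as "Car1" to its last feature and trajectory.
    Returns the identifiers in detection order and the next unused index.
    """
    previous = [(vid, track["feature"]) for vid, track in tracks.items()]  # snapshot
    labels: List[str] = []
    for feature, position in detections:
        best_id, best_degree = None, threshold
        for vid, old_feature in previous:
            degree = matching_degree(feature, old_feature)
            if degree > best_degree:
                best_id, best_degree = vid, degree
        if best_id is None:  # no match above the threshold: a new vehicle
            best_id = f"Car{next_index}"
            next_index += 1
            tracks[best_id] = {"feature": feature, "trajectory": []}
        tracks[best_id]["feature"] = feature
        tracks[best_id]["trajectory"].append(position)
        labels.append(best_id)
    return labels, next_index


tracks: Dict[str, dict] = {}
labels1, nxt = update_tracks(tracks, [([0.0, 0.0], (10.0, 10.0)), ([5.0, 5.0], (50.0, 10.0))], 1.0, 1)
labels2, nxt = update_tracks(tracks, [([0.1, 0.0], (12.0, 10.0)), ([9.0, 9.0], (80.0, 20.0))], 1.0, nxt)
print(labels1, labels2)  # ['Car1', 'Car2'] ['Car1', 'Car3']
```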
  • In another possible implementation, n can be an integer greater than 1; that is, each vehicle in the target image is compared with multiple frames that precede and are adjacent to the target image in the video stream to determine the running trajectory of each vehicle in the target image, which improves the accuracy of vehicle tracking.
  • For a vehicle A in the target image, the candidate vehicles in the historical images whose feature matching degree with vehicle A is greater than a threshold can be determined first.
  • If only one frame of historical image contains a candidate vehicle, that candidate vehicle can be determined as vehicle A; the running trajectory of vehicle A in the target image is then determined from the running trajectory of vehicle A in the historical image and the collection position of the target image, and the identifier of vehicle A in the historical image is taken as the identifier of vehicle A in the target image. If multiple frames of historical images contain candidate vehicles, it can be determined whether the candidate vehicles in those frames are the same vehicle. If so, the candidate vehicle in the historical image whose collection time is closest to that of the target image can be determined as vehicle A, and the running trajectory of vehicle A in the target image is determined from the running trajectory of vehicle A in that closest historical image and the collection position of the target image.
  • If no historical image contains a candidate vehicle, vehicle A can be determined to be a new vehicle appearing in the video stream for the first time. The collection position of the target image is then taken as the starting point of the running trajectory of vehicle A, a new vehicle identifier is assigned to vehicle A, and the identifier is displayed in the target image to mark vehicle A.
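  • For n greater than 1, one simple way to prefer the historical image closest in time is to scan the historical frames from newest to oldest and stop at the first frame that contains a candidate. The sketch below does exactly that. It does not check that candidates in different frames are the same vehicle, and the names, data layout, and Euclidean stand-in for the learned distance are assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence

Feature = Sequence[float]


def match_in_history(feature: Feature, history: List[Dict[str, Feature]],
                     threshold: float, eps: float = 1e-6) -> Optional[str]:
    """Return the identifier of the best candidate in the most recent historical
    frame that contains one, or None if the vehicle looks new.

    history is ordered from oldest to newest; each entry maps identifier -> feature.
    """
    for frame in reversed(history):  # newest frame first
        best_id, best_degree = None, threshold
        for vid, old_feature in frame.items():
            degree = 1.0 / (math.dist(feature, old_feature) + eps)
            if degree > best_degree:
                best_id, best_degree = vid, degree
        if best_id is not None:
            return best_id
    return None


# "Car1" was visible two frames ago but is missing from the latest frame.
history = [{"Car1": [0.0, 0.0]}, {"Car2": [5.0, 5.0]}]
print(match_in_history([0.2, 0.0], history, threshold=1.0))  # Car1
print(match_in_history([9.0, 9.0], history, threshold=1.0))  # None
```

  • Looking back more than one frame lets a vehicle that was briefly hidden, for example during overtaking, keep its identifier.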
  • If the matching degree between the features of a vehicle in the target image and the features of multiple vehicles in the historical image is greater than the threshold, the historical vehicle with the greatest matching degree can be determined as that vehicle.
  • Alternatively, the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical image can be determined first. The pairs whose matching degree is greater than the threshold define the candidate matching relationships between the vehicles in the target image and the vehicles in the historical image, and the Hungarian algorithm is then used to analyze these matching relationships and determine, for each vehicle in the target image, the vehicle in the historical image that uniquely matches it.
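  • The one-to-one assignment step can be illustrated with the sketch below. To stay dependency-free it does not implement the Hungarian algorithm itself; it enumerates all one-to-one assignments by brute force, which finds the same optimum on small inputs but scales factorially, so a real system would use an actual Hungarian-algorithm implementation. The function name and the matrix layout are assumptions.

```python
from itertools import permutations
from typing import Dict, List


def unique_matches(degrees: List[List[float]], threshold: float) -> Dict[int, int]:
    """Map target-vehicle index -> historical-vehicle index, one-to-one.

    degrees[i][j] is the matching degree between target vehicle i and
    historical vehicle j. Only pairs above the threshold count; the
    assignment with the largest total matching degree is kept.
    """
    n_targets = len(degrees)
    n_history = len(degrees[0]) if degrees else 0

    def gain(i: int, j: int) -> float:
        return degrees[i][j] if degrees[i][j] > threshold else 0.0

    if n_targets <= n_history:
        candidates = ([(i, j) for i, j in enumerate(p)]
                      for p in permutations(range(n_history), n_targets))
    else:
        candidates = ([(i, j) for j, i in enumerate(p)]
                      for p in permutations(range(n_targets), n_history))

    best_pairs, best_total = [], -1.0
    for pairs in candidates:
        total = sum(gain(i, j) for i, j in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    return {i: j for i, j in best_pairs if degrees[i][j] > threshold}


# Both target vehicles prefer historical vehicle 0; the assignment resolves the conflict.
degrees = [[0.9, 0.8],
           [0.85, 0.1]]
print(unique_matches(degrees, threshold=0.5))  # {0: 1, 1: 0}
```

  • In the example, picking the best match for each target vehicle independently would assign historical vehicle 0 twice; the joint assignment gives each target vehicle a distinct match instead.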
  • The value of n can be determined according to actual needs and the specific application scenario, which is not limited in the embodiments of the present application.
  • When the vehicle tracking method is applied to a traffic management scenario, the monitoring equipment at a traffic intersection is fixed, so comparing only with the previous frame adjacent to the target image is enough to determine the running trajectory of each vehicle in the target image; n can therefore be 1.
  • When the vehicle tracking method is applied to scenarios such as autonomous driving and assisted driving, the position of the video capture device changes constantly while the vehicle is driving, and overtaking and being overtaken occur. Comparing only with the previous frame adjacent to the target image can easily lead to inaccurate tracking results, so n can be set to an integer greater than 1 to improve the accuracy of vehicle tracking.
  • According to the technical solution of the embodiments of the present application, instance segmentation is performed on the target image to directly obtain the detection frame corresponding to each vehicle in the target image, and the pixel point set corresponding to each vehicle is extracted from the detection frame corresponding to that vehicle. The image features of each pixel in the pixel point set corresponding to each vehicle are then processed to determine the features of each vehicle in the target image, and the running trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images.
  • In this way, the other objects contained in the target image are filtered out directly, and the detection frames corresponding to the vehicles in the target image are obtained in real time for subsequent processing, which improves the efficiency and real-time performance of vehicle tracking.
  • In a possible implementation of the present application, a point cloud model can be used to process the pixels in the foreground area of the detection frame (that is, the pixels corresponding to the vehicle in the detection frame) and the pixels in the background area separately to determine the features of each vehicle in the target image. Vehicle features can thus be extracted accurately and efficiently, which further improves the real-time performance and accuracy of vehicle tracking.
  • FIG. 3 is a schematic flowchart of another vehicle tracking method provided by an embodiment of the application.
  • As shown in FIG. 3, the vehicle tracking method includes the following steps:
  • Step 201: Extract the target image at the current moment from the video stream collected while the vehicle is driving.
  • Step 202: Perform instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image.
  • Step 203: Extract a first pixel point subset from the mask area in the detection frame corresponding to each vehicle.
  • The mask area in the detection frame refers to the area that the vehicle occupies within its detection frame.
  • the first pixel point sub-set corresponding to the vehicle refers to the set of pixels corresponding to the vehicle extracted from the mask area in the detection frame corresponding to the vehicle.
  • the result of the instance segmentation of the target image may be to output the detection frame corresponding to each vehicle in the target image and the mask area in the detection frame at the same time.
  • An instance segmentation algorithm can be used to identify each vehicle in the target image and to generate the detection frame corresponding to each vehicle together with the mask area corresponding to the vehicle in each detection frame. The area of each detection frame outside the mask area is the unmasked area corresponding to the background; that is, the detection frame corresponding to each vehicle can include a masked area and an unmasked area.
  • The algorithm for instance segmentation of the target image can be any instance segmentation algorithm that can directly identify a specific type of target and output the detection frame and mask area corresponding to that type of target at the same time, which is not limited in the embodiment of the present application.
  • it may be an instance segmentation algorithm based on clustering, such as an instance segmentation algorithm based on space embedding, K-means clustering algorithm, etc.
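  • As a minimal sketch of how one detection frame divides into a masked area and an unmasked area, the following Python fragment assumes that instance segmentation has already produced a binary mask for one vehicle and an inclusive bounding box. The function name split_detection_frame, the (top, left, bottom, right) box layout, and the toy mask are illustrative assumptions and are not part of the application.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, col) in image coordinates


def split_detection_frame(
    mask: List[List[int]], box: Tuple[int, int, int, int]
) -> Tuple[List[Pixel], List[Pixel]]:
    """Split the pixels inside one detection frame into masked and unmasked lists.

    mask: image-sized binary grid, 1 where the pixel belongs to this vehicle.
    box:  (top, left, bottom, right), inclusive (hypothetical layout).
    """
    top, left, bottom, right = box
    masked: List[Pixel] = []
    unmasked: List[Pixel] = []
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            (masked if mask[r][c] else unmasked).append((r, c))
    return masked, unmasked


# Toy 4x5 image in which one vehicle occupies a 2x2 block.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
masked, unmasked = split_detection_frame(mask, (1, 1, 2, 3))
print(len(masked), len(unmasked))  # 4 2
```

  • In this sketch the masked list holds the vehicle (foreground) pixels and the unmasked list holds the background pixels of the same detection frame.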
  • Since the mask area in the detection frame corresponding to the vehicle represents the area that the vehicle occupies in the detection frame, the pixels of the mask area can accurately describe the features of the vehicle itself. Therefore, a certain number of pixels can be randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle, so as to accurately describe the features of each vehicle itself (such as color features, shape features, and brand features).
  • The number of pixels included in the first pixel point subset can be preset, so that a preset number of pixels can be randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle. For example, if the preset number is 500, 500 pixels can be randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle.
  • Alternatively, the ratio of the number of pixels in the first pixel point subset to the number of pixels in the mask area can be preset, so that pixels in the preset ratio are randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle. For example, if the preset ratio is 80% and the mask area in the detection frame corresponding to vehicle A contains 1000 pixels, 800 pixels can be randomly extracted from that mask area to form the first pixel point subset corresponding to vehicle A.
  • the manner of extracting the first pixel point subset from the mask area may include, but is not limited to, the situations listed above.
  • an appropriate extraction method can be selected according to actual needs and specific application scenarios, which is not limited in the embodiment of the application.
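  • The two extraction manners described above (a preset number or a preset ratio) can be sketched as follows. This is only an illustration: the function name sample_pixels, its parameters, and the choice to return every pixel when the region is smaller than the preset number are assumptions that the application does not specify.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int]


def sample_pixels(
    region: Sequence[Pixel],
    count: Optional[int] = None,
    ratio: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> List[Pixel]:
    """Randomly draw pixels from a region by preset count or by preset ratio."""
    if (count is None) == (ratio is None):
        raise ValueError("give exactly one of count or ratio")
    rng = rng or random.Random()
    if ratio is not None:
        count = round(len(region) * ratio)
    # Assumption: a region smaller than the preset count yields all of its pixels.
    count = min(count, len(region))
    return rng.sample(list(region), count)


mask_area = [(r, c) for r in range(25) for c in range(40)]  # 1000 pixels
by_ratio = sample_pixels(mask_area, ratio=0.8, rng=random.Random(0))
by_count = sample_pixels(mask_area, count=500, rng=random.Random(0))
print(len(by_ratio), len(by_count))  # 800 500
```

  • Sampling without replacement is used here so that no pixel appears twice in a subset; whether repeats are allowed is not stated in the application.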
  • Step 204 Extract a second sub-set of pixel points from the unmasked area in the detection frame corresponding to each vehicle.
  • the non-masked area in the detection frame refers to the corresponding area in the detection frame of the background area other than the vehicle in the detection frame.
  • the second pixel point sub-set corresponding to the vehicle refers to the pixel point set that is extracted from the unmasked area in the detection frame corresponding to the vehicle and used to characterize the background of the vehicle.
  • Since the result of instance segmentation of the target image can be to output the detection frame corresponding to each vehicle in the target image and the mask area in the detection frame at the same time, the area outside the mask area in each detection frame can be directly determined as the unmasked area of that detection frame.
  • The description of vehicle features can be assisted by the background area pixels in each detection frame, so that the background features of each vehicle enhance the differences between vehicle features and improve the accuracy of vehicle tracking. Therefore, a certain number of pixels can be randomly extracted from the unmasked area in the detection frame corresponding to each vehicle to form the second pixel point subset corresponding to each vehicle, so as to accurately describe the background features of each vehicle.
  • The number of pixels included in the first pixel point subset may be the same as the number of pixels included in the second pixel point subset, so that the features of the vehicle evenly integrate the features of the vehicle itself and the background features of the vehicle, which makes the feature description of the vehicle more accurate and improves the accuracy of vehicle tracking. Therefore, the number of pixels included in the first pixel point subset and the second pixel point subset can be preset; the preset number of pixels are randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle, and the preset number of pixels are randomly extracted from the unmasked area in the detection frame corresponding to each vehicle to form the second pixel point subset corresponding to each vehicle.
  • For example, if the preset number is 500, then for vehicle A in the target image, 500 pixels can be randomly extracted from the mask area in the detection frame corresponding to vehicle A to form the first pixel point subset corresponding to vehicle A, and 500 pixels can be randomly extracted from the unmasked area in the detection frame corresponding to vehicle A to form the second pixel point subset corresponding to vehicle A.
  • Different weights may also be assigned to the first pixel point subset and the second pixel point subset, so that the extracted pixel point set contains more of the pixels that contribute more to characterizing vehicle features and fewer of the pixels that contribute less. It should be noted that the weights of the first pixel point subset and the second pixel point subset may be calibrated based on a large amount of experimental data, which is not limited in the embodiment of the present application.
  • For example, if the preset number is 500, the weight of the first pixel point subset calibrated from experimental data is 1, and the weight of the second pixel point subset is 0.8, then for vehicle A in the target image, 500 pixels are randomly extracted from the mask area in the detection frame corresponding to vehicle A to form the first pixel point subset corresponding to vehicle A, and 400 pixels are randomly extracted from the unmasked area in the detection frame corresponding to vehicle A to form the second pixel point subset corresponding to vehicle A.
  • The number of pixels included in the second pixel point subset may also be independent of the number of pixels included in the first pixel point subset; that is, the number of pixels included in the second pixel point subset, or the ratio of the number of pixels in the second pixel point subset to the number of pixels in the unmasked area, may be preset separately.
  • The second pixel point subset is then extracted from the unmasked area in the same way as the first pixel point subset is extracted in step 203. For the specific implementation process and principle, refer to the detailed description of step 203, which will not be repeated here.
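  • The weighted example above (a preset number of 500 with weights 1 and 0.8, giving 500 and 400 pixels) can be sketched as follows. The reading that each subset size equals the preset number multiplied by its weight is inferred from that example; the function name weighted_subsets and the region sizes are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]


def weighted_subsets(
    masked: Sequence[Pixel],
    unmasked: Sequence[Pixel],
    preset: int,
    w_first: float,
    w_second: float,
    rng: random.Random,
) -> Tuple[List[Pixel], List[Pixel]]:
    """Draw the first and second pixel point subsets with weighted sizes."""
    # Assumed rule: subset size = preset number x calibrated weight,
    # capped at the number of pixels actually available in the area.
    n_first = min(round(preset * w_first), len(masked))
    n_second = min(round(preset * w_second), len(unmasked))
    first = rng.sample(list(masked), n_first)
    second = rng.sample(list(unmasked), n_second)
    return first, second


# Hypothetical areas for vehicle A: 900 masked pixels and 600 unmasked pixels.
masked_a = [(r, c) for r in range(30) for c in range(30)]
unmasked_a = [(r, c) for r in range(30) for c in range(30, 50)]
first_a, second_a = weighted_subsets(
    masked_a, unmasked_a, preset=500, w_first=1.0, w_second=0.8,
    rng=random.Random(0),
)
print(len(first_a), len(second_a))  # 500 400
```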
  • Step 205 Use the first encoder in the preset point cloud model to encode the image features of each pixel in the first pixel sub-set corresponding to each vehicle to determine the first vector corresponding to each vehicle.
  • the preset point cloud model refers to a pre-trained model that can process the input point set and generate a feature representation corresponding to the point set.
  • the first vector corresponding to the vehicle may refer to the feature representation of the pixels of the vehicle itself, and may be used to characterize the characteristics of the vehicle itself.
  • the image characteristics of the pixel may include the RGB pixel value of the pixel and so on.
  • Since the point cloud model can directly generate the feature representation of point set data from unordered input point set data, using the point cloud model to generate the features of the vehicle enables efficient extraction of vehicle features.
  • The feature type of the vehicle can be determined in advance. For example, the feature type of the vehicle can be the ReID feature. A large number of sample images containing vehicles can be obtained, and instance segmentation is performed on each sample image to generate the detection frame and mask area corresponding to each vehicle in each sample image. A ReID feature extraction algorithm is then used to determine the sample first ReID feature of the mask area corresponding to each vehicle in each sample image, and the sample first pixel point subset is extracted from the mask area in the detection frame. Finally, the initial point cloud model is used to learn the correspondence between the sample first ReID feature of each vehicle and the sample first pixel point subset, so as to generate the first encoder in the preset point cloud model.
  • The first encoder in the preset point cloud model has learned the correlation between the first ReID feature of the vehicle and the first pixel point subset. Therefore, the image features of each pixel in the first pixel point subset corresponding to the vehicle can be input into the first encoder in the preset point cloud model, so that the first encoder encodes the RGB pixel values of each pixel in the first pixel point subset to generate the first vector corresponding to the vehicle, that is, the ReID feature of the vehicle itself.
  • Step 206 Use the second encoder in the preset point cloud model to encode the image features of each pixel in the second pixel subset corresponding to each vehicle to determine the second vector corresponding to each vehicle.
  • the second vector corresponding to the vehicle may refer to the feature representation of the background pixel of the vehicle, and may be used to characterize the background feature of the vehicle.
  • The point cloud model can be trained to generate a second encoder, different from the first encoder, that performs encoding processing on the second pixel point subset, so that the generated second vector can more accurately represent the background features of the vehicle.
  • After instance segmentation is performed on each sample image and the detection frame and mask area corresponding to each vehicle in each sample image are generated, the ReID feature extraction algorithm can be used to determine the sample second ReID feature of the unmasked area in the detection frame corresponding to each vehicle in each sample image, and the sample second pixel point subset is extracted from the unmasked area in the detection frame. The initial point cloud model is then used to learn the correspondence between the sample second ReID feature of each vehicle and the sample second pixel point subset, so as to generate the second encoder in the preset point cloud model.
  • The second encoder in the preset point cloud model has learned the correlation between the second ReID feature of the vehicle and the second pixel point subset. Therefore, the image features of each pixel in the second pixel point subset corresponding to the vehicle can be input into the second encoder in the preset point cloud model, so that the second encoder encodes the RGB pixel values of each pixel in the second pixel point subset to generate the second vector corresponding to the vehicle, that is, the ReID feature of the background area of the vehicle.
  • Step 207 Use the decoder in the preset point cloud model to decode the first vector and the second vector corresponding to each vehicle to determine the characteristics of each vehicle.
  • After the vector representation of the vehicle's own features and the vector representation of the background features of the vehicle are respectively determined, the decoder in the preset point cloud model can be used to fuse the first vector and the second vector corresponding to each vehicle to generate the features of each vehicle.
  • The decoder in the preset point cloud model can be used to perform maximum pooling processing on the first vector and the second vector corresponding to each vehicle, so as to fuse the first vector and the second vector of each vehicle and generate the features of each vehicle.
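  • Steps 205 to 207 can be illustrated with a toy sketch. It assumes, in the style of common point set networks, that each encoder applies one shared linear map with ReLU to every pixel's RGB value and then takes the maximum over pixels, and that the decoder fuses the two vectors by element-wise maximum pooling. The weight matrices W_FIRST and W_SECOND are made-up, untrained numbers, and the function names are hypothetical; the actual trained model in the application is not specified at this level of detail.

```python
from typing import List, Sequence

Vector = List[float]


def encode(points: Sequence[Sequence[float]],
           weights: Sequence[Sequence[float]]) -> Vector:
    """Shared per-pixel linear map + ReLU, then max over pixels.

    Taking the max over pixels makes the result independent of pixel order,
    which is why an unordered pixel point subset can be fed in directly.
    """
    per_point = [
        [max(0.0, sum(w * x for w, x in zip(row, p))) for row in weights]
        for p in points
    ]
    return [max(channel) for channel in zip(*per_point)]


def decode(first: Vector, second: Vector) -> Vector:
    """Fuse the foreground and background vectors by element-wise max pooling."""
    return [max(a, b) for a, b in zip(first, second)]


# Hypothetical, untrained weights: 2 output channels over (R, G, B) in [0, 1].
W_FIRST = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_SECOND = [[0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]

foreground = [(0.9, 0.1, 0.1), (0.8, 0.2, 0.1)]  # first pixel point subset
background = [(0.2, 0.2, 0.7), (0.4, 0.4, 0.3)]  # second pixel point subset

first_vec = encode(foreground, W_FIRST)     # first vector (vehicle itself)
second_vec = encode(background, W_SECOND)   # second vector (background)
feature = decode(first_vec, second_vec)     # fused vehicle feature
print(feature)
```

  • Reversing the order of the pixels in either subset leaves the output of this sketch unchanged, which mirrors the property of the point cloud model relied on above.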
  • Step 208 Determine the running trajectory of each vehicle in the target image based on the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical image, where the historical image is the first n frames of images adjacent to the target image in the video stream, and n is a positive integer.
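  • One way to keep the features of the first n frames available for matching is a fixed-length buffer, sketched below. The class name FeatureHistory, the integer vehicle identifiers, and the rule that the newest stored feature of a vehicle wins are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Dict, List

FrameFeatures = Dict[int, List[float]]  # hypothetical vehicle id -> feature


class FeatureHistory:
    """Keeps the vehicle features of the most recent n frames."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be a positive integer")
        self._frames: Deque[FrameFeatures] = deque(maxlen=n)

    def push(self, frame: FrameFeatures) -> None:
        # When n frames are already stored, the oldest one is dropped.
        self._frames.append(dict(frame))

    def candidates(self) -> FrameFeatures:
        """Latest stored feature of every vehicle seen in the last n frames."""
        latest: FrameFeatures = {}
        for frame in self._frames:  # iterates from oldest to newest
            latest.update(frame)
        return latest


history = FeatureHistory(n=2)
history.push({1: [0.1], 2: [0.9]})
history.push({1: [0.2]})             # vehicle 2 is hidden in this frame
print(sorted(history.candidates()))  # [1, 2]
history.push({1: [0.3]})
print(sorted(history.candidates()))  # [1]
```

  • With n greater than 1, a vehicle that is briefly hidden (for example while being overtaken) remains a matching candidate for a few frames, which is the situation the choice of n is meant to address.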
  • The detection frame and mask area corresponding to each vehicle in the target image are directly obtained through instance segmentation; the first pixel point subset is extracted from the mask area in the detection frame corresponding to each vehicle to characterize the foreground features of the vehicle, and the second pixel point subset is extracted from the unmasked area to characterize the background features of the vehicle. The preset point cloud model then generates the features of the vehicle from the extracted pixel point sets, and the running trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical image. By using the point cloud model to fuse the foreground and background features of the vehicle, accurate and efficient extraction of vehicle features is achieved, which further improves the real-time performance and accuracy of vehicle tracking.
  • a clustering algorithm can be used to implement instance segmentation of the target image, so as to directly generate a detection frame corresponding to the vehicle, and improve the real-time performance of vehicle tracking.
  • FIG. 4 is a schematic flowchart of another vehicle tracking method provided by an embodiment of the application.
  • the vehicle tracking method includes the following steps:
  • Step 301 Extract the target image at the current moment from the video stream collected during the driving of the vehicle.
  • Step 302 Perform clustering processing on the pixels in the target image based on the characteristics of each pixel in the target image, so as to determine a detection frame corresponding to each vehicle in the target image according to the clustering result.
  • The characteristics of a pixel may include the pixel value of the pixel, the pixel values of its neighborhood pixels, and the like.
  • the characteristics of the pixels to be used can be selected according to actual needs, which is not limited in the embodiment of the present application.
  • A clustering algorithm can be used to cluster the characteristics of each pixel in the target image, so as to classify each pixel in the target image and determine whether each pixel corresponds to a vehicle and whether pixels correspond to the same vehicle. Then, according to the pixels corresponding to each vehicle, a detection frame corresponding to each vehicle is generated; that is, each detection frame may include all pixels corresponding to the same vehicle.
  • An instance segmentation algorithm based on spatial embedding can be used to analyze the characteristics of each pixel in the target image and cluster the pixels in the target image, and the detection frame corresponding to each vehicle is then generated directly from the pixel clustering result. Instance segmentation is thus completed in one step, with good real-time performance.
  • the instance segmentation algorithm based on spatial embedding can learn different clustering radii for different types of instances, and the accuracy of instance segmentation is high.
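  • Once clustering has assigned every pixel either to the background or to one vehicle, a detection frame enclosing all pixels of the same vehicle can be derived as sketched below. The clustering itself is not shown; the label grid, the convention that 0 means "not a vehicle", and the function name frames_from_labels are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def frames_from_labels(labels: List[List[int]]) -> Dict[int, Box]:
    """Build one detection frame per vehicle from a per-pixel cluster label grid."""
    boxes: Dict[int, Box] = {}
    for r, row in enumerate(labels):
        for c, instance in enumerate(row):
            if instance == 0:  # assumed convention: 0 = not a vehicle pixel
                continue
            if instance not in boxes:
                boxes[instance] = (r, c, r, c)
            else:
                top, left, bottom, right = boxes[instance]
                boxes[instance] = (
                    min(top, r), min(left, c), max(bottom, r), max(right, c)
                )
    return boxes


# Hypothetical clustering result for a 3x5 image containing two vehicles.
labels = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 2],
    [0, 0, 0, 2, 2],
]
print(frames_from_labels(labels))  # {1: (0, 1, 1, 2), 2: (1, 3, 2, 4)}
```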
  • Step 303 Extract a set of pixels corresponding to each vehicle from the detection frame corresponding to each vehicle.
  • Step 304 Process the image characteristics of each pixel in the pixel point set corresponding to each vehicle to determine the characteristics of each vehicle in the target image.
  • Step 305 If the matching degree between the feature of the first vehicle in the target image and the feature of the second vehicle in the historical image is greater than the threshold, update the running trajectory of the second vehicle according to the acquisition location and acquisition time of the target image.
  • the first vehicle refers to any vehicle in the target image;
  • the second vehicle refers to a vehicle that exists both in the historical image and in the target image.
  • the degree of matching between the feature of each vehicle in the target image and the feature of each vehicle in the historical image can be determined by means of metric learning.
  • For a vehicle in the target image, the distance between the features of that vehicle and the features of each vehicle in the historical image can be determined by means of metric learning. The smaller the distance between features, the more similar the features are. Therefore, the reciprocal of the distance between the features of the vehicle and the features of each vehicle in the historical image can be determined as the matching degree between the vehicle and each vehicle in the historical image.
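  • The reciprocal-of-distance matching degree can be sketched as follows. Plain Euclidean distance stands in for the learned metric here, and the small constant eps that avoids division by zero for identical features is an added assumption; neither is prescribed by the application.

```python
import math
from typing import Sequence


def matching_degree(a: Sequence[float], b: Sequence[float],
                    eps: float = 1e-12) -> float:
    """Reciprocal of the feature distance: a smaller distance gives a higher match.

    Euclidean distance is a stand-in for the learned metric; eps is an
    assumed guard against division by zero.
    """
    return 1.0 / (math.dist(a, b) + eps)


target = [0.0, 0.0]
near, far = [0.3, 0.4], [3.0, 4.0]  # distances 0.5 and 5.0 from target
print(matching_degree(target, near) > matching_degree(target, far))  # True
```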
  • A vehicle in the historical image whose matching degree with the features of the first vehicle is greater than the threshold can be determined as the second vehicle. Then, according to the running trajectory of the second vehicle in the historical image and the collection position of the target image, the collection position of the target image is taken as a new point of the running trajectory of the second vehicle and added to that trajectory, so as to update the running trajectory of the second vehicle.
  • The running track of the vehicle may include not only the position information of the vehicle, but also the time at which the vehicle reached each point in the running track. Therefore, in the embodiment of the present application, when the acquisition position of the target image is added to the second vehicle's trajectory as a newly added point, the acquisition time of the target image can also be added to the running track as the time information of the newly added point, so as to improve the accuracy and richness of vehicle tracking information.
  • When the acquisition position of the target image is added to the second vehicle's trajectory as a new point, the new point can be highlighted and connected to the point that was added to the running track at the previous adjacent moment, and the time information of the new point (that is, the collection time of the target image) can be displayed near the new point.
  • If the historical image contains no vehicle whose matching degree with the features of the first vehicle is greater than the threshold, it can be determined that the first vehicle is a new vehicle that appears for the first time in the video stream, so that the collection position of the target image can be determined as the starting point of the running trajectory of the first vehicle, and the collection time of the target image is added to the running trajectory of the first vehicle as the time information of the starting point.
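  • The trajectory update of step 305, including the case of a newly appearing vehicle, can be sketched as follows. The function name update_tracks, the data layout, the choice of the best-matching historical vehicle when several exceed the threshold, the Euclidean distance, and the scheme for assigning a new vehicle identifier are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One trajectory point: ((x, y) acquisition position, acquisition time).
Point = Tuple[Tuple[float, float], float]


def update_tracks(
    tracks: Dict[int, List[Point]],
    history: Dict[int, Sequence[float]],
    feature: Sequence[float],
    position: Tuple[float, float],
    timestamp: float,
    threshold: float,
) -> int:
    """Append the target image's position and time to the matched track.

    Returns the id of the updated track; a new track is started when no
    historical vehicle matches above the threshold.
    """
    best_id, best_score = None, 0.0
    for vehicle_id, old_feature in history.items():
        score = 1.0 / (math.dist(feature, old_feature) + 1e-12)
        if score > best_score:
            best_id, best_score = vehicle_id, score
    if best_id is None or best_score <= threshold:
        # New vehicle: hypothetical id scheme, next unused integer.
        best_id = max([*tracks, *history], default=0) + 1
    tracks.setdefault(best_id, []).append((position, timestamp))
    history[best_id] = list(feature)
    return best_id


tracks = {1: [((0.0, 0.0), 0.0)]}
history = {1: [0.0, 0.0]}
same = update_tracks(tracks, history, [0.1, 0.0], (1.0, 0.0), 0.04, threshold=5.0)
new = update_tracks(tracks, history, [3.0, 4.0], (1.0, 0.0), 0.04, threshold=5.0)
print(same, new, len(tracks[1]))  # 1 2 2
```

  • In this sketch each trajectory point stores the acquisition time alongside the acquisition position, so the track carries both position and time information.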
  • The pixels in the target image are clustered to directly obtain the detection frame corresponding to each vehicle in the target image, and the pixel point set corresponding to each vehicle is extracted from the detection frame corresponding to each vehicle. Then, the image features of each pixel in the pixel point set corresponding to each vehicle are processed to determine the features of each vehicle in the target image. When the matching degree between the features of the first vehicle in the target image and the features of the second vehicle in the historical image is greater than the threshold, the running track of the second vehicle is updated according to the acquisition location and acquisition time of the target image.
  • In this way, instance segmentation of the target image is realized through the clustering algorithm, other objects contained in the target image are directly filtered out, the detection frame corresponding to each vehicle in the target image is obtained in real time, and time information is integrated into the vehicle's trajectory, which further improves not only the real-time performance of vehicle tracking but also the accuracy and richness of vehicle tracking information.
  • this application also proposes a vehicle tracking device.
  • Fig. 5 is a schematic structural diagram of a vehicle tracking device provided by an embodiment of the application.
  • the vehicle tracking device 40 includes:
  • the first extraction module 41 is configured to extract the target image at the current moment from the video stream collected during the driving of the vehicle;
  • the instance segmentation module 42 is used to perform instance segmentation on the target image to obtain the detection frame corresponding to each vehicle in the target image;
  • the second extraction module 43 is configured to extract a set of pixels corresponding to each vehicle from the detection frame corresponding to each vehicle;
  • the first determining module 44 is configured to process the image characteristics of each pixel in the pixel point set corresponding to each vehicle to determine the characteristics of each vehicle in the target image;
  • the second determination module 45 is used to determine the running trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical image, where the historical image is the first n frames of images adjacent to the target image in the video stream, and n is a positive integer.
  • the vehicle tracking device provided in the embodiments of the present application can be configured in any electronic device to execute the aforementioned vehicle tracking method.
  • The detection frame corresponding to each vehicle in the target image is directly obtained through instance segmentation, and the pixel point set corresponding to each vehicle is extracted from the detection frame corresponding to each vehicle. Then, the image features of each pixel in the pixel point set corresponding to each vehicle are processed to determine the features of each vehicle in the target image, and the running trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical image.
  • In this way, other objects contained in the target image are directly filtered out, and the detection frame corresponding to each vehicle in the target image is obtained in real time for subsequent processing, thereby improving the efficiency and real-time performance of vehicle tracking.
  • the detection frame corresponding to each vehicle includes a masked area and a non-masked area
  • the second extraction module 43 includes:
  • the first extraction unit is configured to extract the first sub-set of pixel points from the mask area in the detection frame corresponding to each vehicle;
  • the second extraction unit is used to extract a second sub-set of pixel points from the non-masked area in the detection frame corresponding to each vehicle.
  • the above-mentioned first determining module 44 includes:
  • the first determining unit is configured to use the first encoder in the preset point cloud model to encode the image features of each pixel in the first pixel point subset corresponding to each vehicle, so as to determine the first vector corresponding to each vehicle;
  • the second determining unit is configured to use the second encoder in the preset point cloud model to encode the image features of each pixel in the second pixel point subset corresponding to each vehicle, so as to determine the second vector corresponding to each vehicle;
  • the third determining unit is configured to use the decoder in the preset point cloud model to decode the first vector and the second vector corresponding to each vehicle to determine the characteristics of each vehicle.
  • the number of pixels included in the first pixel point subset is the same as the number of pixels included in the second pixel point subset.
  • the above-mentioned instance segmentation module 42 includes:
  • the clustering processing unit is configured to perform clustering processing on the pixels in the target image based on the characteristics of each pixel in the target image, so as to determine the detection frame corresponding to each vehicle in the target image according to the clustering result.
  • the above-mentioned second determination module 45 includes:
  • the update unit is used to update the running track of the second vehicle according to the acquisition position and acquisition time of the target image when the matching degree between the features of the first vehicle in the target image and the features of the second vehicle in the historical image is greater than the threshold.
  • The detection frame and mask area corresponding to each vehicle in the target image are directly obtained through instance segmentation; the first pixel point subset is extracted from the mask area in the detection frame corresponding to each vehicle to characterize the foreground features of the vehicle, and the second pixel point subset is extracted from the unmasked area to characterize the background features of the vehicle. The preset point cloud model then generates the features of the vehicle from the extracted pixel point sets, and the running trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical image. By using the point cloud model to fuse the foreground and background features of the vehicle, accurate and efficient extraction of vehicle features is achieved, which further improves the real-time performance and accuracy of vehicle tracking.
  • the present application also provides an electronic device and a readable storage medium.
  • FIG. 6 is a block diagram of an electronic device for the vehicle tracking method according to an embodiment of the present application.
  • Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementation of the application described and/or claimed herein.
  • the electronic device includes one or more processors 501, a memory 502, and interfaces for connecting various components, including a high-speed interface and a low-speed interface.
  • the various components are connected to each other using different buses, and can be installed on a common motherboard or installed in other ways as needed.
  • the processor may process instructions executed in the electronic device, including instructions stored in or on the memory to display graphical information of the GUI on an external input/output device (such as a display device coupled to an interface).
  • In other embodiments, if necessary, multiple processors and/or multiple buses can be used together with multiple memories.
  • multiple electronic devices can be connected, and each electronic device provides part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system).
  • In FIG. 6, one processor 501 is taken as an example.
  • the memory 502 is a non-transitory computer-readable storage medium provided by this application.
  • the memory stores instructions executable by at least one processor, so that the at least one processor executes the vehicle tracking method provided in this application.
  • the non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to make a computer execute the vehicle tracking method provided by the present application.
  • the memory 502 as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the vehicle tracking method in the embodiments of the present application (for example, attached
  • the processor 501 executes various functional applications and data processing of the server by running the non-transient software programs, instructions, and modules stored in the memory 502, that is, implements the vehicle tracking method in the foregoing method embodiment.
  • The memory 502 may include a storage program area and a storage data area. The storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the electronic device of the vehicle tracking method, and the like.
  • the memory 502 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory 502 may optionally include memories remotely provided with respect to the processor 501, and these remote memories may be connected to the electronic device of the vehicle tracking method via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the electronic device of the vehicle tracking method may further include: an input device 503 and an output device 504.
  • the processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other ways. In FIG. 6, the connection by a bus is taken as an example.
  • the input device 503 can receive input digital or character information and generate key signal input related to the user settings and function control of the electronic device for the vehicle tracking method; examples include a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, trackball, joystick, and other input devices.
  • the output device 504 may include a display device, an auxiliary lighting device (for example, LED), a tactile feedback device (for example, a vibration motor), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.
  • Various implementations of the systems and techniques described herein can be implemented in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a dedicated or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals.
  • machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described here can be implemented on a computer that has: a display device for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer.
  • Other types of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and technologies described herein can be implemented in a computing system that includes back-end components (for example, as a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, a user computer with a graphical user interface or a web browser through which the user can interact with an implementation of the systems and technologies described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system can be connected to each other through any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • the computer system can include clients and servers.
  • the client and server are generally far away from each other and usually interact through a communication network.
  • the relationship between the client and the server is generated through computer programs that run on the corresponding computers and have a client-server relationship with each other.
  • instance segmentation is performed on the target image at the current moment in the video stream to directly obtain the detection box corresponding to each vehicle in the target image, and the pixel set corresponding to each vehicle is extracted from that vehicle's detection box. The image features of each pixel in each vehicle's pixel set are then processed to determine the features of each vehicle in the target image, and the trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images.
  • the other objects contained in the target image are directly filtered out, and the detection box corresponding to each vehicle in the target image is obtained in real time for subsequent processing, thereby improving the efficiency and real-time performance of vehicle tracking.

Abstract

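A vehicle tracking method, an apparatus, and an electronic device, relating to the technical fields of artificial intelligence, computer vision, and intelligent transportation. The method includes: extracting a target image at the current moment from a video stream captured while a vehicle is driving (101); performing instance segmentation on the target image to obtain a detection box corresponding to each vehicle in the target image (102); extracting a pixel set corresponding to each vehicle from the detection box corresponding to that vehicle (103); processing the image features of the pixels in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image (104); and determining the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, where the historical images are the n frames immediately preceding the target image in the video stream and n is a positive integer (105). This vehicle tracking method thereby improves tracking efficiency and offers good real-time performance.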

Description

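Vehicle tracking method and apparatus, and electronic device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202010478496.9, entitled "Vehicle tracking method and apparatus, and electronic device", filed by 北京百度网讯科技有限公司 on May 29, 2020.
Technical field
This application relates to the field of computer technology, in particular to artificial intelligence, computer vision, and intelligent transportation, and proposes a vehicle tracking method, an apparatus, and an electronic device.
Background
Performing structured analysis of road traffic video, identifying the vehicles in the images, and tracking them is an important capability for visual perception in intelligent transportation.
In the related art, a detection model is typically used to perform object detection on an image frame and determine the detection boxes it contains; features are then extracted from the detection boxes to determine vehicle features, and vehicles are tracked according to the matching degree between the vehicle features in the current image frame and historical detection results. However, because this tracking approach needs two stages to determine the detection box corresponding to each vehicle, it is time-consuming and has poor real-time performance.
Summary
A vehicle tracking method, an apparatus, an electronic device, and a storage medium are provided.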
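According to a first aspect, a vehicle tracking method is provided, including: extracting a target image at the current moment from a video stream captured while a vehicle is driving; performing instance segmentation on the target image to obtain a detection box corresponding to each vehicle in the target image; extracting a pixel set corresponding to each vehicle from the detection box corresponding to that vehicle; processing the image features of the pixels in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image; and determining the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, where the historical images are the n frames immediately preceding the target image in the video stream and n is a positive integer.
According to a second aspect, a vehicle tracking apparatus is provided, including: a first extraction module, configured to extract a target image at the current moment from a video stream captured while a vehicle is driving; an instance segmentation module, configured to perform instance segmentation on the target image to obtain a detection box corresponding to each vehicle in the target image; a second extraction module, configured to extract a pixel set corresponding to each vehicle from the detection box corresponding to that vehicle; a first determination module, configured to process the image features of the pixels in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image; and a second determination module, configured to determine the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, where the historical images are the n frames immediately preceding the target image in the video stream and n is a positive integer.
According to a third aspect, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the vehicle tracking method described above.
According to a fourth aspect, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions are used to cause a computer to execute the vehicle tracking method described above.
According to the technical solution of this application, instance segmentation is performed on the target image at the current moment in the video stream to directly obtain the detection box corresponding to each vehicle in the target image, and the pixel set corresponding to each vehicle is extracted from that vehicle's detection box. The image features of the pixels in each vehicle's pixel set are then processed to determine the features of each vehicle in the target image, and the trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images. Because instance segmentation of the target image directly filters out the other objects in the target image and yields the vehicles' detection boxes in real time for subsequent processing, vehicle tracking becomes more efficient and has good real-time performance.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of this application, nor to limit the scope of this application. Other features of this application will become easy to understand from the following description.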
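Brief description of the drawings
The drawings are provided for a better understanding of this solution and do not limit this application. In the drawings:
FIG. 1 is a schematic flowchart of a vehicle tracking method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of marking each vehicle in the target image;
FIG. 3 is a schematic flowchart of another vehicle tracking method provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of yet another vehicle tracking method provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of a vehicle tracking apparatus provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed description
Exemplary embodiments of this application are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of this application. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
The embodiments of this application propose a vehicle tracking method to address the problem in the related art that tracking methods which need two stages to determine each vehicle's detection box are time-consuming and have poor real-time performance.
The vehicle tracking method, apparatus, electronic device, and storage medium provided by this application are described in detail below with reference to the drawings.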
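FIG. 1 is a schematic flowchart of a vehicle tracking method provided by an embodiment of this application.
As shown in FIG. 1, the vehicle tracking method includes the following steps:
Step 101: extract a target image at the current moment from a video stream captured while a vehicle is driving.
It should be noted that, in actual use, the vehicle tracking method of the embodiments of this application may be executed by the vehicle tracking apparatus of the embodiments of this application. The apparatus may be configured in any electronic device to execute the method.
For example, the vehicle tracking apparatus may be configured in a vehicle (such as an autonomous vehicle) to track the vehicles on the road the vehicle is traveling on, providing visual perception of the vehicle's surroundings and improving driving safety. Alternatively, the apparatus may be configured in a server of a traffic management system to perform violation recognition, traffic flow statistics, and the like for vehicles at monitored intersections.
It should be noted that how the video stream is obtained depends on the application scenario of the vehicle tracking method. For example, when the method is applied to autonomous driving, assisted driving, and similar fields, the processor in the vehicle may establish a communication connection with the video capture device in the vehicle and obtain the video stream it captures in real time. As another example, when the method is applied in a traffic management scenario for purposes such as violation recognition and vehicle counting, the server of the traffic management system may obtain, in real time, the video stream captured by the monitoring device at a traffic intersection.
The target image may be the most recently captured frame while the video capture device is capturing video.
As one possible implementation, the video stream captured by the video capture device may be obtained in real time, and each time a new frame of the video stream is obtained, that new frame may be determined as the target image at the current moment.
As another possible implementation, the target image at the current moment may be extracted from the captured video stream by frame skipping, which reduces the amount of data processed for vehicle tracking and further improves real-time performance. For example, a target image may be extracted once every two frames: when the 1st, 3rd, 5th, 7th, and other odd-numbered frames of the video stream are obtained, each odd-numbered frame is determined as a target image.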
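The frame-skipping variant of step 101 can be sketched as a small generator. This is only an illustrative sketch: the function name `sample_target_frames` and the `step` parameter are assumptions introduced here, and the frames are stand-in strings rather than real images.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_target_frames(stream: Iterable[Frame], step: int = 2) -> Iterator[Tuple[int, Frame]]:
    """Yield (1-based frame number, frame) for every `step`-th frame, starting at frame 1."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    for number, frame in enumerate(stream, start=1):
        # With step=2 this keeps frames 1, 3, 5, 7, ... (the odd-numbered frames).
        if (number - 1) % step == 0:
            yield number, frame


# Stand-in frames; a real stream would yield image arrays from a capture device.
frames = [f"frame-{i}" for i in range(1, 9)]
picked = [number for number, _ in sample_target_frames(frames, step=2)]
print(picked)  # [1, 3, 5, 7]
```

Setting `step=1` reproduces the first implementation, in which every new frame becomes a target image.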
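It should be noted that the vehicle tracking method of the embodiments of this application can also be applied to non-real-time vehicle tracking, for example, analyzing given video data to determine the trajectory of a specific vehicle. The vehicle tracking apparatus may therefore directly obtain a piece of video data that has already been recorded and analyze it, determining each frame of the video data as a target image in turn. Alternatively, some of the frames in the video data may be determined as target images in turn by frame skipping; for example, the odd-numbered frames of the video data may be determined as target images in turn.
Step 102: perform instance segmentation on the target image to obtain a detection box corresponding to each vehicle in the target image.
In the embodiments of this application, any instance segmentation algorithm may be used to perform instance segmentation on the target image, so as to determine the vehicles contained in the target image and generate a detection box for each vehicle. Each vehicle in the target image lies entirely within its detection box, or the vast majority of the vehicle's area lies within it.
It should be noted that, in actual use, a suitable instance segmentation algorithm may be selected according to actual needs or the computing performance of the electronic device; the embodiments of this application do not limit this. For example, an instance segmentation algorithm based on spatial embedding, the K-means clustering algorithm, or the like may be used.
Step 103: extract a pixel set corresponding to each vehicle from the detection box corresponding to that vehicle.
The pixel set corresponding to a vehicle is the set of pixels extracted from the region of the target image inside that vehicle's detection box.
In the embodiments of this application, after instance segmentation has determined the detection box corresponding to each vehicle in the target image, most of the pixels in each detection box belong to the vehicle, so the pixels in a vehicle's detection box can accurately describe the vehicle's features. A pixel set can therefore be extracted from each vehicle's detection box to describe the features of that vehicle.
As one possible implementation, when extracting the pixel set corresponding to a vehicle, the vehicle's detection box may be divided evenly into multiple sub-regions (for example, N×N regions, where N is a positive integer greater than 1), and a certain number of pixels may be randomly extracted from each sub-region to form the pixel set corresponding to the vehicle. For example, a preset number (such as 100) of pixels or a preset proportion (such as 80%) of pixels may be randomly extracted from each sub-region of the detection box; the embodiments of this application do not limit this.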
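The grid-based sampling described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `(x0, y0, width, height)` box layout, the function name `sample_pixels_by_grid`, and the small 40×40 example box are all hypothetical, and only pixel coordinates are sampled (a real system would also look up each pixel's image features).

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, width, height): an assumed layout
Pixel = Tuple[int, int]          # (x, y) in image coordinates


def sample_pixels_by_grid(box: Box, n: int, per_cell: int, rng: random.Random) -> List[Pixel]:
    """Split the box into n x n cells and draw up to `per_cell` random pixels from each."""
    x0, y0, width, height = box
    sampled: List[Pixel] = []
    for row in range(n):
        for col in range(n):
            # Integer cell bounds; together the cells tile the whole box.
            left = x0 + col * width // n
            right = x0 + (col + 1) * width // n
            top = y0 + row * height // n
            bottom = y0 + (row + 1) * height // n
            cell = [(x, y) for y in range(top, bottom) for x in range(left, right)]
            # Sample without replacement; small cells contribute all their pixels.
            sampled.extend(rng.sample(cell, min(per_cell, len(cell))))
    return sampled


pixels = sample_pixels_by_grid((10, 20, 40, 40), n=2, per_cell=100, rng=random.Random(0))
print(len(pixels))  # 400
```

Sampling per cell rather than over the whole box keeps the selected pixels spread evenly across the detection box.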
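As another possible implementation, because the pixels belonging to the vehicle are usually located in the middle of the detection box, the detection box may be divided into a central region and an edge region, and a certain number of pixels may be randomly extracted from the central region to form the pixel set corresponding to the vehicle.
For example, if the detection box corresponding to vehicle A is 500×500 pixels, the middle region spanning 80% of each side, that is, the central 400×400-pixel region, may be determined as the central region, with its center point coinciding with the center point of the detection box, and the rest of the detection box is determined as the edge region. Then 80% of the pixels are randomly extracted from the 400×400-pixel central region to form the pixel set corresponding to vehicle A.
As yet another possible implementation, when the detection box is divided into a central region and an edge region, a certain number of pixels may be randomly extracted from each of the two regions to form the pixel set corresponding to the vehicle. The pixel set then includes not only pixels of the vehicle but also pixels of the background near the vehicle, which describes the vehicle's features better and improves tracking accuracy.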
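For example, if the detection box corresponding to vehicle A is 500×500 pixels, the circular region centered on the center point of the detection box with a radius of 200 pixels (a 400-pixel diameter, so that the circle fits inside the box) may be determined as the central region, and the rest of the detection box is determined as the edge region. Then 80% of the pixels are randomly extracted from the central region and 80% from the edge region to form the pixel set corresponding to vehicle A.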
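It should be noted that the above examples are merely illustrative and should not be regarded as limiting this application. In actual use, the way the central region of the detection box is determined, as well as the number or proportion of pixels extracted, may be chosen according to actual needs and the specific application scenario; the embodiments of this application do not limit this.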
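The center/edge variants can be illustrated with a small sketch. It assumes a square central region covering 80% of each side and an 80% sampling ratio in both regions; the function names, the `(x0, y0, width, height)` box layout, and the scaled-down 50×50 box are assumptions chosen for illustration, not values fixed by the description.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, width, height): an assumed layout
Pixel = Tuple[int, int]


def split_center_edge(box: Box, side_fraction: float = 0.8) -> Tuple[List[Pixel], List[Pixel]]:
    """Split a box into a centered rectangle (side_fraction of each side) and the remaining edge."""
    x0, y0, width, height = box
    cw, ch = round(width * side_fraction), round(height * side_fraction)
    cx0, cy0 = x0 + (width - cw) // 2, y0 + (height - ch) // 2
    center: List[Pixel] = []
    edge: List[Pixel] = []
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            inside = cx0 <= x < cx0 + cw and cy0 <= y < cy0 + ch
            (center if inside else edge).append((x, y))
    return center, edge


def sample_fraction(pixels: List[Pixel], fraction: float, rng: random.Random) -> List[Pixel]:
    """Randomly draw the given fraction of the pixels, without replacement."""
    return rng.sample(pixels, round(len(pixels) * fraction))


rng = random.Random(0)
center, edge = split_center_edge((0, 0, 50, 50))  # 50x50 box: 40x40 center, 900 edge pixels
pixel_set = sample_fraction(center, 0.8, rng) + sample_fraction(edge, 0.8, rng)
print(len(center), len(edge), len(pixel_set))  # 1600 900 2000
```

Dropping the edge term gives the center-only variant; for the 500×500 example in the text, the same split yields a 400×400 central region of 160,000 pixels.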
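Step 104: process the image features of the pixels in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image.
The image features of a pixel may include the pixel's value, the values of its neighboring pixels, its positional relationship to the other pixels in the pixel set, pixel value differences, and similar features. In actual use, the pixel image features to be used may be selected according to actual needs; the embodiments of this application do not limit this.
The features of a vehicle are features usable for target recognition that are determined by computing on or learning from the image features of the pixels in the vehicle's pixel set. For example, the vehicle's features may be ReID (person re-identification) features, HOG (Histogram of Oriented Gradients) features, Haar (Haar-like) features, and so on.
In the embodiments of this application, after the pixel set corresponding to each vehicle has been extracted, a preset algorithm may be used to compute on or learn from the image features of the pixels in each vehicle's pixel set, so that the vehicle is described through those image features and the features of each vehicle in the target image are generated.
It should be noted that, in actual use, the type of vehicle feature and the corresponding algorithm for determining it may be selected according to actual needs and the specific application scenario; the embodiments of this application do not limit this. For example, to improve real-time performance and computational efficiency, an efficient deep learning algorithm or image feature extraction algorithm may be selected to determine the features of each vehicle in the target image.
Step 105: determine the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, where the historical images are the n frames immediately preceding the target image in the video stream and n is a positive integer.
In the embodiments of this application, the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images may be determined by metric learning. Specifically, for a vehicle in the target image, metric learning may be used to determine the distance between that vehicle's features and the features of each vehicle in the historical images. Because a smaller distance between features indicates greater similarity, the reciprocal of the distance between the vehicle's features and the features of each vehicle in the historical images may be used as the matching degree between them.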
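The relationship between feature distance and matching degree can be sketched as below. Note the assumptions: plain Euclidean distance stands in for the learned metric (the description uses metric learning and does not fix a particular distance), and the small `eps` term is added here only to avoid division by zero for identical features.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (a stand-in for a learned metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matching_degree(a: Sequence[float], b: Sequence[float], eps: float = 1e-6) -> float:
    """Reciprocal of the distance: the smaller the distance, the higher the matching degree."""
    return 1.0 / (euclidean_distance(a, b) + eps)


current = [0.0, 0.0]
near, far = [0.3, 0.4], [3.0, 4.0]   # distances 0.5 and 5.0 from `current`
print(matching_degree(current, near) > matching_degree(current, far))  # True
```

Written as a formula, with $d(f_i, f_j)$ the distance between two feature vectors, the matching degree is $s(f_i, f_j) = 1 / d(f_i, f_j)$.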
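As one possible implementation, n may be 1; that is, each vehicle in the target image is compared only with the frame immediately preceding the target image in the video stream to determine its trajectory. Optionally, for a vehicle A in the target image, a vehicle in the historical image whose feature matching degree with vehicle A is greater than a threshold may be determined to be vehicle A. The trajectory of vehicle A in the target image is then determined from the trajectory of vehicle A in the historical image and the capture position of the target image, the identifier of vehicle A in the historical image is used as the identifier of vehicle A in the target image, and that identifier is displayed in the target image to mark vehicle A. For example, if the identifier of vehicle A in the historical image is "Car1", the identifier "Car1" may be displayed above vehicle A. FIG. 2 is a schematic diagram of marking each vehicle in the target image.
Correspondingly, if no vehicle in the historical image has a feature matching degree with vehicle A greater than the threshold, vehicle A may be determined to be a new vehicle appearing in the video stream for the first time. The capture position of the target image may then be determined as the starting point of vehicle A's trajectory, a new vehicle identifier is assigned to vehicle A, and the identifier is displayed in the target image to mark vehicle A.
As another possible implementation, n may be an integer greater than 1; that is, each vehicle in the target image is compared with multiple frames that immediately precede the target image in the video stream to determine its trajectory, which improves tracking accuracy. Optionally, for a vehicle A in the target image, candidate vehicles in the historical images whose feature matching degree with vehicle A is greater than the threshold may first be determined. If only one historical frame contains a candidate vehicle, that candidate may be determined to be vehicle A; the trajectory of vehicle A in the target image is then determined from the trajectory of vehicle A in the historical image and the capture position of the target image, and the identifier of vehicle A in the historical image is used as its identifier in the target image. If multiple frames contain candidate vehicles, it may be determined whether the candidates in the different historical frames are the same vehicle. If so, the candidate in the historical frame whose capture time is closest to that of the target image may be determined to be vehicle A, and the trajectory of vehicle A in the target image is determined from the trajectory of vehicle A in that most recent historical frame and the capture position of the target image.
Correspondingly, if none of the historical frames contains a vehicle whose feature matching degree with vehicle A is greater than the threshold, vehicle A may be determined to be a new vehicle appearing in the video stream for the first time. The capture position of the target image may then be determined as the starting point of vehicle A's trajectory, a new vehicle identifier is assigned to vehicle A, and the identifier is displayed in the target image to mark vehicle A.
In the embodiments of this application, when the vehicles in the historical images that match the vehicles in the target image are determined from vehicle features, the features of one vehicle in the target image may have a matching degree greater than the threshold with the features of several vehicles in the historical images.
Optionally, in one possible implementation of the embodiments of this application, when the features of a vehicle in the target image have a matching degree greater than the threshold with the features of several vehicles in the historical images, the historical vehicle with the highest matching degree may be determined to be that vehicle.
Optionally, in another possible implementation of the embodiments of this application, the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images may first be determined, and the candidate vehicles whose matching degree with each vehicle in the target image exceeds the threshold are then identified, which establishes the matching relationships between vehicles in the target image and vehicles in the historical images. The Hungarian algorithm is then used to analyze these matching relationships and determine, for each vehicle in the target image, the unique matching vehicle in the historical images.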
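To illustrate why a global one-to-one assignment can differ from picking each vehicle's best match independently, here is a small sketch. It deliberately uses a brute-force search over permutations rather than the Hungarian algorithm itself: for the handful of vehicles in the example both return an assignment with the same maximal total matching degree, but brute force is only practical for very small inputs. The function name, the score matrix, and the threshold value are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence


def match_vehicles(scores: Sequence[Sequence[float]], threshold: float) -> List[Optional[int]]:
    """Return, for each current vehicle i, the index of its historical match or None.

    scores[i][j] is the matching degree between current vehicle i and historical
    vehicle j. Each historical vehicle is used at most once, and only pairs whose
    matching degree exceeds the threshold may be matched.
    """
    n_cur = len(scores)
    n_hist = len(scores[0]) if n_cur else 0
    best_total = -1.0
    best_assignment: List[Optional[int]] = [None] * n_cur
    # Slots >= n_hist act as "no match" placeholders, so any vehicle may stay unmatched.
    for perm in permutations(range(n_hist + n_cur), n_cur):
        total = 0.0
        assignment: List[Optional[int]] = []
        for i, j in enumerate(perm):
            if j < n_hist and scores[i][j] > threshold:
                total += scores[i][j]
                assignment.append(j)
            else:
                assignment.append(None)
        if total > best_total:
            best_total, best_assignment = total, assignment
    return best_assignment


scores = [
    [0.90, 0.80],  # current vehicle 0 is close to both historical vehicles
    [0.85, 0.20],  # current vehicle 1 is close only to historical vehicle 0
    [0.10, 0.30],  # current vehicle 2 matches nothing above the threshold
]
result = match_vehicles(scores, threshold=0.5)
print(result)  # [1, 0, None]
```

A greedy best-match rule would pair current vehicle 0 with historical vehicle 0 and leave current vehicle 1 unmatched; the global assignment matches both, and current vehicle 2 would be treated as a new vehicle.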
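It should be noted that, in actual use, the value of n may be determined according to actual needs and the specific application scenario; the embodiments of this application do not limit this. For example, when the vehicle tracking method is applied in a traffic management scenario, the monitoring device at a traffic intersection is fixed, so comparing only with the frame immediately preceding the target image is enough to determine the trajectory of each vehicle in the target image, and n may be 1. As another example, when the method is applied to autonomous driving, assisted driving, and similar scenarios, the position of the video capture device changes continuously while the vehicle is driving, and overtaking and being overtaken occur along the way; comparing only with the immediately preceding frame can easily make the tracking result inaccurate, so n may be set to an integer greater than 1 to improve tracking accuracy.
According to the technical solution of the embodiments of this application, instance segmentation is performed on the target image at the current moment in the video stream to directly obtain the detection box corresponding to each vehicle in the target image, and the pixel set corresponding to each vehicle is extracted from that vehicle's detection box. The image features of the pixels in each vehicle's pixel set are then processed to determine the features of each vehicle in the target image, and the trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images. Because instance segmentation of the target image directly filters out the other objects in the target image and yields the vehicles' detection boxes in real time for subsequent processing, vehicle tracking becomes more efficient and has good real-time performance.
In one possible implementation of this application, a point cloud model may be used to process the pixels of the foreground region in the detection box (that is, the pixels belonging to the vehicle) and the pixels of the background region separately to determine the features of each vehicle in the target image. This extracts vehicle features accurately and efficiently and further improves the real-time performance and accuracy of vehicle tracking.
The vehicle tracking method provided by the embodiments of this application is further described below with reference to FIG. 3.
FIG. 3 is a schematic flowchart of another vehicle tracking method provided by an embodiment of this application.
As shown in FIG. 3, the vehicle tracking method includes the following steps:
Step 201: extract a target image at the current moment from a video stream captured while a vehicle is driving.
Step 202: perform instance segmentation on the target image to obtain a detection box corresponding to each vehicle in the target image.
For the specific implementation and principles of steps 201-202, refer to the detailed description of the foregoing embodiment; they are not repeated here.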
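Step 203: extract a first pixel subset from the mask region within the detection box corresponding to each vehicle.
The mask region within a detection box is the region of the detection box occupied by the vehicle. The first pixel subset corresponding to a vehicle is the set of pixels, extracted from the mask region within the vehicle's detection box, that represents the vehicle itself.
As one possible implementation, instance segmentation of the target image may output both the detection box corresponding to each vehicle and the mask region within each detection box. In other words, an instance segmentation algorithm may identify each vehicle in the target image and generate the detection box for each vehicle together with the mask region of the vehicle in each detection box. The part of each detection box outside the mask region is the non-mask region corresponding to the background, so the detection box corresponding to each vehicle may include a mask region and a non-mask region.
It should be noted that, in actual use, the algorithm for instance segmentation of the target image may be any instance segmentation algorithm that can directly identify targets of a specific type and output both the detection box and the mask region for each such target; the embodiments of this application do not limit this. For example, it may be a clustering-based instance segmentation algorithm, such as an instance segmentation algorithm based on spatial embedding or the K-means clustering algorithm.
In the embodiments of this application, because the mask region within a vehicle's detection box represents the region the vehicle occupies in the box, the pixels of the mask region can accurately describe the features of the vehicle itself. A certain number of pixels may therefore be randomly extracted from the mask region within each vehicle's detection box to form the first pixel subset corresponding to that vehicle, so as to accurately describe the features of each vehicle itself (such as color features, shape features, and brand features).
As one possible implementation, the number of pixels in the first pixel subset may be preset, so that a preset number of pixels is randomly drawn directly from the mask region within each vehicle's detection box to form the first pixel subset for that vehicle. For example, if the preset number is 500, 500 pixels may be randomly extracted from the mask region within each vehicle's detection box to form each vehicle's first pixel subset.
As another possible implementation, the ratio of the number of pixels in the first pixel subset to the number of pixels in the mask region may be preset, so that a preset proportion of pixels is randomly drawn from the mask region within each vehicle's detection box to form the first pixel subset for that vehicle. For example, if the preset proportion is 80% and the mask region within the detection box of vehicle A contains 1000 pixels, 800 pixels may be randomly extracted from that mask region to form the first pixel subset corresponding to vehicle A.
It should be noted that the ways of extracting the first pixel subset from the mask region include, but are not limited to, those listed above. In actual use, a suitable extraction method may be selected according to actual needs and the specific application scenario; the embodiments of this application do not limit this.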
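Both ways of sizing the first pixel subset (a preset count or a preset proportion) can be sketched with one helper. The function name `sample_from_region` and its keyword arguments are hypothetical, and the mask region is represented simply as a list of pixel coordinates.

```python
import random
from typing import List, Optional, Tuple

Pixel = Tuple[int, int]


def sample_from_region(region: List[Pixel], rng: random.Random,
                       count: Optional[int] = None,
                       ratio: Optional[float] = None) -> List[Pixel]:
    """Randomly draw pixels from a region, sized by a preset count or a preset ratio."""
    if (count is None) == (ratio is None):
        raise ValueError("pass exactly one of count or ratio")
    if ratio is not None:
        count = round(len(region) * ratio)
    # Never ask for more pixels than the region contains.
    return rng.sample(region, min(count, len(region)))


rng = random.Random(0)
mask_region = [(x, y) for y in range(25) for x in range(40)]  # a 1000-pixel mask region
by_ratio = sample_from_region(mask_region, rng, ratio=0.8)
by_count = sample_from_region(mask_region, rng, count=500)
print(len(by_ratio), len(by_count))  # 800 500
```

The same helper applies unchanged to the non-mask region when the second pixel subset is sized independently.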
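Step 204: extract a second pixel subset from the non-mask region within the detection box corresponding to each vehicle.
The non-mask region within a detection box is the region of the detection box occupied by the background, that is, everything in the box other than the vehicle. The second pixel subset corresponding to a vehicle is the set of pixels, extracted from the non-mask region within the vehicle's detection box, that represents the vehicle's background.
As one possible implementation, because instance segmentation of the target image may output both the detection box corresponding to each vehicle and the mask region within each detection box, the part of each detection box outside the mask region can be directly determined as the non-mask region of that detection box.
In the embodiments of this application, vehicles are highly similar in color, shape, and so on, so describing a vehicle only through the features of its own pixels can easily cause different vehicles to be identified as the same vehicle and make the tracking result inaccurate. Therefore, in one possible implementation of the embodiments of this application, the background pixels in each detection box may be used to supplement the description of the vehicle's features, so that the features of the vehicle's background region increase the differences between vehicle features and improve tracking accuracy. A certain number of pixels may thus be randomly extracted from the non-mask region within each vehicle's detection box to form the second pixel subset corresponding to that vehicle, so as to accurately describe each vehicle's background features.
As one possible implementation, the number of pixels in the first pixel subset may be the same as the number of pixels in the second pixel subset, so that the vehicle's features blend the features of the vehicle itself and its background features in a balanced way, which makes the feature description more accurate and improves tracking accuracy. The number of pixels in the first and second pixel subsets may therefore be preset; a preset number of pixels is randomly extracted from the mask region within each vehicle's detection box to form that vehicle's first pixel subset, and the same preset number of pixels is randomly extracted from the non-mask region to form its second pixel subset.
For example, if the preset number is 500, then for vehicle A in the target image, 500 pixels may be randomly extracted from the mask region within vehicle A's detection box to form its first pixel subset, and 500 pixels may be randomly extracted from the non-mask region within vehicle A's detection box to form its second pixel subset.
As another possible implementation, different weights may be assigned to the first pixel subset and the second pixel subset, so that the extracted pixel set contains more of the pixels that contribute more to representing the vehicle's features and fewer of the pixels that contribute less. It should be noted that the weights of the first and second pixel subsets may be calibrated from a large amount of experimental data; the embodiments of this application do not limit this.
For example, if the preset number is 500 and the weights calibrated from experimental data are 1 for the first pixel subset and 0.8 for the second pixel subset, then for vehicle A in the target image, 500 pixels may be randomly extracted from the mask region within vehicle A's detection box to form its first pixel subset, and 400 pixels may be randomly extracted from the non-mask region within vehicle A's detection box to form its second pixel subset.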
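The weighted example above (500 × 1 = 500 foreground pixels and 500 × 0.8 = 400 background pixels) can be sketched as follows. The binary-mask representation, the function names, and the 40×40 example box are assumptions made for illustration only.

```python
import random
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]


def split_by_mask(mask: Sequence[Sequence[bool]]) -> Tuple[List[Pixel], List[Pixel]]:
    """Split box-local pixel coordinates into mask (vehicle) and non-mask (background) lists."""
    foreground: List[Pixel] = []
    background: List[Pixel] = []
    for y, row in enumerate(mask):
        for x, is_vehicle in enumerate(row):
            (foreground if is_vehicle else background).append((x, y))
    return foreground, background


def sample_subsets(mask: Sequence[Sequence[bool]], preset_count: int,
                   first_weight: float, second_weight: float,
                   rng: random.Random) -> Tuple[List[Pixel], List[Pixel]]:
    """Draw the first subset from the mask region and the second from the non-mask region."""
    foreground, background = split_by_mask(mask)
    n_first = min(round(preset_count * first_weight), len(foreground))
    n_second = min(round(preset_count * second_weight), len(background))
    return rng.sample(foreground, n_first), rng.sample(background, n_second)


# A 40x40 detection box whose central 30x30 block is the vehicle mask (900 vs 700 pixels).
mask = [[5 <= x < 35 and 5 <= y < 35 for x in range(40)] for y in range(40)]
first, second = sample_subsets(mask, preset_count=500, first_weight=1.0,
                               second_weight=0.8, rng=random.Random(0))
print(len(first), len(second))  # 500 400
```

With both weights set to 1 this reduces to the equal-count variant described earlier.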
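As yet another possible implementation, the number of pixels in the second pixel subset may be independent of the number of pixels in the first pixel subset; that is, the number of pixels in the second pixel subset, or the ratio of that number to the number of pixels in the non-mask region, may be preset separately. The second pixel subset is then extracted from the non-mask region in the same way that the first pixel subset is extracted in step 203; for the specific implementation and principles, refer to the detailed description of step 203, which is not repeated here.
Step 205: use the first encoder in a preset point cloud model to encode the image features of the pixels in the first pixel subset corresponding to each vehicle, so as to determine a first vector corresponding to each vehicle.
The preset point cloud model is a pre-trained model that can process an input point set and generate a feature representation of that point set.
The first vector corresponding to a vehicle may be the feature representation of the vehicle's own pixels and may be used to represent the features of the vehicle itself.
The image features of a pixel may include the pixel's RGB value and the like.
In the embodiments of this application, because a point cloud model can directly generate a feature representation from unordered input point set data, using a point cloud model to generate vehicle features allows them to be extracted efficiently. As one possible implementation, the type of vehicle feature may be determined in advance, for example ReID features, and a large number of sample images containing vehicles may be obtained. Instance segmentation is then performed on each sample image to generate the detection box and mask region for each vehicle in it. A ReID feature extraction algorithm is used to determine the sample first ReID feature of the mask region for each vehicle in each sample image, and a sample first pixel subset is extracted from the mask region within the detection box. Finally, an initial point cloud model learns the correspondence between each vehicle's sample first ReID feature and its sample first pixel subset, which produces the first encoder of the preset point cloud model. The first encoder thereby learns the association between a vehicle's first ReID feature and its first pixel subset. The image features of the pixels in a vehicle's first pixel subset can therefore be input to the first encoder, which encodes the RGB values of those pixels and generates the first vector corresponding to the vehicle, that is, the ReID feature of the vehicle itself.
Step 206: use the second encoder in the preset point cloud model to encode the image features of the pixels in the second pixel subset corresponding to each vehicle, so as to determine a second vector corresponding to each vehicle.
The second vector corresponding to a vehicle may be the feature representation of the vehicle's background pixels and may be used to represent the vehicle's background features.
It should be noted that, because the first pixel subset represents the features of the vehicle itself and the second pixel subset represents the vehicle's background features, a second encoder different from the first encoder may be trained in the point cloud model to encode the second pixel subset, so that the resulting second vector represents the vehicle's background features more accurately.
In the embodiments of this application, after instance segmentation has been performed on each sample image to generate the detection box and mask region for each vehicle in it, a ReID feature extraction algorithm may be used to determine the sample second ReID feature of the non-mask region within the detection box of each vehicle in each sample image, and a sample second pixel subset is extracted from that non-mask region. The initial point cloud model then learns the correspondence between each vehicle's sample second ReID feature and its sample second pixel subset, which produces the second encoder of the preset point cloud model. The second encoder thereby learns the association between a vehicle's second ReID feature and its second pixel subset. The image features of the pixels in a vehicle's second pixel subset can therefore be input to the second encoder, which encodes the RGB values of those pixels and generates the second vector corresponding to the vehicle, that is, the ReID feature of the vehicle's background region.
Step 207: use the decoder in the preset point cloud model to decode the first vector and the second vector corresponding to each vehicle, so as to determine the features of each vehicle.
In the embodiments of this application, because different network branches of the preset point cloud model are used to determine the vector representation of the vehicle's own features and the vector representation of its background features, the decoder in the preset point cloud model may then fuse the first vector and the second vector of each vehicle to generate that vehicle's features.
Optionally, in one possible implementation of this application, the decoder in the preset point cloud model may apply max pooling to the first vector and the second vector corresponding to each vehicle, which fuses the two vectors and generates the features of each vehicle.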
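The two-branch encoding and max-pooling fusion of steps 205-207 can be illustrated with a toy sketch. It is not the trained model from the description: the encoders below use random, untrained weights, each is a single per-point linear layer with ReLU followed by a max over points (a common way to make a point-set encoder insensitive to point order), and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[Sequence[float]]], Vector]


def make_encoder(in_dim: int, out_dim: int, seed: int) -> Encoder:
    """Build a toy point-set encoder with random (untrained) weights."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(points: Sequence[Sequence[float]]) -> Vector:
        # Per-point linear map followed by ReLU.
        per_point = [
            [max(0.0, sum(w * v for w, v in zip(row, point))) for row in weights]
            for point in points
        ]
        # Max over points for each output channel, so point order does not matter.
        return [max(channel) for channel in zip(*per_point)]

    return encode


def fuse_by_max_pooling(first_vector: Vector, second_vector: Vector) -> Vector:
    """Fuse foreground and background vectors by element-wise maximum."""
    return [max(a, b) for a, b in zip(first_vector, second_vector)]


first_encoder = make_encoder(in_dim=3, out_dim=4, seed=1)   # branch for vehicle pixels
second_encoder = make_encoder(in_dim=3, out_dim=4, seed=2)  # branch for background pixels

vehicle_rgb = [(0.9, 0.1, 0.1), (0.8, 0.2, 0.1), (0.7, 0.1, 0.2)]     # normalized RGB values
background_rgb = [(0.3, 0.3, 0.3), (0.4, 0.4, 0.5), (0.2, 0.3, 0.3)]

feature = fuse_by_max_pooling(first_encoder(vehicle_rgb), second_encoder(background_rgb))
print(len(feature))  # 4
```

Because each encoder reduces over points with a maximum, shuffling the pixels within a subset leaves the encoded vector unchanged, which is the property that lets a point-set model consume randomly sampled, unordered pixels.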
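Step 208: determine the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, where the historical images are the n frames immediately preceding the target image in the video stream and n is a positive integer.
For the specific implementation and principles of step 208, refer to the detailed description of the foregoing embodiment; they are not repeated here.
According to the technical solution of the embodiments of this application, instance segmentation is performed on the target image at the current moment in the video stream to directly obtain the detection box and mask region corresponding to each vehicle in the target image. A first pixel subset is extracted from the mask region within each vehicle's detection box to represent the vehicle's foreground features, and a second pixel subset is extracted from the non-mask region to represent the vehicle's background features. A preset point cloud model then generates the features of the vehicle from the extracted pixel sets, so that the trajectory of each vehicle in the target image can be determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images. Fusing the vehicle's foreground and background features with the point cloud model thus extracts vehicle features accurately and efficiently, which further improves the real-time performance and accuracy of vehicle tracking.
In one possible implementation of this application, a clustering algorithm may be used to perform instance segmentation of the target image, so that the detection boxes corresponding to the vehicles are generated directly and the real-time performance of vehicle tracking is improved.
The vehicle tracking method provided by the embodiments of this application is further described below with reference to FIG. 4.
FIG. 4 is a schematic flowchart of yet another vehicle tracking method provided by an embodiment of this application.
As shown in FIG. 4, the vehicle tracking method includes the following steps:
Step 301: extract a target image at the current moment from a video stream captured while a vehicle is driving.
For the specific implementation and principles of step 301, refer to the detailed description of the foregoing embodiment; they are not repeated here.
Step 302: cluster the pixels in the target image based on the features of each pixel, and determine the detection box corresponding to each vehicle in the target image according to the clustering result.
The features of a pixel may include the pixel's value, its neighboring pixels, the values of those neighboring pixels, and similar features. In actual use, the pixel features to be used may be selected according to actual needs; the embodiments of this application do not limit this.
In the embodiments of this application, a clustering algorithm may be used to cluster the features of the pixels in the target image, so as to classify the pixels and determine whether each pixel belongs to a vehicle and whether pixels belong to the same vehicle. The detection box for each vehicle is then generated from the pixels belonging to that vehicle; that is, each detection box may include all the pixels belonging to the same vehicle.
As one possible implementation, an instance segmentation algorithm based on spatial embedding may be used to analyze the features of the pixels in the target image and cluster them, and the detection box for each vehicle is then generated directly from the clustering result. Instance segmentation is completed in a single step, with good real-time performance. In addition, an instance segmentation algorithm based on spatial embedding can learn different clustering radii for different types of instances, so the instance segmentation is relatively accurate.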
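The last part of step 302, turning a clustering result into detection boxes, can be sketched as follows. The clustering itself is assumed to have been done already: the input is a hypothetical label map in which 0 marks background and each positive integer marks the pixels of one vehicle instance, and boxes are returned as inclusive `(x_min, y_min, x_max, y_max)` tuples, which is a convention chosen here.

```python
from typing import Dict, List, Sequence, Tuple

BoundingBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def boxes_from_labels(labels: Sequence[Sequence[int]]) -> Dict[int, BoundingBox]:
    """Compute the tightest box containing all pixels of each instance label (0 is background)."""
    bounds: Dict[int, List[int]] = {}
    for y, row in enumerate(labels):
        for x, instance in enumerate(row):
            if instance == 0:
                continue
            if instance not in bounds:
                bounds[instance] = [x, y, x, y]
            else:
                box = bounds[instance]
                box[0], box[1] = min(box[0], x), min(box[1], y)
                box[2], box[3] = max(box[2], x), max(box[3], y)
    return {instance: (b[0], b[1], b[2], b[3]) for instance, b in bounds.items()}


labels = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
boxes = boxes_from_labels(labels)
print(boxes)  # {1: (1, 0, 2, 1), 2: (3, 1, 4, 2)}
```

Because each box is the bounding rectangle of one cluster, every pixel of that vehicle falls inside it, and the pixels carrying the cluster's label give the mask region used in steps 203-204.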
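Step 303: extract a pixel set corresponding to each vehicle from the detection box corresponding to that vehicle.
Step 304: process the image features of the pixels in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image.
For the specific implementation and principles of steps 303-304, refer to the detailed description of the foregoing embodiments; they are not repeated here.
Step 305: if the matching degree between the features of a first vehicle in the target image and the features of a second vehicle in the historical images is greater than a threshold, update the trajectory of the second vehicle according to the capture position and capture time of the target image.
The first vehicle is any vehicle in the target image; the second vehicle is a vehicle that is present both in the historical images and in the target image.
In the embodiments of this application, the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images may be determined by metric learning. Specifically, for a vehicle in the target image, metric learning may be used to determine the distance between that vehicle's features and the features of each vehicle in the historical images. Because a smaller distance between features indicates greater similarity, the reciprocal of the distance between the vehicle's features and the features of each vehicle in the historical images may be used as the matching degree between them.
As one possible implementation, for a first vehicle in the target image, a vehicle in the historical images whose feature matching degree with the first vehicle is greater than the threshold may be determined as the second vehicle. Based on the second vehicle's trajectory in the historical images and the capture position of the target image, the capture position of the target image is then added to the second vehicle's trajectory as a new point, which updates the trajectory.
As another possible implementation, a vehicle's trajectory may include not only the vehicle's position information but also the time at which the vehicle reached each point of the trajectory. Therefore, in the embodiments of this application, when the capture position of the target image is added to the second vehicle's trajectory as a new point, the capture time of the target image may also be added to the trajectory as the time information of the new point, which makes the tracking information more accurate and richer.
For example, when the capture position of the target image is added to the second vehicle's trajectory as a new point, the new point may be highlighted and connected to the point added to the trajectory at the previous adjacent moment, and the time information of the new point (that is, the capture time of the target image) may be displayed near it.
Correspondingly, if no second vehicle in the historical images has a feature matching degree with the first vehicle greater than the threshold, the first vehicle may be determined to be a new vehicle appearing in the video stream for the first time. The capture position of the target image may then be determined as the starting point of the first vehicle's trajectory, and the capture time of the target image is added to the first vehicle's trajectory as the time information of the starting point.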
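The trajectory bookkeeping of step 305 can be sketched with a small in-memory store. The class and method names are hypothetical, identifiers of the form "Car1" follow the labeling example given earlier in the description, and the positions and timestamps are made-up sample values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float]


@dataclass
class Trajectory:
    vehicle_id: str
    # Each point stores a position together with the capture time of the frame.
    points: List[Tuple[Position, float]] = field(default_factory=list)


class TrajectoryStore:
    def __init__(self) -> None:
        self.trajectories: Dict[str, Trajectory] = {}
        self._next_index = 1

    def update(self, matched_id: Optional[str], position: Position, timestamp: float) -> str:
        """Append a point to the matched trajectory, or start a new one if there is no match."""
        if matched_id is None:
            # No historical vehicle exceeded the threshold: treat this as a new vehicle.
            matched_id = f"Car{self._next_index}"
            self._next_index += 1
            self.trajectories[matched_id] = Trajectory(matched_id)
        self.trajectories[matched_id].points.append((position, timestamp))
        return matched_id


store = TrajectoryStore()
first_id = store.update(None, (12.0, 30.0), timestamp=0.00)   # first appearance
store.update(first_id, (14.5, 31.0), timestamp=0.04)          # matched in the next frame
second_id = store.update(None, (80.0, 10.0), timestamp=0.04)  # another new vehicle
print(first_id, second_id, len(store.trajectories[first_id].points))  # Car1 Car2 2
```

Storing the timestamp alongside each position is what allows the time information to be shown next to each new point of the trajectory.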
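According to the technical solution of the embodiments of this application, the pixels in the target image are clustered to directly obtain the detection box corresponding to each vehicle in the target image, and the pixel set corresponding to each vehicle is extracted from that vehicle's detection box. The image features of the pixels in each vehicle's pixel set are then processed to determine the features of each vehicle in the target image, and when the matching degree between the features of a first vehicle in the target image and the features of a second vehicle in the historical images is greater than a threshold, the trajectory of the second vehicle is updated according to the capture position and capture time of the target image. Instance segmentation of the target image is thus achieved with a clustering algorithm, which directly filters out the other objects in the target image and yields the vehicles' detection boxes in real time, and time information is incorporated into the vehicles' trajectories. This further improves both the real-time performance of vehicle tracking and the accuracy and richness of the tracking information.
To implement the foregoing embodiments, this application also proposes a vehicle tracking apparatus.
FIG. 5 is a schematic structural diagram of a vehicle tracking apparatus provided by an embodiment of this application.
As shown in FIG. 5, the vehicle tracking apparatus 40 includes:
a first extraction module 41, configured to extract a target image at the current moment from a video stream captured while a vehicle is driving;
an instance segmentation module 42, configured to perform instance segmentation on the target image to obtain a detection box corresponding to each vehicle in the target image;
a second extraction module 43, configured to extract a pixel set corresponding to each vehicle from the detection box corresponding to that vehicle;
a first determination module 44, configured to process the image features of the pixels in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image; and
a second determination module 45, configured to determine the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, where the historical images are the n frames immediately preceding the target image in the video stream and n is a positive integer.
In actual use, the vehicle tracking apparatus provided by the embodiments of this application may be configured in any electronic device to execute the foregoing vehicle tracking method.
According to the technical solution of the embodiments of this application, instance segmentation is performed on the target image at the current moment in the video stream to directly obtain the detection box corresponding to each vehicle in the target image, and the pixel set corresponding to each vehicle is extracted from that vehicle's detection box. The image features of the pixels in each vehicle's pixel set are then processed to determine the features of each vehicle in the target image, and the trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images. Because instance segmentation of the target image directly filters out the other objects in the target image and yields the vehicles' detection boxes in real time for subsequent processing, vehicle tracking becomes more efficient and has good real-time performance.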
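In one possible implementation of this application, the detection box corresponding to each vehicle includes a mask region and a non-mask region, and the second extraction module 43 includes:
a first extraction unit, configured to extract a first pixel subset from the mask region within the detection box corresponding to each vehicle; and
a second extraction unit, configured to extract a second pixel subset from the non-mask region within the detection box corresponding to each vehicle.
Further, in another possible implementation of this application, the first determination module 44 includes:
a first determination unit, configured to use the first encoder in a preset point cloud model to encode the image features of the pixels in the first pixel subset corresponding to each vehicle, so as to determine a first vector corresponding to each vehicle;
a second determination unit, configured to use the second encoder in the preset point cloud model to encode the image features of the pixels in the second pixel subset corresponding to each vehicle, so as to determine a second vector corresponding to each vehicle; and
a third determination unit, configured to use the decoder in the preset point cloud model to decode the first vector and the second vector corresponding to each vehicle, so as to determine the features of each vehicle.
Further, in yet another possible implementation of this application, the number of pixels in the first pixel subset is the same as the number of pixels in the second pixel subset.
Further, in still another possible implementation of this application, the instance segmentation module 42 includes:
a clustering unit, configured to cluster the pixels in the target image based on the features of each pixel, and determine the detection box corresponding to each vehicle in the target image according to the clustering result.
Further, in still another possible implementation of this application, the second determination module 45 includes:
an update unit, configured to update the trajectory of a second vehicle according to the capture position and capture time of the target image when the matching degree between the features of a first vehicle in the target image and the features of the second vehicle in the historical images is greater than a threshold.
It should be noted that the foregoing explanations of the vehicle tracking method embodiments shown in FIG. 1, FIG. 3, and FIG. 4 also apply to the vehicle tracking apparatus 40 of this embodiment and are not repeated here.
According to the technical solution of the embodiments of this application, instance segmentation is performed on the target image at the current moment in the video stream to directly obtain the detection box and mask region corresponding to each vehicle in the target image. A first pixel subset is extracted from the mask region within each vehicle's detection box to represent the vehicle's foreground features, and a second pixel subset is extracted from the non-mask region to represent the vehicle's background features. A preset point cloud model then generates the features of the vehicle from the extracted pixel sets, so that the trajectory of each vehicle in the target image can be determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images. Fusing the vehicle's foreground and background features with the point cloud model thus extracts vehicle features accurately and efficiently, which further improves the real-time performance and accuracy of vehicle tracking.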
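According to the embodiments of this application, this application also provides an electronic device and a readable storage medium.
FIG. 6 is a block diagram of an electronic device for the vehicle tracking method according to an embodiment of this application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile apparatus, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing apparatus. The components shown here, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of this application described and/or claimed here.
As shown in FIG. 6, the electronic device includes one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information for a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories if needed. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). One processor 501 is taken as an example in FIG. 6.
The memory 502 is the non-transitory computer-readable storage medium provided by this application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the vehicle tracking method provided by this application. The non-transitory computer-readable storage medium of this application stores computer instructions that are used to cause a computer to execute the vehicle tracking method provided by this application.
As a non-transitory computer-readable storage medium, the memory 502 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the vehicle tracking method in the embodiments of this application (for example, the first extraction module 41, the instance segmentation module 42, the second extraction module 43, the first determination module 44, and the second determination module 45 shown in FIG. 5). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, that is, it implements the vehicle tracking method of the foregoing method embodiments.
The memory 502 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for the vehicle tracking method, and the like. In addition, the memory 502 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memories located remotely from the processor 501, and these remote memories may be connected over a network to the electronic device for the vehicle tracking method. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the vehicle tracking method may further include an input apparatus 503 and an output apparatus 504. The processor 501, the memory 502, the input apparatus 503, and the output apparatus 504 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6.
The input apparatus 503 can receive input numeric or character information and generate key signal input related to the user settings and function control of the electronic device for the vehicle tracking method; examples include a touch screen, keypad, mouse, track pad, touch pad, pointing stick, one or more mouse buttons, trackball, joystick, and other input apparatus. The output apparatus 504 may include a display device, an auxiliary lighting apparatus (for example, an LED), a tactile feedback apparatus (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.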
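Various implementations of the systems and techniques described here can be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a dedicated or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and transmit data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
These computer programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor and may be implemented using high-level procedural and/or object-oriented programming languages and/or assembly/machine language. As used here, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (for example, magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer that has a display apparatus for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor), and a keyboard and a pointing apparatus (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of apparatus can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described here can be implemented in a computing system that includes back-end components (for example, as a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.
According to the technical solution of the embodiments of this application, instance segmentation is performed on the target image at the current moment in the video stream to directly obtain the detection box corresponding to each vehicle in the target image, and the pixel set corresponding to each vehicle is extracted from that vehicle's detection box. The image features of the pixels in each vehicle's pixel set are then processed to determine the features of each vehicle in the target image, and the trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images. Because instance segmentation of the target image directly filters out the other objects in the target image and yields the vehicles' detection boxes in real time for subsequent processing, vehicle tracking becomes more efficient and has good real-time performance.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in this application may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in this application can be achieved; no limitation is imposed here.
The foregoing specific implementations do not limit the scope of protection of this application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the scope of protection of this application.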

Claims (14)

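  1. A vehicle tracking method, comprising:
    extracting a target image at the current moment from a video stream captured while a vehicle is driving;
    performing instance segmentation on the target image to obtain a detection box corresponding to each vehicle in the target image;
    extracting a pixel set corresponding to each vehicle from the detection box corresponding to that vehicle;
    processing the image features of the pixels in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image; and
    determining the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, wherein the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
  2. The method according to claim 1, wherein the detection box corresponding to each vehicle includes a mask region and a non-mask region, and wherein extracting a pixel set corresponding to each vehicle from the detection box corresponding to that vehicle comprises:
    extracting a first pixel subset from the mask region within the detection box corresponding to each vehicle; and
    extracting a second pixel subset from the non-mask region within the detection box corresponding to each vehicle.
  3. The method according to claim 2, wherein processing the image features of the pixels in the pixel set corresponding to each vehicle comprises:
    using a first encoder in a preset point cloud model to encode the image features of the pixels in the first pixel subset corresponding to each vehicle, so as to determine a first vector corresponding to each vehicle;
    using a second encoder in the preset point cloud model to encode the image features of the pixels in the second pixel subset corresponding to each vehicle, so as to determine a second vector corresponding to each vehicle; and
    using a decoder in the preset point cloud model to decode the first vector and the second vector corresponding to each vehicle, so as to determine the features of each vehicle.
  4. The method according to claim 2 or 3, wherein the number of pixels in the first pixel subset is the same as the number of pixels in the second pixel subset.
  5. The method according to any one of claims 1-4, wherein performing instance segmentation on the target image to obtain a detection box corresponding to each vehicle in the target image comprises:
    clustering the pixels in the target image based on the features of each pixel in the target image, and determining the detection box corresponding to each vehicle in the target image according to the clustering result.
  6. The method according to any one of claims 1-5, wherein determining the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images comprises:
    if the matching degree between the features of a first vehicle in the target image and the features of a second vehicle in the historical images is greater than a threshold, updating the trajectory of the second vehicle according to the capture position and capture time of the target image.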
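  7. A vehicle tracking apparatus, comprising:
    a first extraction module, configured to extract a target image at the current moment from a video stream captured while a vehicle is driving;
    an instance segmentation module, configured to perform instance segmentation on the target image to obtain a detection box corresponding to each vehicle in the target image;
    a second extraction module, configured to extract a pixel set corresponding to each vehicle from the detection box corresponding to that vehicle;
    a first determination module, configured to process the image features of the pixels in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image; and
    a second determination module, configured to determine the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, wherein the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
  8. The apparatus according to claim 7, wherein the detection box corresponding to each vehicle includes a mask region and a non-mask region, and wherein the second extraction module comprises:
    a first extraction unit, configured to extract a first pixel subset from the mask region within the detection box corresponding to each vehicle; and
    a second extraction unit, configured to extract a second pixel subset from the non-mask region within the detection box corresponding to each vehicle.
  9. The apparatus according to claim 8, wherein the first determination module comprises:
    a first determination unit, configured to use a first encoder in a preset point cloud model to encode the image features of the pixels in the first pixel subset corresponding to each vehicle, so as to determine a first vector corresponding to each vehicle;
    a second determination unit, configured to use a second encoder in the preset point cloud model to encode the image features of the pixels in the second pixel subset corresponding to each vehicle, so as to determine a second vector corresponding to each vehicle; and
    a third determination unit, configured to use a decoder in the preset point cloud model to decode the first vector and the second vector corresponding to each vehicle, so as to determine the features of each vehicle.
  10. The apparatus according to claim 8 or 9, wherein the number of pixels in the first pixel subset is the same as the number of pixels in the second pixel subset.
  11. The apparatus according to any one of claims 7-10, wherein the instance segmentation module comprises:
    a clustering unit, configured to cluster the pixels in the target image based on the features of each pixel in the target image, and determine the detection box corresponding to each vehicle in the target image according to the clustering result.
  12. The apparatus according to any one of claims 7-11, wherein the second determination module comprises:
    an update unit, configured to update the trajectory of a second vehicle according to the capture position and capture time of the target image when the matching degree between the features of a first vehicle in the target image and the features of the second vehicle in the historical images is greater than a threshold.
  13. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method according to any one of claims 1-6.
  14. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method according to any one of claims 1-6.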
PCT/CN2020/125446 2020-05-29 2020-10-30 Vehicle tracking method and apparatus, and electronic device WO2021238062A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020227025961A KR20220113829A (ko) 2020-05-29 2020-10-30 Vehicle tracking method and apparatus, and electronic device
US17/995,752 US20230186486A1 (en) 2020-05-29 2020-10-30 Vehicle tracking method and apparatus, and electronic device
EP20938232.4A EP4116867A4 (en) 2020-05-29 2020-10-30 VEHICLE TRACKING METHOD AND APPARATUS AND ELECTRONIC DEVICE
JP2022545432A JP7429796B2 (ja) 2020-05-29 2020-10-30 Vehicle tracking method and apparatus, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010478496.9 2020-05-29
CN202010478496.9A CN111709328B (zh) 2020-05-29 2020-05-29 Vehicle tracking method and apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2021238062A1 true WO2021238062A1 (zh) 2021-12-02

Family

ID=72537343

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125446 WO2021238062A1 (zh) 2020-05-29 2020-10-30 Vehicle tracking method and apparatus, and electronic device

Country Status (6)

Country Link
US (1) US20230186486A1 (zh)
EP (1) EP4116867A4 (zh)
JP (1) JP7429796B2 (zh)
KR (1) KR20220113829A (zh)
CN (1) CN111709328B (zh)
WO (1) WO2021238062A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
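CN114463705A (zh) * 2022-02-07 2022-05-10 厦门市执象智能科技有限公司 Automatic recognition and detection method based on behavior trajectories in video streams
CN114973169A (zh) * 2022-08-02 2022-08-30 山东建筑大学 Vehicle classification and counting method and system based on multi-target detection and tracking
CN115050190A (zh) * 2022-06-13 2022-09-13 天翼数字生活科技有限公司 Road vehicle monitoring method and related apparatus
CN116091552A (zh) * 2023-04-04 2023-05-09 上海鉴智其迹科技有限公司 DeepSORT-based target tracking method, apparatus, device, and storage medium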

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
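CN111709328B (zh) * 2020-05-29 2023-08-04 北京百度网讯科技有限公司 Vehicle tracking method and apparatus, and electronic device
CN112270244A (zh) * 2020-10-23 2021-01-26 平安科技(深圳)有限公司 Target object violation monitoring method and apparatus, electronic device, and storage medium
CN112489450B (zh) * 2020-12-21 2022-07-08 阿波罗智联(北京)科技有限公司 Vehicle flow control method at a traffic intersection, roadside device, and cloud control platform
CN112987764B (zh) * 2021-02-01 2024-02-20 鹏城实验室 Landing method and apparatus, unmanned aerial vehicle, and computer-readable storage medium
CN113160272B (zh) * 2021-03-19 2023-04-07 苏州科达科技股份有限公司 Target tracking method and apparatus, electronic device, and storage medium
CN113901911B (zh) * 2021-09-30 2022-11-04 北京百度网讯科技有限公司 Image recognition and model training methods and apparatuses, electronic device, and storage medium
CN114004864A (zh) * 2021-10-29 2022-02-01 北京百度网讯科技有限公司 Object tracking method, related apparatus, and computer program product
CN114067270B (zh) * 2021-11-18 2022-09-09 华南理工大学 Vehicle tracking method and apparatus, computer device, and storage medium
CN114155278A (zh) * 2021-11-26 2022-03-08 浙江商汤科技开发有限公司 Target tracking method, training method for related models, and related apparatus, device, and medium
CN117237418B (zh) * 2023-11-15 2024-01-23 成都航空职业技术学院 Deep-learning-based moving target detection method and system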

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
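CN109389671A (zh) * 2018-09-25 2019-02-26 南京大学 Single-image 3D reconstruction method based on a multi-stage neural network
CN109816686A (zh) * 2019-01-15 2019-05-28 山东大学 Robot semantic SLAM method based on object instance matching, processor, and robot
CN110956643A (zh) * 2019-12-04 2020-04-03 齐鲁工业大学 Improved MDNet-based vehicle tracking method and system
CN111709328A (zh) * 2020-05-29 2020-09-25 北京百度网讯科技有限公司 Vehicle tracking method and apparatus, and electronic device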

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5344618B2 (ja) 2009-11-30 2013-11-20 住友電気工業株式会社 Moving object tracking device, tracking method, and computer program
JP5036084B2 (ja) * 2010-10-14 2012-09-26 シャープ株式会社 Video processing device, video processing method, and program
KR101382902B1 (ko) * 2012-06-29 2014-04-23 엘지이노텍 주식회사 Lane departure warning system and lane departure warning method
US9070289B2 (en) 2013-05-10 2015-06-30 Palo Alto Research Incorporated System and method for detecting, tracking and estimating the speed of vehicles from a mobile platform
CN104183127B (zh) * 2013-05-21 2017-02-22 北大方正集团有限公司 Traffic surveillance video detection method and apparatus
US9824434B2 (en) * 2015-08-18 2017-11-21 Industrial Technology Research Institute System and method for object recognition
JP6565661B2 (ja) 2015-12-17 2019-08-28 富士通株式会社 Image processing system, image similarity determination method, and image similarity determination program
US10679351B2 (en) * 2017-08-18 2020-06-09 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN107909005A (zh) * 2017-10-26 2018-04-13 西安电子科技大学 Deep-learning-based human pose recognition method for surveillance scenes
CN108053427B (zh) * 2017-10-31 2021-12-14 深圳大学 Improved multi-target tracking method, system, and apparatus based on KCF and Kalman
US11100352B2 (en) * 2018-10-16 2021-08-24 Samsung Electronics Co., Ltd. Convolutional neural network for object detection
CN109993091B (zh) * 2019-03-25 2020-12-15 浙江大学 Surveillance video target detection method based on background elimination
CN110349138B (zh) * 2019-06-28 2021-07-27 歌尔股份有限公司 Method and apparatus for detecting a target object based on an instance segmentation framework
CN110895810B (zh) * 2019-10-24 2022-07-05 中科院广州电子技术有限公司 Chromosome image instance segmentation method and apparatus based on improved Mask RCNN

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PAUL VOIGTLAENDER; MICHAEL KRAUSE; ALJOSA OSEP; JONATHON LUITEN; BERIN BALACHANDAR GNANA SEKAR; ANDREAS GEIGER; BASTIAN LEIBE: "MOTS: Multi-Object Tracking and Segmentation", ARXIV.ORG, 10 February 2019 (2019-02-10), 201 Olin Library Cornell University Ithaca, NY 14853, XP081027401 *
See also references of EP4116867A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
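CN114463705A (zh) * 2022-02-07 2022-05-10 厦门市执象智能科技有限公司 Automatic recognition and detection method based on behavior trajectories in video streams
CN115050190A (zh) * 2022-06-13 2022-09-13 天翼数字生活科技有限公司 Road vehicle monitoring method and related apparatus
CN115050190B (zh) * 2022-06-13 2024-01-23 天翼数字生活科技有限公司 Road vehicle monitoring method and related apparatus
CN114973169A (zh) * 2022-08-02 2022-08-30 山东建筑大学 Vehicle classification and counting method and system based on multi-target detection and tracking
CN116091552A (zh) * 2023-04-04 2023-05-09 上海鉴智其迹科技有限公司 DeepSORT-based target tracking method, apparatus, device, and storage medium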

Also Published As

Publication number Publication date
EP4116867A1 (en) 2023-01-11
KR20220113829A (ko) 2022-08-16
EP4116867A4 (en) 2024-02-07
JP2023511455A (ja) 2023-03-17
CN111709328B (zh) 2023-08-04
CN111709328A (zh) 2020-09-25
JP7429796B2 (ja) 2024-02-08
US20230186486A1 (en) 2023-06-15

Similar Documents

Publication Publication Date Title
WO2021238062A1 (zh) Vehicle tracking method, apparatus, and electronic device
US20200364443A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
Zhou et al. Joint 3d instance segmentation and object detection for autonomous driving
Spencer et al. Defeat-net: General monocular depth via simultaneous unsupervised representation learning
Min et al. A new approach to track multiple vehicles with the combination of robust detection and two classifiers
Ashraf et al. Dogfight: Detecting drones from drones videos
CN111832568B (zh) License plate recognition method, and training method and apparatus for a license plate recognition model
US8620026B2 (en) Video-based detection of multiple object types under varying poses
CN112528786B (zh) Vehicle tracking method, apparatus, and electronic device
WO2016034059A1 (zh) Target object tracking method based on color-structure features
Zhang et al. Deep learning in lane marking detection: A survey
WO2021082168A1 (zh) Method for matching a specific target object in a scene image
Varghese et al. An efficient algorithm for detection of vacant spaces in delimited and non-delimited parking lots
CN113361344B (zh) Video event recognition method, apparatus, device, and storage medium
CN111767831B (zh) Method, apparatus, device, and storage medium for processing images
Rabiee et al. Crowd behavior representation: an attribute-based approach
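CN112561053B (zh) Image processing method, training method for a pre-trained model, apparatus, and electronic device
CN103942778A (zh) Fast video key frame extraction method using principal component feature curve analysis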
US20230095533A1 (en) Enriched and discriminative convolutional neural network features for pedestrian re-identification and trajectory modeling
Yao et al. Coupled multivehicle detection and classification with prior objectness measure
Ou et al. FAMN: feature aggregation multipath network for small traffic sign detection
KR101826669B1 (ko) Video search system and method thereof
Liao et al. Multi-scale saliency features fusion model for person re-identification
Antonio et al. Pedestrians' detection methods in video images: A literature review
Zuo et al. Road model prediction based unstructured road detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 20938232; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2022545432; Country of ref document: JP; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 20227025961; Country of ref document: KR; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 2020938232; Country of ref document: EP; Effective date: 20221005
NENP Non-entry into the national phase. Ref country code: DE