WO2021238062A1 - Vehicle tracking method, apparatus, and electronic device - Google Patents
- Publication number
- WO2021238062A1 (application PCT/CN2020/125446)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vehicle
- target image
- pixel
- image
- detection frame
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/469—Contour-based spatial representations, e.g. vector-coding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30241—Trajectory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30248—Vehicle exterior or interior
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/08—Detecting or categorising vehicles
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- This application relates to the field of computer technology, in particular to computer vision within artificial intelligence and to intelligent transportation technology, and proposes a vehicle tracking method, apparatus, and electronic device.
- Structural analysis of road traffic video, identification of vehicles in images, and tracking of vehicles are important technical capabilities for intelligent traffic visual perception.
- In related technologies, a detection model is usually used to detect objects in an image frame and determine the detection frames it contains. Feature extraction is then performed on each detection frame to determine the features of the vehicle, and the vehicle is tracked based on the degree of matching between the vehicle features in the current image frame and historical detection results.
- Because this tracking method needs two stages to determine the detection frame corresponding to a vehicle, it is time-consuming and has poor real-time performance.
- A method, apparatus, electronic device, and storage medium for vehicle tracking are provided.
- A vehicle tracking method is provided, which includes: extracting a target image at the current moment from a video stream collected while a vehicle is driving; performing instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image; extracting a pixel point set corresponding to each vehicle from that vehicle's detection frame; processing the image characteristics of each pixel in each vehicle's pixel point set to determine the features of each vehicle in the target image; and determining the running trajectory of each vehicle in the target image according to the degree of matching between the features of each vehicle in the target image and the features of each vehicle in the historical images, where the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
- A vehicle tracking device is provided, which includes: a first extraction module for extracting a target image at the current moment from a video stream collected while a vehicle is driving; an instance segmentation module for performing instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image; a second extraction module for extracting a pixel point set corresponding to each vehicle from that vehicle's detection frame; a first determination module for processing the image characteristics of each pixel in each vehicle's pixel point set to determine the features of each vehicle in the target image; and a second determination module for determining the running trajectory of each vehicle in the target image according to the degree of matching between the features of each vehicle in the target image and the features of each vehicle in the historical images, where the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
- An electronic device is provided, which includes: at least one processor; and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the vehicle tracking method described above.
- A non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions cause a computer to execute the vehicle tracking method described above.
- Through instance segmentation of the target image, the detection frame corresponding to each vehicle in the target image is obtained directly, and the pixel point set corresponding to each vehicle is extracted from that vehicle's detection frame. The image characteristics of each pixel in each vehicle's pixel point set are then processed to determine the features of each vehicle in the target image, and the running trajectory of each vehicle in the target image is determined according to the degree of matching between the features of each vehicle in the target image and the features of each vehicle in the historical images.
- In this way, the other objects contained in the target image are filtered out directly, and the detection frames corresponding to the vehicles in the target image are obtained in real time for subsequent processing, which improves the efficiency and real-time performance of vehicle tracking.
- FIG. 1 is a schematic flowchart of a vehicle tracking method provided by an embodiment of the application.
- FIG. 2 is a schematic diagram of marking each vehicle in a target image.
- FIG. 3 is a schematic flowchart of another vehicle tracking method provided by an embodiment of the application.
- FIG. 4 is a schematic flowchart of another vehicle tracking method provided by an embodiment of the application.
- FIG. 5 is a schematic structural diagram of a vehicle tracking device provided by an embodiment of the application.
- FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
- To address the long processing time and poor real-time performance of related-art vehicle tracking methods, which need two stages to determine the detection frame corresponding to a vehicle, the embodiments of the present application propose a vehicle tracking method.
- FIG. 1 is a schematic flowchart of a vehicle tracking method provided by an embodiment of the application.
- The vehicle tracking method includes the following steps:
- Step 101: Extract the target image at the current moment from the video stream collected while the vehicle is driving.
- the vehicle tracking method of the embodiment of the present application may be executed by the vehicle tracking device of the embodiment of the present application.
- the vehicle tracking device of the embodiment of the present application may be configured in any electronic device to execute the vehicle tracking method of the embodiment of the present application.
- The vehicle tracking device of the embodiment of the present application may be configured in a vehicle (such as an autonomous driving vehicle) to track the vehicles on the road on which that vehicle is traveling, so as to visually perceive the vehicle's surroundings and improve its driving safety. Alternatively, the vehicle tracking device can be configured in a server of a traffic management system and used for identifying traffic violations at monitored intersections, collecting traffic flow statistics, and the like.
- How the video stream is acquired depends on the application scenario of the vehicle tracking method of the embodiment of the present application.
- The processor in the vehicle may establish a communication connection with the video capture device in the vehicle to obtain, in real time, the video stream that device collects; the server of a traffic management system can obtain, in real time, the video stream collected by the monitoring equipment at a traffic intersection.
- The target image may be the frame most recently acquired by the video capture device.
- The video stream collected by the video capture device can be obtained in real time, and each time a new frame in the video stream is obtained, that frame can be determined as the target image at the current moment.
- Alternatively, the target image at the current moment can be extracted from the captured video stream every other frame; that is, when the first, third, fifth, seventh, and other odd-numbered frames of the video stream are acquired, each odd-numbered frame is determined as the target image.
- The vehicle tracking method of the embodiment of the present application can also be applied in non-real-time vehicle tracking scenarios, for example, to analyze given video data and determine the driving trajectory of a specific vehicle. In that case, the vehicle tracking device can directly obtain a piece of recorded video data, analyze it, and determine each frame of the video data as the target image in turn; alternatively, frame sampling can be used to determine only some of the image frames in the video data as the target image in turn. For example, the odd-numbered frames of the video data can be determined as the target image in sequence.
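The frame selection described above can be sketched as follows. This is a minimal illustration only: the function name `select_target_frames` and the `stride` parameter are assumptions introduced here, and the frames are stand-in values rather than decoded images.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def select_target_frames(frames: Iterable[Frame],
                         stride: int = 2) -> Iterator[Tuple[int, Frame]]:
    """Yield (1-based frame number, frame) for every `stride`-th frame,
    starting with the first frame of the stream."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    for number, frame in enumerate(frames, start=1):
        if (number - 1) % stride == 0:
            yield number, frame


# Ten placeholder frames; stride=2 keeps the odd-numbered ones.
picked = [number for number, _ in select_target_frames(range(10), stride=2)]
every = [number for number, _ in select_target_frames(range(4), stride=1)]
```

With `stride=1` every frame becomes a target image, which corresponds to the per-frame case; with `stride=2` only the odd-numbered frames are used.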
- Step 102: Perform instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image.
- Any instance segmentation algorithm may be used to perform instance segmentation on the target image, determine each vehicle included in the target image, and generate a detection frame corresponding to each vehicle.
- Each vehicle in the target image lies entirely within its corresponding detection frame, or most of the vehicle's area does.
- An appropriate instance segmentation algorithm can be selected according to actual needs or the computing performance of the electronic device, which is not limited in the embodiment of the application.
- For example, an instance segmentation algorithm based on spatial embedding, a K-means clustering algorithm, or the like can be used.
- Step 103: Extract a pixel point set corresponding to each vehicle from the detection frame corresponding to that vehicle.
- The pixel point set corresponding to a vehicle is a set of pixels extracted from the region of the target image inside the vehicle's detection frame.
- After instance segmentation of the target image has determined the detection frame corresponding to each vehicle, most of the pixels in each detection frame belong to the corresponding vehicle, so the pixels in a vehicle's detection frame can accurately describe the features of that vehicle. Therefore, in the embodiment of the present application, the pixel point set corresponding to each vehicle can be extracted from its detection frame to describe the features of that vehicle.
- When extracting the pixel point set corresponding to a vehicle, the vehicle's detection frame can be divided into multiple sub-regions (for example, N × N regions, where N is a positive integer greater than 1), and a certain number of pixels, such as a preset number (for example, 100) or a preset ratio (for example, 80%), can be randomly extracted from each sub-region of the detection frame to form the pixel point set corresponding to the vehicle.
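The sub-region sampling described above might look like the following sketch. The function name `sample_grid`, the `(x0, y0, width, height)` box layout, and the per-cell count are assumptions made for illustration; they are not specified in the application.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int]  # (x, y) in image coordinates


def sample_grid(box: Tuple[int, int, int, int], n: int, per_cell: int,
                rng: random.Random) -> List[Pixel]:
    """Split box = (x0, y0, width, height) into n x n cells and draw up to
    per_cell distinct pixels at random from each cell."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    x0, y0, w, h = box
    # Integer cell boundaries, so the cells tile the box with no gaps or overlap.
    xs = [x0 + (w * i) // n for i in range(n + 1)]
    ys = [y0 + (h * j) // n for j in range(n + 1)]
    points: List[Pixel] = []
    for j in range(n):
        for i in range(n):
            cell = [(x, y)
                    for y in range(ys[j], ys[j + 1])
                    for x in range(xs[i], xs[i + 1])]
            points.extend(rng.sample(cell, min(per_cell, len(cell))))
    return points


# A 50 x 50 detection frame whose top-left corner is (10, 20),
# split into 5 x 5 cells with 4 pixels drawn from each cell.
pixel_set = sample_grid((10, 20, 50, 50), n=5, per_cell=4, rng=random.Random(0))
```

Sampling per cell rather than over the whole frame keeps the selected pixels spread across the detection frame instead of clustering in one corner.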
- The detection frame corresponding to the vehicle can also be divided into a central area and an edge area, and a certain number of pixels can be randomly extracted from the central area of the detection frame to form the pixel point set corresponding to the vehicle.
- For example, if the size of the detection frame corresponding to vehicle A is 500 × 500 pixels, a region spanning 80% of the detection frame's width and height, that is, a 400 × 400 pixel region whose center point coincides with the center point of the detection frame, can be determined as the central area, and the rest of the detection frame as the edge area. 80% of the pixels in the 400 × 400 central area are then randomly extracted to form the pixel point set corresponding to vehicle A.
- When the detection frame corresponding to the vehicle is divided into a central area and an edge area, a certain number of pixels can also be randomly extracted from both the central area and the edge area to form the pixel point set corresponding to the vehicle, so that the set includes not only pixels belonging to the vehicle but also pixels of the background near the vehicle, which better describes the vehicle's features and improves tracking accuracy.
- For example, a circular area centered on the center point of the detection frame with a radius of 400 pixels can be determined as the central area of the detection frame and the rest of the detection frame as the edge area; 80% of the pixels are then randomly extracted from the central area and 80% from the edge area to form the pixel point set corresponding to vehicle A.
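A rough sketch of central/edge sampling follows. It assumes the central area is a centered rectangle covering 80% of each side of the detection frame (400 × 400 for a 500 × 500 frame); the helper names `split_center_edge` and `sample_ratio` are hypothetical, and a circular central area would only change the `inside` test.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int]  # (x, y) relative to the top-left corner of the frame


def split_center_edge(width: int, height: int,
                      side_ratio: float = 0.8) -> Tuple[List[Pixel], List[Pixel]]:
    """Partition the frame's pixels into a centered rectangle and the rest."""
    cw, ch = round(width * side_ratio), round(height * side_ratio)
    left, top = (width - cw) // 2, (height - ch) // 2
    center: List[Pixel] = []
    edge: List[Pixel] = []
    for y in range(height):
        for x in range(width):
            inside = left <= x < left + cw and top <= y < top + ch
            (center if inside else edge).append((x, y))
    return center, edge


def sample_ratio(pixels: List[Pixel], ratio: float,
                 rng: random.Random) -> List[Pixel]:
    """Draw a fixed fraction of the pixels at random, without replacement."""
    return rng.sample(pixels, round(len(pixels) * ratio))


rng = random.Random(0)
center, edge = split_center_edge(500, 500)
# 80% of the central pixels plus 80% of the edge pixels form the pixel set.
pixel_set = sample_ratio(center, 0.8, rng) + sample_ratio(edge, 0.8, rng)
```

Dropping the second `sample_ratio` call gives the variant in which only the central area contributes pixels.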
- Step 104: Process the image characteristics of each pixel in the pixel point set corresponding to each vehicle to determine the features of each vehicle in the target image.
- The image characteristics of a pixel may include its pixel value, the pixel values of its neighbors, its positional relationship to other pixels in the pixel point set, differences in pixel value, and the like.
- The image characteristics to be used can be selected according to actual needs, which is not limited in the embodiment of the present application.
- The feature of a vehicle is a feature usable for target recognition, determined by calculating on or learning from the image characteristics of each pixel in the vehicle's pixel point set.
- For example, the feature of the vehicle may be a ReID (re-identification) feature, a HOG (Histogram of Oriented Gradients) feature, a Haar-like feature, or the like.
- A preset algorithm can be used to calculate on or learn from the image characteristics of each pixel in the pixel point set corresponding to each vehicle, so that the vehicle is described by the image characteristics of the pixels in its set and the features of each vehicle in the target image are generated.
- The feature type and the corresponding algorithm for determining it can be selected according to actual needs and the specific application scenario, which is not limited in the embodiment of the present application. For example, to improve real-time performance and computational efficiency, an efficient deep learning algorithm or image feature extraction algorithm can be chosen to determine the features of each vehicle in the target image.
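As one hedged illustration of turning a pixel point set into a fixed-length vehicle feature, the sketch below mean-pools and max-pools simple per-pixel characteristics (normalized color and offset from the set's centroid). This pooling scheme is a stand-in chosen for brevity, not the preset algorithm of the application, and the name `pooled_feature` and the sample layout are assumptions.

```python
from typing import List, Sequence, Tuple

# One sampled pixel: (x, y, (r, g, b)) with 8-bit color channels.
PixelSample = Tuple[int, int, Tuple[int, int, int]]


def pooled_feature(samples: Sequence[PixelSample]) -> List[float]:
    """Return a 10-value feature: mean-pooled then max-pooled per-pixel rows
    of [r, g, b, |dx|, |dy|], where dx, dy are offsets from the centroid."""
    if not samples:
        raise ValueError("the pixel point set must not be empty")
    n = len(samples)
    cx = sum(x for x, _, _ in samples) / n
    cy = sum(y for _, y, _ in samples) / n
    rows = [[r / 255, g / 255, b / 255, abs(x - cx), abs(y - cy)]
            for x, y, (r, g, b) in samples]
    mean = [sum(column) / n for column in zip(*rows)]
    peak = [max(column) for column in zip(*rows)]
    return mean + peak


# Two pure-red pixels, two columns apart on the same row.
feature = pooled_feature([(0, 0, (255, 0, 0)), (2, 0, (255, 0, 0))])
```

Because pooling ignores pixel order, the resulting vector has the same length regardless of how many pixels were sampled, which is what the later matching step needs.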
- Step 105: Determine the running trajectory of each vehicle in the target image according to the degree of matching between the features of each vehicle in the target image and the features of each vehicle in the historical images, where the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
- The degree of matching between the features of each vehicle in the target image and the features of each vehicle in the historical image can be determined by means of metric learning. Specifically, for a vehicle in the target image, the distance between its feature and the feature of each vehicle in the historical image can be determined by metric learning. Since a smaller distance between features means the features are more similar, the reciprocal of that distance can be taken as the degree of matching between the vehicle and each vehicle in the historical image.
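The reciprocal-distance matching degree can be written down directly. In this sketch plain Euclidean distance stands in for a learned metric, and the small `eps` term, added only to avoid division by zero for identical features, is an assumption not taken from the application.

```python
import math
from typing import Sequence


def matching_degree(feature_a: Sequence[float], feature_b: Sequence[float],
                    eps: float = 1e-12) -> float:
    """Reciprocal of the distance between two feature vectors:
    the closer the features, the larger the matching degree."""
    distance = math.dist(feature_a, feature_b)  # Euclidean stand-in
    return 1.0 / (distance + eps)


near = matching_degree([0.0, 0.0], [3.0, 4.0])    # distance 5
far = matching_degree([0.0, 0.0], [30.0, 40.0])   # distance 50
```

Here `near` is larger than `far`, matching the rule that a smaller feature distance means a higher degree of matching.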
- The value of n can be 1; that is, each vehicle in the target image can be compared only with the previous frame adjacent to the target image in the video stream to determine the running trajectory of each vehicle in the target image.
- For a vehicle A in the target image, a vehicle in the historical image whose feature matches the feature of vehicle A with a degree greater than a threshold can be determined as vehicle A. The running trajectory of vehicle A in the target image is then determined from the running trajectory of vehicle A in the historical image and the collection position of the target image, the identifier of vehicle A in the historical image is determined as the identifier of vehicle A in the target image, and that identifier is displayed in the target image to mark vehicle A. For example, if the identifier of vehicle A in the historical image is "Car1", then "Car1" can be displayed above vehicle A. FIG. 2 is a schematic diagram of marking each vehicle in the target image.
- Otherwise, vehicle A can be treated as a new vehicle appearing for the first time in the video stream: the collection position of the target image is determined as the starting point of the running trajectory of vehicle A, a new vehicle identifier is assigned to vehicle A, and the identifier is displayed in the target image to mark vehicle A.
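For the n = 1 case, the association with the previous frame can be sketched as below. The function name `associate`, the "Car" identifier format, the Euclidean distance, and the choice to take the best match above the threshold are assumptions for illustration; this sketch does not prevent two current vehicles from claiming the same historical identifier.

```python
import math
from itertools import count
from typing import Dict, Iterator, List, Sequence


def associate(current: List[Sequence[float]],
              previous: Dict[str, Sequence[float]],
              threshold: float,
              id_counter: Iterator[int]) -> List[str]:
    """Give each current vehicle the identifier of its best previous-frame
    match above the threshold, or a new identifier if there is none."""
    labels: List[str] = []
    for feature in current:
        best_id, best_score = None, threshold
        for vehicle_id, old_feature in previous.items():
            # Matching degree = reciprocal of the feature distance.
            score = 1.0 / (math.dist(feature, old_feature) + 1e-12)
            if score > best_score:
                best_id, best_score = vehicle_id, score
        if best_id is None:
            # No sufficiently similar vehicle: first appearance in the stream.
            best_id = f"Car{next(id_counter)}"
        labels.append(best_id)
    return labels


previous = {"Car1": [0.0, 0.0], "Car2": [10.0, 10.0]}
current = [[0.1, 0.0], [50.0, 50.0]]
labels = associate(current, previous, threshold=1.0, id_counter=count(3))
```

The first current vehicle sits close to "Car1" in feature space and inherits that identifier; the second is far from both known vehicles and receives a newly issued one.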
- Alternatively, n can be an integer greater than 1; that is, each vehicle in the target image can be compared with multiple frames of the video stream that precede and are adjacent to the target image to determine the running trajectory of each vehicle in the target image, which improves the accuracy of vehicle tracking.
- For a vehicle A in the target image, the candidate vehicles in the historical images whose degree of matching with the feature of vehicle A is greater than a threshold can be determined first.
- If only one frame of the historical images contains a candidate vehicle, that candidate vehicle can be determined as vehicle A; the running trajectory of vehicle A in the target image is then determined from the running trajectory of vehicle A in the historical image and the collection position of the target image, and the identifier of vehicle A in the historical image is determined as the identifier of vehicle A in the target image. If multiple frames contain candidate vehicles, it can be determined whether the candidate vehicles in those historical frames are the same vehicle. If so, the candidate vehicle in the historical image whose acquisition time is closest to that of the target image can be determined as vehicle A, and the running trajectory of vehicle A in the target image is determined from the running trajectory of vehicle A in that closest historical image and the collection position of the target image.
- Otherwise, vehicle A can be treated as a new vehicle appearing for the first time in the video stream: the collection position of the target image is determined as the starting point of the running trajectory of vehicle A, a new vehicle identifier is assigned to vehicle A, and the identifier is displayed in the target image to mark vehicle A.
- If the degree of matching between the feature of a vehicle in the target image and the features of multiple vehicles in the historical image is greater than the threshold, the vehicle in the historical image with the greatest degree of matching can be determined as that vehicle.
- The degree of matching between the features of each vehicle in the target image and the features of each vehicle in the historical image can be determined first, and the vehicles in the historical image whose degree of matching with each vehicle in the target image exceeds the threshold can then be identified, giving the matching relationship between the vehicles in the target image and those in the historical image. The Hungarian algorithm is then used to analyze this matching relationship and determine, for each vehicle in the target image, the uniquely matching vehicle in the historical image.
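The unique matching step can be illustrated on a small example. The sketch below does not implement the Hungarian algorithm itself: it exhaustively searches all one-to-one assignments for the one with the largest total matching degree, which is the same objective the Hungarian algorithm optimizes in polynomial time, and it is only practical for a handful of vehicles. The function name `best_assignment` and the rule that pairs at or below the threshold are disallowed are assumptions.

```python
from itertools import permutations
from typing import List, Optional


def best_assignment(match: List[List[float]],
                    threshold: float) -> List[Optional[int]]:
    """match[i][j] is the matching degree between current vehicle i and
    historical vehicle j. Returns, for each current vehicle, the index of
    its unique historical match, or None if it stays unmatched."""
    n_cur = len(match)
    n_hist = len(match[0]) if match else 0
    # Each current vehicle may take one historical vehicle or a None slot.
    slots: List[Optional[int]] = list(range(n_hist)) + [None] * n_cur
    best: List[Optional[int]] = [None] * n_cur
    best_total = -1.0
    for candidate in permutations(slots, n_cur):
        total = 0.0
        for i, j in enumerate(candidate):
            if j is None:
                continue
            if match[i][j] <= threshold:
                break  # pair below the threshold is not allowed
            total += match[i][j]
        else:
            if total > best_total:
                best, best_total = list(candidate), total
    return best


# Both current vehicles match historical vehicle 0 best, so picking the
# best match per vehicle independently would produce a conflict.
match = [[0.9, 0.8],
         [0.85, 0.1]]
assignment = best_assignment(match, threshold=0.5)
```

In this example current vehicle 0 is assigned historical vehicle 1 and current vehicle 1 is assigned historical vehicle 0, because that is the only conflict-free pairing in which both pairs exceed the threshold.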
- The value of n can be determined according to actual needs and the specific application scenario, which is not limited in the embodiment of the present application.
- When the vehicle tracking method of the embodiment of the present application is applied to a traffic management scene, the monitoring equipment at the traffic intersection is fixed, so comparing only with the previous frame adjacent to the target image is enough to determine the running trajectory of each vehicle in the target image, and n can be 1.
- When the vehicle tracking method of the embodiment of the present application is applied to scenes such as automatic driving and assisted driving, the position of the video capture device changes constantly while the vehicle is driving, and overtaking and being overtaken occur. Comparing only with the previous frame adjacent to the target image can easily lead to inaccurate tracking results, so n can be set to an integer greater than 1 to improve the accuracy of vehicle tracking.
- According to the technical solution of the embodiment of the present application, the detection frame corresponding to each vehicle in the target image is obtained directly through instance segmentation, and the pixel point set corresponding to each vehicle is extracted from that vehicle's detection frame. The image characteristics of each pixel in each vehicle's pixel point set are then processed to determine the features of each vehicle in the target image, and the running trajectory of each vehicle in the target image is determined according to the degree of matching between the features of each vehicle in the target image and the features of each vehicle in the historical images.
- In this way, the other objects contained in the target image are filtered out directly, and the detection frames corresponding to the vehicles in the target image are obtained in real time for subsequent processing, which improves the efficiency and real-time performance of vehicle tracking.
- The point cloud model can be used to process the pixels in the foreground area of the detection frame (that is, the pixels corresponding to the vehicle in the detection frame) and the pixels in the background area separately, so as to determine the characteristics of each vehicle in the target image. Vehicle characteristics are thus extracted accurately and efficiently, which further improves the real-time performance and accuracy of vehicle tracking.
- FIG. 3 is a schematic flowchart of another vehicle tracking method provided by an embodiment of the application.
- the vehicle tracking method includes the following steps:
- Step 201 Extract the target image at the current moment from the video stream collected during the driving of the vehicle.
- Step 202 Perform instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image.
- Step 203 Extract a first sub-set of pixel points from the mask area in the detection frame corresponding to each vehicle.
- The mask area in the detection frame refers to the area that the vehicle occupies within its detection frame.
- the first pixel point sub-set corresponding to the vehicle refers to the set of pixels corresponding to the vehicle extracted from the mask area in the detection frame corresponding to the vehicle.
- the result of the instance segmentation of the target image may be to output the detection frame corresponding to each vehicle in the target image and the mask area in the detection frame at the same time.
- The instance segmentation algorithm can be used to identify each vehicle in the target image and to generate the detection frame corresponding to each vehicle, as well as the mask area corresponding to the vehicle in each detection frame. The area of each detection frame outside the mask area is the unmasked area corresponding to the background; that is, the detection frame corresponding to each vehicle can include a masked area and an unmasked area.
- The algorithm for instance segmentation of the target image can be any instance segmentation algorithm that can directly identify a specific type of target and output the detection frame and mask area corresponding to that type of target at the same time, which is not limited in the embodiment of the present application.
- For example, it may be an instance segmentation algorithm based on clustering, such as an instance segmentation algorithm based on spatial embedding, a K-means clustering algorithm, etc.
- Since the mask area in the detection frame corresponding to a vehicle represents the area that the vehicle occupies in the detection frame, the pixels of the mask area can accurately describe the characteristics of the vehicle itself. Therefore, a certain number of pixel points can be randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle, so as to accurately describe the characteristics of each vehicle itself (such as color features, shape features, brand features, etc.).
- The number of pixels included in the first pixel point subset can be preset, so that a preset number of pixels is randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle. For example, if the preset number is 500, 500 pixels can be randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle.
- Alternatively, the ratio of the number of pixels in the first pixel point subset to the number of pixels in the mask area can be preset, so that pixels at the preset ratio are randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle. For example, if the preset ratio is 80% and the mask area in the detection frame corresponding to vehicle A contains 1000 pixels, 800 pixels can be randomly extracted from that mask area to form the first pixel point subset corresponding to vehicle A.
- the manner of extracting the first pixel point subset from the mask area may include, but is not limited to, the situations listed above.
- an appropriate extraction method can be selected according to actual needs and specific application scenarios, which is not limited in the embodiment of the application.
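The two extraction modes described above (a preset number or a preset ratio) can be sketched as follows. This is only an illustration under stated assumptions: the function name `sample_pixels`, its parameters, and the representation of a region as a list of (row, col) coordinates are hypothetical, and the behaviour when the region holds fewer pixels than requested is not specified by the application.

```python
import random


def sample_pixels(region, count=None, ratio=None, rng=None):
    """Randomly pick pixels from `region`, a list of (row, col) coordinates.

    Exactly one of `count` (absolute number) or `ratio` (fraction of the
    region) must be given. Both parameter names are illustrative.
    """
    if (count is None) == (ratio is None):
        raise ValueError("give exactly one of count or ratio")
    if ratio is not None:
        count = round(len(region) * ratio)
    # Assumption: if the region has fewer pixels than requested, take them all.
    count = min(count, len(region))
    rng = rng or random.Random()
    return rng.sample(region, count)  # sampling without replacement


mask_area = [(r, c) for r in range(25) for c in range(40)]  # 1000 pixels
by_count = sample_pixels(mask_area, count=500, rng=random.Random(0))
by_ratio = sample_pixels(mask_area, ratio=0.8, rng=random.Random(0))
```

Here `by_count` holds 500 pixels and `by_ratio` holds 800 pixels, matching the two numerical examples in the text.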
- Step 204 Extract a second sub-set of pixel points from the unmasked area in the detection frame corresponding to each vehicle.
- The unmasked area in the detection frame refers to the background area of the detection frame, that is, the part of the detection frame not occupied by the vehicle.
- the second pixel point sub-set corresponding to the vehicle refers to the pixel point set that is extracted from the unmasked area in the detection frame corresponding to the vehicle and used to characterize the background of the vehicle.
- Since the result of instance segmentation of the target image can output the detection frame corresponding to each vehicle in the target image and the mask area in the detection frame at the same time, the area of each detection frame outside the mask area can be directly determined as the unmasked area of that detection frame.
- The description of vehicle characteristics can be assisted by the background pixels in each detection frame, so that the background characteristics of a vehicle increase the difference between the characteristics of different vehicles and improve the accuracy of vehicle tracking. Therefore, a certain number of pixel points can be randomly extracted from the unmasked area in the detection frame corresponding to each vehicle to form the second pixel point subset corresponding to each vehicle, so as to accurately describe the background characteristics of each vehicle.
- The number of pixels included in the first pixel point subset may be the same as the number of pixels included in the second pixel point subset, so that the characteristics of the vehicle combine the characteristics of the vehicle itself and the background characteristics of the vehicle in equal measure, which makes the feature description of the vehicle more accurate and improves the accuracy of vehicle tracking. Therefore, the number of pixels included in the first pixel point subset and the second pixel point subset can be preset; a preset number of pixels is randomly extracted from the mask area in the detection frame corresponding to each vehicle to form the first pixel point subset corresponding to each vehicle, and a preset number of pixels is randomly extracted from the unmasked area in the detection frame corresponding to each vehicle to form the second pixel point subset corresponding to each vehicle.
- For example, if the preset number is 500, then for vehicle A in the target image, 500 pixels can be randomly extracted from the mask area in the detection frame corresponding to vehicle A to form the first pixel point subset corresponding to vehicle A, and 500 pixels can be randomly extracted from the unmasked area in the same detection frame to form the second pixel point subset corresponding to vehicle A.
- Weights may be assigned to the first pixel point subset and the second pixel point subset respectively, so that the extracted pixel point set contains more pixels that contribute strongly to characterizing vehicle features and fewer pixels that contribute weakly. It should be noted that the weights of the first pixel point subset and the second pixel point subset may be calibrated based on a large amount of experimental data, which is not limited in the embodiment of the present application.
- For example, suppose the preset number is 500, the weight of the first pixel point subset calibrated from experimental data is 1, and the weight of the second pixel point subset is 0.8. Then, for vehicle A in the target image, 500 pixels are randomly extracted from the mask area in the corresponding detection frame to form the first pixel point subset corresponding to vehicle A, and 400 pixels are randomly extracted from the unmasked area in the same detection frame to form the second pixel point subset corresponding to vehicle A.
- The number of pixels included in the second pixel point subset may also be independent of the number of pixels included in the first pixel point subset; that is, the number of pixels included in the second pixel point subset, or the ratio of the number of pixels in the second pixel point subset to the number of pixels in the unmasked area, may be preset separately.
- The second pixel point subset is extracted from the unmasked area in the same way as the first pixel point subset is extracted in step 203. For the specific implementation process and principle, please refer to the detailed description of step 203, which will not be repeated here.
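Steps 203 and 204 with weighted subset sizes can be sketched as below. This is a hedged illustration, not the application's implementation: the helper names `split_detection_frame` and `sample_subsets`, the box and mask representations, and the rule "subset size = preset number x weight" (inferred from the 500/400 example above) are assumptions.

```python
import random


def split_detection_frame(box, mask):
    """Split the pixels inside `box` into masked (vehicle) and unmasked (background).

    box  -- (row_min, col_min, row_max, col_max), inclusive bounds
    mask -- set of (row, col) pixels labelled as belonging to the vehicle
    """
    r0, c0, r1, c1 = box
    masked, unmasked = [], []
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            (masked if (r, c) in mask else unmasked).append((r, c))
    return masked, unmasked


def sample_subsets(box, mask, preset=500, w_first=1.0, w_second=0.8, seed=0):
    """Draw the first and second pixel subsets with weight-scaled sizes."""
    masked, unmasked = split_detection_frame(box, mask)
    rng = random.Random(seed)
    # Assumption: never request more pixels than the area contains.
    n_first = min(round(preset * w_first), len(masked))
    n_second = min(round(preset * w_second), len(unmasked))
    return rng.sample(masked, n_first), rng.sample(unmasked, n_second)


# Toy detection frame of 40 x 50 pixels whose central 30 x 30 block is the vehicle.
box = (0, 0, 39, 49)
mask = {(r, c) for r in range(5, 35) for c in range(10, 40)}
first_subset, second_subset = sample_subsets(box, mask)
```

With the default weights of 1 and 0.8, the sketch draws 500 vehicle pixels and 400 background pixels, as in the vehicle A example.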
- Step 205 Use the first encoder in the preset point cloud model to encode the image features of each pixel in the first pixel sub-set corresponding to each vehicle to determine the first vector corresponding to each vehicle.
- the preset point cloud model refers to a pre-trained model that can process the input point set and generate a feature representation corresponding to the point set.
- the first vector corresponding to the vehicle may refer to the feature representation of the pixels of the vehicle itself, and may be used to characterize the characteristics of the vehicle itself.
- the image characteristics of the pixel may include the RGB pixel value of the pixel and so on.
- Since the point cloud model can directly generate a feature representation from unordered input point set data, using the point cloud model to generate vehicle characteristics allows those characteristics to be extracted efficiently.
- The feature type of the vehicle can be determined in advance. For example, the feature type of the vehicle can be the ReID feature. A large number of sample images containing vehicles can be obtained, and instance segmentation is performed on each sample image to generate the detection frame and mask area corresponding to each vehicle in each sample image. A ReID feature extraction algorithm is then used to determine the sample first ReID feature of the mask area corresponding to each vehicle in each sample image, and a sample first pixel point subset is extracted from the mask area in the detection frame. Finally, the initial point cloud model learns the correspondence between the sample first ReID feature of each vehicle and the sample first pixel point subset to generate the first encoder in the preset point cloud model.
- The first encoder in the preset point cloud model has thus learned the correlation between the first ReID feature of a vehicle and its first pixel point subset. Therefore, the image features of each pixel in the first pixel point subset corresponding to a vehicle can be input into the first encoder in the preset point cloud model, so that the first encoder encodes the RGB pixel value of each pixel in the first pixel point subset to generate the first vector corresponding to the vehicle, that is, the ReID feature of the vehicle itself.
- Step 206 Use the second encoder in the preset point cloud model to encode the image features of each pixel in the second pixel subset corresponding to each vehicle to determine the second vector corresponding to each vehicle.
- the second vector corresponding to the vehicle may refer to the feature representation of the background pixel of the vehicle, and may be used to characterize the background feature of the vehicle.
- The point cloud model can be trained to generate a second encoder, different from the first encoder, that encodes the second pixel point subset, so that the generated second vector represents the background characteristics of the vehicle more accurately.
- After instance segmentation is performed on each sample image and the detection frame and mask area corresponding to each vehicle in each sample image are generated, the ReID feature extraction algorithm can be used to determine the sample second ReID feature of the unmasked area in the detection frame corresponding to each vehicle in each sample image, and a sample second pixel point subset is extracted from the unmasked area in the detection frame. The initial point cloud model then learns the correspondence between the sample second ReID feature of each vehicle and the sample second pixel point subset to generate the second encoder in the preset point cloud model.
- The second encoder in the preset point cloud model has thus learned the correlation between the second ReID feature of a vehicle and its second pixel point subset. Therefore, the image features of each pixel in the second pixel point subset corresponding to a vehicle can be input into the second encoder in the preset point cloud model, so that the second encoder encodes the RGB pixel values of each pixel in the second pixel point subset to generate the second vector corresponding to the vehicle, that is, the ReID feature of the background area of the vehicle.
- Step 207 Use the decoder in the preset point cloud model to decode the first vector and the second vector corresponding to each vehicle to determine the characteristics of each vehicle.
- Since the vector representation of the vehicle's own characteristics and the vector representation of the background characteristics of the vehicle are determined separately, the decoder in the preset point cloud model can be used to fuse the first vector and the second vector corresponding to each vehicle to generate the characteristics of each vehicle.
- For example, the decoder in the preset point cloud model can perform maximum pooling on the first vector and the second vector corresponding to each vehicle to fuse the two vectors and generate the characteristics of each vehicle.
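The encode-then-fuse flow of steps 205 to 207 can be sketched in plain Python. Everything architectural here is an assumption: the application does not specify the encoder structure, so the sketch uses a PointNet-style shared per-point layer followed by max pooling, with fixed random weights standing in for trained ones. Only the fusion of the two vectors by maximum pooling is taken from the text. The names `make_encoder` and `decode` are hypothetical.

```python
import random


def make_encoder(in_dim, out_dim, seed):
    """Return a toy encoder with fixed random weights (stand-in for a trained one)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(points):
        # Apply the same linear map + ReLU to every point, then max-pool over
        # points, so the result does not depend on the order of the points.
        per_point = [
            [max(0.0, sum(w * x for w, x in zip(row, p))) for row in weights]
            for p in points
        ]
        return [max(col) for col in zip(*per_point)]

    return encode


def decode(first_vector, second_vector):
    """Fuse the two vectors by element-wise max pooling."""
    return [max(a, b) for a, b in zip(first_vector, second_vector)]


first_encoder = make_encoder(in_dim=3, out_dim=8, seed=1)
second_encoder = make_encoder(in_dim=3, out_dim=8, seed=2)

# RGB values scaled to [0, 1]; the pixel values are made up for illustration.
vehicle_pixels = [(0.9, 0.1, 0.1), (0.8, 0.2, 0.1), (0.7, 0.1, 0.2)]
background_pixels = [(0.3, 0.3, 0.3), (0.4, 0.4, 0.5)]

first_vector = first_encoder(vehicle_pixels)
second_vector = second_encoder(background_pixels)
vehicle_feature = decode(first_vector, second_vector)
```

Because pooling over points is symmetric, shuffling the sampled pixels leaves the encoded vector unchanged, which is why an unordered pixel set can be fed to the model directly.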
- Step 208 Determine the running trajectory of each vehicle in the target image according to the matching degree between the characteristics of each vehicle in the target image and the characteristics of each vehicle in the historical image, where the historical image is the n frames of images that precede and are adjacent to the target image in the video stream, and n is a positive integer.
- According to the technical solution of the embodiment of the present application, the detection frame and mask area corresponding to each vehicle in the target image are directly obtained by instance segmentation. The first pixel point subset is extracted from the mask area in the detection frame corresponding to each vehicle to characterize the foreground features of the vehicle, and the second pixel point subset is extracted from the unmasked area to characterize the background features of the vehicle. The preset point cloud model then generates the characteristics of the vehicle from the extracted pixel point sets, and the running trajectory of each vehicle in the target image is determined according to the matching degree between the characteristics of each vehicle in the target image and the characteristics of each vehicle in the historical image.
- As a result, by using the point cloud model to fuse the foreground and background features of the vehicle, vehicle features are extracted accurately and efficiently, which further improves the real-time performance and accuracy of vehicle tracking.
- a clustering algorithm can be used to implement instance segmentation of the target image, so as to directly generate a detection frame corresponding to the vehicle, and improve the real-time performance of vehicle tracking.
- FIG. 4 is a schematic flowchart of another vehicle tracking method provided by an embodiment of the application.
- the vehicle tracking method includes the following steps:
- Step 301 Extract the target image at the current moment from the video stream collected during the driving of the vehicle.
- Step 302 Perform clustering processing on the pixels in the target image based on the characteristics of each pixel in the target image, so as to determine a detection frame corresponding to each vehicle in the target image according to the clustering result.
- The characteristics of a pixel may include, for example, the pixel value of the pixel and the pixel values of its neighborhood pixels.
- the characteristics of the pixels to be used can be selected according to actual needs, which is not limited in the embodiment of the present application.
- A clustering algorithm can be used to cluster the characteristics of each pixel in the target image, so as to classify each pixel and determine whether it corresponds to a vehicle and whether two pixels correspond to the same vehicle. A detection frame corresponding to each vehicle is then generated from the pixel points corresponding to that vehicle; that is, each detection frame may include all the pixels corresponding to the same vehicle.
- For example, an instance segmentation algorithm based on spatial embedding can be used to analyze the characteristics of each pixel in the target image and cluster the pixels, and the detection frame corresponding to each vehicle is then generated directly from the clustering result. Instance segmentation is thus completed in one step, with good real-time performance.
- the instance segmentation algorithm based on spatial embedding can learn different clustering radii for different types of instances, and the accuracy of instance segmentation is high.
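The last part of step 302, generating a detection frame from the pixels clustered to each vehicle, can be sketched as follows. The clustering itself is not shown; the sketch assumes it has already produced a 2D grid of cluster ids in which 0 marks non-vehicle pixels. That labelling convention and the function name `detection_frames` are assumptions.

```python
def detection_frames(labels):
    """Map each vehicle id to its bounding box (row_min, col_min, row_max, col_max).

    labels -- 2D list of cluster ids, one per pixel; 0 marks non-vehicle pixels
              (this labelling convention is an assumption of the sketch).
    """
    frames = {}
    for r, row in enumerate(labels):
        for c, vehicle_id in enumerate(row):
            if vehicle_id == 0:
                continue  # background pixels are filtered out
            if vehicle_id not in frames:
                frames[vehicle_id] = (r, c, r, c)
            else:
                r0, c0, r1, c1 = frames[vehicle_id]
                # Grow the box so that it encloses every pixel of this vehicle.
                frames[vehicle_id] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return frames


labels = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
frames = detection_frames(labels)
```

Each resulting box is the smallest axis-aligned rectangle containing all pixels of one vehicle, so it may also contain background pixels, which form the unmasked area.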
- Step 303 Extract a set of pixels corresponding to each vehicle from the detection frame corresponding to each vehicle.
- Step 304 Process the image characteristics of each pixel in the pixel point set corresponding to each vehicle to determine the characteristics of each vehicle in the target image.
- Step 305 If the matching degree between the feature of the first vehicle in the target image and the feature of the second vehicle in the historical image is greater than the threshold, update the running trajectory of the second vehicle according to the acquisition location and acquisition time of the target image.
- the first vehicle refers to any vehicle in the target image;
- the second vehicle refers to a vehicle that exists both in the historical image and in the target image.
- the degree of matching between the feature of each vehicle in the target image and the feature of each vehicle in the historical image can be determined by means of metric learning.
- For a vehicle in the target image, the distance between its feature and the feature of each vehicle in the historical image can be determined by means of metric learning. Since a smaller distance between features means the features are more similar, the reciprocal of the distance between the feature of the vehicle and the feature of each vehicle in the historical image can be taken as the matching degree between them.
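The reciprocal-distance matching degree can be sketched as below. The application determines the distance by metric learning; the sketch substitutes a plain Euclidean distance as a stand-in, and the small `eps` term that avoids division by zero is an added safeguard. The names `matching_degree` and `best_match` are hypothetical.

```python
import math


def matching_degree(feature_a, feature_b, eps=1e-12):
    """Reciprocal of the Euclidean distance between two feature vectors."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(feature_a, feature_b)))
    # eps avoids division by zero for identical features (an added safeguard).
    return 1.0 / (distance + eps)


def best_match(feature, history):
    """Return (vehicle_id, degree) of the most similar vehicle in `history`."""
    return max(
        ((vid, matching_degree(feature, f)) for vid, f in history.items()),
        key=lambda item: item[1],
    )


history = {"A": [0.0, 0.0], "B": [3.0, 4.0]}
vehicle_id, degree = best_match([3.0, 4.5], history)
```

In this toy example the query feature lies at distance 0.5 from vehicle B, so B is selected with a matching degree of about 2.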
- A vehicle in the historical image whose matching degree with the characteristics of the first vehicle is greater than the threshold can be determined as the second vehicle. Then, according to the running trajectory of the second vehicle in the historical image and the acquisition position of the target image, the acquisition position of the target image is added to the running trajectory of the second vehicle as a new point, so as to update the running trajectory of the second vehicle.
- the running track of the vehicle may include not only the position information of the vehicle, but also the time information of the vehicle running to each point in the running track. Therefore, in the embodiment of the present application, when the acquisition position of the target image is taken as the newly added point of the second vehicle's trajectory and added to the second vehicle's trajectory, the acquisition time of the target image can also be used as the newly added point. The time information of the point is added to the running track to improve the accuracy and richness of vehicle tracking information.
- When the acquisition position of the target image is added to the running trajectory of the second vehicle as a new point, the new point can be highlighted and connected to the point that was added to the running trajectory at the previous adjacent moment, and the time information of the new point (that is, the acquisition time of the target image) can be displayed near the new point.
- If no vehicle in the historical image has a matching degree with the characteristics of the first vehicle greater than the threshold, it can be determined that the first vehicle is a new vehicle that appears for the first time in the video stream. The acquisition position of the target image can then be determined as the starting point of the running trajectory of the first vehicle, and the acquisition time of the target image is added to the running trajectory of the first vehicle as the time information of the starting point.
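The trajectory update of step 305, including the new-vehicle case, can be sketched as follows. This is a minimal illustration: the function name `update_trajectories`, the dictionary-based data layout, and the tie-breaking rule when several vehicles exceed the threshold are assumptions rather than details given in the application.

```python
def update_trajectories(trajectories, degrees, threshold, position, time, new_id):
    """Append the acquisition position/time to a matched track or start a new one.

    trajectories -- {vehicle_id: [(position, time), ...]}
    degrees      -- {vehicle_id: matching degree with the first vehicle}
    Returns the id of the track that received the new point.
    """
    matched = [vid for vid, d in degrees.items() if d > threshold]
    if matched:
        # If several vehicles pass the threshold, keep the best one (assumption).
        target = max(matched, key=lambda vid: degrees[vid])
    else:
        target = new_id  # first appearance in the video stream
        trajectories[target] = []
    # Each trajectory point carries both the position and the time information.
    trajectories[target].append((position, time))
    return target


trajectories = {"A": [((10.0, 20.0), 0.0)]}
hit = update_trajectories(trajectories, {"A": 2.5}, 1.0, (10.5, 20.2), 0.1, "B")
miss = update_trajectories(trajectories, {"A": 0.3}, 1.0, (50.0, 60.0), 0.2, "B")
```

The first call extends the existing track of vehicle A; the second call finds no match above the threshold and starts a new track whose starting point is the acquisition position and time of the target image.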
- According to the technical solution of the embodiment of the present application, the pixel points in the target image are clustered to directly obtain the detection frame corresponding to each vehicle in the target image, and the pixel point set corresponding to each vehicle is extracted from the detection frame corresponding to that vehicle. The image characteristics of each pixel in the pixel point set corresponding to each vehicle are then processed to determine the characteristics of each vehicle in the target image, and when the matching degree between the feature of the first vehicle in the target image and the feature of the second vehicle in the historical image is greater than the threshold, the running trajectory of the second vehicle is updated according to the acquisition position and acquisition time of the target image.
- As a result, instance segmentation of the target image is realized through the clustering algorithm, other objects contained in the target image are directly filtered out, the detection frame corresponding to each vehicle in the target image is obtained in real time, and the time information is integrated into the running trajectory of the vehicle. This further improves both the real-time performance of vehicle tracking and the accuracy and richness of the vehicle tracking information.
- this application also proposes a vehicle tracking device.
- Fig. 5 is a schematic structural diagram of a vehicle tracking device provided by an embodiment of the application.
- the vehicle tracking device 40 includes:
- the first extraction module 41 is configured to extract the target image at the current moment from the video stream collected during the driving of the vehicle;
- the instance segmentation module 42 is used to perform instance segmentation on the target image to obtain the detection frame corresponding to each vehicle in the target image;
- the second extraction module 43 is configured to extract a set of pixels corresponding to each vehicle from the detection frame corresponding to each vehicle;
- the first determining module 44 is configured to process the image characteristics of each pixel in the pixel point set corresponding to each vehicle to determine the characteristics of each vehicle in the target image;
- the second determination module 45 is used to determine the running trajectory of each vehicle in the target image according to the matching degree between the characteristics of each vehicle in the target image and the characteristics of each vehicle in the historical image, where the historical image is the n frames of images that precede and are adjacent to the target image in the video stream, and n is a positive integer.
- the vehicle tracking device provided in the embodiments of the present application can be configured in any electronic device to execute the aforementioned vehicle tracking method.
- According to the technical solution of the embodiment of the present application, instance segmentation is performed on the target image at the current moment in the video stream to directly obtain the detection frame corresponding to each vehicle in the target image, and the pixel point set corresponding to each vehicle is extracted from the detection frame corresponding to that vehicle. The image characteristics of each pixel in the pixel point set corresponding to each vehicle are then processed to determine the characteristics of each vehicle in the target image, and the running trajectory of each vehicle in the target image is determined according to the matching degree between the characteristics of each vehicle in the target image and the characteristics of each vehicle in the historical image.
- As a result, the other objects contained in the target image are directly filtered out, and the detection frame corresponding to each vehicle in the target image is obtained in real time for subsequent processing, thereby improving the efficiency and real-time performance of vehicle tracking.
- the detection frame corresponding to each vehicle includes a masked area and a non-masked area; correspondingly,
- the second extraction module 43 includes:
- the first extraction unit is configured to extract the first sub-set of pixel points from the mask area in the detection frame corresponding to each vehicle;
- the second extraction unit is used to extract a second sub-set of pixel points from the non-masked area in the detection frame corresponding to each vehicle.
- the above-mentioned first determining module 44 includes:
- the first determining unit is configured to use the first encoder in the preset point cloud model to encode the image characteristics of each pixel in the first pixel point subset corresponding to each vehicle, so as to determine the first vector corresponding to each vehicle;
- the second determining unit is configured to use the second encoder in the preset point cloud model to encode the image characteristics of each pixel in the second pixel point subset corresponding to each vehicle, so as to determine the second vector corresponding to each vehicle;
- the third determining unit is configured to use the decoder in the preset point cloud model to decode the first vector and the second vector corresponding to each vehicle to determine the characteristics of each vehicle.
- the number of pixels included in the first pixel point subset is the same as the number of pixels included in the second pixel point subset.
- the above-mentioned instance segmentation module 42 includes:
- the clustering processing unit is configured to perform clustering processing on the pixels in the target image based on the characteristics of each pixel in the target image, so as to determine the detection frame corresponding to each vehicle in the target image according to the clustering result.
- the above-mentioned second determination module 45 includes:
- the update unit is configured to update the running trajectory of the second vehicle according to the acquisition position and acquisition time of the target image when the matching degree between the feature of the first vehicle in the target image and the feature of the second vehicle in the historical image is greater than the threshold.
- According to the technical solution of the embodiment of the present application, the detection frame and mask area corresponding to each vehicle in the target image are directly obtained by instance segmentation. The first pixel point subset is extracted from the mask area in the detection frame corresponding to each vehicle to characterize the foreground features of the vehicle, and the second pixel point subset is extracted from the unmasked area to characterize the background features of the vehicle. The preset point cloud model then generates the characteristics of the vehicle from the extracted pixel point sets, and the running trajectory of each vehicle in the target image is determined according to the matching degree between the characteristics of each vehicle in the target image and the characteristics of each vehicle in the historical image.
- As a result, by using the point cloud model to fuse the foreground and background features of the vehicle, vehicle features are extracted accurately and efficiently, which further improves the real-time performance and accuracy of vehicle tracking.
- the present application also provides an electronic device and a readable storage medium.
- FIG. 6 is a block diagram of an electronic device for the vehicle tracking method according to an embodiment of the present application.
- Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
- Electronic devices can also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
- the components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementation of the application described and/or claimed herein.
- the electronic device includes one or more processors 501, a memory 502, and interfaces for connecting various components, including a high-speed interface and a low-speed interface.
- the various components are connected to each other using different buses, and can be installed on a common motherboard or installed in other ways as needed.
- the processor may process instructions executed in the electronic device, including instructions stored in or on the memory to display graphical information of the GUI on an external input/output device (such as a display device coupled to an interface).
- if necessary, multiple processors and/or multiple buses can be used together with multiple memories.
- multiple electronic devices can be connected, and each electronic device provides part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system).
- In FIG. 6, one processor 501 is taken as an example.
- the memory 502 is a non-transitory computer-readable storage medium provided by this application.
- the memory stores instructions executable by at least one processor, so that the at least one processor executes the vehicle tracking method provided in this application.
- the non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to make a computer execute the vehicle tracking method provided by the present application.
- the memory 502 as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the vehicle tracking method in the embodiments of the present application (for example, attached
- the processor 501 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 502, that is, implements the vehicle tracking method in the foregoing method embodiment.
- the memory 502 may include a storage program area and a storage data area.
- the storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created through use of the electronic device for the vehicle tracking method, and the like.
- the memory 502 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
- the memory 502 may optionally include memories remotely provided with respect to the processor 501, and these remote memories may be connected to the electronic device of the vehicle tracking method via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- the electronic device of the vehicle tracking method may further include: an input device 503 and an output device 504.
- the processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other ways. In FIG. 6, the connection by a bus is taken as an example.
- the input device 503 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the vehicle tracking method. Examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick, and other input devices.
- the output device 504 may include a display device, an auxiliary lighting device (for example, LED), a tactile feedback device (for example, a vibration motor), and the like.
- the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.
- Various implementations of the systems and techniques described herein can be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. The programmable processor may be a dedicated or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
- “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including machine-readable media that receive machine instructions as machine-readable signals.
- “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
- the systems and techniques described here can be implemented on a computer that has: a display device for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor); and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer.
- Other types of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
- the systems and technologies described herein can be implemented in a computing system that includes back-end components (for example, a data server), or a computing system that includes middleware components (for example, an application server), or a computing system that includes front-end components (for example, a user computer with a graphical user interface or a web browser through which the user can interact with an implementation of the systems and technologies described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
- the components of the system can be connected to each other through any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
- the computer system can include clients and servers.
- the client and server are generally remote from each other and usually interact through a communication network.
- the relationship between the client and the server is generated through computer programs that run on the corresponding computers and have a client-server relationship with each other.
- the detection frame corresponding to each vehicle in the target image is obtained directly through instance segmentation, and the pixel set corresponding to each vehicle is extracted from that vehicle's detection frame. The image features of each pixel in each vehicle's pixel set are then processed to determine the features of each vehicle in the target image, and the trajectory of each vehicle in the target image is determined according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images.
- the other objects contained in the target image are directly filtered out, and the detection frame corresponding to each vehicle in the target image is obtained in real time for subsequent processing, thereby improving the efficiency and real-time performance of vehicle tracking.
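The matching step described above can be illustrated with a minimal Python sketch. It is not the implementation of this application: the use of cosine similarity as the matching degree, the greedy one-to-one assignment, the default threshold of 0.8, and the names `cosine_similarity` and `update_tracks` are all assumptions made for illustration. Each vehicle feature from the target image is compared with the features stored for historical vehicles; if the best match exceeds the threshold, that vehicle's trajectory is extended with the acquisition position and time of the target image, and otherwise a new trajectory is started.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_tracks(tracks, detections, position, timestamp, threshold=0.8):
    """Greedily match vehicle features from the target image to historical tracks.

    tracks: dict mapping an integer track id to
            {"feature": [...], "trajectory": [(position, timestamp), ...]}
    detections: list of feature vectors, one per vehicle in the target image
    position, timestamp: acquisition position and time of the target image
    Returns the track id assigned to each detection, in order.
    """
    assigned = []
    next_id = max(tracks, default=-1) + 1
    for feature in detections:
        best_id, best_score = None, threshold
        for track_id, track in tracks.items():
            if track_id in assigned:
                continue  # each historical vehicle is matched at most once per frame
            score = cosine_similarity(feature, track["feature"])
            if score > best_score:
                best_id, best_score = track_id, score
        if best_id is None:
            # No historical vehicle matches well enough: start a new trajectory.
            best_id = next_id
            next_id += 1
            tracks[best_id] = {"feature": feature, "trajectory": []}
        tracks[best_id]["feature"] = feature
        tracks[best_id]["trajectory"].append((position, timestamp))
        assigned.append(best_id)
    return assigned
```

In this sketch a track keeps only the most recent feature of its vehicle. Comparing against the features from all n preceding frames would be an equally plausible reading of the text.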
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Claims (14)
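- 1. A vehicle tracking method, comprising: extracting a target image at the current moment from a video stream captured while vehicles are traveling; performing instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image; extracting a pixel set corresponding to each vehicle from within the detection frame corresponding to that vehicle; processing the image features of each pixel in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image; and determining a trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, wherein the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
- 2. The method according to claim 1, wherein the detection frame corresponding to each vehicle includes a mask region and a non-mask region, and wherein extracting the pixel set corresponding to each vehicle from within the detection frame corresponding to that vehicle comprises: extracting a first pixel subset from the mask region within the detection frame corresponding to each vehicle; and extracting a second pixel subset from the non-mask region within the detection frame corresponding to each vehicle.
- 3. The method according to claim 2, wherein processing the image features of each pixel in the pixel set corresponding to each vehicle comprises: encoding, using a first encoder in a preset point cloud model, the image features of each pixel in the first pixel subset corresponding to each vehicle to determine a first vector corresponding to each vehicle; encoding, using a second encoder in the preset point cloud model, the image features of each pixel in the second pixel subset corresponding to each vehicle to determine a second vector corresponding to each vehicle; and decoding, using a decoder in the preset point cloud model, the first vector and the second vector corresponding to each vehicle to determine the features of each vehicle.
- 4. The method according to claim 2 or 3, wherein the number of pixels included in the first pixel subset is the same as the number of pixels included in the second pixel subset.
- 5. The method according to any one of claims 1-4, wherein performing instance segmentation on the target image to obtain the detection frame corresponding to each vehicle in the target image comprises: clustering the pixels in the target image based on the features of each pixel in the target image, so as to determine the detection frame corresponding to each vehicle in the target image according to the clustering result.
- 6. The method according to any one of claims 1-5, wherein determining the trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in the historical images comprises: if the matching degree between the features of a first vehicle in the target image and the features of a second vehicle in the historical images is greater than a threshold, updating the trajectory of the second vehicle according to the acquisition position and acquisition time of the target image.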
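- 7. A vehicle tracking apparatus, comprising: a first extraction module configured to extract a target image at the current moment from a video stream captured while vehicles are traveling; an instance segmentation module configured to perform instance segmentation on the target image to obtain a detection frame corresponding to each vehicle in the target image; a second extraction module configured to extract a pixel set corresponding to each vehicle from within the detection frame corresponding to that vehicle; a first determination module configured to process the image features of each pixel in the pixel set corresponding to each vehicle to determine the features of each vehicle in the target image; and a second determination module configured to determine a trajectory of each vehicle in the target image according to the matching degree between the features of each vehicle in the target image and the features of each vehicle in historical images, wherein the historical images are the n frames immediately preceding the target image in the video stream, and n is a positive integer.
- 8. The apparatus according to claim 7, wherein the detection frame corresponding to each vehicle includes a mask region and a non-mask region, and wherein the second extraction module comprises: a first extraction unit configured to extract a first pixel subset from the mask region within the detection frame corresponding to each vehicle; and a second extraction unit configured to extract a second pixel subset from the non-mask region within the detection frame corresponding to each vehicle.
- 9. The apparatus according to claim 8, wherein the first determination module comprises: a first determination unit configured to encode, using a first encoder in a preset point cloud model, the image features of each pixel in the first pixel subset corresponding to each vehicle to determine a first vector corresponding to each vehicle; a second determination unit configured to encode, using a second encoder in the preset point cloud model, the image features of each pixel in the second pixel subset corresponding to each vehicle to determine a second vector corresponding to each vehicle; and a third determination unit configured to decode, using a decoder in the preset point cloud model, the first vector and the second vector corresponding to each vehicle to determine the features of each vehicle.
- 10. The apparatus according to claim 8 or 9, wherein the number of pixels included in the first pixel subset is the same as the number of pixels included in the second pixel subset.
- 11. The apparatus according to any one of claims 7-10, wherein the instance segmentation module comprises: a clustering unit configured to cluster the pixels in the target image based on the features of each pixel in the target image, so as to determine the detection frame corresponding to each vehicle in the target image according to the clustering result.
- 12. The apparatus according to any one of claims 7-11, wherein the second determination module comprises: an update unit configured to update the trajectory of a second vehicle according to the acquisition position and acquisition time of the target image when the matching degree between the features of a first vehicle in the target image and the features of the second vehicle in the historical images is greater than a threshold.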
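- 13. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-6.
- 14. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method according to any one of claims 1-6.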
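The pixel extraction recited in claims 2 and 4 (a first subset from the mask region, a second subset from the non-mask region, both of the same size) might be sketched as follows. This is only a sketch under stated assumptions: the claims do not say how pixels are selected, so the uniform random sampling, the fallback to sampling with replacement when a region is too small, the inclusive box convention, and the function name `sample_pixel_subsets` are all hypothetical.

```python
import random


def sample_pixel_subsets(mask, box, count, seed=None):
    """Sample `count` pixels from the mask region and `count` pixels from the
    non-mask region inside one detection frame.

    mask: 2D list of 0/1 values covering the whole image (1 = vehicle pixel)
    box: (x_min, y_min, x_max, y_max) in inclusive pixel coordinates (assumed)
    Returns (first_subset, second_subset) as lists of (x, y) tuples.
    """
    rng = random.Random(seed)
    x_min, y_min, x_max, y_max = box
    inside, outside = [], []
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            (inside if mask[y][x] else outside).append((x, y))

    def draw(pool):
        if not pool:
            return []  # region is empty, nothing to sample
        if len(pool) >= count:
            return rng.sample(pool, count)  # without replacement
        # Too few pixels: sample with replacement so both subsets keep equal size.
        return [rng.choice(pool) for _ in range(count)]

    return draw(inside), draw(outside)
```

The coordinates returned here would then be used to look up the image features of each sampled pixel before they are passed to the two encoders of claim 3.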
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227025961A KR20220113829A (ko) | 2020-05-29 | 2020-10-30 | Vehicle tracking method and apparatus, and electronic device |
US17/995,752 US20230186486A1 (en) | 2020-05-29 | 2020-10-30 | Vehicle tracking method and apparatus, and electronic device |
EP20938232.4A EP4116867A4 (en) | 2020-05-29 | 2020-10-30 | Vehicle tracking method and apparatus, and electronic device |
JP2022545432A JP7429796B2 (ja) | 2020-05-29 | 2020-10-30 | Vehicle tracking method and apparatus, and electronic device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010478496.9 | 2020-05-29 | ||
CN202010478496.9A CN111709328B (zh) | 2020-05-29 | 2020-05-29 | Vehicle tracking method and apparatus, and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021238062A1 (zh) | 2021-12-02 |
Family
ID=72537343
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/125446 WO2021238062A1 (zh) | 2020-05-29 | 2020-10-30 | Vehicle tracking method and apparatus, and electronic device |
Country Status (6)
Country | Link |
---|---|
US (1) | US20230186486A1 (zh) |
EP (1) | EP4116867A4 (zh) |
JP (1) | JP7429796B2 (zh) |
KR (1) | KR20220113829A (zh) |
CN (1) | CN111709328B (zh) |
WO (1) | WO2021238062A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
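CN114463705A (zh) * | 2022-02-07 | 2022-05-10 | 厦门市执象智能科技有限公司 | Method for automatic recognition and detection of behavior trajectories based on video streams |
CN114973169A (zh) * | 2022-08-02 | 2022-08-30 | 山东建筑大学 | Vehicle classification and counting method and system based on multi-target detection and tracking |
CN115050190A (zh) * | 2022-06-13 | 2022-09-13 | 天翼数字生活科技有限公司 | Road vehicle monitoring method and related device |
CN116091552A (zh) * | 2023-04-04 | 2023-05-09 | 上海鉴智其迹科技有限公司 | DeepSORT-based target tracking method, apparatus, device, and storage medium |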
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
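CN111709328B (zh) * | 2020-05-29 | 2023-08-04 | 北京百度网讯科技有限公司 | Vehicle tracking method and apparatus, and electronic device |
CN112270244A (zh) * | 2020-10-23 | 2021-01-26 | 平安科技(深圳)有限公司 | Target object violation monitoring method and apparatus, electronic device, and storage medium |
CN112489450B (zh) * | 2020-12-21 | 2022-07-08 | 阿波罗智联(北京)科技有限公司 | Vehicle flow control method at a traffic intersection, roadside device, and cloud control platform |
CN112987764B (zh) * | 2021-02-01 | 2024-02-20 | 鹏城实验室 | Landing method and apparatus, unmanned aerial vehicle, and computer-readable storage medium |
CN113160272B (zh) * | 2021-03-19 | 2023-04-07 | 苏州科达科技股份有限公司 | Target tracking method and apparatus, electronic device, and storage medium |
CN113901911B (zh) * | 2021-09-30 | 2022-11-04 | 北京百度网讯科技有限公司 | Image recognition and model training methods and apparatuses, electronic device, and storage medium |
CN114004864A (zh) * | 2021-10-29 | 2022-02-01 | 北京百度网讯科技有限公司 | Object tracking method, related apparatus, and computer program product |
CN114067270B (zh) * | 2021-11-18 | 2022-09-09 | 华南理工大学 | Vehicle tracking method and apparatus, computer device, and storage medium |
CN114155278A (zh) * | 2021-11-26 | 2022-03-08 | 浙江商汤科技开发有限公司 | Target tracking method, training method for related models, and related apparatus, device, and medium |
CN117237418B (zh) * | 2023-11-15 | 2024-01-23 | 成都航空职业技术学院 | Deep-learning-based moving target detection method and system |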
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
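CN109389671A (zh) * | 2018-09-25 | 2019-02-26 | 南京大学 | Single-image three-dimensional reconstruction method based on a multi-stage neural network |
CN109816686A (zh) * | 2019-01-15 | 2019-05-28 | 山东大学 | Robot semantic SLAM method based on object instance matching, processor, and robot |
CN110956643A (zh) * | 2019-12-04 | 2020-04-03 | 齐鲁工业大学 | Improved MDNet-based vehicle tracking method and system |
CN111709328A (zh) * | 2020-05-29 | 2020-09-25 | 北京百度网讯科技有限公司 | Vehicle tracking method and apparatus, and electronic device |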
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5344618B2 (ja) | 2009-11-30 | 2013-11-20 | 住友電気工業株式会社 | Moving object tracking device, tracking method, and computer program |
JP5036084B2 (ja) * | 2010-10-14 | 2012-09-26 | シャープ株式会社 | Video processing device, video processing method, and program |
KR101382902B1 (ko) * | 2012-06-29 | 2014-04-23 | 엘지이노텍 주식회사 | Lane departure warning system and lane departure warning method |
US9070289B2 (en) | 2013-05-10 | 2015-06-30 | Palo Alto Research Incorporated | System and method for detecting, tracking and estimating the speed of vehicles from a mobile platform |
CN104183127B (zh) * | 2013-05-21 | 2017-02-22 | 北大方正集团有限公司 | Traffic surveillance video detection method and apparatus |
US9824434B2 (en) * | 2015-08-18 | 2017-11-21 | Industrial Technology Research Institute | System and method for object recognition |
JP6565661B2 (ja) | 2015-12-17 | 2019-08-28 | 富士通株式会社 | Image processing system, image similarity determination method, and image similarity determination program |
US10679351B2 (en) * | 2017-08-18 | 2020-06-09 | Samsung Electronics Co., Ltd. | System and method for semantic segmentation of images |
CN107909005A (zh) * | 2017-10-26 | 2018-04-13 | 西安电子科技大学 | Deep-learning-based human pose recognition method for surveillance scenes |
CN108053427B (zh) * | 2017-10-31 | 2021-12-14 | 深圳大学 | Improved multi-target tracking method, system, and apparatus based on KCF and Kalman |
US11100352B2 (en) * | 2018-10-16 | 2021-08-24 | Samsung Electronics Co., Ltd. | Convolutional neural network for object detection |
CN109993091B (zh) * | 2019-03-25 | 2020-12-15 | 浙江大学 | Surveillance video target detection method based on background elimination |
CN110349138B (zh) * | 2019-06-28 | 2021-07-27 | 歌尔股份有限公司 | Method and apparatus for detecting a target object based on an instance segmentation framework |
CN110895810B (zh) * | 2019-10-24 | 2022-07-05 | 中科院广州电子技术有限公司 | Chromosome image instance segmentation method and apparatus based on improved Mask RCNN |
2020
Date | Country | Application Number | Publication | Status |
---|---|---|---|---|
2020-05-29 | CN | CN202010478496.9A | CN111709328B (zh) | Active |
2020-10-30 | JP | JP2022545432A | JP7429796B2 (ja) | Active |
2020-10-30 | WO | PCT/CN2020/125446 | WO2021238062A1 (zh) | Unknown |
2020-10-30 | US | US17/995,752 | US20230186486A1 (en) | Pending |
2020-10-30 | EP | EP20938232.4A | EP4116867A4 (en) | Withdrawn |
2020-10-30 | KR | KR1020227025961A | KR20220113829A (ko) | Application Discontinuation |
Non-Patent Citations (2)
Title |
---|
PAUL VOIGTLAENDER; MICHAEL KRAUSE; ALJOSA OSEP; JONATHON LUITEN; BERIN BALACHANDAR GNANA SEKAR; ANDREAS GEIGER; BASTIAN LEIBE: "MOTS: Multi-Object Tracking and Segmentation", ARXIV.ORG, 10 February 2019 (2019-02-10), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081027401 * |
See also references of EP4116867A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
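CN114463705A (zh) * | 2022-02-07 | 2022-05-10 | 厦门市执象智能科技有限公司 | Method for automatic recognition and detection of behavior trajectories based on video streams |
CN115050190A (zh) * | 2022-06-13 | 2022-09-13 | 天翼数字生活科技有限公司 | Road vehicle monitoring method and related device |
CN115050190B (zh) * | 2022-06-13 | 2024-01-23 | 天翼数字生活科技有限公司 | Road vehicle monitoring method and related device |
CN114973169A (zh) * | 2022-08-02 | 2022-08-30 | 山东建筑大学 | Vehicle classification and counting method and system based on multi-target detection and tracking |
CN116091552A (zh) * | 2023-04-04 | 2023-05-09 | 上海鉴智其迹科技有限公司 | DeepSORT-based target tracking method, apparatus, device, and storage medium |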
Also Published As
Publication number | Publication date |
---|---|
EP4116867A1 (en) | 2023-01-11 |
KR20220113829A (ko) | 2022-08-16 |
EP4116867A4 (en) | 2024-02-07 |
JP2023511455A (ja) | 2023-03-17 |
CN111709328B (zh) | 2023-08-04 |
CN111709328A (zh) | 2020-09-25 |
JP7429796B2 (ja) | 2024-02-08 |
US20230186486A1 (en) | 2023-06-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2021238062A1 (zh) | Vehicle tracking method and apparatus, and electronic device | |
US20200364443A1 (en) | Method for acquiring motion track and device thereof, storage medium, and terminal | |
Zhou et al. | Joint 3d instance segmentation and object detection for autonomous driving | |
Spencer et al. | Defeat-net: General monocular depth via simultaneous unsupervised representation learning | |
Min et al. | A new approach to track multiple vehicles with the combination of robust detection and two classifiers | |
Ashraf et al. | Dogfight: Detecting drones from drones videos | |
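CN111832568B (zh) | License plate recognition method, and training method and apparatus for a license plate recognition model | |
US8620026B2 (en) | Video-based detection of multiple object types under varying poses | |
CN112528786B (zh) | Vehicle tracking method and apparatus, and electronic device | |
WO2016034059A1 (zh) | Target object tracking method based on color-structure features | |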
Zhang et al. | Deep learning in lane marking detection: A survey | |
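WO2021082168A1 (zh) | Method for matching a specific target object in a scene image | |
Varghese et al. | An efficient algorithm for detection of vacant spaces in delimited and non-delimited parking lots | |
CN113361344B (zh) | Video event recognition method, apparatus, device, and storage medium | |
CN111767831B (zh) | Method, apparatus, device, and storage medium for processing images | |
Rabiee et al. | Crowd behavior representation: an attribute-based approach | |
CN112561053B (zh) | Image processing method, training method for a pre-trained model, apparatus, and electronic device | |
CN103942778A (zh) | Fast video key-frame extraction method based on principal component feature curve analysis | |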
US20230095533A1 (en) | Enriched and discriminative convolutional neural network features for pedestrian re-identification and trajectory modeling | |
Yao et al. | Coupled multivehicle detection and classification with prior objectness measure | |
Ou et al. | FAMN: feature aggregation multipath network for small traffic sign detection | |
KR101826669B1 (ko) | Video search system and method thereof | |
Liao et al. | Multi-scale saliency features fusion model for person re-identification | |
Antonio et al. | Pedestrians' detection methods in video images: A literature review | |
Zuo et al. | Road model prediction based unstructured road detection |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20938232; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2022545432; Country of ref document: JP; Kind code of ref document: A. Ref document number: 20227025961; Country of ref document: KR; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2020938232; Country of ref document: EP; Effective date: 20221005 |
NENP | Non-entry into the national phase | Ref country code: DE |