WO2020252974A1 - Method and device for tracking multiple target objects in motion state - Google Patents
Method and device for tracking multiple target objects in motion state
- Publication number
- WO2020252974A1 (PCT/CN2019/108432)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- target object
- target
- adjacent
- video frames
- Prior art date
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL:
- G06T7/00—Image analysis › G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
- G06T7/285—Analysis of motion using a sequence of stereo image pairs
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/90—Determination of colour characteristics
- G06T2207/00—Indexing scheme for image analysis or image enhancement › G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G06T2207/10024—Color image
- G06T2207/30196—Human being; Person
- G06T2207/30221—Sports video; Sports image
Definitions
- The embodiments of the present application relate to the field of artificial intelligence technology, and in particular to a method and device for tracking multiple target objects in a motion state.
- Computer vision technology studies how to make machines "see": cameras and computer equipment replace human eyes with machine vision processing such as real-time recognition, positioning, tracking, and measurement of target objects.
- The image is further analyzed and processed by computer equipment, so that the data obtained by the camera becomes more suitable for human observation, or so that the image information can be sent to instruments for detection.
- For example, in sports scenarios it is usually necessary to use a camera to track multiple players on the court at the same time, so that the user can switch at any time, as needed, to the tracking shot of a given player or obtain that player's movement trajectory data on the court. How to achieve rapid and precise positioning and tracking of target objects when both the video capture device and the target objects are in motion has therefore become an urgent technical problem.
- The technical means usually used in the prior art is to judge the positional similarity of target objects in video frames based on 2D image recognition technology and to determine whether the target objects in adjacent video frames are the same target object, thereby obtaining the positioning, tracking, and movement trajectory of each target object.
- However, the pose of the video capture device itself often changes, so the actual tracking and shooting results of the prior art are poor and identification errors are prone to occur, which fails to meet the needs of current users.
- The embodiments of the present application provide a method for tracking multiple target objects in a motion state, so as to solve the problems of low efficiency and poor accuracy in recognizing and tracking multiple target objects in a video in the prior art.
- The method for tracking multiple target objects in a motion state includes: obtaining video frames included in video data collected by a video capture device; sending the video frames to a preset feature recognition model, determining the feature detection area corresponding to the target object in the video frame, extracting the color feature of the target object from the detection area, and comparing the color features of the target object in adjacent video frames to obtain a first comparison result; determining the position information of the identification part of the target object in the adjacent video frames in the target coordinate system, and comparing the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and determining, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, tracking the target objects in the adjacent video frames as the same target object.
- Determining the position information of the identification part of the target object in the adjacent video frames in the target coordinate system specifically includes: predicting the pose changes of the video capture device corresponding to each of the adjacent video frames, thereby obtaining the pose change information of the video capture device corresponding to the adjacent video frames; determining, according to the pose change information and the position information of the video capture device corresponding to the previous video frame of the adjacent video frames, the position information of the video capture device corresponding to the next video frame; obtaining, by triangulation, according to the position information of the video capture device corresponding to each of the adjacent video frames and the identification part of the target object, the position information of the identification part of the target object in the spatial rectangular coordinate system constructed with the video capture device as the spatial coordinate origin; and obtaining, through coordinate transformation, the position information of the identification part of the target object in the target coordinate system.
- The method for tracking multiple target objects in a motion state further includes: determining the actual motion area of the target object in the video frame; and taking the actual motion area of the target object in the video frame as the area to be detected, filtering out the feature detection areas outside the area to be detected to obtain the feature detection areas within the area to be detected.
- The identification part is the neck part of the target object; correspondingly, the position information of the identification part of the target object in the target coordinate system is the position information of the neck part of the target object in the spatial rectangular coordinate system constructed with the center of the area to be detected as the spatial coordinate origin.
- The video data collected by the video capture device is obtained and segmented to obtain the video clips included in the video data; the feature similarity between the video clips is detected, and video clips whose feature similarity reaches or exceeds a preset similarity threshold and whose time interval does not exceed a preset time threshold are taken as one video shot; the video frames included in the video shot are then acquired.
- An embodiment of the present application also provides a device for tracking multiple target objects in a motion state, including: a video frame obtaining unit, configured to obtain video frames included in video data collected by a video capture device; a first comparison unit, configured to send the video frames to a preset feature recognition model, determine the feature detection area corresponding to the target object in the video frame, extract the color feature of the target object from the detection area, and compare the color features of the target object in adjacent video frames to obtain a first comparison result; a second comparison unit, configured to determine the position information of the identification part of the target object in the adjacent video frames in the target coordinate system and compare the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and a judging unit, configured to determine, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, to track the target objects in the adjacent video frames as the same target object.
- Determining the position information of the identification part of the target object in the adjacent video frames in the target coordinate system specifically includes: predicting the pose changes of the video capture device corresponding to each of the adjacent video frames, thereby obtaining the pose change information of the video capture device corresponding to the adjacent video frames; determining, according to the pose change information and the position information of the video capture device corresponding to the previous video frame of the adjacent video frames, the position information of the video capture device corresponding to the next video frame; obtaining, by triangulation, according to the position information of the video capture device corresponding to each of the adjacent video frames and the identification part of the target object, the position information of the identification part of the target object in the spatial rectangular coordinate system constructed with the video capture device as the spatial coordinate origin; and obtaining, through coordinate transformation, the position information of the identification part of the target object in the target coordinate system.
- The device for tracking multiple target objects in a motion state further includes: a motion area determining unit, configured to determine the actual motion area of the target object in the video frame; and a filtering unit, configured to take the actual motion area of the target object in the video frame as the area to be detected and to filter out the feature detection areas outside the area to be detected, obtaining the feature detection areas within the area to be detected.
- The identification part is the neck part of the target object; correspondingly, the position information of the identification part of the target object in the target coordinate system is the position information of the neck part of the target object in the spatial rectangular coordinate system constructed with the center of the area to be detected as the spatial coordinate origin.
- Obtaining the video frames included in the video data collected by the video capture device specifically includes: obtaining the video data collected by the video capture device and segmenting the video data to obtain the video clips included in the video data; detecting the feature similarity between the video clips, and taking video clips whose feature similarity reaches or exceeds a preset similarity threshold and whose time interval does not exceed a preset time threshold as one video shot; and acquiring the video frames included in the video shot.
- The present application also provides an electronic device, including a processor and a memory, wherein the memory is used to store a program for the method for tracking multiple target objects in a motion state; after the device is powered on and the program is run by the processor, the following steps are performed: obtaining video frames included in video data collected by a video capture device; sending the video frames to a preset feature recognition model, determining the feature detection area corresponding to the target object in the video frame, extracting the color feature of the target object from the detection area, and comparing the color features of the target object in adjacent video frames to obtain a first comparison result; determining the position information of the identification part of the target object in the adjacent video frames in the target coordinate system, and comparing the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and determining, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, tracking the target objects in the adjacent video frames as the same target object.
- The present application also provides a storage device that stores a program for the method for tracking multiple target objects in a motion state; when the program is run by a processor, the following steps are performed: obtaining video frames included in video data collected by a video capture device; sending the video frames to a preset feature recognition model, determining the feature detection area corresponding to the target object in the video frame, extracting the color feature of the target object from the detection area, and comparing the color features of the target object in adjacent video frames to obtain a first comparison result; determining the position information of the identification part of the target object in the adjacent video frames in the target coordinate system, and comparing the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and determining, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, tracking the target objects in the adjacent video frames as the same target object.
- Using the method described in this application, multiple target objects in motion can be quickly identified and tracked simultaneously, which improves the accuracy of recognizing and tracking multiple target objects in motion in video data, thereby enhancing the user experience.
- FIG. 1 is a flowchart of a method for tracking multiple target objects in a motion state provided by an embodiment of this application;
- FIG. 2 is a schematic diagram of a device for tracking multiple target objects in a motion state provided by an embodiment of this application;
- FIG. 3 is a schematic diagram of locating a target object using triangulation provided by an embodiment of this application;
- FIG. 4 is a schematic diagram of an electronic device provided by an embodiment of this application.
- Please refer to FIG. 1, which is a flowchart of the method for tracking multiple target objects in a motion state provided by an embodiment of this application.
- The specific implementation process includes the following steps:
- Step S101: Obtain the video frames included in the video data collected by the video capture device.
- The video capture device includes video data acquisition equipment such as cameras, video recorders, and image sensors.
- The video data is the video data contained in one independent shot.
- An independent shot is the video data obtained from one continuous shooting process of the video capture device; the video data consists of video frames, and a group of continuous video frames constitutes a shot.
- A complete piece of video data may include multiple shots, so obtaining the video frames included in the video data collected by the video capture device can be implemented as follows:
- Before the video frames contained in any single shot are obtained, the complete video data must first be segmented into shots based on the global and local features of the video frames, yielding a series of independent video clips. The similarity between the video clips is then detected, video clips whose similarity reaches or exceeds the preset similarity threshold and whose time interval does not exceed the preset time threshold are taken as one video shot, and the video frames contained in that video shot are obtained.
- The color characteristics of the video frames contained in different shots usually differ noticeably.
- A color feature extraction algorithm can extract the RGB or HSV color histogram of each video frame in the video data, and a window function can then compare the probability distributions of the first and second halves of a sliding window of frames; if the two distributions differ, the current window center is a shot boundary.
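- A minimal Python/OpenCV sketch of this windowed histogram comparison is given below; the hue-only histogram, Bhattacharyya metric, bin count, and half-window size are illustrative assumptions, since the application does not fix them:

```python
import cv2
import numpy as np

def hue_histograms(frames, bins=32):
    """Normalized hue histogram (HSV) for each BGR frame."""
    hists = []
    for frame in frames:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        h = cv2.calcHist([hsv], [0], None, [bins], [0, 180])
        hists.append(cv2.normalize(h, h).flatten())
    return hists

def shot_boundaries(hists, half_window=10, threshold=0.5):
    """Slide a window over the frame sequence; if the mean histograms of its
    two halves differ strongly, the window center is a candidate boundary."""
    boundaries = []
    for c in range(half_window, len(hists) - half_window):
        first = np.mean(hists[c - half_window:c], axis=0).astype(np.float32)
        second = np.mean(hists[c:c + half_window], axis=0).astype(np.float32)
        # Bhattacharyya distance: 0 means identical distributions
        if cv2.compareHist(first, second, cv2.HISTCMP_BHATTACHARYYA) > threshold:
            boundaries.append(c)
    return boundaries
```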
- Shot segmentation of the complete video data based on the global and local features of the video frames can be implemented through the following process:
- Global feature analysis: calculate a first similarity between adjacent video frames of the video data based on their color features, and compare the first similarity with a first similarity threshold; if the first similarity is less than the first similarity threshold, treat the video frame as a candidate video frame for an independent shot.
- Local feature analysis: for the candidate video frame and the previous video frame, calculate the distance from each keypoint descriptor to each visual word and assign each descriptor to the visual word with the smallest distance; based on the descriptors and their corresponding visual words, construct the visual word histograms of the candidate video frame and the previous frame, and calculate a second similarity between the two histograms.
- Shot segmentation: compare the second similarity with a second similarity threshold; if the second similarity is greater than or equal to the threshold, merge the candidate video frame and the previous frame into the same shot; if it is less than the threshold, take the candidate video frame as the starting video frame of a new shot.
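- The local-feature pass can be sketched as follows; the vocabulary is assumed to be a k-means codebook learned offline over keypoint descriptors (e.g., ORB or SIFT), and histogram intersection stands in for the unspecified second-similarity measure:

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Assign each keypoint descriptor to its nearest visual word and
    accumulate a normalized visual-word histogram.
    vocabulary: (K, D) array of cluster centers learned offline."""
    d = np.linalg.norm(descriptors[:, None, :].astype(float)
                       - vocabulary[None, :, :].astype(float), axis=2)
    words = np.argmin(d, axis=1)  # visual word with the smallest distance
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / (hist.sum() + 1e-9)

def second_similarity(hist_a, hist_b):
    """Histogram intersection between two frames' visual-word histograms."""
    return float(np.minimum(hist_a, hist_b).sum())
```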
- Step S102: Send the video frame to a preset feature recognition model, determine the feature detection area corresponding to the target object in the video frame, extract the color feature of the target object from the feature detection area, and compare the color features of the target object in adjacent video frames to obtain a first comparison result.
- After the video frames included in the video data collected by the video capture device are obtained in step S101, the data needed for comparing the color features of the target object in adjacent video frames is ready.
- In step S102, the color feature of the target object can be extracted from the video frames, and the color features of the target object in adjacent video frames can then be compared to obtain the first comparison result.
- The feature recognition model may refer to a Faster R-CNN deep neural network model obtained in advance through iterative training.
- The feature detection area may refer to the detection box obtained for each target object in the video frame when the Faster R-CNN deep neural network model performs target object detection on the video frame.
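- As a hedged illustration of this detection step, a pretrained torchvision Faster R-CNN can stand in for the application's iteratively trained model; the score threshold and the use of COCO's person class are assumptions:

```python
import torch
import torchvision

# A pretrained Faster R-CNN stands in for the iteratively trained model.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_targets(frame_rgb, score_thresh=0.8):
    """Return candidate feature detection areas (person boxes) for one frame.
    frame_rgb: HxWx3 uint8 numpy array in RGB order."""
    tensor = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([tensor])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_thresh)  # COCO 1 = person
    return out["boxes"][keep].cpu().numpy()  # (N, 4) boxes as x1, y1, x2, y2
```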
- RGB: red, green, blue; HSV: hue, saturation, value.
- In addition to the feature detection areas corresponding to target objects, the final detection result may contain detection areas generated for non-target objects (that is, detection boxes corresponding to non-target objects). The detection results therefore need to be filtered in advance, retaining only the feature detection areas corresponding to target objects (that is, the detection boxes corresponding to target objects).
- The specific implementation is as follows:
- The actual motion area is the area within which the target objects move.
- The difference between the color features of the stadium floor and those of the audience can be exploited with threshold filtering to obtain an image that contains only the stadium; a series of processing operations such as erosion and dilation are then performed on the stadium image.
- This yields the outer contour of the court (the area enclosed by the outer contour is the actual motion area of the target objects); the detection boxes outside the outer contour of the court are filtered out, and only the detection boxes within the enclosed area (that is, on the court) are kept.
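- The filtering described above might look like the following OpenCV sketch; the HSV color bounds for the court and the use of the box's bottom center as the player's foot point are assumptions:

```python
import cv2
import numpy as np

def court_contour(frame_bgr, lower_hsv, upper_hsv, kernel_size=15):
    """Threshold on the court's dominant color, clean the mask with erosion
    and dilation, and return the outer contour of the playing area."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    mask = cv2.dilate(cv2.erode(mask, kernel), kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)  # largest blob = court outline

def box_on_court(box, contour):
    """Keep a detection box only if its bottom center lies inside the court."""
    foot = (float(box[0] + box[2]) / 2.0, float(box[3]))
    return cv2.pointPolygonTest(contour, foot, False) >= 0
```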
- Step S103: Determine the position information of the identification part of the target object in the adjacent video frames in the target coordinate system, and compare the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result.
- Based on the first comparison result obtained above, this step further determines the position information of the identification part of the target object in the adjacent video frames in the target coordinate system and compares the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result.
- The target coordinate system may refer to the world coordinate system, that is, the absolute coordinate system of the video scene; the coordinates of the points corresponding to the identification parts of all target objects in the video scene can be expressed in the world coordinate system to determine the specific location of each target object.
- The world coordinate system may refer to a spatial rectangular coordinate system constructed with the center of the detection area as the spatial coordinate origin.
- In FIG. 3, point P refers to the point corresponding to the neck of the target object; point Q1 refers to the position of the video capture device at the previous video frame (or in the previous shot); and point Q2 refers to the position of the video capture device at the next video frame (or in the next shot) relative to the previous one.
- The position information of the identification part of the target object in the adjacent video frames in the target coordinate system may be determined in the following manner:
- Visual odometry can be used to predict the pose changes of the video capture device, thereby obtaining the pose change information of the video capture device corresponding to each of the adjacent video frames. According to the pose change information, the position information of the video capture device corresponding to each of the adjacent video frames can be determined.
- The position of the video capture device at the previous video frame of the adjacent video frames may be recorded as the first position, and its position at the next video frame as the second position.
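- A minimal stand-in for this visual odometry step recovers the relative camera pose between two frames from matched ORB features; the feature type, matcher, and known camera intrinsics K are assumptions, and monocular translation is recovered only up to scale:

```python
import cv2
import numpy as np

def relative_camera_pose(prev_gray, next_gray, K):
    """Estimate the camera's rotation R and unit-scale translation t between
    two grayscale frames, i.e., the pose change of the video capture device."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(next_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # second position = first position moved by (R, t)
```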
- Using the triangulation method shown in FIG. 3, the position information of the identification part of the target object in the spatial rectangular coordinate system constructed with the video capture device as the spatial coordinate origin can be calculated; the position information of the target object in the target coordinate system (i.e., the world coordinate system) is then obtained through coordinate transformation.
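- The triangulation and coordinate transformation can be sketched with OpenCV as follows; the projection-matrix construction assumes each pose (R, t) is expressed as world-to-camera extrinsics:

```python
import cv2
import numpy as np

def triangulate_neck(K, pose1, pose2, px1, px2):
    """Triangulate the neck point P from its pixel positions in two adjacent
    frames. pose1/pose2 are (R, t) extrinsics of the two camera positions;
    with pose1 = (I, 0), the result is expressed in the spatial coordinate
    system whose origin is the video capture device's first position."""
    R1, t1 = pose1
    R2, t2 = pose2
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])  # 3x4 projection matrix, frame 1
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])  # 3x4 projection matrix, frame 2
    X_h = cv2.triangulatePoints(P1, P2,
                                np.float32(px1).reshape(2, 1),
                                np.float32(px2).reshape(2, 1))
    return (X_h[:3] / X_h[3]).ravel()           # inhomogeneous 3D coordinates

def to_world(X_cam, R_wc, t_wc):
    """Coordinate transformation from the camera-origin frame into the target
    (world) coordinate system, given camera-to-world rotation/translation."""
    return R_wc @ X_cam + t_wc
```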
- The pose changes include changes in motion trajectory and in posture.
- The neck of the target object may be selected as the identification part.
- In this case, the position information of the identification part of the target object in the target coordinate system is the position information of the neck part in the spatial rectangular coordinate system constructed with the center of the area to be detected as the spatial coordinate origin.
- A skeleton detection algorithm can be used to obtain the point P corresponding to the neck of each target object.
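- Since the application does not name a specific skeleton detection algorithm, a pretrained Keypoint R-CNN can illustrate this step; COCO has no neck keypoint, so the neck point P is approximated here as the midpoint of the two shoulders:

```python
import torch
import torchvision

# Keypoint R-CNN stands in for the unspecified skeleton detection algorithm.
kp_model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
kp_model.eval()

LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6  # COCO keypoint indices

def neck_points(frame_rgb, score_thresh=0.8):
    """Return an approximate neck point P for each detected person."""
    tensor = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = kp_model([tensor])[0]
    necks = []
    for kps, score in zip(out["keypoints"], out["scores"]):
        if score > score_thresh:
            neck = (kps[LEFT_SHOULDER, :2] + kps[RIGHT_SHOULDER, :2]) / 2
            necks.append(neck.numpy())
    return necks
```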
- Step S104: According to the first comparison result and the second comparison result, determine whether the target objects in the adjacent video frames are the same target object; if so, track the target objects in the adjacent video frames as the same target object.
- Based on the first comparison result obtained in step S102 and the second comparison result obtained in step S103, this step determines whether the target objects in the adjacent video frames are the same target object, thereby realizing real-time positioning and tracking of each target object.
- Specifically, it is determined, according to the first comparison result and the second comparison result, whether the similarity value of the target objects in the adjacent video frames meets a preset similarity threshold; if so, the target objects in the adjacent video frames are taken as the same target object for positioning and tracking.
- The similarity function is defined as follows:
- The color feature similarity of the corresponding target objects in each video frame is Sim(b_i, b_j), and Sim(P_i, P_j) is the square of the Euclidean distance between the two identification points P_i and P_j.
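- The application defines the two component similarities but not their exact fusion, so the following sketch assumes a simple weighted combination; alpha, sigma, and the histogram-intersection color similarity are hypothetical choices:

```python
import numpy as np

def color_similarity(hist_i, hist_j):
    """Sim(b_i, b_j): histogram intersection of two players' color features."""
    return float(np.minimum(hist_i, hist_j).sum())

def combined_similarity(hist_i, hist_j, P_i, P_j, alpha=0.5, sigma=1.0):
    """Hypothetical Sim(player_i, player_j): blend the color term with a
    decay of Sim(P_i, P_j), the squared Euclidean distance between necks."""
    sim_p = float(np.sum((np.asarray(P_i) - np.asarray(P_j)) ** 2))
    return alpha * color_similarity(hist_i, hist_j) + (1 - alpha) * np.exp(-sim_p / sigma)
```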
- A similarity threshold T is preset. If the similarity Sim(player_i, player_j) of the target objects in two adjacent video frames is equal to or greater than T, the target objects in the two adjacent video frames are identified as the same target object and their trajectories are merged, realizing accurate target object recognition and tracking.
- In this way, multiple target objects in motion can be quickly identified and tracked at the same time, which improves the accuracy of tracking multiple target objects in motion in video data.
- Correspondingly, the present application also provides a device for tracking multiple target objects in a motion state. Since the device embodiment is similar to the above method embodiment, it is described relatively simply; for related details, please refer to the description of the method embodiment.
- The following embodiment of a device for tracking multiple target objects in a motion state is merely illustrative. Please refer to FIG. 2, which is a schematic diagram of the device provided by an embodiment of this application.
- The device for tracking multiple target objects in a motion state described in this application includes the following units:
- The video frame obtaining unit 201 is configured to obtain the video frames contained in the video data collected by the video capture device.
- The video capture device includes video data acquisition equipment such as cameras, video recorders, and image sensors.
- The video data is the video data contained in one independent shot.
- An independent shot is the video data obtained from one continuous shooting process of the video capture device; the video data consists of video frames, and a group of continuous video frames constitutes a shot.
- A complete piece of video data may include multiple shots, so obtaining the video frames included in the video data collected by the video capture device can be implemented as follows:
- Before the video frames contained in any single shot are obtained, the complete video data must first be segmented into shots based on the global and local features of the video frames, yielding a series of independent video clips. The similarity between the video clips is then detected, video clips whose similarity reaches or exceeds the preset similarity threshold and whose time interval does not exceed the preset time threshold are taken as one video shot, and the video frames contained in that video shot are obtained.
- The color characteristics of the video frames contained in different shots usually differ noticeably.
- A color feature extraction algorithm can extract the RGB or HSV color histogram of each video frame in the video data, and a window function can then compare the probability distributions of the first and second halves of a sliding window of frames; if the two distributions differ, the current window center is a shot boundary.
- The first comparison unit 202 is configured to send the video frame to a preset feature recognition model, determine the feature detection area corresponding to the target object in the video frame, extract the color feature of the target object from the detection area, and compare the color features of the target object in adjacent video frames to obtain a first comparison result.
- The feature recognition model may refer to a Faster R-CNN deep neural network model.
- The feature detection area may refer to the detection box obtained for each target object in the video frame when the Faster R-CNN deep neural network model performs target object detection on the video frame.
- RGB: red, green, blue; HSV: hue, saturation, value.
- The second comparison unit 203 is configured to determine the position information of the identification part of the target object in the adjacent video frames in the target coordinate system, and to compare the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result.
- The target coordinate system may refer to the world coordinate system, that is, the absolute coordinate system of the video scene; the coordinates of the points corresponding to the identification parts of all target objects in the video scene can be expressed in the world coordinate system to determine the specific location of each target object.
- The world coordinate system may refer to a spatial rectangular coordinate system constructed with the center of the detection area as the spatial coordinate origin.
- In FIG. 3, point P refers to the point corresponding to the neck of the target object; point Q1 refers to the position of the video capture device at the previous video frame (or in the previous shot); and point Q2 refers to the position of the video capture device at the next video frame (or in the next shot) relative to the previous one.
- The position information of the identification part of the target object in the adjacent video frames in the target coordinate system may be determined in the following manner:
- Visual odometry can be used to predict the pose changes of the video capture device, thereby obtaining the pose change information of the video capture device corresponding to each of the adjacent video frames. According to the pose change information, the position information of the video capture device corresponding to each of the adjacent video frames can be determined.
- The position of the video capture device at the previous video frame of the adjacent video frames may be recorded as the first position, and its position at the next video frame as the second position.
- Using the triangulation method shown in FIG. 3, the position information of the identification part of the target object in the spatial rectangular coordinate system constructed with the video capture device as the spatial coordinate origin can be calculated; the position information of the target object in the target coordinate system (i.e., the world coordinate system) is then obtained through coordinate transformation.
- The pose changes include changes in motion trajectory and in posture.
- The neck of the target object may be selected as the identification part.
- In this case, the position information of the identification part of the target object in the target coordinate system is the position information of the neck part in the spatial rectangular coordinate system constructed with the center of the area to be detected as the spatial coordinate origin.
- A skeleton detection algorithm can be used to obtain the point P corresponding to the neck of each target object.
- The judging unit 204 is configured to judge, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, to track the target objects in the adjacent video frames as the same target object.
- Specifically, it is determined, according to the first comparison result and the second comparison result, whether the similarity value of the target objects in the adjacent video frames meets a preset similarity threshold; if so, the target objects in the adjacent video frames are taken as the same target object for positioning and tracking.
- The similarity function is defined as follows:
- The color feature similarity of the corresponding target objects in each video frame is Sim(b_i, b_j), and Sim(P_i, P_j) is the square of the Euclidean distance between the two identification points P_i and P_j.
- The device for tracking multiple target objects in a motion state described in this application can quickly identify and track multiple target objects in motion simultaneously, which improves the accuracy of tracking multiple target objects in motion in video data, thereby enhancing the user experience.
- Correspondingly, the present application also provides an electronic device and a storage device. Since these embodiments are similar to the above method embodiment, they are described relatively simply; for related details, please refer to the description of the method embodiment.
- The following embodiments of an electronic device and a storage device are merely illustrative. Please refer to FIG. 4, which is a schematic diagram of an electronic device provided by an embodiment of this application.
- The present application also provides an electronic device, including a processor 401 and a memory 402, wherein the memory 402 is used to store a program for the method for tracking multiple target objects in a motion state; after the device is powered on and the program is run by the processor 401, the following steps are performed: obtaining video frames included in video data collected by a video capture device; sending the video frames to a preset feature recognition model, determining the feature detection area corresponding to the target object in the video frame, extracting the color feature of the target object from the detection area, and comparing the color features of the target object in adjacent video frames to obtain a first comparison result; determining the position information of the identification part of the target object in the adjacent video frames in the target coordinate system, and comparing the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and determining, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, tracking the target objects in the adjacent video frames as the same target object.
- This application also provides a storage device that stores a program for the method for tracking multiple target objects in a motion state; when the program is run by the processor, the following steps are performed: obtaining video frames included in video data collected by a video capture device; sending the video frames to a preset feature recognition model, determining the feature detection area corresponding to the target object in the video frame, extracting the color feature of the target object from the detection area, and comparing the color features of the target object in adjacent video frames to obtain a first comparison result; determining the position information of the identification part of the target object in the adjacent video frames in the target coordinate system, and comparing the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and determining, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, tracking the target objects in the adjacent video frames as the same target object.
- The processor or processor module may be an integrated circuit chip with signal processing capability.
- The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
- The methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
- The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
- The software module can be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
- The processor reads the information in the storage medium and completes the steps of the above method in combination with its hardware.
- The storage medium may be a memory, for example, a volatile memory or a non-volatile memory, or it may include both volatile and non-volatile memory.
- The non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or flash memory.
- The volatile memory may be a random access memory (RAM), which is used as an external cache.
- Many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM).
- The storage media described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memory.
- The functions described in this application can be implemented by a combination of hardware and software.
- The corresponding functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium.
- Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another.
- The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Claims (10)
- 1. A method for tracking multiple target objects in a motion state, comprising: obtaining video frames included in video data collected by a video capture device; sending the video frames to a preset feature recognition model, determining the feature detection area corresponding to a target object in the video frame, extracting the color feature of the target object from the feature detection area, and comparing the color features of the target object in adjacent video frames to obtain a first comparison result; determining the position information of an identification part of the target object in the adjacent video frames in a target coordinate system, and comparing the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and determining, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, tracking the target objects in the adjacent video frames as the same target object.
- 2. The method for tracking multiple target objects in a motion state according to claim 1, wherein determining the position information of the identification part of the target object in the adjacent video frames in the target coordinate system specifically comprises: predicting the pose changes of the video capture device corresponding to each of the adjacent video frames, thereby obtaining the pose change information of the video capture device corresponding to the adjacent video frames; determining, according to the pose change information and the position information of the video capture device corresponding to the previous video frame of the adjacent video frames, the position information of the video capture device corresponding to the next video frame of the adjacent video frames; obtaining, by triangulation, according to the position information of the video capture device corresponding to each of the adjacent video frames and the identification part of the target object, the position information of the identification part of the target object in a spatial rectangular coordinate system constructed with the video capture device as the spatial coordinate origin; and obtaining, through coordinate transformation, the position information of the identification part of the target object in the target coordinate system.
- 3. The method for tracking multiple target objects in a motion state according to claim 1, further comprising: determining the actual motion area of the target object in the video frame; and taking the actual motion area of the target object in the video frame as an area to be detected, and filtering out the feature detection areas outside the area to be detected to obtain the feature detection areas within the area to be detected.
- 4. The method for tracking multiple target objects in a motion state according to claim 3, wherein the identification part is the neck part of the target object; correspondingly, the position information of the identification part of the target object in the target coordinate system is the position information of the neck part of the target object in a spatial rectangular coordinate system constructed with the center of the area to be detected as the spatial coordinate origin.
- 5. The method for tracking multiple target objects in a motion state according to claim 1, wherein obtaining the video frames included in the video data collected by the video capture device specifically comprises: obtaining the video data collected by the video capture device and segmenting the video data to obtain the video clips included in the video data; detecting the feature similarity between the video clips, and taking video clips whose feature similarity reaches or exceeds a preset similarity threshold and whose time interval does not exceed a preset time threshold as one video shot; and acquiring the video frames included in the video shot.
- 6. A device for tracking multiple target objects in a motion state, comprising: a video frame obtaining unit, configured to obtain video frames included in video data collected by a video capture device; a first comparison unit, configured to send the video frames to a preset feature recognition model, determine the feature detection area corresponding to a target object in the video frame, extract the color feature of the target object from the detection area, and compare the color features of the target object in adjacent video frames to obtain a first comparison result; a second comparison unit, configured to determine the position information of an identification part of the target object in the adjacent video frames in a target coordinate system and compare the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and a judging unit, configured to judge, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, to track the target objects in the adjacent video frames as the same target object.
- 7. The device for tracking multiple target objects in a motion state according to claim 6, wherein determining the position information of the identification part of the target object in the adjacent video frames in the target coordinate system specifically comprises: predicting the pose changes of the video capture device corresponding to each of the adjacent video frames, thereby obtaining the pose change information of the video capture device corresponding to the adjacent video frames; determining, according to the pose change information and the position information of the video capture device corresponding to the previous video frame of the adjacent video frames, the position information of the video capture device corresponding to the next video frame of the adjacent video frames; obtaining, by triangulation, according to the position information of the video capture device corresponding to each of the adjacent video frames and the identification part of the target object, the position information of the identification part of the target object in a spatial rectangular coordinate system constructed with the video capture device as the spatial coordinate origin; and obtaining, through coordinate transformation, the position information of the identification part of the target object in the target coordinate system.
- 8. The device for tracking multiple target objects in a motion state according to claim 6, further comprising: a motion area determining unit, configured to determine the actual motion area of the target object in the video frame; and a filtering unit, configured to take the actual motion area of the target object in the video frame as an area to be detected and to filter out the feature detection areas outside the area to be detected, obtaining the feature detection areas within the area to be detected.
- 9. An electronic device, comprising: a processor; and a memory configured to store a program for the method for tracking multiple target objects in a motion state, wherein after the device is powered on and the program is run by the processor, the following steps are performed: obtaining video frames included in video data collected by a video capture device; sending the video frames to a preset feature recognition model, determining the feature detection area corresponding to a target object in the video frame, extracting the color feature of the target object from the detection area, and comparing the color features of the target object in adjacent video frames to obtain a first comparison result; determining the position information of an identification part of the target object in the adjacent video frames in a target coordinate system, and comparing the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and determining, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, tracking the target objects in the adjacent video frames as the same target object.
- 10. A storage device storing a program for the method for tracking multiple target objects in a motion state, wherein when the program is run by a processor, the following steps are performed: obtaining video frames included in video data collected by a video capture device; sending the video frames to a preset feature recognition model, determining the feature detection area corresponding to a target object in the video frame, extracting the color feature of the target object from the detection area, and comparing the color features of the target object in adjacent video frames to obtain a first comparison result; determining the position information of an identification part of the target object in the adjacent video frames in a target coordinate system, and comparing the position information of the identification parts in the adjacent video frames in the target coordinate system to obtain a second comparison result; and determining, according to the first comparison result and the second comparison result, whether the target objects in the adjacent video frames are the same target object, and if so, tracking the target objects in the adjacent video frames as the same target object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/620,119 US20220215560A1 (en) | 2019-06-17 | 2019-09-27 | Method and device for tracking multiple target objects in motion state |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910522911.3A CN110264493B (zh) | 2019-06-17 | 2019-06-17 | 一种针对运动状态下的多目标对象追踪方法和装置 |
CN201910522911.3 | 2019-06-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020252974A1 true WO2020252974A1 (zh) | 2020-12-24 |
Family
ID=67918698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/108432 WO2020252974A1 (zh) | 2019-06-17 | 2019-09-27 | 一种针对运动状态下的多目标对象追踪方法和装置 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20220215560A1 (zh) |
CN (1) | CN110264493B (zh) |
WO (1) | WO2020252974A1 (zh) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110264493B (zh) * | 2019-06-17 | 2021-06-18 | 北京影谱科技股份有限公司 | Method and device for tracking multiple target objects in motion state |
KR102311798B1 (ko) * | 2019-12-12 | 2021-10-08 | 포항공과대학교 산학협력단 | Multi-object tracking method and apparatus |
CN116797971A (zh) * | 2019-12-31 | 2023-09-22 | 支付宝实验室(新加坡)有限公司 | Video stream recognition method and device |
CN111582036B (zh) * | 2020-04-09 | 2023-03-07 | 天津大学 | Cross-view person identification method based on shape and pose under wearable devices |
CN111553257A (zh) * | 2020-04-26 | 2020-08-18 | 上海天诚比集科技有限公司 | High-altitude falling object early-warning method |
CN111681264A (zh) * | 2020-06-05 | 2020-09-18 | 浙江新再灵科技股份有限公司 | Real-time multi-target tracking method for surveillance scenes |
CN112101223B (zh) * | 2020-09-16 | 2024-04-12 | 阿波罗智联(北京)科技有限公司 | Detection method, apparatus, device and computer storage medium |
CN112991280B (zh) * | 2021-03-03 | 2024-05-28 | 望知科技(深圳)有限公司 | Visual inspection method and system, and electronic device |
CN114025183B (zh) * | 2021-10-09 | 2024-05-14 | 浙江大华技术股份有限公司 | Live streaming method, apparatus, device, system and storage medium |
CN115937267B (zh) * | 2023-03-03 | 2023-10-24 | 北京灵赋生物科技有限公司 | Target trajectory tracking method based on multi-frame video |
CN116189116B (zh) * | 2023-04-24 | 2024-02-23 | 江西方兴科技股份有限公司 | Traffic state sensing method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105405154A (zh) * | 2014-09-04 | 2016-03-16 | 成都理想境界科技有限公司 | Target object tracking method based on color-structure features |
CN105678288A (zh) * | 2016-03-04 | 2016-06-15 | 北京邮电大学 | Target tracking method and device |
CN105872477A (zh) * | 2016-05-27 | 2016-08-17 | 北京旷视科技有限公司 | Video monitoring method and video monitoring system |
CN107025662A (zh) * | 2016-01-29 | 2017-08-08 | 成都理想境界科技有限公司 | Method, server, terminal and system for realizing augmented reality |
CN110264493A (zh) * | 2019-06-17 | 2019-09-20 | 北京影谱科技股份有限公司 | Method and device for tracking multiple target objects in motion state |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1757087A4 (en) * | 2004-04-16 | 2009-08-19 | James A Aman | AUTOMATIC VIDEO RECORDING OF EVENTS, PURSUIT AND CONTENT PRODUCTION SYSTEM |
CN102289948B (zh) * | 2011-09-02 | 2013-06-05 | 浙江大学 | Multi-feature-fusion multi-vehicle video tracking method for expressway scenes |
CN103281477B (zh) * | 2013-05-17 | 2016-05-11 | 天津大学 | Multi-target visual tracking method based on multi-level feature data association |
CN105760854B (zh) * | 2016-03-11 | 2019-07-26 | 联想(北京)有限公司 | Information processing method and electronic device |
CN106600631A (zh) * | 2016-11-30 | 2017-04-26 | 郑州金惠计算机系统工程有限公司 | Passenger flow statistics method based on multi-target tracking |
KR102022971B1 (ko) * | 2017-10-18 | 2019-09-19 | 한국전자통신연구원 | Method and apparatus for object processing in images |
-
2019
- 2019-06-17 CN CN201910522911.3A patent/CN110264493B/zh active Active
- 2019-09-27 US US17/620,119 patent/US20220215560A1/en not_active Abandoned
- 2019-09-27 WO PCT/CN2019/108432 patent/WO2020252974A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110264493B (zh) | 2021-06-18 |
CN110264493A (zh) | 2019-09-20 |
US20220215560A1 (en) | 2022-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020252974A1 (zh) | Method and device for tracking multiple target objects in motion state | |
WO2021043073A1 (zh) | Image-recognition-based urban pet activity trajectory monitoring method and related device | |
CN111462200B (zh) | Cross-video pedestrian positioning and tracking method, system and device | |
CN109145708B (zh) | People flow statistics method based on fusion of RGB and D information | |
US10417503B2 (en) | Image processing apparatus and image processing method | |
WO2018209934A1 (zh) | Cross-camera multi-target tracking method and device based on spatio-temporal constraints | |
WO2020094091A1 (zh) | Image capture method, monitoring camera and monitoring system | |
CN104598883B (zh) | Method for target re-identification in a multi-camera monitoring network | |
WO2020258978A1 (zh) | Object detection method and device | |
CN108197604A (zh) | Fast face positioning and tracking method based on embedded devices | |
WO2020094088A1 (zh) | Image capture method, monitoring camera and monitoring system | |
CN104978567B (zh) | Vehicle detection method based on scene classification | |
KR101781358B1 (ko) | Personal identification system and method using face recognition in digital images | |
WO2019001505A1 (zh) | Target feature extraction method, device and application system | |
CN106991370B (zh) | Pedestrian retrieval method based on color and depth | |
CN104361327A (zh) | Pedestrian detection method and system | |
KR20070016849A (ko) | Method and apparatus for performing preferred skin color conversion using face detection and skin region detection | |
CN109993086A (zh) | Face detection method, device, system and terminal equipment | |
CN106447701A (zh) | Method and device for image similarity determination, object detection and tracking | |
CN104573617A (zh) | Camera shooting control method | |
US9947106B2 (en) | Method and electronic device for object tracking in a light-field capture | |
Sokolova et al. | Human identification by gait from event-based camera | |
Chen et al. | Object tracking over a multiple-camera network | |
CN109001674B (zh) | Fast WiFi fingerprint information collection and positioning method based on continuous video sequences | |
TW201530495A (zh) | Moving object tracking method and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19933715 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19933715 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.03.2022) |