WO2021218671A1 - Target tracking method and device, storage medium and computer program - Google Patents
Target tracking method and device, storage medium and computer program
- Publication number
- WO2021218671A1 (PCT/CN2021/087870)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- scene
- feature
- scene image
- similarity
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30232—Surveillance
Definitions
- the present disclosure relates to the field of computer vision, and in particular to a target tracking method and device, storage medium and computer program.
- the demand for analyzing the movement trajectory of a target through multi-target tracking technology is increasing.
- the processing time of the above-mentioned multi-target tracking is linearly related to the number of targets in the scene. For example, if there are N targets in the scene, where N is a positive integer, then multi-target tracking requires N rounds of single-target tracking inference, and the processing time increases to N times the time required for single-target tracking. The larger the value of N, the longer multi-target tracking takes, which demands more computing power from the device and more processing time.
- the present disclosure provides a target tracking method and device, storage medium and computer program.
- a target tracking method comprising: acquiring multiple scene images corresponding to the same scene; performing feature extraction processing and target part detection on each of the multiple scene images, to obtain the feature information of each scene image and the positions of multiple target parts on each scene image; acquiring the target feature information respectively corresponding to the positions of the multiple target parts in the feature information of each scene image; and, according to the acquired target feature information corresponding to the positions of the multiple target parts, determining multiple identical targets appearing on the multiple scene images, wherein each scene image includes part or all of the multiple identical targets.
- the performing of feature extraction processing and target part detection on each of the plurality of scene images, to obtain the feature information of each scene image and the positions of the multiple target parts on it, includes: extracting a first feature map of each scene image in the multiple scene images; performing target part detection on the first feature map of each scene image to obtain the positions of multiple target parts on each scene image; and performing feature extraction processing on the first feature map of each scene image to obtain a multi-dimensional second feature map.
- the acquiring of the target feature information corresponding to the positions of the multiple target parts in the feature information of each scene image includes: acquiring target feature vectors respectively corresponding to the positions of the multiple target parts on the multi-dimensional second feature map.
- the determining of the multiple identical targets appearing on the multiple scene images according to the acquired target feature information includes: using the multiple pieces of target feature information corresponding to each two adjacent scene images among the multiple scene images, obtaining the similarity between the target parts on each two adjacent scene images; and, based on those similarities, determining the multiple identical targets appearing on the different scene images.
- each pair of two adjacent scene images consists of a first scene image and a second scene image; the use of the target feature information corresponding to each two adjacent scene images to obtain the similarity between the target parts on them includes: determining the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, where N and M are positive integers greater than or equal to 2; and, according to these similarities, obtaining a similarity matrix of N × M dimensions, where the value at any position in the similarity matrix represents the similarity between a first target part of the first scene image and a second target part of the second scene image.
- the determining of the multiple identical targets appearing on the different scene images based on the similarity between the target parts on each two adjacent scene images includes: according to the similarity matrix, determining the maximum similarity among the similarities between a first target feature vector in the N target feature vectors and the M target feature vectors; if the maximum similarity is greater than a preset threshold, determining, among the M target feature vectors, the second target feature vector corresponding to the maximum similarity; and regarding the target to which the first target part corresponding to the first target feature vector on the first scene image belongs and the target to which the second target part corresponding to the second target feature vector on the second scene image belongs as the same target.
- the performing of feature extraction processing and target part detection on each of the plurality of scene images, to obtain the feature information of each scene image and the positions of the multiple target parts on it, includes: extracting the first feature map of each of the multiple scene images through the backbone network of a feature detection model; performing target part detection on the first feature map of each scene image through the part detection branch of the feature detection model, to obtain the positions of multiple target parts on each scene image; and performing feature extraction processing on the first feature map of each scene image to obtain a multi-dimensional second feature map.
- the method further includes: inputting multiple sample scene images corresponding to the same scene into an initial neural network model, and obtaining the sample feature vectors corresponding to the positions of multiple target parts on each sample scene image output by the initial neural network model; determining a first similarity between the sample feature vectors corresponding to the positions of target parts with the same target identifier, and/or a second similarity between the sample feature vectors corresponding to the positions of target parts with different target identifiers; and, based on the target identifiers corresponding to the multiple target parts marked on each sample scene image, performing supervised training on the initial neural network model according to at least one of the first similarity and the second similarity, to obtain the feature detection model.
- the performing of supervised training on the initial neural network model, based on the target identifiers corresponding to the multiple target parts marked on each sample scene image and according to at least one of the first similarity and the second similarity, includes: using the difference between a first similarity reference value and the first similarity as a first loss function, where the first similarity reference value is the reference similarity between the sample feature vectors corresponding to target parts marked with the same target identifier on each two adjacent sample scene images; using the difference between a second similarity reference value and the second similarity as a second loss function, where the second similarity reference value is the reference similarity between the sample feature vectors corresponding to target parts marked with different target identifiers on each two adjacent sample scene images; and training the initial neural network model according to at least one of the first loss function and the second loss function, to obtain the feature detection model.
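The two losses above can be illustrated with a minimal NumPy sketch. The choice of cosine similarity, and the reference values of 1.0 for same-identifier pairs and 0.0 for different-identifier pairs, are assumptions for illustration; the disclosure only speaks of "similarity reference values" without fixing them.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pairwise_losses(feats_a, feats_b, ids_a, ids_b,
                    same_ref=1.0, diff_ref=0.0):
    """Sketch of the first/second loss terms over two adjacent sample
    scene images. feats_a/feats_b are per-part sample feature vectors,
    ids_a/ids_b the marked target identifiers. First loss: gap between
    the reference value and same-identifier similarities; second loss:
    gap between different-identifier similarities and their reference."""
    first, second = [], []
    for fa, ia in zip(feats_a, ids_a):
        for fb, ib in zip(feats_b, ids_b):
            s = cosine(fa, fb)
            if ia == ib:
                first.append(same_ref - s)       # pull same targets together
            else:
                second.append(abs(s - diff_ref))  # push different targets apart
    return (sum(first) / max(len(first), 1),
            sum(second) / max(len(second), 1))
```

In a real training loop these terms would be computed on the model's outputs and back-propagated; the sketch only shows how the reference values and the two pair groups interact.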
- the method further includes: determining whether the motion trajectory of at least one of the multiple identical targets appearing on the multiple scene images within a preset time period conforms to the target motion trajectory.
- the multiple scene images correspond to a classroom scene
- the target includes a teaching object
- the target motion track includes at least one motion track designated for the teaching object in a teaching task.
- a target tracking device, the device including: an acquisition module configured to acquire multiple scene images corresponding to the same scene; a module configured to perform feature extraction processing and target part detection on each of the multiple scene images, to obtain the feature information of each scene image and the positions of multiple target parts on each scene image; a feature information determination module configured to acquire the target feature information corresponding to the positions of the multiple target parts in the feature information of each scene image; and a target determination module configured to determine, according to the acquired target feature information corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images, wherein each scene image includes part or all of the multiple identical targets.
- a computer-readable storage medium storing a computer program, the computer program being used to execute the target tracking method described in any one of the first aspect.
- a target tracking device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to call the executable instructions stored in the memory to implement the target tracking method described in any one of the first aspect.
- a computer program is provided, wherein when the computer program is executed by a processor, the target tracking method described in any one of the first aspect can be implemented.
- instead of single-target tracking inference, a single frame inference is performed for each whole scene image to obtain the target feature information corresponding to the positions of multiple target parts, and the single-frame inference results are matched to obtain the multiple identical targets in each two adjacent scene images, achieving the purpose of multi-target tracking.
- even if the current scene contains multiple targets, because inference is performed on the entire scene image, the duration of the whole multi-target tracking process is independent of the number of targets in the scene image, and the tracking time does not grow with the number of targets as it would if single-target tracking inference were performed one by one, which greatly saves computing resources, shortens the duration of multi-target tracking, and effectively improves the detection efficiency of multi-target tracking.
- Fig. 1 is a flowchart of a target tracking method according to an exemplary embodiment of the present disclosure
- Fig. 2 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure
- Fig. 3 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
- Fig. 4 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
- Fig. 5 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
- Fig. 6 is a schematic structural diagram of a feature detection model according to an exemplary embodiment of the present disclosure.
- Fig. 7 is a schematic diagram of an inference process of multi-target tracking according to an exemplary embodiment of the present disclosure
- Fig. 8 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
- Fig. 9 is a schematic diagram of a feature detection model training scene according to an exemplary embodiment of the present disclosure.
- Fig. 10 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
- Fig. 11 is a block diagram showing a target tracking device according to an exemplary embodiment of the present disclosure.
- Fig. 12 is a schematic structural diagram of a target tracking device according to an exemplary embodiment of the present disclosure.
- the terms first, second, third, etc. may be used in this disclosure to describe various information, but the information should not be limited to these terms; these terms are only used to distinguish information of the same type from each other.
- for example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
- the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
- the embodiments of the present disclosure provide a multi-target tracking solution, which is exemplary and can be applied to terminal devices in different scenarios. Different scenarios include, but are not limited to, classrooms, locations where surveillance is deployed, or other indoor or outdoor scenarios that require tracking of multiple targets.
- the terminal device can be any device with a camera, or the terminal device can also be an external camera device.
- the terminal device may successively collect multiple scene images in the same scene, or may directly collect a video stream, and use the multiple images in the video stream as the multiple scene images.
- the terminal device performs feature extraction processing and target part detection on each of the multiple acquired scene images, and, based on the feature information of each scene image and the positions of the multiple target parts on each scene image, acquires the target feature information corresponding to those positions in the feature information of each scene image, so as to determine multiple identical targets appearing in the multiple scene images.
- the terminal equipment can be a teaching multimedia device with a camera deployed in the classroom, including but not limited to teaching projectors, monitoring equipment in the classroom, etc.
- the terminal device acquires multiple scene images in the classroom, thereby performing feature extraction processing and target part detection on each of the multiple scene images to obtain the feature information of each scene image and each scene The location of multiple target parts on the image.
- the target feature information corresponding to the positions of the multiple target parts in the feature information of each scene image is acquired, so as to determine the multiple identical targets appearing on the multiple scene images, achieving the purpose of multi-target tracking.
- the target in this scene may include but is not limited to teaching objects, such as students, and the target parts include but are not limited to human face parts and human body parts.
- one or more surveillance cameras can be deployed in a subway or railway station, and multiple scene images of the subway or railway station can be acquired through the surveillance cameras.
- Targets in this scenario may include passengers, luggage carried by passengers, staff, and so on.
- the multi-target tracking solution provided by the embodiments of the present disclosure can also be applied to cloud servers in different scenarios.
- the cloud server can be equipped with an external camera; the external camera can successively collect multiple scene images in the same scene, or directly collect a video stream and use multiple images in the video stream as the multiple scene images.
- the collected scene images can be sent to the cloud server through a router or gateway; the cloud server performs feature extraction processing and target part detection on each scene image to obtain the feature information of each scene image and the positions of the multiple target parts on it, acquires the target feature information corresponding to those positions in the feature information of each scene image, and then determines the multiple identical targets appearing on the multiple scene images.
- the external camera is set in the classroom, and the external camera collects multiple scene images in the classroom, and sends them to the cloud server through the router or gateway, and the cloud server executes the above-mentioned target tracking method.
- the same target is identified with the same identification frame and the identified scene image is output.
- a red marking box is used to identify target 1 in the scene
- a green marking box is used to identify target 2 in the scene
- a blue marking box is used to identify target 3 in the scene, and so on, in order to better show the multiple identical targets in the current scene.
- the same or different targets can be distinguished by the target identifiers corresponding to the marking boxes.
- an output scene image includes 3 marking boxes, and the corresponding target identifiers are 1, 2 and 3 respectively.
- the adjacent scene image includes two marking boxes whose corresponding target identifiers are 1 and 3; it can then be determined that the marking boxes with target identifier 1 on the two scene images correspond to the same target, that the marking boxes with target identifier 3 also correspond to the same target, and that the boxes with target identifiers 1 and 3 correspond to different targets.
- the target motion trajectory may include but is not limited to at least one motion trajectory specified for the teaching object in the teaching task, such as moving from the current position to another position designated by the teacher.
- the other positions may be the podium, the blackboard, or the positions of other students; alternatively, the target motion trajectory may also consist of staying in the same position. Teachers can better perform teaching work according to the movement trajectories of multiple teaching objects.
- the target includes but is not limited to passengers.
- the target motion trajectory can include but is not limited to a designated dangerous motion trajectory or an illegal motion trajectory, such as moving from a position on the platform to the position of the rails, or passing over or under the gates, etc.
- the staff can better manage the station according to the movement trajectory of the passengers to avoid dangerous behaviors or fare evasion.
- Fig. 1 shows a target tracking method according to an exemplary embodiment, which includes the following steps:
- step 101 multiple scene images corresponding to the same scene are acquired.
- multiple scene images can be collected in the same scene one after another, or a video stream can be collected, and multiple images in the video stream can be used as multiple scene images.
- the scenarios of the present disclosure include, but are not limited to, any scenarios that require multi-target tracking, such as classrooms, locations where monitoring is arranged, and so on.
- step 102: feature extraction processing and target part detection are performed on each of the multiple scene images, to obtain the feature information of each scene image and the positions of multiple target parts on each scene image.
- performing feature extraction on each scene image refers to extracting feature information from each scene image
- the feature information may include, but is not limited to, color features, texture features, shape features, and the like.
- Color feature is a kind of global feature, which describes the surface color attribute of the object corresponding to the image
- texture feature is also a kind of global feature, which describes the surface texture attribute of the object corresponding to the image
- there are two types of representation methods for shape features: one is the contour feature, and the other is the regional feature.
- the contour feature of the image is mainly for the outer boundary of the object, and the regional feature of the image is related to the shape of the image area.
- generally, one target part corresponds to one target, but this is not restrictive; multiple target parts may also correspond to one target.
- the target part may include, but is not limited to, a human face part and/or a human body part.
- the human body part may include the entire human body of the person or a certain designated part of the human body, such as hands, legs, and so on.
- the position of the target part can be represented by at least the center position of the identification frame of the target part.
- the target part includes a face part, and the position of the target part can be represented by the center position of the face identification frame.
- the marking frame of the target part can be realized as a rectangular frame circumscribing the target part, and so on.
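As a small illustration of the convention above, the position of a target part can be taken as the center of its circumscribing rectangular marking frame. The `(x1, y1, x2, y2)` corner layout is an assumption for the sketch; the disclosure does not specify a box encoding.

```python
def part_position(box):
    """Position of a target part, taken as the center of its
    circumscribing rectangular marking frame (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
```

For example, a face marking frame spanning (10, 20) to (30, 60) yields the part position (20.0, 40.0).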
- step 103 target feature information corresponding to the positions of the multiple target parts in the feature information of each scene image is obtained.
- each scene image includes multiple target parts; according to the acquired feature information of each scene image, feature extraction is performed on the pixels of the regions containing the target parts, and the target feature information respectively corresponding to the positions of the multiple target parts is determined.
- the target characteristic information corresponding to the multiple pixels included in the region of each target part in the characteristic information of each scene image may be obtained through convolution processing or the like.
- step 104: according to the acquired target feature information corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images are determined, wherein each scene image includes part or all of the multiple identical targets.
- the target feature information corresponding to the positions of multiple target parts is obtained on each scene image, and by matching the target feature information of the multiple scene images, it can be determined that it appears in the multiple scenes. Multiple identical targets on the image.
- instead of single-target tracking inference, a single frame inference is performed on a single scene image, and the target feature information corresponding to the positions of multiple target parts is obtained.
- by matching these inference results, the multiple identical targets in each two adjacent scene images are obtained, achieving the purpose of multi-target tracking.
- the duration of the entire multi-target tracking process has nothing to do with the number of targets included in the scene image.
- tracking time no longer grows with the number of targets as it would with one-by-one single-target tracking inference, which greatly saves computing resources, shortens the duration of multi-target tracking, and effectively improves the detection efficiency of multi-target tracking.
- step 102 may include:
- step 102-1 the first feature map of each scene image in the plurality of scene images is extracted.
- the image feature of each scene image can be extracted through a pre-trained neural network model to obtain the first feature map.
- the neural network model can adopt, but is not limited to, models such as Visual Geometry Group Network (VGG Net).
- step 102-2 target part detection is performed on the first feature map of each scene image to obtain the positions of multiple target parts on each scene image; and, for each scene image Perform feature extraction processing on the first feature map to obtain a multi-dimensional second feature map.
- the target part may include a human face part and/or a human body part.
- for example, a Region Proposal Network (RPN) can be used
- face parts and/or human body parts can be detected on the first feature map of each scene image, to determine the face area corresponding to the face part and/or the human body area corresponding to the human body part.
- the face area can be identified by the face recognition frame
- the human body area can be identified by the human body recognition frame.
- the center position of the face recognition frame may be used as the position of the face part.
- the center position of the human body identification frame can be regarded as the position of the human body part.
- the size of the second feature map may be the same as the size of the first feature map, and the dimension value of the second feature map is the preset number of channels corresponding to each scene image.
- step 103 may include:
- the target feature information is used to represent the feature information corresponding to multiple pixels in each of the multiple target part regions included in the second feature map of any one dimension.
- the target part may include a human face part and/or a human body part.
- the feature information corresponding to any one pixel can constitute a one-dimensional feature vector.
- one or more feature vectors can be selected from these feature vectors to represent the feature information of the region of the target part, that is, the target feature information.
- the feature vector corresponding to the pixel of the position of the target part can be selected, and the feature vector can be used as the target feature vector corresponding to the position of the target part on the second feature map of the dimension.
- the position of the target part may include the center position of the face recognition frame and/or the center position of the human body recognition frame.
- the feature information corresponding to the pixels at the positions of the multiple target parts can be obtained, yielding the target feature vectors respectively corresponding to the positions of the multiple target parts.
- target feature vectors corresponding to the positions of multiple target parts can be obtained, so that the dimensional value of the target feature vector is the same as the dimensional value of the second feature map. For example, if the dimension value of the second feature map is C, then the dimension value of the target feature vector is also C.
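As an illustration of this step, assuming the second feature map is a `(C, H, W)` array and part positions are given as `(x, y)` coordinates in the feature map's own coordinate system (both assumptions of the sketch, not specifics of the disclosure), the C-dimensional target feature vectors can be read out at the part positions:

```python
import numpy as np

def target_feature_vectors(second_feature_map, positions):
    """second_feature_map: (C, H, W) array; positions: list of (x, y)
    part centers in feature-map coordinates. Returns one C-dimensional
    vector per position, matching the dimension value C of the map."""
    C, H, W = second_feature_map.shape
    return [second_feature_map[:, int(round(y)), int(round(x))]
            for (x, y) in positions]
```

Each returned vector has the same dimension value C as the second feature map, as described above.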
- feature extraction, target part detection, and acquisition of the target feature vectors corresponding to the positions of multiple target parts are performed sequentially on the entire scene image.
- the entire process is a single frame inference for a single scene image, so it is independent of the number of targets included; subsequent matching is performed on the target feature vectors corresponding to the multiple target positions on every two adjacent scene images, so there is no need to perform single-target tracking inference separately, and even if the scene image contains more targets, the matching process can still be completed in one pass.
- the target tracking method of the present disclosure has nothing to do with the number of targets in the scene image, and will not increase the tracking time due to the increase in the number of targets, which greatly saves computing resources, shortens the duration of multi-target tracking, and effectively improves multi-target tracking The detection efficiency.
- step 104 may include:
- step 104-1: the multiple pieces of target feature information corresponding to each two adjacent scene images among the multiple scene images are used to obtain the similarity between the target parts on each two adjacent scene images.
- multiple pieces of target feature information corresponding to the multiple target parts in the feature information of each scene image have been determined; similarity calculation can be performed using the multiple pieces of target feature information corresponding to each two adjacent scene images, to obtain the similarity between the target parts on each two adjacent scene images.
- step 104-2 based on the similarity between the respective target parts on each of the two adjacent scene images, multiple identical targets appearing on the different scene images are determined.
- the target to which the target part with the greatest similarity belongs can be regarded as the same target appearing on different scene images.
- multiple identical targets appearing on different scene images can be determined according to the similarity between the target parts on each two adjacent scene images, achieving the purpose of multi-target tracking; the tracking process is independent of the number of targets, and the usability is high.
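The max-similarity rule above (take the target part with the greatest similarity, and only accept it when it exceeds a preset threshold) can be sketched as follows. The threshold value 0.5 is a hypothetical example; the disclosure only speaks of "a preset threshold".

```python
def greedy_match(sim, threshold=0.5):
    """sim: N x M similarity values (list of rows). For each row
    (a target part on the first image) pick the column with the
    maximum similarity; pairs above the threshold are declared the
    same target, per the matching rule described above."""
    matches = []
    for i, row in enumerate(sim):
        j = max(range(len(row)), key=lambda k: row[k])
        if row[j] > threshold:
            matches.append((i, j))  # part i on image T0 == part j on T1
    return matches
```

Parts whose best similarity falls below the threshold stay unmatched, which is how a target leaving the scene between two adjacent images would be handled under this rule.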
- every two adjacent scene images are the first scene image T0 and the second scene image T1.
- the foregoing step 104-1 may include:
- step 104-11 the similarity between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image is determined.
- the target feature information is used to represent the feature information corresponding to multiple pixels in each of the multiple target part regions included in the second feature map of any one dimension.
- the target part may include a human face part and/or a human body part.
- the feature information corresponding to any one pixel can constitute a one-dimensional feature vector.
- one or more feature vectors can be selected from these feature vectors to represent the feature information of the region of the target part.
- the feature vector corresponding to the pixel of the position of the target part can be selected, and the feature vector can be used as the target feature vector corresponding to the position of the target part on the second feature map of the dimension.
- the position of the target part may include the center position of the face recognition frame and/or the center position of the human body recognition frame.
- the similarity between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image in every two adjacent scene images can be determined, where, N and M are positive integers greater than or equal to 2. That is, the similarity between the multiple target feature vectors on the first scene image and the multiple target feature vectors on the second scene image is determined.
- the cosine similarity value between the target feature vectors can be determined to evaluate the similarity between them.
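The patent gives no source code; a minimal, illustrative sketch of the cosine-similarity measure described above (the function name and the small stabilizing constant are assumptions, not from the source):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two C-dimensional target feature vectors.

    The 1e-12 term only guards against division by zero and is an
    implementation assumption, not part of the patent's description.
    """
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
```

Identical vectors score close to 1 and orthogonal vectors close to 0, which matches the reference values used later in the training description.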
- step 104-12 according to the similarity between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, a similarity matrix of N×M dimensions is obtained.
- the value of any dimension in the similarity matrix represents the similarity between any first target part in the first scene image and any second target part in the second scene image.
- N and M can be equal or unequal.
- the similarity between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image can be determined to obtain a similarity matrix of N×M dimensions, where the value of any dimension in the similarity matrix represents the similarity between any first target part in the first scene image and any second target part in the second scene image; this is easy to implement and has high usability.
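The N×M similarity matrix described above can be sketched as follows, assuming the N and M target feature vectors are stacked into NumPy arrays (all names are illustrative):

```python
import numpy as np

def similarity_matrix(feats_a: np.ndarray, feats_b: np.ndarray) -> np.ndarray:
    """Cosine-similarity matrix between two sets of target feature vectors.

    feats_a: (N, C) target feature vectors from the first scene image.
    feats_b: (M, C) target feature vectors from the second scene image.
    Returns an (N, M) matrix A with A[i, j] = cos(feats_a[i], feats_b[j]).
    """
    # Normalize each row, then a single matrix product yields all cosines.
    a = feats_a / (np.linalg.norm(feats_a, axis=1, keepdims=True) + 1e-12)
    b = feats_b / (np.linalg.norm(feats_b, axis=1, keepdims=True) + 1e-12)
    return a @ b.T
```

Entry (i, j) then plays the role of a_ij in the matrix A discussed below.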
- for step 104-2, a bipartite graph algorithm may be used: under the condition that the spatial distance constraint is satisfied, multiple identical targets appearing on the different scene images are determined based on the similarity between the target parts on every two adjacent scene images.
- the bipartite graph algorithm operates on a bipartite graph: assume the left vertices are X and the right vertices are Y; for each edge X_iY_j connecting a left vertex and a right vertex with weight w_ij, find a matching such that the sum of all w_ij is the largest. Here, X_i corresponds to one of the N target feature vectors on the first scene image, Y_j corresponds to one of the M target feature vectors on the second scene image, and the weight w_ij corresponds to their similarity.
- the present disclosure matches the N target feature vectors with the M target feature vectors so that the total similarity is maximized, and finally multiple identical targets appearing in every two adjacent scene images can be determined.
- the conditions for satisfying the spatial distance constraint include: the dimension of the similarity between the N target feature vectors and the M target feature vectors does not exceed N ⁇ M.
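As one possible, non-authoritative realization of the bipartite matching just described, a brute-force matcher that maximizes the total similarity (assuming N ≤ M; the patent does not specify a particular solver, and in practice a Hungarian-algorithm implementation would be used instead):

```python
from itertools import permutations

def match_targets(sim):
    """Exhaustive maximum-weight bipartite matching (sketch, assumes N <= M).

    sim[i][j] is the similarity between target part i on the first scene
    image and target part j on the second.  Returns the (i, j) pairs whose
    total similarity is maximal; a Hungarian-algorithm solver would replace
    this exhaustive search for larger inputs.
    """
    n, m = len(sim), len(sim[0])
    best, best_pairs = float("-inf"), []
    for cols in permutations(range(m), n):
        pairs = list(zip(range(n), cols))
        total = sum(sim[i][j] for i, j in pairs)
        if total > best:
            best, best_pairs = total, pairs
    return best_pairs
```

Each returned pair (i, j) identifies a first-image target part and the second-image target part assigned to the same target.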
- step 104-2 may include:
- step 104-21 according to the similarity matrix, the maximum similarity is determined among the similarities between the first target feature vector of the N target feature vectors and each of the M target feature vectors.
- the first target feature vector is any one of the N target feature vectors determined on the first scene image. According to the similarity matrix, the similarity between the first target feature vector and each target feature vector on the second scene image can be obtained, and a maximum similarity can be determined among these similarities.
- for example, suppose the similarity matrix is A, and the similarities between the first target feature vector and the M target feature vectors are a_11, a_12, and a_13 respectively; the maximum value among them can be determined, which is assumed here to be a_11.
- step 104-22 if the maximum similarity is greater than a preset threshold, a second target feature vector corresponding to the maximum similarity is determined among the M target feature vectors.
- the second target feature vector is the target feature vector corresponding to the maximum similarity among the M target feature vectors included in the second scene image.
- step 104-23 the target to which the first target part corresponding to the first target feature vector on the first scene image belongs and the target to which the second target part corresponding to the second target feature vector on the second scene image belongs are regarded as the same target.
- otherwise, if the maximum similarity is not greater than the preset threshold, the target to which the first target part corresponding to the first target feature vector on the first scene image belongs has no identical target on the second scene image.
- the above process is repeated N times, once for each of the N target feature vectors included in the first scene image, and finally all the identical targets appearing on the first scene image and the second scene image can be determined.
- according to the similarity matrix, the two target parts with the greatest similarity on every two adjacent scene images can be taken as belonging to the same target, which achieves the purpose of multi-target tracking and has high availability.
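Steps 104-21 to 104-23 can be sketched as a greedy per-row match. The threshold value and the rule that each column is matched at most once are illustrative assumptions, not taken from the source:

```python
import numpy as np

def greedy_match(sim: np.ndarray, threshold: float = 0.5):
    """Greedy per-row matching following steps 104-21 to 104-23 (sketch).

    For each of the N first-image target feature vectors, take the column
    with the maximum similarity; accept the pair only if the similarity
    exceeds the preset threshold, otherwise the target has no identical
    target on the second scene image.  The default threshold is assumed.
    """
    matches = {}          # row index -> column index, or None if unmatched
    used_cols = set()
    for i, row in enumerate(sim):
        j = int(np.argmax(row))
        if row[j] > threshold and j not in used_cols:
            matches[i] = j
            used_cols.add(j)
        else:
            matches[i] = None
    return matches
```

A `None` entry corresponds to the case in which the maximum similarity does not exceed the preset threshold.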
- at least two of the multiple scene images may be input to a pre-trained feature detection model, and the feature detection model can perform feature extraction processing and target part detection on each of the scene images to obtain the feature information of each scene image and the positions of multiple target parts on each scene image, and, based on the positions of the multiple target parts on each scene image, acquire the multiple pieces of target feature information corresponding to the multiple target parts in the feature information of each scene image.
- the structure of the feature detection model is shown in Figure 6, where multiple scene images are input into the feature detection model.
- the feature detection model first extracts features from each of the multiple scene images through the backbone network to obtain the first feature map of each scene image.
- the part detection branch of the feature detection model performs target part detection on the first feature map of each scene image to obtain the positions of multiple target parts on each scene image; and
- feature extraction processing is performed on the first feature map of each scene image to obtain a multi-dimensional second feature map.
- the target may include a person
- the target part may include a face part and/or a body part.
- the feature extraction branch may be formed by one or more convolutional layers connected in series.
- the size of the second feature map is the same as that of the first feature map, so that the positions of multiple target parts on the second feature map of each dimension are the same.
- the dimension value of the second feature map is the same as the number of preset channels corresponding to each scene image.
- the position of the target part may be represented by the center position of the face recognition frame and/or the center position of the human body recognition frame.
- the dimension value of the target feature vector is the same as the dimension value of the second feature map.
- the size of the second feature map obtained by the feature extraction branch is the same as the size of the first feature map, both being H×W, where H and W are the height and width of the image respectively.
- the dimensionality value of the second feature map is C, and C is the preset number of channels corresponding to each scene image. On each channel, the target feature vector corresponding to the center position (x, y) of the face recognition frame can be obtained. Therefore, the dimension value of the target feature vector is C.
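Reading the C-dimensional target feature vector at each detected center position (x, y) of a C×H×W second feature map might look like the following sketch (function and parameter names are assumptions):

```python
import numpy as np

def gather_target_vectors(feature_map: np.ndarray, centers):
    """Read one C-dimensional target feature vector per detected part.

    feature_map: (C, H, W) second feature map of one scene image.
    centers: iterable of (x, y) integer center positions of the
             face/body recognition frames.
    Returns an (N, C) array, one feature vector per target part, taken
    across all C channels at each center position.
    """
    return np.stack([feature_map[:, y, x] for (x, y) in centers])
```

This mirrors the statement above that the dimension value of each target feature vector equals the channel count C of the second feature map.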
- in this way, the similarity between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image can be determined to obtain the similarity matrix, and, according to the similarity matrix, multiple identical targets appearing on the different scene images are determined.
- the determination method is the same as that of step 104-2, and will not be repeated here.
- the first scene image and the second scene image are respectively input into the above feature detection model, and N target feature vectors and M target feature vectors can be obtained respectively.
- a bipartite graph algorithm may be used to match the extracted features of the target part under the condition of satisfying the spatial distance constraint, so as to determine the same targets appearing in T0 and T1.
- the method may further include:
- step 100-1 input multiple sample scene images corresponding to the same scene into the initial neural network model, and obtain sample feature vectors corresponding to the positions of multiple target parts on each sample scene image output by the initial neural network model.
- multiple existing sample scene images corresponding to the same scene are used as the input of the initial neural network model; the sample scene images include multiple identical targets and different targets.
- the structure of the initial neural network model may also be as shown in FIG. 6, including a backbone network, a location detection branch, and a feature extraction branch.
- the input value includes multiple sample scene images
- sample feature vectors corresponding to the positions of multiple target parts on each sample scene image can be obtained.
- step 100-2 according to the target identifiers corresponding to the multiple target parts marked on each sample scene image, determine, on every two adjacent sample scene images, the first similarity between the sample feature vectors corresponding to the positions of the target parts with the same target identifier, and/or the second similarity between the sample feature vectors corresponding to the positions of the target parts with different target identifiers.
- that is, the first similarity between the sample feature vectors corresponding to the positions of the target parts with the same target identifier on every two adjacent sample scene images, and/or the second similarity between the sample feature vectors corresponding to the positions of the target parts with different target identifiers, can be determined.
- the first similarity value and the second similarity value can be obtained according to the cosine similarity value between the sample feature vectors.
- step 100-3 based on the target identifiers corresponding to the multiple target parts marked on each sample scene image, according to at least one of the first similarity and the second similarity, Supervised training is performed on the initial neural network model to obtain the feature detection model.
- the loss function can be determined by increasing the first similarity value and reducing the second similarity value, for example, as shown in FIG. 9. Based on the target identifiers corresponding to the multiple target positions on each of the two adjacent sample scene images, the network parameters of the preset model are adjusted according to the determined loss function, and the feature detection model is obtained after the supervision training is completed.
- the initial neural network model is supervised and trained based on the target identifiers corresponding to the multiple target parts marked on each sample scene image to obtain the feature detection model, which improves the detection performance and generalization performance of the feature detection model.
- the difference between the first similarity reference value and the first similarity may be used as the first loss function.
- the first similarity reference value is the similarity reference value between the sample feature vectors corresponding to the target parts of the same target identifier marked on each of the two sample scene images.
- the first similarity reference value is the cosine similarity value between the sample feature vectors, and the value may be 1.
- training continues until the first loss function is minimized or the preset number of training iterations is reached, and the feature detection model is obtained.
- the difference between the second similarity reference value and the second similarity may be used as the second loss function.
- the second similarity reference value is the similarity reference value between sample feature vectors corresponding to target parts of different target identifiers marked on each of the two sample scene images.
- the second similarity reference value is the cosine similarity value between the sample feature vectors, and the value may be zero.
- training continues until the second loss function is minimized or the preset number of training iterations is reached, and the feature detection model is obtained.
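A per-pair loss following the description above, with reference value 1 for target parts sharing a target identifier and 0 for different identifiers; taking the absolute value of the difference is an assumption about how "the difference" between the reference value and the similarity is measured:

```python
import numpy as np

def pair_loss(v1: np.ndarray, v2: np.ndarray, same_target: bool) -> float:
    """Per-pair training loss sketch for the supervised training step.

    The cosine similarity between two sample feature vectors is compared
    against reference value 1 when the two target parts carry the same
    target identifier (first loss) and 0 when they differ (second loss).
    """
    cos = float(np.dot(v1, v2) /
                (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))
    reference = 1.0 if same_target else 0.0
    return abs(reference - cos)   # |reference value - similarity|
```

Minimizing this loss pushes same-identifier pairs toward similarity 1 and different-identifier pairs toward similarity 0, as described for the first and second loss functions.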
- the method may further include:
- step 105 it is determined whether the motion trajectory of at least one of the multiple identical targets appearing on the multiple scene images within a preset time period conforms to the target motion trajectory.
- multiple scene images correspond to a classroom scene
- the target includes a teaching object
- the target motion track includes at least one motion track designated for the teaching object in a teaching task.
- the at least one motion trajectory specified for the teaching object in the teaching task includes, but is not limited to, walking from the current position to another position designated by the teacher, where the other position may be the position of the podium, the blackboard, or another student; the target motion trajectory may also include staying at the current position without moving.
- teaching multimedia equipment with cameras deployed in the classroom can be used to sequentially collect multiple scene images in the classroom.
- the motion track of at least one teaching object included in the classroom scene image is determined, and the teaching object may be a student.
- it can then be determined whether the movement trajectory of each teaching object, for example each student, conforms to at least one motion trajectory specified for the teaching object in the teaching task; for example, whether the student moves from the current position to the blackboard or to another student's position according to the teacher's instruction, or always stays in the same position without moving, for example, always sitting in his or her own seat and listening to the lecture.
- the above results can be displayed through teaching multimedia equipment, so that teachers can better carry out teaching tasks.
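A simple, hypothetical conformance check for the trajectory comparison described above. Point-wise comparison within a tolerance is an assumption; the patent does not specify the matching rule, and all names here are illustrative:

```python
def trajectory_conforms(track, allowed_trajectories, tol=1.0):
    """Check whether an observed per-frame track matches any specified
    target motion trajectory (e.g. current seat -> blackboard), point by
    point within a tolerance; "no movement" is simply a trajectory whose
    points all coincide.  The tolerance value is an assumption.
    """
    def close(p, q):
        return abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol

    for traj in allowed_trajectories:
        if len(traj) == len(track) and all(
                close(p, q) for p, q in zip(track, traj)):
            return True
    return False
```

The boolean result could then be surfaced on the teaching multimedia equipment as described above.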
- the present disclosure also provides an embodiment of the device.
- FIG. 11 is a block diagram of a target tracking device according to an exemplary embodiment of the present disclosure.
- the device includes: an acquisition module 210 for acquiring multiple scene images corresponding to the same scene; and a processing module 220 for performing feature extraction processing and target part detection on each scene image of the multiple scene images to obtain feature information of each scene image and the positions of multiple target parts on each scene image;
- the information determining module 230 is used to obtain the target feature information corresponding to the positions of the multiple target parts in the characteristic information of each scene image;
- the target determining module 240 is used to determine, according to the acquired target feature information respectively corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images, wherein each scene image includes part or all of the multiple identical targets.
- the processing module includes: a first processing sub-module for extracting a first feature map of each scene image in the plurality of scene images; a second processing sub-module for Perform target part detection on the first feature map of each scene image to obtain the positions of multiple target parts on each scene image; and perform feature extraction on the first feature map of each scene image Processing to obtain a multi-dimensional second feature map;
- the feature information determining module includes: a feature vector determining sub-module for obtaining information corresponding to the positions of the multiple target parts on the multi-dimensional second feature map Multiple target feature vectors.
- the target determination module includes: a similarity determination sub-module, configured to use the multiple pieces of target feature information respectively corresponding to every two adjacent scene images of the multiple scene images to obtain the similarity between the target parts on every two adjacent scene images; and a target determination sub-module, configured to determine, based on the similarity between the target parts on every two adjacent scene images, multiple identical targets appearing on different scene images.
- each of the two adjacent scene images is a first scene image and a second scene image;
- the similarity determination sub-module is configured to: determine the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, where N and M are positive integers greater than or equal to 2; and obtain, according to the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, a similarity matrix of N×M dimensions, where the value of any dimension in the similarity matrix represents the similarity between any first target part in the first scene image and any second target part in the second scene image.
- the target determination sub-module is configured to: determine, according to the similarity matrix, the maximum similarity among the similarities between the first target feature vector of the N target feature vectors and the M target feature vectors; if the maximum similarity is greater than the preset threshold, determine, among the M target feature vectors, the second target feature vector corresponding to the maximum similarity; and regard the target to which the first target part corresponding to the first target feature vector on the first scene image belongs and the target to which the second target part corresponding to the second target feature vector on the second scene image belongs as the same target.
- the processing module includes: a third processing sub-module, configured to extract the first feature map of each of the multiple scene images through the backbone network of the feature detection model; and fourth The processing sub-module is used to detect the target part on the first feature map of each scene image through the part detection branch of the feature detection model to obtain the positions of multiple target parts on each scene image And, through the feature extraction branch of the feature detection model, feature extraction processing is performed on the first feature map of each scene image to obtain a multi-dimensional second feature map.
- the device further includes: a feature vector determining module, configured to input multiple sample scene images corresponding to the same scene into a preset model and obtain multiple feature vectors respectively corresponding to the positions of multiple target parts on each sample scene image output by the preset model; a similarity determination module, configured to determine, according to the target identifiers respectively corresponding to the multiple target parts marked on every two adjacent sample scene images, the first similarity between the sample feature vectors corresponding to the positions of the target parts with the same target identifier on the two sample scene images, and/or the second similarity between the sample feature vectors corresponding to the positions of the target parts with different target identifiers on every two adjacent sample scene images; and a training module, configured to perform supervised training on the preset model based on the target identifiers respectively corresponding to the multiple target parts marked on every two adjacent sample scene images and according to at least one of the first similarity and the second similarity, to obtain the feature detection model.
- the difference between the first similarity reference value and the first similarity is taken as the first loss function, where the first similarity reference value is the similarity reference value between the sample feature vectors corresponding to the target parts with the same target identifier marked on every two adjacent sample scene images; the difference between the second similarity reference value and the second similarity is taken as the second loss function, where the second similarity reference value is the similarity reference value between the sample feature vectors corresponding to the target parts with different target identifiers marked on every two adjacent sample scene images; and the initial neural network model is trained according to at least one of the first loss function and the second loss function to obtain the feature detection model.
- the device further includes: a motion trajectory determining module, configured to determine whether the motion trajectory, within a preset time period, of at least one target among the multiple identical targets appearing on the multiple scene images conforms to the target motion trajectory.
- the multiple scene images correspond to a classroom scene
- the target includes a teaching object
- the target motion track includes at least one motion track designated for the teaching object in a teaching task.
- for parts not described in detail, reference can be made to the description of the method embodiments.
- the device embodiments described above are merely illustrative, where the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or they may be distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the present disclosure. Those of ordinary skill in the art can understand and implement them without creative work.
- the embodiment of the present disclosure also provides a computer-readable storage medium, the storage medium stores a computer program, and the computer program is used to execute any of the target tracking methods described above.
- the embodiments of the present disclosure provide a computer program product, including computer-readable code; when the computer-readable code runs on a device, the processor in the device executes instructions for implementing the target tracking method provided in any of the above embodiments.
- the embodiments of the present disclosure also provide another computer program product for storing computer-readable instructions, which when executed, cause the computer to perform the operations of the target tracking method provided by any of the foregoing embodiments.
- the computer program product can be specifically implemented by hardware, software, or a combination thereof.
- the computer program product is specifically embodied as a computer storage medium.
- the computer program product is specifically embodied as a software product, such as a software development kit (SDK).
- the embodiments of the present disclosure provide a computer program, wherein when the computer program is executed, the computer executes the operation of the target tracking method provided in any of the foregoing embodiments.
- the embodiment of the present disclosure also provides a target tracking device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to call the executable instructions stored in the memory to implement any of the foregoing target tracking methods.
- FIG. 12 is a schematic diagram of the hardware structure of a target tracking device provided by an embodiment of the disclosure.
- the target tracking device 310 includes a processor 311, and may also include an input device 312, an output device 313, and a memory 314.
- the input device 312, the output device 313, the memory 314, and the processor 311 are connected to each other through a bus.
- the memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used to store related instructions and data.
- the input device is used to input data and/or signals
- the output device is used to output data and/or signals.
- the output device and the input device can be independent devices or a whole device.
- the processor may include one or more processors, such as one or more central processing units (CPU).
- the CPU may be a single-core CPU or a multi-core CPU.
- the memory is used to store the program code and data of the network device.
- the processor is used to call the program code and data in the memory to execute the steps in the foregoing method embodiment.
- for details, please refer to the description in the method embodiment, which will not be repeated here.
- FIG. 12 only shows a simplified design of a target tracking device.
- the target tracking device may also include other necessary components, including but not limited to any number of input/output devices, processors, controllers, memories, etc., and all target tracking devices that can implement the embodiments of the present disclosure are within the protection scope of the present disclosure.
Claims (14)
- A target tracking method, characterized by comprising: acquiring multiple scene images corresponding to the same scene; performing feature extraction processing and target part detection on each of the multiple scene images to obtain feature information of each scene image and the positions of multiple target parts on each scene image; acquiring target feature information respectively corresponding to the positions of the multiple target parts in the feature information of each scene image; and determining, according to the acquired target feature information respectively corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images, wherein each scene image includes part or all of the multiple identical targets.
- The method according to claim 1, characterized in that performing feature extraction processing and target part detection on each of the multiple scene images to obtain the feature information of each scene image and the positions of multiple target parts on each scene image comprises: extracting a first feature map of each of the multiple scene images; performing target part detection on the first feature map of each scene image to obtain the positions of multiple target parts on each scene image; and performing feature extraction processing on the first feature map of each scene image to obtain a multi-dimensional second feature map; and acquiring the target feature information respectively corresponding to the positions of the multiple target parts in the feature information of each scene image comprises: acquiring, on the multi-dimensional second feature map, target feature vectors respectively corresponding to the positions of the multiple target parts.
- The method according to claim 1 or 2, characterized in that determining, according to the acquired target feature information respectively corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images comprises: obtaining, by using multiple pieces of target feature information respectively corresponding to every two adjacent scene images of the multiple scene images, the similarity between the target parts on every two adjacent scene images; and determining, based on the similarity between the target parts on every two adjacent scene images, multiple identical targets appearing on different scene images.
- The method according to claim 3, characterized in that every two adjacent scene images are a first scene image and a second scene image; and obtaining, by using the multiple pieces of target feature information respectively corresponding to every two adjacent scene images of the multiple scene images, the similarity between the target parts on every two adjacent scene images comprises: determining the similarities between N target feature vectors on the first scene image and M target feature vectors on the second scene image, where N and M are positive integers greater than or equal to 2; and obtaining, according to the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, a similarity matrix of N×M dimensions, where the value of any dimension in the similarity matrix represents the similarity between any first target part in the first scene image and any second target part in the second scene image.
- The method according to claim 4, characterized in that determining, based on the similarity between the target parts on every two adjacent scene images, multiple identical targets appearing on the different scene images comprises: determining, according to the similarity matrix, the maximum similarity among the similarities between a first target feature vector of the N target feature vectors and the M target feature vectors; if the maximum similarity is greater than a preset threshold, determining, among the M target feature vectors, a second target feature vector corresponding to the maximum similarity; and regarding the target to which the first target part corresponding to the first target feature vector on the first scene image belongs and the target to which the second target part corresponding to the second target feature vector on the second scene image belongs as the same target.
- The method according to any one of claims 1 to 5, characterized in that performing feature extraction processing and target part detection on each of the multiple scene images to obtain the feature information of each scene image and the positions of multiple target parts on each scene image comprises: extracting a first feature map of each of the multiple scene images through a backbone network of a feature detection model; performing, through a part detection branch of the feature detection model, target part detection on the first feature map of each scene image to obtain the positions of multiple target parts on each scene image; and performing, through a feature extraction branch of the feature detection model, feature extraction processing on the first feature map of each scene image to obtain a multi-dimensional second feature map.
- The method according to claim 6, characterized in that the method further comprises: inputting multiple sample scene images corresponding to the same scene into an initial neural network model to obtain sample feature vectors respectively corresponding to the positions of multiple target parts on each sample scene image output by the initial neural network model; determining, according to target identifiers respectively corresponding to the multiple target parts marked on each sample scene image, on every two adjacent sample scene images, a first similarity between the sample feature vectors corresponding to the positions of the target parts with the same target identifier, and/or a second similarity between the sample feature vectors corresponding to the positions of the target parts with different target identifiers; and performing supervised training on the initial neural network model based on the target identifiers respectively corresponding to the multiple target parts marked on each sample scene image and according to at least one of the first similarity and the second similarity, to obtain the feature detection model.
- The method according to claim 7, characterized in that performing supervised training on the initial neural network model based on the target identifiers respectively corresponding to the multiple target parts marked on each sample scene image and according to at least one of the first similarity and the second similarity, to obtain the feature detection model, comprises: taking the difference between a first similarity reference value and the first similarity as a first loss function, where the first similarity reference value is the similarity reference value between the sample feature vectors corresponding to the target parts with the same target identifier marked on every two adjacent sample scene images; taking the difference between a second similarity reference value and the second similarity as a second loss function, where the second similarity reference value is the similarity reference value between the sample feature vectors corresponding to the target parts with different target identifiers marked on every two adjacent sample scene images; and training the initial neural network model according to at least one of the first loss function and the second loss function to obtain the feature detection model.
- The method according to any one of claims 1 to 8, characterized in that the method further comprises: determining whether the motion trajectory, within a preset time period, of at least one of the multiple identical targets appearing on the multiple scene images conforms to a target motion trajectory.
- The method according to claim 9, characterized in that the multiple scene images correspond to a classroom scene, the target includes a teaching object, and the target motion trajectory includes at least one motion trajectory designated for the teaching object in a teaching task.
- A target tracking device, characterized in that the device comprises: an acquisition module for acquiring multiple scene images corresponding to the same scene; a processing module for performing feature extraction processing and target part detection on each of the multiple scene images to obtain feature information of each scene image and the positions of multiple target parts on each scene image; a feature information determining module for acquiring target feature information respectively corresponding to the positions of the multiple target parts in the feature information of each scene image; and a target determining module for determining, according to the acquired target feature information respectively corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images, wherein each scene image includes part or all of the multiple identical targets.
- A computer-readable storage medium, characterized in that the storage medium stores a computer program, and the computer program is used to execute the target tracking method according to any one of claims 1 to 10.
- A target tracking device, characterized by comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the executable instructions stored in the memory to implement the target tracking method according to any one of claims 1 to 10.
- A computer program, wherein when the computer program is executed by a processor, the target tracking method according to any one of claims 1 to 10 can be implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227002703A KR20220024986A (ko) | 2020-04-28 | 2021-04-16 | 타깃 추적 방법 및 장치, 저장 매체 및 컴퓨터 프로그램 |
JP2022504275A JP7292492B2 (ja) | 2020-04-28 | 2021-04-16 | オブジェクト追跡方法及び装置、記憶媒体並びにコンピュータプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010352365.6 | 2020-04-28 | ||
CN202010352365.6A CN111539991B (zh) | 2020-04-28 | 2020-04-28 | 目标跟踪方法及装置、存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021218671A1 true WO2021218671A1 (zh) | 2021-11-04 |
Family
ID=71977335
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/087870 WO2021218671A1 (zh) | Target tracking method and apparatus, storage medium and computer program | 2020-04-28 | 2021-04-16 |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP7292492B2 (zh) |
KR (1) | KR20220024986A (zh) |
CN (1) | CN111539991B (zh) |
TW (1) | TWI769787B (zh) |
WO (1) | WO2021218671A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111539991B (zh) * | 2020-04-28 | 2023-10-20 | 北京市商汤科技开发有限公司 | Target tracking method and apparatus, and storage medium |
CN113129339B (zh) * | 2021-04-28 | 2023-03-10 | 北京市商汤科技开发有限公司 | Target tracking method, apparatus, electronic device, and storage medium |
WO2024071587A1 (ko) * | 2022-09-29 | 2024-04-04 | 삼성전자 주식회사 | Method and electronic device for tracking an object |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009020897A (ja) | 2002-09-26 | 2009-01-29 | Image analysis method, image analysis apparatus, and image analysis program
JP4580189B2 (ja) | 2004-05-28 | 2010-11-10 | セコム株式会社 | Sensing device
TWI492188B (zh) * | 2008-12-25 | 2015-07-11 | Univ Nat Chiao Tung | Method and system for automatically detecting and tracking multiple targets using multiple cameras
CN108875465B (zh) * | 2017-05-26 | 2020-12-11 | 北京旷视科技有限公司 | Multi-target tracking method, multi-target tracking apparatus, and non-volatile storage medium
JP2020522002A (ja) * | 2017-06-02 | 2020-07-27 | SZ DJI Technology Co., Ltd. | Method and system for recognizing, tracking, and focusing on a moving target
CN109214238B (zh) * | 2017-06-30 | 2022-06-28 | 阿波罗智能技术(北京)有限公司 | Multi-target tracking method, apparatus, device, and storage medium
US9946960B1 (en) | 2017-10-13 | 2018-04-17 | StradVision, Inc. | Method for acquiring bounding box corresponding to an object in an image by using convolutional neural network including tracking network and computing device using the same
CN108491816A (zh) * | 2018-03-30 | 2018-09-04 | 百度在线网络技术(北京)有限公司 | Method and apparatus for target tracking in video
2020
- 2020-04-28 CN CN202010352365.6A patent/CN111539991B/zh active Active
2021
- 2021-04-16 WO PCT/CN2021/087870 patent/WO2021218671A1/zh active Application Filing
- 2021-04-16 KR KR1020227002703A patent/KR20220024986A/ko not_active Application Discontinuation
- 2021-04-16 JP JP2022504275A patent/JP7292492B2/ja active Active
- 2021-04-20 TW TW110114037A patent/TWI769787B/zh active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866428A (zh) * | 2018-08-28 | 2020-03-06 | 杭州海康威视数字技术股份有限公司 | Target tracking method, apparatus, electronic device, and storage medium |
CN109522843A (zh) * | 2018-11-16 | 2019-03-26 | 北京市商汤科技开发有限公司 | Multi-target tracking method and apparatus, device, and storage medium |
CN109800624A (zh) * | 2018-11-27 | 2019-05-24 | 上海眼控科技股份有限公司 | Multi-target tracking method based on pedestrian re-identification |
CN109859238A (zh) * | 2019-03-14 | 2019-06-07 | 郑州大学 | Online multi-target tracking method based on multi-feature optimal association |
CN110163890A (zh) * | 2019-04-24 | 2019-08-23 | 北京航空航天大学 | Multi-target tracking method for air-based surveillance |
CN110889464A (zh) * | 2019-12-10 | 2020-03-17 | 北京市商汤科技开发有限公司 | Neural network training and target object detection method and apparatus |
CN111539991A (zh) * | 2020-04-28 | 2020-08-14 | 北京市商汤科技开发有限公司 | Target tracking method and apparatus, and storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114783043A (zh) * | 2022-06-24 | 2022-07-22 | 杭州安果儿智能科技有限公司 | Child behavior trajectory localization method and system |
CN115880614A (zh) * | 2023-01-19 | 2023-03-31 | 清华大学 | Efficient intelligent detection method and system for wide-field high-resolution video |
CN116721045A (zh) * | 2023-08-09 | 2023-09-08 | 经智信息科技(山东)有限公司 | Multi-CT image fusion method and apparatus |
CN116721045B (zh) * | 2023-08-09 | 2023-12-19 | 经智信息科技(山东)有限公司 | Multi-CT image fusion method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN111539991B (zh) | 2023-10-20 |
JP2022542566A (ja) | 2022-10-05 |
JP7292492B2 (ja) | 2023-06-16 |
TWI769787B (zh) | 2022-07-01 |
CN111539991A (zh) | 2020-08-14 |
TW202141424A (zh) | 2021-11-01 |
KR20220024986A (ko) | 2022-03-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021218671A1 (zh) | Target tracking method and apparatus, storage medium and computer program | |
CN111709409B (zh) | Face liveness detection method, apparatus, device, and medium | |
WO2021043168A1 (zh) | Training method for a pedestrian re-identification network, and pedestrian re-identification method and apparatus | |
Liu et al. | Bayesian model adaptation for crowd counts |
CN104601964B (zh) | Cross-camera indoor pedestrian target tracking method and system for non-overlapping fields of view | |
CN205334563U (zh) | Student classroom engagement detection system | |
US10140508B2 (en) | Method and apparatus for annotating a video stream comprising a sequence of frames | |
CN110532970B (zh) | Method, system, device, and medium for analyzing age and gender attributes from 2D face images | |
JP2020520512A (ja) | Vehicle appearance feature identification and vehicle retrieval method, apparatus, storage medium, and electronic device | |
CN111767882A (zh) | Multimodal pedestrian detection method based on an improved YOLO model | |
Bedagkar-Gala et al. | Multiple person re-identification using part based spatio-temporal color appearance model |
CN107145826B (zh) | Pedestrian re-identification method based on dual-constraint metric learning and sample re-ranking | |
CN111666919B (zh) | Object recognition method and apparatus, computer device, and storage medium | |
CN107798313A (zh) | Human pose recognition method, apparatus, terminal, and storage medium | |
CN112001278A (zh) | Crowd counting model based on structured knowledge distillation and method thereof | |
Krajník et al. | Image features and seasons revisited |
CN112906520A (zh) | Pose-encoding-based action recognition method and apparatus | |
Zhang et al. | Joint discriminative representation learning for end-to-end person search |
CN113793362A (zh) | Pedestrian trajectory extraction method and apparatus based on multi-camera video | |
CN111626212B (zh) | Method and apparatus for recognizing objects in images, storage medium, and electronic apparatus | |
CN113822134A (zh) | Video-based instance tracking method, apparatus, device, and storage medium | |
CN115018886B (zh) | Motion trajectory recognition method, apparatus, device, and medium | |
Proenca et al. | SHREC’15 Track: Retrieval of Objects captured with Kinect One camera |
CN103020631A (zh) | Human motion recognition method based on star model | |
TWI776429B (zh) | Action recognition method and apparatus, and computer-readable storage medium |
Legal Events
Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21796788; Country of ref document: EP; Kind code of ref document: A1
ENP | Entry into the national phase | Ref document number: 2022504275; Country of ref document: JP; Kind code of ref document: A
ENP | Entry into the national phase | Ref document number: 20227002703; Country of ref document: KR; Kind code of ref document: A
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 21796788; Country of ref document: EP; Kind code of ref document: A1