WO2021218671A1 - Target tracking method and apparatus, storage medium and computer program - Google Patents

Target tracking method and apparatus, storage medium and computer program

Info

Publication number
WO2021218671A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
scene
feature
scene image
similarity
Prior art date
Application number
PCT/CN2021/087870
Other languages
English (en)
French (fr)
Inventor
王飞
陈光启
钱晨
Original Assignee
北京市商汤科技开发有限公司
Application filed by 北京市商汤科技开发有限公司
Priority to KR1020227002703A (KR20220024986A)
Priority to JP2022504275A (JP7292492B2)
Publication of WO2021218671A1

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods involving reference images or patches
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30232 Surveillance

Definitions

  • The present disclosure relates to the field of computer vision, and in particular to a target tracking method and apparatus, a storage medium, and a computer program.
  • The demand for analyzing the movement trajectories of targets through multi-target tracking technology is increasing.
  • The processing time of the above-mentioned multi-target tracking is linearly related to the number of targets in the scene. For example, if there are N targets in the scene, where N is a positive integer, then multi-target tracking requires N single-target tracking inference passes, and the processing time increases to N times the time required for single-target tracking. The larger the value of N, the longer multi-target tracking takes, which demands higher computing power from the device and is time-consuming.
  • The present disclosure provides a target tracking method and apparatus, a storage medium, and a computer program.
  • A target tracking method is provided, comprising: acquiring multiple scene images corresponding to the same scene; performing feature extraction processing and target part detection on each of the multiple scene images to obtain the feature information of each scene image and the positions of multiple target parts on each scene image; acquiring the target feature information respectively corresponding to the positions of the multiple target parts in the feature information of each scene image; and determining, according to the acquired target feature information corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images, wherein each scene image includes part or all of the multiple identical targets.
  • Performing feature extraction processing and target part detection on each of the multiple scene images to obtain the feature information of each scene image and the positions of the multiple target parts on each scene image includes: extracting a first feature map of each of the multiple scene images; performing target part detection on the first feature map of each scene image to obtain the positions of the multiple target parts on each scene image; and performing feature extraction processing on the first feature map of each scene image to obtain a multi-dimensional second feature map. Acquiring the target feature information corresponding to the positions of the multiple target parts in the feature information of each scene image includes: acquiring target feature vectors respectively corresponding to the positions of the multiple target parts on the multi-dimensional second feature map.
  • Determining the multiple identical targets appearing on the multiple scene images according to the acquired target feature information corresponding to the positions of the multiple target parts includes: using the multiple pieces of target feature information respectively corresponding to every two adjacent scene images among the multiple scene images to obtain the similarity between the target parts on every two adjacent scene images; and determining, based on the similarity between the target parts on every two adjacent scene images, the multiple identical targets appearing on the different scene images.
  • Every two adjacent scene images consist of a first scene image and a second scene image. Using the multiple pieces of target feature information respectively corresponding to every two adjacent scene images to obtain the similarity between the target parts on every two adjacent scene images includes: determining the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, where N and M are positive integers greater than or equal to 2; and obtaining an N×M-dimensional similarity matrix according to the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, where the value of any dimension in the similarity matrix represents the similarity between any first target part of the first scene image and any second target part of the second scene image.
  • Determining the multiple identical targets appearing on the different scene images based on the similarity between the target parts on every two adjacent scene images includes: determining, according to the similarity matrix, the maximum similarity among the similarities between a first target feature vector of the N target feature vectors and the M target feature vectors; if the maximum similarity is greater than a preset threshold, determining, among the M target feature vectors, a second target feature vector corresponding to the maximum similarity; and taking the target to which the first target part corresponding to the first target feature vector on the first scene image belongs and the target to which the second target part corresponding to the second target feature vector on the second scene image belongs as the same target.
  • Performing feature extraction processing and target part detection on each of the multiple scene images to obtain the feature information of each scene image and the positions of the multiple target parts on each scene image includes: extracting the first feature map of each of the multiple scene images through the backbone network of a feature detection model; performing target part detection on the first feature map of each scene image through the part detection branch of the feature detection model to obtain the positions of the multiple target parts on each scene image; and performing feature extraction processing on the first feature map of each scene image through the feature extraction branch of the feature detection model to obtain a multi-dimensional second feature map.
  • The method further includes: inputting multiple sample scene images corresponding to the same scene into an initial neural network model, and obtaining the sample feature vectors respectively corresponding to the positions of the multiple target parts on each sample scene image output by the initial neural network model; determining, according to the target identifiers respectively corresponding to the multiple target parts marked on each sample scene image, the first similarity between the sample feature vectors corresponding to the positions of target parts with the same target identifier on every two adjacent sample scene images, and/or the second similarity between the sample feature vectors corresponding to the positions of target parts with different target identifiers; and performing supervised training on the initial neural network model according to at least one of the first similarity and the second similarity, based on the target identifiers respectively corresponding to the multiple target parts marked on each sample scene image, to obtain the feature detection model.
  • Performing supervised training on the initial neural network model, based on the target identifiers respectively corresponding to the multiple target parts marked on each sample scene image and according to at least one of the first similarity and the second similarity, to obtain the feature detection model includes: using the difference between a first similarity reference value and the first similarity as a first loss function, where the first similarity reference value is the reference value of the similarity between the sample feature vectors corresponding to target parts of the same target identifier marked on every two adjacent sample scene images; using the difference between a second similarity reference value and the second similarity as a second loss function, where the second similarity reference value is the reference value of the similarity between the sample feature vectors corresponding to target parts of different target identifiers marked on every two adjacent sample scene images; and training the initial neural network model according to at least one of the first loss function and the second loss function to obtain the feature detection model.
  • The method further includes: determining whether the motion trajectory, within a preset time period, of at least one of the multiple identical targets appearing on the multiple scene images conforms to a target motion trajectory.
  • The multiple scene images correspond to a classroom scene, the target includes a teaching object, and the target motion trajectory includes at least one motion trajectory designated for the teaching object in a teaching task.
  • A target tracking apparatus is provided, the apparatus including: an acquisition module, configured to acquire multiple scene images corresponding to the same scene; a processing module, configured to perform feature extraction processing and target part detection on each of the multiple scene images to obtain the feature information of each scene image and the positions of multiple target parts on each scene image; a feature information determining module, configured to acquire the target feature information respectively corresponding to the positions of the multiple target parts in the feature information of each scene image; and a target determining module, configured to determine, according to the acquired target feature information corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images, wherein each scene image includes part or all of the multiple identical targets.
  • A computer-readable storage medium is provided, storing a computer program used to execute the target tracking method of any one of the first aspect.
  • A target tracking device is provided, including: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to call the executable instructions stored in the memory to implement the target tracking method described in any one of the first aspect.
  • A computer program is provided; when the computer program is executed by a processor, the target tracking method described in any one of the first aspect is implemented.
  • In the embodiments of the present disclosure, instead of performing single-target tracking inference for each target, a single-frame inference is performed on each scene image to obtain the target feature information corresponding to the positions of multiple target parts, and the single-frame inference results are matched to obtain the multiple identical targets in every two adjacent scene images, thereby achieving multi-target tracking. Even if the current scene contains multiple targets, because the inference is performed on the entire scene image, the duration of the whole multi-target tracking process is independent of the number of targets included in the scene image, and the tracking time does not increase with the number of targets as it would if single-target tracking inference were performed target by target. This greatly saves computing resources, shortens the duration of multi-target tracking, and effectively improves the detection efficiency of multi-target tracking.
  • Fig. 1 is a flowchart of a target tracking method according to an exemplary embodiment of the present disclosure.
  • Fig. 2 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
  • Fig. 3 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
  • Fig. 4 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
  • Fig. 5 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
  • Fig. 6 is a schematic structural diagram of a feature detection model according to an exemplary embodiment of the present disclosure.
  • Fig. 7 is a schematic diagram of an inference process of multi-target tracking according to an exemplary embodiment of the present disclosure.
  • Fig. 8 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
  • Fig. 9 is a schematic diagram of a feature detection model training scene according to an exemplary embodiment of the present disclosure.
  • Fig. 10 is a flowchart of another target tracking method according to an exemplary embodiment of the present disclosure.
  • Fig. 11 is a block diagram of a target tracking device according to an exemplary embodiment of the present disclosure.
  • Fig. 12 is a schematic structural diagram of a target tracking device according to an exemplary embodiment of the present disclosure.
  • Although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
  • Depending on the context, the word "if" as used herein can be interpreted as "when", "upon", or "in response to determining".
  • The embodiments of the present disclosure provide a multi-target tracking solution which, for example, can be applied to terminal devices in different scenarios. Different scenarios include, but are not limited to, classrooms, locations where surveillance is deployed, or other indoor or outdoor scenarios that require tracking of multiple targets.
  • the terminal device can be any device with a camera, or the terminal device can also be an external camera device.
  • the terminal device may successively collect multiple scene images in the same scene, or may directly collect a video stream, and use the multiple images in the video stream as the multiple scene images.
  • The terminal device performs feature extraction processing and target part detection on each of the acquired multiple scene images, and based on the feature information of each scene image and the positions of the multiple target parts on each scene image, acquires the target feature information corresponding to the positions of the multiple target parts in the feature information of each scene image, so as to determine the multiple identical targets appearing in the multiple scene images.
  • the terminal equipment can be a teaching multimedia device with a camera deployed in the classroom, including but not limited to teaching projectors, monitoring equipment in the classroom, etc.
  • The terminal device acquires multiple scene images in the classroom, and performs feature extraction processing and target part detection on each of the multiple scene images to obtain the feature information of each scene image and the positions of the multiple target parts on each scene image.
  • It then acquires the target feature information corresponding to the positions of the multiple target parts in the feature information of each scene image, so as to determine the multiple identical targets appearing on the multiple scene images, thereby achieving the purpose of multi-target tracking.
  • The targets in this scene may include, but are not limited to, teaching objects such as students, and the target parts include, but are not limited to, human face parts and human body parts.
  • one or more surveillance cameras can be deployed in a subway or railway station, and multiple scene images of the subway or railway station can be acquired through the surveillance cameras.
  • Targets in this scenario may include passengers, luggage carried by passengers, staff, and so on.
  • the multi-target tracking solution provided by the embodiments of the present disclosure can also be applied to cloud servers in different scenarios.
  • The cloud server can be equipped with an external camera, and the external camera can successively collect multiple scene images in the same scene, or directly collect a video stream and use multiple images in the video stream as the multiple scene images.
  • The collected scene images can be sent to the cloud server through a router or a gateway; the cloud server performs feature extraction processing and target part detection on each scene image to obtain the feature information of each scene image and the positions of the multiple target parts on each scene image, acquires the target feature information corresponding to the positions of the multiple target parts in the feature information of each scene image, and further determines the multiple identical targets appearing on the multiple scene images.
  • For example, the external camera is installed in a classroom and collects multiple scene images of the classroom, which are sent to the cloud server through a router or gateway, and the cloud server executes the above-mentioned target tracking method.
  • For example, the same target is marked with the same marking box, and the marked scene image is output. A red marking box may be used to mark target 1 in the scene, a green marking box to mark target 2, a blue marking box to mark target 3, and so on, in order to better show the multiple identical targets in the current scene.
  • Alternatively, the same or different targets can be distinguished by the target identifiers corresponding to the marking boxes.
  • For example, an output scene image includes 3 marking boxes whose corresponding target identifiers are 1, 2 and 3, and the adjacent scene image includes two marking boxes whose corresponding target identifiers are 1 and 3. Then it can be determined that the marking boxes with target identifier 1 on the two scene images correspond to the same target, the marking boxes with target identifier 3 also correspond to the same target, and the marking boxes with target identifiers 1 and 3 respectively correspond to different targets.
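  • As a purely illustrative sketch (not part of the patent), the marking boxes described above could be drawn with OpenCV as follows; the color palette, box coordinates, and function name are assumptions introduced here.

```python
# Illustrative sketch: draw per-target marking boxes so that the same target
# identifier keeps the same color across output scene images.
import cv2

# Hypothetical fixed palette: target identifier -> BGR color (red, green, blue)
PALETTE = {1: (0, 0, 255), 2: (0, 255, 0), 3: (255, 0, 0)}

def draw_marking_boxes(scene_image, tracks):
    """tracks: list of (target_id, (x1, y1, x2, y2)) for one scene image."""
    for target_id, (x1, y1, x2, y2) in tracks:
        color = PALETTE.get(target_id, (255, 255, 255))
        cv2.rectangle(scene_image, (x1, y1), (x2, y2), color, 2)
        cv2.putText(scene_image, f"id={target_id}", (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return scene_image
```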
  • In the classroom scene, the target motion trajectory may include, but is not limited to, at least one motion trajectory specified for the teaching object in the teaching task, such as moving from the current position to another position designated by the teacher. The other position may be the podium, the blackboard, or the position of another student, or the target motion trajectory may also be staying at the same position. Teachers can thus better carry out teaching work according to the motion trajectories of multiple teaching objects.
  • In the station scene, the target includes, but is not limited to, passengers, and the target motion trajectory can include, but is not limited to, a designated dangerous motion trajectory or an illegal motion trajectory, such as moving from the platform to the position of the rails, or moving over or under the gates.
  • The staff can better manage the station according to the motion trajectories of the passengers to avoid dangerous behaviors or fare evasion.
  • Fig. 1 shows a target tracking method according to an exemplary embodiment, which includes the following steps:
  • In step 101, multiple scene images corresponding to the same scene are acquired.
  • For example, multiple scene images can be collected successively in the same scene, or a video stream can be collected and multiple images in the video stream used as the multiple scene images.
  • The scenarios of the present disclosure include, but are not limited to, any scenario that requires multi-target tracking, such as a classroom or a location where surveillance is deployed.
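  • For instance, a minimal sketch of collecting scene images from a video stream with OpenCV might look as follows; the sampling stride and the function name are illustrative assumptions, not requirements of the method.

```python
# Assumed sketch: sample one frame every `stride` frames from a camera or video file.
import cv2

def collect_scene_images(source=0, num_images=10, stride=5):
    cap = cv2.VideoCapture(source)   # camera index or video file path
    scene_images, frame_idx = [], 0
    while len(scene_images) < num_images:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % stride == 0:
            scene_images.append(frame)
        frame_idx += 1
    cap.release()
    return scene_images
```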
  • In step 102, feature extraction processing and target part detection are performed on each of the multiple scene images to obtain the feature information of each scene image and the positions of the multiple target parts on each scene image.
  • performing feature extraction on each scene image refers to extracting feature information from each scene image
  • the feature information may include, but is not limited to, color features, texture features, shape features, and the like.
  • Color feature is a kind of global feature, which describes the surface color attribute of the object corresponding to the image
  • texture feature is also a kind of global feature, which describes the surface texture attribute of the object corresponding to the image
  • There are two types of representation methods for shape features: one is the contour feature, and the other is the regional feature.
  • the contour feature of the image is mainly for the outer boundary of the object, and the regional feature of the image is related to the shape of the image area.
  • one target part corresponds to one target, but it is not restrictive, and multiple target parts may correspond to one target.
  • the target part may include, but is not limited to, a human face part and/or a human body part.
  • the human body part may include the entire human body of the person or a certain designated part of the human body, such as hands, legs, and so on.
  • the position of the target part can be represented by at least the center position of the identification frame of the target part.
  • the target part includes a face part, and the position of the target part can be represented by the center position of the face identification frame.
  • the marking frame of the target part can be realized as a rectangular frame circumscribing the target part, and so on.
  • In step 103, the target feature information corresponding to the positions of the multiple target parts in the feature information of each scene image is acquired.
  • Each scene image includes multiple target parts. According to the acquired feature information of each scene image, feature extraction is performed on the pixels of the regions containing the target parts, so as to determine the target feature information respectively corresponding to the positions of the multiple target parts.
  • For example, the target feature information corresponding to the multiple pixels included in the region of each target part in the feature information of each scene image may be obtained through convolution processing or the like.
  • In step 104, according to the acquired target feature information corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images are determined, where each scene image includes part or all of the multiple identical targets.
  • The target feature information corresponding to the positions of the multiple target parts is obtained on each scene image, and by matching the target feature information across the multiple scene images, the multiple identical targets appearing on the multiple scene images can be determined.
  • In the above embodiment, instead of performing single-target tracking inference target by target, a single-frame inference is performed on each scene image to obtain the target feature information corresponding to the positions of the multiple target parts, and the single-frame inference results are matched to obtain the multiple identical targets in every two adjacent scene images, achieving the purpose of multi-target tracking.
  • The duration of the entire multi-target tracking process therefore has nothing to do with the number of targets included in the scene image, and the tracking time does not grow with the number of targets as it would with target-by-target single-target tracking inference, which greatly saves computing resources, shortens the duration of multi-target tracking, and effectively improves the detection efficiency of multi-target tracking.
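  • The following Python sketch summarizes steps 101 to 104 under the assumptions stated in the comments; the helpers `feature_detection_model` and `match_adjacent` are hypothetical placeholders for the single-frame inference and the adjacent-image matching described above, not components defined by the patent.

```python
def track_targets(scene_images, feature_detection_model, match_adjacent):
    """Returns, for every two adjacent scene images, the matched index pairs."""
    per_image_vectors = []
    for image in scene_images:                       # step 102: one inference per image,
        positions, feature_map = feature_detection_model(image)   # independent of target count
        # step 103: one target feature vector per detected target part position (x, y)
        per_image_vectors.append([feature_map[:, y, x] for (x, y) in positions])

    matches = []
    for prev, curr in zip(per_image_vectors, per_image_vectors[1:]):
        matches.append(match_adjacent(prev, curr))   # step 104: adjacent-image matching
    return matches
```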
  • In some embodiments, step 102 may include the following steps.
  • In step 102-1, the first feature map of each of the multiple scene images is extracted.
  • the image feature of each scene image can be extracted through a pre-trained neural network model to obtain the first feature map.
  • the neural network model can adopt, but is not limited to, models such as Visual Geometry Group Network (VGG Net).
  • In step 102-2, target part detection is performed on the first feature map of each scene image to obtain the positions of the multiple target parts on each scene image; and feature extraction processing is performed on the first feature map of each scene image to obtain a multi-dimensional second feature map.
  • the target part may include a human face part and/or a human body part.
  • For example, a Region Proposal Network (RPN) can be used to detect face parts and/or human body parts on the first feature map of each scene image, so as to determine the face region corresponding to the face part and/or the human body region corresponding to the human body part.
  • The face region can be identified by a face recognition frame, and the human body region can be identified by a human body identification frame.
  • the center position of the face recognition frame may be used as the position of the face part.
  • the center position of the human body identification frame can be regarded as the position of the human body part.
  • the size of the second feature map may be the same as the size of the first feature map, and the dimension value of the second feature map is the preset number of channels corresponding to each scene image.
  • step 103 may include:
  • the target feature information is used to represent the feature information corresponding to multiple pixels in each of the multiple target part regions included in the second feature map of any one dimension.
  • the target part may include a human face part and/or a human body part.
  • the feature information corresponding to any one pixel can constitute a one-dimensional feature vector.
  • One or more feature vectors can be selected from these feature vectors to represent the feature information of the region of the target part, that is, the target feature information.
  • the feature vector corresponding to the pixel of the position of the target part can be selected, and the feature vector can be used as the target feature vector corresponding to the position of the target part on the second feature map of the dimension.
  • The position of the target part may include the center position of the face recognition frame and/or the center position of the human body identification frame.
  • On the second feature map of each dimension, the feature information corresponding to the pixels at the positions of the multiple target parts can be obtained, so that the target feature vectors respectively corresponding to the positions of the multiple target parts are obtained.
  • The dimension value of a target feature vector is the same as the dimension value of the second feature map; for example, if the dimension value of the second feature map is C, then the dimension value of the target feature vector is also C.
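  • A minimal sketch of this step, assuming the second feature map is a (C, H, W) array and each target part position is an (x, y) pixel coordinate, might look as follows; the function name is illustrative.

```python
# Assumed sketch of step 103: read the C-dimensional target feature vector at each
# target part position (x, y) from a second feature map of shape (C, H, W).
import numpy as np

def gather_target_vectors(second_feature_map, part_positions):
    """second_feature_map: (C, H, W) array; part_positions: list of (x, y) centers."""
    vectors = []
    for x, y in part_positions:
        # the target feature vector has the same dimension value C as the feature map
        vectors.append(second_feature_map[:, y, x])
    return np.stack(vectors)            # shape (num_parts, C)
```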
  • In the above embodiment, feature extraction, target part detection, and the acquisition of the target feature vectors corresponding to the positions of the multiple target parts are performed sequentially on the entire scene image.
  • The entire process is a single-frame inference on a single scene image, so it has nothing to do with the number of targets included; subsequent matching is performed on the target feature vectors corresponding to the multiple target part positions on every two adjacent scene images, so there is no need to perform single-target tracking inference separately, and even if the scene image contains more targets, the matching process can be completed in one pass.
  • Therefore, the target tracking method of the present disclosure is independent of the number of targets in the scene image and does not increase the tracking time as the number of targets increases, which greatly saves computing resources, shortens the duration of multi-target tracking, and effectively improves the detection efficiency of multi-target tracking.
  • step 104 may include:
  • In step 104-1, the multiple pieces of target feature information respectively corresponding to every two adjacent scene images among the multiple scene images are used to obtain the similarity between the target parts on every two adjacent scene images.
  • Since the multiple pieces of target feature information corresponding to the multiple target parts in the feature information of each scene image have already been determined, similarity calculation can be performed using the multiple pieces of target feature information corresponding to every two adjacent scene images, so as to obtain the similarity between the target parts on every two adjacent scene images.
  • In step 104-2, based on the similarity between the target parts on every two adjacent scene images, multiple identical targets appearing on the different scene images are determined.
  • For example, the targets to which the target parts with the greatest similarity belong can be regarded as the same target appearing on different scene images.
  • In this way, multiple identical targets appearing on different scene images can be determined according to the similarity between the target parts on every two adjacent scene images, which achieves the purpose of multi-target tracking; the tracking process is independent of the number of targets and has high usability.
  • For example, every two adjacent scene images consist of a first scene image T0 and a second scene image T1.
  • the foregoing step 104-1 may include:
  • In step 104-11, the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image are determined.
  • the target feature information is used to represent the feature information corresponding to multiple pixels in each of the multiple target part regions included in the second feature map of any one dimension.
  • the target part may include a human face part and/or a human body part.
  • the feature information corresponding to any one pixel can constitute a one-dimensional feature vector.
  • One or more feature vectors can be selected from these feature vectors to represent the feature information of the region of the target part.
  • the feature vector corresponding to the pixel of the position of the target part can be selected, and the feature vector can be used as the target feature vector corresponding to the position of the target part on the second feature map of the dimension.
  • The position of the target part may include the center position of the face recognition frame and/or the center position of the human body identification frame.
  • the similarity between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image in every two adjacent scene images can be determined, where, N and M are positive integers greater than or equal to 2. That is, the similarity between the multiple target feature vectors on the first scene image and the multiple target feature vectors on the second scene image is determined.
  • For example, the cosine similarity between the target feature vectors can be calculated to evaluate the similarity between them.
  • In step 104-12, according to the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, an N×M-dimensional similarity matrix is obtained.
  • The value of any dimension in the similarity matrix represents the similarity between any first target part in the first scene image and any second target part in the second scene image.
  • N and M can be equal or unequal.
  • In this way, the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image can be determined to obtain an N×M-dimensional similarity matrix, where each value of the similarity matrix represents the similarity between a first target part in the first scene image and a second target part in the second scene image; this is easy to implement and has high usability.
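  • For illustration only, the N×M cosine-similarity matrix described in step 104-12 could be computed as follows; the function and variable names are assumptions introduced here.

```python
# Hedged sketch of steps 104-11 / 104-12: cosine similarity between the N target
# feature vectors of the first scene image (T0) and the M of the second (T1).
import numpy as np

def cosine_similarity_matrix(first_vectors, second_vectors, eps=1e-8):
    """first_vectors: (N, C); second_vectors: (M, C); returns an (N, M) matrix."""
    a = first_vectors / (np.linalg.norm(first_vectors, axis=1, keepdims=True) + eps)
    b = second_vectors / (np.linalg.norm(second_vectors, axis=1, keepdims=True) + eps)
    return a @ b.T   # entry (i, j) is the similarity of part i in T0 and part j in T1
```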
  • In some examples, a bipartite graph algorithm may be used for step 104-2: under the condition that the spatial distance constraint is satisfied, multiple identical targets appearing on the different scene images are determined based on the similarity between the target parts on every two adjacent scene images.
  • The bipartite graph algorithm operates on a bipartite graph whose left vertices are X and whose right vertices are Y; for the connections X_i–Y_j with weights w_ij, a matching is found such that the sum of the selected weights w_ij is the largest.
  • Here, X_i corresponds to one of the N target feature vectors on the first scene image, Y_j corresponds to one of the M target feature vectors on the second scene image, and the weight w_ij corresponds to their similarity.
  • The present disclosure matches each of the N target feature vectors with the second-image target feature vector for which the similarity is the largest, and finally the multiple identical targets appearing in every two adjacent scene images can be determined.
  • The condition for satisfying the spatial distance constraint includes: the dimension of the similarities between the N target feature vectors and the M target feature vectors does not exceed N×M.
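  • One possible realization of this maximum-weight bipartite matching, not mandated by the patent, is the Hungarian algorithm provided by SciPy (scipy.optimize.linear_sum_assignment, available in SciPy 1.4 and later) applied to the similarity matrix; the threshold value below is an assumption.

```python
# Assumed sketch: Hungarian matching over the N x M similarity matrix, keeping
# only pairs whose similarity exceeds a preset threshold.
from scipy.optimize import linear_sum_assignment

def bipartite_match(similarity, threshold=0.5):
    """similarity: (N, M) matrix; returns a list of matched (i, j) index pairs."""
    rows, cols = linear_sum_assignment(similarity, maximize=True)  # maximize total weight
    return [(i, j) for i, j in zip(rows, cols) if similarity[i, j] > threshold]
```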
  • step 104-2 may include:
  • In step 104-21, according to the similarity matrix, the maximum similarity is determined among the similarities between a first target feature vector of the N target feature vectors and the M target feature vectors.
  • the first target feature vector is any one of the N target feature vectors determined on the first scene image. According to the similarity matrix, the similarity between the first target feature vector and each target feature vector on the second scene image can be obtained, and a maximum similarity can be determined among these similarities.
  • For example, suppose the similarity matrix is A, and the first row of A contains the similarities between the first target feature vector and the M target feature vectors on the second scene image. With M = 3, these similarities are a11, a12 and a13, among which the maximum value can be determined, assumed here to be a11.
  • In step 104-22, if the maximum similarity is greater than a preset threshold, a second target feature vector corresponding to the maximum similarity is determined among the M target feature vectors.
  • the second target feature vector is the target feature vector corresponding to the maximum similarity among the M target feature vectors included in the second scene image.
  • In step 104-23, the target to which the first target part corresponding to the first target feature vector on the first scene image belongs and the target to which the second target part corresponding to the second target feature vector on the second scene image belongs are taken as the same target.
  • That is, when the maximum similarity is greater than the preset threshold, the first target part on the first scene image and the second target part on the second scene image are attributed to the same target. If the maximum similarity is not greater than the preset threshold, it is determined that the target to which the first target part corresponding to the first target feature vector belongs has no identical target on the second scene image.
  • The above process is repeated, with the number of repetitions equal to the number N of target feature vectors included in the first scene image, and finally all the identical targets appearing on the first scene image and the second scene image can be determined.
  • In this way, according to the similarity matrix, the two target parts with the greatest similarity on every two adjacent scene images can be attributed to the same target, which achieves the purpose of multi-target tracking and has high usability.
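  • A hedged sketch of steps 104-21 to 104-23, with an assumed threshold value, is shown below; it simply takes the per-row maximum of the similarity matrix and accepts the pair only when it exceeds the preset threshold.

```python
# Assumed sketch: for each of the N first-image target feature vectors, take the
# most similar of the M second-image vectors; reject pairs below the threshold.
import numpy as np

def greedy_match(similarity, threshold=0.5):
    """similarity: (N, M) matrix; returns {first_index: second_index or None}."""
    matches = {}
    for i, row in enumerate(similarity):
        j = int(np.argmax(row))                          # index of the maximum similarity
        matches[i] = j if row[j] > threshold else None   # None: no identical target in T1
    return matches
```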
  • In some embodiments, at least two of the multiple scene images may be input to a pre-trained feature detection model. The feature detection model performs feature extraction processing and target part detection on each of these scene images to obtain the feature information of each scene image and the positions of the multiple target parts on each scene image, and, based on the positions of the multiple target parts on each scene image, acquires the multiple pieces of target feature information corresponding to the multiple target parts in the feature information of each scene image.
  • The structure of the feature detection model is shown in Figure 6, where multiple scene images are input into the feature detection model.
  • The feature detection model first extracts features from each of the multiple scene images through its backbone network to obtain the first feature map of each scene image.
  • The part detection branch of the feature detection model then performs target part detection on the first feature map of each scene image to obtain the positions of the multiple target parts on each scene image; and the feature extraction branch performs feature extraction processing on the first feature map of each scene image to obtain a multi-dimensional second feature map.
  • the target may include a person
  • the target part may include a face part and/or a body part.
  • the feature extraction branch may be formed by concatenating at least one convolutional layer.
  • the size of the second feature map is the same as that of the first feature map, so that the positions of multiple target parts on the second feature map of each dimension are the same.
  • the dimension value of the second feature map is the same as the number of preset channels corresponding to each scene image.
  • the position of the target part may be represented by the center position of the face recognition frame and/or the center position of the human body recognition frame.
  • the dimension value of the target feature vector is the same as the dimension value of the second feature map.
  • For example, the size of the second feature map obtained by the feature extraction branch is the same as the size of the first feature map, both being H×W, where H and W are the height and width of the image respectively.
  • The dimension value of the second feature map is C, where C is the preset number of channels corresponding to each scene image. On each channel, the feature value at the center position (x, y) of the face recognition frame can be obtained, so the target feature vector corresponding to that position has dimension value C.
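  • The following PyTorch sketch illustrates one possible realization of the Fig. 6 structure; the layer types, channel counts, and the heatmap-style part detection head are assumptions for illustration and are not specified by the patent.

```python
# Assumed sketch: shared backbone, a part detection branch, and a feature extraction
# branch producing a C-channel second feature map with the same H x W as the first.
import torch
import torch.nn as nn

class FeatureDetectionModel(nn.Module):
    def __init__(self, in_channels=3, backbone_channels=64, feature_dim=128):
        super().__init__()
        # backbone: extracts the first feature map
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, backbone_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(backbone_channels, backbone_channels, 3, padding=1), nn.ReLU(),
        )
        # part detection branch: e.g. a per-pixel heatmap of target part centers
        self.part_branch = nn.Conv2d(backbone_channels, 1, 1)
        # feature extraction branch: C-dimensional second feature map (same H x W)
        self.feature_branch = nn.Conv2d(backbone_channels, feature_dim, 1)

    def forward(self, scene_images):
        first_feature_map = self.backbone(scene_images)
        part_heatmap = torch.sigmoid(self.part_branch(first_feature_map))
        second_feature_map = self.feature_branch(first_feature_map)
        return part_heatmap, second_feature_map
```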
  • In this way, the N target feature vectors on the first scene image can be determined, the similarities between them and the M target feature vectors on the second scene image are obtained, and the similarity matrix is obtained. According to the similarity matrix, the multiple identical targets appearing on the different scene images are determined.
  • The determination method is the same as that of step 104-2, and will not be repeated here.
  • For example, the first scene image T0 and the second scene image T1 are respectively input into the above feature detection model, and N target feature vectors and M target feature vectors are obtained respectively.
  • Then, a bipartite graph algorithm may be used to match the extracted target part features under the condition that the spatial distance constraint is satisfied, so as to determine the same targets appearing in T0 and T1.
  • the method may further include:
  • In step 100-1, multiple sample scene images corresponding to the same scene are input into the initial neural network model, and the sample feature vectors corresponding to the positions of the multiple target parts on each sample scene image output by the initial neural network model are obtained.
  • That is, existing multiple sample scene images corresponding to the same scene are used as the input of the initial neural network model, and multiple identical targets and different targets are marked on the sample scene images.
  • The structure of the initial neural network model may also be as shown in Fig. 6, including a backbone network, a part detection branch, and a feature extraction branch.
  • With the multiple sample scene images as input, the sample feature vectors corresponding to the positions of the multiple target parts on each sample scene image can be obtained.
  • In step 100-2, according to the target identifiers respectively corresponding to the multiple target parts marked on each sample scene image, the first similarity between the sample feature vectors corresponding to the positions of target parts with the same target identifier on every two adjacent sample scene images is determined, and/or the second similarity between the sample feature vectors corresponding to the positions of target parts with different target identifiers is determined.
  • That is, on every two adjacent sample scene images, the first similarity between the sample feature vectors corresponding to the positions of target parts of the same target identifier, and/or the second similarity between the sample feature vectors corresponding to the positions of target parts of different target identifiers, can be determined.
  • For example, the first similarity and the second similarity can be obtained as the cosine similarity between the corresponding sample feature vectors.
  • In step 100-3, based on the target identifiers respectively corresponding to the multiple target parts marked on each sample scene image, supervised training is performed on the initial neural network model according to at least one of the first similarity and the second similarity, to obtain the feature detection model.
  • For example, as shown in Fig. 9, the loss function can be designed to increase the first similarity and reduce the second similarity. Based on the target identifiers corresponding to the multiple target parts on every two adjacent sample scene images, the network parameters of the initial model are adjusted according to the determined loss function, and the feature detection model is obtained after the supervised training is completed.
  • In this way, the initial neural network model is supervised and trained based on the target identifiers respectively corresponding to the multiple target parts marked on each sample scene image to obtain the feature detection model, which improves the detection performance and generalization performance of the feature detection model.
  • In some embodiments, the difference between the first similarity reference value and the first similarity may be used as the first loss function.
  • The first similarity reference value is the reference value of the similarity between the sample feature vectors corresponding to the target parts of the same target identifier marked on every two adjacent sample scene images.
  • For example, the first similarity reference value is a reference value for the cosine similarity between the sample feature vectors, and may be 1.
  • When the first loss function is minimized or the preset number of training iterations is reached, the feature detection model is obtained.
  • Likewise, the difference between the second similarity reference value and the second similarity may be used as the second loss function.
  • The second similarity reference value is the reference value of the similarity between the sample feature vectors corresponding to target parts of different target identifiers marked on every two adjacent sample scene images.
  • For example, the second similarity reference value is a reference value for the cosine similarity between the sample feature vectors, and may be 0.
  • When the second loss function is minimized or the preset number of training iterations is reached, the feature detection model is obtained.
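  • As an illustrative sketch only, the two loss terms could be realized as follows, using the absolute difference from the reference values 1 and 0; the function name, the pairing of vectors, and the use of the absolute difference are assumptions introduced here (it also assumes both same-identifier and different-identifier pairs are present in the batch).

```python
# Assumed sketch: first loss pulls same-identifier cosine similarities toward 1,
# second loss pushes different-identifier cosine similarities toward 0.
import torch
import torch.nn.functional as F

def pairwise_losses(vectors_a, vectors_b, same_identifier):
    """vectors_a, vectors_b: (P, C) paired sample feature vectors from two adjacent
    sample scene images; same_identifier: (P,) bool mask of same-target pairs."""
    cos = F.cosine_similarity(vectors_a, vectors_b, dim=1)
    first_loss = (1.0 - cos[same_identifier]).abs().mean()    # same target id: reference 1
    second_loss = (cos[~same_identifier] - 0.0).abs().mean()  # different ids: reference 0
    return first_loss, second_loss
```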
  • the method may further include:
  • In step 105, it is determined whether the motion trajectory, within a preset time period, of at least one of the multiple identical targets appearing on the multiple scene images conforms to a target motion trajectory.
  • For example, the multiple scene images correspond to a classroom scene, the target includes a teaching object, and the target motion trajectory includes at least one motion trajectory designated for the teaching object in a teaching task.
  • The at least one motion trajectory specified for the teaching object in the teaching task includes, but is not limited to, walking from the current position to another position designated by the teacher; the other position may be the podium, the blackboard, or the position of another student, or the target motion trajectory may also be staying at the current position without moving.
  • For example, teaching multimedia equipment with cameras deployed in the classroom can be used to successively collect multiple scene images of the classroom, and the motion trajectory of at least one teaching object (for example, a student) included in the classroom scene images is determined.
  • It can then be determined whether the motion trajectory of each teaching object, for example each student, conforms to at least one motion trajectory specified for the teaching object in the teaching task, for example, whether the student moves from the current position to the blackboard or to another student's position according to the teacher's instruction, or always stays in the same position without moving, for example, always sitting in his or her own seat and listening to the lecture.
  • The above results can be displayed through the teaching multimedia equipment, so that teachers can better carry out teaching tasks.
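  • Purely as an illustration, a conformance check for a single target's trajectory could be sketched as follows, modeling the target motion trajectory as "starts in one region and ends inside a designated destination region" (e.g. seat to blackboard); this region-based model is an assumption, not the patent's definition.

```python
# Assumed sketch: does the trajectory within the preset time period start near a
# start region and end inside a destination region?
def conforms_to_target_trajectory(trajectory, start_region, destination_region):
    """trajectory: list of (x, y) positions ordered in time within the time window;
    regions: (x1, y1, x2, y2) rectangles in image coordinates."""
    def inside(point, region):
        x, y = point
        x1, y1, x2, y2 = region
        return x1 <= x <= x2 and y1 <= y <= y2

    if not trajectory:
        return False
    return inside(trajectory[0], start_region) and inside(trajectory[-1], destination_region)
```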
  • the present disclosure also provides an embodiment of the device.
  • FIG. 11 is a block diagram of a target tracking device according to an exemplary embodiment of the present disclosure.
  • The device includes: an acquisition module 210, configured to acquire multiple scene images corresponding to the same scene; a processing module 220, configured to perform feature extraction processing and target part detection on each of the multiple scene images to obtain the feature information of each scene image and the positions of multiple target parts on each scene image; a feature information determining module 230, configured to acquire the target feature information respectively corresponding to the positions of the multiple target parts in the feature information of each scene image; and a target determining module 240, configured to determine, according to the acquired target feature information respectively corresponding to the positions of the multiple target parts, multiple identical targets appearing on the multiple scene images, where each scene image includes part or all of the multiple identical targets.
  • In some embodiments, the processing module includes: a first processing sub-module, configured to extract a first feature map of each of the multiple scene images; and a second processing sub-module, configured to perform target part detection on the first feature map of each scene image to obtain the positions of multiple target parts on each scene image, and to perform feature extraction processing on the first feature map of each scene image to obtain a multi-dimensional second feature map.
  • The feature information determining module includes: a feature vector determining sub-module, configured to obtain the target feature vectors respectively corresponding to the positions of the multiple target parts on the multi-dimensional second feature map.
  • In some embodiments, the target determining module includes: a similarity determining sub-module, configured to use the multiple pieces of target feature information respectively corresponding to every two adjacent scene images among the multiple scene images to obtain the similarity between the target parts on every two adjacent scene images; and a target determining sub-module, configured to determine, based on the similarity between the target parts on every two adjacent scene images, the multiple identical targets appearing on the different scene images.
  • In some embodiments, every two adjacent scene images consist of a first scene image and a second scene image, and the similarity determining sub-module is configured to: determine the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, where N and M are positive integers greater than or equal to 2; and obtain an N×M-dimensional similarity matrix according to the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, where the value of any dimension in the similarity matrix represents the similarity between any first target part of the first scene image and any second target part of the second scene image.
  • In some embodiments, the target determining sub-module is configured to: determine, according to the similarity matrix, the maximum similarity among the similarities between a first target feature vector of the N target feature vectors and the M target feature vectors; if the maximum similarity is greater than the preset threshold, determine, among the M target feature vectors, the second target feature vector corresponding to the maximum similarity; and take the target of the first target part corresponding to the first target feature vector on the first scene image and the target of the second target part corresponding to the second target feature vector on the second scene image as the same target.
  • In some embodiments, the processing module includes: a third processing sub-module, configured to extract the first feature map of each of the multiple scene images through the backbone network of the feature detection model; and a fourth processing sub-module, configured to perform target part detection on the first feature map of each scene image through the part detection branch of the feature detection model to obtain the positions of multiple target parts on each scene image, and to perform feature extraction processing on the first feature map of each scene image through the feature extraction branch of the feature detection model to obtain a multi-dimensional second feature map.
  • In some embodiments, the device further includes: a feature vector determining module, configured to input multiple sample scene images corresponding to the same scene into a preset model and obtain the sample feature vectors corresponding to the positions of the multiple target parts on each sample scene image output by the preset model; a similarity determining module, configured to determine, according to the target identifiers respectively corresponding to the multiple target parts marked on every two adjacent sample scene images, the first similarity between the sample feature vectors corresponding to the positions of target parts with the same target identifier on the two sample scene images, and/or the second similarity between the sample feature vectors corresponding to the positions of target parts with different target identifiers on every two adjacent sample scene images; and a training module, configured to perform supervised training on the preset model based on the target identifiers respectively corresponding to the multiple target parts marked on every two adjacent sample scene images and according to at least one of the first similarity and the second similarity, to obtain the feature detection model.
  • In some embodiments, the training module is configured to: take the difference between the first similarity reference value and the first similarity as the first loss function, where the first similarity reference value is the reference value of the similarity between the sample feature vectors corresponding to target parts of the same target identifier marked on every two adjacent sample scene images; take the difference between the second similarity reference value and the second similarity as the second loss function, where the second similarity reference value is the reference value of the similarity between the sample feature vectors corresponding to target parts of different target identifiers marked on every two adjacent sample scene images; and train the initial neural network model according to at least one of the first loss function and the second loss function to obtain the feature detection model.
  • In some embodiments, the device further includes: a motion trajectory determining module, configured to determine whether the motion trajectory, within a preset time period, of at least one of the multiple identical targets appearing on the multiple scene images conforms to the target motion trajectory.
  • In some embodiments, the multiple scene images correspond to a classroom scene, the target includes a teaching object, and the target motion trajectory includes at least one motion trajectory designated for the teaching object in a teaching task.
  • For relevant parts, reference can be made to the description of the method embodiments.
  • The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the present disclosure. Those of ordinary skill in the art can understand and implement them without creative work.
  • An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program, where the computer program is used to execute any one of the target tracking methods described above.
  • An embodiment of the present disclosure provides a computer program product including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the target tracking method provided by any one of the above embodiments.
  • An embodiment of the present disclosure further provides another computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the operations of the target tracking method provided by any one of the foregoing embodiments.
  • The computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
  • An embodiment of the present disclosure provides a computer program which, when executed, causes a computer to perform the operations of the target tracking method provided by any one of the foregoing embodiments.
  • An embodiment of the present disclosure further provides a target tracking device, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to invoke the executable instructions stored in the memory to implement any one of the target tracking methods described above.
  • FIG. 12 is a schematic diagram of the hardware structure of a target tracking device provided by an embodiment of the disclosure.
  • the target tracking device 310 includes a processor 311, and may also include an input device 312, an output device 313, and a memory 314.
  • the input device 312, the output device 313, the memory 314, and the processor 311 are connected to each other through a bus.
  • The memory includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM), and is used to store related instructions and data.
  • The input device is used to input data and/or signals, and the output device is used to output data and/or signals. The output device and the input device may be independent devices or an integrated device.
  • The processor may include one or more processors, for example, one or more central processing units (CPUs); where the processor is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • The memory is used to store the program code and data of the network device. The processor is used to call the program code and data in the memory to execute the steps in the foregoing method embodiments; for details, refer to the description in the method embodiments, which will not be repeated here.
  • It can be understood that FIG. 12 shows only a simplified design of a target tracking device. In practical applications, the target tracking device may further include other necessary elements, including but not limited to any number of input/output devices, processors, controllers, memories, and the like; all target tracking devices that can implement the embodiments of the present disclosure fall within the protection scope of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a target tracking method and device, a storage medium and a computer program. The method includes: acquiring multiple scene images corresponding to the same scene; performing feature extraction processing and target part detection on each of the multiple scene images to obtain feature information of each scene image and positions of multiple target parts on each scene image; acquiring, from the feature information of each scene image, target feature information respectively corresponding to the positions of the multiple target parts; and determining, according to the acquired target feature information respectively corresponding to the positions of the multiple target parts, multiple identical targets appearing in the multiple scene images, where each scene image includes some or all of the multiple identical targets.

Description

目标跟踪方法及装置、存储介质及计算机程序
相关申请的交叉引用
本专利申请要求于2020年4月28日提交的、申请号为202010352365.6、发明名称为“目标跟踪方法及装置、存储介质”的中国专利申请的优先权,该申请的全文以引用的方式并入本文中。
技术领域
本公开涉及计算机视觉领域,尤其涉及一种目标跟踪方法及装置、存储介质及计算机程序。
背景技术
目前,通过多目标跟踪技术分析目标的运动轨迹的需求日益增强。在进行多目标跟踪的过程中,需要先通过目标检测获得多个目标所在的位置,然后对每个目标进行单目标跟踪。
上述多目标跟踪的处理时间与场景中目标的数目呈线性相关。例如,场景中包括N个对象,这里的N为正整数,则多目标跟踪需要进行N次单目标跟踪的推理,处理时间会增加到单目标跟踪所需时间的N倍。N的取值越大,多目标跟踪的时间就越长,这就需要设备具备较高的计算能力且耗时较长。
发明内容
本公开提供了一种目标跟踪方法及装置、存储介质及计算机程序。
根据本公开实施例的第一方面,提供一种目标跟踪方法,所述方法包括:获取对应同一场景的多张场景图像;对所述多张场景图像中的每张场景图像进行特征提取处理以及目标部位检测,得到所述每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置;获取所述每张场景图像的特征信息中与所述多个目标部位的位置分别对应的目标特征信息;根据获取的所述多个目标部位的位置分别对应的目标特征信息,确定出现在所述多张场景图像上的多个相同的目标,其中,每张场景图像中包括所述多个相同的目标的部分或全部目标。
在一些可选实施例中,所述对所述多张场景图像中的每张场景图像进行特征提取处理以及目标部位检测,得到所述每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置,包括:提取所述多张场景图像中的每张场景图像的第一特征图;在所述每张场景图像的第一特征图上进行目标部位检测,得到所述每张场景图像上的多个目标部位的位置;以及,对所述每张场景图像的第一特征图进行特征提取处理,得到多维度的第二特征图;所述获取所述每张场景图像的特征信息中与所述多个目标部位的位置分别对应的目标特征信息,包括:在所述多维度的第二特征图上获取与所述多个目标部位的位置分别对应的目标特征向量。
在一些可选实施例中,所述根据获取的所述多个目标部位的位置分别对应的目标特征信息,确定出现在所述多张场景图像上的多个相同的目标,包括:利用所述多张场景图像中每相邻两张场景图像分别对应的多个目标特征信息,得到所述每相邻两张场景图像上各个目标部位之间的相似度;基于所述每相邻两张场景图像上各个目标部位之间的相似度,确定出现在所述不同场景图像上的多个相同的目标。
在一些可选实施例中,所述每相邻两张场景图像为第一场景图像和第二场景图像;所述利用所述多张场景图像中每相邻两张场景图像分别对应的多个目标特征信息,得到所述每相邻两张场景图像上各个目标部位之间的相似度,包括:确定第一场景图像上的N个目标特征向量分别与第二场景图像上的M个目标特征向量之间的相似度;其中,N和M为大于等于2的正整数;根据所述第一场景图像上的N个目标特征向量分别与所述第二场景图像上的M个目标特征向量之间的所述相似度,得到N×M维度的相似度矩阵,所述相似度矩阵中任一维度的值表示所述第一场景图像的任一第一目标部位与所述第二场景图像中的任一第二目标部位的相似度。
在一些可选实施例中,所述基于所述每相邻两张场景图像上各个目标部位之间的相似度,确定出现在所述不同场景图像上的多个相同的目标,包括:根据所述相似度矩阵,在所述N个目标特征向量中的第一目标特征向量分别与所述M个目标特征向量之间的相似度中,确定相似度最大值;若所述相似度最大值大于预设阈值,则在所述M个目标特征向量中确定所述相似度最大值对应的第二目标特征向量;将所述第一场景图像上所述第一目标特征向量对应的第一目标部位所属目标和所述第二场景图像上第二目标特征向量对应的第二目标部位所属目标,作为相同的目标。
在一些可选实施例中,所述对所述多张场景图像中的每张场景图像进行特征提取处理以及目标部位检测,得到所述每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置,包括:通过特征检测模型的骨干网络提取所述多张场景图像中的每张场景图像的第一特征图;通过所述特征检测模型的部位检测分支,在所述每张场景图像的第一特征图上进行目标部位检测,得到所述每张场景图像上的多个目标部位的位置;以及,通过所述特征检测模型的特征提取分支,对所述每张场景图像的第一特征图进行特征提取处理,得到多维度的第二特征图。
在一些可选实施例中,所述方法还包括:将对应同一场景的多张样本场景图像输入初始神经网络模型,获得所述初始神经网络模型输出的每张样本场景图像上多个目标部位的位置分别对应的样本特征向量;根据所述每张样本场景图像上已标注的多个目标部位分别对应的目标标识,确定在每相邻两张样本场景图像上,相同的所述目标标识的所述目标部位的位置对应的所述样本特征向量之间的第一相似度,和/或确定不同的所述目标标识的所述目标部位的位置对应的所述样本特征向量之间的第二相似度;基于所述每张样本场景图像上已标注的多个目标部位分别对应的目标标识,根据所述第一相似度和所述第二相似度中的至少一项,对所述初始神经网络模型进行监督训练,得到所述特征检测模型。
在一些可选实施例中,所述基于所述每张样本场景图像上已标注的多个目标部位分别对应的目标标识,根据所述第一相似度和所述第二相似度中的至少一项,对所述初始 神经网络模型进行监督训练,得到所述特征检测模型,包括:将第一相似度参考值与所述第一相似度之间的差作为第一损失函数;其中,所述第一相似度参考值是所述每相邻两张样本场景图像上已标注的相同的目标标识的目标部位所对应的样本特征向量之间的相似度参考值;将第二相似度参考值与所述第二相似度之间的差作为第二损失函数;其中,所述第二相似度参考值是所述每相邻两张样本场景图像上已标注的不同的目标标识的目标部位所对应的样本特征向量之间的相似度参考值;根据所述第一损失函数和所述第二损失函数中的至少一项,对所述初始神经网络模型进行训练,得到所述特征检测模型。
在一些可选实施例中,所述方法还包括:确定出现在所述多个场景图像上的多个相同的目标中的至少一个目标在预设时间段内的运动轨迹是否符合目标运动轨迹。
在一些可选实施例中,所述多张场景图像对应教室场景,所述目标包括教学对象,所述目标运动轨迹包括教学任务中对所述教学对象指定的至少一种运动轨迹。
根据本公开实施例的第二方面,提供一种目标跟踪装置,所述装置包括:获取模块,用于获取对应同一场景的多张场景图像;处理模块,用于对所述多张场景图像中的每张场景图像进行特征提取处理以及目标部位检测,得到所述每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置;特征信息确定模块,用于获取所述每张场景图像的特征信息中与所述多个目标部位的位置分别对应的目标特征信息;目标确定模块,用于根据获取的所述多个目标部位的位置分别对应的目标特征信息,确定出现在所述多张场景图像上的多个相同的目标,其中,每张场景图像中包括所述多个相同的目标的部分或全部目标。
根据本公开实施例的第三方面,提供一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序用于执行第一方面任一所述的目标跟踪方法。
根据本公开实施例的第四方面,提供一种目标跟踪装置,包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为调用所述存储器中存储的可执行指令,实现第一方面任一项所述的目标跟踪方法。
根据本公开实施例的第五方面,提供一种计算机程序,其中所述计算机程序被处理器执行时,能够实现第一方面任一项所述的目标跟踪方法。
本公开的实施例提供的技术方案可以包括以下有益效果:
本公开实施例中,不需要在相邻每两张场景图像中分别确定多个目标后,针对前一张场景图像上的每个目标在后一张场景图像所包括的多个目标中分别进行单目标跟踪推理,而是针对单张场景图像进行单帧推断,得到多个目标部位的位置对应的目标特征信息,针对单帧推断结果进行匹配,得到每相邻两张场景图像中的多个相同的目标,实现了多目标跟踪的目的,且即使当前场景中包括多个目标,由于针对整张场景图像进行推断,使得整个多目标跟踪过程的时长与场景图像中所包括的目标的数目无关,不会因为目标的数目的增长去逐个进行单目标跟踪推理导致跟踪时长的增加,极大节省了计算资源,缩短了多目标跟踪的时长,有效提高了多目标跟踪的检测效率。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能 限制本公开。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。
图1是本公开根据一示例性实施例示出的一种目标跟踪方法流程图;
图2是本公开根据一示例性实施例示出的另一种目标跟踪方法流程图;
图3是本公开根据一示例性实施例示出的另一种目标跟踪方法流程图;
图4是本公开根据一示例性实施例示出的另一种目标跟踪方法流程图;
图5是本公开根据一示例性实施例示出的另一种目标跟踪方法流程图;
图6是本公开根据一示例性实施例示出的一种特征检测模型的结构示意图;
图7是本公开根据一示例性实施例示出的一种多目标跟踪的推断过程示意图;
图8是本公开根据一示例性实施例示出的另一种目标跟踪方法流程图;
图9是本公开根据一示例性实施例示出的一种特征检测模型训练场景示意图;
图10是本公开根据一示例性实施例示出的另一种目标跟踪方法流程图;
图11是本公开根据一示例性实施例示出的一种目标跟踪装置框图;
图12是本公开根据一示例性实施例示出的一种用于目标跟踪装置的一结构示意图。
具体实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。
在本公开使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。
应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所运行的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。
本公开实施例提供了一种多目标跟踪方案,示例性的,可以适用于不同场景下的终端设备。不同的场景包括但不限于教室、部署了监控的地点、或其他需要对多目标跟踪 的室内或室外场景。终端设备可以采用任意带摄像头的设备,或者,终端设备也可以是外接摄像设备。终端设备可以在同一场景下先后采集多张场景图像,或者可以直接采集视频流,将该视频流中的多张图像作为所述多张场景图像。
进一步地,终端设备对获取的多张场景图像中的每张场景图像,进行特征提取处理以及目标部位检测,基于每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置,获取每张场景图像的特征信息中与多个目标部位的位置分别对应的目标特征信息,从而确定出现在多张场景图像中的多个相同的目标。
例如在教室中,终端设备可以采用部署在教室内的带摄像头的教学多媒体设备,包括但不限于教学投影机、教室内的监控设备等。终端设备获取教室中的多张场景图像,从而对所述多张场景图像中的每张场景图像进行特征提取处理以及目标部位检测,得到所述每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置。获取所述每张场景图像的特征信息中与所述多个目标部位的位置分别对应的目标特征信息,从而确定出现在所述多张场景图像上的多个相同的目标,实现多目标跟踪的目的。该场景下的目标可以包括但不限于教学对象,例如学生,目标部位包括但不限于人脸部位和人体部位。
再例如,在地铁或火车站可以部署一个或多个监控摄像头,通过监控摄像头可以获取地铁或火车站的多张场景图像。该场景下的目标可以包括乘客、乘客携带的行李箱、工作人员等等。采用本公开实施例提供的方案,可以在地铁站或火车站这种人流量大的场景下,确定出现在多张场景图像中的多个相同的目标,实现多目标跟踪的目的。
示例性的,本公开实施例提供的多目标跟踪方案还可以适用于不同场景下的云端服务器,该云端服务器可以设置外接摄像头,由外接摄像头在同一场景下先后采集多张场景图像,或者可以直接采集视频流,将该视频流中的多张图像作为所述多张场景图像。所采集的场景图像可以通过路由器或网关发送给云端服务器,由云端服务器,对每张场景图像进行特征提取处理以及目标部位检测,得到所述每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置,从而获取所述每张场景图像的特征信息中与所述多个目标部位的位置分别对应的目标特征信息,进一步地,确定出现在所述多张场景图像上的多个相同的目标。
例如,外接摄像头设置在教室中,外接摄像头在教室内下采集多张场景图像,通过路由器或网关发送给云端服务器,云端服务器执行上述目标跟踪方法。
在本公开实施例中,还可以通过终端设备或云端服务器确定出现在多张场景图像上的多个相同的目标后,用相同的标识框对同一目标进行标识并输出标识后的场景图像。例如在输出的相邻两张场景图像上,用红色标识框标识出该场景中的目标1,用绿色标识框标识出该场景中的目标2,用蓝色标识框标识出该场景中的目标3等等,以便更好的示出当前场景下的多个相同的目标。或者还可以通过标识框所对应的目标标识来区分相同或不同的目标,例如,在输出的一张场景图像上包括3个标识框,分别对应的目标标识为1、2和3,在与其相邻的场景图像上包括2个标识框,分别对应的目标标识为1和3,那么可以确定这两张场景图像上目标标识为1的识别框对应相同的目标,目标标识为3的识别框也对应相同的目标,目标标识为1和3的识别框分别对应了不同的目标。
另外,还可以通过终端设备或云端服务器确定多个相同的目标中的至少一个目标在预设时间段内的运动轨迹,分析该运动轨迹是否符合目标运动轨迹。
例如,当前场景为教室,目标包括教学对象,则目标运动轨迹可以包括但不限于教学任务中对所述教学对象指定的至少一种运动轨迹,例如从当前所在位置移动到老师指定的其他位置,其他位置可以是讲台、黑板或其他同学所在位置,或者目标运动轨迹还可以包括处于同一位置。老师可以根据多个教学对象的运动轨迹,更好地进行教学工作。
再例如,以当前场景为部署了监控的地铁站或火车站为例,目标包括但不限于乘车人员,则目标运动轨迹可以包括但不限于指定的危险运动轨迹或非法运动轨迹,例如从站台位置移动到铁轨所在位置、移动到闸机的上方或下方等。工作人员可以根据乘车人员的运动轨迹,更好地进行车站管理,避免危险行为或逃票行为的发生。
以上仅是对本公开适用的场景进行的举例说明,其他需要快速进行动作类型识别的室内或场景也属于本公开的保护范围。
例如图1所示,图1是根据一示例性实施例示出的一种目标跟踪方法,包括以下步骤:
在步骤101中,获取对应同一场景的多张场景图像。
本公开实施例中,可以在同一场景下先后采集多张场景图像,或者可以采集视频流,将视频流中的多张图像作为多张场景图像。本公开的场景包括但不限于任何需要进行多目标跟踪的场景,例如教室、布置监控的地点等。
在步骤102中,对所述多张场景图像中的每张场景图像进行特征提取处理以及目标部位检测,得到所述每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置。
在本公开实施例中,对每张场景图像进行特征提取是指从每张场景图像中提取特征信息,该特征信息可以包括但不限于颜色特征、纹理特征、形状特征等。颜色特征是一种全局特征,描述了图像所对应的对象的表面颜色属性;纹理特征也是一种全局特征,它描述了图像所对应对象的表面纹理属性;形状特征有两类表示方法,一类是轮廓特征,另一类是区域特征,图像的轮廓特征主要针对对象的外边界,而图像的区域特征则关系到图像区域的形状。
在本公开实施例中,一个目标部位对应一个目标,但是不具有限制性,也可以多个目标部位对应一个目标。目标部位可以包括但不限于人脸部位和/或人体部位,人体部位可以包括人物的整个人体或人体的某个指定部位,例如手部、腿部等。目标部位的位置至少可以通过该目标部位的标识框的中心位置来表示,例如目标部位包括人脸部位,则目标部位的位置可以通过人脸标识框的中心位置表示。该目标部位的标识框例如可以实现为该目标部位的外接矩形框,等等。
在步骤103中,获取所述每张场景图像的特征信息中与所述多个目标部位的位置分别对应的目标特征信息。
在本公开实施例中,每张场景图像上包括多个目标部位,根据获取到的每张场景图 像的特征信息,对包括目标部位的区域的像素进行特征提取,确定与多个目标部位的位置分别对应的目标特征信息。示例性的,可以通过卷积处理等,获取每张场景图像的特征信息中与每个目标部位的区域所包括的多个像素分别对应的目标特征信息。
在步骤104中,根据获取的所述多个目标部位的位置分别对应的目标特征信息,确定出现在所述多张场景图像上的多个相同的目标,其中每张场景图像包括所述多个相同的目标的部分目标或全部目标。
上述实施例中,在每张场景图像上获得了多个目标部位的位置对应的目标特征信息,通过将所述多张场景图像的这些目标特征信息进行匹配,可以确定出现在所述多张场景图像上的多个相同的目标。
上述实施例中,不需要在相邻每两张场景图像中分别确定多个目标后,针对前一张场景图像上的每个目标在后一张场景图像所包括的多个目标中分别进行单目标跟踪推理;而是针对单张场景图像进行单帧推断,得到多个目标部位的位置对应的目标特征信息,通过将获得的每相邻两张场景图像的单帧推断结果进行匹配,得到每相邻两张场景图像中的多个相同的目标,实现了多目标跟踪的目的。即使当前场景中包括多个目标,由于针对整张场景图像进行推断,使得整个多目标跟踪过程的时长与场景图像中所包括的目标的数目无关,不会因为目标的数目的增长去逐个进行单目标跟踪推理导致跟踪时长的增加,极大节省了计算资源,缩短了多目标跟踪的时长,有效提高了多目标跟踪的检测效率。
在一些可选实施例中,例如图2所示,步骤102可以包括:
在步骤102-1中,提取所述多张场景图像中的每张场景图像的第一特征图。
在本公开实施例中,可以通过预先训练好的神经网络模型,来提取每张场景图像的图像特征,得到第一特征图。该神经网络模型可以采用但不限于视觉几何群网络(Visual Geometry Group Network,VGG Net)等模型。
在步骤102-2中,在所述每张场景图像的第一特征图上进行目标部位检测,得到所述每张场景图像上的多个目标部位的位置;以及,对所述每张场景图像的第一特征图进行特征提取处理,得到多维度的第二特征图。
在本公开实施例中,目标部位可以包括人脸部位和/或人体部位。通过区域预测网络(Region Proposal Network,RPN),可以在每张场景图像的第一特征图上,进行人脸部位和/或人体部位检测,确定对应人脸部位的人脸区域和/或对应人体部位的人体区域。其中,人脸区域可以通过人脸识别框进行标识,人体区域可以通过人体识别框进行标识。示例性的,可以将人脸识别框的中心位置作为人脸部位的位置。同样地,可以将人体识别框的中心位置作为人体部位的位置。
进一步地,还可以对每张场景图像的第一特征图进行特征提取处理,将第一特征图所包括的多类特征信息通过不同的通道提取出来,从而得到多维度的第二特征图。示例性的,第二特征图的尺寸与第一特征图的尺寸可以相同,且第二特征图的维度值为每张场景图像对应的预设通道数目。
相应地,步骤103可以包括:
在所述多维度的第二特征图上获取与所述多个目标部位的位置分别对应的目标特征向量。
在本公开实施例中,目标特征信息用于表示任一个维度的第二特征图所包括的多个目标部位的区域的各个区域中的多个像素分别对应的特征信息。其中,目标部位可以包括人脸部位和/或人体部位。
在任一个维度的第二特征图所包括的多个目标部位的区域中,任意一个像素对应的特征信息均可以构成一个一维的特征向量,为了后续便于进行相似度计算,可以从这些特征向量中选取出一个或多个特征向量来表示该目标部位的区域的特征信息,即目标特征信息。在本公开实施例中,可以选取目标部位的位置的像素所对应的特征向量,将该特征向量作为该维度的第二特征图上目标部位的位置对应的目标特征向量。其中,目标部位的位置可以包括人脸识别框的中心位置/或人体识别框的中心位置。
进一步地,为了提高后续目标部位匹配的准确度,可以针对多维度的第二特征图中至少一个维度的第二特征图,获取多个目标部位的位置的像素对应的特征信息,得到与所述多个目标部位的位置分别对应的目标特征向量。示例性的,针对每个维度的第二特征图均可以获取多个目标部位的位置分别对应的目标特征向量,使得目标特征向量的维度值与第二特征图的维度值相同。例如,第二特征图的维度值为C,则目标特征向量的维度值也为C。
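As a minimal illustration of the read-out step described above, the following NumPy sketch takes the C×H×W second feature map and the center positions of the detected part boxes and returns one C-dimensional target feature vector per part; the function name, array shapes and the rounding/clamping of the centers are assumptions for illustration, not the patent's implementation.

    import numpy as np

    def sample_part_features(second_feature_map, part_centers):
        """Read out one C-dimensional target feature vector per detected part.

        second_feature_map: array of shape (C, H, W) produced by the feature
            extraction branch.
        part_centers: array of shape (K, 2) with the (x, y) centers of the K
            detected part boxes, in feature-map coordinates.
        Returns an array of shape (K, C).
        """
        C, H, W = second_feature_map.shape
        vectors = []
        for x, y in part_centers:
            # Clamp to the feature-map grid and read the vector across all C channels.
            xi = int(np.clip(round(float(x)), 0, W - 1))
            yi = int(np.clip(round(float(y)), 0, H - 1))
            vectors.append(second_feature_map[:, yi, xi])
        return np.stack(vectors, axis=0)

Stacking the per-part vectors in this way yields the (K, C) matrix of target feature vectors that the later similarity computation operates on.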
上述实施例中,针对整张场景图像依次进行特征提取、目标部位检测、以及确定与多个目标部位的位置分别对应的目标特征向量,整个过程是对单张场景图像进行的单帧推断,因此与其中包括的目标的数目的多少无关;后续会针对每相邻两张场景图像上与多个目标位置分别对应的目标特征向量进行匹配,从而不需要分别进行单目标跟踪推理,即使场景图像上包括的目标数目较多,也可以一次性完成匹配过程。本公开的目标跟踪方法与场景图像中的目标数目无关,不会因为目标的数目的增长导致跟踪时长的增加,极大节省了计算资源,缩短了多目标跟踪的时长,有效提高了多目标跟踪的检测效率。
在一些可选实施例中,例如图3所示,步骤104可以包括:
在步骤104-1中,利用所述多张场景图像上每相邻两张场景图像分别对应的所述多个目标特征信息,得到所述每相邻两张场景图像上各个目标部位之间的相似度。
在本公开实施例中,已经确定了每张场景图像的特征信息中与所述多个目标部位对应的多个目标特征信息,可以利用每相邻两张场景图像分别对应的多个目标特征信息进行相似度计算,得到每相邻两张场景图像上各个目标部位之间的相似度。
在步骤104-2中,基于所述每相邻两张场景图像上各个目标部位之间的相似度,确定出现在所述不同场景图像上的多个相同的目标。
在本公开实施例中,可以将每相邻两张场景图像上,相似度最大的目标部位所属的目标作为出现在不同场景图像上的相同的目标。
上述实施例中,可以根据每相邻两张场景图像上各个目标部位之间的相似度来确定出现在不同场景图像上的多个相同的目标,实现了多目标跟踪的目的,且跟踪过程与目标数目无关,可用性高。
在一些可选实施例中，每相邻两张场景图像为第一场景图像T_0和第二场景图像T_1。
例如图4所示,上述步骤104-1可以包括:
在步骤104-11中,确定第一场景图像上的N个目标特征向量分别与第二场景图像上的M个目标特征向量之间的相似度。
在本公开实施例中,目标特征信息用于表示任一个维度的第二特征图所包括的多个目标部位的区域的各个区域中的多个像素分别对应的特征信息。其中,目标部位可以包括人脸部位和/或人体部位。
根据目标特征信息,在任一个维度的第二特征图所包括的多个目标部位的区域中,任意一个像素对应的特征信息均可以构成一个一维的特征向量,为了后续便于进行相似度计算,可以从这些特征向量中选取出一个或多个特征向量来表示该目标部位的区域的特征信息。在本公开实施例中,可以选取目标部位的位置的像素所对应的特征向量,将该特征向量作为该维度的第二特征图上目标部位的位置对应的目标特征向量。其中,目标部位的位置可以包括人脸识别框的中心位置/或人体识别框的中心位置。
在确定相似度的过程中,可以确定每相邻两张场景图像中第一场景图像上的N个目标特征向量分别与第二场景图像上的M个目标特征向量之间的相似度,其中,N和M为大于等于2的正整数。即确定第一场景图像上的多个目标特征向量分别与第二场景图像上的多个目标特征向量之间的相似度。
在一种可能地实现方式中,确定相似度时,可以确定目标特征向量之间的余弦相似度值。通过计算第一场景图像上的任一个目标特征向量与第二场景图像上的任一个目标特征向量的夹角余弦值,来评估它们的相似度。
在步骤104-12中,根据所述第一场景图像上的N个目标特征向量分别与所述第二场景图像上的M个目标特征向量之间的所述相似度,得到N×M维度的相似度矩阵。
在本公开实施例中,相似度矩阵中任一维度的值表示所述第一场景图像的任一第一目标部位与所述第二场景图像中的任一第二目标部位的相似度。其中,N和M可以相等或不相等。
上述实施例中,可以通过确定第一场景图像上的N个目标特征向量分别与第二场景图像上的M个目标特征向量之间的相似度,得到N×M维度的相似度矩阵,通过相似度矩阵表示所述第一场景图像的任一第一目标部位与所述第二场景图像中的任一第二目标部位的相似度,实现简便,可用性高。
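To make this step concrete, here is a minimal sketch of building the N×M similarity matrix from the two images' target feature vectors, under the assumption (consistent with the cosine-similarity option mentioned above) that cosine similarity is used; the names are illustrative only.

    import numpy as np

    def cosine_similarity_matrix(feats_a, feats_b, eps=1e-8):
        """feats_a: (N, C) target feature vectors of the first scene image.
        feats_b: (M, C) target feature vectors of the second scene image.
        Returns an (N, M) matrix whose entry [i, j] is the cosine similarity
        between part i of the first image and part j of the second image."""
        a = feats_a / (np.linalg.norm(feats_a, axis=1, keepdims=True) + eps)
        b = feats_b / (np.linalg.norm(feats_b, axis=1, keepdims=True) + eps)
        return a @ b.T

Entry [i, j] of the returned matrix plays the role of the value a_ij of the similarity matrix A discussed below.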
在一些可选实施例中,针对步骤104-2可以采用二部图算法,在满足空间距离约束的条件下,基于所述每相邻两张场景图像上各个目标部位之间的相似度,确定出现在所述不同场景图像上的多个相同的目标。
其中，二部图算法是指在一个二部图内，假设左顶点为X，右顶点为Y，现对于每组左右连接X_iY_j有权值w_ij，求一种匹配使得所有w_ij的和最大。在本公开实施例中，X_i相当于第一场景图像上的N个目标特征向量中的一个，Y_j相当于第二场景图像上的M个目标特征向量中的一个，权值w_ij就对应相似度。本公开需要在相似度最大的情况下，将N个目标特征向量与第二目标特征向量匹配起来，最终可以确定出现在相邻每两张场景图像中的多个相同的目标。
在本公开实施例中,满足空间距离约束的条件包括:N个目标特征向量与M个目标特征向量之间的相似度的维度,不超过N×M。
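The bipartite-graph matching described above is not tied to a particular algorithm in the text; one standard realization is the Hungarian algorithm, sketched below with scipy.optimize.linear_sum_assignment. The threshold filter and the variable names are assumptions added for illustration.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_parts(similarity, threshold=0.5):
        """similarity: (N, M) similarity matrix between two adjacent scene images.
        Returns (i, j) index pairs whose total similarity is maximal, keeping
        only pairs whose similarity exceeds the preset threshold."""
        rows, cols = linear_sum_assignment(similarity, maximize=True)
        return [(i, j) for i, j in zip(rows, cols) if similarity[i, j] > threshold]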
在一种可能地实现方式中,相似度最大的同时还需要确保这个相似度最大值超过预设阈值,以便进一步提高多目标跟踪的准确性。
例如图5所示,步骤104-2可以包括:
在步骤104-21中,根据所述相似度矩阵,在所述N个目标特征向量中的第一目标特征向量分别与所述M个目标特征向量之间的相似度中,确定相似度最大值。
在本公开实施例中,第一目标特征向量是第一场景图像上确定的N个目标特征向量中的任一个。根据相似度矩阵可以得到该第一目标特征向量与第二场景图像上的每个目标特征向量之间的相似度,在这些相似度中可以确定出一个相似度最大值。
假设相似度矩阵为A:
A = [a_ij], i = 1, …, N; j = 1, …, M
第一目标特征向量与M个第二目标特征向量之间的相似度分别为a_11、a_12和a_13，可以确定其中的最大值，假设为a_11。
在步骤104-22中,若所述相似度最大值大于预设阈值,则在所述M个目标特征向量中确定所述相似度最大值对应的第二目标特征向量。
在本公开实施例中,第二目标特征向量是第二场景图像所包括的M个目标特征向量中该相似度最大值对应的目标特征向量。
为了进一步确保多目标跟踪的准确性,需要确保相似度最大值大于预设阈值。
在步骤104-23中,将所述第一场景图像上所述第一目标特征向量对应的第一目标部位所属目标和所述第二场景图像上第二目标特征向量对应的第二目标部位所属目标,作为相同的目标。
在本公开实施例中,在上述的相似度最大值大于预设阈值时,才将所述第一场景图像的第一目标特征向量对应的第一目标部位所属目标和所述第二场景图像上第二目标特征向量对应的第二目标部位所属目标,作为相同的目标。
相似度最大值如果小于或等于预设阈值,可以认为第一场景图像上的第一目标特征向量对应的第一目标部位所属目标在第二场景图像上不存在相同的目标。
重复上述步骤104-21至104-23,重复次数为第一场景图像所包括的目标特征向量的数目N,最终可以确定出现在第一场景图像和第二场景图像上的所有相同的目标。
上述实施例中,可以根据相似度矩阵,将相邻每两张场景图像上目标部位之间 的相似度最接近的两个目标作为相同的目标,实现了多目标跟踪的目的,可用性高。
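The stepwise procedure of steps 104-21 to 104-23 can also be sketched directly: for each target feature vector of the first scene image, take the most similar vector of the second scene image and accept the pair only if the maximum similarity exceeds the preset threshold. Unlike the global assignment above, this version handles each row of the similarity matrix independently; the default threshold and the names are assumptions.

    import numpy as np

    def greedy_match(similarity, threshold=0.5):
        """similarity: (N, M) similarity matrix. Returns {i: j} index pairs of
        parts judged to belong to the same target; rows whose maximum similarity
        does not exceed the threshold are treated as having no counterpart."""
        matches = {}
        for i in range(similarity.shape[0]):
            j = int(np.argmax(similarity[i]))
            if similarity[i, j] > threshold:
                matches[i] = j
        return matches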
在一些可选实施例中,在获取了多张场景图像之后,可以将所述多张场景图像中的至少两张输入预先训练好的特征检测模型,由所述特征检测模型对所述多张场景图像中的每张场景图像进行特征提取处理以及目标部位检测,得到所述每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置,以及基于所述每张场景图像上多个目标部位的位置,获取所述每张场景图像的特征信息中与所述多个目标部位对应的多个目标特征信息。
特征检测模型的结构例如图6所示,将多张场景图像输入特征检测模型,特征检测模型先通过骨干网络(backbone)对多张场景图像中的每张场景图像进行特征提取,获得每张场景图像的第一特征图。
进一步地,通过特征检测模型的部位检测分支,在所述每张场景图像的第一特征图上进行目标部位检测,得到所述每张场景图像上的多个目标部位的位置;以及,通过所述特征检测模型的特征提取分支,对所述每张场景图像的第一特征图进行特征提取处理,得到多维度的第二特征图。其中,目标可以包括人物,目标部位可以包括人脸部位和/或人体部位。特征提取分支可以由至少一个卷积层串联而成。第二特征图的尺寸与第一特征图相同,这样在每个维度的第二特征图上多个目标部位的位置都是相同的。第二特征图的维度值与每张场景图像对应的预设通道数目相同。
进一步地在所述多维度的第二特征图上,可以获取与所述多个目标部位的位置对应的多个目标特征向量。目标部位的位置可以通过人脸识别框的中心位置和/或人体识别框的中心位置表示。目标特征向量的维度值与第二特征图的维度值相同。假设某个人脸识别框的中心位置坐标为(x,y),特征提取分支得到的第二特征图的尺寸与第一特征图尺寸一致,均为H×W,其中,H和W分别为图像的长度和宽度,第二特征图的维度值为C,C是每张场景图像对应的预设通道数目。在每个通道上,均可以得到与人脸识别框中心位置(x,y)对应的目标特征向量,因此,目标特征向量的维度值为C。
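The model structure described above (a shared backbone that yields the first feature map, a part detection branch, and a feature extraction branch built from stacked convolutions whose output keeps the H×W size of the first feature map and has C channels) can be sketched as follows; the layer sizes and the single-channel center heatmap standing in for the part detector are assumptions, not the patent's exact network.

    import torch
    import torch.nn as nn

    class FeatureDetectionModel(nn.Module):
        """Minimal sketch: backbone -> (part detection branch, feature extraction branch)."""

        def __init__(self, in_channels=3, backbone_channels=64, feat_dim=128):
            super().__init__()
            # Backbone producing the first feature map (kept at input resolution
            # here for simplicity; a real backbone such as VGG would downsample).
            self.backbone = nn.Sequential(
                nn.Conv2d(in_channels, backbone_channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(backbone_channels, backbone_channels, 3, padding=1), nn.ReLU(),
            )
            # Part detection branch: a 1-channel heatmap of part centers stands in
            # for the region-proposal style detector described in the text.
            self.part_head = nn.Conv2d(backbone_channels, 1, kernel_size=1)
            # Feature extraction branch: stacked convolutions giving a C-channel
            # second feature map with the same spatial size as the first feature map.
            self.feat_head = nn.Sequential(
                nn.Conv2d(backbone_channels, feat_dim, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_dim, feat_dim, 1),
            )

        def forward(self, images):
            first_feature_map = self.backbone(images)
            part_heatmap = self.part_head(first_feature_map)
            second_feature_map = self.feat_head(first_feature_map)
            return part_heatmap, second_feature_map

In use, the part positions would come from the detection branch (or an RPN-style detector), and the target feature vectors would then be read out of second_feature_map at those positions, as sketched earlier.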
在本公开实施例中,在所述多维度的第二特征图上提取与所述多个目标部位的位置对应的多个目标特征向量之后,可以确定第一场景图像上的N个目标特征向量分别与第二场景图像上的M个目标特征向量之间的相似度,从而得到相似度矩阵,根据该相似度矩阵,确定出现在所述不同场景图像上的多个相同的目标。确定方式与上述步骤104-2的方式相同,在此不再赘述。
例如图7所示，针对第一场景图像T_0和第二场景图像T_1，分别输入上述特征检测模型，可以分别得到N个目标特征向量和M个目标特征向量。进一步地，可以采用二部图算法，在满足空间距离约束的条件下对提取的所述目标部位的特征进行匹配，从而确定出现在T_0和T_1中的相同的目标。
上述实施例中,针对每张场景图像进行单帧推断,无论每张场景图像中包括多少目标,都可以快速实现多目标跟踪,有效提高了多目标跟踪的检测效率。
在一些可选实施例中,例如图8所示,该方法还可以包括:
在步骤100-1中,将对应同一场景的多张样本场景图像输入初始神经网络模型,获得所述初始神经网络模型输出的每张样本场景图像上多个目标部位的位置分别对应的样本特征向量。
在本公开实施例中,采用已有的对应同一场景的多张样本图像作为初始神经网络模型的输入值,多张样本图像中预先通过每个标识框和/或对应的目标标识,标识出了多个相同的目标和不同的目标。
在本公开实施例中,初始神经网络模型的结构同样可以如图6所示,包括骨干网络、部位检测分支和特征提取分支。在输入值包括多张样本场景图像的情况下,可以得到每张样本场景图像上多个目标部位的位置分别对应的样本特征向量。
在步骤100-2中,根据所述每张样本场景图像上已标注的多个目标部位分别对应的目标标识,在每相邻两张样本场景图像上,确定相同的所述目标标识的所述目标部位的位置对应的所述样本特征向量之间的第一相似度,和/或确定不同的所述目标标识的所述目标部位的位置对应的所述样本特征向量之间的第二相似度。
本公开实施例中,基于初始神经网络模型输出的每张样本场景图像上多个目标部位的位置分别对应的样本特征向量,可以确定出每相邻两张样本场景图像上的相同的所述目标标识的所述目标部位的位置对应的所述样本特征向量之间的第一相似度,和/或,所述每相邻两张样本场景图像上不同的所述目标标识的所述目标部位的位置对应的所述样本特征向量之间的第二相似度。
其中,可以根据样本特征向量之间的余弦相似度值来得到上述第一相似度值和第二相似度值。
在步骤100-3中,基于所述每张样本场景图像上已标注的多个目标部位分别所对应的目标标识,根据所述第一相似度和所述第二相似度中的至少一项,对所述初始神经网络模型进行监督训练,得到所述特征检测模型。
在本公开实施例中,可以通过提高第一相似度值,降低第二相似度值的方式,例如图9所示,确定损失函数。基于所述每相邻两张样本场景图像上多个目标部位分别所对应的目标标识,根据确定出的损失函数,调整预设模型的网络参数,监督训练完成后,得到特征检测模型。
上述实施例中,通过基于所述每张样本场景图像上已标注的多个目标部位分别所对应的目标标识,对初始神经网络模型进行监督训练,得到所述特征检测模型,提高了特征检测模型的检测性能和泛化性能。
在一些可选实施例中,针对步骤100-3,可以将第一相似度参考值与所述第一相似度之间的差作为第一损失函数。其中,第一相似度参考值是所述每两张样本场景图像上已标注的相同的目标标识的目标部位所对应的样本特征向量之间的相似度参考值。示例性的,第一相似度参考值是样本特征向量之间的余弦相似度值,取值可以为1。
通过调整初始神经网络模型的网络参数,让第一损失函数最小或达到预设训练次数,得到特征检测模型。
或者,可以将第二相似度参考值与所述第二相似度之间的差作为第二损失函数。其中,第二相似度参考值是所述每两张样本场景图像上已标注的不同的目标标识的目标部位所对应的样本特征向量之间的相似度参考值。示例性的,第二相似度参考值是样本特征向量之间的余弦相似度值,取值可以为0。
同样通过调整初始神经网络模型的网络参数,让第二损失函数最小或达到预设训练次数,得到特征检测模型。
或者,还可以同时将第一损失函数和第二损失函数作为初始神经网络模型的损失函数,调整初始神经网络模型的网络参数,让两个损失函数最小或达到预设训练次数,得到特征检测模型。
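As a hedged illustration of the two loss terms described above, the sketch below treats the reference values 1 (same target identifier) and 0 (different target identifiers) as regression targets for the cosine similarity between sample feature vectors of two adjacent sample scene images; the absolute-value form and the tensor names are interpretive assumptions.

    import torch
    import torch.nn.functional as F

    def pairwise_id_losses(feats_a, feats_b, ids_a, ids_b):
        """feats_a, feats_b: (N, C) and (M, C) sample feature vectors of two
        adjacent sample scene images; ids_a, ids_b: (N,) and (M,) annotated
        target identifiers. Assumes both same-id and different-id pairs exist."""
        a = F.normalize(feats_a, dim=1)
        b = F.normalize(feats_b, dim=1)
        sim = a @ b.t()                                   # (N, M) cosine similarities
        same = ids_a.unsqueeze(1) == ids_b.unsqueeze(0)   # (N, M) same-identifier mask
        first_loss = (1.0 - sim[same]).abs().mean()       # pull same-id pairs toward 1
        second_loss = sim[~same].abs().mean()             # push different-id pairs toward 0
        return first_loss, second_loss

Training would then minimize the first loss, the second loss, or their sum, matching the three alternatives described above.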
在一些可选实施例中,例如图10所示,该方法还可以包括:
在步骤105中,确定出现在所述多个场景图像上的多个相同的目标中的至少一个目标在预设时间段内的运动轨迹是否符合目标运动轨迹。
在本公开实施例中,多张场景图像对应教室场景,所述目标包括教学对象,所述目标运动轨迹包括教学任务中对所述教学对象指定的至少一种运动轨迹。其中,教学任务中对所述教学对象指定的至少一种运动轨迹包括但不限于从当前所在位置走到老师指定的其他位置,其他位置可以是讲台、黑板或其他同学所在位置,或者目标运动轨迹还可以包括在当前位置未发生移动。
例如在教室中,可以采用部署在教室内的带摄像头的教学多媒体设备,包括但不限于教学投影机、教室内的监控设备等来在教室中先后采集多张场景图像。确定教室场景图像包括的至少一个教学对象的运动轨迹,该教学对象可以是学生。
进一步地,可以在设定时间段内,例如老师教学的一堂课的时间段内,确定每个教学对象,例如每个学生的运动轨迹是否符合教学任务中对所述教学对象指定的至少一种运动轨迹。例如,是否根据老师的指示从当前位置移动到黑板前、或者其他同学所在位置,或者始终位于同一位置未发生运动轨迹的移动,例如始终坐在自己的位置上听讲等。可以通过教学多媒体设备显示上述结果,以便老师更好地进行教学任务。
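As one hedged example of the conformance check in the classroom scenario above, the sketch below tests whether a tracked teaching object reaches a designated position (such as the podium) within the preset time period; the distance test, the radius, and the names are assumptions rather than the patent's definition of conformance.

    import numpy as np

    def reaches_designated_position(track, designated_xy, radius=1.0):
        """track: sequence of (x, y) positions of one tracked target within the
        preset time period; designated_xy: the position named in the teaching
        task. Returns True if the target comes within `radius` of it."""
        track = np.asarray(track, dtype=float)
        target = np.asarray(designated_xy, dtype=float)
        return bool((np.linalg.norm(track - target, axis=1) <= radius).any())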
与前述方法实施例相对应,本公开还提供了装置的实施例。
如图11所示,图11是本公开根据一示例性实施例示出的一种目标跟踪装置框图,装置包括:获取模块210,用于获取对应同一场景的多张场景图像;处理模块220,用于对所述多张场景图像中的每张场景图像进行特征提取处理以及目标部位检测,得到所述每张场景图像的特征信息以及所述每张场景图像上的多个目标部位的位置;特征信息确定模块230,用于获取所述每张场景图像的特征信息中与所述多个目标部位的位置分别对应的目标特征信息;目标确定模块240,用于根据获取的所述多个目标部位的位置分别对应的目标特征信息,确定出现在所述多张场景图像上的多个相同的目标,其中, 每张场景图像中包括所述多个相同的目标的部分或全部目标。
在一些可选实施例中,所述处理模块包括:第一处理子模块,用于提取所述多张场景图像中的每张场景图像的第一特征图;第二处理子模块,用于在所述每张场景图像的第一特征图上进行目标部位检测,得到所述每张场景图像上的多个目标部位的位置;以及,对所述每张场景图像的第一特征图进行特征提取处理,得到多维度的第二特征图;所述特征信息确定模块包括:特征向量确定子模块,用于在所述多维度的第二特征图上获取与所述多个目标部位的位置对应的多个目标特征向量。
在一些可选实施例中,所述目标确定模块包括:相似度确定子模块,用于利用所述多张场景图像中每相邻两张场景图像分别对应的多个目标特征信息,得到所述每相邻两张场景图像上各个目标部位之间的相似度;目标确定子模块,用于基于所述每相邻两张场景图像上各个目标部位之间的相似度,确定出现在所述不同场景图像上的多个相同的目标。
在一些可选实施例中,所述每相邻两张场景图像为第一场景图像和第二场景图像;所述相似度确定子模块包括:确定第一场景图像上的N个目标特征向量分别与第二场景图像上的M个目标特征向量之间的相似度;其中,N和M为大于等于2的正整数;根据所述第一场景图像上的N个目标特征向量分别与所述第二场景图像上的M个目标特征向量之间的所述相似度,得到N×M维度的相似度矩阵,所述相似度矩阵中任一维度的值表示所述第一场景图像的任一第一目标部位与所述第二场景图像中的任一第二目标部位的相似度。
在一些可选实施例中,所述目标确定子模块包括:根据所述相似度矩阵,在所述N个目标特征向量中的第一目标特征向量分别与所述M个目标特征向量之间的相似度中,确定相似度最大值;若所述相似度最大值大于预设阈值,则在所述M个目标特征向量中确定所述相似度最大值对应的第二目标特征向量;将所述第一场景图像上所述第一目标特征向量对应的第一目标部位所属目标和所述第二场景图像上第二目标特征向量对应的第二目标部位所属目标,作为相同的目标。
在一些可选实施例中,所述处理模块包括:第三处理子模块,用于通过特征检测模型的骨干网络提取所述多张场景图像中的每张场景图像的第一特征图;第四处理子模块,用于通过所述特征检测模型的部位检测分支,在所述每张场景图像的第一特征图上进行目标部位检测,得到所述每张场景图像上的多个目标部位的位置;以及,通过所述特征检测模型的特征提取分支,对所述每张场景图像的第一特征图进行特征提取处理,得到多维度的第二特征图。
在一些可选实施例中,所述装置还包括:特征向量确定模块,用于将对应同一场景的多张样本场景图像输入预设模型,获得所述预设模型输出的每张样本场景图像上多个目标部位的位置对应的多个特征向量;相似度确定模块,用于根据每相邻两张样本场景图像上已标注的多个目标部位分别对应的目标标识,确定所述每相邻两张样本场景图像上相同的所述目标标识的所述目标部位的位置对应的样本特征向量之间的第一相 似度;和/或确定所述每相邻两张样本场景图像上不相同的目标标识的目标部位的位置所对应的样本特征向量之间的第二相似度;训练模块,用于基于所述每相邻两张样本场景图像上已标注的多个目标部位分别对应的目标标识,根据所述第二相似度与所述第一相似度中的至少一项,对所述预设模型进行监督训练,得到所述特征检测模型。
在一些实施例中,将第一相似度参考值与所述第一相似度之间的差作为第一损失函数;其中,所述第一相似度参考值是所述每相邻两张样本场景图像上已标注的相同的目标标识的目标部位所对应的样本特征向量之间的相似度参考值;将第二相似度参考值与所述第二相似度之间的差作为第二损失函数;其中,所述第二相似度参考值是所述每相邻两张样本场景图像上已标注的不同的目标标识的目标部位所对应的样本特征向量之间的相似度参考值;根据所述第一损失函数和所述第二损失函数中的至少一项,对所述初始神经网络模型进行训练,得到所述特征检测模型。
在一些可选实施例中,所述装置还包括:运动轨迹确定模块,用于确定出现在所述多个场景图像上的多个相同的目标中的至少一个目标在预设时间段内的运动轨迹是否符合目标运动轨迹。
在一些可选实施例中,所述多张场景图像对应教室场景,所述目标包括教学对象,所述目标运动轨迹包括教学任务中对所述教学对象指定的至少一种运动轨迹。
对于装置实施例而言,由于其基本对应于方法实施例,所以相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的,其中作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本公开方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
本公开实施例还提供了一种计算机可读存储介质,存储介质存储有计算机程序,计算机程序用于执行上述任一所述的目标跟踪方法。
在一些可选实施例中,本公开实施例提供了一种计算机程序产品,包括计算机可读代码,当计算机可读代码在设备上运行时,设备中的处理器执行用于实现如上任一实施例提供的目标跟踪方法的指令。
在一些可选实施例中,本公开实施例还提供了另一种计算机程序产品,用于存储计算机可读指令,指令被执行时使得计算机执行上述任一实施例提供的目标跟踪方法的操作。
该计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选实施例中,所述计算机程序产品具体体现为计算机存储介质,在另一个可选实施例中,计算机程序产品具体体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
在一些可选实施例中,本公开实施例提供了一种计算机程序,其中所述计算机 程序被执行时使得计算机执行上述任一实施例提供的目标跟踪方法的操作。
本公开实施例还提供了一种目标跟踪装置,包括:处理器;用于存储处理器可执行指令的存储器;其中,处理器被配置为调用所述存储器中存储的可执行指令,实现上述任一项所述的目标跟踪方法。
图12为本公开实施例提供的一种目标跟踪装置的硬件结构示意图。该目标跟踪装置310包括处理器311,还可以包括输入装置312、输出装置313和存储器314。该输入装置312、输出装置313、存储器314和处理器311之间通过总线相互连接。
存储器包括但不限于是随机存储记忆体(random access memory,RAM)、只读存储器(read-only memory,ROM)、可擦除可编程只读存储器(erasable programmable read only memory,EPROM)、或便携式只读存储器(compact disc read-only memory,CD-ROM),该存储器用于相关指令及数据。
输入装置用于输入数据和/或信号,以及输出装置用于输出数据和/或信号。输出装置和输入装置可以是独立的器件,也可以是一个整体的器件。
处理器可以包括是一个或多个处理器,例如包括一个或多个中央处理器(central processing unit,CPU),在处理器是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
存储器用于存储网络设备的程序代码和数据。
处理器用于调用该存储器中的程序代码和数据,执行上述方法实施例中的步骤。具体可参见方法实施例中的描述,在此不再赘述。
可以理解的是,图12仅仅示出了一种目标跟踪装置的简化设计。在实际应用中,目标跟踪装置还可以分别包含必要的其他元件,包含但不限于任意数量的输入/输出装置、处理器、控制器、存储器等,而所有可以实现本公开实施例的目标跟踪装置都在本公开的保护范围之内。
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本公开的其它实施方案。本公开旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或者惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。
以上所述仅为本公开的较佳实施例而已,并不用以限制本公开,凡在本公开的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本公开保护的范围之内。

Claims (14)

  1. A target tracking method, characterized in that the method comprises:
    acquiring multiple scene images corresponding to a same scene;
    performing feature extraction processing and target part detection on each scene image of the multiple scene images to obtain feature information of each scene image and positions of multiple target parts on each scene image;
    acquiring, from the feature information of each scene image, target feature information respectively corresponding to the positions of the multiple target parts;
    determining, according to the acquired target feature information respectively corresponding to the positions of the multiple target parts, multiple identical targets appearing in the multiple scene images, wherein each scene image includes some or all of the multiple identical targets.
  2. The method according to claim 1, wherein performing feature extraction processing and target part detection on each scene image of the multiple scene images to obtain the feature information of each scene image and the positions of the multiple target parts on each scene image comprises:
    extracting a first feature map of each scene image of the multiple scene images;
    performing target part detection on the first feature map of each scene image to obtain the positions of the multiple target parts on each scene image; and performing feature extraction processing on the first feature map of each scene image to obtain a multi-dimensional second feature map;
    and wherein acquiring, from the feature information of each scene image, the target feature information respectively corresponding to the positions of the multiple target parts comprises:
    acquiring, on the multi-dimensional second feature map, target feature vectors respectively corresponding to the positions of the multiple target parts.
  3. The method according to claim 1 or 2, wherein determining, according to the acquired target feature information respectively corresponding to the positions of the multiple target parts, the multiple identical targets appearing in the multiple scene images comprises:
    obtaining, by using multiple pieces of target feature information respectively corresponding to every two adjacent scene images of the multiple scene images, similarities between respective target parts on the two adjacent scene images;
    determining, based on the similarities between the respective target parts on the two adjacent scene images, multiple identical targets appearing in different scene images.
  4. The method according to claim 3, wherein the two adjacent scene images are a first scene image and a second scene image;
    obtaining, by using the multiple pieces of target feature information respectively corresponding to every two adjacent scene images of the multiple scene images, the similarities between the respective target parts on the two adjacent scene images comprises:
    determining similarities between N target feature vectors on the first scene image and M target feature vectors on the second scene image, wherein N and M are positive integers greater than or equal to 2;
    obtaining an N×M-dimensional similarity matrix according to the similarities between the N target feature vectors on the first scene image and the M target feature vectors on the second scene image, wherein a value of any dimension in the similarity matrix represents a similarity between any first target part of the first scene image and any second target part in the second scene image.
  5. The method according to claim 4, wherein determining, based on the similarities between the respective target parts on the two adjacent scene images, the multiple identical targets appearing in the different scene images comprises:
    determining, according to the similarity matrix, a maximum similarity among the similarities between a first target feature vector of the N target feature vectors and the M target feature vectors respectively;
    if the maximum similarity is greater than a preset threshold, determining, among the M target feature vectors, a second target feature vector corresponding to the maximum similarity;
    taking a target to which a first target part corresponding to the first target feature vector on the first scene image belongs and a target to which a second target part corresponding to the second target feature vector on the second scene image belongs as a same target.
  6. The method according to any one of claims 1-5, wherein performing feature extraction processing and target part detection on each scene image of the multiple scene images to obtain the feature information of each scene image and the positions of the multiple target parts on each scene image comprises:
    extracting the first feature map of each scene image of the multiple scene images through a backbone network of a feature detection model;
    performing target part detection on the first feature map of each scene image through a part detection branch of the feature detection model to obtain the positions of the multiple target parts on each scene image; and performing feature extraction processing on the first feature map of each scene image through a feature extraction branch of the feature detection model to obtain the multi-dimensional second feature map.
  7. The method according to claim 6, wherein the method further comprises:
    inputting multiple sample scene images corresponding to a same scene into an initial neural network model, and obtaining sample feature vectors, output by the initial neural network model, respectively corresponding to positions of multiple target parts on each sample scene image;
    determining, according to target identifiers respectively corresponding to the multiple target parts annotated on each sample scene image, a first similarity between the sample feature vectors corresponding to the positions of the target parts with a same target identifier on every two adjacent sample scene images, and/or a second similarity between the sample feature vectors corresponding to the positions of the target parts with different target identifiers;
    performing supervised training on the initial neural network model based on the target identifiers respectively corresponding to the multiple target parts annotated on each sample scene image and according to at least one of the first similarity and the second similarity, to obtain the feature detection model.
  8. The method according to claim 7, wherein performing supervised training on the initial neural network model based on the target identifiers respectively corresponding to the multiple target parts annotated on each sample scene image and according to at least one of the first similarity and the second similarity, to obtain the feature detection model, comprises:
    taking a difference between a first similarity reference value and the first similarity as a first loss function, wherein the first similarity reference value is a similarity reference value between the sample feature vectors corresponding to the target parts annotated with a same target identifier on every two adjacent sample scene images;
    taking a difference between a second similarity reference value and the second similarity as a second loss function, wherein the second similarity reference value is a similarity reference value between the sample feature vectors corresponding to the target parts annotated with different target identifiers on every two adjacent sample scene images;
    training the initial neural network model according to at least one of the first loss function and the second loss function to obtain the feature detection model.
  9. The method according to any one of claims 1-8, wherein the method further comprises:
    determining whether a motion trajectory, within a preset time period, of at least one target of the multiple identical targets appearing in the multiple scene images conforms to a target motion trajectory.
  10. The method according to claim 9, wherein the multiple scene images correspond to a classroom scene, the target comprises a teaching object, and the target motion trajectory comprises at least one motion trajectory designated for the teaching object in a teaching task.
  11. A target tracking device, characterized in that the device comprises:
    an acquisition module configured to acquire multiple scene images corresponding to a same scene;
    a processing module configured to perform feature extraction processing and target part detection on each scene image of the multiple scene images to obtain feature information of each scene image and positions of multiple target parts on each scene image;
    a feature information determining module configured to acquire, from the feature information of each scene image, target feature information respectively corresponding to the positions of the multiple target parts;
    a target determining module configured to determine, according to the acquired target feature information respectively corresponding to the positions of the multiple target parts, multiple identical targets appearing in the multiple scene images, wherein each scene image includes some or all of the multiple identical targets.
  12. A computer-readable storage medium, characterized in that the storage medium stores a computer program, and the computer program is used to execute the target tracking method according to any one of claims 1-10.
  13. A target tracking device, characterized by comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to invoke the executable instructions stored in the memory to implement the target tracking method according to any one of claims 1-10.
  14. A computer program, wherein when the computer program is executed by a processor, the target tracking method according to any one of claims 1 to 10 can be implemented.
PCT/CN2021/087870 2020-04-28 2021-04-16 目标跟踪方法及装置、存储介质及计算机程序 WO2021218671A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020227002703A KR20220024986A (ko) 2020-04-28 2021-04-16 타깃 추적 방법 및 장치, 저장 매체 및 컴퓨터 프로그램
JP2022504275A JP7292492B2 (ja) 2020-04-28 2021-04-16 オブジェクト追跡方法及び装置、記憶媒体並びにコンピュータプログラム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010352365.6 2020-04-28
CN202010352365.6A CN111539991B (zh) 2020-04-28 2020-04-28 目标跟踪方法及装置、存储介质

Publications (1)

Publication Number Publication Date
WO2021218671A1 true WO2021218671A1 (zh) 2021-11-04

Family

ID=71977335

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/087870 WO2021218671A1 (zh) 2020-04-28 2021-04-16 目标跟踪方法及装置、存储介质及计算机程序

Country Status (5)

Country Link
JP (1) JP7292492B2 (zh)
KR (1) KR20220024986A (zh)
CN (1) CN111539991B (zh)
TW (1) TWI769787B (zh)
WO (1) WO2021218671A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114783043A (zh) * 2022-06-24 2022-07-22 杭州安果儿智能科技有限公司 一种儿童行为轨迹定位方法和系统
CN115880614A (zh) * 2023-01-19 2023-03-31 清华大学 一种宽视场高分辨视频高效智能检测方法及系统
CN116721045A (zh) * 2023-08-09 2023-09-08 经智信息科技(山东)有限公司 一种多ct图像融合的方法及装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539991B (zh) * 2020-04-28 2023-10-20 北京市商汤科技开发有限公司 目标跟踪方法及装置、存储介质
CN113129339B (zh) * 2021-04-28 2023-03-10 北京市商汤科技开发有限公司 一种目标跟踪方法、装置、电子设备及存储介质
WO2024071587A1 (ko) * 2022-09-29 2024-04-04 삼성전자 주식회사 객체를 추적하는 방법 및 전자 장치

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522843A (zh) * 2018-11-16 2019-03-26 北京市商汤科技开发有限公司 一种多目标跟踪方法及装置、设备和存储介质
CN109800624A (zh) * 2018-11-27 2019-05-24 上海眼控科技股份有限公司 一种基于行人重识别的多目标跟踪方法
CN109859238A (zh) * 2019-03-14 2019-06-07 郑州大学 一种基于多特征最优关联的在线多目标跟踪方法
CN110163890A (zh) * 2019-04-24 2019-08-23 北京航空航天大学 一种面向空基监视的多目标跟踪方法
CN110866428A (zh) * 2018-08-28 2020-03-06 杭州海康威视数字技术股份有限公司 目标跟踪方法、装置、电子设备及存储介质
CN110889464A (zh) * 2019-12-10 2020-03-17 北京市商汤科技开发有限公司 神经网络训练、目标对象的检测方法及装置
CN111539991A (zh) * 2020-04-28 2020-08-14 北京市商汤科技开发有限公司 目标跟踪方法及装置、存储介质

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009020897A (ja) 2002-09-26 2009-01-29 Toshiba Corp 画像解析方法、画像解析装置、画像解析プログラム
JP4580189B2 (ja) 2004-05-28 2010-11-10 セコム株式会社 センシング装置
TWI492188B (zh) * 2008-12-25 2015-07-11 Univ Nat Chiao Tung 利用多攝影機自動偵測與追蹤多目標的方法及系統
CN108875465B (zh) * 2017-05-26 2020-12-11 北京旷视科技有限公司 多目标跟踪方法、多目标跟踪装置以及非易失性存储介质
JP2020522002A (ja) * 2017-06-02 2020-07-27 エスゼット ディージェイアイ テクノロジー カンパニー リミテッドSz Dji Technology Co.,Ltd 移動ターゲットを認識、追跡、および合焦するための方法及びシステム
CN109214238B (zh) * 2017-06-30 2022-06-28 阿波罗智能技术(北京)有限公司 多目标跟踪方法、装置、设备及存储介质
US9946960B1 (en) 2017-10-13 2018-04-17 StradVision, Inc. Method for acquiring bounding box corresponding to an object in an image by using convolutional neural network including tracking network and computing device using the same
CN108491816A (zh) * 2018-03-30 2018-09-04 百度在线网络技术(北京)有限公司 在视频中进行目标跟踪的方法和装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866428A (zh) * 2018-08-28 2020-03-06 杭州海康威视数字技术股份有限公司 目标跟踪方法、装置、电子设备及存储介质
CN109522843A (zh) * 2018-11-16 2019-03-26 北京市商汤科技开发有限公司 一种多目标跟踪方法及装置、设备和存储介质
CN109800624A (zh) * 2018-11-27 2019-05-24 上海眼控科技股份有限公司 一种基于行人重识别的多目标跟踪方法
CN109859238A (zh) * 2019-03-14 2019-06-07 郑州大学 一种基于多特征最优关联的在线多目标跟踪方法
CN110163890A (zh) * 2019-04-24 2019-08-23 北京航空航天大学 一种面向空基监视的多目标跟踪方法
CN110889464A (zh) * 2019-12-10 2020-03-17 北京市商汤科技开发有限公司 神经网络训练、目标对象的检测方法及装置
CN111539991A (zh) * 2020-04-28 2020-08-14 北京市商汤科技开发有限公司 目标跟踪方法及装置、存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114783043A (zh) * 2022-06-24 2022-07-22 杭州安果儿智能科技有限公司 一种儿童行为轨迹定位方法和系统
CN115880614A (zh) * 2023-01-19 2023-03-31 清华大学 一种宽视场高分辨视频高效智能检测方法及系统
CN116721045A (zh) * 2023-08-09 2023-09-08 经智信息科技(山东)有限公司 一种多ct图像融合的方法及装置
CN116721045B (zh) * 2023-08-09 2023-12-19 经智信息科技(山东)有限公司 一种多ct图像融合的方法及装置

Also Published As

Publication number Publication date
CN111539991B (zh) 2023-10-20
JP2022542566A (ja) 2022-10-05
JP7292492B2 (ja) 2023-06-16
TWI769787B (zh) 2022-07-01
CN111539991A (zh) 2020-08-14
TW202141424A (zh) 2021-11-01
KR20220024986A (ko) 2022-03-03

Similar Documents

Publication Publication Date Title
WO2021218671A1 (zh) 目标跟踪方法及装置、存储介质及计算机程序
CN111709409B (zh) 人脸活体检测方法、装置、设备及介质
WO2021043168A1 (zh) 行人再识别网络的训练方法、行人再识别方法和装置
Liu et al. Bayesian model adaptation for crowd counts
CN104601964B (zh) 非重叠视域跨摄像机室内行人目标跟踪方法及系统
CN205334563U (zh) 一种学生课堂参与度检测系统
US10140508B2 (en) Method and apparatus for annotating a video stream comprising a sequence of frames
CN110532970B (zh) 人脸2d图像的年龄性别属性分析方法、系统、设备和介质
JP2020520512A (ja) 車両外観特徴識別及び車両検索方法、装置、記憶媒体、電子デバイス
CN111767882A (zh) 一种基于改进yolo模型的多模态行人检测方法
Bedagkar-Gala et al. Multiple person re-identification using part based spatio-temporal color appearance model
CN107145826B (zh) 基于双约束度量学习和样本重排序的行人再识别方法
CN111666919B (zh) 一种对象识别方法、装置、计算机设备和存储介质
CN107798313A (zh) 一种人体姿态识别方法、装置、终端和存储介质
CN112001278A (zh) 一种基于结构化知识蒸馏的人群计数模型及其方法
Krajník et al. Image features and seasons revisited
CN112906520A (zh) 一种基于姿态编码的动作识别方法及装置
Zhang et al. Joint discriminative representation learning for end-to-end person search
CN113793362A (zh) 基于多镜头视频的行人轨迹提取方法和装置
CN111626212B (zh) 图片中对象的识别方法和装置、存储介质及电子装置
CN113822134A (zh) 一种基于视频的实例跟踪方法、装置、设备及存储介质
CN115018886B (zh) 运动轨迹识别方法、装置、设备及介质
Proenca et al. SHREC’15 Track: Retrieval of Oobjects captured with kinect one camera
CN103020631A (zh) 基于星型模型的人体运动识别方法
TWI776429B (zh) 動作識別方法及裝置、電腦可讀存儲介質

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21796788

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022504275

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20227002703

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21796788

Country of ref document: EP

Kind code of ref document: A1