WO2022019076A1 - Information processing device, information processing method, and program - Google Patents
Information processing device, information processing method, and program
- Publication number
- WO2022019076A1 (PCT/JP2021/024898)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- tracking target
- information processing
- feature amount
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30241—Trajectory
Definitions
- the present invention relates to a technique for tracking a specific subject in an image.
- Non-Patent Document 1 is one of the methods for tracking a specific subject in an image.
- In this method, the image showing the tracking target and the image serving as the search range are each input to a Convolutional Neural Network (hereinafter abbreviated as CNN) with the same weights.
- In Non-Patent Document 1, when an object similar to the tracking target exists in the image, the cross-correlation value with the similar object becomes high, so the similar object may be erroneously tracked as the tracking target. Patent Document 1 predicts the positions of the tracking target and a similar object when the similar object exists in the vicinity of the tracking target. However, because the method of Patent Document 1 uses only the position of the tracking target for prediction, the tracking target may be lost when it is located far from the predicted position or when a similar object is close to it.
- the present invention has been made in view of such a problem, and an object thereof is to track a specific object.
- The information processing device that solves the above problem tracks a specific object from images captured at a plurality of times, and is characterized by having the following means.
- A holding means for holding the feature amount of the tracking target based on a trained model that detects the position of a predetermined object in an input image, and an acquisition means for acquiring the feature amounts of objects in a plurality of images based on the trained model.
- A detection means for detecting candidate objects similar to the tracking target based on the feature amount of the tracking target and the feature amounts of the objects acquired from the plurality of images, and a specifying means for specifying the correspondence between the candidate object detected in a first image among the plurality of images and the candidate object in a second image captured at a time different from that of the first image.
- a specific object can be tracked.
- FIG. 1 shows a hardware configuration example of an information processing apparatus.
- Block diagram showing a functional configuration example of an information processing device
- Flowchart showing the processing procedure executed by the information processing device
- Flowchart showing the processing procedure executed by the tracking target determination unit
- Flowchart showing the processing procedure executed by the object detection unit
- Flowchart showing the processing procedure executed by the information processing device
- Flowchart showing the processing procedure executed by the tracking unit
- Figure showing an example in which the tracking target is shielded
- Figure showing an example of detecting the position of the tracking target in an image
- Flowchart showing the processing procedure executed by the information processing device
- Flowchart showing the processing procedure executed by the information processing device
- Figure showing an example of the shielding judgment
- Figure showing an example of an image in which multiple candidate objects are detected
- Block diagram showing a functional configuration example of an information processing device
- Flowchart showing the processing procedure executed by the information processing device
- Figure showing an example of the acquired template image and search range image
- Figure showing an example of the acquired template image and search range image
- Block diagram showing a functional configuration example of an information processing device
- Block diagram showing a functional configuration example of an information processing device
- Flowchart showing the processing procedure executed by the information processing device
- FIG. 1 is a hardware configuration diagram of an information processing device 1 that tracks a specific object from images captured at a plurality of times in the present embodiment.
- the CPU H101 controls the entire apparatus by executing the control program stored in the ROM H102.
- RAM H103 temporarily stores various data from each component.
- Programs are also loaded into it so that the CPU H101 can execute them.
- The storage unit H104 stores the data to be processed in the present embodiment and the data of the tracking target.
- As the medium of the storage unit H104 an HDD, a flash memory, various optical media, or the like can be used.
- the input unit H105 is composed of a keyboard, a touch panel, a dial, and the like, and receives input from the user, and is used when setting a tracking target or the like.
- the display unit H106 is composed of a liquid crystal display or the like, and displays a subject and a tracking result to the user. In addition, this device can communicate with other devices such as a photographing device via the communication unit H107.
- FIG. 2 is a block diagram showing a functional configuration example of the information processing apparatus 1.
- the information processing device 1 has an image acquisition unit 201, a tracking target determination unit 202, a holding unit 203, an object detection unit 204, and a tracking unit 205, and each component is connected to a storage unit 206.
- the storage unit 206 may be in an external device or may be possessed by the information processing device 1.
- Each functional component will be briefly described.
- the image acquisition unit 201 acquires an image obtained by capturing an image of a predetermined object by an image pickup device.
- The predetermined object is, for example, an object such as a person or a vehicle, that is, an object with a certain degree of individual variation. In the following embodiment, tracking of a person will be described as a specific example.
- the tracking target determination unit 202 determines an object to be a tracking target (object of interest) among the objects included in the image.
- the holding unit 203 holds the feature amount of the object that is a candidate for tracking target from the initial image.
- the object detection unit 204 detects the position of an object from images captured at a plurality of times.
- the tracking unit 205 identifies and tracks a tracking target from images captured at a plurality of times.
- FIG. 3 is a flowchart showing the processing flow of the present embodiment.
- In the following description, each process (step) is denoted by a number prefixed with S, and the word "step" is omitted.
- the information processing apparatus does not necessarily have to perform all the steps described in this flowchart.
- the processes executed by the CPU H101 are shown as functional blocks.
- the image acquisition unit 201 acquires an image (initial image) obtained by capturing a predetermined object.
- the image acquisition unit 201 may acquire an image captured by an image pickup device connected to the information processing device, or may acquire an image stored in the storage unit H104.
- The purpose here is to set the object of interest to be tracked by using the initial image.
- the tracking target determination unit 202 determines an object to be a tracking target (object of interest) from the image acquired in S301.
- the tracking target may be one or a plurality.
- an example of selecting one tracking target will be described.
- a trained model that detects the position of a predetermined object is used to acquire the position of an image feature indicating the predetermined object from the image, and a partial image including the object of interest is determined.
- As the trained model, for example, a model that has learned in advance the image features of a predetermined object such as a person or a vehicle is used. The learning method will be described later. If one object is detected in the image, that object is tracked.
- the image of the next frame may be input.
- the candidates for the tracking target are output, and the tracking target is determined by the method specified in advance.
- the tracking target (object of interest) in the acquired image is determined according to the instruction specified by the input unit H105.
- the tracking target may be determined by automatically detecting the main subject or the like in the image.
- An example of a method for automatically detecting the main subject in an image is Japanese Patent No. 6556033. Further, the determination may be made based on both the designation by the input unit H105 and the object detection result in the image. Examples of techniques for detecting an object in an image include "Liu, SSD: Single Shot Multibox Detector. In: ECCV2016".
- Figure 12 shows the results of detecting candidates for tracking in the image.
- Person 1303, person 1305, and person 1307 in FIG. 12 are candidates for tracking, respectively.
- The frame 1303, the frame 1305, and the frame 1307 are Bounding Boxes (hereinafter referred to as BBs) indicating the positions of the detected candidates.
- The user can determine the tracking target by touching any of the candidate BBs shown on the display unit H106 or by selecting one with a dial or the like.
- the present embodiment does not limit the means for designating the tracking target.
- the holding unit 203 holds the feature amount of the tracking target from the image including the determined tracking target based on the trained model.
- FIG. 4 shows a detailed flowchart of S303 for the feature amount holding process.
- The holding unit 203 generates and holds a template feature amount representing the tracking target, based on the image obtained by the image acquisition unit 201 and the Bounding Box (hereinafter referred to as BB) indicating the position of the tracking target obtained by the tracking target determination unit 202.
- the holding unit 203 acquires information about the position in the image of the tracking target determined by the tracking target determination unit 202.
- The information about the position of the tracking target acquired here is hereinafter referred to as a Bounding Box (BB).
- As the information about the position of the tracking target, either the center position input by the user when the tracking target was determined in S302, or the result of detecting a predetermined position (for example, the center of gravity) of the tracking target with the trained model, is used.
- The holding unit 203 acquires a template image by extracting, at a predetermined size, an image showing the tracking target based on its position in the image. That is, the periphery of the region obtained in S401 is cut out from the initial image as a template image and resized to a predetermined size.
- the predetermined size may be adjusted to the size of the input image of the trained model.
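- The cut-out and resize of the template image can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the function name `crop_and_resize`, the `context` factor, the output size of 127 pixels, and the nearest-neighbour sampling are all assumptions introduced for the example.

```python
import numpy as np

def crop_and_resize(image, bb, context=2.0, out_size=127):
    """Cut a square patch around a BB and resize it (nearest neighbour).

    image: H x W x C array; bb: (cx, cy, w, h) in pixels.
    context and out_size are illustrative values, not taken from the text.
    """
    cx, cy, w, h = bb
    side = max(1, int(round(context * max(w, h))))  # patch side in source pixels
    half = side / 2.0
    # Source pixel coordinate sampled for each output pixel.
    offsets = (np.arange(out_size) + 0.5) * side / out_size
    xs = np.floor(cx - half + offsets).astype(int)
    ys = np.floor(cy - half + offsets).astype(int)
    # Clamp to the image so patches near the border repeat edge pixels.
    xs = np.clip(xs, 0, image.shape[1] - 1)
    ys = np.clip(ys, 0, image.shape[0] - 1)
    patch = image[ys[:, None], xs[None, :]]
    scale = out_size / side  # resize ratio, reusable for the search range
    return patch, scale
```

- The returned `scale` is kept so that a later search range image can be resized with the same ratio as the template.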
- The holding unit 203 acquires the feature amount of the tracking target by inputting the template image indicating the tracking target into the trained model that detects the position of a predetermined object in the input image.
- the image resized in S402 is input to the CNN (trained model).
- the CNN has been learned in advance so as to obtain a feature amount that makes it easy to distinguish between a tracking target and a non-tracking target. The learning method will be described later.
- The CNN is composed of convolutions and non-linear transformations such as the Rectified Linear Unit (hereinafter referred to as ReLU) and Max Pooling. ReLU and Max Pooling are mentioned here only as examples.
- The holding unit 203 holds the feature amount of the tracking target obtained in S403 as a template feature amount indicating the tracking target.
- the above processing is the processing of the setting phase of the tracking target.
- the image acquisition unit 201 acquires images captured at a plurality of times in order to perform tracking processing.
- A process of detecting the tracking target set in the first image from the second image captured at a time different from that of the first image will be described. It is assumed that the first image and the second image are captured so that the tracking target appears in them as much as possible.
- FIG. 5 shows a flowchart illustrating the process executed by the object detection unit 204 in S305.
- the processing after S304 is a processing for an image captured after the image for which the tracking target is determined, and is a processing for detecting the tracking target from the image.
- the object detection unit 204 acquires a search range image (partial image) indicating a region for searching the tracking target from the current image (second image).
- the search range image is acquired based on the detection position of the previous tracking target or candidate object. That is, in the second image, a partial image of a predetermined size is extracted from the region corresponding to the vicinity of the candidate object detected from the first image (past image).
- the size of the search area may be changed according to the speed of the object and the angle of view of the image. Further, the search area may be the entire search image or may be around the position of the previous tracking target.
- the object detection unit 204 extracts the input image to be input to the trained model from the search range image.
- the object detection unit 204 cuts out the search range area from the search range image and resizes it.
- the size of the search range is determined as a constant multiple of the size of the BB to be tracked. By obtaining the features from images of the same size, it is possible to obtain the features with less noise. Based on the determined search region, the region is cut out and resized so as to be equivalent to the resizing ratio in S402.
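- The sizing rule above can be sketched as follows. The function name `search_region`, the square region, and the parameter names are assumptions for illustration; the text only states that the search range is a constant multiple of the BB size and that it is resized with a ratio equivalent to that of S402.

```python
def search_region(bb, multiple, template_scale):
    """Return the square search region around a BB and its resized side length.

    bb: (cx, cy, w, h) of the previous tracking target or candidate, in pixels.
    multiple: constant factor applied to the BB size (illustrative parameter).
    template_scale: resize ratio that was used for the template image.
    """
    cx, cy, w, h = bb
    side = multiple * max(w, h)             # search side in source pixels
    x0, y0 = cx - side / 2, cy - side / 2   # top-left corner (may lie outside the image)
    out_size = int(round(side * template_scale))
    return (x0, y0, side), out_size
```

- Because the same ratio is applied to both images, an object that is 40 pixels tall in the source occupies the same number of pixels in the resized template and in the resized search range.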
- The object detection unit 204 inputs the extracted search range image into the trained model (CNN) that detects the position of a predetermined object in the input image, thereby obtaining the feature amount of each search range image.
- the feature amount of each search range image indicates the feature amount of the object existing in each search range image. It is assumed that the CNN in S503 has a part or all of the same weight as the CNN in S403. With this CNN, for example, when a certain search range image contains a shield that shields a person, a feature amount indicating the shield can be acquired. Further, when the other partial image does not include a person but includes an animal, a feature amount indicating the animal can be obtained.
- the object detection unit 204 acquires a cross-correlation between the feature amount of the tracking target and the feature amount of the object existing in the current search range image obtained in S503.
- Cross-correlation is an index showing the degree of similarity between detected objects.
- An object whose cross-correlation is larger than a predetermined value, that is, an object similar to the tracking target (an object of the same type), is called a candidate object.
- Candidate objects may include the tracking target, non-tracking objects, or both.
- For example, when the tracking target is a person, the cross-correlation of a search range image whose feature amount indicates a person becomes high.
- the object detection unit 204 detects the position of the candidate object in the current image. Since some or all of the weights of the CNN in S503 and the CNN in S403 are the same, the value of the cross-correlation becomes large at the position where the probability that the candidate object exists in the search range is high. Therefore, it is possible to detect the position of the candidate object from the search range image in which the value of the cross-correlation is equal to or larger than the threshold value. That is, based on the cross-correlation obtained in S504, a position where the cross-correlation is larger than a predetermined value is detected as the position of the candidate object.
- Positions where the cross-correlation is smaller than a predetermined value can be regarded as unlikely to have a tracking target.
- In addition, a BB that surrounds the candidate object is acquired.
- The position of the BB is determined based on the search range image that shows a high cross-correlation response.
- FIG. 9 shows an example of the processing result of S305.
- Map 901 shows a map obtained based on cross-correlation.
- the tracking target is a person 902, and the cross-correlation value of the cell 904 near the center of the person 902 is high. If this correlation value is equal to or greater than the threshold value, it can be estimated that the person 902 is located in the cell 904.
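- The cross-correlation map and the threshold test can be sketched as follows. This is a plain NumPy illustration under assumptions: the feature maps are given as channel-first arrays, the correlation is an unnormalized sliding dot product, and the function names are hypothetical. The actual network and its correlation layer are not specified at this level in the text.

```python
import numpy as np

def cross_correlation_map(template_feat, search_feat):
    """Slide the template feature over the search feature (valid positions only).

    template_feat: C x h x w, search_feat: C x H x W.
    Returns a (H - h + 1) x (W - w + 1) map of dot products.
    """
    _, h, w = template_feat.shape
    _, H, W = search_feat.shape
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = search_feat[:, y:y + h, x:x + w]
            out[y, x] = np.sum(window * template_feat)
    return out

def candidate_cells(corr_map, threshold):
    """Return (row, col) of every cell whose value is at or above the threshold."""
    ys, xs = np.where(corr_map >= threshold)
    return list(zip(ys.tolist(), xs.tolist()))
```

- Each returned cell plays the role of a cell such as 904 in the map: a position where a candidate object is estimated to be located.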
- The CNN may be trained in advance so that it can estimate the width and height of the BB (described later). Alternatively, the width and height of the BB of the tracking target obtained in S302 may be used as they are.
- The tracking unit 205 identifies the correspondence between the candidate object detected in the first image among the plurality of images and the candidate object in the second image captured at a time different from that of the first image. By identifying the correspondence between the objects detected at a plurality of times, it is possible to track the objects having the correspondence. Further, by updating the feature amount and the position of the tracking target based on the image in which the tracking target is detected, more stable tracking can be performed.
- FIG. 7 shows a flowchart illustrating the process executed by the tracking unit 205.
- the tracking unit 205 specifies a combination (correspondence relationship) in which the acquired similarity is equal to or higher than the threshold value.
- the high degree of similarity between the past candidate and the current candidate indicates that the past candidate and the current candidate are likely to be the same object.
- the method of mapping is not limited.
- the same object is specified based on the similarity with the candidate object in the second image.
- The similarity L between the past candidate c1 and the current candidate c2 is calculated as follows.
- BB is a vector that summarizes the four variables (center coordinate value x, center coordinate value y, width, height) of each candidate BB, and f indicates the features of each candidate.
- The features are extracted from the feature map obtained from the CNN at the location of each candidate.
- W1 and W2 are coefficients obtained empirically, with W1 > 0 and W2 > 0. That is, the closer the feature amounts are, the higher the similarity, and the closer the detection positions and the sizes of the detection areas are, the higher the similarity.
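- The equation itself is not reproduced in this text, so the sketch below assumes one form that is consistent with the description: L = -W1 * ||BB1 - BB2|| - W2 * ||f1 - f2||. The Euclidean norms, the sign convention, the default coefficients, and the greedy one-to-one matching are all assumptions made for illustration; the text states that the association method is not limited.

```python
import numpy as np

def similarity(bb1, f1, bb2, f2, w1=1.0, w2=1.0):
    """Assumed similarity: larger (closer to 0) when BBs and features are closer."""
    bb_dist = np.linalg.norm(np.asarray(bb1, float) - np.asarray(bb2, float))
    f_dist = np.linalg.norm(np.asarray(f1, float) - np.asarray(f2, float))
    return -w1 * bb_dist - w2 * f_dist

def match_candidates(past, current, threshold, w1=1.0, w2=1.0):
    """Greedy one-to-one matching of (bb, feature) pairs.

    Returns {past_index: current_index} for pairs whose similarity is at or
    above the threshold, taking the most similar pairs first.
    """
    scored = [(similarity(pb, pf, cb, cf, w1, w2), i, j)
              for i, (pb, pf) in enumerate(past)
              for j, (cb, cf) in enumerate(current)]
    scored.sort(key=lambda t: t[0], reverse=True)
    matches, used = {}, set()
    for score, i, j in scored:
        if score < threshold:
            break  # all remaining pairs are below the threshold
        if i not in matches and j not in used:
            matches[i] = j
            used.add(j)
    return matches
```

- The current candidate matched to the past tracking target would then be taken as the tracking target; an empty result corresponds to the case where no correspondence is found.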
- the tracking unit 205 specifies the tracking target based on the matching result.
- the current candidate associated with the past tracking target can be specified as the tracking target.
- Candidate objects other than the tracking target are given information indicating that they are not the tracking target. If there is no current candidate object whose similarity to the feature amount of the past tracking target is greater than a predetermined threshold value, the tracking target may be out of the angle of view or shielded by another object. In that case, a notification may be issued that the tracking target has not been specified.
- The storage unit 206 holds the feature amount of the tracking target in the second image and the feature amounts of the candidate objects in the second image. If the tracking target is specified in the current image, the feature amount of the tracking target is updated. Specifically, when a candidate object whose similarity with the feature amount of the tracking target in the first image is larger than a predetermined threshold value is detected from the second image, the feature amount acquired from the second image is held as the feature amount of the tracking target. When no such candidate object is detected from the second image, the feature amount acquired from the first image is retained as the feature amount of the tracking target.
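- The update rule for the held feature amount can be sketched as follows. The function name and the way similarities are passed in are assumptions for illustration; only the rule itself (replace when the similarity exceeds the threshold, otherwise keep the earlier feature) comes from the description.

```python
def update_tracking_feature(held_feature, candidate_features, similarities, threshold):
    """Return (feature to hold, index of the matched candidate or None).

    similarities[j] is the similarity between the held feature and candidate j.
    """
    if not candidate_features:
        return held_feature, None
    best = max(range(len(candidate_features)), key=lambda j: similarities[j])
    if similarities[best] > threshold:
        # Tracking target found in the second image: hold its new feature.
        return candidate_features[best], best
    # Not found (e.g. shielded or out of view): keep the earlier feature.
    return held_feature, None
```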
- the image acquisition unit 201 determines whether or not to end the tracking process. If the tracking process is to be continued, the process returns to S304, and if the tracking process is to be completed, the process proceeds to the end.
- The process is determined to end, for example, when an end instruction from the user is acquired or when the image of the next frame cannot be acquired. If the image of the next frame can be acquired, the process proceeds to S304.
- the above is the process in the execution step of the tracking process. Next, the learning process will be described.
- <Learning step> A method for training the model (specifically, the CNN) that estimates the position of an object in an image will be shown.
- The object classification task (for example, detecting a person but not an animal) has been learned to some extent, and an individual can be identified based on the appearance characteristics of a predetermined object.
- The information processing device 2 is composed of the Ground Truth acquisition unit 1400, the template image acquisition unit 1401, the search range image acquisition unit 1402, the tracking target estimation unit 1403, the loss calculation unit 1404, the parameter update unit 1405, the parameter storage unit 1406, and the storage unit 1407.
- the storage unit 1407 stores images captured at a plurality of times and GT information indicating the position and size of the tracking target in each of the images.
- the information in which the user inputs the center position (or BB indicating the area) of the object to be tracked is stored as GT information.
- the GT information generation method may be a method other than the GT attachment by the user. For example, the result of detecting the position of the object to be tracked by using another trained model may be used.
- The GT acquisition unit 1400, the template image acquisition unit 1401, and the search range image acquisition unit 1402 each acquire images stored in the storage unit 1407.
- the Ground Truth (hereinafter referred to as GT) acquisition unit 1400 acquires the correct position of the object to be tracked in the template image and the correct position of the tracking target in the search range image by acquiring the GT information.
- The BB of the tracking target in the template image obtained by the template image acquisition unit 1401 and the BB of the tracking target in the search range image obtained by the search range image acquisition unit 1402 are acquired. Specifically, as shown in FIG. 17, in the image 1704, the region of the object 1705 is given information indicating that it is the tracking target, and the other regions are given information indicating that they are not the tracking target.
- The region of the tracked object 1705 is labeled with the real value 1, and the other regions are labeled with the real value 0.
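- A label map of this kind can be sketched as follows. The function name `gt_map`, the (cx, cy, w, h) convention in map cells, and the rounding of the region to whole cells are assumptions for illustration.

```python
import numpy as np

def gt_map(map_shape, bb):
    """Build a label map: 1.0 inside the tracking target's BB, 0.0 elsewhere.

    map_shape: (rows, cols) of the correlation map.
    bb: (cx, cy, w, h) of the tracking target, expressed in map cells.
    """
    rows, cols = map_shape
    cx, cy, w, h = bb
    x0 = max(0, int(np.floor(cx - w / 2)))
    x1 = min(cols, int(np.ceil(cx + w / 2)))
    y0 = max(0, int(np.floor(cy - h / 2)))
    y1 = min(rows, int(np.ceil(cy + h / 2)))
    labels = np.zeros(map_shape, dtype=np.float32)
    labels[y0:y1, x0:x1] = 1.0
    return labels
```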
- the template image acquisition unit 1401 acquires an image in which a tracking target exists as a template image.
- the template image may include a plurality of objects of the same category.
- the search range image acquisition unit 1402 acquires an image to be searched for the tracking target. That is, it is an image that can acquire the feature amount of a specific object to be tracked.
- The template image acquisition unit 1401 selects an arbitrary frame from a series of sequence images, and the search range image acquisition unit 1402 selects a frame from the other sequence images that were not selected by the template image acquisition unit 1401.
- the tracking target estimation unit 1403 estimates the position of the tracking target in the search range image.
- the position of the tracking target in the search range image is estimated based on the template image obtained by the template image acquisition unit 1401 and the search range image obtained by the search range image acquisition unit 1402.
- The loss calculation unit 1404 calculates the loss based on the tracking result obtained by the tracking target estimation unit 1403 and the position of the tracking target in the search range image obtained by the GT acquisition unit 1400. The closer the estimation result is to the teacher data, the smaller the loss. The correct position of the tracking target in the search range image is obtained from the GT information acquired by the GT acquisition unit 1400.
- the parameter update unit 1405 updates the parameters of the CNN based on the loss obtained in the loss calculation unit 1404.
- the parameters are updated so that the loss values converge.
- the parameter set is updated and the learning is terminated.
- the parameter storage unit 1406 stores the CNN parameters updated in the parameter update unit 1405 in the storage unit 206 as learned parameters.
- The GT acquisition unit 1400 acquires GT information and, based on it, obtains the correct position of the tracking target in the template image (the BB of the tracking target) and the correct position of the tracking target in the search range image.
- the template image acquisition unit 1401 acquires the template image. For example, an image as shown in FIG. 15A is acquired.
- the object 1601 of FIG. 15A is the tracking target
- the partial image 1602 shows the BB of the tracking target obtained by the GT acquisition unit 1400
- the partial image 1603 shows the region to be cut out as a template. That is, here, the template image acquisition unit 1401 acquires the partial image 1603 as the template image.
- the template image acquisition unit 1401 cuts out a template area from the template image and resizes it to a predetermined size.
- the size of the area to be cut out is determined as a constant multiple of the size of the BB based on the BB to be tracked.
- the tracking target estimation unit 1403 inputs the template image generated in S1502 into the learning model (CNN), and obtains the CNN feature amount of the template.
- the search range image acquisition unit 1402 acquires the search range image.
- the partial image to be the search range is acquired as a partial image including the tracking target based on the position and size of the tracking target object.
- An example of an image in the search range is shown in FIG. 15B.
- the object 1604 shows the tracking target
- the partial image 1605 shows the tracking target BB
- the partial image 1606 shows the search range area.
- the search range image 1606 includes an object similar to the object to be tracked.
- the search range image acquisition unit 1402 cuts out the search range area from the search range image and resizes it.
- The size of the search range is determined as a constant multiple of the size of the tracking target's BB. The search range is resized according to the magnification used to resize the template in S1502, so that the size of the tracking target after resizing the template and the size of the tracking target after resizing the search range are approximately the same.
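The cropping and resizing rule for the template and the search range can be illustrated with a short sketch. The crop multiples, the output size, and the names `Box`, `crop_region`, and `resize_factor` are illustrative assumptions and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass
class Box:
    cx: float  # center x
    cy: float  # center y
    w: float   # width
    h: float   # height


def crop_region(bb: Box, scale: float) -> Box:
    """Region centered on the BB whose sides are `scale` times the BB's sides."""
    return Box(bb.cx, bb.cy, bb.w * scale, bb.h * scale)


def resize_factor(region: Box, out_size: float) -> float:
    """Magnification that maps the longer side of `region` to `out_size` pixels."""
    return out_size / max(region.w, region.h)


# Assumed constants (hypothetical values chosen for the example).
TEMPLATE_SCALE = 2.0   # template crop = 2x the BB
SEARCH_SCALE = 4.0     # search range crop = 4x the BB
TEMPLATE_SIZE = 128.0  # side length of the resized template

# Picking the search output size in proportion to the crop multiples keeps the
# tracking target at the same pixel size in both resized images.
SEARCH_SIZE = TEMPLATE_SIZE * SEARCH_SCALE / TEMPLATE_SCALE

template_bb = Box(100, 80, 40, 60)  # GT BB in the template image
search_bb = Box(110, 85, 44, 66)    # GT BB of the same target in the search image

template_mag = resize_factor(crop_region(template_bb, TEMPLATE_SCALE), TEMPLATE_SIZE)
search_mag = resize_factor(crop_region(search_bb, SEARCH_SCALE), SEARCH_SIZE)

target_h_in_template = template_bb.h * template_mag
target_h_in_search = search_bb.h * search_mag
```

With these assumed constants the target height is the same in both resized images, which is the condition the text asks for.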
- the tracking target estimation unit 1403 inputs the search range image generated in S1506 into the learning model (CNN), and obtains the CNN feature amount of the search range.
- the tracking target estimation unit 1403 estimates the position of the tracking target in the search range image.
- the tracking target estimation unit 1403 calculates a cross-correlation indicating the degree of similarity between the CNN feature of the tracking target obtained in S1506 and the CNN feature of the search range obtained in S1506, and outputs it as a map.
- The position of the tracking target is estimated as the position where the cross-correlation is equal to or higher than a threshold value.
- Map 1701 is a map obtained by cross-correlation, and regions 1702 and 1703 show places where the cross-correlation value is high.
- the cross-correlation value at the position where there is a high possibility that an object similar to the tracking target exists becomes high.
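A minimal sketch of the cross-correlation map and the threshold test follows. It assumes the CNN features are plain arrays of shape (channels, height, width) and uses a direct sliding inner product; the function names and the toy data are hypothetical, and a real implementation would normally use a convolution routine from a deep learning framework.

```python
import numpy as np


def cross_correlation_map(template_feat: np.ndarray, search_feat: np.ndarray) -> np.ndarray:
    """Slide the template feature (C, h, w) over the search feature (C, H, W)
    and return the (H - h + 1, W - w + 1) map of inner products."""
    _, h, w = template_feat.shape
    _, H, W = search_feat.shape
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = search_feat[:, y:y + h, x:x + w]
            out[y, x] = np.sum(template_feat * window)
    return out


def positions_above(corr_map: np.ndarray, threshold: float) -> list:
    """Map positions (row, col) whose cross-correlation is >= threshold."""
    return [(int(y), int(x)) for y, x in np.argwhere(corr_map >= threshold)]


# Toy data: the template feature is pasted into an otherwise empty search feature.
rng = np.random.default_rng(0)
template = rng.normal(size=(4, 3, 3))
search = np.zeros((4, 8, 8))
search[:, 2:5, 3:6] = template

corr = cross_correlation_map(template, search)
peak = np.unravel_index(corr.argmax(), corr.shape)
candidates = positions_above(corr, 0.9 * corr.max())
```

In this toy case the peak of the map falls where the template was pasted. A similar object elsewhere in the search range would produce a second high region, which is the situation shown by regions 1702 and 1703.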
- The correct position of the tracking target obtained by the GT acquisition unit 1400 is indicated by 1705 in the figure. That is, since 1702 indicates the position of the tracking target, a desirable value is estimated there, whereas 1703 has a high cross-correlation value even though it is not the tracking target, so an undesirable value is estimated there.
- the purpose of the learning step is to update the weight so that the cross-correlation value at the position of the tracking target is high and the cross-correlation value at the position other than the tracking target is low.
- Equation (1-2) is the average, over all pixels, of the squared difference between the map C_in and the map C_gt. If the tracking target is estimated correctly, the loss is small. If a non-tracking target is estimated to be the tracking target, or the tracking target is estimated to be a non-tracking target, the loss is large.
- the loss related to size is calculated according to the formula (1-3).
- Loss W and Loss H are estimated losses related to the width and height of the tracked object, respectively.
- In W_gt and H_gt, the width value and the height value of the tracking target are embedded at the position of the tracking target, respectively.
- In W_in and H_in as well, learning proceeds so that the width and height of the tracking target are inferred at the position of the tracking target.
- The loss is described here in the form of Mean Squared Error (hereinafter referred to as MSE), but it is not limited to MSE. Smooth-L1 or the like may be used, and the formula for calculating the loss is not restricted. The loss function for position and the loss function for size may also differ.
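The position and size losses described above can be sketched as follows. Equations (1-2) and (1-3) are not reproduced in this section, so the code only follows the verbal description (a per-pixel mean of squared differences); the map sizes, the example values, and the unit transition point of the Smooth-L1 variant are assumptions.

```python
import numpy as np


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean over all pixels of the squared difference between two maps."""
    return float(np.mean((pred - target) ** 2))


def smooth_l1(pred: np.ndarray, target: np.ndarray) -> float:
    """Smooth-L1 alternative: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    return float(np.mean(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)))


# Ground-truth position map C_gt: 1 at the tracking target, 0 elsewhere.
c_gt = np.zeros((5, 5))
c_gt[2, 2] = 1.0

c_correct = c_gt.copy()       # target found, nothing else responds
c_distractor = c_gt.copy()
c_distractor[0, 4] = 1.0      # a similar object also gets a high score
c_missed = np.zeros((5, 5))   # target not found at all

loss_correct = mse(c_correct, c_gt)
loss_distractor = mse(c_distractor, c_gt)
loss_missed = mse(c_missed, c_gt)

# Size maps: width and height values embedded at the target position.
w_gt = np.zeros((5, 5)); w_gt[2, 2] = 40.0
h_gt = np.zeros((5, 5)); h_gt[2, 2] = 80.0
w_in = np.zeros((5, 5)); w_in[2, 2] = 38.0
h_in = np.zeros((5, 5)); h_in[2, 2] = 84.0

loss_w = mse(w_in, w_gt)
loss_h = mse(h_in, h_gt)
```

Both failure cases (responding to a distractor and missing the target) give a larger position loss than the correct estimate, which is what drives the weight update.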
- the parameter update unit 1405 updates the CNN parameter based on the loss calculated in S1508.
- the parameters are updated based on the backpropagation method (Backpropagation) using Momentum SGD or the like.
- the loss value of the equation (1-2) is calculated for the scores estimated for a plurality of various images.
- The connection weight coefficients between the layers of the learning model are updated so that the loss values for the plurality of images all become smaller than a predetermined threshold value.
- the parameter storage unit 1406 stores the CNN parameters updated by S1509 in the storage unit 206.
- the tracking target can be correctly tracked by inferring using the parameters stored in S1510.
- the parameter update unit 1405 determines whether or not the learning is completed.
- Learning is determined to be complete when the value of the loss obtained by equation (1-2) becomes smaller than a predetermined threshold value.
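The update and termination logic of the learning loop can be illustrated on a toy problem. The sketch below applies Momentum SGD to a linear model with an MSE loss and stops once the loss falls below a threshold; the linear model stands in for the CNN, and the learning rate, momentum, threshold, and function name are all assumed values for illustration.

```python
import numpy as np


def train(x, y, lr=0.1, momentum=0.9, loss_threshold=1e-4, max_steps=1000):
    """Momentum SGD on a linear model; stops when the MSE loss is below the threshold."""
    w = np.zeros(x.shape[1])
    velocity = np.zeros_like(w)
    loss = float("inf")
    steps = 0
    for steps in range(max_steps):
        residual = x @ w - y
        loss = float(np.mean(residual ** 2))
        if loss < loss_threshold:          # learning end determination
            break
        grad = 2.0 * x.T @ residual / len(y)  # gradient of the MSE loss
        velocity = momentum * velocity - lr * grad
        w = w + velocity                   # parameter update
    return w, loss, steps


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w

learned_w, final_loss, n_steps = train(x, y)
```

In the actual training the gradient would come from backpropagation through the CNN rather than from a closed-form expression.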
- Embodiment 1: The present embodiment is characterized in that the tracking target and an object similar to the tracking target are tracked at the same time. How this simultaneous tracking reduces erroneous tracking of the similar object is described with reference to FIG. 8.
- a person 804 and a person 805 are shown in the image, of which the tracking target is the person 804 and the similar object is the person 805.
- Because the object 804 is shielded, a feature amount in which the object-likeness is impaired is detected.
- When the similarities between the past candidates 804 and 805 and the object 808 are compared, the similarity between the object 805 and the object 808 is higher than the similarity between the object 804 and the object 808.
- the high degree of similarity is due to the fact that the CNN features associated with each candidate are learned to distinguish between objects, and that the position and size of the BB change slowly over time.
- the past candidate associated with the current candidate 808 is not 804 but 805.
- There are two candidates, 811 and 812, which are newly obtained at time t2. Since neither of these candidate objects is shielded, desirable feature amounts can be obtained.
- The similarity between the object 808 and the object 811 and the similarity between the object 804 and the object 812 are high, while the similarities between 808 and 812 and between 806 and 811 are low. Therefore, the tracking target 806 is associated with 812, and the tracking target can be tracked correctly.
- Modification 1-1 Online Metric Learning
- In this modification, the weight W_2 for the feature amount in equation (1-1) of the first embodiment is sequentially updated using the feature amounts of the tracking target and of similar objects obtained in time series.
- f_target is the feature amount of the tracking target obtained at each time.
- f_distractor is the feature amount of a similar object obtained at each time.
- The transformation F consists of one or more connected neural network layers and can be learned in advance using triplet loss or the like.
- By learning the transformation F with triplet loss, it is possible to learn a transformation in which the distance is short for the same object in the past and the present, and long for different objects.
- For triplet loss, refer to "Wang, Learning Fine-grained Image Similarity with Deep Ranking, In: CVPR2014".
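As a small illustration of the triplet loss mentioned above, the sketch below computes the hinge form on squared Euclidean distances. The margin value and the example feature vectors are assumptions, and the features are taken as already transformed by F.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative is farther from the anchor
    than the positive by at least `margin` (squared Euclidean distances)."""
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)


# Anchor: past feature of the tracking target (after the transformation F).
f_anchor = np.array([0.0, 0.0])
f_same = np.array([0.1, 0.0])    # same object at the current time
f_other = np.array([3.0, 0.0])   # a different (similar-looking) object

loss_good = triplet_loss(f_anchor, f_same, f_other)  # embedding already separates them
loss_bad = triplet_loss(f_anchor, f_other, f_same)   # roles swapped: large loss
```

Minimizing this quantity over many triplets pulls features of the same object together and pushes features of different objects apart.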
- the shielding determination process is further performed in the tracking target identification process of S306 of FIG. 7 in the first embodiment.
- the hardware configuration is the same as that of the first embodiment.
- FIG. 18 shows an example of the functional configuration of the information processing apparatus 1' in the second embodiment.
- The configuration is basically the same as that of FIG. 2 in the first embodiment, with a shielding determination unit 207 that performs the shielding determination newly added. Functional configurations with the same reference numerals are assumed to perform the same processing as in the first embodiment.
- the shielding determination unit 207 determines the shielding relationship between the objects based on the partial image of the candidate object detected from the image.
- the flowchart of this embodiment corresponds to FIG. 3, FIG. 10A, and FIG. 10B.
- the basic processing is the same as that of the first embodiment, and only the processing of S306 is different. Therefore, here, the difference in S306 will be described in detail below, and the description of other processes will be omitted.
- In S305, a candidate object similar to the tracking target is detected based on the feature amount of the tracking target. At this time, when the tracking target is shielded by another object and the shielding object is similar to the tracking target, the shielding object is detected as a candidate object.
- the tracking target is associated with the position of a similar object that has been shielded by the shielding determination process, but since the original tracking feature is retained at the timing when the shielding is canceled, tracking can be performed again.
- the tracking target is shielded by an obstacle such as a wall
- the shielded tracking target is not detected as a candidate object in S305.
- In the subsequent shielding determination process, it is determined that there is no candidate object that can be associated with the tracking target detected immediately before it was shielded, and the feature amount of the tracking target stored in S303 is retained. After that, tracking can be resumed at the timing when the shielding is cleared and detection becomes possible again.
- FIG. 10A shows a flowchart illustrating the tracking target identification process S306 including the shielding determination process.
- the tracking unit 205 acquires the similarity between the past time candidates stored in the storage unit 206 in advance and the current time candidates obtained by the object detection unit 204.
- the processing of S701 is the same as that of S701 of the first embodiment.
- the process of S702 is the same as the process of S702 of the first embodiment.
- the shielding determination unit 207 determines whether or not there is a shielding region in which the candidate object is shielded, based on the position of the candidate object in the current image to be processed (second image). That is, the shielding determination is performed for each candidate object for the current image.
- the shielding determination process of S1002 will be described in more detail with reference to FIG. 10B.
- a shielding determination is performed for a candidate (referred to as an object of interest) for which a matching candidate cannot be found.
- the shielding determination unit 207 determines in S702 whether or not the association has been established for all the candidate objects detected in the past.
- If the association has been established for all of them, the process proceeds to S10025. If, among the candidate objects detected from the past image, there is a past candidate object (object of interest) whose similarity with the candidate objects detected from the current image is equal to or less than the threshold value, the process proceeds to S10022. That is, when the process proceeds to S10022, a shielded candidate object may exist.
- the shielding determination unit 207 acquires information indicating the degree of overlap between the candidate BB and the other candidate BB for the current candidate object (object of interest).
- IoU: Intersection over Union
- p_s is the position of the candidate
- p_o is the position of the occluder
- ⁇ is an empirically set value
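The overlap measure and a possible position update can be sketched as below. The IoU computation is standard. Equation (2-1) is not reproduced in this section, so the linear blend between p_s and p_o, the function name `blend_position`, and the parameter name `alpha` are assumptions used only to illustrate how an empirically set coefficient could move the candidate toward the occluder.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def blend_position(p_s, p_o, alpha):
    """Hypothetical update: move the candidate position p_s toward the
    occluder position p_o by the fraction alpha (assumed form, not equation (2-1))."""
    return tuple((1.0 - alpha) * s + alpha * o for s, o in zip(p_s, p_o))


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
no_overlap = iou((0, 0, 1, 1), (2, 2, 3, 3))
updated = blend_position((0.0, 0.0), (10.0, 20.0), alpha=0.5)
```

A high IoU between the last known BB of the object of interest and another candidate BB suggests that the other candidate is the occluder.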
- the tracking unit 205 specifies the correspondence between the candidate object in the first image and the candidate object in the second image based on the shielding determination result. That is, the position of the tracking target object in the second image is specified.
- the candidate object specified as the tracking target object last time is specified in the current image
- the position of the tracking target in the current image is specified.
- the shielding determination is performed in S1002. If it is determined that the tracking target is shielded in the current image, the occluder is specified and the position of the tracking target is updated based on the equation (2-1). On the other hand, the features to be tracked are not updated.
- the storage unit 206 stores the position and feature amount of the tracking target specified by the tracking unit 205.
- <Modification 2-1> The shielding determination is performed by a neural network.
- An example of performing a shielding determination by a neural network is "Zhou, Bi-box Regression for Pedestrian Detection and Occlusion, In: ECCV2018".
- The tracking unit 205 estimates the BB of the object and simultaneously estimates the unshielded area (visible area) of the object area. When the ratio of the shielded area to the object area exceeds a predetermined threshold value, the object can be determined to be shielded.
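The ratio test on the visible area can be written in a few lines. The sketch assumes both the full BB and the visible area are axis-aligned boxes already estimated by the network; the function names and the 0.5 threshold are illustrative assumptions.

```python
def box_area(box):
    """Area of a box given as (x1, y1, x2, y2); degenerate boxes have area 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def is_shielded(full_box, visible_box, ratio_threshold=0.5):
    """True when the shielded fraction of the object area exceeds the threshold."""
    full = box_area(full_box)
    if full <= 0.0:
        return False
    visible = min(box_area(visible_box), full)
    shielded_ratio = 1.0 - visible / full
    return shielded_ratio > ratio_threshold


mostly_hidden = is_shielded((0, 0, 10, 20), (0, 0, 10, 5))    # 75% of the area hidden
mostly_visible = is_shielded((0, 0, 10, 20), (0, 0, 10, 18))  # 10% of the area hidden
```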
- FIG. 11 shows the effect of performing such a shielding determination and updating the position of the candidate according to the position of the occluder.
- the tracking target is 1216.
- the position of 1216 is updated to match the position of 1217 by the equation (2-1).
- the position of 1216 is updated according to the position of 1218 which is an occluder.
- the shielding is canceled, and there are three candidates, 1219, 1220, and 1221.
- The correct associations are 1218 with 1219 and 1216 with 1220.
- the position of the candidate 1216 is updated according to the equation (2-1)
- the position of the 1216 becomes close to the position of the 1220, and the 1216 and the 1220 can be associated with each other. Therefore, it is possible to reduce erroneous tracking.
- FIG. 19 shows an example of the functional configuration of the information processing apparatus 3 in the present embodiment.
- a learning unit 1902 that performs online learning is newly added with the same configuration as that of FIG. 2 in the first embodiment.
- the tracking unit 1901 identifies the position of the tracking target by inputting the current image into the trained model.
- the learning unit 1902 updates the coupling weighting parameter of the trained model that estimates the position of the object based on the position of the tracking target estimated in the current image.
- MDNet: "Nam, Learning Multi-Domain Convolutional Neural Networks for Visual Tracking, In: CVPR2016".
- In MDNet, an image is input to a CNN (trained model) to obtain a feature amount indicating an object. The acquired feature amount is then input to a fully connected layer (hereinafter referred to as FC layer), which determines whether or not the input feature amount is the feature amount of the tracking target.
- The FC layer is trained online so that it outputs a higher value for an object that is more likely to be the tracking target. Online learning trains the FC layer on the initial frames and thereafter at intervals of several frames.
- FIG. 20 shows the processing executed by the information processing apparatus 3 in the present embodiment.
- the processing of S301 to S304 is the same as the processing of S301 to S304 in the first embodiment.
- the search range is set from the acquired image.
- the search range image is determined based on the position and size of the candidate object in the past.
- Each feature amount acquired from the search range image is input to the FC layer, and objects whose obtained tracking-target-likeness (similarity) exceeds a threshold value are acquired as candidate objects.
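The candidate selection step can be sketched as a scoring function followed by a threshold. The sketch assumes a single linear unit with a sigmoid as the FC layer, which is a simplification made for illustration and not the MDNet architecture itself; the weights, features, and threshold are made-up values.

```python
import numpy as np


def fc_scores(features, weight, bias):
    """Tracking-target-likeness in (0, 1) from one linear unit and a sigmoid."""
    logits = features @ weight + bias
    return 1.0 / (1.0 + np.exp(-logits))


def select_candidates(features, weight, bias, threshold=0.5):
    """Indices of the features whose score exceeds the threshold, plus all scores."""
    scores = fc_scores(features, weight, bias)
    return [int(i) for i in np.flatnonzero(scores > threshold)], scores


# Three feature amounts taken from the search range image (toy values).
features = np.array([[2.0, 0.0],
                     [0.0, 2.0],
                     [-2.0, 0.0]])
weight = np.array([1.0, 0.0])
bias = 0.0

candidate_ids, scores = select_candidates(features, weight, bias)
```

Online learning would then adjust `weight` and `bias` using the tracking result, so that later frames score the tracking target higher than similar objects.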
- MDNet described above is used as the trained model.
- the tracking unit 1901 specifies the position of the tracking target from the candidate objects.
- the learning unit 1902 updates the parameters of the trained model based on the determination result of the tracking target.
- the hardware configuration is the same as that of the first embodiment.
- the information processing apparatus that executes the present embodiment has the same functional configuration as the information processing apparatus 1 of the first embodiment, but there is a difference in the processing of the tracking target determination unit 202 and the tracking unit 205.
- the tracking target determination unit 202 determines a plurality of objects as tracking targets. The tracking target is determined by the same method as in the first embodiment.
- The tracking unit 205 tracks each detected object for a plurality of tracking targets. Specifically, the CNN features of a plurality of candidate objects are retained, and the association is performed using the similarity between the candidate objects at time t and the candidate objects at time t + 1.
- the image acquisition unit 201 acquires an image (initial image) obtained by capturing a predetermined object.
- the tracking target determination unit 202 determines a plurality of objects to be tracked from the image acquired in S301.
- the holding unit 203 holds a plurality of features of the tracking target from the image including the determined tracking target based on the trained model.
- The Detect-Track method is used for the trained model.
- In Detect-Track, object detection is performed using a CNN for each frame in a continuous time series.
- the image acquisition unit 201 acquires images captured at a plurality of times in order to perform tracking processing.
- the object detection unit 204 detects the position of the candidate object from the temporally continuous images obtained by the image acquisition unit 201 based on the trained model.
- the object detection unit 204 detects a candidate object using a CNN (learned model) for each frame in a continuous time series. That is, the CNN feature at time t and the CNN feature at time t + 1 are acquired.
- the tracking unit 205 identifies a plurality of tracking targets from the current image (t + 1).
- The tracking unit 205 estimates the change amount ΔBB (the change in BB position and the change in BB size) of the BB for each object. That is, the tracking unit 205 estimates the change in BB by comparing BB(t + 1) with BB(t) + ΔBB(t).
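The comparison of BB(t + 1) with BB(t) + ΔBB(t) can be illustrated as follows. The sketch assumes a BB is stored as (center x, center y, width, height) and uses a simple sum of absolute differences to pick the closest detection; that distance, the function names, and the example numbers are assumptions.

```python
def predict_bb(bb, delta):
    """BB(t) + delta BB(t): apply the estimated change in position and size."""
    return tuple(v + d for v, d in zip(bb, delta))


def closest_detection(predicted, detections):
    """Index of the detection at t + 1 that is closest to the predicted BB."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(range(len(detections)), key=lambda j: l1(predicted, detections[j]))


bb_t = (100.0, 50.0, 40.0, 80.0)   # (cx, cy, w, h) at time t
delta_bb = (5.0, -2.0, 1.0, 2.0)   # estimated change between t and t + 1
detections_t1 = [
    (60.0, 52.0, 40.0, 80.0),      # another object
    (104.0, 49.0, 41.0, 81.0),     # the same object, slightly moved
]

predicted = predict_bb(bb_t, delta_bb)
best = closest_detection(predicted, detections_t1)
```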
- the tracking unit 205 calculates the distance between the CNN features of the associated candidate object at time t and time t + 1 based on the equation (1-1), and calculates the degree of similarity. If there is a correspondence relationship whose similarity is greater than a predetermined value, it is tracked in association with the previous detection result. The correspondence may be determined in descending order of relative similarity. If there is no correspondence whose similarity is greater than a predetermined value, the current detection result (feature amount and position) is retained without being associated with the previous detection result.
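The association rule in this step (threshold on similarity, correspondence decided in descending order of similarity, unmatched detections retained) can be sketched as below. Equation (1-1) is not reproduced in this section, so cosine similarity between feature vectors is used as a stand-in; the function name and the threshold are assumptions.

```python
import numpy as np


def associate(prev_feats, curr_feats, min_similarity=0.5):
    """Greedy one-to-one association in descending order of similarity.

    Returns a dict {previous index: current index} and the list of current
    detections left unassociated (these are retained as new entries)."""
    p = prev_feats / np.linalg.norm(prev_feats, axis=1, keepdims=True)
    c = curr_feats / np.linalg.norm(curr_feats, axis=1, keepdims=True)
    sim = p @ c.T  # cosine similarity, used here in place of equation (1-1)

    pairs = sorted(
        ((float(sim[i, j]), i, j) for i in range(sim.shape[0]) for j in range(sim.shape[1])),
        reverse=True,
    )
    matches, used_curr = {}, set()
    for s, i, j in pairs:
        if s <= min_similarity:
            break  # every remaining pair is below the threshold
        if i in matches or j in used_curr:
            continue
        matches[i] = j
        used_curr.add(j)
    unmatched = [j for j in range(sim.shape[1]) if j not in used_curr]
    return matches, unmatched


prev_feats = np.array([[1.0, 0.0], [0.0, 1.0]])                # candidates at time t
curr_feats = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, -1.0]])  # detections at time t + 1

matches, unmatched = associate(prev_feats, curr_feats)
```

In this example the third detection is not similar enough to any past candidate, so it is kept without being associated with a previous detection result.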
- The object at time t + 1 that is the same as an object at time t is considered to be the one with a high degree of similarity. By associating objects with high similarity with each other, erroneous tracking can be reduced. However, an object detected at time t may not be detected at time t + 1 because it is hidden or for a similar reason. In that case, if at least one candidate object other than the tracking target exists at time t + 1, erroneous tracking of a candidate object at a nearby position may start.
- the CNN characteristics of a plurality of objects as candidate objects may be retained, and the similarity with the feature amount of the candidate objects retained at the time of similarity calculation may be calculated. If the object to be tracked is shielded, the correspondence cannot be specified, but if the shielding is resolved, tracking can be resumed.
- the present invention is also realized by executing the following processing. That is, software (program) that realizes the functions of the above-described embodiment is supplied to the system or device via a network for data communication or various storage media. Then, the computer (or CPU, MPU, etc.) of the system or device reads and executes the program. Further, the program may be recorded and provided on a computer-readable recording medium.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202180060244.7A CN116157831A (zh) | 2020-07-20 | 2021-07-01 | Information processing device, information processing method, and program |
| EP21847050.8A EP4184431A4 (en) | 2020-07-20 | 2021-07-01 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD AND PROGRAM |
| US18/155,349 US20230154016A1 (en) | 2020-07-20 | 2023-01-17 | Information processing apparatus, information processing method, and storage medium |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2020-123796 | 2020-07-20 | | |
| JP2020123796A JP7781512B2 (ja) | 2020-07-20 | 2020-07-20 | Information processing apparatus, information processing method, and program |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/155,349 Continuation US20230154016A1 (en) | 2020-07-20 | 2023-01-17 | Information processing apparatus, information processing method, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022019076A1 (ja) | 2022-01-27 |
Family
ID=79729698
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2021/024898 Ceased WO2022019076A1 (ja) | 2020-07-20 | 2021-07-01 | Information processing apparatus, information processing method, and program |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20230154016A1 |
| EP (1) | EP4184431A4 |
| JP (2) | JP7781512B2 |
| CN (1) | CN116157831A |
| WO (1) | WO2022019076A1 |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115187924A (zh) * | 2022-06-01 | 2022-10-14 | 浙江大华技术股份有限公司 | 一种目标检测方法、装置、终端及计算机可读存储介质 |
| CN116777902A (zh) * | 2023-08-04 | 2023-09-19 | 城云科技(中国)有限公司 | 工业缺陷检测场景的缺陷目标检测模型的构建方法及应用 |
| JP2023182192A (ja) * | 2022-06-14 | 2023-12-26 | 日本電気株式会社 | 物体検出システム、物体検出方法および物体検出プログラム |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2020133330A1 (en) * | 2018-12-29 | 2020-07-02 | Zhejiang Dahua Technology Co., Ltd. | Systems and methods for video surveillance |
| US12272137B2 (en) * | 2020-12-04 | 2025-04-08 | Samsung Electronics Co., Ltd. | Video object detection and tracking method and apparatus |
| JP7700708B2 (ja) * | 2022-03-11 | 2025-07-01 | 株式会社デンソー | 追跡システム、追跡装置、追跡方法、追跡プログラム |
| US12597244B2 (en) * | 2022-07-18 | 2026-04-07 | 42Dot Inc. | Method and device for improving object recognition rate of self-driving car |
| US20240397059A1 (en) * | 2023-05-23 | 2024-11-28 | Adobe Inc. | Panoptic mask propagation with active regions |
| JP7844524B2 (ja) * | 2024-02-16 | 2026-04-13 | キヤノン株式会社 | 情報処理装置、情報処理方法及びプログラム |
| CN117809121B (zh) * | 2024-02-27 | 2024-08-06 | 阿里巴巴达摩院(杭州)科技有限公司 | 目标对象识别方法、对象识别模型训练方法、目标对象处理方法以及信息处理方法 |
| JP2026004840A (ja) * | 2024-06-26 | 2026-01-15 | キヤノン株式会社 | 撮像装置、制御方法およびプログラム |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH0256033B2 | 1986-12-08 | 1990-11-29 | Eastern Steel | |
| JP2011059898A (ja) * | 2009-09-08 | 2011-03-24 | Fujifilm Corp | 画像解析装置、画像解析方法およびプログラム |
| JP2013219531A (ja) | 2012-04-09 | 2013-10-24 | Olympus Imaging Corp | 画像処理装置及び画像処理方法 |
| JP2017010224A (ja) * | 2015-06-19 | 2017-01-12 | キヤノン株式会社 | 物体追尾装置、物体追尾方法及びプログラム |
| JP2017041022A (ja) * | 2015-08-18 | 2017-02-23 | キヤノン株式会社 | 情報処理装置、情報処理方法及びプログラム |
| WO2017043258A1 (ja) * | 2015-09-09 | 2017-03-16 | シャープ株式会社 | 計算装置および計算装置の制御方法 |
| JP2019096006A (ja) * | 2017-11-21 | 2019-06-20 | キヤノン株式会社 | 情報処理装置、情報処理方法 |
| JP2020123796A (ja) | 2019-01-30 | 2020-08-13 | キヤノン株式会社 | 画像読取装置、画像読取装置の制御方法、及びプログラム |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5744437B2 (ja) * | 2010-08-18 | 2015-07-08 | キヤノン株式会社 | 追尾装置、追尾方法及びプログラム |
| JP6495705B2 (ja) * | 2015-03-23 | 2019-04-03 | 株式会社東芝 | 画像処理装置、画像処理方法、画像処理プログラムおよび画像処理システム |
| JP2017138659A (ja) * | 2016-02-01 | 2017-08-10 | トヨタ自動車株式会社 | 物体追跡方法、物体追跡装置、およびプログラム |
| WO2018030048A1 (ja) * | 2016-08-08 | 2018-02-15 | パナソニックIpマネジメント株式会社 | 物体追跡方法、物体追跡装置およびプログラム |
| JP6598746B2 (ja) * | 2016-08-22 | 2019-10-30 | Kddi株式会社 | 他の物体の画像領域も考慮して物体を追跡する装置、プログラム及び方法 |
| JP2019070934A (ja) * | 2017-10-06 | 2019-05-09 | 東芝デジタルソリューションズ株式会社 | 映像処理装置、映像処理方法およびプログラム |
| US10628961B2 (en) * | 2017-10-13 | 2020-04-21 | Qualcomm Incorporated | Object tracking for neural network systems |
| CN108460787B (zh) * | 2018-03-06 | 2020-11-27 | 北京市商汤科技开发有限公司 | 目标跟踪方法和装置、电子设备、程序、存储介质 |
| CN110322473A (zh) * | 2019-07-09 | 2019-10-11 | 四川大学 | 基于显著部位的目标抗遮挡跟踪方法 |
- 2020-07-20: JP application JP2020123796A, patent JP7781512B2 (ja), status Active
- 2021-07-01: EP application EP21847050.8A, publication EP4184431A4 (en), status Pending
- 2021-07-01: WO application PCT/JP2021/024898, publication WO2022019076A1 (ja), status Ceased
- 2021-07-01: CN application CN202180060244.7A, publication CN116157831A (zh), status Pending
- 2023-01-17: US application US18/155,349, publication US20230154016A1 (en), status Pending
- 2024-11-21: JP application JP2024203252A, publication JP2025024192A (ja), status Pending
Non-Patent Citations (7)
| Title |
|---|
| BERTINETTO: "Fully-Convolutional Siamese Networks for Object Tracking", ARXIV, 2016 |
| FEICHTENHOFER: "Detect to Track and Track to Detect", ICCV, 2017 |
| LIU: "SSD: Single Shot Multibox Detector", ECCV, 2016 |
| NAM: "Learning Multi-Domain Convolutional Neural Networks for Visual Tracking", CVPR, 2016 |
| See also references of EP4184431A4 |
| WANG: "Learning Fine-grained Image Similarity with Deep Ranking", CVPR, 2014 |
| ZHOU: "Bi-box Regression for Pedestrian Detection and Occlusion", ECCV, 2018 |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115187924A (zh) * | 2022-06-01 | 2022-10-14 | 浙江大华技术股份有限公司 | 一种目标检测方法、装置、终端及计算机可读存储介质 |
| JP2023182192A (ja) * | 2022-06-14 | 2023-12-26 | 日本電気株式会社 | 物体検出システム、物体検出方法および物体検出プログラム |
| JP7835120B2 (ja) | 2022-06-14 | 2026-03-25 | 日本電気株式会社 | 物体検出システム、物体検出方法および物体検出プログラム |
| CN116777902A (zh) * | 2023-08-04 | 2023-09-19 | 城云科技(中国)有限公司 | 工业缺陷检测场景的缺陷目标检测模型的构建方法及应用 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2022020353A (ja) | 2022-02-01 |
| US20230154016A1 (en) | 2023-05-18 |
| EP4184431A1 (en) | 2023-05-24 |
| CN116157831A (zh) | 2023-05-23 |
| EP4184431A4 (en) | 2025-01-01 |
| JP2025024192A (ja) | 2025-02-19 |
| JP7781512B2 (ja) | 2025-12-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2022019076A1 (ja) | 情報処理装置、情報処理方法及びプログラム | |
| CN111627045B (zh) | 单镜头下的多行人在线跟踪方法、装置、设备及存储介质 | |
| US10672131B2 (en) | Control method, non-transitory computer-readable storage medium, and control apparatus | |
| CN111144364B (zh) | 一种基于通道注意更新机制的孪生网络目标跟踪方法 | |
| WO2023082882A1 (zh) | 一种基于姿态估计的行人摔倒动作识别方法及设备 | |
| US20230042187A1 (en) | Behavior recognition method and system, electronic device and computer-readable storage medium | |
| US9141196B2 (en) | Robust and efficient learning object tracker | |
| JP7093427B2 (ja) | オブジェクト追跡方法および装置、電子設備並びに記憶媒体 | |
| WO2021139484A1 (zh) | 目标跟踪方法、装置、电子设备及存储介质 | |
| CN107145867A (zh) | 基于多任务深度学习的人脸及人脸遮挡物检测方法 | |
| CN110986969A (zh) | 地图融合方法及装置、设备、存储介质 | |
| CN109241829A (zh) | 基于时空注意卷积神经网络的行为识别方法及装置 | |
| Ali et al. | Multiple object tracking with partial occlusion handling using salient feature points | |
| WO2016179808A1 (en) | An apparatus and a method for face parts and face detection | |
| CN111429485B (zh) | 基于自适应正则化和高信度更新的跨模态滤波跟踪方法 | |
| WO2015008432A1 (ja) | 物体追跡装置、物体追跡方法および物体追跡プログラム | |
| CN114902299B (zh) | 图像中关联对象的检测方法、装置、设备和存储介质 | |
| CN113158870B (zh) | 2d多人姿态估计网络的对抗式训练方法、系统及介质 | |
| CN119516160B (zh) | 基于混合伪标签生成的单点监督红外小目标检测方法 | |
| CN116844185A (zh) | 基于质量分数的多人姿态识别方法 | |
| JP6570905B2 (ja) | グラフ表示装置、グラフ表示プログラム及びグラフ表示プログラムが記憶されたコンピュータ読取可能な記憶媒体 | |
| US20250225175A1 (en) | Object search via re-ranking | |
| CN114494349A (zh) | 基于目标特征时空对齐的视频跟踪系统及方法 | |
| JP2011232845A (ja) | 特徴点抽出装置および方法 | |
| KR20250007222A (ko) | 자세 추정을 위한 방법 및 장치 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21847050; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2021847050; Country of ref document: EP; Effective date: 20230220 |