WO2020029874A1 - Object tracking method and apparatus, electronic device, and storage medium - Google Patents

Object tracking method and apparatus, electronic device, and storage medium (对象跟踪方法及装置、电子设备及存储介质)

Info

Publication number
WO2020029874A1
WO2020029874A1 (PCT/CN2019/099001)
Authority
WO
WIPO (PCT)
Prior art keywords
frame image
current frame
target object
image
video
Prior art date
Application number
PCT/CN2019/099001
Other languages
English (en)
French (fr)
Inventor
王强
朱政
李搏
武伟
Original Assignee
北京市商汤科技开发有限公司
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2020567591A (patent JP7093427B2, ja)
Priority to SG11202011644XA (en)
Priority to KR1020207037347A (ko)
Publication of WO2020029874A1 (zh)
Priority to US17/102,579 (publication US20210124928A1, en)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/255 - Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 - Higher-level, semantic clustering, classification or understanding of video scenes, of sport video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/62 - Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Definitions

  • the present disclosure relates to computer vision technology, and in particular, to an object tracking method and device, an electronic device, and a storage medium.
  • Object tracking is one of the hot topics in computer vision research, and it has a wide range of applications in many fields. For example: camera tracking and focusing, automatic target tracking of drones, human body tracking, vehicle tracking in traffic monitoring systems, face tracking and gesture tracking in intelligent interactive systems, etc.
  • the embodiments of the present disclosure provide a technical solution for object tracking.
  • an object tracking method including:
  • detecting at least one candidate object in a current frame image in the video according to a target object in a reference frame image in the video; obtaining an interference object in at least one previous frame image in the video; adjusting filtering information of the at least one candidate object according to the obtained interference object; and determining a candidate object whose filtering information meets a predetermined condition as the target object in the current frame image.
  • an object tracking device including:
  • a detection unit configured to detect at least one candidate object in a current frame image in the video according to a target object in a reference frame image in the video;
  • an obtaining unit configured to obtain an interference object in at least one previous frame image in the video;
  • an adjusting unit configured to adjust the filtering information of the at least one candidate object according to the obtained interference object; and
  • a determining unit configured to determine a candidate object whose filtering information meets a predetermined condition as the target object of the current frame image.
  • an electronic device including the apparatus described in any one of the foregoing embodiments.
  • an electronic device including:
  • a memory for storing executable instructions; and
  • a processor configured to execute the executable instructions to complete the method described in any one of the foregoing embodiments.
  • a computer program including computer-readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the method described in any one of the foregoing embodiments.
  • a computer storage medium for storing computer-readable instructions, and when the instructions are executed, the method according to any one of the foregoing embodiments is implemented.
  • In the embodiments of the present disclosure, at least one candidate object in a current frame image in the video is detected according to a target object in a reference frame image in the video, an interference object in at least one previous frame image in the video is obtained, the filtering information of the at least one candidate object is adjusted according to the obtained interference object, and the candidate object whose filtering information meets a predetermined condition is determined as the target object of the current frame image.
  • Because the interference objects in the previous frame images before the current frame image are used to adjust the filtering information of the candidate objects, interference objects among the candidate objects can be effectively suppressed when the filtering information is used to determine the target object in the current frame image, and the target object can be obtained from the candidate objects. In the process of determining the target object in the current frame image, the influence of interference objects surrounding the target object on the discrimination result is thus effectively suppressed, and the discrimination ability of target object tracking is improved.
  • FIG. 1 is a flowchart of an object tracking method according to some embodiments of the present disclosure
  • FIG. 2 is a flowchart of an object tracking method according to another embodiment of the present disclosure.
  • FIG. 3 is a flowchart of an object tracking method according to some embodiments of the present disclosure.
  • FIGS. 4A to 4C are schematic diagrams of an application example of an object tracking method according to some embodiments of the present disclosure.
  • FIGS. 4D and 4E are schematic diagrams of another application example of the object tracking method according to some embodiments of the present disclosure.
  • FIG. 5 is a schematic structural diagram of an object tracking device according to some embodiments of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an object tracking device according to another embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by some embodiments of the present disclosure.
  • In the present disclosure, "a plurality" may refer to two or more, and "at least one" may refer to one, two, or more.
  • the term "and/or" in the present disclosure merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
  • the character "/" in the present disclosure generally indicates that the associated objects before and after it are in an "or" relationship.
  • Embodiments of the present disclosure may be applied to a computer system / server, which may operate with many other general or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above.
  • a computer system / server may be described in the general context of computer system executable instructions, such as program modules, executed by a computer system.
  • program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types.
  • the computer system / server can be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on a local or remote computing system storage medium including a storage device.
  • FIG. 1 is a flowchart of an object tracking method according to some embodiments of the present disclosure. As shown in Figure 1, the method includes:
  • the video for object tracking may be a video obtained from a video capture device (for example, a camera or a video camera), a video obtained from a storage device (for example, an optical disc, a hard disk, or a USB flash drive), or a video obtained from a network server; the manner of obtaining the video to be processed is not limited in this embodiment.
  • the reference frame image may be the first frame image in the video, or the first frame image subjected to object tracking processing in the video, or may be an intermediate frame image of the video; the selection of the reference frame image is not limited in this embodiment.
  • the current frame image may be a frame image other than the reference frame image in the video, and it may be located before or after the reference frame image, which is not limited in this embodiment. In an optional example, the current frame image in the video is located after the reference frame image.
  • the correlation between the image of the target object in the reference frame image and the current frame image may be determined, and the detection frame and filtering information of at least one candidate object in the current frame image may be obtained according to the correlation.
  • the correlation between the image of the target object in the reference frame image and the current frame image may be determined according to the first feature of the image of the target object in the reference frame image and the second feature of the current frame image; for example, the correlation may be obtained through convolution processing. This embodiment does not limit the manner of determining the correlation between the image of the target object in the reference frame image and the current frame image.
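  • As an illustrative sketch of this operation (not the claimed implementation), the response map below is obtained by sliding the target's feature over the current frame's feature with a convolution, in the manner of Siamese-style trackers; the function and tensor names are hypothetical:

```python
import torch
import torch.nn.functional as F

def correlation_map(exemplar_feat: torch.Tensor, search_feat: torch.Tensor) -> torch.Tensor:
    """Cross-correlate the first feature (target object in the reference frame)
    over the second feature (current frame); peaks in the response map mark
    candidate objects.

    exemplar_feat: (C, Hz, Wz) feature of the target object.
    search_feat:   (C, Hx, Wx) feature of the current frame or its search area.
    Returns a (Hx - Hz + 1, Wx - Wz + 1) correlation map.
    """
    # conv2d with the exemplar as its single kernel is sliding-window
    # cross-correlation over all spatial positions of the search feature.
    response = F.conv2d(search_feat.unsqueeze(0),   # (1, C, Hx, Wx)
                        exemplar_feat.unsqueeze(0)) # (1, C, Hz, Wz)
    return response[0, 0]
```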
  • the detection frame of the candidate object can be obtained by means of non-maximum suppression (NMS), and the screening information of the candidate object can be, for example, the score of the detection frame of the candidate object, the selection probability, and other information.
  • This embodiment does not limit the manner of obtaining the detection frame and the screening information of the candidate object according to the correlation.
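  • For reference, a minimal non-maximum suppression routine of the kind mentioned above might look as follows (a generic textbook NMS over (x1, y1, x2, y2) boxes, not necessarily the exact variant used in the embodiments):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Keep the highest-scoring detection frames, dropping ones that overlap
    an already-kept frame by more than iou_thresh. Returns kept indices."""
    order = scores.argsort()[::-1]          # indices, best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the top box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter + 1e-12)
        order = rest[iou <= iou_thresh]     # discard heavily overlapping boxes
    return keep
```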
  • the operation 102 may be performed by a processor calling a corresponding instruction stored in a memory, or may be performed by a detection unit executed by the processor.
  • the previous frame image may include: a reference frame image, and / or at least one intermediate frame image located between the reference frame image and the current frame image.
  • the interference objects in at least one previous frame image in the video may be obtained from a preset interference object set; when object tracking is performed on each frame image in the video, one or more candidate objects among the at least one candidate object that are not determined as the target object may be determined as interference objects in that frame image and put into the interference object set.
  • optionally, from the at least one candidate object that is not determined as the target object, candidate objects whose filtering information meets a predetermined interference-object condition may be determined as interference objects and added to the interference object set.
  • for example, when the filtering information is the score of the detection frame, the predetermined interference-object condition may be that the score of the detection frame is greater than a preset threshold.
  • optionally, all interference objects in the previous frame images in the video can be obtained; a sketch of maintaining such a set follows.
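  • A minimal sketch of maintaining such an interference object set, assuming score-valued filtering information; the dict layout and the 0.6 threshold are hypothetical placeholders:

```python
def update_interference_set(interference_set, candidates, target_idx, score_thresh=0.6):
    """After the target object of a frame is chosen, keep the remaining
    high-scoring candidates as interference objects for later frames.

    candidates: list of dicts such as {"feature": ..., "score": ...}.
    target_idx: index of the candidate determined to be the target object.
    """
    for idx, cand in enumerate(candidates):
        # predetermined interference-object condition: not the target,
        # and detection-frame score above a preset threshold
        if idx != target_idx and cand["score"] > score_thresh:
            interference_set.append(cand)
    return interference_set
```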
  • the operation 104 may be performed by a processor calling a corresponding instruction stored in the memory, or may be performed by an obtaining unit executed by the processor.
  • a first similarity between the at least one candidate object and the obtained interference object may be determined, and the filtering information of the at least one candidate object is adjusted according to the first similarity.
  • the first similarity between the at least one candidate object and the obtained interference object may be determined according to the characteristics of the at least one candidate object and the characteristics of the obtained interference object.
  • for example, when the filtering information is the score of the detection frame: if the first similarity between a candidate object and the obtained interference object is high, the score of the detection frame of that candidate object can be lowered; conversely, if the first similarity between the candidate object and the obtained interference object is low, the score of the detection frame of the candidate object can be increased or kept unchanged.
  • optionally, the weighted average of the first similarities between a candidate object and all the obtained interference objects may be calculated and used to adjust the screening information of the candidate object, where the weight of each interference object in the weighted average is related to the degree of interference that the interference object exerts on the selection of the target object; for example, the greater the interference an interference object exerts on the selection of the target object, the larger its weight.
  • for example, when the filtering information is the score of the detection frame, the first similarity between a candidate object and an obtained interference object can be represented by the correlation coefficient between the candidate object and that interference object, and the difference between the correlation coefficient between the target object in the reference frame image and the candidate object and the weighted average of the first similarities between the candidate object and the obtained interference objects is used to adjust the score of the detection frame of the candidate object, as in the sketch below.
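  • The adjustment described above might be sketched as follows, with L2-normalized dot products standing in for the correlation coefficients; the interference weights and their normalization are one plausible choice rather than the definitive scheme:

```python
import numpy as np

def adjust_scores(cand_feats, target_feat, interference_feats, interference_weights):
    """Interference-suppressed score for each candidate:
    corr(target, candidate) minus the weighted average of
    corr(interference object, candidate)."""
    def corr(a, b):
        # correlation coefficient as a normalized dot product
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    w = np.asarray(interference_weights, dtype=np.float64)
    w = w / (w.sum() + 1e-12)               # weights sum to 1
    adjusted = []
    for feat in cand_feats:
        pos = corr(target_feat, feat)       # similarity to the tracked target
        neg = sum(wi * corr(d, feat) for wi, d in zip(w, interference_feats))
        adjusted.append(pos - neg)          # suppress distractor look-alikes
    return adjusted
```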
  • the operation 106 may be performed by a processor calling a corresponding instruction stored in a memory, or may be performed by an adjustment unit executed by the processor.
  • the detection frame of the candidate object whose filtering information meets a predetermined condition may be determined as the detection frame of the target object of the current frame image.
  • for example, when the filtering information is the score of the detection frame, the candidate objects can be sorted according to the scores of their detection frames, and the detection frame of the candidate object with the highest score is used as the detection frame of the target object of the current frame image, thereby determining the target object in the current frame image.
  • optionally, the position and shape of the detection frame of each candidate object can also be compared with the position and shape of the detection frame of the target object in the previous frame image of the current frame image in the video; the scores of the detection frames of the candidate objects in the current frame image are adjusted according to the comparison result, the candidate objects are re-ranked by the adjusted scores, and the detection frame of the target object in the current frame image is determined accordingly. For example, compared with the previous frame image, the detection frame of a candidate object with a large position movement or a large shape change has its score reduced (see the sketch after this paragraph).
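  • One way such re-ranking could be realized is sketched below, penalizing position movement and shape change relative to the target's detection frame in the previous frame image; the box format (cx, cy, w, h) and the penalty weights are illustrative assumptions:

```python
def penalize_jumps(cand_boxes, cand_scores, prev_box, pos_weight=0.05, shape_weight=0.5):
    """Lower the score of candidates whose detection frame moved far or
    deformed strongly compared with the previous frame's target frame."""
    pcx, pcy, pw, ph = prev_box
    out = []
    for (cx, cy, w, h), s in zip(cand_boxes, cand_scores):
        shift = ((cx - pcx) ** 2 + (cy - pcy) ** 2) ** 0.5 / max(pw, ph)
        deform = abs(w * h - pw * ph) / (pw * ph)
        out.append(s - pos_weight * shift - shape_weight * deform)
    return out  # re-rank candidates by these adjusted scores
```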
  • optionally, the detection frame of the target object may also be displayed in the current frame image, so as to mark the position of the target object in the current frame image.
  • the operation 108 may be performed by a processor calling a corresponding instruction stored in the memory, or may be performed by a determining unit executed by the processor.
  • Based on the object tracking method provided in this embodiment, at least one candidate object in a current frame image in a video is detected according to a target object in a reference frame image in the video, interference objects in at least one previous frame image in the video are obtained, the filtering information of the at least one candidate object is adjusted according to the obtained interference objects, and the candidate object whose filtering information meets the predetermined condition is determined as the target object of the current frame image. Because the interference objects in the previous frame images are used to adjust the filtering information of the candidate objects, interference objects among the candidates can be effectively suppressed when the filtering information is used to determine the target object in the current frame image; the influence of interference objects around the target object on the discrimination result is thereby effectively suppressed, and the discrimination ability of object tracking is improved.
  • FIG. 4A to 4C are schematic diagrams of an application example of an object tracking method according to some embodiments of the present disclosure.
  • In FIGS. 4A to 4C, FIG. 4A is the current frame image of the video to be processed for object tracking; boxes a, b, d, e, f, and g are the detection frames of the candidate objects in the current frame image, and box c is the detection frame of the target object in the current frame image. FIG. 4B is a schematic diagram of the scores of the detection frames of the candidate objects in the current frame image obtained by an existing object tracking method, and FIG. 4C is a schematic diagram of the scores obtained by the object tracking method of some embodiments of the present disclosure.
  • In some embodiments, the object tracking method may further obtain a target object in at least one intermediate frame image between the reference frame image and the current frame image in the video, and optimize the filtering information of the at least one candidate object according to the target object in the obtained at least one intermediate frame image.
  • a second similarity between the at least one candidate object and the target object in the obtained at least one intermediate frame image may be determined, and then the filtering information of the at least one candidate object is optimized according to the second similarity.
  • optionally, the second similarity between the at least one candidate object and the target object in the obtained at least one intermediate frame image may be determined according to the features of the at least one candidate object and the features of the target object in the obtained at least one intermediate frame image.
  • the target object may be obtained from at least one intermediate frame image between the reference frame image and the current frame image in the video for which the target object has been determined.
  • a target object in all intermediate frame images for which a target object has been determined between the reference frame image in the video and the current frame image can be obtained.
  • optionally, the weighted average of the second similarities between a candidate object and all the obtained target objects may be calculated and used to optimize the screening information of the candidate object, where the weight of each target object in the weighted average is related to the degree of influence of that target object on the selection of the target object in the current frame image; for example, the closer a frame image is in time to the current frame image, the larger the weight of its target object.
  • for example, when the filtering information is the score of the detection frame and the similarities are represented by correlation coefficients, the score of the detection frame of a candidate object is adjusted using the correlation coefficient between the target object in the reference frame image and the candidate object, the weighted average of the second similarities between the candidate object and the obtained target objects, and the weighted average of the first similarities between the candidate object and the obtained interference objects; see the combined sketch after the following paragraph.
  • In this embodiment, the target objects of the intermediate frame images between the reference frame image and the current frame image in the video are used to optimize the filtering information of the candidate objects, so that the obtained filtering information of the candidate objects in the current frame image more realistically reflects the attributes of each candidate object, and a more accurate discrimination result can be obtained when determining the position of the target object in the current frame image of the video to be processed.
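  • Putting the two adjustments together, the final screening score of one candidate might be assembled as in this sketch, assuming both similarities are measured with the same correlation function (for example, the normalized dot product shown earlier):

```python
def final_score(cand_feat, ref_target_feat,
                past_target_feats, past_target_weights,
                interference_feats, interference_weights, corr):
    """Screening score = correlation with the reference-frame target
    + weighted average of second similarities (targets of intermediate frames)
    - weighted average of first similarities (accumulated interference objects)."""
    def weighted_avg(feats, weights):
        if not feats:
            return 0.0
        return sum(w * corr(f, cand_feat)
                   for w, f in zip(weights, feats)) / (sum(weights) + 1e-12)

    return (corr(ref_target_feat, cand_feat)
            + weighted_avg(past_target_feats, past_target_weights)
            - weighted_avg(interference_feats, interference_weights))
```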
  • a search area in the current frame image may also be obtained to improve the operation speed.
  • In operation 102, at least one candidate object in the current frame image in the video may be detected, within the search area in the current frame image, according to the target object in the reference frame image in the video.
  • the operation of obtaining the search area in the current frame image may estimate, through a predetermined search algorithm, the area where the target object may appear in the current frame image.
  • optionally, the search area in the next frame image of the current frame image in the video may be determined according to the filtering information of the target object in the current frame image. The process of determining the search area in the next frame image of the current frame image in the video according to the filtering information of the target object in the current frame image is described in detail below with reference to FIG. 2. As shown in FIG. 2, the method includes:
  • the first preset threshold may be determined statistically according to the filtering information of the target object and the state of the target object being blocked or leaving the field of view.
  • the filtering information is the score of the detection frame of the target object.
  • if the filtering information of the target object is less than the first preset threshold, the search area is gradually expanded according to a preset step size until the enlarged search area covers the current frame image, and the enlarged search area is used as the search area in the next frame image of the current frame image.
  • Then, the next frame image of the current frame image in the video may be used as the current frame image, and the target object of the current frame image is determined in the enlarged search area.
  • a target object in the current frame image may also be determined in the search area in the current frame image.
  • the operations 202-206 may be performed by a processor calling corresponding instructions stored in a memory, or may be performed by a search unit executed by the processor.
  • In this embodiment, the filtering information of the target object in the current frame image is compared with the first preset threshold; when the filtering information is less than the first preset threshold, that is, when the target object may be occluded or may have left the field of view, the search area is expanded until the expanded search area covers the current frame image, and the entire next frame image is then covered with the expanded search area. Since the enlarged search area covers the entire next frame image, the situation where the target object appears outside the search area and cannot be tracked does not arise, and the target object can be tracked over a long period.
  • After the search area is gradually enlarged according to the preset step size until the enlarged search area covers the current frame image, the next frame image of the current frame image in the video may also be used as the current frame image, and the expanded search area is obtained as the search area in that current frame image.
  • After the target object of the current frame image is determined, the filtering information of the target object in the current frame image can be used to determine whether the search area in the current frame image needs to be restored.
  • The restoring process is described below with reference to FIG. 3; the method includes:
  • the second preset threshold value is greater than the first preset threshold value, and the second preset threshold value can be statistically determined according to the filtering information of the target object and the state of the target object that is not blocked and has not left the field of view.
  • a target object of the current frame image is determined from the search area in the current frame image.
  • if the filtering information of the target object is less than or equal to the second preset threshold, the next frame image of the current frame image in the video is used as the current frame image, and the expanded search area is obtained as the search area in that current frame image; after that, the target object of the current frame image may also be determined in the enlarged search area.
  • the operations 302-306 may be executed by the processor by calling corresponding instructions stored in the memory, or may be executed by a search unit executed by the processor.
  • In this embodiment, the filtering information of the target object determined in the enlarged search area is compared with the second preset threshold. When the filtering information of the target object in the current frame image is greater than the second preset threshold, that is, when the target object is no longer occluded and has not left the field of view, the original object tracking manner is restored: the search area in the current frame image is obtained with the preset search algorithm and the target object is determined within it, which reduces the amount of data processing and increases the calculation speed. A sketch of this expansion-and-restore scheduling follows.
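  • The threshold-driven expansion and restoration of the search area described above could be scheduled as in the following sketch; the two thresholds and the growth step are illustrative placeholders, not values from the disclosure:

```python
def next_search_area(search_area, frame_rect, target_score, t1=0.3, t2=0.8, step=1.25):
    """Local-to-global search scheduling.

    target_score < t1: target likely occluded or out of view, so grow the
        search area step by step (clipped to the frame) toward full coverage.
    target_score > t2: target confidently re-found; signal the caller to
        restore the normal local search area (returned as None here).
    Otherwise keep the current, possibly enlarged, area.
    Rectangles are (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = search_area
    if target_score < t1:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        hw, hh = (x2 - x1) / 2 * step, (y2 - y1) / 2 * step
        fx1, fy1, fx2, fy2 = frame_rect
        return (max(fx1, cx - hw), max(fy1, cy - hh),
                min(fx2, cx + hw), min(fy2, cy + hh))
    if target_score > t2:
        return None
    return search_area
```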
  • FIGS. 4D and 4E are schematic diagrams of another application example of the object tracking method according to some embodiments of the present disclosure.
  • FIG. 4D shows four frame images of a video for object tracking; the serial numbers of the four frame images are 692, 697, 722, and 727, respectively.
  • Box b is a box representing the true contour of the target object, and box c is a detection frame for target tracking.
  • FIG. 4E is a schematic diagram of a change in the score of the target object and a change in the overlap between the target object and the detection frame in FIG. 4D.
  • the d line indicates the change in the score of the target object, and the e line indicates the overlap between the target object and the detection frame. It can be seen that the score of the target object decreases rapidly at frame 697, recovers to a large value at frame 722, and the overlap between the target object and the detection frame also increases rapidly at frame 722. Therefore, judging the score of the target object can alleviate the problems of object tracking that arise when the object is out of view or occluded.
  • In some embodiments, after operation 108 determines the candidate object whose filtering information meets a predetermined condition as the target object of the current frame image, the category of the target object in the current frame image can further be identified, which can enhance the function of object tracking and expand its application scenarios.
  • the object tracking method in each of the above embodiments may be performed through a neural network.
  • the neural network can be trained according to sample images.
  • the sample images used for training the neural network may include positive samples and negative samples, where the positive samples include: a positive sample image in a preset training data set and a positive sample image in a preset test data set.
  • for example, the preset training data set can use video sequences from YouTube-BB and VID, and the preset test data set can use detection data from ImageNet and COCO.
  • the neural network is trained using the positive sample images in the test data set, which can increase the categories of the positive samples, ensure the generalization performance of the neural network, and improve the discrimination ability of object tracking.
  • optionally, the positive samples may further include positive sample images obtained by performing data enhancement processing on the positive sample images in the preset test data set. For example, in addition to conventional data enhancement processing such as translation, scale change, and illumination change, data enhancement processing for specific motion patterns, such as motion blur, may also be used; this embodiment does not limit the method of data enhancement processing.
  • positive sample images obtained by performing data enhancement processing on the positive sample images in the test data set are used to train the neural network, which can increase the diversity of the positive sample images, improve the robustness of the neural network, and avoid overfitting. One possible motion-blur enhancement is sketched below.
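  • As one possible form of the motion-blur enhancement mentioned above (an illustrative OpenCV recipe; in training, the kernel length and angle would normally be randomized per sample):

```python
import numpy as np
import cv2

def motion_blur(image: np.ndarray, length: int = 9, angle_deg: float = 0.0) -> np.ndarray:
    """Apply linear motion blur to a positive sample image."""
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0                       # horizontal line kernel
    center = (length / 2 - 0.5, length / 2 - 0.5)
    rot = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    kernel = cv2.warpAffine(kernel, rot, (length, length))
    kernel /= max(kernel.sum(), 1e-6)                  # normalize to preserve brightness
    return cv2.filter2D(image, -1, kernel)
```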
  • the negative samples may include a negative sample image of an object having the same category as the target object and / or a negative sample image of an object having a different category from the target object.
  • the negative sample images obtained from the positive sample images in the preset test data set may be images selected from the background surrounding the target object in those positive sample images; such negative sample images usually have no semantics.
  • the negative sample image of an object of the same category as the target object can be a frame image randomly extracted from other videos or images, where the object in the image has the same category as the target object in the positive sample image; a negative sample image of an object of a different category from the target object can be a frame image randomly extracted from other videos or images, where the object in the image has a different category from the target object in the positive sample image. These two types of negative sample images are usually images with semantics.
  • This embodiment trains the neural network using negative sample images of objects of the same category as the target object and/or negative sample images of objects of a different category from the target object, which can ensure a balanced distribution of positive and negative sample images, improve the performance of the neural network, and improve the discriminative ability of object tracking.
  • any of the object tracking methods provided by the embodiments of the present disclosure may be executed by any appropriate device having data processing capabilities, including, but not limited to, a terminal device and a server.
  • any of the object tracking methods provided in the embodiments of the present disclosure may be executed by a processor; for example, the processor executes any of the object tracking methods mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in a memory. Details are not repeated below.
  • the foregoing program may be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
  • FIG. 5 is a schematic structural diagram of an object tracking apparatus according to some embodiments of the present disclosure.
  • the apparatus includes: a detection unit 510, an acquisition unit 520, an adjustment unit 530, and a determination unit 540, where:
  • the detecting unit 510 is configured to detect at least one candidate object in a current frame image in a video according to a target object in a reference frame image in the video.
  • the video for object tracking may be a video obtained from a video capture device (for example, a camera or a video camera), a video obtained from a storage device (for example, an optical disc, a hard disk, or a USB flash drive), or a video obtained from a network server; the manner of obtaining the video to be processed is not limited in this embodiment.
  • the reference frame image may be the first frame image in the video, or the first frame image subjected to object tracking processing in the video, or may be an intermediate frame image of the video; the selection of the reference frame image is not limited in this embodiment.
  • the current frame image may be a frame image other than the reference frame image in the video, and it may be located before or after the reference frame image, which is not limited in this embodiment. In an optional example, the current frame image in the video is located after the reference frame image.
  • the detection unit 510 may determine the correlation between the image of the target object in the reference frame image and the current frame image, and obtain the detection frame and filtering information of at least one candidate object in the current frame image according to the correlation.
  • the detection unit 510 may determine the correlation between the image of the target object in the reference frame image and the current frame image according to the first feature of the image of the target object in the reference frame image and the second feature of the current frame image; for example, the correlation may be obtained through convolution processing.
  • This embodiment does not limit the manner of determining the correlation between the image of the target object in the reference frame image and the current frame image.
  • the detection frame of the candidate object can be obtained by non-maximum suppression (NMS), for example.
  • the screening information of a candidate object is information related to the nature of the candidate object that distinguishes it from the other candidate objects; for example, it can be the score of the detection frame of the candidate object, the selection probability, and the like, where the score of the detection frame and the selection probability can be related to the correlation coefficient of the candidate object obtained according to the correlation. This embodiment does not limit the manner of obtaining the detection frames and screening information of the candidate objects based on the correlation.
  • the obtaining unit 520 is configured to obtain an interference object in at least one previous frame image in a video.
  • the previous frame image may include: a reference frame image, and / or at least one intermediate frame image located between the reference frame image and the current frame image.
  • the obtaining unit 520 may obtain the interference objects in at least one previous frame image in the video from a preset interference object set; when object tracking is performed on each frame image in the video, one or more candidate objects among the at least one candidate object that are not determined as the target object may be determined as interference objects in that frame image and put into the interference object set.
  • optionally, from the at least one candidate object that is not determined as the target object, candidate objects whose filtering information meets a predetermined interference-object condition may be determined as interference objects and added to the interference object set; for example, when the filtering information is the score of the detection frame, the predetermined interference-object condition may be that the score of the detection frame is greater than a preset threshold.
  • the obtaining unit 520 may obtain all interference objects in a previous frame image in the video.
  • the adjusting unit 530 is configured to adjust filtering information of at least one candidate object according to the obtained interference object.
  • the adjusting unit 530 may determine a first similarity between the at least one candidate object and the obtained interference object, and adjust the filtering information of the at least one candidate object according to the first similarity.
  • the adjusting unit 530 may determine the first similarity between the at least one candidate object and the obtained interference object according to the characteristics of the at least one candidate object and the characteristics of the obtained interference object.
  • for example, when the filtering information is the score of the detection frame: if the first similarity between a candidate object and the obtained interference object is high, the score of the detection frame of that candidate object can be lowered; conversely, if the first similarity is low, the score of the detection frame of the candidate object can be increased or kept unchanged.
  • optionally, the weighted average of the first similarities between a candidate object and all the obtained interference objects may be calculated and used to adjust the screening information of the candidate object, where the weight of each interference object in the weighted average is related to the degree of interference that the interference object exerts on the selection of the target object; for example, the greater the interference an interference object exerts on the selection of the target object, the larger its weight.
  • for example, when the filtering information is the score of the detection frame, the first similarity between a candidate object and an obtained interference object can be represented by the correlation coefficient between the candidate object and that interference object, and the difference between the correlation coefficient between the target object in the reference frame image and the candidate object and the weighted average of the first similarities between the candidate object and the obtained interference objects is used to adjust the score of the detection frame of the candidate object.
  • the determining unit 540 is configured to determine that the candidate object whose filtering information meets a predetermined condition is a target object of the current frame image.
  • optionally, the determining unit 540 may determine the detection frame of the candidate object whose filtering information meets the predetermined condition as the detection frame of the target object of the current frame image.
  • for example, when the filtering information is the score of the detection frame, the candidate objects can be sorted according to the scores of their detection frames, and the detection frame of the candidate object with the highest score is used as the detection frame of the target object of the current frame image, thereby determining the target object in the current frame image.
  • optionally, the position and shape of the detection frame of each candidate object can also be compared with the position and shape of the detection frame of the target object in the previous frame image of the current frame image in the video; the scores of the detection frames of the candidate objects in the current frame image are adjusted according to the comparison result, the candidate objects are re-ranked by the adjusted scores, and the detection frame of the target object in the current frame image is determined accordingly. For example, compared with the previous frame image, the detection frame of a candidate object with a large position movement or a large shape change has its score reduced.
  • optionally, the apparatus may further include a display unit; after the detection frame of the candidate object whose filtering information meets the predetermined condition is determined as the detection frame of the target object of the current frame image, the display unit may display the detection frame of the target object in the current frame image, so as to mark the position of the target object in the current frame image.
  • Based on the object tracking device provided in this embodiment, at least one candidate object in a current frame image in a video is detected according to a target object in a reference frame image in the video, interference objects in at least one previous frame image in the video are obtained, the filtering information of the at least one candidate object is adjusted according to the obtained interference objects, and the candidate object whose filtering information meets the predetermined condition is determined as the target object of the current frame image. Because the interference objects in the previous frame images are used to adjust the filtering information of the candidate objects, interference objects among the candidates can be effectively suppressed when the filtering information is used to determine the target object in the current frame image; the influence of interference objects around the target object on the discrimination result is thereby effectively suppressed, and the discrimination ability of object tracking is improved.
  • In some embodiments, the obtaining unit 520 may further obtain a target object in at least one intermediate frame image between the reference frame image and the current frame image in the video, and the apparatus may further include an optimization unit configured to optimize the filtering information of the at least one candidate object according to the target object in the obtained at least one intermediate frame image.
  • the optimization unit may determine a second similarity between the at least one candidate object and the target object in the obtained at least one intermediate frame image, and then optimize the filtering information of the at least one candidate object according to the second similarity.
  • the optimization unit may determine the second similarity between the at least one candidate object and the target object in the obtained at least one intermediate frame image according to the features of the at least one candidate object and the features of the target object in the obtained at least one intermediate frame image.
  • the obtaining unit 520 may obtain the target object from at least one intermediate frame image between the reference frame image and the current frame image in the video for which the target object has been determined. In an optional example, the obtaining unit 520 may obtain the target objects in all intermediate frame images between the reference frame image and the current frame image in which a target object has been determined.
  • optionally, the weighted average of the second similarities between a candidate object and all the obtained target objects may be calculated and used to optimize the screening information of the candidate object, where the weight of each target object in the weighted average is related to the degree of influence of that target object on the selection of the target object in the current frame image; for example, the closer a frame image is in time to the current frame image, the larger the weight of its target object.
  • for example, when the filtering information is the score of the detection frame and the similarities are represented by correlation coefficients, the score of the detection frame of a candidate object is adjusted using the correlation coefficient between the target object in the reference frame image and the candidate object, the weighted average of the second similarities between the candidate object and the obtained target objects, and the weighted average of the first similarities between the candidate object and the obtained interference objects.
  • In this embodiment, the target objects of the intermediate frame images between the reference frame image and the current frame image in the video are used to optimize the filtering information of the candidate objects, so that the obtained filtering information of the candidate objects in the current frame image more realistically reflects the attributes of each candidate object, and a more accurate discrimination result can be obtained when determining the position of the target object in the current frame image of the video to be processed.
  • FIG. 6 is a schematic structural diagram of an object tracking apparatus according to another embodiment of the present disclosure.
  • Compared with the embodiment shown in FIG. 5, in addition to the detection unit 610, the acquisition unit 620, the adjustment unit 630, and the determination unit 640, the device further includes a search unit 650. The search unit 650 is configured to acquire the search area in the current frame image, and the detection unit 610 is configured to detect, in the search area, at least one candidate object in the current frame image in the video according to the target object in the reference frame image in the video.
  • the operation of obtaining the search area in the current frame image may estimate, through a predetermined search algorithm, the area where the target object may appear in the current frame image.
  • In some embodiments, the search unit 650 is further configured to determine the search area in the next frame image of the current frame image in the video according to the filtering information of the target object in the current frame image.
  • optionally, the search unit 650 is configured to detect whether the filtering information of the target object is less than a first preset threshold; if the filtering information of the target object is less than the first preset threshold, gradually expand the search area according to a preset step size until the expanded search area covers the current frame image; and/or, if the filtering information of the target object is greater than or equal to the first preset threshold, use the next frame image of the current frame image in the video as the current frame image and obtain the search area in the current frame image.
  • In this embodiment, the filtering information of the target object in the current frame image is compared with the first preset threshold; when the filtering information is less than the first preset threshold, that is, when the target object may be occluded or may have left the field of view, the search area is expanded until the expanded search area covers the current frame image, and the entire next frame image is then covered with the expanded search area. Since the enlarged search area covers the entire next frame image, the situation where the target object appears outside the search area and cannot be tracked does not arise, and the target object can be tracked over a long period.
  • optionally, the search unit 650 is further configured to detect, after the target object of the current frame image is determined in the enlarged search area, whether the filtering information of the target object is greater than a second preset threshold, where the second preset threshold is greater than the first preset threshold; if the filtering information of the target object is greater than the second preset threshold, obtain the search area in the current frame image; and/or, if the filtering information of the target object is less than or equal to the second preset threshold, use the next frame image of the current frame image in the video as the current frame image, and use the enlarged search area as the search area in the current frame image.
  • In this embodiment, the filtering information of the target object determined in the enlarged search area is compared with the second preset threshold. When the filtering information of the target object in the current frame image is greater than the second preset threshold, that is, when the target object is no longer occluded and has not left the field of view, the original object tracking manner is restored: the search area in the current frame image is obtained with the preset search algorithm and the target object is determined within it, which reduces the amount of data processing and increases the calculation speed.
  • In some embodiments, the object tracking device further includes a recognition unit; after the candidate object whose filtering information meets a predetermined condition is determined as the target object of the current frame image, the recognition unit may further identify the category of the target object in the current frame image, which can enhance the function of object tracking and expand its application scenarios.
  • the object tracking device includes a neural network, and the object tracking method is performed by the neural network.
  • the neural network can be trained according to sample images.
  • the sample images used for training the neural network may include positive samples and negative samples, where the positive samples include: a positive sample image in a preset training data set and a positive sample image in a preset test data set.
  • for example, the preset training data set can use video sequences from YouTube-BB and VID, and the preset test data set can use detection data from ImageNet and COCO.
  • the neural network is trained using the positive sample images in the test data set, which can increase the categories of the positive samples, ensure the generalization performance of the neural network, and improve the discrimination ability of object tracking.
  • optionally, the positive samples may further include positive sample images obtained by performing data enhancement processing on the positive sample images in the preset test data set. For example, in addition to conventional data enhancement processing such as translation, scale change, and illumination change, data enhancement processing for specific motion patterns, such as motion blur, may also be used; this embodiment does not limit the method of data enhancement processing.
  • positive sample images obtained by performing data enhancement processing on the positive sample images in the test data set are used to train the neural network, which can increase the diversity of the positive sample images, improve the robustness of the neural network, and avoid overfitting.
  • the negative samples may include a negative sample image of an object having the same category as the target object and / or a negative sample image of an object having a different category from the target object.
  • the negative sample images obtained from the positive sample images in the preset test data set may be images selected from the background surrounding the target object in those positive sample images; such negative sample images usually have no semantics.
  • the negative sample image of an object of the same category as the target object can be a frame image randomly extracted from other videos or images, where the object in the image has the same category as the target object in the positive sample image; a negative sample image of an object of a different category from the target object can be a frame image randomly extracted from other videos or images, where the object in the image has a different category from the target object in the positive sample image. These two types of negative sample images are usually images with semantics.
  • This embodiment trains the neural network using negative sample images of objects of the same category as the target object and/or negative sample images of objects of a different category from the target object, which can ensure a balanced distribution of positive and negative sample images, improve the performance of the neural network, and improve the discriminative ability of object tracking.
  • an embodiment of the present disclosure also provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, and the like.
  • FIG. 7 illustrates a schematic structural diagram of an electronic device 700 suitable for implementing a terminal device or server according to an embodiment of the present disclosure.
  • the electronic device 700 includes one or more processors and a communication unit.
  • the one or more processors are, for example, one or more central processing units (CPUs) 701, and/or one or more graphics processing units (GPUs) 713, and so on.
  • the processor may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 702 or loaded from the storage section 708 into a random access memory (RAM) 703.
  • the communication unit 712 may include, but is not limited to, a network card.
  • the network card may include, but is not limited to, an IB (Infiniband) network card.
  • the processor may communicate with the read-only memory 702 and/or the random access memory 703 to execute the executable instructions, is connected to the communication unit 712 through the bus 704, and communicates with other target devices via the communication unit 712, thereby completing operations corresponding to any of the methods provided in the embodiments of the present disclosure, for example: detecting at least one candidate object in a current frame image in the video according to a target object in a reference frame image in the video; obtaining an interference object in at least one previous frame image in the video; adjusting the filtering information of the at least one candidate object according to the obtained interference object; and determining the candidate object whose filtering information meets a predetermined condition as the target object of the current frame image.
  • the RAM 703 can also store various programs and data required for the operation of the device.
  • the CPU 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704.
  • in the case where the RAM 703 is present, the ROM 702 is an optional module.
  • the RAM 703 stores executable instructions, or writes executable instructions into the ROM 702 at runtime, and the executable instructions cause the central processing unit 701 to perform the operations corresponding to the above-mentioned object tracking method.
  • An input / output (I / O) interface 705 is also connected to the bus 704.
  • the communication unit 712 may be provided in an integrated manner, or may be provided as a plurality of sub-modules (for example, a plurality of IB network cards) connected on the bus link.
  • the following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem.
  • the communication section 709 performs communication processing via a network such as the Internet.
  • the driver 710 is also connected to the I / O interface 705 as needed.
  • a removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 710 as needed, so that a computer program read out therefrom is installed into the storage section 708 as needed.
  • the architecture shown in FIG. 7 is only an optional implementation; in practice, the number and types of the components in FIG. 7 may be selected, reduced, increased, or replaced according to actual needs; different functional components may also be provided separately or in an integrated manner, for example, the GPU 713 and the CPU 701 may be provided separately, or the GPU 713 may be integrated on the CPU 701, and the communication unit may be provided separately or integrated on the CPU 701 or the GPU 713, and so on. These alternative embodiments all fall within the protection scope of the present disclosure.
  • embodiments of the present disclosure include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for performing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided in the embodiments of the present disclosure, for example: detecting at least one candidate object in a current frame image of the video according to a target object in a reference frame image of the video; obtaining interference objects in at least one previous frame image of the video; adjusting filtering information of the at least one candidate object according to the obtained interference objects; and determining a candidate object whose filtering information satisfies a predetermined condition as the target object of the current frame image.
  • the computer program may be downloaded and installed from a network through the communication section 709, and / or installed from a removable medium 711.
  • when the computer program is executed by the central processing unit (CPU) 701, the above-mentioned functions defined in the method of the present disclosure are executed.
  • an embodiment of the present disclosure further provides a computer program product for storing computer-readable instructions that, when executed, cause a computer to execute the object tracking method in any of the foregoing possible implementations.
  • the computer program product may be implemented by hardware, software, or a combination thereof.
  • in one optional example, the computer program product is embodied as a computer storage medium; in another optional example, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
  • an embodiment of the present disclosure further provides an object tracking method and a corresponding device, an electronic device, a computer storage medium, a computer program, and a computer program product.
  • the method includes: a first device sends an object tracking instruction to a second device, the instruction causing the second device to execute the object tracking method in any of the foregoing possible embodiments; and the first device receives a result of the object tracking sent by the second device.
  • the object tracking instruction may be a calling instruction, and the first device may instruct the second device to perform object tracking by means of calling.
  • accordingly, in response to receiving the calling instruction, the second device may execute the steps and/or processes in any of the embodiments of the above object tracking method.
  • a plurality may refer to two or more, and “at least one” may refer to one, two, or more.
  • the methods and apparatus of the present disclosure may be implemented in many ways.
  • the methods and apparatuses of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above order of the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order described above unless specifically stated otherwise.
  • the present disclosure may also be implemented as programs recorded in a recording medium, which programs include machine-readable instructions for implementing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing a method according to the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

Embodiments of the present disclosure disclose an object tracking method and apparatus, an electronic device, and a storage medium. The method includes: detecting at least one candidate object in a current frame image of a video according to a target object in a reference frame image of the video; obtaining interference objects in at least one previous frame image of the video; adjusting filtering information of the at least one candidate object according to the obtained interference objects; and determining a candidate object whose filtering information satisfies a predetermined condition as the target object of the current frame image. Embodiments of the present disclosure can improve the discriminative ability of object tracking.

Description

Object tracking method and apparatus, electronic device, and storage medium
The present disclosure claims priority to Chinese Patent Application No. CN201810893022.3, entitled "Object tracking method and apparatus, electronic device and storage medium", filed with the China Patent Office on August 7, 2018, the entire contents of which are incorporated into the present disclosure by reference.
Technical Field
The present disclosure relates to computer vision technology, and in particular to an object tracking method and apparatus, an electronic device, and a storage medium.
Background
Target tracking is one of the hot topics in computer vision research, and it has wide applications in many fields, for example: tracking focus of cameras, automatic target tracking of unmanned aerial vehicles, human body tracking, vehicle tracking in traffic monitoring systems, face tracking, and gesture tracking in intelligent interaction systems.
Summary
Embodiments of the present disclosure provide a technical solution for object tracking.
According to one aspect of the embodiments of the present disclosure, an object tracking method is provided, including:
detecting at least one candidate object in a current frame image of a video according to a target object in a reference frame image of the video;
obtaining interference objects in at least one previous frame image of the video;
adjusting filtering information of the at least one candidate object according to the obtained interference objects;
determining a candidate object whose filtering information satisfies a predetermined condition as the target object in the current frame image.
According to another aspect of the embodiments of the present disclosure, an object tracking apparatus is provided, including:
a detection unit configured to detect at least one candidate object in a current frame image of a video according to a target object in a reference frame image of the video;
an obtaining unit configured to obtain interference objects in at least one previous frame image of the video;
an adjustment unit configured to adjust filtering information of the at least one candidate object according to the obtained interference objects;
a determination unit configured to determine a candidate object whose filtering information satisfies a predetermined condition as the target object of the current frame image.
According to yet another aspect of the embodiments of the present disclosure, an electronic device is provided, including the apparatus according to any one of the above embodiments.
According to still another aspect of the embodiments of the present disclosure, an electronic device is provided, including:
a memory configured to store executable instructions; and
a processor configured to execute the executable instructions so as to complete the method according to any one of the above embodiments.
According to still another aspect of the embodiments of the present disclosure, a computer program is provided, including computer-readable code, where when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the method according to any one of the above embodiments.
According to still another aspect of the embodiments of the present disclosure, a computer storage medium is provided for storing computer-readable instructions, where the instructions, when executed, implement the method according to any one of the above embodiments.
Based on the object tracking method and apparatus, electronic device, computer program, and storage medium provided by the above embodiments of the present disclosure, at least one candidate object in a current frame image of a video is detected according to a target object in a reference frame image of the video; interference objects in at least one previous frame image of the video are obtained; filtering information of the at least one candidate object is adjusted according to the obtained interference objects; and a candidate object whose filtering information satisfies a predetermined condition is determined as the target object of the current frame image. During object tracking, the embodiments of the present disclosure use the interference objects in previous frame images before the current frame image to adjust the filtering information of the candidate objects, so that when the filtering information of the candidate objects is used to determine the target object in the current frame image, the interference objects among the candidate objects can be effectively suppressed and the target object can be obtained from the candidate objects. Therefore, in the process of determining the target object in the current frame image, the influence of interference objects around the target object on the discrimination result can be effectively suppressed, improving the discriminative ability of target object tracking.
The technical solutions of the present disclosure are further described in detail below through the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, describe the embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of an object tracking method according to some embodiments of the present disclosure;
FIG. 2 is a flowchart of an object tracking method according to other embodiments of the present disclosure;
FIG. 3 is a flowchart of an object tracking method according to still other embodiments of the present disclosure;
FIG. 4A to FIG. 4C are schematic diagrams of one application example of the object tracking method according to some embodiments of the present disclosure;
FIG. 4D and FIG. 4E are schematic diagrams of another application example of the object tracking method according to some embodiments of the present disclosure;
FIG. 5 is a schematic structural diagram of an object tracking apparatus according to some embodiments of the present disclosure;
FIG. 6 is a schematic structural diagram of an object tracking apparatus according to other embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device according to some embodiments of the present disclosure.
具体实施方式
现在将参照附图来详细描述本公开的各种示例性实施例。应注意到:除非另外说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
还应理解,在本公开实施例中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。
本领域技术人员可以理解,本公开实施例中的“第一”、“第二”等术语仅用于区别不同步骤、设备或模块等,既不代表任何特定技术含义,也不表示它们之间的必然逻辑顺序。
还应理解,对于本公开实施例中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。
还应理解,本公开对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其 应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
另外,公开中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本公开中字符“/”,一般表示前后关联对象是一种“或”的关系。
本公开实施例可以应用于计算机系统/服务器,其可与众多其它通用或专用计算系统环境或配置一起操作。适于与计算机系统/服务器一起使用的众所周知的计算系统、环境和/或配置的例子包括但不限于:个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统﹑大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
计算机系统/服务器可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
FIG. 1 is a flowchart of an object tracking method according to some embodiments of the present disclosure. As shown in FIG. 1, the method includes:
102, detecting at least one candidate object in a current frame image of a video according to a target object in a reference frame image of the video.
In this embodiment, the video on which object tracking is performed may be a video obtained from a video capture device (for example, the video capture device may include a video camera, a camera head, etc.), a video obtained from a storage device (for example, the storage device may include an optical disc, a hard disk, a USB flash drive, etc.), or a video obtained from a network server; this embodiment does not limit the manner of obtaining the video to be processed. The reference frame image may be the first frame image of the video, the first frame image on which object tracking processing is performed, or an intermediate frame image of the video; this embodiment does not limit the selection of the reference frame image. The current frame image may be a frame image in the video other than the reference frame image; it may be located before or after the reference frame image, which is not limited in this embodiment. In an optional example, the current frame image in the video is located after the reference frame image.
Optionally, the correlation between the image of the target object in the reference frame image and the current frame image may be determined, and the detection boxes and filtering information of at least one candidate object in the current frame image may be obtained according to the correlation. In an optional example, the correlation between the image of the target object in the reference frame image and the current frame image may be determined according to a first feature of the image of the target object in the reference frame image and a second feature of the current frame image, for example, the correlation may be obtained through convolution processing. This embodiment does not limit the manner of determining the correlation between the image of the target object in the reference frame image and the current frame image. The detection boxes of the candidate objects may be obtained, for example, by non-maximum suppression (NMS), and the filtering information of a candidate object may be, for example, the score or selection probability of its detection box; this embodiment does not limit the manner of obtaining the detection boxes and filtering information according to the correlation.
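For illustration only, the following Python sketch shows one way the correlation "obtained through convolution processing" could be realized: the feature map of the target template is cross-correlated with the feature map of the current frame, and the highest responses are kept as candidate objects with their scores as filtering information. The feature maps, the number of candidates k, and all function names are assumptions of this sketch, not details fixed by the present disclosure.

    import numpy as np

    def cross_correlate(template_feat, frame_feat):
        """Slide the template feature map over the frame feature map and
        return a response map; higher responses mark likely candidates."""
        th, tw = template_feat.shape
        fh, fw = frame_feat.shape
        out = np.zeros((fh - th + 1, fw - tw + 1), dtype=np.float32)
        for y in range(out.shape[0]):
            for x in range(out.shape[1]):
                out[y, x] = np.sum(template_feat * frame_feat[y:y + th, x:x + tw])
        return out

    def top_candidates(response, k=5):
        """Keep the k highest responses as candidate detection-box centers,
        with the response value serving as the score (filtering information)."""
        flat = response.ravel()
        idx = np.argsort(flat)[::-1][:k]
        ys, xs = np.unravel_index(idx, response.shape)
        return [(float(flat[i]), int(y), int(x)) for i, y, x in zip(idx, ys, xs)]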
In an optional example, operation 102 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a detection unit run by the processor.
104, obtaining interference objects in at least one previous frame image of the video.
In this embodiment, the previous frame image may include: the reference frame image, and/or, at least one intermediate frame image located between the reference frame image and the current frame image.
Optionally, the interference objects in at least one previous frame image of the video may be obtained according to a preset interference object set. With the preset interference object set, when object tracking processing is performed on each frame image of the video, one or more candidate objects that are not determined as the target object among the at least one candidate object are determined as interference objects in the current frame image and placed into the interference object set. In an optional example, among the at least one candidate object not determined as the target object, the candidate objects whose filtering information satisfies a predetermined interference object condition may be determined as interference objects and placed into the interference object set. For example, when the filtering information is the score of the detection box, the predetermined interference object condition may be that the score of the detection box is greater than a preset threshold.
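A minimal sketch of how the preset interference object set could be maintained across frames, assuming the filtering information is a detection-box score and the predetermined interference object condition is a score threshold; the dictionary layout and the threshold value are illustrative assumptions.

    distractor_set = []  # features of interference objects from previous frames

    def update_distractors(candidates, chosen_idx, score_thresh=0.5):
        """After the target of a frame is chosen, candidates that were not
        selected but whose score exceeds the threshold are recorded as
        interference objects for later frames."""
        for i, cand in enumerate(candidates):
            if i != chosen_idx and cand["score"] > score_thresh:
                distractor_set.append(cand["feature"])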
In an optional example, the interference objects in all previous frame images of the video may be obtained.
In an optional example, operation 104 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by an obtaining unit run by the processor.
106, adjusting the filtering information of the at least one candidate object according to the obtained interference objects.
Optionally, a first similarity between the at least one candidate object and the obtained interference objects may be determined, and the filtering information of the at least one candidate object may be adjusted according to the first similarity. In an optional example, the first similarity may be determined according to features of the at least one candidate object and features of the obtained interference objects. In an optional example, when the filtering information is the score of the detection box: when the first similarity between a candidate object and the obtained interference objects is high, the score of the detection box of the candidate object may be lowered; conversely, when the first similarity between a candidate object and the obtained interference objects is low, the score of the detection box of the candidate object may be raised or kept unchanged.
Optionally, when more than one interference object is obtained, a weighted average of the similarities between the candidate object and all obtained interference objects may be computed, and this weighted average may be used to adjust the filtering information of the candidate object, where the weight of each interference object in the weighted average is related to the degree of interference that the interference object causes to the selection of the target object; for example, the greater the interference to the selection of the target object, the larger the weight value of the interference object. In an optional example, when the filtering information is the score of the detection box, the first similarity between a candidate object and an obtained interference object may be represented by their correlation coefficient, and the score of the detection box of the candidate object may be adjusted by the difference between the correlation coefficient between the target object in the reference frame image and the candidate object, and the weighted average of the first similarities between the candidate object and the obtained interference objects.
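The adjustment described in this paragraph might be sketched as follows, assuming features are vectors and the first similarity is represented by their inner product as a stand-in for the correlation coefficient; the uniform default weights are an assumption.

    import numpy as np

    def adjusted_score(cand_feat, template_feat, distractor_feats, weights=None):
        """Adjusted score = correlation with the reference-frame target minus
        the weighted average of the first similarities between the candidate
        and the obtained interference objects."""
        base = float(np.dot(template_feat, cand_feat))
        if not distractor_feats:
            return base
        if weights is None:
            weights = [1.0] * len(distractor_feats)
        sims = [float(np.dot(d, cand_feat)) for d in distractor_feats]
        return base - float(np.average(sims, weights=weights))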
In an optional example, operation 106 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by an adjustment unit run by the processor.
108, determining a candidate object whose filtering information satisfies a predetermined condition as the target object of the current frame image.
Optionally, the detection box of the candidate object whose filtering information satisfies the predetermined condition may be determined as the detection box of the target object of the current frame image. In an optional example, when the filtering information is the score of the detection box, the candidate objects may be sorted according to the scores of their detection boxes, and the detection box of the candidate object with the highest score may be taken as the detection box of the target object of the current frame image, thereby determining the target object in the current frame image.
Optionally, the position and shape of the detection box of each candidate object may further be compared with the position and shape of the detection box of the target object in the frame image preceding the current frame image in the video; the scores of the detection boxes of the candidate objects in the current frame image are adjusted according to the comparison result, the adjusted scores of the detection boxes of the candidate objects in the current frame image are re-sorted, and the detection box of the candidate object with the highest score after re-sorting is taken as the detection box of the target object in the current frame image. For example, the scores of the detection boxes of candidate objects with large position displacement and large shape change compared with the previous frame image are lowered.
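As one hedged reading of this re-ranking step, the sketch below converts the position displacement and shape change of a candidate box relative to the previous frame's target box into a multiplicative penalty on its score; the exponential form and the constant k are assumptions of this sketch, not prescribed by the disclosure.

    import math

    def motion_shape_penalty(box, prev_box, k=0.2):
        """Return a factor in (0, 1] that lowers the score of candidates whose
        detection box moved far or deformed strongly versus the previous
        frame's target box (box = dict with keys cx, cy, w, h)."""
        shift = abs(box["cx"] - prev_box["cx"]) + abs(box["cy"] - prev_box["cy"])
        shape = abs(math.log((box["w"] * box["h"]) / (prev_box["w"] * prev_box["h"])))
        return math.exp(-k * (shift / prev_box["w"] + shape))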
Optionally, after the detection box of the candidate object whose filtering information satisfies the predetermined condition is determined as the detection box of the target object of the current frame image, the detection box of the target object may also be displayed in the current frame image, so as to mark the position of the target object in the current frame image.
In an optional example, operation 108 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a determination unit run by the processor.
Based on the object tracking method provided by this embodiment, at least one candidate object in the current frame image of the video is detected according to the target object in the reference frame image of the video; interference objects in at least one previous frame image of the video are obtained; the filtering information of the at least one candidate object is adjusted according to the obtained interference objects; and a candidate object whose filtering information satisfies a predetermined condition is determined as the target object of the current frame image. During object tracking, the interference objects in previous frame images before the current frame image are used to adjust the filtering information of the candidate objects, so that when the filtering information of the candidate objects is used to determine the target object in the current frame image, the interference objects among the candidate objects can be effectively suppressed and the target can be obtained from the candidate objects. Therefore, in the process of determining the target object in the current frame image, the influence of interference objects around the target object on the discrimination result can be effectively suppressed, improving the discriminative ability of object tracking.
FIG. 4A to FIG. 4C are schematic diagrams of one application example of the object tracking method according to some embodiments of the present disclosure. As shown in FIG. 4A to FIG. 4C, FIG. 4A is the current frame image of the video to be processed for object tracking; in FIG. 4A, boxes a, b, d, e, f, and g are the detection boxes of candidate objects in the current frame image, and box c is the detection box of the target object in the current frame image. FIG. 4B is a schematic diagram of the scores of the detection boxes of the candidate objects in the current frame image obtained by an existing object tracking method; it can be seen from FIG. 4B that the target object expected to obtain the highest score, i.e., the target object corresponding to box c, does not obtain the highest score due to the influence of interference objects. FIG. 4C is a schematic diagram of the scores of the detection boxes of the candidate objects in the current frame image obtained by the object tracking method of some embodiments of the present disclosure; it can be seen from FIG. 4C that the target object expected to obtain the highest score, i.e., the target object corresponding to box c, obtains the highest score, while the scores of the interference objects around it are suppressed.
In some embodiments, the object tracking method may further obtain the target objects in at least one intermediate frame image between the reference frame image and the current frame image in the video, and optimize the filtering information of the at least one candidate object according to the obtained target objects in the at least one intermediate frame image. In an optional example, a second similarity between the at least one candidate object and the obtained target objects in the at least one intermediate frame image may be determined, and the filtering information of the at least one candidate object may then be optimized according to the second similarity. For example, the second similarity may be determined according to features of the at least one candidate object and features of the obtained target objects in the at least one intermediate frame image.
Optionally, the target object may be obtained from at least one intermediate frame image between the reference frame image and the current frame image in which the target object has already been determined. In an optional example, the target objects in all intermediate frame images between the reference frame image and the current frame image in which the target object has been determined may be obtained.
Optionally, when more than one target object is obtained, a weighted average of the similarities between the candidate object and all obtained target objects may be computed, and this weighted average may be used to optimize the filtering information of the candidate object, where the weight of each target object in the weighted average is related to the degree of influence that the target object has on the selection of the target object in the current frame image; for example, the closer in time a frame image is to the current frame image, the larger the weight value of its target object. In an optional example, when the filtering information is the score of the detection box, the first similarity between a candidate object and an obtained interference object may be represented by their correlation coefficient, and the score of the detection box of the candidate object may be adjusted by the difference between the weighted average of the correlation coefficient between the target object in the reference frame image and the candidate object together with the second similarities between the candidate object and the obtained target objects, and the weighted average of the first similarities between the candidate object and the obtained interference objects.
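Putting the two weighted averages together, the difference described in this paragraph could be sketched as below; how the correlation with the reference-frame target is weighted against the second similarities is an interpretation of this paragraph rather than a fixed formula of the disclosure.

    import numpy as np

    def refined_score(cand_feat, template_feat, target_feats, target_weights,
                      distractor_feats, distractor_weights, template_weight=1.0):
        """Score = weighted average of (correlation with the reference-frame
        target, second similarities to intermediate-frame targets) minus the
        weighted average of first similarities to interference objects."""
        pos_vals = [float(np.dot(template_feat, cand_feat))]
        pos_w = [template_weight]
        for t, w in zip(target_feats, target_weights):
            pos_vals.append(float(np.dot(t, cand_feat)))
            pos_w.append(w)
        score = float(np.average(pos_vals, weights=pos_w))
        if distractor_feats:
            neg = [float(np.dot(d, cand_feat)) for d in distractor_feats]
            score -= float(np.average(neg, weights=distractor_weights))
        return score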
This embodiment uses the target objects of intermediate frame images obtained between the reference frame image and the current frame image in the video to optimize the filtering information of the candidate objects, so that the obtained filtering information of the candidate objects in the current frame image can more truly reflect the attributes of each candidate object, and thus a more accurate discrimination result can be obtained when determining the position of the target object in the current frame image of the video to be processed.
In some embodiments, before operation 102 detects at least one candidate object in the current frame image of the video according to the target object in the reference frame image of the video, a search region in the current frame image may also be obtained to increase the computation speed; operation 102 may then detect, within the search region in the current frame image, at least one candidate object in the current frame image of the video according to the target object in the reference frame image of the video. The operation of obtaining the search region in the current frame image may estimate and hypothesize, through a predetermined search algorithm, the region in the current frame image where the target object may appear.
Optionally, after operation 108 determines the candidate object whose filtering information satisfies the predetermined condition as the target object of the current frame image, the search region in the next frame image of the current frame image in the video may further be determined according to the filtering information of the target object in the current frame image. The flow of determining the search region in the next frame image of the current frame image in the video according to the filtering information of the target object in the current frame image will be described in detail below with reference to FIG. 2. As shown in FIG. 2, the method includes:
202, detecting whether the filtering information of the target object is less than a first preset threshold.
Optionally, the first preset threshold may be determined statistically from the filtering information of the target object in the states where the target object is occluded or leaves the field of view. In an optional example, the filtering information is the score of the detection box of the target object.
If the filtering information of the target object is less than the first preset threshold, operation 204 is performed; and/or, if the filtering information of the target object is greater than or equal to the first preset threshold, operation 206 is performed.
204, gradually expanding the search region by a preset step size until the expanded search region covers the current frame image, and taking the expanded search region as the search region in the next frame image of the current frame image.
Optionally, after operation 204, the next frame image of the current frame image in the video may further be taken as the current frame image, and the target object of the current frame image may be determined in the expanded search region.
206, taking the next frame image of the current frame image in the video as the current frame image, and obtaining the search region in the current frame image.
Optionally, after the next frame image of the current frame image in the video is taken as the current frame image and the search region in the current frame image is obtained, the target object of the current frame image may further be determined in the search region in the current frame image.
In an optional example, operations 202 to 206 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a search unit run by the processor.
In this embodiment, by comparing the filtering information of the target object in the current frame image with the first preset threshold, and expanding the search region when the filtering information of the target object in the current frame image is less than the first preset threshold until the expanded search region covers the current frame image, the expanded search region covering the whole current frame image can be used when the target object is occluded or leaves the field of view in the current frame image of object tracking; and when object tracking is performed on the next frame image, the expanded search region covers the whole next frame image. When the target object appears in the next frame image, since the expanded search region covers the whole next frame image, the situation that the target object appears outside the search region and thus cannot be tracked does not occur, so long-term tracking of the target object can be achieved.
In some embodiments, after operation 204 gradually expands the search region by the preset step size until the expanded search region covers the current frame image, the next frame image of the current frame image in the video may further be taken as the current frame image, the expanded search region may be obtained as the search region in the current frame image, the target object of the current frame image may be determined in the expanded search region, and whether the search region in the current frame image needs to be restored may further be determined according to the filtering information of the target object in the current frame image. The flow of determining, according to the filtering information of the target object in the current frame image, whether to restore the search region in the current frame image will be described in detail below with reference to FIG. 3. As shown in FIG. 3, the method includes:
302, detecting whether the filtering information of the target object is greater than a second preset threshold.
The second preset threshold is greater than the first preset threshold, and the second preset threshold may be determined statistically from the filtering information of the target object in the states where the target object is not occluded and has not left the field of view.
If the filtering information of the target object is greater than the second preset threshold, operation 304 is performed; and/or, if the filtering information of the target object is less than or equal to the second preset threshold, operation 306 is performed.
304, obtaining the search region in the current frame image.
Optionally, after operation 304, the target object of the current frame image is determined in the search region in the current frame image.
306, taking the next frame image of the current frame image in the video as the current frame image, and obtaining the expanded search region as the search region in the current frame image.
After the next frame image of the current frame image in the video is taken as the current frame image and the expanded search region is obtained as the search region in the current frame image, the target object of the current frame image may further be determined in the expanded search region.
In an optional example, operations 302 to 306 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a search unit run by the processor.
In this embodiment, when object tracking is performed on the next frame image after the search region has been expanded according to the filtering information of the target object in the current frame image, the next frame image is taken as the current frame image, and the filtering information of the target object in the current frame image is then compared with the second preset threshold. When the filtering information of the target object in the current frame image is greater than the second preset threshold, the search region in the current frame image is obtained, and the target object of the current frame image is determined in the search region. In this way, when the target object in the current frame image of object tracking is not occluded and has not left the field of view, the original object tracking method can be restored, i.e., the search region in the current frame image is obtained by the preset search algorithm for object tracking, which can reduce the amount of data processing and increase the computation speed.
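The two-threshold behavior of FIG. 2 and FIG. 3 might be condensed into a sketch like the following; the threshold values, the step size, and the (width, height) tuple representation of regions are assumptions of this sketch.

    def next_search_region(score, normal_region, frame_size, expanded,
                           t_low=0.3, t_high=0.6, step=32):
        """Choose the search region for the next frame from the target's score:
        below t_low the region is grown by the preset step until it covers the
        whole frame (target occluded / out of view); once the score exceeds
        t_high (> t_low), the normal region from the search algorithm is
        restored. Returns (region, expanded_flag)."""
        if expanded:
            if score > t_high:
                return normal_region, False  # restore the usual search region
            return frame_size, True          # stay fully expanded
        if score < t_low:
            w, h = normal_region
            while (w, h) != frame_size:      # grow stepwise until full coverage
                w = min(w + step, frame_size[0])
                h = min(h + step, frame_size[1])
            return (w, h), True
        return normal_region, False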
FIG. 4D and FIG. 4E are schematic diagrams of another application example of the object tracking method according to some embodiments of the present disclosure. As shown in FIG. 4D and FIG. 4E, FIG. 4D shows four frame images of a video on which object tracking is performed; in FIG. 4D, the sequence numbers of the four frame images are 692, 697, 722, and 727, respectively, box a is the search box determining the search region in the current frame image, box b is the box representing the true contour of the target object, and box c is the detection box of target tracking. It can be seen from FIG. 4D that the target object is out of the field of view in frames 697 and 722, so the search region is expanded; in frames 692 and 727 the target object returns to the field of view, so the search region is restored to the normal search region. FIG. 4E is a schematic diagram of the change in the score of the target object in FIG. 4D and the change in the overlap between the target object and the detection box, where one curve represents the change in the score of the target object and line e represents the overlap between the target object and the detection box. It can be seen from FIG. 4E that the score of the target object decreases rapidly at frame 697, and the overlap between the target object and the detection box also decreases rapidly at frame 697; at frame 722 the score of the target object has recovered to a large value, and the overlap between the target object and the detection box also increases rapidly at frame 722. Therefore, judging the score of the target object can alleviate the problems that object tracking encounters when the target object is out of the field of view or occluded.
In some embodiments, after operation 108 determines the candidate object whose filtering information satisfies the predetermined condition as the target object of the current frame image, the category of the target object in the current frame image may also be identified, which can enhance the functionality of object tracking and extend the application scenarios of object tracking.
In some embodiments, the object tracking method of each of the above embodiments may be performed by a neural network.
Optionally, before the object tracking method is performed, the neural network may be trained according to sample images. The sample images used for training the neural network may include positive samples and negative samples, where the positive samples include: positive sample images in a preset training data set and positive sample images in a preset test data set. For example, the preset training data set may use video sequences from Youtube BB and VID, and the preset test data set may use detection data from ImageNet and COCO. By training the neural network with positive sample images from the test data set, this embodiment can increase the categories of positive samples, ensure the generalization performance of the neural network, and thereby improve the discriminative ability of object tracking.
Optionally, in addition to the positive sample images in the preset training data set and in the preset test data set, the positive samples may further include: positive sample images obtained by performing data augmentation processing on the positive sample images in the preset test data set. For example, in addition to conventional data augmentation processing such as translation, scale change, and illumination change, data augmentation processing for specific motion patterns, such as motion blur, may also be used; this embodiment does not limit the method of data augmentation processing. By training the neural network with positive sample images obtained by data augmentation of the test data set, this embodiment can increase the diversity of positive sample images, improve the robustness of the neural network, and avoid overfitting.
Optionally, the negative samples may include: negative sample images of objects of the same category as the target object and/or negative sample images of objects of a different category from the target object. For example, a negative sample image obtained from a positive sample image in the preset test data set may be an image selected from the background around the target object in the positive sample image; such negative sample images are usually images without semantics. A negative sample image of an object of the same category as the target object may be a frame image randomly extracted from other videos or images, where the object in the image has the same category as the target object in the positive sample image; a negative sample image of an object of a different category from the target object may be a frame image randomly extracted from other videos or images, where the object in the image has a different category from the target object in the positive sample image; these two types of negative sample images are usually images with semantics. By training the neural network with negative sample images of objects of the same category as the target object and/or negative sample images of objects of a different category from the target object, this embodiment can ensure a balanced distribution of positive and negative sample images and improve the performance of the neural network, thereby improving the discriminative ability of object tracking.
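A sketch of assembling one training example from the sample mix described above; augment and crop_background are hypothetical stubs standing in for the augmentation and background-cropping steps, and the sampling ratios are assumptions.

    import random

    def augment(img):
        # hypothetical stub: translation / scale / illumination / motion blur
        return img

    def crop_background(img):
        # hypothetical stub: crop a background patch around the annotated target
        return img

    def sample_training_example(train_set, test_set, same_cat, diff_cat, p_neg=0.3):
        """Draw one (image, label) pair: positives from the training set, the
        test set, or augmented test images; negatives from target-surrounding
        background (no semantics) or from frames of other videos/images with
        same- or different-category objects (with semantics)."""
        if random.random() > p_neg:
            src = random.choice(["train", "test", "test_aug"])
            if src == "train":
                return random.choice(train_set), 1
            img = random.choice(test_set)
            return (augment(img) if src == "test_aug" else img), 1
        kind = random.choice(["background", "same_cat", "diff_cat"])
        if kind == "background":
            return crop_background(random.choice(test_set)), 0
        pool = same_cat if kind == "same_cat" else diff_cat
        return random.choice(pool), 0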
Any object tracking method provided by the embodiments of the present disclosure may be performed by any appropriate device with data processing capability, including but not limited to: a terminal device, a server, etc. Alternatively, any object tracking method provided by the embodiments of the present disclosure may be performed by a processor; for example, the processor performs any object tracking method mentioned in the embodiments of the present disclosure by invoking corresponding instructions stored in a memory. Details are not repeated below.
Those of ordinary skill in the art can understand that all or part of the steps for implementing the above method embodiments can be completed by hardware related to program instructions. The foregoing program can be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are performed; the foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
FIG. 5 is a schematic structural diagram of an object tracking apparatus according to some embodiments of the present disclosure. As shown in FIG. 5, the apparatus includes: a detection unit 510, an obtaining unit 520, an adjustment unit 530, and a determination unit 540, where:
the detection unit 510 is configured to detect at least one candidate object in a current frame image of a video according to a target object in a reference frame image of the video.
In this embodiment, the video on which object tracking is performed may be a video obtained from a video capture device (for example, the video capture device may include a video camera, a camera head, etc.), a video obtained from a storage device (for example, the storage device may include an optical disc, a hard disk, a USB flash drive, etc.), or a video obtained from a network server; this embodiment does not limit the manner of obtaining the video to be processed. The reference frame image may be the first frame image of the video, the first frame image on which object tracking processing is performed, or an intermediate frame image of the video; this embodiment does not limit the selection of the reference frame image. The current frame image may be a frame image in the video other than the reference frame image; it may be located before or after the reference frame image, which is not limited in this embodiment. In an optional example, the current frame image in the video is located after the reference frame image.
Optionally, the detection unit 510 may determine the correlation between the image of the target object in the reference frame image and the current frame image, and obtain detection boxes and filtering information of at least one candidate object in the current frame image according to the correlation. In an optional example, the detection unit 510 may determine the correlation between the image of the target object in the reference frame image and the current frame image according to a first feature of the target object in the reference frame image and a second feature of the current frame image, for example, by convolution processing. This embodiment does not limit the manner of determining the correlation. The detection boxes of the candidate objects may be obtained, for example, by non-maximum suppression (NMS). The filtering information of a candidate object is information related to the properties of the candidate object itself, according to which the candidate object can be distinguished from other candidate objects; it may be, for example, the score or selection probability of the detection box of the candidate object, where the score and selection probability of the detection box may be correlation coefficients of the candidate object obtained according to the correlation. This embodiment does not limit the manner of obtaining the detection boxes and the filtering information according to the correlation.
The obtaining unit 520 is configured to obtain interference objects in at least one previous frame image of the video.
In this embodiment, the previous frame image may include: the reference frame image, and/or, at least one intermediate frame image located between the reference frame image and the current frame image.
Optionally, the obtaining unit 520 may obtain the interference objects in at least one previous frame image of the video according to a preset interference object set. With the preset interference object set, when object tracking processing is performed on each frame image of the video, one or more candidate objects that are not determined as the target object among the at least one candidate object are determined as interference objects in the current frame image and placed into the interference object set. In an optional example, among the at least one candidate object not determined as the target object, the candidate objects whose filtering information satisfies a predetermined interference object condition may be determined as interference objects and placed into the interference object set. For example, when the filtering information is the score of the detection box, the predetermined interference object condition may be that the score of the detection box is greater than a preset threshold.
In an optional example, the obtaining unit 520 may obtain the interference objects in all previous frame images of the video.
The adjustment unit 530 is configured to adjust the filtering information of the at least one candidate object according to the obtained interference objects.
Optionally, the adjustment unit 530 may determine a first similarity between the at least one candidate object and the obtained interference objects, and adjust the filtering information of the at least one candidate object according to the first similarity. In an optional example, the adjustment unit 530 may determine the first similarity according to features of the at least one candidate object and features of the obtained interference objects. In an optional example, when the filtering information is the score of the detection box: when the first similarity between a candidate object and the obtained interference objects is high, the score of the detection box of the candidate object may be lowered; conversely, when the first similarity is low, the score of the detection box of the candidate object may be raised or kept unchanged.
Optionally, when more than one interference object is obtained, a weighted average of the similarities between the candidate object and all obtained interference objects may be computed, and this weighted average may be used to adjust the filtering information of the candidate object, where the weight of each interference object in the weighted average is related to the degree of interference that the interference object causes to the selection of the target object; for example, the greater the interference to the selection of the target object, the larger the weight value of the interference object. In an optional example, when the filtering information is the score of the detection box, the first similarity between a candidate object and an obtained interference object may be represented by their correlation coefficient, and the score of the detection box of the candidate object may be adjusted by the difference between the correlation coefficient between the target object in the reference frame image and the candidate object, and the weighted average of the first similarities between the candidate object and the obtained interference objects.
The determination unit 540 is configured to determine a candidate object whose filtering information satisfies a predetermined condition as the target object of the current frame image.
Optionally, the determination unit 540 may determine the detection box of the candidate object whose filtering information satisfies the predetermined condition as the detection box of the target object of the current frame image. In an optional example, when the filtering information is the score of the detection box, the candidate objects may be sorted according to the scores of their detection boxes, and the detection box of the candidate object with the highest score may be taken as the detection box of the target object of the current frame image, thereby determining the target object in the current frame image.
Optionally, the position and shape of the detection box of each candidate object may further be compared with the position and shape of the detection box of the target object in the frame image preceding the current frame image in the video; the scores of the detection boxes of the candidate objects in the current frame image are adjusted according to the comparison result, the adjusted scores are re-sorted, and the detection box of the candidate object with the highest score after re-sorting is taken as the detection box of the target object in the current frame image. For example, the scores of the detection boxes of candidate objects with large position displacement and large shape change compared with the previous frame image are lowered.
Optionally, the apparatus may further include a display unit; after the detection box of the candidate object whose filtering information satisfies the predetermined condition is determined as the detection box of the target object of the current frame image, the display unit may display the detection box of the target object in the current frame image, so as to mark the position of the target object in the current frame image.
Based on the object tracking apparatus provided by this embodiment, at least one candidate object in the current frame image of the video is detected according to the target object in the reference frame image of the video; interference objects in at least one previous frame image of the video are obtained; the filtering information of the at least one candidate object is adjusted according to the obtained interference objects; and a candidate object whose filtering information satisfies a predetermined condition is determined as the target object of the current frame image. During object tracking, the interference objects in previous frame images before the current frame image are used to adjust the filtering information of the candidate objects, so that when the filtering information of the candidate objects is used to determine the target object in the current frame image, the interference objects among the candidate objects can be effectively suppressed and the target can be obtained from the candidate objects. Therefore, in the process of determining the target object in the current frame image, the influence of interference objects around the target object on the discrimination result can be effectively suppressed, improving the discriminative ability of object tracking.
In some embodiments, the obtaining unit 520 may further obtain the target objects in at least one intermediate frame image between the reference frame image and the current frame image in the video, and the apparatus may further include an optimization unit configured to optimize the filtering information of the at least one candidate object according to the obtained target objects in the at least one intermediate frame image. In an optional example, the optimization unit may determine a second similarity between the at least one candidate object and the obtained target objects in the at least one intermediate frame image, and then optimize the filtering information of the at least one candidate object according to the second similarity. For example, the optimization unit may determine the second similarity according to features of the at least one candidate object and features of the obtained target objects in the at least one intermediate frame image.
Optionally, the obtaining unit 520 may obtain the target object from at least one intermediate frame image between the reference frame image and the current frame image in which the target object has already been determined. In an optional example, the obtaining unit 520 may obtain the target objects in all intermediate frame images between the reference frame image and the current frame image in which the target object has been determined.
Optionally, when more than one target object is obtained, a weighted average of the similarities between the candidate object and all obtained target objects may be computed, and this weighted average may be used to optimize the filtering information of the candidate object, where the weight of each target object in the weighted average is related to the degree of influence that the target object has on the selection of the target object in the current frame image; for example, the closer in time a frame image is to the current frame image, the larger the weight value of its target object. In an optional example, when the filtering information is the score of the detection box, the first similarity between a candidate object and an obtained interference object may be represented by their correlation coefficient, and the score of the detection box of the candidate object may be adjusted by the difference between the weighted average of the correlation coefficient between the target object in the reference frame image and the candidate object together with the second similarities between the candidate object and the obtained target objects, and the weighted average of the first similarities between the candidate object and the obtained interference objects.
FIG. 6 is a schematic structural diagram of an object tracking apparatus according to other embodiments of the present disclosure. As shown in FIG. 6, in addition to a detection unit 610, an obtaining unit 620, an adjustment unit 630, and a determination unit 640, compared with the embodiment shown in FIG. 5, the apparatus further includes a search unit 650. The search unit 650 is configured to obtain a search region in the current frame image, and the detection unit 610 is configured to detect, within the search region, at least one candidate object in the current frame image of the video according to the target object in the reference frame image of the video. The operation of obtaining the search region in the current frame image may estimate and hypothesize, through a predetermined search algorithm, the region in the current frame image where the target object may appear.
Optionally, the search unit 650 is further configured to determine the search region according to the filtering information of the target object in the current frame image.
In some embodiments, the search unit 650 is configured to detect whether the filtering information of the target object is less than a first preset threshold; if the filtering information of the target object is less than the first preset threshold, gradually expand the search region by a preset step size until the expanded search region covers the current frame image; and/or, if the filtering information of the target object is greater than or equal to the first preset threshold, take the next frame image of the current frame image in the video as the current frame image and obtain the search region in the current frame image.
In this embodiment, by comparing the filtering information of the target object in the current frame image with the first preset threshold, and expanding the search region when the filtering information of the target object in the current frame image is less than the first preset threshold until the expanded search region covers the current frame image, the expanded search region covering the whole current frame image can be used when the target object is occluded or leaves the field of view in the current frame image of object tracking; and when object tracking is performed on the next frame image, the expanded search region covers the whole next frame image. When the target object appears in the next frame image, since the expanded search region covers the whole next frame image, the situation that the target object appears outside the search region and thus cannot be tracked does not occur, so long-term tracking of the target object can be achieved.
In some embodiments, the search unit 650 is further configured to: after the target object of the current frame image is determined in the expanded search region, detect whether the filtering information of the target object is greater than a second preset threshold, where the second preset threshold is greater than the first preset threshold; if the filtering information of the target object is greater than the second preset threshold, obtain the search region in the current frame image; and/or, if the filtering information of the target object is less than or equal to the second preset threshold, take the next frame image of the current frame image in the video as the current frame image and obtain the expanded search region as the search region in the current frame image.
In this embodiment, when object tracking is performed on the next frame image after the search region has been expanded according to the filtering information of the target object in the current frame image, the next frame image is taken as the current frame image, and the filtering information of the target object in the current frame image is then compared with the second preset threshold. When the filtering information of the target object in the current frame image is greater than the second preset threshold, the search region in the current frame image is obtained, and the target object of the current frame image is determined in the search region. In this way, when the target object in the current frame image of object tracking is not occluded and has not left the field of view, the original object tracking method can be restored, i.e., the search region in the current frame image is obtained by the preset search algorithm for object tracking, which can reduce the amount of data processing and increase the computation speed.
In some embodiments, the object tracking apparatus further includes a recognition unit; after the candidate object whose filtering information satisfies the predetermined condition is determined as the target object of the current frame image, the recognition unit may identify the category of the target object in the current frame image, which can enhance the functionality of object tracking and extend the application scenarios of object tracking.
In some embodiments, the object tracking apparatus includes a neural network, and the object tracking method is performed by the neural network.
Optionally, before the object tracking method is performed, the neural network may be trained according to sample images. The sample images used for training the neural network may include positive samples and negative samples, where the positive samples include: positive sample images in a preset training data set and positive sample images in a preset test data set. For example, the preset training data set may use video sequences from Youtube BB and VID, and the preset test data set may use detection data from ImageNet and COCO. By training the neural network with positive sample images from the test data set, this embodiment can increase the categories of positive samples, ensure the generalization performance of the neural network, and thereby improve the discriminative ability of object tracking.
Optionally, in addition to the positive sample images in the preset training data set and in the preset test data set, the positive samples may further include: positive sample images obtained by performing data augmentation processing on the positive sample images in the preset test data set. For example, in addition to conventional data augmentation processing such as translation, scale change, and illumination change, data augmentation processing for specific motion patterns, such as motion blur, may also be used; this embodiment does not limit the method of data augmentation processing. By training the neural network with positive sample images obtained by data augmentation of the test data set, this embodiment can increase the diversity of positive sample images, improve the robustness of the neural network, and avoid overfitting.
Optionally, the negative samples may include: negative sample images of objects of the same category as the target object and/or negative sample images of objects of a different category from the target object. For example, a negative sample image obtained from a positive sample image in the preset test data set may be an image selected from the background around the target object in the positive sample image; such negative sample images are usually images without semantics. A negative sample image of an object of the same category as the target object may be a frame image randomly extracted from other videos or images, where the object in the image has the same category as the target object in the positive sample image; a negative sample image of an object of a different category from the target object may be a frame image randomly extracted from other videos or images, where the object in the image has a different category from the target object in the positive sample image; these two types of negative sample images are usually images with semantics. By training the neural network with negative sample images of objects of the same category as the target object and/or negative sample images of objects of a different category from the target object, this embodiment can ensure a balanced distribution of positive and negative sample images and improve the performance of the neural network, thereby improving the discriminative ability of object tracking.
In an optional example, since the "annotation data" of training data obtained by other methods is relatively sparse, i.e., there are relatively few valid pixel values in the depth map, the depth map obtained by stereo matching of binocular images is used as the "annotation data" of the training data.
In addition, an embodiment of the present disclosure further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. Referring now to FIG. 7, which illustrates a schematic structural diagram of an electronic device 700 suitable for implementing a terminal device or server according to an embodiment of the present disclosure: as shown in FIG. 7, the electronic device 700 includes one or more processors, a communication unit, etc. The one or more processors are, for example, one or more central processing units (CPUs) 701, and/or one or more graphics processing units (GPUs) 713, etc. The processor may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 702 or executable instructions loaded from a storage section 708 into a random access memory (RAM) 703. The communication unit 712 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (Infiniband) network card. The processor may communicate with the read-only memory 702 and/or the random access memory 703 to execute the executable instructions, is connected to the communication unit 712 through a bus 704, and communicates with other target devices via the communication unit 712, thereby completing operations corresponding to any of the methods provided in the embodiments of the present disclosure, for example: detecting at least one candidate object in a current frame image of the video according to a target object in a reference frame image of the video; obtaining interference objects in at least one previous frame image of the video; adjusting filtering information of the at least one candidate object according to the obtained interference objects; and determining a candidate object whose filtering information satisfies a predetermined condition as the target object of the current frame image.
In addition, the RAM 703 may also store various programs and data required for the operation of the apparatus. The CPU 701, the ROM 702, and the RAM 703 are connected to each other through the bus 704. In the case where the RAM 703 is present, the ROM 702 is an optional module. The RAM 703 stores executable instructions, or writes executable instructions into the ROM 702 at runtime, and the executable instructions cause the central processing unit 701 to perform the operations corresponding to the above object tracking method. An input/output (I/O) interface 705 is also connected to the bus 704. The communication unit 712 may be provided in an integrated manner, or may be provided as a plurality of sub-modules (for example, a plurality of IB network cards) connected on the bus link.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk, a semiconductor memory, or the like, is installed on the drive 710 as needed, so that a computer program read therefrom is installed into the storage section 708 as needed.
It should be noted that the architecture shown in FIG. 7 is only an optional implementation; in practice, the number and types of the components in FIG. 7 may be selected, reduced, increased, or replaced according to actual needs; different functional components may also be provided separately or in an integrated manner, for example, the GPU 713 and the CPU 701 may be provided separately, or the GPU 713 may be integrated on the CPU 701, and the communication unit may be provided separately or integrated on the CPU 701 or the GPU 713, and so on. These alternative embodiments all fall within the protection scope of the present disclosure.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for performing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided in the embodiments of the present disclosure, for example: detecting at least one candidate object in a current frame image of the video according to a target object in a reference frame image of the video; obtaining interference objects in at least one previous frame image of the video; adjusting filtering information of the at least one candidate object according to the obtained interference objects; and determining a candidate object whose filtering information satisfies a predetermined condition as the target object of the current frame image. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above functions defined in the method of the present disclosure are executed.
In one or more optional implementations, an embodiment of the present disclosure further provides a computer program product for storing computer-readable instructions that, when executed, cause a computer to execute the object tracking method in any of the above possible implementations.
The computer program product may be implemented by hardware, software, or a combination thereof. In one optional example, the computer program product is embodied as a computer storage medium; in another optional example, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
In one or more optional implementations, an embodiment of the present disclosure further provides an object tracking method and a corresponding apparatus, electronic device, computer storage medium, computer program, and computer program product, where the method includes: a first device sends an object tracking instruction to a second device, the instruction causing the second device to execute the object tracking method in any of the above possible embodiments; and the first device receives a result of the object tracking sent by the second device.
In some embodiments, the object tracking instruction may be a calling instruction, and the first device may instruct the second device to perform object tracking by means of calling; accordingly, in response to receiving the calling instruction, the second device may execute the steps and/or processes in any embodiment of the above object tracking method.
It should be understood that terms such as "first" and "second" in the embodiments of the present disclosure are only for distinction and should not be construed as limiting the embodiments of the present disclosure.
It should also be understood that, in the present disclosure, "a plurality of" may refer to two or more, and "at least one" may refer to one, two, or more.
It should also be understood that any component, data, or structure mentioned in the present disclosure can generally be understood as one or more, unless explicitly defined otherwise or the context suggests otherwise.
It should also be understood that the description of the embodiments of the present disclosure emphasizes the differences between the embodiments; for their identical or similar aspects, reference may be made to one another, and for brevity they are not described repeatedly.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts among the embodiments, reference may be made to one another. Since the system embodiments basically correspond to the method embodiments, their description is relatively simple, and for relevant parts, reference may be made to the description of the method embodiments.
The methods and apparatuses of the present disclosure may be implemented in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order described above unless otherwise specified. In addition, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers recording media storing programs for executing the methods according to the present disclosure.
The description of the present disclosure is given for the sake of example and description and is not exhaustive or limiting of the present disclosure to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical applications of the present disclosure, and to enable those of ordinary skill in the art to understand the present disclosure so as to design various embodiments with various modifications suited to particular uses.

Claims (44)

  1. An object tracking method, characterized by comprising:
    detecting at least one candidate object in a current frame image of a video according to a target object in a reference frame image of the video;
    obtaining interference objects in at least one previous frame image of the video;
    adjusting filtering information of the at least one candidate object according to the obtained interference objects; and
    determining a candidate object whose filtering information satisfies a predetermined condition as the target object in the current frame image.
  2. The method according to claim 1, characterized in that the current frame image in the video is located after the reference frame image; and
    the previous frame image comprises: the reference frame image, and/or, at least one intermediate frame image located between the reference frame image and the current frame image.
  3. The method according to claim 1 or 2, characterized by further comprising:
    determining one or more candidate objects, among the at least one candidate object, that are not determined as the target object, as interference objects in the current frame image.
  4. The method according to any one of claims 1 to 3, characterized in that the adjusting the filtering information of the at least one candidate object according to the obtained interference objects comprises:
    determining a first similarity between the at least one candidate object and the obtained interference objects; and
    adjusting the filtering information of the at least one candidate object according to the first similarity.
  5. The method according to claim 4, characterized in that the determining the first similarity between the at least one candidate object and the obtained interference objects comprises:
    determining the first similarity according to features of the at least one candidate object and features of the obtained interference objects.
  6. The method according to any one of claims 1 to 5, characterized by further comprising:
    obtaining target objects in at least one intermediate frame image between the reference frame image and the current frame image in the video; and
    optimizing the filtering information of the at least one candidate object according to the target objects in the at least one intermediate frame image.
  7. The method according to claim 6, characterized in that the optimizing the filtering information of the at least one candidate object according to the target objects in the at least one intermediate frame image comprises:
    determining a second similarity between the at least one candidate object and the target objects in the at least one intermediate frame image; and
    optimizing the filtering information of the at least one candidate object according to the second similarity.
  8. The method according to claim 7, characterized in that the determining the second similarity between the at least one candidate object and the target objects in the at least one intermediate frame image comprises:
    determining the second similarity according to features of the at least one candidate object and features of the target objects in the at least one intermediate frame image.
  9. The method according to any one of claims 1 to 8, characterized in that the detecting at least one candidate object in the current frame image of the video according to the target object in the reference frame image of the video comprises:
    determining a correlation between an image of the target object in the reference frame image and the current frame image; and
    obtaining detection boxes of at least one candidate object in the current frame image and the filtering information according to the correlation.
  10. The method according to claim 9, characterized in that the determining the correlation between the image of the target object in the reference frame image and the current frame image comprises:
    determining the correlation according to a first feature of the image of the target object in the reference frame image and a second feature of the current frame image.
  11. The method according to claim 9 or 10, characterized in that the determining the candidate object whose filtering information satisfies the predetermined condition as the target object in the current frame image comprises:
    determining a detection box of the candidate object whose filtering information satisfies the predetermined condition as a detection box of the target object of the current frame image.
  12. The method according to claim 11, characterized in that after the determining the detection box of the candidate object whose filtering information satisfies the predetermined condition as the detection box of the target object of the current frame image, the method further comprises:
    displaying the detection box of the target object in the current frame image.
  13. The method according to any one of claims 1 to 12, characterized in that before the detecting at least one candidate object in the current frame image of the video according to the target object in the reference frame image of the video, the method further comprises:
    obtaining a search region in the current frame image;
    wherein the detecting at least one candidate object in the current frame image of the video according to the target object in the reference frame image of the video comprises:
    detecting, within the search region in the current frame image, at least one candidate object in the current frame image of the video according to the target object in the reference frame image of the video.
  14. The method according to any one of claims 1 to 13, characterized in that after the determining the candidate object whose filtering information satisfies the predetermined condition as the target object of the current frame image, the method further comprises:
    determining a search region in a next frame image of the current frame image in the video according to the filtering information of the target object in the current frame image.
  15. The method according to claim 14, characterized in that the determining the search region in the next frame image of the current frame image in the video according to the filtering information of the target object in the current frame image comprises:
    detecting whether the filtering information of the target object is less than a first preset threshold;
    if the filtering information of the target object is less than the first preset threshold, gradually expanding the search region by a preset step size until the expanded search region covers the current frame image, and taking the expanded search region as the search region in the next frame image of the current frame image; and/or,
    if the filtering information of the target object is greater than or equal to the first preset threshold, taking the next frame image of the current frame image in the video as the current frame image, and obtaining a search region in the current frame image.
  16. The method according to claim 15, characterized in that after the gradually expanding the search region by the preset step size until the expanded search region covers the current frame image, the method further comprises:
    taking the next frame image of the current frame image in the video as the current frame image;
    determining the target object of the current frame image in the expanded search region;
    detecting whether the filtering information of the target object is greater than a second preset threshold, wherein the second preset threshold is greater than the first preset threshold;
    if the filtering information of the target object is greater than the second preset threshold, obtaining a search region in the current frame image; and/or,
    if the filtering information of the target object is less than or equal to the second preset threshold, taking the next frame image of the current frame image in the video as the current frame image, and obtaining the expanded search region as the search region in the current frame image.
  17. The method according to any one of claims 1 to 16, characterized in that after the determining the candidate object whose filtering information satisfies the predetermined condition as the target object of the current frame image, the method further comprises:
    identifying a category of the target object in the current frame image.
  18. The method according to any one of claims 1 to 17, characterized in that the object tracking method is performed by a neural network, the neural network is obtained by training according to sample images, the sample images comprise positive samples and negative samples, and the positive samples comprise: positive sample images in a preset training data set and positive sample images in a preset test data set.
  19. The method according to claim 18, characterized in that the positive samples further comprise: positive sample images obtained by performing data augmentation processing on the positive sample images in the preset test data set.
  20. The method according to claim 18 or 19, characterized in that the negative samples comprise: negative sample images of objects of the same category as the target object, and/or, negative sample images of objects of a different category from the target object.
  21. An object tracking apparatus, characterized by comprising:
    a detection unit configured to detect at least one candidate object in a current frame image of a video according to a target object in a reference frame image of the video;
    an obtaining unit configured to obtain interference objects in at least one previous frame image of the video;
    an adjustment unit configured to adjust filtering information of the at least one candidate object according to the obtained interference objects; and
    a determination unit configured to determine a candidate object whose filtering information satisfies a predetermined condition as the target object of the current frame image.
  22. The apparatus according to claim 21, characterized in that the current frame image in the video is located after the reference frame image; and
    the previous frame image comprises: the reference frame image, and/or, at least one intermediate frame image located between the reference frame image and the current frame image.
  23. The apparatus according to claim 21 or 22, characterized in that the determination unit is further configured to determine one or more candidate objects, among the at least one candidate object, that are not determined as the target object, as interference objects in the current frame image.
  24. The apparatus according to any one of claims 21 to 23, characterized in that the adjustment unit is configured to determine a first similarity between the at least one candidate object and the obtained interference objects, and to adjust the filtering information of the at least one candidate object according to the first similarity.
  25. The apparatus according to claim 24, characterized in that the adjustment unit is configured to determine the first similarity according to features of the at least one candidate object and features of the obtained interference objects.
  26. The apparatus according to any one of claims 21 to 25, characterized in that the obtaining unit is further configured to obtain target objects in at least one intermediate frame image between the reference frame image and the current frame image in the video; and
    the apparatus further comprises:
    an optimization unit configured to optimize the filtering information of the at least one candidate object according to the target objects in the at least one intermediate frame image.
  27. The apparatus according to claim 26, characterized in that the optimization unit is configured to determine a second similarity between the at least one candidate object and the target objects in the at least one intermediate frame image, and to optimize the filtering information of the at least one candidate object according to the second similarity.
  28. The apparatus according to claim 27, characterized in that the optimization unit is configured to determine the second similarity according to features of the at least one candidate object and features of the target objects in the at least one intermediate frame image.
  29. The apparatus according to any one of claims 21 to 28, characterized in that the detection unit is configured to determine a correlation between an image of the target object in the reference frame image and the current frame image, and to obtain detection boxes of at least one candidate object in the current frame image and the filtering information according to the correlation.
  30. The apparatus according to claim 29, characterized in that the detection unit is configured to determine the correlation according to a first feature of the image of the target object in the reference frame image and a second feature of the current frame image.
  31. The apparatus according to claim 29 or 30, characterized in that the determination unit is configured to determine a detection box of the candidate object whose filtering information satisfies the predetermined condition as a detection box of the target object of the current frame image.
  32. The apparatus according to claim 31, characterized by further comprising:
    a display unit configured to display the detection box of the target object in the current frame image.
  33. The apparatus according to any one of claims 21 to 32, characterized by further comprising:
    a search unit configured to obtain a search region in the current frame image;
    wherein the detection unit is configured to detect, within the search region in the current frame image, at least one candidate object in the current frame image of the video according to the target object in the reference frame image of the video.
  34. The apparatus according to any one of claims 21 to 33, characterized in that the search unit is further configured to determine a search region in a next frame image of the current frame image in the video according to the filtering information of the target object in the current frame image.
  35. The apparatus according to any one of claims 21 to 34, characterized in that the search unit is configured to detect whether the filtering information of the target object is less than a first preset threshold; if the filtering information of the target object is less than the first preset threshold, gradually expand the search region by a preset step size until the expanded search region covers the current frame image, and take the expanded search region as the search region in the next frame image of the current frame image; and/or, if the filtering information of the target object is greater than or equal to the first preset threshold, take the next frame image of the current frame image in the video as the current frame image, and obtain a search region in the current frame image.
  36. The apparatus according to claim 35, characterized in that the search unit is further configured to: after the target object of the current frame image is determined in the expanded search region, detect whether the filtering information of the target object is greater than a second preset threshold, wherein the second preset threshold is greater than the first preset threshold; if the filtering information of the target object is greater than the second preset threshold, obtain a search region in the current frame image; and/or, if the filtering information of the target object is less than or equal to the second preset threshold, take the next frame image of the current frame image in the video as the current frame image, and obtain the expanded search region as the search region in the current frame image.
  37. The apparatus according to any one of claims 21 to 36, characterized by further comprising:
    a recognition unit configured to identify a category of the target object in the current frame image.
  38. The apparatus according to any one of claims 21 to 37, characterized by comprising a neural network, the object tracking method being performed by the neural network, wherein the neural network is obtained by training according to sample images, the sample images comprise positive samples and negative samples, and the positive samples comprise: positive sample images in a preset training data set and positive sample images in a preset test data set.
  39. The apparatus according to claim 38, characterized in that the positive samples further comprise: positive sample images obtained by performing data augmentation processing on the positive sample images in the preset test data set.
  40. The apparatus according to claim 38 or 39, characterized in that the negative samples comprise: negative sample images of objects of the same category as the target object, and/or, negative sample images of objects of a different category from the target object.
  41. An electronic device, characterized by comprising the apparatus according to any one of claims 21 to 40.
  42. An electronic device, characterized by comprising:
    a memory configured to store executable instructions; and
    a processor configured to execute the executable instructions so as to complete the method according to any one of claims 1 to 20.
  43. A computer storage medium for storing computer-readable instructions, characterized in that the instructions, when executed, implement the method according to any one of claims 1 to 20.
  44. A computer program, comprising computer-readable code, characterized in that when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the method according to any one of claims 1 to 20.
PCT/CN2019/099001 2018-08-07 2019-08-02 Object tracking method and apparatus, electronic device, and storage medium WO2020029874A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2020567591A JP7093427B2 (ja) 2018-08-07 2019-08-02 Object tracking method and apparatus, electronic device, and storage medium
SG11202011644XA SG11202011644XA (en) 2018-08-07 2019-08-02 Object tracking methods and apparatuses, electronic devices and storage media
KR1020207037347A KR20210012012A (ko) 2018-08-07 2019-08-02 물체 추적 방법들 및 장치들, 전자 디바이스들 및 저장 매체
US17/102,579 US20210124928A1 (en) 2018-08-07 2020-11-24 Object tracking methods and apparatuses, electronic devices and storage media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810893022.3 2018-08-07
CN201810893022.3A CN109284673B (zh) 2018-08-07 2018-08-07 Object tracking method and apparatus, electronic device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/102,579 Continuation US20210124928A1 (en) 2018-08-07 2020-11-24 Object tracking methods and apparatuses, electronic devices and storage media

Publications (1)

Publication Number Publication Date
WO2020029874A1 true WO2020029874A1 (zh) 2020-02-13

Family

ID=65182985

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/099001 WO2020029874A1 (zh) 2018-08-07 2019-08-02 Object tracking method and apparatus, electronic device, and storage medium

Country Status (6)

Country Link
US (1) US20210124928A1 (zh)
JP (1) JP7093427B2 (zh)
KR (1) KR20210012012A (zh)
CN (1) CN109284673B (zh)
SG (1) SG11202011644XA (zh)
WO (1) WO2020029874A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085769A (zh) * 2020-09-09 2020-12-15 武汉融氢科技有限公司 Object tracking method and apparatus, and electronic device

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284673B (zh) * 2018-08-07 2022-02-22 北京市商汤科技开发有限公司 Object tracking method and apparatus, electronic device, and storage medium
CN109726683B (zh) 2018-12-29 2021-06-22 北京市商汤科技开发有限公司 Target object detection method and apparatus, electronic device, and storage medium
CN110223325B (zh) * 2019-06-18 2021-04-27 北京字节跳动网络技术有限公司 Object tracking method, apparatus, and device
CN111797728B (zh) * 2020-06-19 2024-06-14 浙江大华技术股份有限公司 Moving object detection method and apparatus, computing device, and storage medium
CN112037255A (zh) * 2020-08-12 2020-12-04 深圳市道通智能航空技术有限公司 Target tracking method and apparatus
CN115393616A (zh) * 2022-07-11 2022-11-25 影石创新科技股份有限公司 Target tracking method, apparatus, device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130272548A1 (en) * 2012-04-13 2013-10-17 Qualcomm Incorporated Object recognition using multi-modal matching scheme
CN103593641A (zh) * 2012-08-16 2014-02-19 株式会社理光 Object detection method and apparatus based on a stereo camera
CN105654510A (zh) * 2015-12-29 2016-06-08 江苏精湛光电仪器股份有限公司 Adaptive target tracking method based on feature fusion for night scenes
CN107748873A (zh) * 2017-10-31 2018-03-02 河北工业大学 Multi-peak target tracking method fusing background information
CN109284673A (zh) * 2018-08-07 2019-01-29 北京市商汤科技开发有限公司 Object tracking method and apparatus, electronic device, and storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10222678A (ja) * 1997-02-05 1998-08-21 Toshiba Corp Object detection apparatus and object detection method
JP2002342762A (ja) 2001-05-22 2002-11-29 Matsushita Electric Ind Co Ltd Object tracking method
JP4337727B2 (ja) 2004-12-14 2009-09-30 パナソニック電工株式会社 Human body detection apparatus
JP4515332B2 (ja) 2005-05-30 2010-07-28 オリンパス株式会社 Image processing apparatus and target region tracking program
JP5024116B2 (ja) * 2007-05-02 2012-09-12 株式会社ニコン Subject tracking program and subject tracking apparatus
KR101607224B1 (ko) * 2008-03-03 2016-03-29 아비길론 페이턴트 홀딩 2 코포레이션 Dynamic object classification method and apparatus
CN102136147B (zh) * 2011-03-22 2012-08-22 深圳英飞拓科技股份有限公司 Target detection and tracking method, system, and video surveillance device
JP2013012940A (ja) 2011-06-29 2013-01-17 Olympus Imaging Corp Tracking apparatus and tracking method
CN106355188B (zh) * 2015-07-13 2020-01-21 阿里巴巴集团控股有限公司 Image detection method and apparatus
CN105760854B (zh) * 2016-03-11 2019-07-26 联想(北京)有限公司 Information processing method and electronic device
US10395385B2 (en) * 2017-06-27 2019-08-27 Qualcomm Incorporated Using object re-identification in video surveillance
CN107633220A (zh) * 2017-09-13 2018-01-26 吉林大学 Method for recognizing targets ahead of a vehicle based on a convolutional neural network
CN108009494A (zh) * 2017-11-30 2018-05-08 中山大学 UAV-based vehicle tracking method for road intersections

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130272548A1 (en) * 2012-04-13 2013-10-17 Qualcomm Incorporated Object recognition using multi-modal matching scheme
CN103593641A (zh) * 2012-08-16 2014-02-19 株式会社理光 Object detection method and apparatus based on a stereo camera
CN105654510A (zh) * 2015-12-29 2016-06-08 江苏精湛光电仪器股份有限公司 Adaptive target tracking method based on feature fusion for night scenes
CN107748873A (zh) * 2017-10-31 2018-03-02 河北工业大学 Multi-peak target tracking method fusing background information
CN109284673A (zh) * 2018-08-07 2019-01-29 北京市商汤科技开发有限公司 Object tracking method and apparatus, electronic device, and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085769A (zh) * 2020-09-09 2020-12-15 武汉融氢科技有限公司 Object tracking method and apparatus, and electronic device

Also Published As

Publication number Publication date
CN109284673A (zh) 2019-01-29
US20210124928A1 (en) 2021-04-29
CN109284673B (zh) 2022-02-22
JP7093427B2 (ja) 2022-06-29
KR20210012012A (ko) 2021-02-02
SG11202011644XA (en) 2020-12-30
JP2021526269A (ja) 2021-09-30

Similar Documents

Publication Publication Date Title
WO2020029874A1 (zh) Object tracking method and apparatus, electronic device, and storage medium
KR102641115B1 (ko) Image processing method and apparatus for object detection
US10198823B1 (en) Segmentation of object image data from background image data
US11182592B2 (en) Target object recognition method and apparatus, storage medium, and electronic device
US8224042B2 (en) Automatic face recognition
WO2018166438A1 (zh) Image processing method and apparatus, and electronic device
Li et al. Finding the secret of image saliency in the frequency domain
US7925081B2 (en) Systems and methods for human body pose estimation
US11430124B2 (en) Visual object instance segmentation using foreground-specialized model imitation
JP5591360B2 (ja) Method and apparatus for classification and object detection, imaging apparatus, and image processing apparatus
US8948522B2 (en) Adaptive threshold for object detection
CN108229673B (zh) Processing method and apparatus for a convolutional neural network, and electronic device
US8396303B2 (en) Method, apparatus and computer program product for providing pattern detection with unknown noise levels
US7643674B2 (en) Classification methods, classifier determination methods, classifiers, classifier determination devices, and articles of manufacture
EP4024270A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
US11138464B2 (en) Image processing device, image processing method, and image processing program
KR20180074556A (ko) Face detection method and apparatus
Hao et al. Low-light image enhancement based on retinex and saliency theories
CN110909685A (zh) Pose estimation method, apparatus, device, and storage medium
CN111931544B (zh) Liveness detection method and apparatus, computing device, and computer storage medium
KR20220127188A (ko) Object detection apparatus with a customized object detection model
CN113487562A (zh) Skin glossiness detection system based on a handheld device
Robinson et al. Foreground segmentation in atmospheric turbulence degraded video sequences to aid in background stabilization
Elahi et al. Webcam-based accurate eye-central localization
CN112070022A (zh) Face image recognition method and apparatus, electronic device, and computer-readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19848050

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020567591

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20207037347

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19848050

Country of ref document: EP

Kind code of ref document: A1