WO2021227351A1 - Target part tracking method, apparatus, electronic device and readable storage medium - Google Patents

Target part tracking method, apparatus, electronic device and readable storage medium Download PDF

Info

Publication number
WO2021227351A1
Authority
WO
WIPO (PCT)
Prior art keywords
detection area
target part
current detection
frame
probability
Prior art date
Application number
PCT/CN2020/120965
Other languages
English (en)
French (fr)
Inventor
岳海潇
冯浩城
王珂尧
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司 filed Critical 北京百度网讯科技有限公司
Priority to JP2022554423A priority Critical patent/JP2023516480A/ja
Priority to EP20935079.2A priority patent/EP4152258A4/en
Priority to US17/925,527 priority patent/US20230196587A1/en
Priority to KR1020227043801A priority patent/KR20230003346A/ko
Publication of WO2021227351A1 publication Critical patent/WO2021227351A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Definitions

  • The embodiments of the present disclosure mainly relate to the field of artificial intelligence, specifically computer vision, and more specifically, to a target part tracking method, apparatus, electronic device, and computer-readable storage medium.
  • A face recognition system implements face recognition and comparison tasks through technologies such as face detection, face tracking, face alignment, face liveness detection, and face recognition, and is widely used in fields such as video surveillance, building access control, face gates, and financial verification. Face tracking technology refers to a technology that determines the facial motion trajectory and size changes of an object in a video or a sequence of frames. As a method for accurately and quickly obtaining face position coordinates, this technology is one of the important components of a face recognition system.
  • Traditional face tracking technology can only obtain the face frame coordinates of the current frame, output the face frame coordinates after the face is successfully tracked, and provide them to a subsequent face alignment model to determine key points. If the face is blocked by obstacles or moves beyond the image collection range, existing face tracking technology cannot accurately determine whether face tracking has failed, causing the face recognition function to fail.
  • a target part tracking solution is provided.
  • A method for tracking a target part may include determining, based on the previous detection area of the target part of an object in a previous frame of a video, a current detection area for detecting the target part in the current frame of the video. The method further includes determining the probability that the target part is located within the current detection area. In addition, the method may further include, in response to the probability being greater than or equal to a predetermined threshold, determining, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video.
  • A target part tracking device is provided, including: a current detection area determination module configured to determine, based on the previous detection area of the target part of an object in the previous frame of a video, a current detection area for detecting the target part in the current frame of the video; a probability determination module configured to determine the probability that the target part is located within the current detection area; and a subsequent detection area determination module configured to, in response to the probability being greater than or equal to a predetermined threshold, determine, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video.
  • An electronic device is provided, including one or more processors, and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect of the present disclosure.
  • a computer-readable storage medium having a computer program stored thereon, and when the program is executed by a processor, the method according to the first aspect of the present disclosure is implemented.
  • A target part tracking system is provided, including: a video capture module configured to provide a video associated with the target part of an object; a calculation module communicatively connected with the video capture module, the calculation module being configured to implement the method according to the first aspect of the present disclosure; and an output display module configured to display the processing result of the calculation module.
  • FIG. 1 shows a schematic diagram of an example environment in which multiple embodiments of the present disclosure can be implemented
  • FIG. 2 shows a schematic diagram of a detailed example environment in which multiple embodiments of the present disclosure can be implemented
  • FIG. 3 shows a flowchart of a process of tracking a target part according to an embodiment of the present disclosure
  • FIG. 4 shows a block diagram of a system related to target part tracking according to an embodiment of the present disclosure
  • FIG. 5 shows a block diagram of a device for tracking a target part according to an embodiment of the present disclosure.
  • Figure 6 shows a block diagram of a computing device capable of implementing various embodiments of the present disclosure.
  • In view of the interference situations described above, face tracking technology generally has the following three optimization schemes:
  • (1) Model-based face tracking scheme. This scheme mainly relies on a skin color model, a texture model, or the like: prior knowledge of the face is obtained, a parametric model is established, and a sliding window is applied to each frame of the image for model matching, so as to realize face tracking. However, this scheme has low tracking accuracy for faces of different scales and for partially occluded faces, and it cannot be determined during tracking whether tracking has failed.
  • (2) Face tracking scheme based on motion information. This scheme performs face motion estimation based on methods such as optical flow analysis. However, it has low tracking accuracy for faces whose scale changes across consecutive frames, and its tracking performance for fast-moving faces is poor.
  • (3) Face tracking scheme based on neural networks. This scheme uses a neural network to implicitly learn facial features and performs feature matching on the image by means such as sliding windows, so as to realize face tracking. This scheme expresses facial features better than scheme (1), but its computational cost is huge, and real-time performance is difficult to guarantee on the embedded side.
  • a target part tracking solution is proposed.
  • the motion prediction function for the target part can be added on the basis of the target part detection. After predicting the detection area where the target part is located in the current frame based on the previous frame, while determining the key points of the target part based on the detection area, it is determined whether the target part is located in the detection area. If it is determined that the target part is still located in the detection area, it indicates that the motion prediction function is normal, and the detection area of the target part in subsequent frames can be continuously predicted, so that there is no need to use a complicated target part detection model that requires a large amount of computing power.
  • If it is determined that the target part is not located within the detection area, it indicates that the result of the motion prediction does not match the actual situation; in this case, the target part detection model can be directly called to correct the prediction result. In this way, even if the target part of the monitored object is blocked or the monitored object exhibits irregular motion, the detection area of subsequent frames can be determined at low cost and with high accuracy.
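  • As an illustration only, the overall predict-verify-correct loop described above could be organized as in the following minimal sketch; the helper functions `detect_target_part`, `predict_next_area`, and `keypoints_and_probability` are hypothetical placeholders standing in for the target part detection model, the position prediction model, and the combined key point/probability model, and the 0.5 threshold is only an example value.

```python
# Hypothetical sketch of the predict -> verify -> correct tracking loop described above.
PROB_THRESHOLD = 0.5  # example threshold for "the target part is inside the area"

def track(frames, detect_target_part, predict_next_area, keypoints_and_probability):
    """Yield (detection_area, keypoints) for every frame of a video."""
    # Bootstrap: run the (expensive) detection model once on the first frame.
    area = detect_target_part(frames[0])
    for frame in frames[1:]:
        # Cheap path: predict the current detection area from the previous one.
        area = predict_next_area(area)
        # The CNN returns key points and the probability that the target part
        # is still inside the predicted area.
        keypoints, prob = keypoints_and_probability(frame, area)
        if prob < PROB_THRESHOLD:
            # Prediction disagrees with reality (occlusion, irregular motion):
            # fall back to the full detection model to correct the area.
            area = detect_target_part(frame)
            keypoints, prob = keypoints_and_probability(frame, area)
        yield area, keypoints
```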
  • Figure 1 shows a schematic diagram of an example environment 100 in which multiple embodiments of the present disclosure can be implemented.
  • the example environment 100 includes a frame 110 in a surveillance video, a computing device 120, and a determined detection area 130.
  • the frame 110 may be one or more frames in the real-time surveillance video acquired by the image acquisition device connected to the computing device 120.
  • the image acquisition device may be set up in a public place with a large flow of people (for example, video surveillance, face gates, etc.), so as to acquire the image information of each person in the crowd passing by the place.
  • the image acquisition device may be installed in a private place with a small number of people (for example, building access control, financial verification, etc.).
  • The object for which image information is acquired may not be limited to people, but may also include animals that need to be identified in batches (for example, animals in a zoo or breeding ground) and stationary objects (for example, goods on a conveyor belt).
  • the computing device 120 may receive the frame 110 to determine the detection area 130 of the target part of the monitored object, such as the face.
  • the detection area described herein is an area used to detect a target part, for example, it can be calibrated by a detection frame or other appropriate tools, or it can only determine a part of the image without actually calibrating it.
  • the detection area may have various implementation forms, for example, it may have a shape such as a box, a circle, an ellipse, an irregular shape, etc., or it may be depicted by a solid line, a dashed line, a dotted line, and the like.
  • The computing device 120 can determine multiple key points of the target part in the detection area 130 through an artificial intelligence network, such as a convolutional neural network CNN, loaded therein, and determine whether the target part is still located within the detection area 130. In this way, whether the prediction function of the computing device 120 is normal is monitored. In addition, when it is determined that the target part is not located within the detection area 130, the computing device 120 also needs to determine the detection area of the target part in the subsequent frame through another artificial intelligence network, such as a convolutional neural network CNN, loaded therein.
  • FIG. 2 shows a schematic diagram of a detailed example environment 200 in which multiple embodiments of the present disclosure can be implemented.
  • the example environment 200 may include a computing device 220, an input frame 210, and an output result 230.
  • the example environment 200 may include a model training system 260 and a model application system 270 as a whole.
  • the model training system 260 and/or the model application system 270 may be implemented in the computing device 120 as shown in FIG. 1 or the computing device 220 as shown in FIG. 2.
  • the structure and functions of the example environment 200 are described for exemplary purposes only and are not intended to limit the scope of the subject matter described herein.
  • the subject matter described herein can be implemented in different structures and/or functions.
  • The process of determining the key points of the target part, such as the face, of the monitored object and whether the target part is within the detection area, and the process of determining the detection area of the target part, can each be divided into two stages: a model training stage and a model application stage.
  • the model training system 260 may use the training data set 250 to train the CNN 240 that determines the key points and probabilities.
  • the model application system 270 may receive the trained CNN 240, so that the CNN 240 determines the key points and probabilities as the output result 230 based on the input frame 210.
  • the training data set 250 may be a large number of labeled reference frames.
  • the model training system 260 may use the training data set 250 to train the CNN 240 that determines the detection area.
  • the model application system 270 may receive the trained CNN 240, so that the CNN 240 determines the detection area of the target part based on the input frame 210.
  • CNN 240 may be constructed as a learning network.
  • a learning network can also be called a learning model, or simply called a network or model.
  • The learning network may include multiple networks, which are respectively used, for example, to determine the key points of the target part, such as the face, of the monitored object and the probability that the target part is located within the detection area, and to determine the detection area of the target part.
  • Each of these networks can be a multilayer neural network, which can be composed of a large number of neurons. Through the training process, the corresponding parameters of the neurons in each network can be determined. The parameters of the neurons in these networks are collectively referred to as the parameters of CNN 240.
  • the training process of CNN 240 can be performed in an iterative manner.
  • the model training system 260 may obtain reference images from the training data set 250, and use the reference images to perform one iteration of the training process to update the corresponding parameters of the CNN 240.
  • the model training system 260 may repeatedly perform the above process based on multiple reference images in the training data set 250 until at least some of the parameters of the CNN 240 converge, thereby obtaining the final model parameters.
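  • For illustration, the iterative training described above could look like the following sketch; it is not the disclosed training code, and the PyTorch usage, batch size, learning rate, and mean-squared-error loss are assumptions made for the example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train_cnn(model, reference_dataset, epochs=10, lr=1e-3):
    """Iteratively update the CNN parameters on labeled reference frames."""
    loader = DataLoader(reference_dataset, batch_size=32, shuffle=True)
    criterion = nn.MSELoss()                      # placeholder loss for the labeled targets
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                       # or loop until the parameters converge
        for images, targets in loader:            # reference images and their labels
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()                       # one iteration updates the parameters
            optimizer.step()
    return model
```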
  • FIG. 3 shows a flowchart of a process 300 of tracking a target part according to an embodiment of the present disclosure.
  • the method 300 may be implemented in the computing device 120 of FIG. 1, the computing device 220 of FIG. 2, and the device shown in FIG. 6.
  • a process 300 for tracking a target part according to an embodiment of the present disclosure will now be described with reference to FIG. 1.
  • the specific examples mentioned in the following description are all exemplary and are not used to limit the protection scope of the present disclosure.
  • the computing device 120 may determine the current detection area for detecting the target part in the current frame of the video based on the previous detection area of the target part of the object in the previous frame of the video. In some embodiments, the computing device 120 may apply the previous detection area to the position prediction model to determine the current detection area.
  • The position prediction model can be at least one of a Kalman filter, a Wiener filter, a strong tracking filter, a primary moving average prediction model, a secondary moving average prediction model, a single exponential smoothing model, a double exponential smoothing model, a Holt exponential smoothing model, and the like.
  • Taking the Kalman filter as an example, after receiving a frame preceding the frame 110 in the surveillance video, the Kalman filter in the computing device 120, or connected to the computing device, can predict the detection area of the next frame based on that frame and the prior information in the Kalman filter.
  • The calculation formulas of the algorithm in the Kalman filter are:
    State equation: $X_k = A_{k,k-1} \cdot X_{k-1} + V_{k-1}$
    Observation equation: $Y_k = H \cdot X_k + W_k$
    where $X_k$ and $X_{k-1}$ are the state vectors of the $k$-th frame and the $(k-1)$-th frame respectively, $Y_k$ is the observation vector of the $k$-th frame, $A_{k,k-1}$ is the state transition matrix, $H$ is the observation matrix, $V_{k-1}$ and $W_k$ are respectively the system state noise of the $(k-1)$-th frame and the observation noise of the $k$-th frame, and $Q$ and $R$ are the corresponding variance matrices. The state update formulas are:
    $X_{k,k-1} = A_{k,k-1} X_{k-1}$
    $X_k = X_{k,k-1} + K_k [Y_k - H_k X_{k,k-1}]$
    $K_k = P_{k,k-1} H_k^{T} [H_k P_{k,k-1} H_k^{T} + R_k]^{-1}$
    $P_{k,k-1} = A_{k,k-1} P_{k-1} A_{k,k-1}^{T} + Q_{k-1}$
    $P_k = [I - K_k H_k] P_{k,k-1}$
    where $X_{k,k-1}$ is the one-step state estimate, $X_k$ is the corrected value of the a priori estimate $X_{k,k-1}$, $K_k$ is the Kalman filter gain matrix, $P_{k,k-1}$ is the covariance matrix of $X_{k,k-1}$, $P_k$ is the covariance matrix of $X_k$, and $I$ is the identity matrix.
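  • The constant-velocity prediction and correction steps above can be illustrated with the following minimal NumPy sketch; the state layout [x, y, vx, vy] over the face frame center follows the formulas above, while the class name and the noise settings q and r are illustrative assumptions.

```python
import numpy as np

class CenterKalman:
    """Constant-velocity Kalman filter over the face-frame center state [x, y, vx, vy]."""
    def __init__(self, q=1e-2, r=1.0):
        self.A = np.array([[1, 0, 1, 0],   # state transition A_{k,k-1}
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],   # observation matrix H (center only)
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)             # system state noise variance
        self.R = r * np.eye(2)             # observation noise variance
        self.x = np.zeros(4)               # state estimate X_k
        self.P = np.eye(4)                 # covariance P_k

    def predict(self):
        # One-step prediction: X_{k,k-1} = A X_{k-1}, P_{k,k-1} = A P A^T + Q
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]                  # predicted center of the detection area

    def update(self, observed_center):
        # Gain K_k and correction X_k = X_{k,k-1} + K_k (Y_k - H X_{k,k-1})
        y = np.asarray(observed_center, dtype=float)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (y - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```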
  • the predicted detection area can be used to determine multiple key point information of the target part in the frame 110, for example, the coordinates of each key point.
  • the motion prediction based on the Kalman filter can be implemented flexibly. For example, it is also possible to predict the detection area of the next frame based on the key point information of the target part in the previous frame and the prior information in the Kalman filter.
  • the target part is the face, eyes, or fingerprints of the subject.
  • the object is not limited to people.
  • the object described herein may be a person, an animal or an object in motion (for example, goods on a conveyor belt).
  • The solution of the present disclosure can be applied to the recognition of multi-object scenes. Specifically, the present disclosure can identify each animal, or each kind of animal, in an area through which animals in a zoo or ranch must pass, and can also identify each commodity or industrial product, or each kind thereof, in the goods delivery channels of shopping malls or factories, so as to realize automated logistics information management.
  • the computing device 120 may determine the probability that the target part is located within the current detection area.
  • the computing device 120 may apply the current detection area to a probability determination model (such as a model included in the CNN 240 described above) to determine the probability that the target part is located in the current detection area.
  • the probability determination model may be trained based on the reference detection area in the reference frame and the pre-labeled reference probability.
  • In some embodiments, the probability determination model more simply and quickly determines the probability that the target part is located within the current detection area by determining the probability that a specific target part (such as a human face) is present in the current detection area.
  • the probability can be output in the form of a score, with a score ranging from 0 to 1. The higher the score, the higher the probability that there is a face in the face frame.
  • the predetermined threshold for judging whether there is a human face may be 0.5 or other numerical values.
  • the artificial intelligence network in the computing device 120 may also determine multiple key points of the target part based on the current detection area.
  • the computing device 120 may apply the current detection area to a key point determination model (such as a model included in the CNN 240 described above) to determine the key points of the target part.
  • the key point determination model is trained based on the reference detection area in the reference frame and the pre-labeled reference key points.
  • the key point determination model and the aforementioned probability determination model may be combined into one model to simultaneously determine multiple key points of the target part and the probability that the target part is located in the current detection area based on the current detection area. In this way, it is possible to know whether the predicted detection area is correct without significantly increasing the computing power.
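  • One possible layout of such a fused model is sketched below as a small two-head network; this is a hypothetical PyTorch sketch in which the backbone depth, channel counts, and number of key points are illustrative assumptions rather than the disclosed architecture.

```python
from torch import nn

class KeypointAndProbabilityNet(nn.Module):
    """Shared backbone with two heads: key point coordinates and an in-area probability."""
    def __init__(self, num_keypoints=72):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.backbone = nn.Sequential(            # small illustrative feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.keypoint_head = nn.Linear(64, num_keypoints * 2)            # (x, y) per key point
        self.prob_head = nn.Sequential(nn.Linear(64, 1), nn.Sigmoid())   # score in [0, 1]

    def forward(self, crop):
        # `crop` is the image patch cut out of the frame by the current detection area.
        feat = self.backbone(crop).flatten(1)
        keypoints = self.keypoint_head(feat).view(-1, self.num_keypoints, 2)
        probability = self.prob_head(feat).squeeze(-1)  # compared against the threshold (e.g., 0.5)
        return keypoints, probability
```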
  • the computing device 120 can determine whether the probability is greater than or equal to a predetermined threshold.
  • the computing device 120 may determine a subsequent detection area for detecting the target part in a subsequent frame of the video based on at least the current detection area and the previous detection area.
  • the position prediction model in the computing device 120 may determine the subsequent detection area based on the current detection area and a priori information.
  • The position prediction model can be at least one of a Kalman filter, a Wiener filter, a strong tracking filter, a primary moving average prediction model, a secondary moving average prediction model, a single exponential smoothing model, a double exponential smoothing model, a Holt exponential smoothing model, and the like. In this way, when there is no abnormal movement or occlusion of the monitored object, the computing device 120 can determine the detection area of the target part by using a position prediction model that requires less computing power, thereby significantly saving computing resources.
  • In addition, when the probability is less than the predetermined threshold, the computing device 120 may detect the target part in the subsequent frame and determine, based on the detection result, the subsequent detection area for detecting the target part in the subsequent frame.
  • the computing device 120 may apply subsequent frames to an area determination model (such as a model included in the CNN 240 described above) to determine the subsequent detection area of the target part.
  • the region determination model is trained based on reference frames and pre-labeled reference detection regions. In this way, errors in motion prediction can be found in time, and a more accurate region determination model can be used to correct the errors and ensure the accuracy of region tracking.
  • the region determination model may perform face region detection on the frame 110.
  • For example, a six-layer convolutional network can be used to extract basic facial features from the frame 110, with each convolutional layer performing one image downsampling; based on the last three convolutional layers, a fixed number of face anchor regions of different sizes can be preset respectively to perform face detection area regression, finally obtaining the face detection area.
  • the foregoing examples are only exemplary, and other layers of convolutional networks may also be used, and it is not limited to determining the detection area of a human face. In this way, the detection area of the target part in the frame 110 can be quickly identified based on the area determination model.
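  • As a rough illustration of the anchor-based detector just described, the following hypothetical PyTorch sketch uses six stride-2 convolutional layers for downsampling and attaches box-regression heads to the last three feature maps; the channel counts and the number of anchors per cell are assumptions made for the example.

```python
from torch import nn

class AnchorFaceDetector(nn.Module):
    """Six downsampling conv stages; the last three feature maps regress anchor boxes."""
    def __init__(self, anchors_per_cell=3):
        super().__init__()
        chans = [3, 16, 32, 64, 64, 128, 128]
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1), nn.ReLU())
            for i in range(6)                      # each stage downsamples the image once
        ])
        # For each of the last three scales: 4 box offsets + 1 face score per anchor.
        self.heads = nn.ModuleList([
            nn.Conv2d(c, anchors_per_cell * 5, 1) for c in chans[4:]
        ])

    def forward(self, image):
        outputs, x = [], image
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i >= 3:                             # last three stages feed detection heads
                outputs.append(self.heads[i - 3](x))
        return outputs  # per-scale anchor regressions, decoded elsewhere into face boxes
```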
  • In this way, by adding a motion prediction model to a traditional system, the present disclosure can transfer most of the work of determining the detection area of the target part to a motion prediction model that requires less computing power, thereby saving computing power resources.
  • the present disclosure also integrates the aforementioned probability determination model on the basis of the key point determination model, so that the result of motion prediction can be checked frame by frame, and the region determination model can be used to obtain the correct detection region when a prediction error may occur.
  • the present disclosure not only saves computing power, but also improves the accuracy of the detection area prediction.
  • In addition, when the key point determination model and the probability determination model are fused into one model, the processing time of the computing device 120 for the input frame 110 will not be increased. Therefore, the present disclosure improves the detection area determination performance of the computing device 120 with almost no drawbacks, thereby optimizing the user experience.
  • the present disclosure also provides a system 400 for tracking a target part.
  • the system includes an image acquisition module 410, which may be an image sensing device such as an RGB camera.
  • the system 400 may further include a calculation module 420 communicatively connected with the image acquisition module 410, and the calculation module 420 is used for the various methods and processes described above, such as the process 300.
  • the system 400 may include an output display module 430 for displaying the processing result of the calculation module 420 to the user.
  • the output display module 430 may display the face tracking result of the monitored object to the user.
  • the system 400 can be applied to a face tracking scene of multiple pedestrians.
  • the system 400 may be applied to a building access control scenario or a financial verification scenario.
  • When the face of a monitored object enters the monitoring field of view, the system 400 can predict the position of the face of the object in the next frame of the monitoring image based on the first frame of the monitoring image containing the face of the object and prior information, and, while determining the key points, determine whether that position still contains the face of the object. In this way, not only can the computing power of repeated face detection be saved by predicting the face position, but the accuracy of the prediction can also be verified through the subsequent face re-check. When the prediction is found to be inaccurate, face detection can be restarted to ensure that the face tracking results are available at any time.
  • the system 400 can also be applied in the field of video surveillance, especially in the case of monitoring the body temperature of multiple monitored objects at the entrance of a subway or a stadium.
  • For example, when the faces of multiple monitored objects enter the monitoring field of view, the system 400 may respectively predict the positions of the faces of these objects in the corresponding next frames of the monitoring images based on the corresponding first frames of the monitoring images containing the faces of these objects and prior information, and, while determining the key points, determine whether those positions still contain the faces of the corresponding objects.
  • Since multiple faces may need to be tracked simultaneously, the system 400 of the present disclosure can greatly save the computing power of repeated face detection while ensuring that the face tracking results are correct and available at any time.
  • FIG. 5 shows a block diagram of an apparatus 500 for tracking a target part according to an embodiment of the present disclosure.
  • The apparatus 500 may include: a current detection area determination module 502 configured to determine, based on the previous detection area of the target part of an object in the previous frame of a video, a current detection area for detecting the target part in the current frame of the video; a probability determination module configured to determine the probability that the target part is located within the current detection area; and a subsequent detection area determination module configured to, in response to the probability being greater than or equal to a predetermined threshold, determine, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video.
  • the apparatus 500 may further include: a target part detection module configured to detect the target part in a subsequent frame in response to the probability being less than a predetermined threshold; and an area determination module configured to be based on the detection result , Determine the subsequent detection area used to detect the target part in the subsequent frame.
  • The target part detection module may include: a subsequent frame application module configured to apply the subsequent frame to an area determination model to determine the subsequent detection area of the target part, the area determination model being trained based on reference frames and pre-labeled reference detection areas.
  • The probability determination module 504 may include: a current detection area application module configured to apply the current detection area to a probability determination model to determine the probability that the target part is located within the current detection area, the probability determination model being trained based on reference detection areas in reference frames and pre-labeled reference probabilities.
  • The current detection area determination module 502 may include: a previous detection area application module configured to apply the previous detection area to a position prediction model to determine the current detection area.
  • The position prediction model may be at least one of the following: a Kalman filter; a Wiener filter; and a strong tracking filter.
  • the target part may be at least one of the subject's face, eyes, and fingerprints.
  • the device 500 may further include a key point determination module configured to determine the key point of the target part based on the current detection area.
  • The key point determination module may include: a current detection area application module configured to apply the current detection area to a key point determination model to determine the key points of the target part, the key point determination model being trained based on reference detection areas in reference frames and pre-labeled reference key points.
  • FIG. 6 shows a block diagram of a computing device 600 capable of implementing various embodiments of the present disclosure.
  • the device 600 may be used to implement the computing device 120 in FIG. 1 or the computing device 220 in FIG. 2.
  • The device 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 602 or computer program instructions loaded from a storage unit 608 into a random access memory (RAM) 603.
  • In the RAM 603, various programs and data required for the operation of the device 600 can also be stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • Multiple components in the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard, a mouse, etc.; an output unit 607, such as various types of displays, speakers, etc.; a storage unit 608, such as a magnetic disk, an optical disk, etc.; and a communication unit 609, such as a network card, a modem, a wireless communication transceiver, etc.
  • the communication unit 609 allows the device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • the processing unit 601 executes the various methods and processes described above, such as the process 300.
  • the process 300 may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 608.
  • part or all of the computer program may be loaded and/or installed on the device 600 via the ROM 602 and/or the communication unit 609.
  • the CPU 601 may be configured to execute the process 300 in any other suitable manner (for example, by means of firmware).
  • Exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC), Application Specific Standard Products (ASSP), Systems on Chip (SOC), Complex Programmable Logic Devices (CPLD), and so on.
  • The program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to the processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing device, so that when the program code is executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented.
  • the program code can be executed entirely on the machine, partly executed on the machine, partly executed on the machine and partly executed on the remote machine as an independent software package, or entirely executed on the remote machine or server.
  • a machine-readable medium may be a tangible medium, which may contain or store a program for use by the instruction execution system, apparatus, or device or in combination with the instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • the machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any suitable combination of the foregoing.
  • machine-readable storage media would include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A target part tracking method and apparatus, an electronic device, and a computer-readable storage medium, relating to the field of artificial intelligence, specifically computer vision. The method may include determining, based on a previous detection area of a target part of an object in a previous frame of a video, a current detection area for detecting the target part in a current frame of the video (302). The method further includes determining a probability that the target part is located within the current detection area (304). In addition, the method may further include, in response to the probability being greater than or equal to a predetermined threshold, determining, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video (306). The method can acquire the position information of the tracked target part quickly, efficiently, and at low cost, thereby reducing the computing power and time costs of target part tracking.

Description

Target part tracking method, apparatus, electronic device and readable storage medium
This application claims priority to Chinese Patent Application No. 202010415394.2, filed on May 15, 2020.
Technical Field
Embodiments of the present disclosure mainly relate to the field of artificial intelligence, specifically computer vision, and more specifically to a target part tracking method and apparatus, an electronic device, and a computer-readable storage medium.
Background
A face recognition system implements face recognition and comparison tasks through technologies such as face detection, face tracking, face alignment, face liveness detection, and face recognition, and is widely used in fields such as video surveillance, building access control, face gates, and financial verification. Face tracking technology refers to a technology that determines the facial motion trajectory and size changes of an object in a video or a sequence of frames. As a method for accurately and quickly obtaining face position coordinates, this technology is one of the important components of a face recognition system. Traditional face tracking technology can only obtain the face frame coordinates of the current frame, output the face frame coordinates after the face is successfully tracked, and provide them to a subsequent face alignment model to determine key points. If the face is blocked by obstacles or moves beyond the image collection range, existing face tracking technology cannot accurately determine whether face tracking has failed, causing the face recognition function to fail.
Summary
According to example embodiments of the present disclosure, a target part tracking solution is provided.
In a first aspect of the present disclosure, a target part tracking method is provided. The method may include determining, based on a previous detection area of a target part of an object in a previous frame of a video, a current detection area for detecting the target part in a current frame of the video. The method further includes determining a probability that the target part is located within the current detection area. In addition, the method may further include, in response to the probability being greater than or equal to a predetermined threshold, determining, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video.
In a second aspect of the present disclosure, a target part tracking apparatus is provided, including: a current detection area determination module configured to determine, based on a previous detection area of a target part of an object in a previous frame of a video, a current detection area for detecting the target part in a current frame of the video; a probability determination module configured to determine a probability that the target part is located within the current detection area; and a subsequent detection area determination module configured to, in response to the probability being greater than or equal to a predetermined threshold, determine, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video.
In a third aspect of the present disclosure, an electronic device is provided, including one or more processors, and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the method according to the first aspect of the present disclosure.
In a fifth aspect of the present disclosure, a target part tracking system is provided, including: a video capture module configured to provide a video associated with a target part of an object; a calculation module communicatively connected with the video capture module, the calculation module being configured to implement the method according to the first aspect of the present disclosure; and an output display module configured to display a processing result of the calculation module.
It should be understood that the content described in this Summary is not intended to limit key or important features of the embodiments of the present disclosure, nor is it used to limit the scope of the present disclosure. Other features of the present disclosure will become easily understood through the following description.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
FIG. 1 shows a schematic diagram of an example environment in which multiple embodiments of the present disclosure can be implemented;
FIG. 2 shows a schematic diagram of a detailed example environment in which multiple embodiments of the present disclosure can be implemented;
FIG. 3 shows a flowchart of a process of tracking a target part according to an embodiment of the present disclosure;
FIG. 4 shows a block diagram of a system related to target part tracking according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of an apparatus for tracking a target part according to an embodiment of the present disclosure; and
FIG. 6 shows a block diagram of a computing device capable of implementing multiple embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as being limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are only for exemplary purposes and are not used to limit the protection scope of the present disclosure.
In the description of the embodiments of the present disclosure, the term "including" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "this embodiment" should be understood as "at least one embodiment". The terms "first", "second", and the like may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
In view of the interference situations described above, face tracking technology generally has the following three optimization schemes:
(1) Model-based face tracking scheme. This scheme mainly relies on a skin color model, a texture model, or the like: prior knowledge of the face is obtained, a parametric model is established, and a sliding window is applied to each frame of the image for model matching, so as to realize face tracking. However, this scheme has low tracking accuracy for faces of different scales and for partially occluded faces, and it cannot be determined during tracking whether tracking has failed.
(2) Face tracking scheme based on motion information. This scheme performs face motion estimation based on methods such as optical flow analysis. However, it has low tracking accuracy for faces whose scale changes across consecutive frames, and its tracking performance for fast-moving faces is poor.
(3) Face tracking scheme based on neural networks. This scheme uses a neural network to implicitly learn facial features and performs feature matching on the image by means such as sliding windows, so as to realize face tracking. This scheme expresses facial features better than scheme (1), but its computational cost is huge, and real-time performance is difficult to guarantee on the embedded side.
As mentioned above, a target part tracking method is urgently needed to track the position information of a target part quickly, efficiently, and at low cost, thereby reducing the computing power and time costs of target part tracking.
According to embodiments of the present disclosure, a target part tracking solution is proposed. In this solution, a motion prediction function for the target part can be added on the basis of target part detection. After the detection area where the target part is located in the current frame is predicted based on the previous frame, it is determined, while the key points of the target part are determined based on that detection area, whether the target part is located within the detection area. If it is determined that the target part is still located within the detection area, it indicates that the motion prediction function is working normally, and the detection area of the target part in subsequent frames can continue to be predicted, so there is no need to use a complicated target part detection model that requires a large amount of computing power. If it is determined that the target part is not located within the detection area, it indicates that the result of the motion prediction does not match the actual situation; in this case, the target part detection model can be directly called to correct the prediction result. In this way, even if the target part of the monitored object is blocked or the monitored object exhibits irregular motion, the detection area of subsequent frames can be determined at low cost and with high accuracy.
Embodiments of the present disclosure will be specifically described below with reference to the accompanying drawings. FIG. 1 shows a schematic diagram of an example environment 100 in which multiple embodiments of the present disclosure can be implemented. As shown in FIG. 1, the example environment 100 includes a frame 110 in a surveillance video, a computing device 120, and a determined detection area 130.
The frame 110 may be one or more frames in a real-time surveillance video acquired by an image acquisition device connected to the computing device 120. As an example, the image acquisition device may be set up in a public place with a large flow of people (for example, video surveillance, face gates, etc.), so as to acquire the image information of each person in the crowd passing through the place. As another example, the image acquisition device may be installed in a private place with a small number of people (for example, building access control, financial verification, etc.). It should be understood that the object for which image information is acquired may not be limited to people, but may also include animals that need to be identified in batches (for example, animals in a zoo or breeding ground) and stationary objects (for example, goods on a conveyor belt). The computing device 120 may receive the frame 110 to determine the detection area 130 of a target part, such as the face, of the monitored object.
It should be understood that the detection area described herein is an area used to detect the target part; for example, it can be marked by a detection frame or other appropriate tools, or it can merely be a determined part of the image without being actually marked. As an example, the detection area may have various implementation forms, for example, shapes such as a box, a circle, an ellipse, or an irregular shape, and it may also be depicted by a solid line, a dashed line, a dash-dot line, and the like.
After the detection area 130 of the frame 110 is determined, the computing device 120 can determine multiple key points of the target part in the detection area 130 through an artificial intelligence network, such as a convolutional neural network CNN, loaded therein, and determine whether the target part is still located within the detection area 130. In this way, whether the prediction function of the computing device 120 is normal is monitored. In addition, when it is determined that the target part is not located within the detection area 130, the computing device 120 also needs to determine the detection area of the target part in the subsequent frame through another artificial intelligence network, such as a convolutional neural network CNN, loaded therein.
The construction and use of the artificial intelligence network in the computing device 120 will be described below with reference to FIG. 2, taking a CNN as an example.
FIG. 2 shows a schematic diagram of a detailed example environment 200 in which multiple embodiments of the present disclosure can be implemented. Similar to FIG. 1, the example environment 200 may include a computing device 220, an input frame 210, and an output result 230. The difference is that the example environment 200 may include a model training system 260 and a model application system 270 as a whole. As an example, the model training system 260 and/or the model application system 270 may be implemented in the computing device 120 as shown in FIG. 1 or the computing device 220 as shown in FIG. 2. It should be understood that the structure and functions of the example environment 200 are described for exemplary purposes only and are not intended to limit the scope of the subject matter described herein. The subject matter described herein can be implemented in different structures and/or functions.
As mentioned above, the process of determining the key points of a target part, such as the face, of the monitored object and whether the target part is located within the detection area, as well as the process of determining the detection area of the target part, can both be divided into two stages: a model training stage and a model application stage. As an example, for the process of determining the key points of the target part and the probability that the target part is located within the detection area, in the model training stage, the model training system 260 may use a training data set 250 to train a CNN 240 that determines the key points and the probability. In the model application stage, the model application system 270 may receive the trained CNN 240, so that the CNN 240 determines the key points and the probability as the output result 230 based on the input frame 210. It should be understood that the training data set 250 may be a large number of labeled reference frames.
As another example, for the process of determining the detection area of the target part, in the model training stage, the model training system 260 may use the training data set 250 to train a CNN 240 that determines the detection area. In the model application stage, the model application system 270 may receive the trained CNN 240, so that the CNN 240 determines the detection area of the target part based on the input frame 210.
In other embodiments, the CNN 240 may be constructed as a learning network. Such a learning network may also be referred to as a learning model, or simply as a network or a model. In some embodiments, the learning network may include multiple networks, which are respectively used, for example, to determine the key points of a target part, such as the face, of the monitored object and the probability that the target part is located within the detection area, and to determine the detection area of the target part. Each of these networks may be a multilayer neural network, which may be composed of a large number of neurons. Through the training process, the corresponding parameters of the neurons in each network can be determined. The parameters of the neurons in these networks are collectively referred to as the parameters of the CNN 240.
The training process of the CNN 240 may be performed in an iterative manner. Specifically, the model training system 260 may obtain a reference image from the training data set 250 and use the reference image to perform one iteration of the training process to update the corresponding parameters of the CNN 240. The model training system 260 may repeatedly perform the above process based on multiple reference images in the training data set 250 until at least some of the parameters of the CNN 240 converge, thereby obtaining the final model parameters.
The technical solutions described above are only examples and do not limit the present invention. It should be understood that the networks can also be arranged in other manners and connection relationships. In order to explain the principle of the above solution more clearly, the target part tracking process will be described in more detail below with reference to FIG. 3.
FIG. 3 shows a flowchart of a process 300 of tracking a target part according to an embodiment of the present disclosure. In some embodiments, the method 300 may be implemented in the computing device 120 of FIG. 1, the computing device 220 of FIG. 2, and the device shown in FIG. 6. A process 300 for tracking a target part according to an embodiment of the present disclosure is now described with reference to FIG. 1. For ease of understanding, the specific examples mentioned in the following description are all exemplary and are not used to limit the protection scope of the present disclosure.
At 302, the computing device 120 may determine, based on the previous detection area of the target part of the object in the previous frame of the video, the current detection area for detecting the target part in the current frame of the video. In some embodiments, the computing device 120 may apply the previous detection area to a position prediction model to determine the current detection area. As an example, the position prediction model may be at least one of a Kalman filter, a Wiener filter, a strong tracking filter, a primary moving average prediction model, a secondary moving average prediction model, a single exponential smoothing model, a double exponential smoothing model, a Holt exponential smoothing model, and the like.
Taking the Kalman filter as an example, after receiving a frame preceding the frame 110 in the surveillance video, the Kalman filter in the computing device 120, or connected to the computing device, can predict the detection area of the next frame based on that frame and the prior information in the Kalman filter. The calculation formulas of the algorithm in the Kalman filter are:
State equation: $X_k = A_{k,k-1} \cdot X_{k-1} + V_{k-1}$
Observation equation: $Y_k = H \cdot X_k + W_k$
In the above equations, $X_k$ and $X_{k-1}$ are the state vectors of the $k$-th frame and the $(k-1)$-th frame respectively, and $Y_k$ is the observation vector of the $k$-th frame; $A_{k,k-1}$ is the state transition matrix; $H$ is the observation matrix; $V_{k-1}$ and $W_k$ are respectively the system state noise of the $(k-1)$-th frame and the observation noise of the $k$-th frame; and $Q$ and $R$ are the corresponding variance matrices.
Let the state vector be $X_k = [S_{xk}, S_{yk}, V_{xk}, V_{yk}]$, where $S_{xk}$, $S_{yk}$, $V_{xk}$, $V_{yk}$ are respectively the x-coordinate, the y-coordinate, the velocity in the x direction, and the velocity in the y direction of the center point of the face frame in the current frame; the observation vector is $Y_k = [O_{xk}, O_{yk}]$, where $O_{xk}$, $O_{yk}$ are respectively the x-coordinate and the y-coordinate of the center point of the observed face frame in the current frame. The state update formulas are:
$X_{k,k-1} = A_{k,k-1} X_{k-1}$
$X_k = X_{k,k-1} + K_k [Y_k - H_k X_{k,k-1}]$
$K_k = P_{k,k-1} H_k^{T} [H_k P_{k,k-1} H_k^{T} + R_k]^{-1}$
$P_{k,k-1} = A_{k,k-1} P_{k-1} A_{k,k-1}^{T} + Q_{k-1}$
$P_k = [I - K_k H_k] P_{k,k-1}$
where $X_{k,k-1}$ is the one-step state estimate, $X_k$ is the corrected value of the a priori estimate $X_{k,k-1}$, $K_k$ is the Kalman filter gain matrix, $P_{k,k-1}$ is the covariance matrix of $X_{k,k-1}$, $P_k$ is the covariance matrix of $X_k$, and $I$ is the identity matrix.
Thus, when the computing device 120 receives the frame 110, the predicted detection area can be used to determine multiple pieces of key point information of the target part in the frame 110, for example, the coordinates of each key point. It should be understood that the motion prediction based on the Kalman filter can be implemented flexibly. For example, the detection area of the next frame can also be predicted based on the key point information of the target part in the previous frame and the prior information in the Kalman filter.
In some embodiments, the target part is the face, eyes, fingerprints, or the like of the object, and the object is not limited to a person. It should also be understood that the object described herein may be a person, an animal, or an object in motion (for example, goods on a conveyor belt). The solution of the present disclosure can be applied to the recognition of multi-object scenes. Specifically, the present disclosure can identify each animal, or each kind of animal, in an area through which animals in a zoo or ranch must pass, and can also identify each commodity or industrial product, or each kind thereof, in the goods delivery channels of shopping malls or factories, so as to realize automated logistics information management.
At 304, the computing device 120 may determine the probability that the target part is located within the current detection area. As an example, the computing device 120 may apply the current detection area to a probability determination model (such as a model included in the CNN 240 described above) to determine the probability that the target part is located within the current detection area. The probability determination model may be trained based on reference detection areas in reference frames and pre-labeled reference probabilities. In some embodiments, the probability determination model more simply and quickly determines the probability that the target part is located within the current detection area by determining the probability that a specific target part (such as a human face) is present in the current detection area. The probability can be output in the form of a score, with the score ranging from 0 to 1. The higher the score, the higher the possibility that a face is present in the face frame. Preferably, the predetermined threshold for judging whether a face is present may be 0.5 or another numerical value.
In some embodiments, while determining the probability that the target part is located within the current detection area, the artificial intelligence network in the computing device 120 may also determine multiple key points of the target part based on the current detection area. As an example, the computing device 120 may apply the current detection area to a key point determination model (such as a model included in the CNN 240 described above) to determine the key points of the target part. The key point determination model is trained based on reference detection areas in reference frames and pre-labeled reference key points. Alternatively or additionally, the key point determination model and the above probability determination model may be combined into one model to simultaneously determine, based on the current detection area, multiple key points of the target part and the probability that the target part is located within the current detection area. In this way, whether the predicted detection area is correct can be known without significantly increasing the computing power.
After that, the computing device 120 may determine whether the probability is greater than or equal to a predetermined threshold. At 306, when the probability is greater than or equal to the predetermined threshold, the computing device 120 may determine, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video. As an example, the position prediction model in the computing device 120 may determine the subsequent detection area based on the current detection area and prior information. As described above, the position prediction model may be at least one of a Kalman filter, a Wiener filter, a strong tracking filter, a primary moving average prediction model, a secondary moving average prediction model, a single exponential smoothing model, a double exponential smoothing model, a Holt exponential smoothing model, and the like. In this way, when there is no abnormal movement or occlusion of the monitored object, the computing device 120 can determine the detection area of the target part by using a position prediction model that requires less computing power, thereby significantly saving computing resources.
In addition, when the probability is less than the predetermined threshold, the computing device 120 may detect the target part in the subsequent frame and determine, based on the detection result, the subsequent detection area for detecting the target part in the subsequent frame. As an example, the computing device 120 may apply the subsequent frame to an area determination model (such as a model included in the CNN 240 described above) to determine the subsequent detection area of the target part. The area determination model is trained based on reference frames and pre-labeled reference detection areas. In this way, errors in motion prediction can be found in time, and the more accurate area determination model can be used to correct the errors and ensure the accuracy of area tracking.
In some embodiments, the area determination model may perform face area detection on the frame 110. For example, a six-layer convolutional network may be used to extract basic facial features from the frame 110, with each convolutional layer performing one image downsampling; based on the last three convolutional layers, a fixed number of face anchor regions of different sizes can be preset respectively to perform face detection area regression, finally obtaining the face detection area. It should be understood that the above example is only exemplary; a convolutional network with another number of layers can also be used, and it is not limited to determining the detection area of a human face. In this way, the detection area of the target part in the frame 110 can be quickly identified based on the area determination model.
In this way, by adding a motion prediction model to a traditional system, the present disclosure can transfer most of the work of determining the detection area of the target part to a motion prediction model that requires less computing power, thereby saving computing power resources. In addition, the present disclosure also integrates the above probability determination model on the basis of the key point determination model, so that the result of motion prediction can be checked frame by frame, and the area determination model can be used to obtain the correct detection area when a prediction error may occur. Thus, the present disclosure not only saves computing power but also improves the accuracy of detection area prediction. Moreover, when the key point determination model and the probability determination model are fused into one model, the processing time of the computing device 120 for the input frame 110 will not be increased. Therefore, the present disclosure improves the detection area determination performance of the computing device 120 with almost no drawbacks, thereby optimizing the user experience.
In addition, the present disclosure also provides a system 400 for target part tracking. As shown in FIG. 4, the system includes an image acquisition module 410, which may be an image sensing device such as an RGB camera. The system 400 may further include a calculation module 420 communicatively connected with the image acquisition module 410, and the calculation module 420 is used for the various methods and processes described above, such as the process 300. In addition, the system 400 may include an output display module 430 for displaying the processing result of the calculation module 420 to the user. For example, the output display module 430 may display the face tracking result of the monitored object to the user.
In this way, system-level face tracking can be realized, and the computing power requirement can be significantly reduced without compromising the accuracy of face tracking and recognition.
In some embodiments, the system 400 can be applied to a face tracking scenario with multiple pedestrians. As an example, the system 400 may be applied to a building access control scenario or a financial verification scenario. When the face of a monitored object enters the monitoring field of view, the system 400 can predict the position of the face of the object in the next frame of the monitoring image based on the first frame of the monitoring image containing the face of the object and prior information, and, while determining the key points, determine whether that position still contains the face of the object. In this way, not only can the computing power of repeated face detection be saved by predicting the face position, but the accuracy of the prediction can also be verified through the subsequent face re-check. When the prediction is found to be inaccurate, face detection can be restarted to ensure that the face tracking results are available at any time.
As another example, the system 400 can also be applied in the field of video surveillance, especially in the case of monitoring the body temperature of multiple monitored objects at the entrance of a subway or a stadium. For example, when the faces of multiple monitored objects enter the monitoring field of view, the system 400 may respectively predict the positions of the faces of these objects in the corresponding next frames of the monitoring images based on the corresponding first frames of the monitoring images containing the faces of these objects and prior information, and, while determining the key points, determine whether those positions still contain the faces of the corresponding objects. Since multiple faces may need to be tracked simultaneously, the system 400 of the present disclosure can greatly save the computing power of repeated face detection while ensuring that the face tracking results are correct and available at any time.
FIG. 5 shows a block diagram of an apparatus 500 for tracking a target part according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus 500 may include: a current detection area determination module 502 configured to determine, based on the previous detection area of the target part of an object in a previous frame of a video, a current detection area for detecting the target part in a current frame of the video; a probability determination module configured to determine the probability that the target part is located within the current detection area; and a subsequent detection area determination module configured to, in response to the probability being greater than or equal to a predetermined threshold, determine, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video.
In some embodiments, the apparatus 500 may further include: a target part detection module configured to detect the target part in the subsequent frame in response to the probability being less than the predetermined threshold; and an area determination module configured to determine, based on the detection result, a subsequent detection area for detecting the target part in the subsequent frame.
In some embodiments, the target part detection module may include: a subsequent frame application module configured to apply the subsequent frame to an area determination model to determine the subsequent detection area of the target part, the area determination model being trained based on reference frames and pre-labeled reference detection areas.
In some embodiments, the probability determination module 504 may include: a current detection area application module configured to apply the current detection area to a probability determination model to determine the probability that the target part is located within the current detection area, the probability determination model being trained based on reference detection areas in reference frames and pre-labeled reference probabilities.
In some embodiments, the current detection area determination module 502 may include: a previous detection area application module configured to apply the previous detection area to a position prediction model to determine the current detection area, the position prediction model being at least one of the following: a Kalman filter; a Wiener filter; and a strong tracking filter.
In some embodiments, the target part may be at least one of the face, eyes, and fingerprints of the object.
In some embodiments, the apparatus 500 may further include: a key point determination module configured to determine key points of the target part based on the current detection area.
In some embodiments, the key point determination module may include: a current detection area application module configured to apply the current detection area to a key point determination model to determine the key points of the target part, the key point determination model being trained based on reference detection areas in reference frames and pre-labeled reference key points.
FIG. 6 shows a block diagram of a computing device 600 capable of implementing multiple embodiments of the present disclosure. The device 600 may be used to implement the computing device 120 of FIG. 1 or the computing device 220 of FIG. 2. As shown in the figure, the device 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to computer program instructions stored in a read-only memory (ROM) 602 or computer program instructions loaded from a storage unit 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Multiple components in the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard, a mouse, etc.; an output unit 607, such as various types of displays, speakers, etc.; a storage unit 608, such as a magnetic disk, an optical disk, etc.; and a communication unit 609, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The processing unit 601 executes the various methods and processes described above, such as the process 300. For example, in some embodiments, the process 300 may be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the CPU 601, one or more steps of the process 300 described above can be performed. Alternatively, in other embodiments, the CPU 601 may be configured to execute the process 300 in any other suitable manner (for example, by means of firmware).
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC), Application Specific Standard Products (ASSP), Systems on Chip (SOC), Complex Programmable Logic Devices (CPLD), and so on.
The program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to the processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing device, so that when the program code is executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented. The program code can be executed entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as an independent software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although the operations are depicted in a specific order, this should be understood as requiring that such operations be performed in the specific order shown or in sequential order, or that all illustrated operations be performed, only where necessary to achieve the desired results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation can also be implemented in multiple implementations individually or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (19)

  1. A target part tracking method, comprising:
    determining, based on a previous detection area of a target part of an object in a previous frame of a video, a current detection area for detecting the target part in a current frame of the video;
    determining a probability that the target part is located within the current detection area; and
    in response to the probability being greater than or equal to a predetermined threshold, determining, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video.
  2. The method according to claim 1, further comprising:
    in response to the probability being less than the predetermined threshold, detecting the target part in the subsequent frame; and
    determining, based on a result of the detecting, a subsequent detection area for detecting the target part in the subsequent frame.
  3. The method according to claim 2, wherein detecting the target part in the subsequent frame comprises:
    applying the subsequent frame to an area determination model to determine the subsequent detection area of the target part, the area determination model being trained based on reference frames and pre-labeled reference detection areas.
  4. The method according to claim 1, wherein determining the probability comprises:
    applying the current detection area to a probability determination model to determine the probability that the target part is located within the current detection area, the probability determination model being trained based on reference detection areas in reference frames and pre-labeled reference probabilities.
  5. The method according to claim 1, wherein determining the current detection area comprises:
    applying the previous detection area to a position prediction model to determine the current detection area, the position prediction model being at least one of the following:
    a Kalman filter;
    a Wiener filter; and
    a strong tracking filter.
  6. The method according to claim 1, wherein the target part is at least one of the face, eyes, and fingerprints of the object.
  7. The method according to claim 1, further comprising:
    determining key points of the target part based on the current detection area.
  8. The method according to claim 1, wherein determining the key points comprises:
    applying the current detection area to a key point determination model to determine the key points of the target part, the key point determination model being trained based on reference detection areas in reference frames and pre-labeled reference key points.
  9. A target part tracking apparatus, comprising:
    a current detection area determination module configured to determine, based on a previous detection area of a target part of an object in a previous frame of a video, a current detection area for detecting the target part in a current frame of the video;
    a probability determination module configured to determine a probability that the target part is located within the current detection area; and
    a subsequent detection area determination module configured to, in response to the probability being greater than or equal to a predetermined threshold, determine, based on at least the current detection area and the previous detection area, a subsequent detection area for detecting the target part in a subsequent frame of the video.
  10. The apparatus according to claim 9, further comprising:
    a target part detection module configured to detect the target part in the subsequent frame in response to the probability being less than the predetermined threshold; and
    an area determination module configured to determine, based on a result of the detection, a subsequent detection area for detecting the target part in the subsequent frame.
  11. The apparatus according to claim 10, wherein the target part detection module comprises:
    a subsequent frame application module configured to apply the subsequent frame to an area determination model to determine the subsequent detection area of the target part, the area determination model being trained based on reference frames and pre-labeled reference detection areas.
  12. The apparatus according to claim 9, wherein the probability determination module comprises:
    a current detection area application module configured to apply the current detection area to a probability determination model to determine the probability that the target part is located within the current detection area, the probability determination model being trained based on reference detection areas in reference frames and pre-labeled reference probabilities.
  13. The apparatus according to claim 9, wherein the current detection area determination module comprises:
    a previous detection area application module configured to apply the previous detection area to a position prediction model to determine the current detection area, the position prediction model being at least one of the following:
    a Kalman filter;
    a Wiener filter; and
    a strong tracking filter.
  14. The apparatus according to claim 9, wherein the target part is at least one of the face, eyes, and fingerprints of the object.
  15. The apparatus according to claim 9, further comprising:
    a key point determination module configured to determine key points of the target part based on the current detection area.
  16. The apparatus according to claim 9, wherein the key point determination module comprises:
    a current detection area application module configured to apply the current detection area to a key point determination model to determine the key points of the target part, the key point determination model being trained based on reference detection areas in reference frames and pre-labeled reference key points.
  17. An electronic device, comprising:
    one or more processors; and
    a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
  18. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-8.
  19. A target part tracking system, comprising:
    a video capture module configured to provide a video associated with a target part of an object;
    a calculation module communicatively connected with the video capture module, the calculation module being configured to implement the method according to any one of claims 1-8; and
    an output display module configured to display a processing result of the calculation module.
PCT/CN2020/120965 2020-05-15 2020-10-14 Target part tracking method and apparatus, electronic device and readable storage medium WO2021227351A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2022554423A JP2023516480A (ja) 2020-05-15 2020-10-14 対象部位追跡方法、装置、電子機器及び読み取り可能な記憶媒体
EP20935079.2A EP4152258A4 (en) 2020-05-15 2020-10-14 TARGET PART TRACKING METHOD AND APPARATUS, ELECTRONIC DEVICE AND READABLE STORAGE MEDIUM
US17/925,527 US20230196587A1 (en) 2020-05-15 2020-10-14 Method and system for tracking target part, and electronic device
KR1020227043801A KR20230003346A (ko) 2020-05-15 2020-10-14 타겟 부위 추적 방법, 장치, 전자 기기 및 판독 가능 저장 매체

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010415394.2 2020-05-15
CN202010415394.2A CN111627046A (zh) 2020-05-15 2020-05-15 目标部位跟踪方法、装置、电子设备和可读存储介质

Publications (1)

Publication Number Publication Date
WO2021227351A1 true WO2021227351A1 (zh) 2021-11-18

Family

ID=72259799

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/120965 WO2021227351A1 (zh) 2020-05-15 2020-10-14 Target part tracking method and apparatus, electronic device and readable storage medium

Country Status (6)

Country Link
US (1) US20230196587A1 (zh)
EP (1) EP4152258A4 (zh)
JP (1) JP2023516480A (zh)
KR (1) KR20230003346A (zh)
CN (1) CN111627046A (zh)
WO (1) WO2021227351A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627046A (zh) * 2020-05-15 2020-09-04 北京百度网讯科技有限公司 目标部位跟踪方法、装置、电子设备和可读存储介质
CN112541418B (zh) * 2020-12-04 2024-05-28 北京百度网讯科技有限公司 用于图像处理的方法、装置、设备、介质和程序产品
CN112950672B (zh) * 2021-03-03 2023-09-19 百度在线网络技术(北京)有限公司 确定关键点的位置的方法、装置和电子设备
CN115147264A (zh) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 图像处理方法、装置、电子设备及计算机可读存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109671103A (zh) * 2018-12-12 2019-04-23 易视腾科技股份有限公司 目标跟踪方法及装置
US20190191098A1 (en) * 2017-12-19 2019-06-20 Fujitsu Limited Object tracking apparatus, object tracking method, and non-transitory computer-readable storage medium for storing program
CN110490899A (zh) * 2019-07-11 2019-11-22 东南大学 一种结合目标跟踪的可变形施工机械的实时检测方法
CN110738687A (zh) * 2019-10-18 2020-01-31 上海眼控科技股份有限公司 对象跟踪方法、装置、设备及存储介质
CN110866428A (zh) * 2018-08-28 2020-03-06 杭州海康威视数字技术股份有限公司 目标跟踪方法、装置、电子设备及存储介质
CN111627046A (zh) * 2020-05-15 2020-09-04 北京百度网讯科技有限公司 目标部位跟踪方法、装置、电子设备和可读存储介质

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9295372B2 (en) * 2013-09-18 2016-03-29 Cerner Innovation, Inc. Marking and tracking an area of interest during endoscopy
CN103985137B (zh) * 2014-04-25 2017-04-05 深港产学研基地 应用于人机交互的运动物体跟踪方法及系统
CN104008371B (zh) * 2014-05-22 2017-02-15 南京邮电大学 一种基于多摄像机的区域可疑目标跟踪与识别方法
CN105488811B (zh) * 2015-11-23 2018-06-12 华中科技大学 一种基于深度梯度的目标跟踪方法与系统
CN105741316B (zh) * 2016-01-20 2018-10-16 西北工业大学 基于深度学习和多尺度相关滤波的鲁棒目标跟踪方法
CN106570490B (zh) * 2016-11-15 2019-07-16 华南理工大学 一种基于快速聚类的行人实时跟踪方法
CN106846362B (zh) * 2016-12-26 2020-07-24 歌尔科技有限公司 一种目标检测跟踪方法和装置
CN107274433B (zh) * 2017-06-21 2020-04-03 吉林大学 基于深度学习的目标跟踪方法、装置及存储介质
CN109063581A (zh) * 2017-10-20 2018-12-21 奥瞳系统科技有限公司 用于有限资源嵌入式视觉系统的增强型人脸检测和人脸跟踪方法和系统
US10510157B2 (en) * 2017-10-28 2019-12-17 Altumview Systems Inc. Method and apparatus for real-time face-tracking and face-pose-selection on embedded vision systems
CN108154159B (zh) * 2017-12-25 2018-12-18 北京航空航天大学 一种基于多级检测器的具有自恢复能力的目标跟踪方法
CN108921879A (zh) * 2018-05-16 2018-11-30 中国地质大学(武汉) 基于区域选择的CNN和Kalman滤波的运动目标跟踪方法及系统
CN108765455B (zh) * 2018-05-24 2021-09-21 中国科学院光电技术研究所 一种基于tld算法的目标稳定跟踪方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190191098A1 (en) * 2017-12-19 2019-06-20 Fujitsu Limited Object tracking apparatus, object tracking method, and non-transitory computer-readable storage medium for storing program
CN110866428A (zh) * 2018-08-28 2020-03-06 杭州海康威视数字技术股份有限公司 目标跟踪方法、装置、电子设备及存储介质
CN109671103A (zh) * 2018-12-12 2019-04-23 易视腾科技股份有限公司 目标跟踪方法及装置
CN110490899A (zh) * 2019-07-11 2019-11-22 东南大学 一种结合目标跟踪的可变形施工机械的实时检测方法
CN110738687A (zh) * 2019-10-18 2020-01-31 上海眼控科技股份有限公司 对象跟踪方法、装置、设备及存储介质
CN111627046A (zh) * 2020-05-15 2020-09-04 北京百度网讯科技有限公司 目标部位跟踪方法、装置、电子设备和可读存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4152258A4

Also Published As

Publication number Publication date
CN111627046A (zh) 2020-09-04
EP4152258A4 (en) 2024-03-20
JP2023516480A (ja) 2023-04-19
KR20230003346A (ko) 2023-01-05
EP4152258A1 (en) 2023-03-22
US20230196587A1 (en) 2023-06-22

Similar Documents

Publication Publication Date Title
WO2021227351A1 (zh) 目标部位跟踪方法、装置、电子设备和可读存储介质
CN108875833B (zh) 神经网络的训练方法、人脸识别方法及装置
CN108470332B (zh) 一种多目标跟踪方法及装置
US10789482B2 (en) On-line action detection using recurrent neural network
CN108446585B (zh) 目标跟踪方法、装置、计算机设备和存储介质
WO2018133666A1 (zh) 视频目标跟踪方法和装置
US9001199B2 (en) System and method for human detection and counting using background modeling, HOG and Haar features
WO2017185688A1 (zh) 一种在线目标跟踪方法及装置
CN109165589B (zh) 基于深度学习的车辆重识别方法和装置
CN114972418B (zh) 基于核自适应滤波与yolox检测结合的机动多目标跟踪方法
US20220180534A1 (en) Pedestrian tracking method, computing device, pedestrian tracking system and storage medium
EP2131328A2 (en) Method for automatic detection and tracking of multiple objects
Rezaee et al. An autonomous UAV-assisted distance-aware crowd sensing platform using deep ShuffleNet transfer learning
CN110991261A (zh) 交互行为识别方法、装置、计算机设备和存储介质
US11380010B2 (en) Image processing device, image processing method, and image processing program
CN113420682A (zh) 车路协同中目标检测方法、装置和路侧设备
CN113313763A (zh) 一种基于神经网络的单目相机位姿优化方法及装置
WO2018227491A1 (zh) 视频多目标模糊数据关联方法及装置
CA3136997A1 (en) RE-IDENTIFICATION METHOD FOR CREATING A DATABASE, APPARATUS, COMPUTER DEVICE AND STORAGE MEDIA
CN113780145A (zh) 精子形态检测方法、装置、计算机设备和存储介质
CN110866428A (zh) 目标跟踪方法、装置、电子设备及存储介质
CN114627339B (zh) 茂密丛林区域对越境人员的智能识别跟踪方法及存储介质
CN117372928A (zh) 一种视频目标检测方法、装置及相关设备
CN116959216A (zh) 一种实验操作监测预警方法、装置和系统
CN111401112B (zh) 人脸识别方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20935079

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022554423

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20227043801

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020935079

Country of ref document: EP

Effective date: 20221215