WO2020019353A1 - Tracking control method, device, and computer-readable storage medium - Google Patents
Tracking control method, device, and computer-readable storage medium
- Publication number
- WO2020019353A1 (PCT/CN2018/097667)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- input image
- frame
- tracking
- target object
- tracking frame
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/167—Detection; Localisation; Normalisation using comparisons between temporally consecutive images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/24765—Rule-based classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
Definitions
- the present invention relates to the field of electronic information technology, and in particular, to a tracking control method, device, and computer-readable storage medium.
- Existing face detection methods may include cascaded-classifier detection methods, DPM (Deformable Parts Models) detection methods, and so on; however, the reliability and accuracy of these methods are relatively poor.
- With the rise of the CNN (Convolutional Neural Network), CNN-based face detection methods are therefore being tried more and more.
- A CNN-based face detection method is usually trained and run on a server with a high-performance GPU (Graphics Processing Unit) and a high-performance CPU (Central Processing Unit).
- The trained network suffers from high network complexity, a large number of layers and parameters, and large memory overhead, which makes the computation process complicated and prevents real-time detection.
- The invention provides a tracking control method, a device, and a computer-readable storage medium, which can improve the accuracy and reliability of face detection, reduce network complexity and computation, and achieve real-time detection.
- A tracking control method includes: acquiring an input image sequence; detecting, based on a detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including a target object; and, based on a tracking algorithm, tracking the target object in multiple frames of input images following the one frame of input image according to the tracking frame of the target object.
- A tracking control device may include a memory and a processor, wherein the memory is used to store program code and the processor is used to call the program code.
- When the program code is executed, it is used to perform the following operations: acquiring an input image sequence; detecting, based on a detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including a target object; and, based on a tracking algorithm, tracking the target object in multiple frames of input images following the one frame of input image according to the tracking frame of the target object.
- A computer-readable storage medium stores computer instructions; when the computer instructions are executed, the above tracking control method is implemented, thereby implementing the embodiments of the present invention.
- With the embodiments of the present invention, the accuracy and reliability of face detection can be improved, network complexity and computation can be reduced, real-time detection can be achieved, multi-face detection can be realized, and read/write overhead and CPU overhead can be reduced. The detection algorithm does not need to be called frequently, thereby reducing the frequency of network calls, solving the problem of excessive power consumption, and avoiding the poor real-time performance that results from relying entirely on the detection algorithm.
- FIG. 1 is a schematic flowchart of a tracking control method
- FIG. 2 is a schematic diagram of a simplified MTCNN
- FIG. 3A is a schematic diagram of a state machine and a synchronization mechanism of a detection algorithm
- FIG. 3B is a schematic diagram of a state machine and a synchronization mechanism of the tracking algorithm
- FIG. 4 is a schematic structural diagram of a tracking control device.
- Although the terms first, second, third, etc. may be used in the present invention to describe various information, the information should not be limited by these terms; these terms are only used to distinguish information of the same type from each other.
- For example, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information.
- Depending on the context, the word "if" may be interpreted as "when", "while", or "in response to determining".
- An embodiment of the present invention proposes a tracking control method, which can be applied to a tracking control device, such as a movable platform.
- The movable platform may include, but is not limited to, drones and ground robots (e.g., unmanned vehicles), and so on.
- The movable platform can be equipped with a shooting device (such as a camera, a video camera, etc.), and captured images can be acquired through the shooting device.
- The movable platform can also be equipped with a gimbal (pan/tilt), which can carry the shooting device to stabilize and/or adjust it.
- the method may include:
- Step 101 Obtain an input image sequence.
- the input image sequence may include multiple frames of input images.
- the input image sequence may be an input image of consecutive frames in the video data.
- the execution subject of the method may be a mobile platform, such as a processor of a mobile platform, the processor may be one or more, and the processor may be a general-purpose processor or a special-purpose processor.
- the mobile platform can be equipped with a shooting device.
- The shooting device can shoot the target object to obtain captured images.
- The processor of the movable platform can obtain the captured images.
- each captured image is an input image frame, and a set of input images of multiple frames may be used as an input image sequence.
- the target object may specifically be an object tracked by a movable platform.
- the input image may include at least one target object, and the target object includes a human face.
- Step 102 Based on the detection algorithm, detect a frame of the input image in the input image sequence to obtain a tracking frame including a target object.
- Here, based on the detection algorithm, only one frame of input image in the input image sequence (such as the first frame of the sequence) is detected, rather than detecting every frame of input image in the sequence based on the detection algorithm.
- Detecting one frame of input image in the input image sequence to obtain a tracking frame including the target object may include, but is not limited to, detecting one frame of input image in the input image sequence by a specific CNN detection algorithm to obtain a tracking frame including the target object; the specific CNN detection algorithm may include, but is not limited to, a weak classifier.
- For example, the specific CNN detection algorithm may be an MTCNN (Multi-Task Convolutional Neural Network) detection algorithm that includes pnet and rnet but does not include onet.
- The specific CNN detection algorithm may include at least one weak classifier, and different weak classifiers may have the same or different filtering strategies; the filtering strategies may include, but are not limited to, a morphological filtering strategy and/or a skin color filtering strategy. That is, a weak classifier can use a morphological filtering strategy for filtering, or a skin color filtering strategy for filtering.
- the weak classifier can be deployed at any level of a particular CNN detection algorithm.
- Detecting one frame of input image in the input image sequence by the specific CNN detection algorithm to obtain a tracking frame including the target object may include, but is not limited to, the following: for a tracking frame input to the weak classifier of the specific CNN detection algorithm, the weak classifier is used to detect whether the tracking frame meets the filtering policy; if the tracking frame does not meet the filtering policy, the tracking frame can be output to the next-level network of the specific CNN detection algorithm; if it meets the filtering policy, the tracking frame can be filtered out.
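- As an illustration of such a weak classifier, the following minimal Python sketch applies a morphological policy (face-like aspect ratio) and a skin color policy to each candidate tracking frame; the box format, the thresholds, and the simple RGB skin-tone rule are assumptions made for this sketch only and are not prescribed by the embodiment.

```python
import numpy as np

def skin_ratio(image_rgb, box):
    """Fraction of pixels inside the box that fall in a crude RGB skin-tone range."""
    x1, y1, x2, y2 = [int(v) for v in box]
    patch = image_rgb[y1:y2, x1:x2]
    if patch.size == 0:
        return 0.0
    r = patch[..., 0].astype(int)
    g = patch[..., 1].astype(int)
    b = patch[..., 2].astype(int)
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & ((r - np.minimum(g, b)) > 15)
    return float(skin.mean())

def weak_classifier(image_rgb, boxes, min_aspect=0.5, max_aspect=2.0, min_skin=0.3):
    """Keep only the candidate boxes that do NOT trigger the filtering policy.

    A box is filtered out when its shape is clearly not face-like (morphological
    policy) or when it contains too few skin-colored pixels (skin color policy);
    the surviving boxes are passed on to the next-level network.
    """
    kept = []
    for box in boxes:
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue
        if not (min_aspect <= w / h <= max_aspect):
            continue                                  # morphological policy: drop the box
        if skin_ratio(image_rgb, box) < min_skin:
            continue                                  # skin color policy: drop the box
        kept.append(box)                              # box does not meet the policy: keep it
    return kept
```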
- Detecting one frame of input image in the input image sequence by the specific CNN detection algorithm to obtain a tracking frame including the target object may also include, but is not limited to, converting the input image and the network parameters into fixed-point data (rather than floating-point data), and processing the converted fixed-point data by the specific CNN detection algorithm to obtain a tracking frame that includes the target object.
- Alternatively, the specific CNN detection algorithm can be implemented by a fixed-point network (such as a fixed-point MTCNN network), where the input image and the network parameters in the fixed-point network are both fixed-point data; based on this, detecting one frame of input image in the input image sequence by the specific CNN detection algorithm to obtain a tracking frame including the target object may further include, but is not limited to, processing the fixed-point data in the fixed-point network through the specific CNN detection algorithm to obtain a tracking frame including the target object.
- Before one frame of input image in the input image sequence is detected by the specific CNN detection algorithm to obtain a tracking frame including the target object, the one frame of input image in the input image sequence may be pre-processed to obtain a pre-processed input image; then, the pre-processed input image is processed by the specific CNN detection algorithm to obtain a tracking frame including the target object.
- The pre-processing may include, but is not limited to, a compressed sensing process and/or a skin color detection process.
- Detecting one frame of input image in the input image sequence by the specific CNN detection algorithm to obtain a tracking frame including the target object may also include, but is not limited to, using time-domain information to predict a reference area of the target object, and then detecting the reference area of the one frame of input image in the input image sequence by the specific CNN detection algorithm to obtain a tracking frame including the target object.
- Step 103: Based on the tracking algorithm, the target object is tracked in the multiple frames of input images following the one frame of input image (i.e., the frame that was detected), according to the tracking frame of the target object (i.e., the tracking frame obtained in step 102).
- That is, the target object is tracked in each of the input images in the multiple frames of input images following the one frame of input image. In other words, every certain number of input images, one frame of input image is detected (step 102) and the multiple frames of input images following that frame are tracked (step 103).
- For example, step 102 is used to detect the first frame of input image, and then step 103 is used to track the 2nd through 10th frames of input images; then, step 102 is used to detect the 11th frame of input image, and step 103 is used to track the 12th through 20th frames; and so on, steps 102 to 103 are continuously repeated to complete the tracking control, as in the sketch below.
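- The alternation between step 102 and step 103 can be pictured with the following schematic Python loop; `detect`, `init_tracker`, and `track` are hypothetical placeholders for the detection and tracking algorithms described in this embodiment, and the period of 10 frames simply mirrors the example above.

```python
DETECTION_PERIOD = 10  # detect frame 1, 11, 21, ...; track the frames in between

def run_tracking_control(input_image_sequence, detect, init_tracker, track):
    """Schematic detect-then-track loop: step 102 once per period, step 103 on the rest."""
    tracker_state = None
    results = []
    for index, image in enumerate(input_image_sequence):
        if index % DETECTION_PERIOD == 0:
            tracking_frames = detect(image)                  # step 102: detection algorithm
            tracker_state = init_tracker(image, tracking_frames)
        else:
            tracking_frames, tracker_state = track(image, tracker_state)  # step 103: tracking
        results.append(tracking_frames)
    return results
```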
- Tracking the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object may include, but is not limited to: obtaining the tracking frame of the target object and a spatial context model of the target object based on the previous frame of input image, where the spatial context model is used to indicate the spatial correlation between the target object and the surrounding image area in the previous frame of input image; then, based on the spatial context model, determining the target object at the position corresponding to the tracking frame and in the surrounding area in the current frame of input image.
- The features of the spatial context model may include, but are not limited to, one or any combination of the following: grayscale features, HOG (Histogram of Oriented Gradients) features, moment features, and SIFT (Scale-Invariant Feature Transform) features.
- Tracking the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object may also include, but is not limited to: predicting the reference area of the target object through Kalman filtering; and, based on the tracking algorithm, tracking the target object according to the reference area of the tracking frame of the target object in the multiple frames of input images following the one frame of input image.
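- As one possible illustration of predicting the reference area with Kalman filtering, the constant-velocity sketch below tracks the center of the tracking frame and expands the predicted center into a search region; the state layout, noise levels, and the expansion margin are assumptions made for this sketch, not values specified by the embodiment.

```python
import numpy as np

class CenterKalmanPredictor:
    """Constant-velocity Kalman filter over the tracking-frame center (cx, cy, vx, vy)."""

    def __init__(self, cx, cy, process_var=1.0, measure_var=10.0):
        self.x = np.array([cx, cy, 0.0, 0.0], dtype=float)   # state vector
        self.P = np.eye(4) * 100.0                            # state covariance
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)        # constant-velocity transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)        # only the center is observed
        self.Q = np.eye(4) * process_var
        self.R = np.eye(2) * measure_var

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                                     # predicted center

    def update(self, cx, cy):
        z = np.array([cx, cy], dtype=float)
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def reference_area(predictor, box_w, box_h, margin=1.5):
    """Expand the predicted center into a search region around the expected tracking frame."""
    cx, cy = predictor.predict()
    w, h = box_w * margin, box_h * margin
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```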
- The detection algorithm may be implemented by a first thread and the tracking algorithm may be implemented by a second thread; that is, the detection algorithm and the tracking algorithm are implemented by different threads.
- Tracking the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object may include, but is not limited to: outputting the tracking frame of the target object from the first thread to the second thread; then, tracking the target object, through the second thread, in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object.
- The first thread can also stop detecting the multiple frames of input images following the one frame of input image; that is, it no longer detects the multiple frames of input images following the one frame of input image.
- Tracking the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object may also include, but is not limited to: after the tracking frame including the target object is obtained through the first thread, starting the second thread; after the second thread is started, tracking the target object through the second thread in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object, that is, tracking the target object in each frame of the multiple frames of input images.
- For the first state machine: when the detection algorithm is turned on and the current input image is the first frame of input image in the input image sequence, the first state machine can be set to the startup state by the first thread; when the first state machine is in the startup state, the input image can be detected by the first thread.
- When the detection algorithm is turned on and the current input image is not the first frame of input image in the input image sequence, the first state machine can be set to the idle state by the first thread; when the first state machine is in the idle state, the detection of the input image by the first thread is stopped.
- When the detection algorithm is turned off, the first state machine is set to the closed state by the first thread; when the first state machine is in the closed state, the detection of the input image by the first thread is stopped.
- For the second state machine: when the tracking algorithm is turned on, the second state machine is set to the startup state by the second thread; when the second state machine is in the startup state, the input image is tracked by the second thread.
- When the tracking algorithm is turned off, the second state machine is set to the closed state by the second thread; when the second state machine is in the closed state, the tracking of the input image by the second thread is stopped.
- A first tracking frame in a first input image and a second tracking frame in a second input image may be used to determine a target tracking frame of the target object, wherein the first tracking frame is a tracking frame including the target object obtained in the first input image based on the detection algorithm, and the second tracking frame is a tracking frame obtained when the target object is tracked in the second input image based on the tracking algorithm; then, based on the tracking algorithm, the target object is tracked according to the target tracking frame.
- Using the first tracking frame in the first input image and the second tracking frame in the second input image to determine the target tracking frame of the target object may include, but is not limited to: calculating the degree of overlap between the first tracking frame and the second tracking frame, and determining the target tracking frame of the target object according to the degree of overlap.
- Determining the target tracking frame of the target object according to the degree of overlap may include, but is not limited to: if the degree of overlap is greater than or equal to a preset threshold (which can be configured according to experience), the second tracking frame may be determined as the target tracking frame; if the degree of overlap is less than the preset threshold, the first tracking frame may be determined as the target tracking frame.
- The degree of overlap includes the IoU (Intersection over Union) between the first tracking frame and the second tracking frame, as sketched below.
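- A minimal sketch of the overlap computation and the threshold rule described above is given below; the default threshold of 0.5 is an arbitrary illustrative value standing in for the empirically configured preset threshold.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def fuse_tracking_frames(first_frame, second_frame, threshold=0.5):
    """first_frame: box from the detection algorithm; second_frame: box from the tracking algorithm.

    A large overlap means the tracker has not drifted, so the second frame is kept;
    otherwise the detection result replaces it as the target tracking frame.
    """
    return second_frame if iou(first_frame, second_frame) >= threshold else first_frame
```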
- In this way, the accuracy and reliability of face detection can be improved, network complexity and computation can be reduced, real-time detection can be achieved, multi-face detection can be realized, and read/write overhead and CPU overhead can be reduced. It is not necessary to call the detection algorithm frequently, thereby reducing the frequency of network calls, solving the problem of excessive power consumption, and avoiding the poor real-time performance that results from relying entirely on the detection algorithm.
- In step 102, one frame of input image in the input image sequence can be detected to obtain tracking frames including target objects.
- The input image can include multiple target objects; that is, step 102 can obtain multiple tracking frames.
- For example, an input image sequence 1 is collected first, and input image sequence 1 includes input image 1 through input image 10; then an input image sequence 2 is collected, and input image sequence 2 includes input image 11 through input image 20; and so on.
- Each input image sequence includes 10 frames of input images, and each frame of input image can include a target object.
- a first frame input image (such as the input image 1) in the input image sequence 1 may be detected based on a detection algorithm to obtain a tracking frame including a target object. Then, the input image 2-the input image 10 in the input image sequence 1 are no longer detected.
- the input image 11 in the input image sequence 2 may be detected based on a detection algorithm to obtain a tracking frame including a target object. Then, the input image 12-input image 20 in the input image sequence 2 is no longer detected, and so on.
- the MTCNN detection algorithm may be used to detect the input image 1 to obtain a tracking frame including a target object.
- MTCNN can use cascaded networks for face detection.
- Traditional MTCNN includes 3 networks with increasing complexity: pnet, rnet, and onet.
- The implementation process can include: after the input image is preprocessed, the preprocessed input image is output to pnet, and the input image is processed in pnet to obtain multiple candidate frames; these candidate frames are referred to as the first type of candidate frames.
- The local NMS (Non-Maximum Suppression) method is used to process the first type of candidate frames to obtain the second type of candidate frames, which includes some of the first type of candidate frames.
- The second type of candidate frames is output to rnet, and the second type of candidate frames is processed in rnet to obtain the third type of candidate frames.
- The third type of candidate frames is processed by the local NMS method to obtain the fourth type of candidate frames, which includes some of the third type of candidate frames.
- The fourth type of candidate frames is output to onet, and the fourth type of candidate frames is processed in onet to obtain the fifth type of candidate frames.
- The fifth type of candidate frames is processed by the local NMS method to obtain the sixth type of candidate frames, which includes some of the fifth type of candidate frames.
- The sixth type of candidate frames may be the tracking frames of each face finally obtained.
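- The local NMS step that prunes each class of candidate frames can be sketched as follows; the candidate scores and the 0.7 overlap threshold are illustrative assumptions, since the embodiment does not prescribe particular values.

```python
def local_nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop heavy overlaps."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return [boxes[i] for i in kept]
```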
- In the above MTCNN, onet is the network with the highest complexity: its operation speed is slow, and its read/write overhead and CPU overhead are large. As a result, this MTCNN cannot run directly on embedded devices.
- In this embodiment, the MTCNN is an MTCNN that includes pnet and rnet but does not include onet.
- The MTCNN may include one or more weak classifiers, and each weak classifier may be deployed on any level of the MTCNN network.
- MTCNN can include pnet, local NMS (hereinafter referred to as the first local NMS), rnet, and local NMS (hereinafter referred to as the second local NMS) in this order.
- a weak classifier can be deployed before pnet. That is, MTCNN may include a weak classifier, pnet, a first local NMS, rnet, and a second local NMS in this order.
- a weak classifier may be deployed between the pnet and the first local NMS, that is, the MTCNN may include pnet, the weak classifier, the first local NMS, rnet, and the second local NMS in this order.
- a weak classifier can also be deployed between rnet and the second local NMS, that is, MTCNN can include pnet, the first local NMS, rnet, the weak classifier, and the second local NMS in this order. There is no restriction on this.
- In other words, the weak classifier can be deployed at any level of the MTCNN network.
- The above takes one weak classifier as an example.
- Alternatively, multiple weak classifiers can be deployed, each on any level of the MTCNN network.
- For example, if weak classifier 1 is deployed before pnet and weak classifier 2 is deployed between rnet and the second local NMS, then the MTCNN includes, in order, weak classifier 1, pnet, the first local NMS, rnet, weak classifier 2, and the second local NMS.
- Each weak classifier can be deployed at any level of the MTCNN network.
- the weak classifier is used to filter the tracking box (that is, the above-mentioned candidate box, hereinafter referred to as a candidate box) according to the filtering strategy, and different weak classifiers may have the same or different filtering strategies.
- The filtering strategy may include, but is not limited to, a morphological filtering strategy and/or a skin color filtering strategy; that is, a weak classifier may filter the input candidate frames according to the morphological filtering strategy, or filter the input candidate frames according to the skin color filtering strategy.
- the implementation process may include: after the input image 1 is pre-processed, the pre-processed input image 1 is output to the pnet, and the input image is processed in the pnet to obtain the first type of candidate frame.
- the first type of candidate frame is processed by the first local NMS method to obtain a second type of candidate frame, and the second type of candidate frame includes some candidate frames of the first type of candidate frame.
- the second type candidate frame is output to rnet, and the second type candidate frame is processed in rnet to obtain the third type candidate frame.
- The third type of candidate frames is output to the weak classifier.
- The weak classifier detects whether each candidate frame meets the filtering policy; if a candidate frame does not meet the filtering policy, it can be kept as a fourth type of candidate frame, and if it meets the filtering policy, it can be filtered out. In this way, all candidate frames that do not meet the filtering policy are used as the fourth type of candidate frames, and the fourth type of candidate frames is then output to the next-level network of the MTCNN, that is, to the second local NMS.
- the fourth type of candidate frame is processed by the second local NMS method to obtain a fifth type of candidate frame, and the fifth type of candidate frame includes a part of the fourth type of candidate frame.
- the fifth type of candidate frame is no longer output to onet, and each candidate frame of the fifth type of candidate frame is a tracking frame including the target object.
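- Putting the pieces together, the simplified cascade of FIG. 2 can be summarized by the sketch below; `pnet`, `rnet`, `local_nms`, and `weak_classifier` stand for the networks and steps described above, are assumed to be supplied by the caller, and their signatures are simplified for illustration.

```python
def simplified_mtcnn(preprocessed_image, pnet, rnet, local_nms, weak_classifier):
    """Simplified MTCNN: pnet -> local NMS -> rnet -> weak classifier -> local NMS (no onet)."""
    first = pnet(preprocessed_image)                     # first type of candidate frames
    second = local_nms(first)                            # second type (subset of the first)
    third = rnet(preprocessed_image, second)             # third type of candidate frames
    fourth = weak_classifier(preprocessed_image, third)  # fourth type: frames the policy keeps
    fifth = local_nms(fourth)                            # fifth type: the final tracking frames
    return fifth
```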
- the input image 1 may be pre-processed to obtain the pre-processed input image 1 and the pre-processed input image 1 is output to the pnet.
- the preprocessing may include, but is not limited to, a compressed sensing process and / or a skin color detection process.
- For example, regions where a human face may exist can be screened out of input image 1, and those regions can be output to pnet as the preprocessed input image 1; a sketch of such a skin-color-based screening step follows.
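- A possible sketch of such skin-color-based screening is shown below: a crude RGB skin mask is computed and the padded bounding box of skin-like pixels is cropped out as the region passed to pnet; the color rule and the padding are illustrative assumptions.

```python
import numpy as np

def skin_color_preprocess(image_rgb, pad=16):
    """Crop the input image to the padded bounding box of skin-like pixels (a possible face region)."""
    r = image_rgb[..., 0].astype(int)
    g = image_rgb[..., 1].astype(int)
    b = image_rgb[..., 2].astype(int)
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & ((r - np.minimum(g, b)) > 15)
    ys, xs = np.nonzero(skin)
    if len(xs) == 0:
        return image_rgb                       # no skin-like pixels: fall back to the full image
    h, w = skin.shape
    x1, x2 = max(xs.min() - pad, 0), min(xs.max() + pad, w)
    y1, y2 = max(ys.min() - pad, 0), min(ys.max() + pad, h)
    return image_rgb[y1:y2, x1:x2]
```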
- In this way, the three cascaded networks (pnet, rnet, and onet) are simplified into two cascaded networks (pnet and rnet), thereby reducing the complexity of the MTCNN.
- Moreover, because morphological filtering and/or skin color filtering are performed on the candidate frames through the weak classifier to eliminate candidate frames that are obviously not faces, the MTCNN still maintains a good detection rate and accuracy rate.
- FIG. 2 is a schematic diagram of a simplified MTCNN.
- morphological filtering and skin color filtering are performed in a weak classifier.
- Time-domain information can be used to predict the reference area of the target object (that is, to predict the area where the face may be at the next detection); this prediction method is not limited.
- Then, the reference area in the input image 11 may be detected to obtain the tracking frame including the target object.
- That is, the reference area of input image 11, rather than the whole input image 11, is input to the MTCNN, thereby reducing the image content input to the MTCNN and improving the processing speed.
- In addition, all the data may be subjected to fixed-point processing; that is, the input image and the network parameters (i.e., the parameters in the MTCNN) are converted into fixed-point data, for example through floating-point-to-fixed-point operations (there is no restriction on this operation method).
- Alternatively, a fixed-point MTCNN network is retrained, in which the input images and the network parameters are both fixed-point data; in this way, the MTCNN processes fixed-point data directly, all data is fixed-point data, and no additional data conversion is required. A possible conversion is sketched below.
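- One common way to realize such a floating-point-to-fixed-point conversion is symmetric linear quantization to 8-bit integers, sketched below; the bit width and per-tensor scaling are assumptions for illustration, since the embodiment does not prescribe a particular fixed-point format.

```python
import numpy as np

def to_fixed_point(tensor, num_bits=8):
    """Quantize a float tensor to signed fixed-point integers plus a per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(tensor)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(tensor / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def fixed_point_dot(q_input, in_scale, q_weight, w_scale):
    """Integer-only multiply-accumulate whose result is rescaled back to a real value."""
    acc = (q_input.astype(np.int32) * q_weight.astype(np.int32)).sum()
    return acc * (in_scale * w_scale)
```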
- In step 103, the target object may be tracked, based on the tracking algorithm, in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object.
- For example, an input image sequence 1 is collected first, and input image sequence 1 includes input image 1 through input image 10; then an input image sequence 2 is collected, and input image sequence 2 includes input image 11 through input image 20; and so on.
- Each input image sequence includes 10 frames of input images, and each frame of input image can include a target object.
- If the tracking frame of the target object in input image 1 is obtained in step 102, then in step 103, based on the tracking algorithm and according to the tracking frame of the target object in input image 1, the target object is tracked in input image 2 through input image 10 of input image sequence 1.
- If the tracking frame of the target object in input image 11 is obtained in step 102, then in step 103, based on the tracking algorithm and according to the tracking frame of the target object in input image 11, the target object is tracked in input image 12 through input image 20 of input image sequence 2, and so on.
- Specifically, an STC (Spatio-Temporal Context) visual tracking algorithm may be used to track the target object.
- Based on the previous frame of input image, the tracking frame of the target object (that is, the tracking frame obtained in step 102) and a spatial context model of the target object may be obtained, where the spatial context model is used to indicate the spatial correlation between the target object and the surrounding image area in the previous frame of input image.
- The STC tracking algorithm is an object tracking algorithm based on the spatio-temporal context.
- The spatio-temporal relationship between the target to be tracked and its local context area can be modeled through a Bayesian framework, and the statistical correlation between the target and the features of its surrounding area is obtained. Then, this spatio-temporal relationship is combined with the focus-of-attention characteristics of the biological vision system to evaluate a confidence map of the target position in the new frame of image; the position with the highest confidence is the target position in the new frame of image.
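- A heavily simplified sketch of an STC-style tracking step is given below: the spatial context model is treated as a 2D kernel that is combined (via FFTs) with a context prior built from the image intensity weighted by a Gaussian around the previous position, and the peak of the resulting confidence map gives the new target position. The scale handling and the model update of the full STC algorithm are omitted, and the details shown are assumptions for illustration.

```python
import numpy as np

def context_prior(gray_patch, sigma):
    """Context prior: image intensity weighted by a Gaussian centered on the previous position."""
    h, w = gray_patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    weight = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    return gray_patch * weight

def stc_track_step(spatial_context_model, gray_patch, sigma=20.0):
    """Return the (row, col) of the confidence peak inside the search patch.

    The spatial context model is assumed to be a 2D kernel of the same size as the
    search patch; the confidence map is its circular convolution with the context prior.
    """
    prior = context_prior(gray_patch, sigma)
    conf = np.real(np.fft.ifft2(np.fft.fft2(spatial_context_model) * np.fft.fft2(prior)))
    return np.unravel_index(np.argmax(conf), conf.shape)
```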
- the target object can be tracked, and there is no limitation on the tracking method of this target object.
- the scale transform of the STC tracking algorithm can be simplified to reduce the complexity of the STC tracking algorithm, and there is no limitation on this process.
- the features of the above-mentioned spatial context model may include, but are not limited to, one or any combination of the following: grayscale features, hog features, moment features, and sift features. There are no restrictions on the types of features of the spatial context model.
- In addition, the reference area of the target object (that is, the area where the face may be during the next tracking) can be predicted by Kalman filtering; there is no limitation on this prediction method.
- Then, only the reference area in the next frame of input image is tracked; that is, the STC tracking algorithm tracks the target object in the reference area rather than in all areas of the input image, thereby assisting the STC tracking algorithm in updating the target object position and improving processing speed.
- For example, an input image sequence 1 is collected first, and input image sequence 1 includes input image 1 through input image 10; then an input image sequence 2 is collected, and input image sequence 2 includes input image 11 through input image 20, and so on.
- Each input image sequence can include 10 frames of input images.
- In step 102, after input image sequence 1 is obtained, input image 1 in input image sequence 1 can be detected based on the detection algorithm to obtain a tracking frame A including the target object, while input image 2 through input image 10 are not detected. Then, after input image sequence 2 is obtained, input image 11 in input image sequence 2 can be detected based on the detection algorithm to obtain a tracking frame B including the target object, while input image 12 through input image 20 are not detected, and so on.
- In one approach, based on the tracking algorithm, the target object may be tracked in input image 2 through input image 10 according to tracking frame A; then, based on the tracking algorithm, the target object may be tracked in input image 12 through input image 20 according to tracking frame B, and so on.
- In this approach, the detection result of the detection algorithm (that is, tracking frame B) is used directly and the previous tracking result is no longer considered; that is, when the target object is tracked in input image 12 through input image 20, the tracking result of input image 2 through input image 10 is not considered, and the target object is tracked in input image 12 through input image 20 directly according to tracking frame B. In other words, the tracking process of the target object is independent of the tracking result of input image 2 through input image 10.
- In another approach, based on the tracking algorithm, the target object may be tracked in input image 2 through input image 10 according to tracking frame A; then tracking continues without stopping, that is, every frame of input image is tracked, such as continuing to track the target object in input image 11 through input image 20, and so on.
- In this case, after the detection algorithm obtains tracking frame B and the tracking algorithm obtains a tracking frame C, tracking frame B and tracking frame C are fused to obtain an accurate tracking frame X (tracking frame X may be tracking frame B or tracking frame C); then, based on the tracking algorithm, the target object may be tracked in input image 13 through input image 20 according to tracking frame X, and so on.
- That is, when a tracking frame is obtained based on the detection algorithm, the tracking frame obtained by the detection algorithm and the tracking frame obtained by the tracking algorithm can be fused to obtain an accurate tracking frame.
- The target object is then tracked in the input image based on this fused tracking frame.
- In this approach, the detection result of the detection algorithm (such as tracking frame B) and the tracking result of the tracking algorithm (such as tracking frame C) are both considered; that is, tracking frame B and tracking frame C are fused, and the target object is tracked in the input image according to the fusion result. In other words, the tracking process of the target object is related to the tracking result of the previous input images.
- Specifically, the first tracking frame in the first input image (the tracking frame obtained in the first input image based on the detection algorithm, such as the above-mentioned tracking frame B) and the second tracking frame in the second input image (the tracking frame obtained in the second input image based on the tracking algorithm, such as the above-mentioned tracking frame C) may be used to determine the target tracking frame of the target object; then, the target object can be tracked according to the target tracking frame based on the tracking algorithm, that is, step 103 is performed based on the target tracking frame, and details are not described here again.
- Using the first tracking frame in the first input image and the second tracking frame in the second input image to determine the target tracking frame of the target object may include, but is not limited to: calculating the degree of overlap between the first tracking frame and the second tracking frame (that is, the intersection-over-union ratio IoU, such as the intersection of the first tracking frame and the second tracking frame divided by the union of the first tracking frame and the second tracking frame); if the degree of overlap is greater than or equal to a preset threshold, the second tracking frame may be determined as the target tracking frame; or, if the degree of overlap is less than the preset threshold, the first tracking frame may be determined as the target tracking frame.
- If the degree of overlap between the first tracking frame and the second tracking frame is greater than or equal to the preset threshold, it indicates that the tracking result of the tracking algorithm has not drifted, and the current tracking target is maintained; that is, the second tracking frame is determined as the target tracking frame, and tracking continues according to the second tracking frame.
- If the degree of overlap between the first tracking frame and the second tracking frame is less than the preset threshold, it indicates that the tracking result of the tracking algorithm has drifted, or that a new face has appeared; the current tracking target is therefore discarded, or the tracking target is updated to the new face. That is, the first tracking frame is determined as the target tracking frame, and tracking is restarted according to the first tracking frame.
- the detection algorithm and the tracking algorithm may also be implemented by different threads.
- the detection algorithm may be implemented by a first thread
- the tracking algorithm may be implemented by a second thread.
- The first thread may detect input image 1 in input image sequence 1 based on the detection algorithm to obtain a tracking frame A including the target object, and the first thread does not detect input image 2 through input image 10 in input image sequence 1.
- The first thread may then detect input image 11 in input image sequence 2 based on the detection algorithm to obtain a tracking frame B including the target object, and the first thread does not detect input image 12 through input image 20 in input image sequence 2, and so on.
- After obtaining tracking frame A, the first thread can output the tracking frame A of the target object to the second thread, so that the second thread can track the target object in the input images according to tracking frame A.
- Similarly, after the first thread detects input image 11 in input image sequence 2 and obtains the tracking frame B including the target object, the first thread can output the tracking frame B of the target object to the second thread, so that the second thread can track the target object in the input images according to tracking frame B.
- After the first thread obtains the tracking frame A including the target object, it can also trigger the start of the second thread; after the second thread is started, the second thread can track the target object in input image 2 through input image 10 according to tracking frame A of the target object. Then, the second thread may track the target object in input image 12 through input image 20 according to tracking frame B of the target object, and so on.
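- The two-thread arrangement can be sketched as follows using Python threads and a shared queue in place of shared memory; `detect`, `init_tracker`, and `track` are hypothetical placeholders, and a real implementation would add the state machines and the synchronization mechanism described below.

```python
import queue
import threading

frame_queue = queue.Queue()   # frames handed to the detection thread (one per detection period)
box_queue = queue.Queue()     # tracking frames handed from the first thread to the second thread

def detection_thread(detect):
    """First thread: detect one frame per period and hand the tracking frames to the tracker."""
    while True:
        frame = frame_queue.get()
        if frame is None:                      # sentinel: the detection algorithm is turned off
            box_queue.put(None)
            break
        box_queue.put(detect(frame))           # output the tracking frames to the second thread

def tracking_thread(frames, init_tracker, track):
    """Second thread: track the target object in the frames that follow the detected frame."""
    state = None
    for frame in frames:
        try:
            boxes = box_queue.get_nowait()     # a new detection result is available
            if boxes is None:
                break
            state = init_tracker(frame, boxes)
        except queue.Empty:
            if state is not None:
                _, state = track(frame, state)

# Usage sketch (detect / init_tracker / track supplied by the application):
# threading.Thread(target=detection_thread, args=(detect,)).start()
# threading.Thread(target=tracking_thread, args=(frames, init_tracker, track)).start()
```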
- Referring to FIG. 3A and FIG. 3B, schematic diagrams of the state machines and the synchronization mechanism of the detection algorithm and the tracking algorithm are provided.
- The detection algorithm and the tracking algorithm are placed in different threads; each maintains its own state machine to implement state switching, and state synchronization is achieved through shared memory.
- the detection algorithm is used to locate the frame coordinates of the face from the current input image (ie, the above-mentioned tracking frame), and the tracking algorithm is responsible for tracking the detected face frame.
- When the detection algorithm is turned on and the current input image is the first frame of input image in the input image sequence, the first state machine (that is, the state machine of the detection algorithm) can be set to the startup state by the first thread; when the first state machine is in the startup state, the input image can be detected by the first thread.
- When the detection algorithm is turned on and the current input image is not the first frame of input image in the input image sequence, the first state machine can be set to the idle state by the first thread; when the first state machine is in the idle state, the detection of the input image by the first thread is stopped.
- When the detection algorithm is turned off, the first state machine is set to the closed state by the first thread; when the first state machine is in the closed state, the detection of the input image by the first thread is stopped.
- When the tracking algorithm is turned on, the second state machine (that is, the state machine of the tracking algorithm) can be set to the startup state by the second thread; when the second state machine is in the startup state, the input image is tracked by the second thread.
- When the tracking algorithm is turned off, the second state machine can be set to the closed state by the second thread; when the second state machine is in the closed state, the tracking of the input image by the second thread is stopped.
- In this way, the accuracy and reliability of face detection can be improved, network complexity and computation can be reduced, real-time detection can be achieved, multi-face detection can be realized, and read/write overhead and CPU overhead can be reduced.
- the detection algorithm does not need to be called frequently, thereby reducing the frequency of network calls, solving the problem of excessive power consumption, and avoiding the problem of low real-time performance when relying entirely on the detection algorithm.
- The above method is a fast multi-face detection method in which a detection algorithm and a tracking algorithm are combined; it can achieve real-time multi-face detection, perform face detection quickly, and reach a detection speed of hundreds of frames per second.
- Using the MTCNN detection algorithm to detect faces can improve the accuracy and robustness of face detection, reduce network complexity and computation, reduce read/write overhead and CPU overhead, and reduce the frequency of network calls, thereby reducing power consumption.
- fixed-point conversions can be performed on network parameters and calculation processes, and the accuracy of fixed-point networks is guaranteed.
- the network complexity is reduced, the amount of calculation is reduced, all network operations are converted to fixed-point operations, and good accuracy is retained, so that they can run on embedded devices.
- an STC tracking algorithm with a small memory and CPU overhead is introduced and integrated with the detection algorithm to enable the STC tracking algorithm to perform most face detections, thereby solving the problem of low real-time performance caused by completely relying on the detection algorithm.
- The detection algorithm does not need to be called frequently, thereby solving the problem of excessive power consumption: because the STC tracking algorithm is added, the detection algorithm only plays a corrective role and does not need to be called frequently, so the power consumption on the embedded device is controlled. And because the tracking results of the STC tracking algorithm and the detection results of the detection algorithm are fused, the drift problem of the STC tracking algorithm is controlled.
- an embodiment of the present invention further provides a tracking control device 40 including a memory 41 and a processor 42 (such as one or more processors).
- the memory is used to store program code; the processor is used to call the program code, and when the program code is executed, used to perform the following operations: obtaining an input image sequence; based on a detection algorithm, Detecting a frame of an input image in the input image sequence to obtain a tracking frame including a target object; based on the tracking algorithm, according to the tracking frame of the target object, multiple frames of the input image following the one frame of the input image are compared The target object is tracked.
- the processor implements the detection algorithm through a first thread
- the processor implements the tracking algorithm through a second thread.
- When the processor tracks the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object, the processor is specifically configured to: output the tracking frame of the target object from the first thread to the second thread; and track, through the second thread, the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object.
- After the processor detects one frame of input image in the input image sequence and obtains the tracking frame including the target object, the processor is further configured to: stop, through the first thread, detecting the multiple frames of input images following the one frame of input image.
- When the processor tracks the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object, the processor is specifically configured to: start the second thread after obtaining the tracking frame including the target object through the first thread; and, after the second thread is started, track the target object through the second thread in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object.
- The processor is further configured to: when the detection algorithm is turned on and the current input image is the first frame of input image in the input image sequence, set the first state machine to the startup state through the first thread; and, when the first state machine is in the startup state, detect the input image through the first thread;
- when the detection algorithm is turned on and the current input image is not the first frame of input image in the input image sequence, set the first state machine to the idle state through the first thread; and, when the first state machine is in the idle state, stop detecting the input image through the first thread;
- when the detection algorithm is turned off, set the first state machine to the closed state through the first thread; and, when the first state machine is in the closed state, stop detecting the input image through the first thread.
- the processor is further configured to: when the tracking algorithm is turned on, set the second state machine to an activated state through a second thread; and when the second state machine is activated, track the input image through the second thread; When the tracking algorithm is turned off, the second state machine is set to the closed state by the second thread; when the second state machine is turned off, the input image is stopped from being tracked by the second thread.
- When the processor detects one frame of input image in the input image sequence based on the detection algorithm, the processor is specifically configured to: detect one frame of input image in the input image sequence by a specific CNN detection algorithm to obtain a tracking frame including the target object, wherein the specific CNN detection algorithm includes a weak classifier.
- When the processor detects one frame of input image in the input image sequence by the specific CNN detection algorithm and obtains the tracking frame including the target object, the processor is specifically configured to: for a tracking frame input to the weak classifier of the specific CNN detection algorithm, detect, through the weak classifier, whether the tracking frame meets the filtering policy; if not, output the tracking frame to the next-level network of the specific CNN detection algorithm.
- The processor is further configured to: if the tracking frame meets the filtering policy, filter out the tracking frame.
- When the processor detects one frame of input image in the input image sequence by the specific CNN detection algorithm to obtain the tracking frame including the target object, the processor is specifically configured to: convert the input image and the network parameters into fixed-point data, and process the converted fixed-point data by the specific CNN detection algorithm to obtain a tracking frame including the target object.
- Alternatively, the specific CNN detection algorithm is implemented by a fixed-point network, and the input images and the network parameters in the fixed-point network are both fixed-point data;
- in that case, when the processor detects one frame of input image in the input image sequence by the specific CNN detection algorithm and obtains the tracking frame including the target object, the processor is specifically configured to process the fixed-point data through the specific CNN detection algorithm to obtain a tracking frame including the target object.
- Before the processor detects one frame of input image in the input image sequence by the specific CNN detection algorithm and obtains the tracking frame including the target object, the processor is further configured to: pre-process the one frame of input image in the input image sequence to obtain a pre-processed input image; and process the pre-processed input image through the specific CNN detection algorithm to obtain a tracking frame including the target object.
- When the processor detects one frame of input image in the input image sequence by the specific CNN detection algorithm and obtains the tracking frame including the target object, the processor is specifically configured to: predict a reference region of the target object using time-domain information; and detect, by the specific CNN detection algorithm, the reference region in the one frame of input image in the input image sequence to obtain a tracking frame including the target object.
- When the processor, based on the tracking algorithm, tracks the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object, the processor is specifically configured to: obtain, based on the previous frame of input image, the tracking frame of the target object and a spatial context model of the target object, the spatial context model being used to indicate the spatial correlation between the target object and the surrounding image area in the previous frame of input image; and, based on the spatial context model, determine the target object at the position corresponding to the tracking frame and in the surrounding area in the current frame of input image.
- When the processor, based on the tracking algorithm, tracks the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object, the processor is specifically configured to: predict a reference area of the target object through Kalman filtering; and, based on the tracking algorithm, track the target object according to the reference area of the tracking frame of the target object in the multiple frames of input images following the one frame of input image.
- After the processor, based on the tracking algorithm, tracks the target object in the multiple frames of input images following the one frame of input image according to the tracking frame of the target object, the processor is further configured to: determine a target tracking frame of the target object using a first tracking frame in a first input image and a second tracking frame in a second input image, wherein the first tracking frame is a tracking frame including the target object obtained in the first input image based on the detection algorithm, and the second tracking frame is a tracking frame obtained when the target object is tracked in the second input image based on the tracking algorithm; and track the target object according to the target tracking frame.
- When the processor determines the target tracking frame of the target object using the first tracking frame in the first input image and the second tracking frame in the second input image, the processor is specifically configured to: calculate the degree of overlap between the first tracking frame and the second tracking frame; and determine the target tracking frame of the target object according to the degree of overlap.
- When the processor determines the target tracking frame of the target object according to the degree of overlap, the processor is specifically configured to: if the degree of overlap is greater than or equal to a preset threshold, determine the second tracking frame as the target tracking frame of the target object; or, if the degree of overlap is less than the preset threshold, determine the first tracking frame as the target tracking frame of the target object.
- Embodiment 7: Based on the same inventive concept as the above method, an embodiment of the present invention further provides a computer-readable storage medium storing computer instructions; when the computer instructions are executed, the foregoing tracking control method is performed.
- For the tracking control method, refer to the foregoing embodiments.
- the system, device, module, or unit described in the foregoing embodiments may be implemented by a computer chip or entity, or by a product having a certain function.
- A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
- The embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device,
- and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce a computer-implemented process, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
A tracking control method, a device, and a computer-readable storage medium. The method includes: acquiring an input image sequence, where the input image sequence may include multiple frames of input images (101); detecting, based on a detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including a target object (102); and tracking, based on a tracking algorithm and according to the tracking frame of the target object, the target object in the multiple frames of input images following the one frame of input image (i.e., the frame on which detection was performed) (103). The method can improve the accuracy and reliability of face detection, reduce network complexity and computation, and achieve real-time detection.
Description
The present invention relates to the field of electronic information technology, and in particular to a tracking control method, a device, and a computer-readable storage medium.
Existing face detection methods include cascade-classifier detection methods, DPM (Deformable Parts Models) detection methods, and the like; however, the reliability and accuracy of these face detection methods are relatively poor. Therefore, with the rise of CNNs (Convolutional Neural Networks), CNN-based face detection methods are being tried more and more.
CNN-based face detection methods are usually trained and run on servers with high-performance GPUs (Graphics Processing Units) and high-performance CPUs (Central Processing Units). The trained networks suffer from drawbacks such as high network complexity, many layers, many parameters, and large memory overhead, which makes the computation process complex and prevents real-time detection.
Summary of the Invention
The present invention provides a tracking control method, a device, and a computer-readable storage medium, which can improve the accuracy and reliability of face detection, reduce network complexity and computation, and achieve real-time detection.
A first aspect of the embodiments of the present invention provides a tracking control method, including: acquiring an input image sequence; detecting, based on a detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including a target object; and tracking, based on a tracking algorithm and according to the tracking frame of the target object, the target object in the multiple frames of input images following the one frame of input image.
A second aspect of the embodiments of the present invention provides a tracking control device, which may include a memory and a processor, where the memory is configured to store program code, and the processor is configured to call the program code and, when the program code is executed, perform the following operations: acquiring an input image sequence; detecting, based on a detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including a target object; and tracking, based on a tracking algorithm and according to the tracking frame of the target object, the target object in the multiple frames of input images following the one frame of input image.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer instructions; when the computer instructions are executed, the above tracking control method, such as the tracking control method of the first aspect, is implemented.
Based on the above technical solutions, the embodiments of the present invention can improve the accuracy and reliability of face detection, reduce network complexity and computation, achieve real-time detection, and support multi-face detection; they can reduce read/write overhead and CPU overhead, avoid frequent calls to the detection algorithm and thus reduce the network invocation frequency, solve the problem of excessive power consumption, and avoid the low real-time performance caused by relying entirely on the detection algorithm.
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings required in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the present invention rather than all of them, and those of ordinary skill in the art can obtain other drawings from these drawings.
FIG. 1 is a schematic flowchart of a tracking control method;
FIG. 2 is a schematic diagram of a simplified MTCNN;
FIG. 3A is a schematic diagram of the state machine and synchronization mechanism of the detection algorithm;
FIG. 3B is a schematic diagram of the state machine and synchronization mechanism of the tracking algorithm;
FIG. 4 is a schematic structural diagram of a tracking control device.
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention. In addition, the following embodiments and the features therein may be combined with each other as long as they do not conflict.
The terms used in the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The singular forms "a", "the", and "said" used in the present invention and the claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should be understood that the term "and/or" used herein refers to any or all possible combinations of one or more of the associated listed items.
Although the terms first, second, third, and so on may be used in the present invention to describe various pieces of information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present invention, the first information may also be called the second information, and similarly, the second information may also be called the first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while", or "in response to determining".
Embodiment 1:
An embodiment of the present invention provides a tracking control method, which can be applied to a tracking control device such as a movable platform, where the movable platform may include, but is not limited to, an unmanned aerial vehicle or a ground robot (e.g., an unmanned vehicle). In addition, the movable platform may be equipped with a photographing device (such as a camera or video camera) and collect captured images through the photographing device; the movable platform may also be equipped with a gimbal, which carries the photographing device to stabilize and/or adjust it.
Referring to FIG. 1, which is a schematic flowchart of the tracking control method, the method may include:
Step 101: acquire an input image sequence, where the input image sequence may include multiple frames of input images.
The input image sequence may be input images of consecutive frames in video data.
Specifically, the method may be executed by a movable platform, such as a processor of the movable platform; there may be one or more processors, and a processor may be a general-purpose processor or a dedicated processor.
As mentioned above, the movable platform may be equipped with a photographing device. While the movable platform tracks the target object, the photographing device may photograph the target object to obtain captured images, and the processor of the movable platform may acquire the captured images. Each captured image is one frame of input image, and a collection of multiple frames of input images may be taken as the input image sequence.
The target object may specifically be an object tracked by the movable platform.
There may be at least one target object in an input image, and the target object includes a human face.
Step 102: detect, based on a detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including the target object. Here, based on the detection algorithm, only one frame of input image in the input image sequence (e.g., the first frame of the input image sequence) is detected, rather than every frame of the input image sequence.
In an example, detecting one frame of input image in the input image sequence based on the detection algorithm to obtain a tracking frame including the target object may include, but is not limited to: detecting one frame of input image in the input image sequence through a specific CNN detection algorithm to obtain a tracking frame including the target object, where the specific CNN detection algorithm may include, but is not limited to, a weak classifier. For example, the specific CNN detection algorithm may be an MTCNN (Multi-Task Convolutional Neural Network) detection algorithm that includes pnet and rnet but does not include onet.
For example, the specific CNN detection algorithm may include at least one weak classifier, and different weak classifiers may have the same or different filtering strategies. The filtering strategy may include, but is not limited to, a morphological filtering strategy and/or a skin-color filtering strategy; that is, a weak classifier may perform filtering with a morphological filtering strategy, or with a skin-color filtering strategy.
In addition, the weak classifier may be deployed at any level of the network of the specific CNN detection algorithm.
In an example, detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object may include, but is not limited to: for a tracking frame input to a weak classifier of the specific CNN detection algorithm, detecting through the weak classifier whether the tracking frame matches the filtering strategy; if it does not match the filtering strategy, the tracking frame may be output to the next-level network of the specific CNN detection algorithm; if it matches the filtering strategy, the tracking frame may be filtered out.
In an example, detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object may also include, but is not limited to: converting the input image and the network parameters into fixed-point data (rather than floating-point data), and processing the converted fixed-point data through the specific CNN detection algorithm to obtain a tracking frame including the target object.
In another example, the specific CNN detection algorithm may also be implemented by a fixed-point network (such as a fixed-point MTCNN network), in which both the input image and the network parameters are fixed-point data. Based on this, detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object may also include, but is not limited to: processing the fixed-point data in the fixed-point network through the specific CNN detection algorithm to obtain a tracking frame including the target object.
In an example, before one frame of input image in the input image sequence is detected through the specific CNN detection algorithm to obtain a tracking frame including the target object, the one frame of input image may be preprocessed to obtain a preprocessed input image; then, the preprocessed input image is processed through the specific CNN detection algorithm to obtain a tracking frame including the target object. The preprocessing may include, but is not limited to, compressed-sensing processing and/or skin-color detection processing.
In an example, detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object may also include, but is not limited to: predicting a reference region of the target object using temporal information, and detecting the reference region in the one frame of input image through the specific CNN detection algorithm to obtain a tracking frame including the target object.
Step 103: track, based on a tracking algorithm and according to the tracking frame of the target object (i.e., the tracking frame obtained in step 102), the target object in the multiple frames of input images following the one frame of input image (i.e., the frame on which detection was performed). Here, based on the tracking algorithm, the target object is tracked in each of the multiple frames of input images following the one frame of input image. That is, every certain number of input images, one frame of input image is detected (step 102) and the target object is then tracked in the multiple frames of input images that follow (step 103).
For example, step 102 is used to detect the 1st frame of input image, and then step 103 is used to track the target object in the 2nd to 10th frames of input images; then, step 102 is used to detect the 11th frame of input image, and step 103 is used to track the target object in the 12th to 20th frames of input images; and so on, steps 102-103 are repeated continuously to complete the tracking control.
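To make the interleaving concrete, here is a minimal Python sketch of the detect-once-then-track-N-frames loop; `detect_faces`, `track_faces`, and the interval of 10 frames are hypothetical stand-ins for the detection and tracking algorithms and the example interval above, not names or values taken from the original.

```python
# Minimal sketch of the detect-once / track-N-frames loop (hypothetical helpers).
DETECT_INTERVAL = 10  # one detection every 10 frames, as in the example above

def run_tracking(frames, detect_faces, track_faces):
    tracking_boxes = []
    for idx, frame in enumerate(frames):
        if idx % DETECT_INTERVAL == 0:
            # Step 102: run the (expensive) detection algorithm on this frame only.
            tracking_boxes = detect_faces(frame)
        else:
            # Step 103: update the boxes with the (cheap) tracking algorithm.
            tracking_boxes = track_faces(frame, tracking_boxes)
        yield idx, tracking_boxes
```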
In an example, tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image may include, but is not limited to: acquiring the tracking frame of the target object obtained based on the previous frame of input image and the spatial context model of the target object, where the spatial context model may be used to indicate the spatial correlation between the target object and its surrounding image region in the previous frame of input image; then, based on the spatial context model, the target object may be determined at the position corresponding to the tracking frame and in its surrounding region in the current frame of input image.
In an example, the spatial context model may include, but is not limited to, one or any combination of the following: grayscale features, HOG (Histogram of Oriented Gradients) features, moment features, and SIFT (Scale-Invariant Feature Transform) features.
In an example, tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image may include, but is not limited to: predicting a reference region of the target object through Kalman filtering; and tracking the target object, based on the tracking algorithm and according to its tracking frame, in the reference regions of the multiple frames of input images following the one frame of input image.
In an example, the detection algorithm may be implemented by a first thread and the tracking algorithm by a second thread; that is, the detection algorithm and the tracking algorithm are implemented by different threads.
In an example, tracking the target object according to its tracking frame in the multiple frames of input images following the one frame of input image may include, but is not limited to: outputting the tracking frame of the target object from the first thread to the second thread; then, tracking the target object through the second thread, according to the tracking frame, in the multiple frames of input images following the one frame of input image.
In an example, after one frame of input image in the input image sequence is detected and a tracking frame including the target object is obtained, the first thread may also stop detecting the multiple frames of input images following the one frame of input image, i.e., those frames are no longer detected.
In an example, tracking the target object according to its tracking frame in the multiple frames of input images following the one frame of input image may include, but is not limited to: after the tracking frame including the target object is obtained through the first thread, starting the second thread; after the second thread is started, tracking the target object through the second thread, according to the tracking frame, in the multiple frames of input images following the one frame of input image, i.e., in each of those frames.
In an example, when the detection algorithm is enabled and the current input image is the first frame of the input image sequence, the first thread may set a first state machine to the started state; when the first state machine is in the started state, the input image may be detected by the first thread.
In addition, when the detection algorithm is enabled and the current input image is not the first frame of the input image sequence, the first thread may set the first state machine to the idle state; when the first state machine is in the idle state, detection of the input image by the first thread may be stopped.
In addition, when the detection algorithm is disabled, the first thread sets the first state machine to the closed state; when the first state machine is in the closed state, detection of the input image by the first thread is stopped.
In addition, when the tracking algorithm is enabled, the second thread sets a second state machine to the started state; when the second state machine is in the started state, the input image is tracked by the second thread.
In addition, when the tracking algorithm is disabled, the second thread sets the second state machine to the closed state; when the second state machine is in the closed state, tracking of the input image by the second thread is stopped.
In an example, after the target object is tracked, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image, a target tracking frame of the target object may also be determined using a first tracking frame in a first input image and a second tracking frame in a second input image, where the first tracking frame is the tracking frame including the target object obtained in the first input image based on the detection algorithm, and the second tracking frame is the tracking frame obtained when the target object is tracked in the second input image based on the tracking algorithm; then, based on the tracking algorithm, the target object is tracked according to the target tracking frame.
Determining the target tracking frame of the target object using the first tracking frame in the first input image and the second tracking frame in the second input image may include, but is not limited to: calculating the degree of overlap between the first tracking frame and the second tracking frame, and determining the target tracking frame of the target object according to the degree of overlap.
Determining the target tracking frame of the target object according to the degree of overlap may include, but is not limited to: if the degree of overlap is greater than or equal to a preset threshold (which may be configured empirically), the second tracking frame may be determined as the target tracking frame of the target object; or, if the degree of overlap is less than the preset threshold, the first tracking frame may be determined as the target tracking frame of the target object.
The degree of overlap includes the IoU (Intersection over Union) between the first tracking frame and the second tracking frame.
Based on the above technical solutions, the embodiments of the present invention can improve the accuracy and reliability of face detection, reduce network complexity and computation, achieve real-time detection, and support multi-face detection; they can reduce read/write overhead and CPU overhead, avoid frequent calls to the detection algorithm and thus reduce the network invocation frequency, solve the problem of excessive power consumption, and avoid the low real-time performance caused by relying entirely on the detection algorithm.
Embodiment 2:
For step 102, one frame of input image in the input image sequence may be detected based on the detection algorithm to obtain a tracking frame including the target object. In practice, the input image may include multiple target objects, i.e., step 102 may yield multiple tracking frames. For example, in step 101, input image sequence 1 (input image 1 to input image 10) is collected first, then input image sequence 2 (input image 11 to input image 20), and so on; each input image sequence includes 10 frames of input images, and each frame of input image may include the target object.
After input image sequence 1 is obtained, the first frame of input image sequence 1 (e.g., input image 1) may be detected based on the detection algorithm to obtain a tracking frame including the target object. Then, input image 2 to input image 10 in input image sequence 1 are no longer detected.
Further, after input image sequence 2 is obtained, input image 11 in input image sequence 2 may be detected based on the detection algorithm to obtain a tracking frame including the target object. Then, input image 12 to input image 20 in input image sequence 2 are no longer detected, and so on.
To detect an input image (input image 1 is used as an example below), in this embodiment the MTCNN detection algorithm may be used to detect input image 1 and obtain a tracking frame including the target object.
MTCNN may use cascaded networks for face detection. A conventional MTCNN includes three networks of increasing complexity: pnet, rnet, and onet. Its workflow may include: after the input image is preprocessed, the preprocessed input image is output to pnet, which processes the input image to obtain multiple candidate frames, called candidate frames of the first type. The candidate frames of the first type are processed by a local NMS (Non-Maximum Suppression) method to obtain candidate frames of the second type, which include part of the candidate frames of the first type.
Then, the candidate frames of the second type are output to rnet, which processes them to obtain candidate frames of the third type; these are then processed by the local NMS method to obtain candidate frames of the fourth type, which include part of the candidate frames of the third type.
Then, the candidate frames of the fourth type are output to onet, which processes them to obtain candidate frames of the fifth type; these are then processed by the local NMS method to obtain candidate frames of the sixth type, which include part of the candidate frames of the fifth type.
Further, the candidate frames of the sixth type may be the finally obtained tracking frames of the individual faces.
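The local NMS steps above are not spelled out in the original; the following Python sketch shows one standard greedy non-maximum suppression that could play that role, assuming candidate frames are given as an (N, 4) array of (x, y, w, h) boxes with confidence scores.

```python
import numpy as np

def local_nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over (x, y, w, h) boxes (NumPy arrays); returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # IoU of the top-scoring box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 0] + boxes[i, 2], boxes[rest, 0] + boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 1] + boxes[i, 3], boxes[rest, 1] + boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        union = boxes[i, 2] * boxes[i, 3] + boxes[rest, 2] * boxes[rest, 3] - inter
        iou = inter / np.maximum(union, 1e-9)
        order = rest[iou < iou_thresh]   # drop boxes that overlap the kept box too much
    return keep
```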
In the above MTCNN, onet is the most complex network; it runs slowly and has large read/write overhead and large CPU overhead, so the MTCNN cannot run directly on embedded devices.
In view of this finding, the embodiments of the present invention propose a new MTCNN, which includes pnet and rnet but not onet, and which may include one or more weak classifiers, each of which may be deployed at any level of the MTCNN network.
For example, after onet is removed, the MTCNN may include, in order, pnet, a local NMS (hereinafter called the first local NMS), rnet, and a local NMS (hereinafter called the second local NMS). In this case, the weak classifier may be deployed before pnet, i.e., the MTCNN may include, in order, the weak classifier, pnet, the first local NMS, rnet, and the second local NMS. Alternatively, the weak classifier may be deployed between pnet and the first local NMS, i.e., the MTCNN may include, in order, pnet, the weak classifier, the first local NMS, rnet, and the second local NMS. Alternatively, the weak classifier may be deployed between rnet and the second local NMS, i.e., the MTCNN may include, in order, pnet, the first local NMS, rnet, the weak classifier, and the second local NMS. This is not limited; the weak classifier may be deployed at any level of the MTCNN network.
Of course, the above takes one weak classifier as an example; when there are multiple weak classifiers, they may be deployed at any levels of the MTCNN network. For example, weak classifier 1 is deployed before pnet and weak classifier 2 between rnet and the second local NMS, so that the MTCNN may include, in order, weak classifier 1, pnet, the first local NMS, rnet, weak classifier 2, and the second local NMS. This is not limited; each weak classifier may be deployed at any level of the MTCNN network.
In an example, the weak classifier is used to filter the tracking frames (i.e., the candidate frames mentioned above, hereinafter called candidate frames) according to a filtering strategy, and different weak classifiers may have the same or different filtering strategies. The filtering strategy may include, but is not limited to, a morphological filtering strategy and/or a skin-color filtering strategy; that is, the weak classifier may filter the input candidate frames according to the morphological filtering strategy, or filter them with the skin-color filtering strategy.
In summary, when input image 1 in input image sequence 1 is detected to obtain a tracking frame including the target object, if the MTCNN includes, in order, pnet, the first local NMS, rnet, the weak classifier, and the second local NMS, the workflow may include: after input image 1 is preprocessed, the preprocessed input image 1 is output to pnet, which processes it to obtain candidate frames of the first type. The candidate frames of the first type are processed by the first local NMS method to obtain candidate frames of the second type, which include part of the candidate frames of the first type. The candidate frames of the second type are output to rnet, which processes them to obtain candidate frames of the third type.
Then, the candidate frames of the third type are output to the weak classifier. For each candidate frame of the third type, the weak classifier may check whether the candidate frame matches the filtering strategy; if it does not match, the candidate frame may be taken as a candidate frame of the fourth type; if it matches, the candidate frame may be filtered out. In this way, all candidate frames that do not match the filtering strategy may be taken as candidate frames of the fourth type and output to the next-level network of the MTCNN, i.e., to the second local NMS.
Then, the candidate frames of the fourth type are processed by the second local NMS method to obtain candidate frames of the fifth type, which include part of the candidate frames of the fourth type. The candidate frames of the fifth type are no longer output to onet; each candidate frame of the fifth type is a tracking frame including the target object.
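As one way such a weak classifier might be realized (an assumption for illustration, not the original implementation), the sketch below rejects candidate frames with simple morphological and skin-color rules before the second local NMS; the aspect-ratio bounds, the YCrCb skin range, and the skin-ratio threshold are illustrative values.

```python
import cv2
import numpy as np

def weak_classifier_keep(image_bgr, box,
                         min_aspect=0.5, max_aspect=2.0, min_skin_ratio=0.3):
    """Return True if the candidate box should be kept (i.e., it does NOT match the filtering strategy)."""
    x, y, w, h = box
    # Morphological rule: reject boxes whose shape is clearly not face-like.
    if w <= 0 or h <= 0 or not (min_aspect <= w / h <= max_aspect):
        return False
    # Skin-color rule: require a minimum fraction of skin-colored pixels (YCrCb range).
    patch = image_bgr[y:y + h, x:x + w]
    ycrcb = cv2.cvtColor(patch, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    return (np.count_nonzero(skin) / skin.size) >= min_skin_ratio

# Candidate frames that pass are forwarded to the second local NMS;
# the others are filtered out, mimicking the weak classifier described above.
```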
In the above process, before input image 1 is output to pnet, input image 1 may also be preprocessed to obtain a preprocessed input image 1, and the preprocessed input image 1 is output to pnet. The preprocessing may include, but is not limited to, compressed-sensing processing and/or skin-color detection processing. Moreover, by preprocessing input image 1, regions that may contain faces can be selected from input image 1, and these regions are output to pnet as the preprocessed input image 1.
In the above process, the three cascaded networks (pnet, rnet, and onet) are simplified to two cascaded networks (pnet and rnet), thereby reducing the complexity of the MTCNN; the weak classifier then ensures that the simplified MTCNN still maintains a good detection rate and accuracy, i.e., the weak classifier performs morphological filtering and/or skin-color filtering on the candidate frames to remove candidate frames that are clearly not faces.
Referring to FIG. 2, which is a schematic diagram of the simplified MTCNN, morphological filtering and skin-color filtering (e.g., Haar-feature-based skin-color filtering) are performed in the weak classifier on the output of rnet.
In the above process, when the MTCNN is used to detect input image 1 in input image sequence 1 to obtain a tracking frame including the target object, temporal information may be used to predict a reference region of the target object (i.e., the region where a face is likely to be at the next detection); the prediction method is not limited. Then, when the MTCNN is used to detect input image 11 in input image sequence 2 to obtain a tracking frame including the target object, the reference region in input image 11 may be detected to obtain the tracking frame. That is, what is input to the MTCNN is the reference region of input image 11 rather than the whole of input image 11, thereby reducing the image content input to the MTCNN and increasing the processing speed.
In an example, when the MTCNN is used to detect input image 1 in input image sequence 1 to obtain a tracking frame including the target object, all data may also be converted to fixed point, i.e., the input image and the network parameters (the parameters in the MTCNN) are converted into fixed-point data, for example by a float-to-fixed-point operation (the operation method is not limited). Alternatively, a fixed-point MTCNN network may be retrained, in which both the input image and the network parameters are fixed-point data. In this way, the MTCNN can process fixed-point data; in this approach, all data are fixed-point data and no fixed-point conversion is required.
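A rough, non-authoritative illustration of the float-to-fixed-point conversion mentioned above is given below; the Q8 format and int16 storage are assumptions made for the example, since the original does not fix a particular format.

```python
import numpy as np

FRAC_BITS = 8            # assumed Q-format: value * 2**8, stored as int16
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert float data (weights or image pixels) to fixed-point integers."""
    return np.clip(np.round(np.asarray(x, dtype=np.float32) * SCALE),
                   np.iinfo(np.int16).min, np.iinfo(np.int16).max).astype(np.int16)

def fixed_mul(a, b):
    """Multiply two fixed-point tensors, keeping the same Q format."""
    return ((a.astype(np.int32) * b.astype(np.int32)) >> FRAC_BITS).astype(np.int16)

# Example: a weight of 0.75 times an input of 2.0 gives roughly 1.5 after rescaling.
w, x = to_fixed(0.75), to_fixed(2.0)
print(fixed_mul(w, x) / SCALE)   # ~1.5
```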
Embodiment 3:
For step 103, the target object may be tracked, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image. For example, in step 101, input image sequence 1 (input image 1 to input image 10) is collected first, then input image sequence 2 (input image 11 to input image 20), and so on; each input image sequence includes 10 frames of input images, and each frame of input image may include the target object.
In step 102, if the tracking frame of the target object in input image 1 is obtained, then in step 103 the target object may be tracked, based on the tracking algorithm and according to the tracking frame of the target object in input image 1, in input image 2 to input image 10 of input image sequence 1.
In step 102, if the tracking frame of the target object in input image 11 is obtained, then in step 103 the target object may be tracked, based on the tracking algorithm and according to the tracking frame of the target object in input image 11, in input image 12 to input image 20 of input image sequence 2, and so on.
To track the target object, in this embodiment the STC (Spatio-Temporal Context) tracking algorithm may be used. Specifically, the tracking frame of the target object obtained based on the previous frame of input image (i.e., the tracking frame obtained in step 102) and the spatial context model of the target object may be acquired, where the spatial context model is used to indicate the spatial correlation between the target object and its surrounding image region in the previous frame of input image; then, based on the spatial context model, the target object may be determined at the position corresponding to the tracking frame and in its surrounding region in the current frame of input image.
The STC tracking algorithm is a target tracking algorithm based on spatio-temporal context. It models, in a Bayesian framework, the spatio-temporal relationship between the target to be tracked and its local context region, obtaining the statistical correlation between the target and the features of its surrounding region. This spatio-temporal relationship is then combined with the focus-of-attention property of the biological visual system to evaluate a confidence map of the target position in the new frame; the position with the maximum confidence is the target position in the new frame. Based on this STC tracking algorithm, the target object can be tracked; the tracking method of the target object is not limited.
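A heavily simplified sketch of the STC idea follows: a spatial context filter is learned in the Fourier domain from the previous frame's grayscale context patch, and the confidence map on the current patch is obtained by filtering; the Gaussian confidence target, the grayscale features, and the omission of the context prior weighting and scale update are simplifications made here, not the original algorithm.

```python
import numpy as np

def gaussian_confidence(shape, sigma=5.0):
    """Gaussian confidence target centered on the context patch."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def learn_context_model(prev_context):
    """Learn a spatial context filter from the previous frame's context patch (grayscale, float)."""
    conf = gaussian_confidence(prev_context.shape)
    return np.fft.fft2(conf) / (np.fft.fft2(prev_context) + 1e-6)

def locate_target(curr_context, model):
    """Confidence map on the current context patch; its argmax is the new target offset."""
    conf_map = np.real(np.fft.ifft2(model * np.fft.fft2(curr_context)))
    dy, dx = np.unravel_index(np.argmax(conf_map), conf_map.shape)
    return dy, dx
```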
For the conventional STC tracking algorithm, in this embodiment the scale transformation of the STC tracking algorithm may be simplified to reduce its complexity; this process is not limited.
In an example, when the STC tracking algorithm is used to track the target object, the features of the above spatial context model may include, but are not limited to, one or any combination of the following: grayscale features, HOG features, moment features, and SIFT features; the feature types of the spatial context model are not limited.
In an example, when the STC tracking algorithm is used to track the target object, Kalman filtering may also be used to predict a reference region of the target object (i.e., the region where a face is likely to be at the next tracking step); the prediction method is not limited. Then, when the STC tracking algorithm is used to track the target object in the next frame of input image, the reference region in that frame may be tracked, i.e., the STC tracking algorithm tracks the target object in the reference region rather than in all regions of the input image, thereby helping the STC tracking algorithm update the target object's position and increasing the processing speed.
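One plausible form of such a predictor (an assumption for illustration, since the original does not specify the filter design) is the constant-velocity Kalman filter below, which tracks the center of the tracking frame and predicts where the reference region should be placed in the next frame; the noise parameters are illustrative.

```python
import numpy as np

class CenterKalman:
    """Constant-velocity Kalman filter over the tracking-frame center (cx, cy)."""
    def __init__(self, cx, cy, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([cx, cy, 0.0, 0.0])                 # state: position + velocity
        self.P = np.eye(4)
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q, self.R = q * np.eye(4), r * np.eye(2)

    def predict(self):
        """Predict the next center; the reference region is placed around it."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, cx, cy):
        """Correct the state with the center of the tracking frame found in the current frame."""
        z = np.array([cx, cy])
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```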
Embodiment 4:
In step 101, input image sequence 1 (input image 1 to input image 10) is collected first, then input image sequence 2 (input image 11 to input image 20), and so on; each input image sequence may include 10 frames of input images.
In step 102, after input image sequence 1 is obtained, input image 1 in input image sequence 1 may be detected based on the detection algorithm to obtain tracking frame A including the target object, while input image 2 to input image 10 are not detected. Then, after input image sequence 2 is obtained, input image 11 in input image sequence 2 may be detected based on the detection algorithm to obtain tracking frame B including the target object, while input image 12 to input image 20 are not detected, and so on.
In one implementation of step 103, the target object may be tracked in input image 2 to input image 10 based on the tracking algorithm and according to tracking frame A; then, the target object may be tracked in input image 12 to input image 20 according to tracking frame B, and so on.
In this implementation, to track the target object, the detection result of the detection algorithm (i.e., tracking frame B) is used directly, without considering previous tracking results; that is, when the target object is tracked in input image 12 to input image 20, the tracking results of input image 2 to input image 10 are not considered, and the target object is tracked in input image 12 to input image 20 directly according to tracking frame B, i.e., the tracking process of the target object is independent of the tracking results of input image 2 to input image 10.
In another implementation of step 103, the target object may be tracked in input image 2 to input image 10 based on the tracking algorithm and according to tracking frame A; tracking of the target object may then continue without stopping the tracking process, that is, every frame of input image may be tracked, e.g., the target object continues to be tracked in input image 11 to input image 20, and so on.
After input image 11 is detected and tracking frame B is obtained, suppose the target object is currently being tracked in input image 12 and tracking frame C is obtained; then tracking frame B and tracking frame C are fused to obtain an accurate tracking frame X (which may be tracking frame B or tracking frame C), and then, based on the tracking algorithm, the target object may be tracked in input image 13 to input image 20 according to tracking frame X, and so on. Each time a tracking frame is obtained based on the detection algorithm, it may be fused with the tracking frame obtained by the tracking algorithm to obtain an accurate tracking frame, and the target object is then tracked in the input images according to this tracking frame based on the tracking algorithm.
In this implementation, to track the target object, both the detection result of the detection algorithm (e.g., tracking frame B) and the tracking result of the tracking algorithm (e.g., tracking frame C) may be considered; that is, when the target object is tracked in input image 12 to input image 20, the tracking result of the input images may be considered, i.e., tracking frame B and tracking frame C are fused, and the target object is tracked in the input images according to the fusion result. In other words, the tracking process of the target object is related to the tracking results of the input images.
The above second implementation is described below with reference to specific embodiments. Specifically, in this embodiment, the target tracking frame of the target object may be determined using a first tracking frame in a first input image (the tracking frame obtained in the first input image based on the detection algorithm, such as tracking frame B above) and a second tracking frame in a second input image (the tracking frame obtained in the second input image based on the tracking algorithm, such as tracking frame C above); then, based on the tracking algorithm, the target object may be tracked according to the target tracking frame, i.e., step 103 is performed based on the target tracking frame, which is not repeated here.
In an example, determining the target tracking frame of the target object using the first tracking frame in the first input image and the second tracking frame in the second input image may include, but is not limited to: calculating the degree of overlap between the first tracking frame and the second tracking frame (i.e., the IoU, such as the intersection of the first and second tracking frames divided by their union); if the degree of overlap is greater than or equal to a preset threshold, the second tracking frame may be determined as the target tracking frame; or, if the degree of overlap is less than the preset threshold, the first tracking frame may be determined as the target tracking frame.
When the degree of overlap between the first tracking frame and the second tracking frame is greater than or equal to the preset threshold, the tracking result of the tracking algorithm has not drifted, and the current tracking target is maintained, i.e., the second tracking frame is determined as the target tracking frame and tracking continues according to the second tracking frame. When the degree of overlap between the first tracking frame and the second tracking frame is less than the preset threshold, the tracking result of the tracking algorithm has drifted, or a new face has appeared; the current tracking target is therefore discarded, or the tracking target is updated to the new face, i.e., the first tracking frame is determined as the target tracking frame and tracking restarts according to the first tracking frame.
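A minimal sketch of this fusion rule is shown below; boxes are assumed to be (x, y, w, h) tuples, and the 0.5 threshold is only an illustrative value, since the original states that the threshold may be configured empirically.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def fuse_tracking_frames(detected_box, tracked_box, threshold=0.5):
    """Keep the tracker's box if it still overlaps the detection; otherwise re-initialize from the detection."""
    return tracked_box if iou(detected_box, tracked_box) >= threshold else detected_box
```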
Embodiment 5:
In the above embodiments, the detection algorithm and the tracking algorithm may also be implemented by different threads; for example, the detection algorithm may be implemented by a first thread and the tracking algorithm by a second thread.
For example, after input image sequence 1 is obtained, the first thread may detect input image 1 in input image sequence 1 based on the detection algorithm to obtain tracking frame A including the target object, and the first thread stops detecting input image 2 to input image 10 in input image sequence 1.
After input image sequence 2 is obtained, the first thread may detect input image 11 in input image sequence 2 based on the detection algorithm to obtain tracking frame B including the target object, and the first thread stops detecting input image 12 to input image 20 in input image sequence 2, and so on.
Further, after the first thread detects input image 1 in input image sequence 1 and obtains tracking frame A including the target object, the first thread may output tracking frame A to the second thread, so that the second thread tracks the target object in the input images according to tracking frame A. After the first thread detects input image 11 in input image sequence 2 and obtains tracking frame B including the target object, the first thread may output tracking frame B to the second thread, so that the second thread tracks the target object in the input images according to tracking frame B.
After obtaining tracking frame A including the target object, the first thread may also trigger the start of the second thread; after the second thread starts, it may track the target object in input image 2 to input image 10 according to tracking frame A. Then, the second thread may track the target object in input image 12 to input image 20 according to tracking frame B, and so on.
Referring to FIG. 3A and FIG. 3B, which are schematic diagrams of the state machines and synchronization mechanisms of the detection algorithm and the tracking algorithm, the detection algorithm and the tracking algorithm are placed in different threads, each maintains its own state machine to switch states, and the states are synchronized through shared memory. The detection algorithm is used to locate the box coordinates of faces (i.e., the tracking frames described above) in the current input image, and the tracking algorithm is responsible for tracking the detected face boxes.
When the detection algorithm is enabled and the current input image is the first frame of the input image sequence, the first thread may set the first state machine (i.e., the state machine of the detection algorithm) to the started state; when the first state machine is in the started state, the input image may be detected by the first thread. In addition, when the detection algorithm is enabled and the current input image is not the first frame of the input image sequence, the first thread may set the first state machine to the idle state; when the first state machine is in the idle state, detection of the input image by the first thread may be stopped. In addition, when the detection algorithm is disabled, the first thread sets the first state machine to the closed state; when the first state machine is in the closed state, detection of the input image by the first thread may be stopped.
Further, when the tracking algorithm is enabled, the second thread may set the second state machine (i.e., the state machine of the tracking algorithm) to the started state; when the second state machine is in the started state, the input image may be tracked by the second thread. In addition, when the tracking algorithm is disabled, the second thread may set the second state machine to the closed state; when the second state machine is in the closed state, tracking of the input image by the second thread may be stopped.
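Purely as an illustrative sketch, and not the original implementation, the snippet below shows a detection worker and a tracking worker with their own state machines that exchange tracking frames through a lock-protected dictionary standing in for the shared memory; `detect_faces`, `track_faces`, and the 10-frame sequence length are assumptions.

```python
import threading
from enum import Enum
from queue import Queue

class State(Enum):
    STARTED = 1
    IDLE = 2
    CLOSED = 3

shared = {"boxes": None, "detect_state": State.IDLE, "track_state": State.CLOSED}
lock = threading.Lock()
detect_queue, track_queue = Queue(), Queue()   # a capture loop would put each (idx, frame) into both

def detection_worker(detect_faces):
    while True:
        idx, frame = detect_queue.get()
        is_first_of_sequence = (idx % 10 == 0)   # one detection per 10-frame sequence (assumed)
        with lock:
            shared["detect_state"] = State.STARTED if is_first_of_sequence else State.IDLE
        if is_first_of_sequence:
            boxes = detect_faces(frame)          # locate face boxes in this frame
            with lock:                           # shared memory hands the boxes to the tracker
                shared["boxes"] = boxes
                shared["track_state"] = State.STARTED

def tracking_worker(track_faces):
    while True:
        idx, frame = track_queue.get()
        with lock:
            boxes, state = shared["boxes"], shared["track_state"]
        if state is State.STARTED and boxes is not None:
            new_boxes = track_faces(frame, boxes)
            with lock:
                shared["boxes"] = new_boxes

# The workers would typically be started as daemon threads, e.g.:
# threading.Thread(target=detection_worker, args=(my_detector,), daemon=True).start()
```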
Based on the above embodiments, the embodiments of the present invention can improve the accuracy and reliability of face detection, reduce network complexity and computation, achieve real-time detection, and support multi-face detection; they can reduce read/write overhead and CPU overhead, avoid frequent calls to the detection algorithm and thus reduce the network invocation frequency, solve the problem of excessive power consumption, and avoid the low real-time performance caused by relying entirely on the detection algorithm.
The above approach is a fast multi-face detection approach that fuses the detection algorithm and the tracking algorithm; it can achieve real-time multi-face detection and perform face detection quickly, reaching detection speeds of hundreds of frames per second.
In the above approach, the MTCNN detection algorithm is used to detect faces, which can improve the accuracy and robustness of face detection, reduce network complexity and computation, reduce read/write overhead and CPU overhead, reduce the network invocation frequency, and reduce power consumption. Moreover, the network parameters and the computation process can be converted to fixed point while preserving the accuracy of the fixed-point network. By simplifying, fixed-pointing, and optimizing the MTCNN detection algorithm, the network complexity and computation are reduced, all network operations are converted to fixed-point operations, and good accuracy is retained, so the network can run on embedded devices.
In the above approach, the STC tracking algorithm, which has small memory and CPU overhead, is introduced and fused with the detection algorithm, so that the STC tracking algorithm performs most of the face detection, which solves the low real-time performance caused by relying entirely on the detection algorithm; since the detection algorithm does not need to be called frequently, the problem of excessive power consumption is solved. Because the STC tracking algorithm is added, the detection algorithm only plays a corrective role and does not need to be called frequently, so the power consumption on embedded devices is kept under control. Because the tracking result of the STC tracking algorithm and the detection result of the detection algorithm are fused, the drift problem of the STC tracking algorithm is also kept under control.
Embodiment 6:
Based on the same concept as the above method, and referring to FIG. 4, an embodiment of the present invention further provides a tracking control device 40, including a memory 41 and a processor 42 (e.g., one or more processors).
In an example, the memory is configured to store program code; the processor is configured to call the program code and, when the program code is executed, perform the following operations: acquiring an input image sequence; detecting, based on a detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including a target object; and tracking, based on a tracking algorithm and according to the tracking frame of the target object, the target object in the multiple frames of input images following the one frame of input image.
The processor implements the detection algorithm through a first thread, and implements the tracking algorithm through a second thread.
When tracking the target object according to its tracking frame in the multiple frames of input images following the one frame of input image, the processor is specifically configured to: output the tracking frame of the target object from the first thread to the second thread; and track the target object through the second thread, according to the tracking frame of the target object, in the multiple frames of input images following the one frame of input image.
In an example, after detecting one frame of input image in the input image sequence and obtaining a tracking frame including the target object, the processor is further configured to: stop, through the first thread, detecting the multiple frames of input images following the one frame of input image.
In an example, when tracking the target object according to its tracking frame in the multiple frames of input images following the one frame of input image, the processor is specifically configured to: after obtaining the tracking frame including the target object through the first thread, start the second thread; and after the second thread is started, track the target object through the second thread, according to the tracking frame of the target object, in the multiple frames of input images following the one frame of input image.
The processor is further configured to: when the detection algorithm is enabled and the current input image is the first frame of the input image sequence, set the first state machine to the started state through the first thread; when the first state machine is in the started state, detect the input image through the first thread; when the detection algorithm is enabled and the current input image is not the first frame of the input image sequence, set the first state machine to the idle state through the first thread; when the first state machine is in the idle state, stop detecting the input image through the first thread; when the detection algorithm is disabled, set the first state machine to the closed state through the first thread; when the first state machine is in the closed state, stop detecting the input image through the first thread.
The processor is further configured to: when the tracking algorithm is enabled, set the second state machine to the started state through the second thread; when the second state machine is in the started state, track the input image through the second thread; when the tracking algorithm is disabled, set the second state machine to the closed state through the second thread; when the second state machine is in the closed state, stop tracking the input image through the second thread.
When detecting, based on the detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including the target object, the processor is specifically configured to: detect one frame of input image in the input image sequence through a specific CNN detection algorithm to obtain a tracking frame including the target object, where the specific CNN detection algorithm includes a weak classifier.
When detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is specifically configured to: for a tracking frame input to the weak classifier of the specific CNN detection algorithm, detect through the weak classifier whether the tracking frame matches the filtering strategy; if not, output the tracking frame to the next-level network of the specific CNN detection algorithm. After detecting through the weak classifier whether the tracking frame matches the filtering strategy, the processor is further configured to: if it matches the filtering strategy, filter out the tracking frame.
When detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is specifically configured to: convert the input image and the network parameters into fixed-point data, and process the converted fixed-point data through the specific CNN detection algorithm to obtain the tracking frame including the target object.
In an example, the specific CNN detection algorithm is implemented by a fixed-point network, in which both the input image and the network parameters are fixed-point data. When detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is specifically configured to: process the fixed-point data through the specific CNN detection algorithm to obtain the tracking frame including the target object.
Before detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is further configured to: preprocess the one frame of input image in the input image sequence to obtain a preprocessed input image; and process the preprocessed input image through the specific CNN detection algorithm to obtain the tracking frame including the target object.
When detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is specifically configured to: predict a reference region of the target object using temporal information; and detect the reference region in the one frame of input image through the specific CNN detection algorithm to obtain the tracking frame including the target object.
When tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image, the processor is specifically configured to: acquire the tracking frame of the target object obtained based on the previous frame of input image and the spatial context model of the target object, where the spatial context model is used to indicate the spatial correlation between the target object and its surrounding image region in the previous frame of input image; and determine, based on the spatial context model, the target object at the position corresponding to the tracking frame and in its surrounding region in the current frame of input image.
When tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image, the processor is specifically configured to: predict a reference region of the target object through Kalman filtering; and track the target object, based on the tracking algorithm and according to its tracking frame, in the reference regions of the multiple frames of input images following the one frame of input image.
After tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image, the processor is further configured to: determine a target tracking frame of the target object using a first tracking frame in a first input image and a second tracking frame in a second input image, where the first tracking frame is the tracking frame including the target object obtained in the first input image based on the detection algorithm, and the second tracking frame is the tracking frame obtained when tracking the target object in the second input image based on the tracking algorithm; and track the target object, based on the tracking algorithm, according to the target tracking frame.
When determining the target tracking frame of the target object using the first tracking frame in the first input image and the second tracking frame in the second input image, the processor is specifically configured to: calculate the degree of overlap between the first tracking frame and the second tracking frame; and determine the target tracking frame of the target object according to the degree of overlap. When determining the target tracking frame of the target object according to the degree of overlap, the processor is specifically configured to: if the degree of overlap is greater than or equal to a preset threshold, determine the second tracking frame as the target tracking frame of the target object; or, if the degree of overlap is less than the preset threshold, determine the first tracking frame as the target tracking frame of the target object.
Embodiment 7: Based on the same inventive concept as the above method, an embodiment of the present invention further provides a computer-readable storage medium storing computer instructions; when the computer instructions are executed, the above tracking control method is performed, as shown in the foregoing embodiments.
The system, apparatus, module, or unit described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when implementing the present invention, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks therein, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Moreover, these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce a computer-implemented process, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
The above are only embodiments of the present invention and are not intended to limit the present invention. Various modifications and changes may be made to the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of the claims of the present invention.
Claims (45)
- A tracking control method, characterized in that the method includes: acquiring an input image sequence; detecting, based on a detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including a target object; and tracking, based on a tracking algorithm and according to the tracking frame of the target object, the target object in the multiple frames of input images following the one frame of input image.
- The method according to claim 1, characterized in that the detection algorithm is implemented by a first thread and the tracking algorithm is implemented by a second thread.
- The method according to claim 2, characterized in that tracking the target object according to its tracking frame in the multiple frames of input images following the one frame of input image includes: outputting the tracking frame of the target object from the first thread to the second thread; and tracking the target object through the second thread, according to the tracking frame of the target object, in the multiple frames of input images following the one frame of input image.
- The method according to claim 2, characterized in that, after one frame of input image in the input image sequence is detected and a tracking frame including the target object is obtained, the method further includes: stopping, through the first thread, the detection of the multiple frames of input images following the one frame of input image.
- The method according to claim 2, characterized in that tracking the target object according to its tracking frame in the multiple frames of input images following the one frame of input image includes: after obtaining the tracking frame including the target object through the first thread, starting the second thread; and after the second thread is started, tracking the target object through the second thread, according to the tracking frame of the target object, in the multiple frames of input images following the one frame of input image.
- The method according to claim 2, characterized in that, when the detection algorithm is enabled and the current input image is the first frame of the input image sequence, the first state machine is set to the started state through the first thread; when the first state machine is in the started state, the input image is detected through the first thread; when the detection algorithm is enabled and the current input image is not the first frame of the input image sequence, the first state machine is set to the idle state through the first thread; when the first state machine is in the idle state, detection of the input image through the first thread is stopped; when the detection algorithm is disabled, the first state machine is set to the closed state through the first thread; when the first state machine is in the closed state, detection of the input image through the first thread is stopped.
- The method according to claim 2, characterized in that, when the tracking algorithm is enabled, the second state machine is set to the started state through the second thread; when the second state machine is in the started state, the input image is tracked through the second thread; when the tracking algorithm is disabled, the second state machine is set to the closed state through the second thread; when the second state machine is in the closed state, tracking of the input image through the second thread is stopped.
- The method according to claim 1, characterized in that detecting, based on the detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including the target object includes: detecting one frame of input image in the input image sequence through a specific CNN detection algorithm to obtain a tracking frame including the target object, where the specific CNN detection algorithm includes a weak classifier.
- The method according to claim 8, characterized in that the specific CNN detection algorithm is an MTCNN detection algorithm that includes pnet and rnet but does not include onet.
- The method according to claim 8, characterized in that detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object includes: for a tracking frame input to the weak classifier of the specific CNN detection algorithm, detecting through the weak classifier whether the tracking frame matches a filtering strategy; if not, outputting the tracking frame to the next-level network of the specific CNN detection algorithm.
- The method according to claim 8, characterized in that, after detecting through the weak classifier whether the tracking frame matches the filtering strategy, the method further includes: if it matches the filtering strategy, filtering out the tracking frame.
- The method according to any one of claims 8-11, characterized in that the specific CNN detection algorithm includes at least one weak classifier, different weak classifiers have the same or different filtering strategies, the weak classifier is deployed at any level of the network of the specific CNN detection algorithm, and the filtering strategy specifically includes a morphological filtering strategy and/or a skin-color filtering strategy.
- The method according to claim 8, characterized in that detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object includes: converting the input image and the network parameters into fixed-point data, and processing the converted fixed-point data through the specific CNN detection algorithm to obtain the tracking frame including the target object.
- The method according to claim 8, characterized in that the specific CNN detection algorithm is implemented by a fixed-point network, in which both the input image and the network parameters are fixed-point data; detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object includes: processing the fixed-point data through the specific CNN detection algorithm to obtain the tracking frame including the target object.
- The method according to claim 8, characterized in that, before one frame of input image in the input image sequence is detected through the specific CNN detection algorithm to obtain a tracking frame including the target object, the method further includes: preprocessing the one frame of input image in the input image sequence to obtain a preprocessed input image; and processing the preprocessed input image through the specific CNN detection algorithm to obtain the tracking frame including the target object.
- The method according to claim 15, characterized in that the preprocessing includes: compressed-sensing processing; and/or skin-color detection processing.
- The method according to claim 8, characterized in that detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object includes: predicting a reference region of the target object using temporal information; and detecting the reference region in the one frame of input image through the specific CNN detection algorithm to obtain the tracking frame including the target object.
- The method according to claim 1, characterized in that tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image includes: acquiring the tracking frame of the target object obtained based on the previous frame of input image and the spatial context model of the target object, where the spatial context model is used to indicate the spatial correlation between the target object and its surrounding image region in the previous frame of input image; and determining, based on the spatial context model, the target object at the position corresponding to the tracking frame and in its surrounding region in the current frame of input image.
- The method according to claim 18, characterized in that the spatial context model includes one or any combination of the following: grayscale features, HOG features, moment features, and SIFT features.
- The method according to claim 1, characterized in that tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image includes: predicting a reference region of the target object through Kalman filtering; and tracking the target object, based on the tracking algorithm and according to its tracking frame, in the reference regions of the multiple frames of input images following the one frame of input image.
- The method according to claim 1, characterized in that, after the target object is tracked, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image, the method further includes: determining a target tracking frame of the target object using a first tracking frame in a first input image and a second tracking frame in a second input image, where the first tracking frame is the tracking frame including the target object obtained in the first input image based on the detection algorithm, and the second tracking frame is the tracking frame obtained when tracking the target object in the second input image based on the tracking algorithm; and tracking the target object, based on the tracking algorithm, according to the target tracking frame.
- The method according to claim 21, characterized in that determining the target tracking frame of the target object using the first tracking frame in the first input image and the second tracking frame in the second input image includes: calculating the degree of overlap between the first tracking frame and the second tracking frame; and determining the target tracking frame of the target object according to the degree of overlap.
- The method according to claim 22, characterized in that determining the target tracking frame of the target object according to the degree of overlap includes: if the degree of overlap is greater than or equal to a preset threshold, determining the second tracking frame as the target tracking frame of the target object; or, if the degree of overlap is less than the preset threshold, determining the first tracking frame as the target tracking frame of the target object.
- The method according to claim 22 or 23, characterized in that the degree of overlap includes the Intersection over Union (IoU) between the first tracking frame and the second tracking frame.
- The method according to claim 1, characterized in that the input image includes at least one target object, and the target object includes a human face.
- A tracking control device, characterized in that it includes a memory and a processor; the memory is configured to store program code; the processor is configured to call the program code and, when the program code is executed, perform the following operations: acquiring an input image sequence; detecting, based on a detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including a target object; and tracking, based on a tracking algorithm and according to the tracking frame of the target object, the target object in the multiple frames of input images following the one frame of input image.
- The device according to claim 26, characterized in that the processor implements the detection algorithm through a first thread, and the processor implements the tracking algorithm through a second thread.
- The device according to claim 27, characterized in that, when tracking the target object according to its tracking frame in the multiple frames of input images following the one frame of input image, the processor is specifically configured to: output the tracking frame of the target object from the first thread to the second thread; and track the target object through the second thread, according to the tracking frame of the target object, in the multiple frames of input images following the one frame of input image.
- The device according to claim 27, characterized in that, after detecting one frame of input image in the input image sequence and obtaining a tracking frame including the target object, the processor is further configured to: stop, through the first thread, detecting the multiple frames of input images following the one frame of input image.
- The device according to claim 27, characterized in that, when tracking the target object according to its tracking frame in the multiple frames of input images following the one frame of input image, the processor is specifically configured to: after obtaining the tracking frame including the target object through the first thread, start the second thread; and after the second thread is started, track the target object through the second thread, according to the tracking frame of the target object, in the multiple frames of input images following the one frame of input image.
- The device according to claim 27, characterized in that the processor is further configured to: when the detection algorithm is enabled and the current input image is the first frame of the input image sequence, set the first state machine to the started state through the first thread; when the first state machine is in the started state, detect the input image through the first thread; when the detection algorithm is enabled and the current input image is not the first frame of the input image sequence, set the first state machine to the idle state through the first thread; when the first state machine is in the idle state, stop detecting the input image through the first thread; when the detection algorithm is disabled, set the first state machine to the closed state through the first thread; when the first state machine is in the closed state, stop detecting the input image through the first thread.
- The device according to claim 27, characterized in that the processor is further configured to: when the tracking algorithm is enabled, set the second state machine to the started state through the second thread; when the second state machine is in the started state, track the input image through the second thread; when the tracking algorithm is disabled, set the second state machine to the closed state through the second thread; when the second state machine is in the closed state, stop tracking the input image through the second thread.
- The device according to claim 26, characterized in that, when detecting, based on the detection algorithm, one frame of input image in the input image sequence to obtain a tracking frame including the target object, the processor is specifically configured to: detect one frame of input image in the input image sequence through a specific CNN detection algorithm to obtain a tracking frame including the target object, where the specific CNN detection algorithm includes a weak classifier.
- The device according to claim 33, characterized in that, when detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is specifically configured to: for a tracking frame input to the weak classifier of the specific CNN detection algorithm, detect through the weak classifier whether the tracking frame matches a filtering strategy; if not, output the tracking frame to the next-level network of the specific CNN detection algorithm.
- The device according to claim 33, characterized in that, after detecting through the weak classifier whether the tracking frame matches the filtering strategy, the processor is further configured to: if it matches the filtering strategy, filter out the tracking frame.
- The device according to claim 33, characterized in that, when detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is specifically configured to: convert the input image and the network parameters into fixed-point data, and process the converted fixed-point data through the specific CNN detection algorithm to obtain the tracking frame including the target object.
- The device according to claim 33, characterized in that the specific CNN detection algorithm is implemented by a fixed-point network, in which both the input image and the network parameters are fixed-point data; when detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is specifically configured to: process the fixed-point data through the specific CNN detection algorithm to obtain the tracking frame including the target object.
- The device according to claim 33, characterized in that, before detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is further configured to: preprocess the one frame of input image in the input image sequence to obtain a preprocessed input image; and process the preprocessed input image through the specific CNN detection algorithm to obtain the tracking frame including the target object.
- The device according to claim 33, characterized in that, when detecting one frame of input image in the input image sequence through the specific CNN detection algorithm to obtain a tracking frame including the target object, the processor is specifically configured to: predict a reference region of the target object using temporal information; and detect the reference region in the one frame of input image through the specific CNN detection algorithm to obtain the tracking frame including the target object.
- The device according to claim 26, characterized in that, when tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image, the processor is specifically configured to: acquire the tracking frame of the target object obtained based on the previous frame of input image and the spatial context model of the target object, where the spatial context model is used to indicate the spatial correlation between the target object and its surrounding image region in the previous frame of input image; and determine, based on the spatial context model, the target object at the position corresponding to the tracking frame and in its surrounding region in the current frame of input image.
- The device according to claim 26, characterized in that, when tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image, the processor is specifically configured to: predict a reference region of the target object through Kalman filtering; and track the target object, based on the tracking algorithm and according to its tracking frame, in the reference regions of the multiple frames of input images following the one frame of input image.
- The device according to claim 26, characterized in that, after tracking the target object, based on the tracking algorithm and according to its tracking frame, in the multiple frames of input images following the one frame of input image, the processor is further configured to: determine a target tracking frame of the target object using a first tracking frame in a first input image and a second tracking frame in a second input image, where the first tracking frame is the tracking frame including the target object obtained in the first input image based on the detection algorithm, and the second tracking frame is the tracking frame obtained when tracking the target object in the second input image based on the tracking algorithm; and track the target object, based on the tracking algorithm, according to the target tracking frame.
- The device according to claim 42, characterized in that, when determining the target tracking frame of the target object using the first tracking frame in the first input image and the second tracking frame in the second input image, the processor is specifically configured to: calculate the degree of overlap between the first tracking frame and the second tracking frame; and determine the target tracking frame of the target object according to the degree of overlap.
- The device according to claim 42, characterized in that, when determining the target tracking frame of the target object according to the degree of overlap, the processor is specifically configured to: if the degree of overlap is greater than or equal to a preset threshold, determine the second tracking frame as the target tracking frame of the target object; or, if the degree of overlap is less than the preset threshold, determine the first tracking frame as the target tracking frame of the target object.
- A computer-readable storage medium, characterized in that computer instructions are stored on the computer-readable storage medium, and when the computer instructions are executed, the tracking control method according to any one of claims 1-25 is implemented.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201880039294.5A CN110799984A (zh) | 2018-07-27 | 2018-07-27 | Tracking control method, device, and computer-readable storage medium |
PCT/CN2018/097667 WO2020019353A1 (zh) | 2018-07-27 | 2018-07-27 | Tracking control method, device, and computer-readable storage medium |
US17/158,713 US20210150254A1 (en) | 2018-07-27 | 2021-01-26 | Tracking control method, apparatus, and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/097667 WO2020019353A1 (zh) | 2018-07-27 | 2018-07-27 | Tracking control method, device, and computer-readable storage medium |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/158,713 Continuation US20210150254A1 (en) | 2018-07-27 | 2021-01-26 | Tracking control method, apparatus, and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020019353A1 true WO2020019353A1 (zh) | 2020-01-30 |
Family
ID=69181220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/097667 WO2020019353A1 (zh) | 2018-07-27 | 2018-07-27 | Tracking control method, device, and computer-readable storage medium |
Country Status (3)
Country | Link |
---|---|
US (1) | US20210150254A1 (zh) |
CN (1) | CN110799984A (zh) |
WO (1) | WO2020019353A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111311641A (zh) * | 2020-02-25 | 2020-06-19 | 重庆邮电大学 | UAV target tracking control method |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111866573B (zh) * | 2020-07-29 | 2022-03-11 | 腾讯科技(深圳)有限公司 | Video playback method and apparatus, electronic device, and storage medium |
CN113706576A (zh) * | 2021-03-17 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Detection and tracking method, apparatus, device, and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101986348A (zh) * | 2010-11-09 | 2011-03-16 | 上海电机学院 | Visual target recognition and tracking method |
CN104866805A (zh) * | 2014-02-20 | 2015-08-26 | 腾讯科技(深圳)有限公司 | Method and apparatus for real-time face tracking |
CN105701840A (zh) * | 2015-12-31 | 2016-06-22 | 上海极链网络科技有限公司 | System for real-time tracking of multiple objects in video and implementation method thereof |
CN107688785A (zh) * | 2017-08-28 | 2018-02-13 | 西安电子科技大学 | Development method for dual-thread parallel real-time face detection based on ARM platform |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9349066B2 (en) * | 2012-01-06 | 2016-05-24 | Qualcomm Incorporated | Object tracking and processing |
CN106803263A (zh) * | 2016-11-29 | 2017-06-06 | 深圳云天励飞技术有限公司 | Target tracking method and apparatus |
-
2018
- 2018-07-27 WO PCT/CN2018/097667 patent/WO2020019353A1/zh active Application Filing
- 2018-07-27 CN CN201880039294.5A patent/CN110799984A/zh active Pending
-
2021
- 2021-01-26 US US17/158,713 patent/US20210150254A1/en not_active Abandoned
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101986348A (zh) * | 2010-11-09 | 2011-03-16 | 上海电机学院 | Visual target recognition and tracking method |
CN104866805A (zh) * | 2014-02-20 | 2015-08-26 | 腾讯科技(深圳)有限公司 | Method and apparatus for real-time face tracking |
CN105701840A (zh) * | 2015-12-31 | 2016-06-22 | 上海极链网络科技有限公司 | System for real-time tracking of multiple objects in video and implementation method thereof |
CN107688785A (zh) * | 2017-08-28 | 2018-02-13 | 西安电子科技大学 | Development method for dual-thread parallel real-time face detection based on ARM platform |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111311641A (zh) * | 2020-02-25 | 2020-06-19 | 重庆邮电大学 | UAV target tracking control method |
CN111311641B (zh) * | 2020-02-25 | 2023-06-09 | 重庆邮电大学 | UAV target tracking control method |
Also Published As
Publication number | Publication date |
---|---|
CN110799984A (zh) | 2020-02-14 |
US20210150254A1 (en) | 2021-05-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220417590A1 (en) | Electronic device, contents searching system and searching method thereof | |
US10510157B2 (en) | Method and apparatus for real-time face-tracking and face-pose-selection on embedded vision systems | |
CN113302620B (zh) | 使用机器学习模型确定对象与人之间的关联 | |
AU2016352215B2 (en) | Method and device for tracking location of human face, and electronic equipment | |
US11836931B2 (en) | Target detection method, apparatus and device for continuous images, and storage medium | |
CN111311634B (zh) | 一种人脸图像检测方法、装置及设备 | |
KR20170014491A (ko) | 움직임 인식 방법 및 움직임 인식 장치 | |
US20210150254A1 (en) | Tracking control method, apparatus, and computer-readable storage medium | |
US12002221B2 (en) | Control method and device for mobile platform, and computer readable storage medium | |
US20200082544A1 (en) | Computer vision processing | |
CN110533694A (zh) | 图像处理方法、装置、终端及存储介质 | |
US11394870B2 (en) | Main subject determining apparatus, image capturing apparatus, main subject determining method, and storage medium | |
US11381743B1 (en) | Region of interest capture for electronic devices | |
CN109685797A (zh) | 骨骼点检测方法、装置、处理设备及存储介质 | |
WO2020062546A1 (zh) | 目标跟踪处理方法、电子设备 | |
US12100163B2 (en) | Object tracking method and object tracking apparatus | |
Le et al. | Human detection and tracking for autonomous human-following quadcopter | |
US10074187B2 (en) | Image recognition system and semiconductor integrated circuit | |
US20230115371A1 (en) | Efficient vision perception | |
CN112541418B (zh) | 用于图像处理的方法、装置、设备、介质和程序产品 | |
Agrawal et al. | Single shot multitask pedestrian detection and behavior prediction | |
Mohamed et al. | Real-time moving objects tracking for mobile-robots using motion information | |
Yan | Using the Improved SSD Algorithm to Motion Target Detection and Tracking | |
JP2012242969A (ja) | データ処理装置、データ処理装置の制御方法、およびプログラム | |
WO2023106103A1 (ja) | 画像処理装置およびその制御方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18927303 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18927303 Country of ref document: EP Kind code of ref document: A1 |