WO2023221770A1 - Dynamic target analysis method, apparatus, device and storage medium - Google Patents

Dynamic target analysis method, apparatus, device and storage medium

Info

Publication number
WO2023221770A1
Authority
WO
WIPO (PCT)
Prior art keywords
detection
target
detection frame
preset
category
Prior art date
Application number
PCT/CN2023/091884
Other languages
English (en)
French (fr)
Inventor
祖春山
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Publication of WO2023221770A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • The present invention relates to the technical field of target detection, and in particular to a dynamic target analysis method, apparatus, device and storage medium.
  • Target detection is an important image analysis technique that can determine the location of a target and other related information from an image, for example a target detection frame representing the target's position, or the classification of the target.
  • The present invention provides a dynamic target analysis method, apparatus, device and storage medium to address deficiencies in the related art.
  • A dynamic target analysis method is provided, which includes: performing target detection on video data to be detected, and obtaining a target detection frame for each frame of image; based on the obtained detection frames, using a preset target tracking algorithm to determine one or more detection frame sets, where each detection frame set contains detection frames that belong to the same target across different images, and different detection frame sets correspond to different targets; for each detection frame set, determining the image quality of the images contained in its detection frames, the image quality being positively correlated with the classification accuracy of the target corresponding to the detection frame set; selecting the detection frame sets whose image quality is greater than a preset quality; and determining, based on the images contained in the detection frames in each selected detection frame set, the category of the target corresponding to that detection frame set.
  • In some embodiments, determining the category of the target corresponding to the detection frame set based on the images contained in the detection frames in each selected detection frame set includes: extracting features of the target based on the images contained in the detection frames in each selected detection frame set; comparing the features of the target with the features in a preset category feature set; and determining the category corresponding to the feature whose similarity is greater than a preset similarity threshold as the category of the target corresponding to the detection frame set.
  • In some embodiments, performing target detection on the video data to be detected includes: detecting a first category of each target in the video data to be detected. The preset category feature set includes features of second categories, where a second category is a subdivision category of a first category. The method also includes presetting a feature extraction model corresponding to each first category. Extracting the features of the target based on the images contained in the detection frames in each selected detection frame set includes: for each selected detection frame set, determining the first category of the target corresponding to the detection frame set; and using the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in each selected detection frame set.
  • In some embodiments, performing target detection on the video data to be detected includes: inputting the video data to be detected into a preset target detection model, the preset target detection model being used to determine a third category of each target. The preset target detection model includes a detection frame prediction branch corresponding to each third category; when the third category of any target is determined, the detection frame output by the detection frame prediction branch corresponding to that third category is determined as the detection frame of the target.
  • In some embodiments, performing target detection on the video data to be detected includes: inputting the video data to be detected into a preset target detection model, the preset target detection model being used to determine the detection frame position, size and rotation angle of each target.
  • In some embodiments, the preset target detection model includes a regression prediction branch and a classification prediction branch for the detection frame rotation angle; for any target, the outputs of the regression prediction branch and the classification prediction branch in the preset target detection model are combined to determine a comprehensive rotation angle result.
  • In some embodiments, using a preset target tracking algorithm to determine one or more detection frame sets based on the acquired detection frames includes: determining the frame following the first frame image in the video data to be detected as the current image, and performing the following steps in a loop: based on the first real detection frames obtained through target detection in the previous frame image, using preset parameters to predict detection frames in the current image to obtain predicted detection frames, where different predicted detection frames correspond to different first real detection frames; pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames; when it is determined that a second real detection frame is successfully paired with a predicted detection frame, determining that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target, and adding them to the set of detection frames corresponding to that target; updating the preset parameters based on the pairing result; ending the loop when the current image has no next frame image; and determining the next frame image as the current image when the current image has a next frame image.
  • In some embodiments, the preset parameters include parameters of a Kalman filter, and using the preset parameters to predict detection frames in the current image includes: using the Kalman filter to predict the detection frames in the current image. Pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames includes: calculating preset detection frame image features for the image contained in each second real detection frame obtained through target detection in the current image and for the images contained in the obtained predicted detection frames, where the computational complexity of the preset detection frame image features is less than a preset complexity threshold; and using the Hungarian algorithm to pair the second real detection frames with the predicted detection frames according to the similarity of the preset detection frame image features and the degree of coincidence of the detection frames.
  • In some embodiments, determining the image quality of the images included in the detection frames in each detection frame set includes: inputting the images included in the detection frames in each detection frame set into a preset quality analysis model, the preset quality analysis model being used to determine image quality. The preset quality analysis model includes a regression prediction branch and a classification prediction branch for image quality; for any detection frame set, the outputs of the regression prediction branch and the classification prediction branch in the preset quality analysis model are combined to determine the image quality.
  • In some embodiments, the first neural network model for performing target detection, the second neural network model for determining image quality, and the third neural network model for determining the category of the target corresponding to the detection frame set are obtained through the following quantization method: the initial neural network model trained at a first parameter precision is quantized to a second parameter precision to obtain an intermediate neural network model, the second parameter precision being lower than the first parameter precision; the following steps are then performed in a loop until the accuracy and calculation speed of the current intermediate neural network model meet preset quantization requirements: determine the accuracy and calculation speed of the current intermediate neural network model; determine the output error of each layer between the initial neural network model and the current intermediate neural network model; and select the layer with the largest output error, raise its parameter precision, and obtain a new intermediate neural network model as the current intermediate neural network model.
  • A dynamic target analysis device is provided, including: a target detection unit, used to perform target detection on video data to be detected and obtain a target detection frame for each frame of image; a target tracking unit, used to determine one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm, where each detection frame set contains detection frames that belong to the same target across different images and different detection frame sets correspond to different targets; a screening unit, used to determine the image quality of the images contained in the detection frames in each detection frame set, the image quality being positively correlated with the classification accuracy of the target corresponding to the detection frame set, and to select the detection frame sets whose image quality is greater than a preset quality; and a classification unit, used to determine the category of the target corresponding to the detection frame set based on the images contained in the detection frames in each selected detection frame set.
  • In some embodiments, the classification unit is configured to: extract the features of the target based on the images contained in the detection frames in each selected detection frame set, compare the features of the target with the features in the preset category feature set, and determine the category corresponding to the feature whose similarity is greater than the preset similarity threshold as the category of the target corresponding to the detection frame set.
  • In some embodiments, the target detection unit is configured to detect a first category of each target in the video data to be detected; the preset category feature set includes features of second categories, where a second category is a subdivision category of a first category; the device is preset with a feature extraction model corresponding to each first category; and the classification unit is used to: for each selected detection frame set, determine the first category of the target corresponding to the detection frame set, and use the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in each selected detection frame set.
  • In some embodiments, the target detection unit is configured to input the video data to be detected into a preset target detection model, the preset target detection model being used to determine the third category of each target; the preset target detection model includes a detection frame prediction branch corresponding to each third category; when the third category of any target is determined, the detection frame output by the detection frame prediction branch corresponding to that third category is determined as the detection frame of the target.
  • the target detection unit is used to: input the video data to be detected into a preset target detection model, and the preset target detection model is used to determine the detection frame position, size and rotation angle of the target; the preset The target detection model includes a regression prediction branch and a classification prediction branch of the detection frame rotation angle; for any target, the outputs of the regression prediction branch and the classification prediction branch in the preset target detection model are combined to determine the comprehensive result of the rotation angle.
  • In some embodiments, the target tracking unit is configured to determine the frame following the first frame image in the video data to be detected as the current image, and to perform the following steps in a loop: based on the first real detection frames obtained through target detection in the previous frame image, use the preset parameters to predict detection frames in the current image to obtain predicted detection frames, where different predicted detection frames correspond to different first real detection frames; pair each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames; when it is determined that a second real detection frame is successfully paired with a predicted detection frame, determine that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target, and add them to the set of detection frames corresponding to that target; update the preset parameters based on the pairing result; end the loop when the current image has no next frame image; and determine the next frame image as the current image when the current image has a next frame image.
  • In some embodiments, the preset parameters include parameters of a Kalman filter.
  • The target tracking unit is used to: use the Kalman filter to predict detection frames in the current image; calculate preset detection frame image features for the image contained in each second real detection frame obtained through target detection in the current image and for the images contained in the obtained predicted detection frames, where the computational complexity of the preset detection frame image features is less than a preset complexity threshold; and use the Hungarian algorithm to pair the second real detection frames with the predicted detection frames based on the similarity of the preset detection frame image features and the degree of coincidence of the detection frames.
  • In some embodiments, the screening unit is used to input the images contained in the detection frames in each detection frame set into a preset quality analysis model, the preset quality analysis model being used to determine image quality; the preset quality analysis model includes a regression prediction branch and a classification prediction branch for image quality; for any detection frame set, the outputs of the regression prediction branch and the classification prediction branch in the preset quality analysis model are combined to determine the image quality.
  • In some embodiments, the first neural network model for performing target detection, the second neural network model for determining image quality, and the third neural network model for determining the category of the target corresponding to the detection frame set are obtained through the following quantization method: the initial neural network model trained at a first parameter precision is quantized to a second parameter precision to obtain an intermediate neural network model, the second parameter precision being lower than the first parameter precision; the following steps are then performed in a loop until the accuracy and calculation speed of the current intermediate neural network model meet preset quantization requirements: determine the accuracy and calculation speed of the current intermediate neural network model; determine the output error of each layer between the initial neural network model and the current intermediate neural network model; and select the layer with the largest output error, raise its parameter precision, and obtain a new intermediate neural network model as the current intermediate neural network model.
  • An electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above dynamic target analysis method.
  • A computer-readable storage medium storing a computer program is provided, wherein the computer program implements the above dynamic target analysis method when executed by a processor.
  • Figure 1 is a schematic flow chart of a dynamic target analysis method according to an embodiment of the present invention.
  • Figure 2 is a schematic flow chart of a target tracking method according to an embodiment of the present invention.
  • Figure 3 is a schematic flow chart of a feature extraction method according to an embodiment of the present invention.
  • Figure 4 is a schematic flow chart of a model quantification method according to an embodiment of the present invention.
  • Figure 5 is a schematic structural diagram of a dynamic target analysis device according to an embodiment of the present invention.
  • Figure 6 is a schematic diagram of the hardware structure of a computer device configured to implement a method according to an embodiment of the present invention.
  • Target detection is an important image analysis technique that can determine the location of a target and other related information from an image, for example a target detection frame representing the target's position, or the classification of the target.
  • An embodiment of the present invention discloses a dynamic target analysis method.
  • In this method, target detection can be performed separately on each frame of image in the video data, and a target tracking algorithm can then be used to determine the detection frames belonging to the same target in the video data. Since a detection frame can represent the position information of a target, multiple detection frames belonging to the same target can be used to determine the movement of that target.
  • Moreover, the category of the target can be determined from the images contained in the multiple detection frames determined to belong to the same target, which can improve the accuracy of target classification.
  • However, the amount of calculation required to determine the specific target category is usually large. Therefore, in this method, some targets can be screened out so that the subsequent step of determining the target category is not performed for them, saving calculation and improving detection efficiency.
  • When image quality is low, the accuracy of determining the target category based on the images contained in the detection frames of the same target is also low.
  • Therefore, the image quality can be determined for the images included in the detection frames belonging to the same target, and targets with lower image quality can then be filtered out.
  • For the filtered-out targets, the subsequent step of determining the target category is not performed, while targets with higher image quality are retained for the subsequent steps.
  • Image quality can be positively correlated with target classification accuracy, so screening out targets with low expected classification accuracy has almost no impact on the overall target classification accuracy.
  • In other words, this method determines target categories only for the selected targets with higher image quality and skips the category determination step for targets with lower image quality, so that the amount of calculation is reduced and detection efficiency is improved while the overall target classification accuracy is almost unaffected.
  • Figure 1 is a schematic flow chart of a dynamic target analysis method according to an embodiment of the present invention.
  • S101 Perform target detection on the video data to be detected, and obtain the target detection frame of each frame of image.
  • S102 Based on the obtained detection frame, use a preset target tracking algorithm to determine one or more detection frame sets.
  • each detection frame set contains detection frames belonging to the same target between different images, and different detection frame sets correspond to different targets.
  • S103 Determine the image quality of the images contained in the detection frames in each detection frame set, and select the detection frame sets whose image quality is greater than the preset quality.
  • image quality is positively related to the classification accuracy of the target corresponding to the detection frame set.
  • S104 Based on the images contained in the detection frames in each selected detection frame set, determine the category of the target corresponding to the detection frame set.
  • Through this process, the method can determine the image quality of the images contained in the detection frames in each detection frame set, select targets with higher image quality, and determine their categories, while skipping the category determination step for targets with lower image quality, which saves calculation and improves detection efficiency.
  • This method does not limit the execution subject and can be applied to servers or terminals.
  • In some embodiments, dynamic target analysis can be implemented through edge computing; specifically, the method can be applied to edge terminals. Edge terminals usually have limited computing resources, so this method embodiment can be used to save calculation and improve the efficiency of target detection.
  • the above method and process can be applied to smart refrigerators.
  • the detected targets can specifically be products in the smart refrigerator, and the video data to be detected can be surveillance videos shot inside the smart refrigerator.
  • the smart refrigerator can detect multiple detection frames belonging to the same moved product and the category of the moved product through the above method and process.
  • In some embodiments, the video data to be detected can be the surveillance video captured by the smart refrigerator at the refrigerator outlet. Since goods must be picked up and moved through the refrigerator outlet, target detection can be carried out more efficiently based on this surveillance video.
  • multiple detection frames belonging to the same product can be used to determine the movement trajectory of the product, and then determine whether the product has been taken by the customer.
  • the popularity of the products in each category can be analyzed.
  • it can be combined with static target detection. Specifically, it can be to periodically take images of the products inside the smart refrigerator to determine the changes in the quantity of the products inside the smart refrigerator, so as to more accurately analyze the popularity of the products.
  • In some embodiments, the product types recommended for customers, discount information, and the like can be displayed on nearby display screens.
  • S101 Perform target detection on the video data to be detected, and obtain the target detection frame of each frame of image.
  • the process of this method does not limit the source of the video data to be detected.
  • it can be the surveillance video of the outlet of the smart refrigerator.
  • the method flow does not limit the specific method of target detection.
  • a preset target detection model may be used for target detection.
  • the process of this method does not limit the specific structure of the target detection model.
  • the target detection model can be a model with a smaller amount of calculation, thereby saving calculation amount.
  • the computational complexity of the target detection model can be less than the preset threshold.
  • For example, the YOLOv5 target detection model can be used for target detection. Since the YOLOv5 model requires relatively little computation and occupies less storage, it is well suited to edge computing and can save calculation and improve the efficiency of target detection.
  • Other target detection models, such as other models of the YOLO series, can also be used for target detection in this method.
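  • As an illustrative sketch of this step (not the patent's actual configuration), the following code runs a small off-the-shelf YOLOv5 model frame by frame over a video; the 'yolov5s' weights, the 0.5 confidence threshold and the OpenCV frame loop are all assumptions.

```python
import cv2
import torch

# Illustrative: small YOLOv5 variant loaded via torch.hub; weights and
# thresholds are assumptions, not values taken from the patent.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.conf = 0.5  # assumed confidence threshold

def detect_video(path):
    """Yield (frame_index, detections); detections is an (N, 6) tensor of
    [x1, y1, x2, y2, confidence, class] rows, one row per detected target."""
    cap = cv2.VideoCapture(path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV reads BGR
        results = model(rgb)
        yield idx, results.xyxy[0]
        idx += 1
    cap.release()
```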
  • the category of the target can usually be determined during target detection.
  • the output object detection box corresponds to the category of the object.
  • the general category of the target may be determined first, and then the subdivided category may be further determined in S104.
  • For example, in the case where the detected targets are specifically commodities, the commodities can be roughly classified as boxed, bagged or bottled according to the packaging method.
  • The subdivision categories may include: boxed drinks, snacks and toys; bagged drinks, snacks and toys; and bottled drinks and snacks.
  • the video data to be detected can be input into a preset target detection model, and the preset target detection model can be used to determine the third category of the target.
  • This embodiment does not limit the third category; it may specifically be a broad category that includes subdivision categories, for example boxed, bagged and bottled.
  • Since targets of different third categories may differ, the corresponding detection frame prediction approaches may also need to differ.
  • Therefore, a detection frame prediction branch corresponding to each third category can be constructed separately, so that the detection frame prediction methods of different third categories are distinguished, which greatly improves the accuracy of the target detection frames for each third category.
  • image samples with detection box labels and third category labels can be used for training.
  • performing target detection on the video data to be detected may include: inputting the video data to be detected into a preset target detection model, and the preset target detection model may be used to determine the third category of the target; the preset target detection model may include A detection box prediction branch corresponding to each third category.
  • the detection frame output by the detection frame prediction branch corresponding to the third category may be determined as the detection frame of the target.
  • This embodiment can improve the prediction accuracy of the detection frame by setting independent detection frame prediction branches of different third categories.
  • a rotation detection frame may be used.
  • the product may rotate as the customer takes it, showing different sides in the video data. Therefore, a rotation detection frame can be used to improve the accuracy of the detection frame.
  • the preset target detection model can be used to determine the detection frame position, size and rotation angle of the target. Based on the position, size and rotation angle of the detection frame, the rotation detection frame can be determined.
  • the rotation angle of the detection frame may specifically be the rotation angle of the long side of the detection frame or the rotation angle of the vertical side, which is not limited in this embodiment.
  • regression prediction can be performed for the rotation angle, or classification prediction can be performed for the rotation angle.
  • the rotation angle can be divided into 180 categories from 0 to 179 degrees.
  • regression prediction has higher accuracy, while classification prediction has better stability and smaller deviation.
  • regression prediction and classification prediction for the rotation angle can be combined to obtain the predicted rotation angle, thereby improving the precision, accuracy and stability of the rotation angle.
  • performing target detection on the video data to be detected may include: inputting the video data to be detected into a preset target detection model, and the preset target detection model may be used to determine the detection frame position, size and rotation angle of the target; preset The target detection model can include the regression prediction branch and the classification prediction branch of the detection frame rotation angle; for any target, the outputs of the regression prediction branch and the classification prediction branch in the preset target detection model can be combined to determine the comprehensive result of the rotation angle.
  • This embodiment can improve the precision, accuracy, and stability of the rotation angle by comprehensive regression prediction and classification prediction.
  • the comprehensive result of the rotation angle can be determined as the rotation angle of the final output detection frame.
  • the detection frame can be output based on the comprehensive result of the detection frame position, size and rotation angle output by the preset target detection model.
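  • The patent does not spell out the fusion rule for the two branches. The sketch below shows one plausible combination, assuming a 180-class head with one class per degree, as in the example above, and an equal weighting of the two branch outputs; angles near the 0/179 wrap-around would need circular averaging, which is omitted here.

```python
import numpy as np

def fuse_rotation_angle(reg_angle_deg, cls_logits, w_reg=0.5):
    """Combine the regression branch's continuous angle with the
    classification branch's 180-bin prediction (one bin per degree).
    The 50/50 weighting is an assumption."""
    probs = np.exp(cls_logits - cls_logits.max())  # softmax over angle bins
    probs /= probs.sum()
    cls_angle = float(np.argmax(probs))  # coarse but stable, 1-degree bins
    return w_reg * float(reg_angle_deg) + (1.0 - w_reg) * cls_angle
```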
  • This method process does not limit the training method of the preset target detection model.
  • image samples labeled with detection frame labels can be used for training
  • image samples labeled with rotation detection frame labels can be used for training
  • image samples labeled with detection frame labels and target category labels can be used for training.
  • S102 Based on the obtained detection frame, use a preset target tracking algorithm to determine one or more detection frame sets.
  • Each detection frame set may contain detection frames belonging to the same target in different images, and different detection frame sets may correspond to different targets.
  • each acquired detection frame can be collected into a detection frame set. Since a single detection frame usually only corresponds to one target, there is usually no situation where the same detection frame is included in different detection frame sets.
  • Specifically, each detection frame set can include detection frames belonging to the same target across different images,
  • and different detection frame sets can correspond to different targets.
  • the detection frames of two adjacent frames of images can be compared frame by frame, thereby determining that matching detection frames in different images belong to the same target.
  • detection frames belonging to the same target can be different detection frames whose image content represents the same target.
  • Each detection frame can be regarded as belonging to exactly one target.
  • In some cases, a target detection frame may flash for only one frame.
  • Such a detection frame may not be added to any detection frame set. Because video data is collected at a high frequency, a moving target is usually captured in at least several consecutive frames; if a detection frame flashes for only one frame, it can usually be considered an erroneous result.
  • In some embodiments, determining one or more detection frame sets may include: determining the frame following the first frame image in the video data to be detected as the current image, and executing the following steps in a loop.
  • Based on the first real detection frames obtained through target detection in the previous frame image, the preset parameters are used to predict detection frames in the current image, obtaining predicted detection frames; different predicted detection frames correspond to different first real detection frames.
  • Each second real detection frame obtained through the target detection of S101 in the current image is paired with the obtained predicted detection frames.
  • When it is determined that a second real detection frame is successfully paired with a predicted detection frame, the second real detection frame and the first real detection frame corresponding to that predicted detection frame are determined to belong to the same target and are added to the set of detection frames corresponding to that target.
  • Based on the pairing result, the above preset parameters are updated. If the current image has no next frame image, the loop ends; if the current image has a next frame image, the next frame image is determined as the current image.
  • That is, at the end of each iteration, the next frame image of the current image can be directly determined as the new current image.
  • the preset parameters may include parameters of Kalman filtering.
  • using preset parameters to predict the detection frame in the current image may include: using Kalman filtering to predict the detection frame in the current image.
  • product tracking can use a fast tracking algorithm based on Kalman filtering.
  • Kalman filtering predictions can be made about product location and moving speed.
  • The update of the preset parameters, that is, the update of the Kalman filter parameters, can include updating quantities such as the movement speed used in the next prediction, based on the actual movement between detection frames of the same target in the video data.
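  • The description above matches a SORT-style use of Kalman filtering. Below is a minimal sketch of such a filter, assuming a constant-velocity model for the box centre; the state layout and noise covariances are assumptions, since the patent only states that position and moving speed are predicted and that the parameters are updated from matched detections.

```python
import numpy as np

class BoxKalman:
    """Kalman filter over a detection frame: the centre moves with constant
    velocity, width and height are carried over unchanged between frames."""

    def __init__(self, box):  # box = [cx, cy, w, h]
        self.x = np.array([*box, 0.0, 0.0])  # state: cx, cy, w, h, vx, vy
        self.P = np.eye(6) * 10.0            # state covariance (assumed)
        self.F = np.eye(6)
        self.F[0, 4] = self.F[1, 5] = 1.0    # cx += vx, cy += vy per frame
        self.H = np.eye(4, 6)                # we observe cx, cy, w, h only
        self.Q = np.eye(6) * 0.01            # process noise (assumed)
        self.R = np.eye(4) * 1.0             # measurement noise (assumed)

    def predict(self):
        """Advance the state one frame; returns the predicted detection frame."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]

    def update(self, box):
        """Fold in the matched real detection; this also refreshes the
        velocity estimate used by the next prediction, as described above."""
        y = np.asarray(box) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
```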
  • this method flow does not limit the specific pairing method.
  • pairing can be performed based on the degree of coincidence between the second real detection frame and the predicted detection frame. Specifically, for any second real detection frame, the predicted detection frame with the highest degree of coincidence can be selected for pairing.
  • the Hungarian algorithm can be used for pairing.
  • detection frame image features can be further added for pairing.
  • pairing may be based on the degree of overlap between detection frames and the similarity of image features of the detection frames.
  • the coincidence degree between the detection frames and the similarity of the image features of the detection frames can be integrated, and for any second real detection frame, the predicted detection frame with the highest comprehensive result is selected for pairing.
  • the detection frame image features include color features, edge features, shape features, etc.
  • For example, for any second real detection frame, if two predicted detection frames both overlap it to a high degree, but the image features of the second real detection frame match the image features of only one of the predicted detection frames (for example, both are predominantly red), then those two frames have a high probability of belonging to the same target, and that predicted detection frame can be selected for pairing.
  • Hungarian algorithm can be used for matching.
  • In some embodiments, pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames may include: calculating preset detection frame image features for the images contained in each second real detection frame obtained through target detection in the current image and for the images contained in the obtained predicted detection frames; and using the Hungarian algorithm to pair the second real detection frames with the predicted detection frames based on the similarity of the preset detection frame image features and the degree of coincidence of the detection frames.
  • the similarity of specific preset detection frame image features may be determined based on the cosine distance of the preset detection frame image feature vector.
  • color features can be quickly determined directly based on the distribution of pixel values in the image contained in the detection frame, without the need for complex calculations by neural networks or other models.
  • the computational complexity of the preset detection frame image features may be less than the preset complexity threshold.
  • the preset detection frame image features may include at least one of the following: color features, edge features, shape features, texture features, directional gradient histogram features, etc.
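  • A minimal sketch of this pairing step follows, assuming colour-histogram features (cheap enough to satisfy the complexity constraint above) and an equal blend of detection frame coincidence (IoU) and feature cosine similarity; the 0.5 weight and the 0.3 acceptance gate are illustrative, and scipy's linear_sum_assignment plays the role of the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def color_feature(crop, bins=8):
    """Cheap per-channel colour histogram computed directly from pixel
    values, so its cost stays far below a neural feature extractor's."""
    hist = [np.histogram(crop[..., c], bins=bins, range=(0, 255))[0]
            for c in range(3)]
    v = np.concatenate(hist).astype(np.float32)
    return v / (np.linalg.norm(v) + 1e-8)  # unit norm -> dot = cosine sim

def iou(a, b):
    """Coincidence degree of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def pair(real_boxes, real_feats, pred_boxes, pred_feats, w=0.5, min_score=0.3):
    """Hungarian pairing on a blend of box overlap and feature similarity.
    The 50/50 blend and the 0.3 gate are illustrative assumptions."""
    cost = np.zeros((len(real_boxes), len(pred_boxes)))
    for i, (rb, rf) in enumerate(zip(real_boxes, real_feats)):
        for j, (pb, pf) in enumerate(zip(pred_boxes, pred_feats)):
            score = w * iou(rb, pb) + (1 - w) * float(rf @ pf)
            cost[i, j] = -score  # assignment minimises cost
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if -cost[i, j] >= min_score]
```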
  • the predicted detection frame may be predicted for each first real detection frame in the previous frame image.
  • the second real detection frame and the first real detection frame corresponding to the predicted detection frame belong to the same target, and can be added to the set of detection frames corresponding to the target. If the detection frame set corresponding to the target does not exist, it can be directly created and added.
  • the second real detection frame and the first real detection frame corresponding to the predicted detection frame may be added to the set of detection frames corresponding to the target.
  • In this way, pairing between detection frames can be completed over consecutive frames of images, so that detection frames belonging to the same target are determined and added to the set of detection frames corresponding to that target.
  • Specifically, the next frame image of the first frame image in the video data to be detected can be determined as the current image, and the above loop steps can then be executed.
  • Therefore, in some embodiments, using a preset target tracking algorithm to determine one or more detection frame sets may include: determining the next frame image of the first frame image in the video data to be detected as the current image, and performing the following steps in a loop: based on the first real detection frames obtained through target detection in the previous frame image, using preset parameters to predict detection frames in the current image to obtain predicted detection frames, where different predicted detection frames correspond to different first real detection frames; and pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames.
  • When a pairing succeeds, the second real detection frame and the first real detection frame corresponding to the predicted detection frame belong to the same target and are added to the set of detection frames corresponding to that target; based on the pairing result, the above preset parameters are updated; when the current image has no next frame image, the loop ends; when the current image has a next frame image, the next frame image is determined as the current image.
  • an embodiment of the present invention also provides a schematic flow chart of a target tracking method, as shown in Figure 2.
  • Figure 2 is a schematic flow chart of a target tracking method according to an embodiment of the present invention.
  • S201 Determine the next frame of the first frame of the video data to be detected as the current image.
  • S202 Based on the first real detection frame obtained through target detection in the previous frame image, use preset parameters to predict the detection frame in the current image to obtain the predicted detection frame.
  • different predicted detection frames correspond to different first real detection frames.
  • S203 For each second real detection frame obtained through target detection in the current image, pair it with the obtained predicted detection frames. When it is determined that a second real detection frame is successfully paired with a predicted detection frame, determine that the second real detection frame and the first real detection frame corresponding to the predicted detection frame belong to the same target, and add them to the set of detection frames corresponding to that target.
  • S204 Based on the pairing result, update the preset parameters.
  • S205 Determine whether the current image has a next frame image. If there is no next frame image, this process ends. If there is a next frame image, S206 is executed.
  • S206 Determine the next frame image as the current image, and execute S202.
  • the embodiment of the present invention also provides a specific example.
  • the detected target can be a product
  • Product tracking can use a fast tracking algorithm based on Kalman filtering, together with a specially designed mechanism that efficiently extracts image features to improve tracking stability (tracking stability refers to the ability to continuously and stably track the same item while avoiding losing it).
  • the Hungarian algorithm is used to match the predicted tracks with the product detection result detections in the current frame video image (image feature matching and regional IOU matching) to obtain (track, detection) paired data.
  • Kalman filter parameters are updated using the product detection results matching the tracks.
  • Frame 0: the detector detects 3 detections, and there are currently no tracks, so these 3 detections are initialized as tracks.
  • Frame 1: the detector detects 3 more detections. For the tracks from Frame 0, prediction is first performed to obtain new tracks; the Hungarian algorithm is then used to match the new tracks with the detections, yielding (track, detection) matching pairs. Finally, each track is updated with the detection in its pair.
  • Efficient image features, such as color features and HOG features, can be extracted using feature engineering methods.
  • the criterion for efficient feature selection is that it can effectively distinguish different products, and at the same time, the computational complexity is low and can be quickly extracted and processed in embedded systems.
  • the track and detection are matched based on the image features and regional IOU features, and the cosine distance of the feature vector is used to match the image features.
  • S103 Determine the image quality of the images contained in the detection frames in each detection frame set, and select the detection frame sets whose image quality is greater than the preset quality.
  • image quality is positively related to the classification accuracy of the target corresponding to the detection frame set.
  • Image quality analysis can mainly analyze image conditions that reduce the accuracy of target classification, such as target occlusion and target motion blur.
  • For images with such conditions, the accuracy of subsequent target classification will usually be low.
  • the method flow does not limit the form of image quality.
  • the image quality may be determined in a hierarchical manner. The higher the level, the higher the image quality.
  • the preset quality may specifically be a preset quality level.
  • the process of this method does not limit the method of determining image quality, as long as the image quality is positively correlated with the classification accuracy of the target corresponding to the detection frame set.
  • image quality can be determined through a neural network model.
  • classification prediction may be used to determine the image quality
  • regression prediction may be used to determine the image quality
  • Image quality can also be determined by combining classification prediction and regression prediction.
  • Specifically, the classes used for the classification of image quality can be the quality grades.
  • In some embodiments, determining the image quality of the images included in the detection frames in each detection frame set may include: inputting the images included in the detection frames in each detection frame set into a preset quality analysis model, the preset quality analysis model being used to determine image quality; the preset quality analysis model may include a regression prediction branch and a classification prediction branch for image quality; for any detection frame set, the outputs of the regression prediction branch and the classification prediction branch in the preset quality analysis model are combined to determine the image quality.
  • This embodiment can improve the precision, accuracy and stability of image quality by comprehensively integrating regression prediction and classification prediction.
  • the preset quality analysis model may be a model with a small amount of calculation, and the calculation complexity may be less than a preset threshold.
  • it can be a lightweight network based on mobilenetv2.
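  • One way to realize such a model is sketched below: a regression head and a classification head on a mobilenetv2 backbone whose scores are averaged. The five quality grades, the sigmoid and expected-grade normalisations, and the 50/50 combination rule are all assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class QualityModel(nn.Module):
    """Light quality-analysis model with both a regression branch and a
    classification branch, as described in the text; grade count and the
    branch-combination rule are illustrative assumptions."""

    def __init__(self, levels=5):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.reg_head = nn.Linear(1280, 1)       # continuous quality score
        self.cls_head = nn.Linear(1280, levels)  # quality grade levels

    def forward(self, x):
        f = self.pool(self.backbone(x)).flatten(1)
        reg = torch.sigmoid(self.reg_head(f)).squeeze(-1)  # score in [0, 1]
        cls = torch.softmax(self.cls_head(f), dim=-1)
        levels = torch.arange(cls.shape[-1], device=x.device)
        cls_score = (cls * levels).sum(-1) / (cls.shape[-1] - 1)  # expected grade
        return 0.5 * reg + 0.5 * cls_score       # combined image quality
```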
  • S104 Based on the images contained in the detection frames in each selected detection frame set, determine the category of the target corresponding to the detection frame set.
  • This methodological process does not limit the method for specifically determining the categories of targets.
  • a target classification model can be used for determination, or analysis can be performed directly based on image features.
  • Since the preset category feature set can be updated, the recognizable categories and the specific features of each category can be adjusted flexibly.
  • For a newly added category, the corresponding category features can be directly added to the preset category feature set,
  • after which targets belonging to that category can be recognized.
  • For categories whose features need updating, for example if the outer packaging of bottled milk has changed, the features of the bottled-milk category in the preset category feature set can be directly replaced.
  • This embodiment can improve the flexibility of target classification by comparing the characteristics of the target with a preset category feature set.
  • In some embodiments, determining the category of the target corresponding to the detection frame set may include: extracting the features of the target based on the images contained in the detection frames in each selected detection frame set, comparing the features of the target with the features in the preset category feature set, and determining the category corresponding to the feature whose similarity is greater than the preset similarity threshold as the category of the target corresponding to the detection frame set.
  • the preset category feature set may include correspondences between several different categories and different features.
  • feature similarity can be calculated by the cosine distance of the feature vectors.
  • This embodiment does not limit the feature extraction method of a specific target.
  • extraction can be performed through a neural network model.
  • Specifically, the output of a hidden layer of the trained neural network model can be determined as the features of the target.
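  • A minimal sketch of this comparison, assuming unit-normalised feature vectors and an illustrative similarity threshold of 0.8. Because the gallery is just a mapping from category to feature vector, adding a new category or replacing a category's features, as described above, is a plain dictionary update.

```python
import numpy as np

def classify_by_gallery(target_feat, gallery, sim_threshold=0.8):
    """Compare a target feature against a preset category feature set and
    return the best category whose similarity exceeds the threshold, or
    None. `gallery` maps category name -> unit-normalised feature vector;
    the 0.8 threshold is an illustrative assumption."""
    t = target_feat / (np.linalg.norm(target_feat) + 1e-8)
    best_cat, best_sim = None, sim_threshold
    for category, feat in gallery.items():
        sim = float(t @ feat)  # cosine similarity of unit vectors
        if sim > best_sim:
            best_cat, best_sim = category, sim
    return best_cat
```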
  • In some embodiments, the target may be classified during target detection and its first category determined; specifically, a rough category may be determined, and in S104 the subdivision category of the target, that is, the second category, may be further determined.
  • Since the first category of the target can be determined in S101, for a target of a given first category, the possibility of other first categories can be excluded, and the second category of the target can be determined directly from among the subdivision categories of that first category.
  • the range of possible second categories can be narrowed for the targets of the determined first category, thereby saving the amount of calculation and improving the efficiency of target detection.
  • performing target detection on the video data to be detected may include: detecting the first category of targets in the video data to be detected.
  • the preset classification feature set may include: features of the second category; the second category may be a subdivision category of the first category.
  • feature extraction models corresponding to each first category may be preset.
  • Extracting the features of the target may include: for each selected detection frame set, determining the first category of the target corresponding to the detection frame set; and using the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in each selected detection frame set.
  • the feature extraction model corresponding to each first category can be specially used to extract features for the target of the corresponding first category.
  • the feature extraction model can be a model with a smaller amount of calculation, and the calculation complexity can be less than a preset threshold. Specifically, it can be a lightweight network based on mobilenetv2.
  • This embodiment does not specifically limit the training method of the feature extraction model corresponding to each first category.
  • For example, a model can be trained on multiple image samples under a single first category, using the subdivision categories of that first category as labels.
  • Specifically, a neural network model can be trained in this way, and the hidden layer output of the trained neural network model can be used as the extracted target features.
  • In other words, the images contained in the detection frames in each selected detection frame set can be input into the trained neural network model, and the output of a hidden layer is determined as the features of the target.
  • Since the first category to which the target belongs has been determined, when comparing the features of the target with the features in the preset category feature set, only the subdivision category features included under that first category need to be compared, which improves comparison efficiency by narrowing the scope.
  • During target detection, the probability that the target belongs to each first category is usually output, and the first category to which the target belongs is determined from these probabilities.
  • Accordingly, the feature extraction model to be used can be determined based on the probability that the target belongs to each first category; specifically, the feature extraction model corresponding to the first category with the highest output probability can be used, or the feature extraction model corresponding to a first category whose output probability is greater than a threshold can be used.
  • In some cases, however, the first category to which the target belongs cannot be determined from the probabilities, for example when no first category has an output probability greater than the threshold.
  • a preset feature extraction model can be used to extract features.
  • the preset feature extraction model may be used to extract features for targets of each first category or each second category.
  • the feature extraction model can be a model with a small amount of calculation, and the calculation complexity can be less than the preset threshold. Specifically, it can be a lightweight network based on mobilenetv2, or a model based on resnet18. This embodiment can reduce the amount of calculation and improve the efficiency of feature extraction.
  • Specifically, a neural network model can be trained on an image sample set that includes all second-category labels, and the hidden layer output of the trained neural network model can be used as the features of the extracted target.
  • this embodiment of the present invention also provides a schematic flow chart of a feature extraction method.
  • Figure 3 is a schematic flow chart of a feature extraction method according to an embodiment of the present invention.
  • S301 Determine whether the first category of the target is determined. If the first category of the target has been determined, execute S302; if the first category of the target has not been determined, execute S304.
  • S302 Use the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in the detection frame set corresponding to the target.
  • S303 Compare the extracted features with the subdivision category features included in the determined first category, and determine the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold as the target category.
  • S304 Use a preset feature extraction model to extract features of the target from the images included in the detection frames in the detection frame set corresponding to the target.
  • S305 Compare the extracted features with all subdivided category features, and determine the subdivided category corresponding to the feature whose similarity is greater than the preset similarity threshold as the target category.
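  • The sketch below strings the S301 to S305 branches together, reusing classify_by_gallery from the earlier sketch; the dictionaries, callables and the 0.6 probability threshold are hypothetical names introduced for illustration.

```python
def extract_and_classify(crop, first_cat_probs, models, default_model,
                         galleries, full_gallery, prob_threshold=0.6):
    """Hypothetical glue for S301-S305: `models` maps first category ->
    dedicated feature extractor, `galleries` maps first category -> its
    subdivision-category feature set, `full_gallery` holds all subdivision
    category features for the fallback path."""
    best_cat = max(first_cat_probs, key=first_cat_probs.get)
    if first_cat_probs[best_cat] > prob_threshold:      # S301 -> S302/S303
        feat = models[best_cat](crop)
        return classify_by_gallery(feat, galleries[best_cat])
    feat = default_model(crop)                          # S301 -> S304/S305
    return classify_by_gallery(feat, full_gallery)
```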
  • the embodiment of the present invention also provides a specific example.
  • In this example, the category confidence of the product detection frame can be determined, that is, the confidence of the bottled, boxed and bagged categories.
  • If the confidence of the bottled category is greater than a preset threshold, such as 0.6, the feature extraction model for bottled products is used to extract the product features, which are compared with the subdivision category features of bottled products to obtain a comparison result. Specifically, the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold is determined as the subdivision category of the bottled product.
  • If the confidence of the bagged category is greater than the preset threshold, such as 0.6, the feature extraction model for bagged products is used in the same way, and the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold is determined as the subdivision category of the bagged product.
  • If the confidence of the boxed category is greater than the preset threshold, such as 0.6, the feature extraction model for boxed products is used in the same way, and the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold is determined as the subdivision category of the boxed product.
  • If the product is not identified as belonging to the bottled, boxed or bagged category, the feature extraction model covering all product subdivision categories can be used to extract the product features, which are compared with the full set of subdivision category features to obtain the comparison result. Specifically, the subdivision category corresponding to the feature whose similarity is greater than a preset similarity threshold is determined as the subdivision category of the product.
  • In some embodiments, the speed of feature comparison can be increased by reducing the numerical precision of the features, with almost no reduction in comparison accuracy.
  • Specifically, the features in the preset category feature set can be stored at a preset precision, and the extracted target features can be reduced to the same preset precision so that they can be compared directly, which reduces calculation, increases comparison speed, and improves target detection efficiency.
  • Conventionally, the feature vectors output by product feature extraction have FP32 precision, and feature comparison is also performed on FP32 feature vectors.
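  • A minimal sketch of such a precision reduction, assuming L2-normalised features so that a fixed symmetric INT8 scale of 127 covers the [-1, 1] component range; the integer dot product then approximates the FP32 cosine similarity at a fraction of the cost.

```python
import numpy as np

def quantize_feat(feat_fp32, scale=127.0):
    """Reduce an FP32 feature to INT8. The vector is L2-normalised first,
    so every component lies in [-1, 1] and a fixed scale of 127 suffices."""
    f = feat_fp32 / (np.linalg.norm(feat_fp32) + 1e-8)
    return np.clip(np.round(f * scale), -127, 127).astype(np.int8)

def int8_similarity(a_q, b_q, scale=127.0):
    """Approximate cosine similarity from two INT8 vectors; the integer
    dot product is cheap on embedded hardware."""
    return int(a_q.astype(np.int32) @ b_q.astype(np.int32)) / (scale * scale)
```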
  • In the process of this method, multiple neural network models can be used, such as the target detection model, the image quality analysis model and the target classification model.
  • the neural network model itself can be optimized.
  • model quantization or model pruning can be performed on any neural network model. This method process does not limit the specific model quantification method or model pruning method.
  • any neural network model can be any neural network model used in the process of this method.
  • Optionally, among the first neural network model for performing target detection on the video data to be detected, the second neural network model for determining image quality and the third neural network model for determining the category of the target corresponding to a detection frame set, at least one neural network model is obtained through the following quantization method.
  • The initial neural network model, trained at a first parameter precision, is quantized to a second parameter precision to obtain an intermediate neural network model; the second parameter precision is lower than the first parameter precision.
  • The following steps are performed in a loop until the accuracy and calculation speed of the current intermediate neural network model meet the preset quantization requirements: determine the accuracy and calculation speed of the current intermediate neural network model; determine the output error of each layer between the initial neural network model and the current intermediate neural network model; and select the layer with the largest output error and raise its parameter precision to obtain a new intermediate neural network model as the current intermediate neural network model.
  • The lower the parameter precision of a model, the faster its calculation speed; correspondingly, its accuracy may decrease.
  • The preset quantization requirements may specifically be requirements on the combined result of accuracy and calculation speed. Optionally, they may require that, while the accuracy is not less than a preset accuracy, the calculation speed is greater than a preset speed, thereby improving the efficiency of target detection.
  • The current intermediate neural network model obtained when the loop ends can be used in this method flow.
  • To facilitate understanding, the embodiment of the present invention also provides a schematic flow chart of a model quantization method.
  • As shown in Figure 4, Figure 4 is a schematic flow chart of a model quantization method according to an embodiment of the present invention.
  • The method may include the following steps.
  • S401: Quantize the initial neural network model, trained at the first parameter precision, to the second parameter precision to obtain an intermediate neural network model. The first parameter precision may be FP32 and the second parameter precision may be INT8.
  • S402: Determine the accuracy and calculation speed of the current intermediate neural network model.
  • S403: Determine whether the accuracy and calculation speed of the current intermediate neural network model meet the preset quantization requirements. If they are met, this process ends; if not, execute S404.
  • S404: Determine the output error of each layer between the initial neural network model and the current intermediate neural network model.
  • S405: Select the layer with the largest output error and raise its parameter precision to obtain a new intermediate neural network model as the current intermediate neural network model, then execute S402.
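  • A minimal sketch of the Figure 4 loop is given below. Every helper callable (quantize_all, layer_error, accuracy_of, speed_of, raise_layer_precision) is a hypothetical stand-in for a real quantization toolkit, since the embodiment does not prescribe one.

```python
def mixed_precision_quantize(model_fp32, quantize_all, layer_error,
                             accuracy_of, speed_of,
                             min_acc, min_speed, raise_layer_precision):
    # S401: quantize the FP32-trained initial model to INT8 wholesale.
    current = quantize_all(model_fp32, dtype="int8")
    while True:
        # S402/S403: measure the current intermediate model against the
        # preset quantization requirements.
        if accuracy_of(current) >= min_acc and speed_of(current) >= min_speed:
            return current
        # S404: per-layer output error between the initial and current model.
        errors = {name: layer_error(model_fp32, current, name)
                  for name in current.layer_names}
        # S405: raise precision on the worst layer, then loop back to S402.
        worst = max(errors, key=errors.get)
        current = raise_layer_precision(current, worst)
```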
  • Corresponding to the above method embodiments, the embodiment of the present invention also provides an apparatus embodiment.
  • Figure 5 is a schematic structural diagram of a dynamic target analysis device according to an embodiment of the present invention.
  • the device may include the following units.
  • the target detection unit 501 is used to perform target detection on the video data to be detected and obtain the target detection frame of each frame of image.
  • The target tracking unit 502 is used to determine one or more detection frame sets based on the acquired detection frames using a preset target tracking algorithm, where each detection frame set includes detection frames belonging to the same target across different images and different detection frame sets correspond to different targets.
  • The screening unit 503 is used to determine the image quality of the images contained in the detection frames in each detection frame set, the image quality being positively correlated with the classification accuracy of the target corresponding to the detection frame set, and to select the detection frame sets whose image quality is greater than the preset quality.
  • the classification unit 504 is configured to determine the category of the target corresponding to the detection frame set based on the images contained in the detection frames in each filtered detection frame set.
  • Optionally, the classification unit 504 is configured to: extract the features of the target based on the images contained in the detection frames in each selected detection frame set, compare the features of the target with the features in the preset category feature set, and determine the category corresponding to the feature whose similarity is greater than the preset similarity threshold as the category of the target corresponding to that detection frame set.
  • Optionally, the target detection unit 501 is configured to detect the first category of targets in the video data to be detected; the preset category feature set includes features of second categories, a second category being a subdivision category of a first category; the device is preset with a feature extraction model corresponding to each first category; and the classification unit 504 is configured to: for each selected detection frame set, determine the first category of the target corresponding to the detection frame set, and use the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in each selected detection frame set.
  • Optionally, the target detection unit 501 is configured to: input the video data to be detected into a preset target detection model used to determine the third category of a target, the preset target detection model containing a detection frame prediction branch corresponding to each third category; when the third category of any target is determined, the detection frame output by the detection frame prediction branch corresponding to that third category is determined as the detection frame of the target.
  • Optionally, the target detection unit 501 is configured to: input the video data to be detected into a preset target detection model used to determine the detection frame position, size and rotation angle of a target, the preset target detection model containing a regression prediction branch and a classification prediction branch for the detection frame rotation angle; for any target, the outputs of the regression prediction branch and the classification prediction branch in the preset target detection model are combined to determine a comprehensive rotation angle result.
  • Optionally, the target tracking unit 502 is configured to: determine the frame following the first frame of the video data to be detected as the current image, and perform the following steps in a loop: based on the first real detection frames obtained through target detection in the previous frame, use preset parameters to predict the detection frames in the current image to obtain predicted detection frames, different predicted detection frames corresponding to different first real detection frames; pair each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames, and when a second real detection frame is determined to be successfully paired with a predicted detection frame, determine that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target and add them to the detection frame set corresponding to that target; update the preset parameters based on the pairing result; end the loop if the current image has no next frame; and determine the next frame as the current image if the current image has a next frame.
  • Optionally, the preset parameters include the parameters of a Kalman filter, and the target tracking unit 502 is configured to: use the Kalman filter to predict the detection frames in the current image; compute preset detection frame image features for the image contained in each second real detection frame obtained through target detection in the current image and for the images contained in the obtained predicted detection frames, the computational complexity of the preset detection frame image features being less than a preset complexity threshold; and use the Hungarian algorithm to pair the second real detection frames with the predicted detection frames according to the similarity of the preset detection frame image features and the degree of overlap of the detection frames.
  • Optionally, the screening unit 503 is configured to: input the images contained in the detection frames in each detection frame set into a preset quality analysis model used to determine image quality, the preset quality analysis model containing a regression prediction branch and a classification prediction branch for image quality; for any detection frame set, the outputs of the regression prediction branch and the classification prediction branch in the preset quality analysis model are combined to determine the image quality.
  • Optionally, among the first neural network model for performing target detection on the video data to be detected, the second neural network model for determining image quality and the third neural network model for determining the category of the target corresponding to a detection frame set, at least one neural network model is obtained through the following quantization method: the initial neural network model, trained at a first parameter precision, is quantized to a second parameter precision to obtain an intermediate neural network model, the second parameter precision being lower than the first parameter precision; the following steps are performed in a loop until the accuracy and calculation speed of the current intermediate neural network model meet the preset quantization requirements: determine the accuracy and calculation speed of the current intermediate neural network model; determine the output error of each layer between the initial neural network model and the current intermediate neural network model; and select the layer with the largest output error and raise its parameter precision to obtain a new intermediate neural network model as the current intermediate neural network model.
  • The embodiment of the present invention also provides a computer device, which at least includes a memory, a processor and a computer program stored in the memory and executable on the processor, where the processor implements any one of the above method embodiments when executing the program.
  • The embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute any one of the above method embodiments.
  • Figure 6 is a schematic diagram of the hardware structure of a computer device configured with a method of an embodiment of the present invention.
  • the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050.
  • the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 implement communication connections between each other within the device through the bus 1050.
  • The processor 1010 can be implemented using a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of the present invention.
  • the memory 1020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
  • the memory 1020 can store operating systems and other application programs. When the technical solution provided by the embodiment of the present invention is implemented through software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
  • the input/output interface 1030 is used to connect the input/output module to realize information input and output.
  • The input/output module can be configured in the device as a component (not shown in the figure) or externally connected to the device to provide corresponding functions.
  • Input devices can include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices can include monitors, speakers, vibrators, indicator lights, etc.
  • the communication interface 1040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • The bus 1050 includes a path that carries information between the various components of the device (e.g., the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040).
  • Although the above device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in specific implementation the device may also include other components necessary for normal operation.
  • the above-mentioned device may also include only the components necessary to implement the embodiments of the present invention, and does not necessarily include all the components shown in the figures.
  • Embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, any one of the above method embodiments can be implemented.
  • Embodiments of the present invention also provide a computer-readable storage medium storing a computer program, which implements any of the above method embodiments when executed by a processor.
  • Computer-readable media includes both persistent and non-volatile, removable and non-removable media that can be implemented by any method or technology for storage of information.
  • Information may be computer-readable instructions, data structures, modules of a program or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
  • From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solutions of the embodiments of the present invention, or the part that makes a contribution, can be embodied in the form of a software product.
  • The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments of the present invention or in certain parts of the embodiments.
  • A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • Each embodiment in this specification is described in a progressive manner; for the same or similar parts between the various embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments.
  • In particular, since the device embodiment is basically similar to the method embodiment, its description is relatively simple, and reference may be made to the relevant parts of the description of the method embodiment.
  • The device embodiments described above are only illustrative. The modules described as separate components may or may not be physically separated, and the functions of the modules may be implemented in the same piece or several pieces of software and/or hardware. Some or all of the modules can also be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement the solution without creative effort.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance. The term "plurality" refers to two or more, unless expressly limited otherwise.


Abstract

The present invention discloses a dynamic target analysis method, device, equipment and storage medium. The method includes: performing target detection on video data to be detected to obtain a target detection frame for each frame of image; determining one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm, where each detection frame set contains detection frames belonging to the same target across different images and different detection frame sets correspond to different targets; determining the image quality of the images contained in the detection frames in each detection frame set, the image quality being positively correlated with the classification accuracy of the target corresponding to the detection frame set; selecting the detection frame sets whose image quality is greater than a preset quality; and determining, based on the images contained in the detection frames in each selected detection frame set, the category of the target corresponding to that detection frame set.

Description

Dynamic target analysis method, device, equipment and storage medium — Technical Field
The present invention relates to the technical field of target detection, and in particular to a dynamic target analysis method, device, equipment and storage medium.
Background Art
Target detection is an important image analysis method that can extract the position of a target and other related information from an image, for example a target detection frame representing the target position, the classification of the target, and so on.
In some business scenarios, the dynamic behavior of a target may need to be analyzed. For example, when performing target detection on products, it may be necessary to detect the taking of products so as to determine how popular they are; specifically, the movement of the products and the product categories may be detected.
However, analyzing the dynamic behavior of targets usually requires target detection and other processing on video data, which involves a large amount of computation and results in low detection efficiency.
Summary of the Invention
The present invention provides a dynamic target analysis method, device, equipment and storage medium to address the deficiencies in the related art.
According to a first aspect of the embodiments of the present invention, a dynamic target analysis method is provided, including: performing target detection on video data to be detected to obtain a target detection frame for each frame of image; determining one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm, where each detection frame set contains detection frames belonging to the same target across different images and different detection frame sets correspond to different targets; determining the image quality of the images contained in the detection frames in each detection frame set, the image quality being positively correlated with the classification accuracy of the target corresponding to the detection frame set; selecting the detection frame sets whose image quality is greater than a preset quality; and determining, based on the images contained in the detection frames in each selected detection frame set, the category of the target corresponding to that detection frame set.
Optionally, determining the category of the target corresponding to a detection frame set based on the images contained in the detection frames in each selected detection frame set includes: extracting the features of the target based on the images contained in the detection frames in each selected detection frame set, comparing the features of the target with the features in a preset category feature set, and determining the category corresponding to the feature whose similarity is greater than a preset similarity threshold as the category of the target corresponding to that detection frame set.
Optionally, performing target detection on the video data to be detected includes: detecting the first category of targets in the video data to be detected. The preset category feature set includes features of second categories, a second category being a subdivision category of a first category. The method further includes: presetting a feature extraction model corresponding to each first category. Extracting the features of the target based on the images contained in the detection frames in each selected detection frame set includes: determining, for each selected detection frame set, the first category of the target corresponding to the detection frame set; and using the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in each selected detection frame set.
Optionally, performing target detection on the video data to be detected includes: inputting the video data to be detected into a preset target detection model used to determine the third category of a target, the preset target detection model containing a detection frame prediction branch corresponding to each third category; and, when the third category of any target is determined, determining the detection frame output by the detection frame prediction branch corresponding to that third category as the detection frame of the target.
Optionally, performing target detection on the video data to be detected includes: inputting the video data to be detected into a preset target detection model used to determine the position, size and rotation angle of a target's detection frame, the preset target detection model containing a regression prediction branch and a classification prediction branch for the detection frame rotation angle; and, for any target, combining the outputs of the regression prediction branch and the classification prediction branch of the preset target detection model to determine a comprehensive rotation angle result.
Optionally, determining one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm includes: determining the frame following the first frame of the video data to be detected as the current image and performing the following steps in a loop: based on the first real detection frames obtained through target detection in the previous frame, using preset parameters to predict the detection frames in the current image to obtain predicted detection frames, different predicted detection frames corresponding to different first real detection frames; pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames, and, when a second real detection frame is determined to be successfully paired with a predicted detection frame, determining that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target and adding them to the detection frame set corresponding to that target; updating the preset parameters based on the pairing result; ending the loop if the current image has no next frame; and determining the next frame as the current image if the current image has a next frame.
Optionally, the preset parameters include parameters of a Kalman filter, and using the preset parameters to predict the detection frames in the current image includes: using the Kalman filter to predict the detection frames in the current image. Pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames includes: computing preset detection frame image features for the image contained in each second real detection frame obtained through target detection in the current image and for the images contained in the obtained predicted detection frames, the computational complexity of the preset detection frame image features being less than a preset complexity threshold; and pairing the second real detection frames with the predicted detection frames using the Hungarian algorithm according to the similarity of the preset detection frame image features and the degree of overlap of the detection frames.
Optionally, determining the image quality of the images contained in the detection frames in each detection frame set includes: inputting the images contained in the detection frames in each detection frame set into a preset quality analysis model used to determine image quality, the preset quality analysis model containing a regression prediction branch and a classification prediction branch for image quality; and, for any detection frame set, combining the outputs of the regression prediction branch and the classification prediction branch of the preset quality analysis model to determine the image quality.
Optionally, among a first neural network model for performing target detection on the video data to be detected, a second neural network model for determining image quality and a third neural network model for determining the category of the target corresponding to a detection frame set, at least one neural network model is obtained through the following quantization method: quantizing an initial neural network model, trained at a first parameter precision, to a second parameter precision to obtain an intermediate neural network model, the second parameter precision being lower than the first parameter precision; and performing the following steps in a loop until the accuracy and calculation speed of the current intermediate neural network model meet preset quantization requirements: determining the accuracy and calculation speed of the current intermediate neural network model; determining the output error of each layer between the initial neural network model and the current intermediate neural network model; and selecting the layer with the largest output error and raising its parameter precision to obtain a new intermediate neural network model as the current intermediate neural network model.
According to a second aspect of the embodiments of the present invention, a dynamic target analysis device is provided, including: a target detection unit, used to perform target detection on video data to be detected and obtain a target detection frame for each frame of image; a target tracking unit, used to determine one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm, where each detection frame set contains detection frames belonging to the same target across different images and different detection frame sets correspond to different targets; a screening unit, used to determine the image quality of the images contained in the detection frames in each detection frame set, the image quality being positively correlated with the classification accuracy of the target corresponding to the detection frame set, and to select the detection frame sets whose image quality is greater than a preset quality; and a classification unit, used to determine, based on the images contained in the detection frames in each selected detection frame set, the category of the target corresponding to that detection frame set.
Optionally, the classification unit is used to: extract the features of the target based on the images contained in the detection frames in each selected detection frame set, compare the features of the target with the features in a preset category feature set, and determine the category corresponding to the feature whose similarity is greater than a preset similarity threshold as the category of the target corresponding to that detection frame set.
Optionally, the target detection unit is used to detect the first category of targets in the video data to be detected; the preset category feature set includes features of second categories, a second category being a subdivision category of a first category; the device is preset with a feature extraction model corresponding to each first category; and the classification unit is used to: for each selected detection frame set, determine the first category of the target corresponding to the detection frame set, and use the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in each selected detection frame set.
Optionally, the target detection unit is used to: input the video data to be detected into a preset target detection model used to determine the third category of a target, the preset target detection model containing a detection frame prediction branch corresponding to each third category; and, when the third category of any target is determined, determine the detection frame output by the detection frame prediction branch corresponding to that third category as the detection frame of the target.
Optionally, the target detection unit is used to: input the video data to be detected into a preset target detection model used to determine the position, size and rotation angle of a target's detection frame, the preset target detection model containing a regression prediction branch and a classification prediction branch for the detection frame rotation angle; and, for any target, combine the outputs of the regression prediction branch and the classification prediction branch of the preset target detection model to determine a comprehensive rotation angle result.
Optionally, the target tracking unit is used to: determine the frame following the first frame of the video data to be detected as the current image and perform the following steps in a loop: based on the first real detection frames obtained through target detection in the previous frame, use preset parameters to predict the detection frames in the current image to obtain predicted detection frames, different predicted detection frames corresponding to different first real detection frames; pair each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames, and, when a second real detection frame is determined to be successfully paired with a predicted detection frame, determine that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target and add them to the detection frame set corresponding to that target; update the preset parameters based on the pairing result; end the loop if the current image has no next frame; and determine the next frame as the current image if the current image has a next frame.
Optionally, the preset parameters include parameters of a Kalman filter, and the target tracking unit is used to: use the Kalman filter to predict the detection frames in the current image; compute preset detection frame image features for the image contained in each second real detection frame obtained through target detection in the current image and for the images contained in the obtained predicted detection frames, the computational complexity of the preset detection frame image features being less than a preset complexity threshold; and use the Hungarian algorithm to pair the second real detection frames with the predicted detection frames according to the similarity of the preset detection frame image features and the degree of overlap of the detection frames.
Optionally, the screening unit is used to: input the images contained in the detection frames in each detection frame set into a preset quality analysis model used to determine image quality, the preset quality analysis model containing a regression prediction branch and a classification prediction branch for image quality; and, for any detection frame set, combine the outputs of the regression prediction branch and the classification prediction branch of the preset quality analysis model to determine the image quality.
Optionally, among a first neural network model for performing target detection on the video data to be detected, a second neural network model for determining image quality and a third neural network model for determining the category of the target corresponding to a detection frame set, at least one neural network model is obtained through the following quantization method: quantizing an initial neural network model, trained at a first parameter precision, to a second parameter precision to obtain an intermediate neural network model, the second parameter precision being lower than the first parameter precision; and performing the following steps in a loop until the accuracy and calculation speed of the current intermediate neural network model meet preset quantization requirements: determining the accuracy and calculation speed of the current intermediate neural network model; determining the output error of each layer between the initial neural network model and the current intermediate neural network model; and selecting the layer with the largest output error and raising its parameter precision to obtain a new intermediate neural network model as the current intermediate neural network model.
According to a third aspect of the embodiments of the present invention, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the above dynamic target analysis method.
According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium storing a computer program is provided, where the computer program implements the above dynamic target analysis method when executed by a processor.
According to the above embodiments, by determining the image quality of the images contained in the detection frames in each detection frame set, targets with higher image quality are selected and their categories are determined, while the step of determining the target category is not performed for targets with lower image quality, which saves computation and improves detection efficiency.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present invention.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the present invention.
Figure 1 is a schematic flow chart of a dynamic target analysis method according to an embodiment of the present invention;
Figure 2 is a schematic flow chart of a target tracking method according to an embodiment of the present invention;
Figure 3 is a schematic flow chart of a feature extraction method according to an embodiment of the present invention;
Figure 4 is a schematic flow chart of a model quantization method according to an embodiment of the present invention;
Figure 5 is a schematic structural diagram of a dynamic target analysis device according to an embodiment of the present invention;
Figure 6 is a schematic diagram of the hardware structure of a computer device configured with a method of an embodiment of the present invention.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of devices and methods consistent with some aspects of the present invention as detailed in the appended claims.
Target detection is an important image analysis method that can extract the position of a target and other related information from an image, for example a target detection frame representing the target position, the classification of the target, and so on.
In some business scenarios, the dynamic behavior of a target may need to be analyzed. For example, when performing target detection on products, it may be necessary to detect the taking of products so as to determine how popular they are; specifically, the movement of the products and the product categories may be detected.
However, analyzing the dynamic behavior of targets usually requires target detection and other processing on video data, which involves a large amount of computation and results in low detection efficiency.
The embodiment of the present invention discloses a dynamic target analysis method. In this method, target detection can be performed on each frame of the video data separately, and a target tracking algorithm can then be used to determine the detection frames in the video data that belong to the same target. Since a detection frame characterizes the position information of a target, multiple detection frames belonging to the same target can be used to determine the movement of that target.
Further, the category of the target can be determined from the images contained in the multiple detection frames determined to belong to the same target, which can improve the accuracy of target classification.
Determining the target category usually consumes a large amount of computation. Therefore, in this method, some targets can be screened out and the subsequent step of determining the target category is not performed for them, saving computation and improving detection efficiency.
Specifically, if the target in the images contained in the detection frames is occluded, or becomes blurred while moving, or corresponds to only a few detection frames, the target category determined based on the images contained in the detection frames of that target has low accuracy.
Therefore, the image quality can be determined for the images contained in the detection frames belonging to the same target; targets with low image quality are then screened out and the subsequent step of determining the target category is not performed for them, while targets with high image quality are retained and the subsequent step of determining the target category is performed, saving computation and improving detection efficiency. The image quality can be positively correlated with the classification accuracy of the target, and screening out targets with low classification accuracy has almost no effect on the overall target classification accuracy.
Therefore, this method can select targets with high image quality and determine their categories, while not performing the category determination step for targets with low image quality, thereby saving computation and improving detection efficiency with almost no effect on the overall accuracy of target classification.
As shown in Figure 1, Figure 1 is a schematic flow chart of a dynamic target analysis method according to an embodiment of the present invention.
It may include the following steps.
S101: Perform target detection on the video data to be detected and obtain the target detection frame of each frame of image.
S102: Determine one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm.
Each detection frame set contains detection frames belonging to the same target across different images, and different detection frame sets correspond to different targets.
S103: Determine the image quality of the images contained in the detection frames in each detection frame set; select the detection frame sets whose image quality is greater than a preset quality.
The image quality is positively correlated with the classification accuracy of the target corresponding to the detection frame set.
S104: Determine, based on the images contained in the detection frames in each selected detection frame set, the category of the target corresponding to that detection frame set.
This method flow can determine the image quality of the images contained in the detection frames in each detection frame set, select targets with high image quality and determine their categories, while not performing the category determination step for targets with low image quality, thereby saving computation and improving detection efficiency.
Since the image quality is positively correlated with the classification accuracy of the target, the screened-out targets with low image quality are likely to be misclassified; not determining their categories therefore has almost no effect on the overall accuracy of target classification.
This method flow does not limit the executing entity; it can be applied to a server or to a terminal.
In an optional embodiment, dynamic target analysis can be implemented through edge computing, specifically on an edge terminal. Since the computing resources of edge terminals are usually limited, this method embodiment can be used to save computation and improve the efficiency of target detection.
To facilitate understanding, a specific application embodiment is given below.
The above method flow can be applied to a smart refrigerator. The detected targets can specifically be the products in the smart refrigerator, and the video data to be detected can be surveillance video shot by the smart refrigerator of its interior. Through the above method flow, the smart refrigerator can detect multiple detection frames belonging to the same moving product, as well as the category of the product being moved.
It should be noted that since many products are usually placed statically inside a smart refrigerator, performing dynamic target analysis on these static products consumes considerable computing resources. The video data to be detected can therefore be surveillance video shot by the smart refrigerator of its outlet. Since taking and moving a product must pass through the outlet, target detection can be performed more efficiently based on this surveillance video.
Based on the detection results, the movement trajectory of a product can be determined from the multiple detection frames belonging to that product, and it can then be determined whether the product has been taken by a customer.
Combining product-taking behavior with the categories of the products taken, the popularity of each category of product can be analyzed.
Further, static target detection can be combined: images of the products inside the smart refrigerator can be shot periodically to determine changes in the quantity of products inside the smart refrigerator, so that product popularity can be analyzed more accurately.
Customer characteristics can also be combined, for example analyzing the age, gender and other characteristics of the customers taking products, so as to determine the product categories that different types of customers are interested in and to facilitate product recommendation. Specifically, for different types of customers, the recommended product types and information can be displayed on a nearby display screen, for example product prices, discount information and promotional information.
Each step is explained in detail below.
S101: Perform target detection on the video data to be detected and obtain the target detection frame of each frame of image.
This method flow does not limit the source of the video data to be detected; optionally, it can be surveillance video of the outlet of a smart refrigerator.
This method flow does not limit the specific manner of target detection; optionally, a preset target detection model can be used for target detection.
This method flow does not limit the specific structure of the target detection model. Optionally, the target detection model can be a model with a small amount of computation, so as to save computation; its computational complexity can be less than a preset threshold. Specifically, the YOLOv5 target detection model can be used for target detection. Since the YOLOv5 target detection model has a small amount of computation and occupies few storage resources, it is well suited to edge computing and can save computation and improve the efficiency of target detection. A per-frame detection sketch is given below.
Of course, other target detection models, for example other target detection models of the YOLO series, can also be used for target detection in this method flow.
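For illustration, a minimal per-frame detection sketch is given below. It loads a published YOLOv5 checkpoint through torch.hub; in practice a model trained on the product classes would be substituted, and the use of the public yolov5s weights here is purely an assumption.

```python
import cv2
import torch

# Published torch.hub entry point of the ultralytics/yolov5 repository; the
# public yolov5s weights stand in for a custom-trained product detector.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")

def detect_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    per_frame_boxes = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = model(rgb)
        # Each row of results.xyxy[0]: x1, y1, x2, y2, confidence, class.
        per_frame_boxes.append(results.xyxy[0].cpu().numpy())
    cap.release()
    return per_frame_boxes
```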
In an optional embodiment, since target detection can be performed in S101, the category of the target can usually also be determined during target detection; for example, an output target detection frame corresponds to a category of the target.
Optionally, to distinguish this from the category determination in the subsequent step S104, a rough category of the target can first be determined during target detection in S101, and a subdivision category can then be further determined in S104.
For example, in the smart refrigerator scenario, the detected targets are specifically products, and products can be roughly classified by packaging into boxed, bagged and bottled. Further, the subdivision categories can include: boxed drinks, snacks and toys; bagged drinks, snacks and toys; and bottled drinks and snacks.
Therefore, optionally, the video data to be detected can be input into a preset target detection model used to determine the third category of a target. This embodiment does not limit the third category; it can specifically be a broad category containing subdivision categories, for example boxed, bagged and bottled.
For the third category of a target, the detection frame prediction methods corresponding to targets of different third categories may differ. For example, the detection frame prediction methods for regularly shaped boxed products and usually irregularly shaped bagged products may need to be distinguished.
Therefore, optionally, to improve the accuracy of the detection frames, a detection frame prediction branch corresponding to each third category of target can be constructed separately, so that the detection frame prediction methods of different third categories can be distinguished, greatly improving the accuracy of the target detection frames of each third category.
Correspondingly and optionally, when specifically training the preset detection model, image samples with detection frame labels and third-category labels can be used for training.
Optionally, performing target detection on the video data to be detected can include: inputting the video data to be detected into a preset target detection model used to determine the third category of a target, the preset target detection model containing a detection frame prediction branch corresponding to each third category. When the third category of any target is determined, the detection frame output by the detection frame prediction branch corresponding to that third category can be determined as the detection frame of the target.
By setting independent detection frame prediction branches for different third categories, this embodiment can improve the prediction accuracy of the detection frames.
This method flow does not limit the specific format of the detection frames.
In an optional embodiment, rotated detection frames can be used. For example, in the smart refrigerator scenario, a product may rotate as the customer takes it and present different sides in the video data; rotated detection frames can therefore be used to improve the accuracy of the detection frames.
Therefore, optionally, the preset target detection model can be used to determine the position, size and rotation angle of a target's detection frame; combining the position, size and rotation angle of the detection frame, the rotated detection frame can be determined.
The rotation angle of a detection frame can specifically be the rotation angle of its long side or of its vertical side; this embodiment does not limit this.
Further, optionally, regression prediction can be performed on the rotation angle, or classification prediction can be performed on the rotation angle, specifically by dividing the rotation angle from 0 to 179 degrees into 180 classes.
Regression prediction has higher precision, while classification prediction has better stability and smaller deviation.
Therefore, optionally, the regression prediction and the classification prediction of the rotation angle can be combined to obtain the predicted rotation angle, improving its precision, accuracy and stability.
Optionally, performing target detection on the video data to be detected can include: inputting the video data to be detected into a preset target detection model used to determine the position, size and rotation angle of a target's detection frame; the preset target detection model can contain a regression prediction branch and a classification prediction branch for the detection frame rotation angle; for any target, the outputs of the regression prediction branch and the classification prediction branch in the preset target detection model can be combined to determine a comprehensive rotation angle result.
By combining regression prediction and classification prediction, this embodiment can improve the precision, accuracy and stability of the rotation angle. A sketch of one possible fusion is given below.
The comprehensive rotation angle result can be determined as the rotation angle of the finally output detection frame.
Therefore, optionally, the detection frame can be output based on the position, size and comprehensive rotation angle result output by the preset target detection model.
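For illustration, one possible fusion of the two branch outputs is sketched below. The embodiment does not fix the fusion formula; the expectation over classification bins, the 0.5 weight, and the neglect of angle wrap-around near 0/179 degrees are all assumptions of this sketch.

```python
import numpy as np

def fuse_rotation(reg_angle, cls_logits, w_reg=0.5):
    # Classification branch: 180 bins, one per degree in [0, 179].
    probs = np.exp(cls_logits - cls_logits.max())
    probs /= probs.sum()
    cls_angle = float(np.sum(np.arange(180) * probs))  # expected angle over bins
    # Weighted fusion of the precise regression output with the stable
    # classification output; this sketch ignores wrap-around near 0/179.
    return w_reg * reg_angle + (1.0 - w_reg) * cls_angle
```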
This method flow does not limit the training method of the preset target detection model. Optionally, image samples labeled with detection frames can be used for training, image samples labeled with rotated detection frames can be used for training, or image samples labeled with detection frames and target category labels can be used for training.
S102: Determine one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm.
Each detection frame set can contain detection frames belonging to the same target across different images, and different detection frame sets can correspond to different targets.
Optionally, each obtained detection frame can be collected into one detection frame set. Since a single detection frame usually corresponds to only one target, different detection frame sets usually do not contain the same detection frame.
This method flow does not limit the specific preset target tracking algorithm, as long as the determined detection frame sets meet the above requirements, that is, each detection frame set can contain detection frames belonging to the same target across different images and different detection frame sets can correspond to different targets.
In an optional embodiment, for the video data to be detected, the detection frames of two adjacent frames can be compared frame by frame, so as to determine that matching detection frames in different images belong to the same target.
Specifically, detection frames belonging to the same target can be different detection frames whose image content characterizes the same target.
Of course, the numbers of detection frames in the previous and next frames may differ, so there may be detection frames that are not successfully matched. For a single detection frame in consecutive images that is not successfully matched, that detection frame can be regarded as a detection frame belonging to one target, for example a target detection frame that flashes by for only one frame. Alternatively, that detection frame may not be added to any detection frame set: since video data is collected at a high frequency, a moving target is usually captured in at least several consecutive frames, and a detection frame that flashes by for only one frame can usually be considered an erroneous result.
Optionally, determining one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm can include: determining the first frame of the video data to be detected as the current image and performing the following steps in a loop.
When the current image has a previous frame, based on the first real detection frames obtained through target detection in S101 in the previous frame, preset parameters are used to predict the detection frames in the current image to obtain predicted detection frames; different predicted detection frames correspond to different first real detection frames.
Each second real detection frame obtained through target detection in S101 in the current image is paired with the obtained predicted detection frames. When a second real detection frame is determined to be successfully paired with a predicted detection frame, it is determined that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target, and they are added to the detection frame set corresponding to that target.
Based on the pairing result, the above preset parameters are updated. If the current image has no next frame, the loop ends; if the current image has a next frame, the next frame is determined as the current image.
Optionally, when the current image has no previous frame, the next frame of the current image can be directly determined as the current image.
Optionally, the preset parameters can include the parameters of a Kalman filter. Correspondingly, using the preset parameters to predict the detection frames in the current image can include: using the Kalman filter to predict the detection frames in the current image.
Product tracking can use a fast tracking algorithm based on Kalman filtering. Through the Kalman filter, the position and movement speed of a product can be predicted. Updating the preset parameters, that is, updating the Kalman filter parameters, can include updating information such as the movement speed used in the next prediction according to the movement between detection frames of the same target in the real video data. A minimal predict/update sketch follows.
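A minimal constant-velocity Kalman predict/update sketch over a detection frame centre is given below; the state layout and noise magnitudes are assumptions, and a real tracker would also model the frame size and rotation angle.

```python
import numpy as np

# Minimal constant-velocity Kalman filter over a detection frame centre
# (cx, cy); state = [cx, cy, vx, vy].
F = np.array([[1., 0., 1., 0.],   # position += velocity
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
H = np.array([[1., 0., 0., 0.],   # only the position is observed
              [0., 1., 0., 0.]])

def predict(x, P, Q=np.eye(4) * 1e-2):
    # Prediction step: propagate the state and its covariance one frame ahead.
    return F @ x, F @ P @ F.T + Q

def update(x, P, z, R=np.eye(2) * 1e-1):
    # Update step: correct with the paired real detection's centre z; this is
    # where the "preset parameters" are updated from the pairing result.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```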
Regarding pairing, this method flow does not limit the specific pairing method.
Optionally, pairing can be based on the degree of overlap between a second real detection frame and a predicted detection frame; specifically, for any second real detection frame, the predicted detection frame with the highest degree of overlap can be selected for pairing.
Optionally, the Hungarian algorithm can specifically be used for pairing.
To improve the accuracy of pairing and thus of target tracking, optionally, detection frame image features can additionally be used for pairing. Specifically, pairing can be based on the degree of overlap between detection frames and the similarity of the detection frame image features.
Optionally, the degree of overlap between detection frames and the similarity of the detection frame image features can be combined, and for any second real detection frame the predicted detection frame with the highest combined result can be selected for pairing.
Detection frame image features include, for example, color features, edge features and shape features.
For example, for any second real detection frame, when two predicted detection frames both have a high degree of overlap with it, if the image feature of the second real detection frame is the same as the image feature of one of the predicted detection frames, for example both are red features, they most likely belong to the same target, and that predicted detection frame can be selected for pairing.
Of course, the Hungarian algorithm can specifically be used for pairing.
Optionally, pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames can include: computing preset detection frame image features for the image contained in each second real detection frame obtained through target detection in the current image and for the images contained in the obtained predicted detection frames; and using the Hungarian algorithm to pair the second real detection frames with the predicted detection frames according to the similarity of the preset detection frame image features and the degree of overlap of the detection frames.
Optionally, the similarity of the preset detection frame image features can be determined based on the cosine distance of the preset detection frame image feature vectors.
To reduce computation and improve the efficiency of extracting detection frame image features, thereby improving target detection efficiency, efficient image features can be chosen that both effectively distinguish different targets and have low computational complexity, so that they can be extracted and processed quickly in an embedded system.
For example, color features can be determined quickly directly from the distribution of pixel values in the image contained in a detection frame, without the complex computation of a neural network or other model.
Therefore, optionally, the computational complexity of the preset detection frame image features can be less than a preset complexity threshold. Optionally, the preset detection frame image features can include at least one of the following: color features, edge features, shape features, texture features, histogram of oriented gradients (HOG) features, and so on. A sketch of one such cheap feature follows.
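As an illustration of such a low-complexity feature, the sketch below computes a normalized color histogram over a detection frame crop; the bin count is an assumption.

```python
import cv2
import numpy as np

def box_feature(frame, box, bins=8):
    # Normalized 3-D colour histogram of a detection frame crop: cheap to
    # compute on embedded hardware, no neural network involved.
    x1, y1, x2, y2 = [int(v) for v in box[:4]]
    crop = frame[y1:y2, x1:x2]
    hist = cv2.calcHist([crop], [0, 1, 2], None,
                        [bins, bins, bins], [0, 256] * 3).flatten()
    return hist / (hist.sum() + 1e-12)
```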
Optionally, the predicted detection frames can be obtained by prediction for each first real detection frame in the previous frame.
Optionally, when it is determined that a second real detection frame and the first real detection frame corresponding to a predicted detection frame belong to the same target, they can be added to the detection frame set corresponding to that target; if no detection frame set exists for that target, one can be created first and then added to.
Specifically, the second real detection frame and the first real detection frame corresponding to the predicted detection frame can be added to the detection frame set corresponding to that target.
It should be noted that as the loop executes, the pairing of detection frames across consecutive frames is completed, so that detection frames belonging to the same target can be determined and all added to the detection frame set corresponding to that target.
Optionally, since only the first frame of the video data to be detected has no previous frame, the frame following the first frame of the video data to be detected can be determined as the current image before executing the above loop steps.
Optionally, determining one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm can include: determining the frame following the first frame of the video data to be detected as the current image, and performing the following steps in a loop: based on the first real detection frames obtained through target detection in the previous frame, using preset parameters to predict the detection frames in the current image to obtain predicted detection frames, different predicted detection frames corresponding to different first real detection frames; pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames, and when a second real detection frame is determined to be successfully paired with a predicted detection frame, determining that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target and adding them to the detection frame set corresponding to that target; updating the above preset parameters based on the pairing result; ending the loop if the current image has no next frame; and determining the next frame as the current image if the current image has a next frame.
To facilitate understanding, the embodiment of the present invention also provides a schematic flow chart of a target tracking method, as shown in Figure 2. Figure 2 is a schematic flow chart of a target tracking method according to an embodiment of the present invention.
It includes the following steps.
S201: Determine the frame following the first frame of the video data to be detected as the current image.
S202: Based on the first real detection frames obtained through target detection in the previous frame, use preset parameters to predict the detection frames in the current image to obtain predicted detection frames.
Different predicted detection frames correspond to different first real detection frames.
S203: Pair each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames. When a second real detection frame is determined to be successfully paired with a predicted detection frame, determine that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target, and add them to the detection frame set corresponding to that target.
S204: Update the above preset parameters based on the pairing result.
S205: Determine whether the current image has a next frame. If the current image has no next frame, end this process. If the current image has a next frame, execute S206.
S206: Determine the next frame as the current image and execute S202.
To facilitate understanding, the embodiment of the present invention also provides a specific embodiment.
The detected targets can be products. Product tracking can use a fast tracking algorithm based on Kalman filtering, with a specially designed mechanism for efficiently extracting image features to improve tracking stability (tracking stability means being able to track the same product continuously and stably, avoiding loss as far as possible).
Tracking proceeds iteratively; the specific flow for processing one video frame is as follows:
First, the Kalman filter is used to predict the positions and movement speeds of the products, obtaining tracks.
Then, the Hungarian algorithm is used to match the predicted tracks with the product detection results (detections) in the current video frame (image feature matching and region IOU matching), obtaining (track, detection) pairs.
Finally, the product detections matched with the tracks are used to update the Kalman filter parameters.
An example is as follows:
Frame 0: the detector detects 3 detections; since there are currently no tracks, these 3 detections are initialized as tracks.
Frame 1: the detector again detects 3 detections. The tracks from Frame 0 are first predicted to obtain new tracks, then the Hungarian algorithm is used to match the new tracks with the detections to obtain (track, detection) pairs, and finally the detection in each pair is used to update the corresponding track.
This then loops until the last frame of the video data.
The specific method of extracting image features to improve tracking stability is as follows.
First, efficient image features (such as color features and HOG features) are selected based on feature-engineering methods. The criterion for selecting efficient features is that they both effectively distinguish different products and have low computational complexity, so that they can be extracted and processed quickly in an embedded system.
Then, tracks and detections are matched based on the image features and the region IOU features; image feature matching uses the cosine distance of the feature vectors. A matching sketch based on these two cues follows.
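A minimal matching sketch combining the two cues is given below; the 0.5 weighting between feature similarity and IOU and the 0.3 gate are assumptions, not values fixed by the embodiment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(feat_sim, iou, w_feat=0.5, gate=0.3):
    """feat_sim[i][j]: cosine similarity between predicted track i and
    detection j; iou[i][j]: region IOU of their frames."""
    score = w_feat * feat_sim + (1.0 - w_feat) * iou
    rows, cols = linear_sum_assignment(-score)   # maximise the combined score
    # Discard weak assignments; unmatched tracks/detections are handled
    # separately (new targets, or frames not added to any set).
    return [(i, j) for i, j in zip(rows, cols) if score[i, j] > gate]
```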
S103: Determine the image quality of the images contained in the detection frames in each detection frame set; select the detection frame sets whose image quality is greater than a preset quality.
The image quality is positively correlated with the classification accuracy of the target corresponding to the detection frame set.
Optionally, image quality analysis can mainly analyze image conditions that reduce the accuracy of target classification, such as occlusion of the target and motion blur of the target.
For example, if in the detection frame set corresponding to a target the images contained in the detection frames are all blurry and occluded by other objects, the accuracy of subsequent target classification will usually also be low.
This method flow does not limit the form of the image quality; optionally, the image quality can be determined in a graded manner, with a higher grade meaning higher image quality. Correspondingly, the preset quality can specifically be a preset quality grade.
This method flow does not limit the manner of determining image quality, as long as the image quality is positively correlated with the classification accuracy of the target corresponding to the detection frame set.
Optionally, the image quality can be determined through a neural network model.
Optionally, the image quality can specifically be determined by classification prediction, by regression prediction, or by combining classification prediction and regression prediction.
The classification of image quality can be determined in a graded manner.
Optionally, determining the image quality of the images contained in the detection frames in each detection frame set can include: inputting the images contained in the detection frames in each detection frame set into a preset quality analysis model used to determine image quality; the preset quality analysis model can contain a regression prediction branch and a classification prediction branch for image quality; for any detection frame set, the outputs of the regression prediction branch and the classification prediction branch in the preset quality analysis model are combined to determine the image quality.
By combining regression prediction and classification prediction, this embodiment can improve the precision, accuracy and stability of the image quality.
Optionally, the preset quality analysis model can be a model with a small amount of computation, whose computational complexity can be less than a preset threshold; specifically, it can be a lightweight network based on mobilenetv2. A sketch of one possible branch fusion and the subsequent screening follows.
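For illustration, the sketch below fuses the two quality branches and then performs the screening of S103; the equal weighting and the mapping of grades to [0, 1] are assumptions.

```python
import numpy as np

def image_quality(reg_score, grade_logits, w_reg=0.5):
    # Regression branch: scalar quality score, assumed already in [0, 1].
    # Classification branch: logits over K quality grades (low to high).
    p = np.exp(grade_logits - grade_logits.max())
    p /= p.sum()
    k = len(grade_logits)
    cls_score = float(np.sum(np.arange(k) * p)) / (k - 1)  # grades -> [0, 1]
    return w_reg * reg_score + (1.0 - w_reg) * cls_score

def keep_sets(box_sets, qualities, preset_quality=0.5):
    # S103 screening: keep only the sets whose quality exceeds the preset.
    return [s for s, q in zip(box_sets, qualities) if q > preset_quality]
```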
S104: Determine, based on the images contained in the detection frames in each selected detection frame set, the category of the target corresponding to that detection frame set.
This method flow does not limit the specific method of determining the target category.
Optionally, a target classification model can be used for the determination, or the analysis can be performed directly on image features.
In an optional embodiment, the features of the target can first be extracted from the images contained in the detection frames in a detection frame set, and a comparison lookup can then be performed in a preset category feature set to find the category features that meet the requirements, thereby determining the category of the target.
In this embodiment, since the preset category feature set can be updated, the recognizable categories and the specific features of each category can be adjusted more flexibly.
For example, for a newly added category, the corresponding category features can be added directly to the preset category feature set, after which targets belonging to that category can be identified. For a category whose features need updating, for example when the outer packaging of bottled milk has changed, the features of the bottled-milk category in the preset category feature set can be replaced directly.
By comparing the features of the target with the preset category feature set, this embodiment can improve the flexibility of target classification. A small sketch of such a gallery update follows.
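As a small illustration of this flexibility, the sketch below treats the preset category feature set as an updatable dictionary; the category names and random stand-in feature vectors are hypothetical placeholders for extractor outputs.

```python
import numpy as np

def unit(v):
    return v / (np.linalg.norm(v) + 1e-12)

# The preset category feature set as an updatable dictionary.
gallery = {"bottled milk": unit(np.random.randn(128)),
           "boxed juice":  unit(np.random.randn(128))}

gallery["bagged coffee"] = unit(np.random.randn(128))  # newly supported category
gallery["bottled milk"]  = unit(np.random.randn(128))  # packaging redesigned
```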
Therefore, optionally, determining the category of the target corresponding to a detection frame set based on the images contained in the detection frames in each selected detection frame set can include: extracting the features of the target based on the images contained in the detection frames in each selected detection frame set, comparing the features of the target with the features in the preset category feature set, and determining the category corresponding to the feature whose similarity is greater than the preset similarity threshold as the category of the target corresponding to that detection frame set.
Optionally, the preset category feature set can include correspondences between a number of different categories and different features.
Optionally, the feature similarity can be calculated through the cosine distance of the feature vectors.
This embodiment does not limit the specific feature extraction method for the target. Optionally, features can be extracted through a neural network model, specifically through the hidden layers of a trained neural network model, with the hidden-layer output determined as the features of the target.
In an optional embodiment, during target detection in S101, the target can be classified and its first category determined. Specifically, a rough category can be determined, and the subdivision category of the target, that is, the second category, can be further determined in S104.
Since the first category of the target can be determined in S101, for a target of a certain first category the possibility of other first categories can be excluded, and the second category of the target can be determined directly from the subdivision categories of that first category.
In this way, for a target whose first category has been determined, the range of possible second categories can be narrowed, saving computation and improving the efficiency of target detection.
Therefore, optionally, performing target detection on the video data to be detected can include: detecting the first category of targets in the video data to be detected.
The preset category feature set can include features of second categories; a second category can be a subdivision category of a first category.
Correspondingly and optionally, a feature extraction model corresponding to each first category can be preset.
Extracting the features of the target based on the images contained in the detection frames in each selected detection frame set can include: determining, for each selected detection frame set, the first category of the target corresponding to the detection frame set; and using the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in each selected detection frame set.
The feature extraction model corresponding to each first category can be dedicated to extracting features for targets of that first category.
This embodiment does not limit the form of the feature extraction model; optionally, it can be a model with a small amount of computation, with computational complexity less than a preset threshold, specifically a lightweight network based on mobilenetv2.
This embodiment also does not specifically limit the training method of the feature extraction model corresponding to each first category. Optionally, for multiple image samples under a single first category, a neural network model can be trained using the subdivision categories of that first category as labels, and the hidden-layer output of the trained neural network model can serve as the extracted features of the target.
Specifically, the images contained in the detection frames in each selected detection frame set can be input into the trained neural network model, and the output of its hidden layer determined as the features of the target.
Optionally, since the first category of the target has already been determined, when comparing the features of the target with the features in the preset category feature set, only the subdivision category features under the first category of the target in the preset category feature set need be compared, which narrows the range and improves comparison efficiency.
It should be noted that in an optional embodiment, target detection in S101 usually outputs the probability that the target belongs to each first category, and the first category of the target is determined from these probabilities.
Therefore, the feature extraction model to be used can be determined based on the probability that the target belongs to each first category; specifically, the feature extraction model corresponding to the first category with the highest output probability, or to a first category whose output probability is greater than a threshold, can be used.
Optionally, during target detection in S101, it may not be possible to determine the first category of the target from the probabilities, or there may be no first category whose output probability is greater than the threshold.
In other words, during target detection, it may be difficult to determine the first category of the target.
Correspondingly and optionally, a preset feature extraction model can be used to extract the features; the preset feature extraction model can be used to extract features for targets of every first category or every second category.
This embodiment does not limit the form of the preset feature extraction model. Optionally, the feature extraction model can be a model with a small amount of computation, with computational complexity less than a preset threshold; specifically, it can be a lightweight network based on mobilenetv2, or a model based on resnet18. This embodiment can reduce computation and improve the efficiency of feature extraction.
This embodiment also does not specifically limit the training method of the preset feature extraction model. Optionally, a neural network model can be trained on an image sample set including labels of all second categories, and the hidden-layer output of the trained neural network model can serve as the extracted features of the target.
To facilitate understanding, the embodiment of the present invention also provides a schematic flow chart of a feature extraction method. As shown in Figure 3, Figure 3 is a schematic flow chart of a feature extraction method according to an embodiment of the present invention.
It can specifically include the following steps.
S301: Determine whether the first category of the target has been determined. If the first category of the target has been determined, execute S302; if not, execute S304.
S302: Use the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in the detection frame set corresponding to the target.
S303: Compare the extracted features with the subdivision category features included under the determined first category, and determine the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold as the category of the target.
S304: Use the preset feature extraction model to extract the features of the target from the images contained in the detection frames in the detection frame set corresponding to the target.
S305: Compare the extracted features with all subdivision category features, and determine the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold as the category of the target.
To facilitate understanding, the embodiment of the present invention also provides a specific embodiment.
In the smart refrigerator scenario, to improve the precision and speed of product feature extraction, four product feature extraction models are specially designed.
These include three lightweight mobilenetv2-based small models for bottled, boxed and bagged products respectively, used to further determine the subdivision category of a product, and one resnet18-based model covering all product subdivision categories.
Through target detection, the category confidence of a product detection frame can be determined, that is, the confidence of the bottled, boxed and bagged categories.
When the confidence of the bottled category is greater than a preset threshold, for example 0.6, the feature extraction model for bottled products is used to extract the product features, which are compared with the subdivision category features of bottled products to obtain a comparison result. Specifically, the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold can be determined as the subdivision category of the bottled product.
When the confidence of the bagged category is greater than the preset threshold, for example 0.6, the feature extraction model for bagged products is used to extract the product features, which are compared with the subdivision category features of bagged products to obtain a comparison result. Specifically, the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold can be determined as the subdivision category of the bagged product.
When the confidence of the boxed category is greater than the preset threshold, for example 0.6, the feature extraction model for boxed products is used to extract the product features, which are compared with the subdivision category features of boxed products to obtain a comparison result. Specifically, the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold can be determined as the subdivision category of the boxed product.
In other cases, it can be considered that it has not been determined whether the product belongs to the bottled, boxed or bagged category.
Therefore, the feature extraction model covering all product subdivision categories can be used to extract the product features, which are compared with the full set of subdivision category features to obtain a comparison result. Specifically, the subdivision category corresponding to the feature whose similarity is greater than the preset similarity threshold can be determined as the subdivision category of the product.
In an optional embodiment, for feature comparison, the comparison speed can be increased by reducing numerical precision without reducing the accuracy of the comparison.
Therefore, optionally, the features in the preset category feature set can be set to a preset precision, and the precision of the extracted target features can be reduced to the preset precision, so that comparison can be performed directly, reducing computation, increasing comparison speed and improving target detection efficiency.
In a specific embodiment, the conventional output feature vectors of product feature extraction have FP32 precision, and feature comparison is also performed on FP32 feature vectors.
In this method flow, features with FP16, INT16 or INT8 precision can be extracted and compared, which can effectively increase the comparison speed.
In this method flow, a number of neural network models may be used, for example the target detection model, the image quality analysis model and the target classification model.
To further reduce computation and improve target detection efficiency, the neural network models themselves can be optimized.
Optionally, model quantization or model pruning can be performed on any neural network model. This method flow does not limit the specific model quantization method or model pruning method.
Optionally, automatic mixed-precision quantization can be performed on any neural network model, where any neural network model can be any neural network model used in this method flow.
Optionally, among the first neural network model for performing target detection on the video data to be detected, the second neural network model for determining image quality and the third neural network model for determining the category of the target corresponding to a detection frame set, at least one neural network model is obtained through the following quantization method.
The initial neural network model, trained at a first parameter precision, is quantized to a second parameter precision to obtain an intermediate neural network model; the second parameter precision is lower than the first parameter precision.
The following steps are performed in a loop until the accuracy and calculation speed of the current intermediate neural network model meet the preset quantization requirements: determine the accuracy and calculation speed of the current intermediate neural network model; determine the output error of each layer between the initial neural network model and the current intermediate neural network model; and select the layer with the largest output error and raise its parameter precision to obtain a new intermediate neural network model as the current intermediate neural network model.
The lower the parameter precision of a model, the faster its calculation speed; correspondingly, its accuracy may decrease.
The preset quantization requirements can specifically be requirements on the combined result of accuracy and calculation speed. Optionally, they can require that, while the accuracy is not less than a preset accuracy, the calculation speed is greater than a preset speed, thereby improving the efficiency of target detection.
The current intermediate neural network model obtained when the loop ends can be used in this method flow.
To facilitate understanding, the embodiment of the present invention also provides a schematic flow chart of a model quantization method. As shown in Figure 4, Figure 4 is a schematic flow chart of a model quantization method according to an embodiment of the present invention.
The method can include the following steps.
S401: Quantize the initial neural network model, trained at the first parameter precision, to the second parameter precision to obtain an intermediate neural network model. The first parameter precision can be FP32 and the second parameter precision can be INT8.
S402: Determine the accuracy and calculation speed of the current intermediate neural network model.
S403: Determine whether the accuracy and calculation speed of the current intermediate neural network model meet the preset quantization requirements. If they are met, end this process; if not, execute S404.
S404: Determine the output error of each layer between the initial neural network model and the current intermediate neural network model.
S405: Select the layer with the largest output error and raise its parameter precision to obtain a new intermediate neural network model as the current intermediate neural network model, then execute S402.
Corresponding to the above method embodiments, the embodiment of the present invention also provides an apparatus embodiment.
As shown in Figure 5, Figure 5 is a schematic structural diagram of a dynamic target analysis device according to an embodiment of the present invention.
The device can include the following units.
A target detection unit 501, used to perform target detection on the video data to be detected and obtain the target detection frame of each frame of image.
A target tracking unit 502, used to determine one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm, where each detection frame set contains detection frames belonging to the same target across different images and different detection frame sets correspond to different targets.
A screening unit 503, used to determine the image quality of the images contained in the detection frames in each detection frame set, the image quality being positively correlated with the classification accuracy of the target corresponding to the detection frame set, and to select the detection frame sets whose image quality is greater than the preset quality.
A classification unit 504, used to determine, based on the images contained in the detection frames in each selected detection frame set, the category of the target corresponding to that detection frame set.
Optionally, the classification unit 504 is used to: extract the features of the target based on the images contained in the detection frames in each selected detection frame set, compare the features of the target with the features in the preset category feature set, and determine the category corresponding to the feature whose similarity is greater than the preset similarity threshold as the category of the target corresponding to that detection frame set.
Optionally, the target detection unit 501 is used to detect the first category of targets in the video data to be detected; the preset category feature set includes features of second categories, a second category being a subdivision category of a first category; the device is preset with a feature extraction model corresponding to each first category; and the classification unit 504 is used to: for each selected detection frame set, determine the first category of the target corresponding to the detection frame set, and use the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in each selected detection frame set.
Optionally, the target detection unit 501 is used to: input the video data to be detected into a preset target detection model used to determine the third category of a target, the preset target detection model containing a detection frame prediction branch corresponding to each third category; when the third category of any target is determined, the detection frame output by the detection frame prediction branch corresponding to that third category is determined as the detection frame of the target.
Optionally, the target detection unit 501 is used to: input the video data to be detected into a preset target detection model used to determine the detection frame position, size and rotation angle of a target, the preset target detection model containing a regression prediction branch and a classification prediction branch for the detection frame rotation angle; for any target, the outputs of the regression prediction branch and the classification prediction branch in the preset target detection model are combined to determine a comprehensive rotation angle result.
Optionally, the target tracking unit 502 is used to: determine the frame following the first frame of the video data to be detected as the current image, and perform the following steps in a loop: based on the first real detection frames obtained through target detection in the previous frame, use preset parameters to predict the detection frames in the current image to obtain predicted detection frames, different predicted detection frames corresponding to different first real detection frames; pair each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames, and when a second real detection frame is determined to be successfully paired with a predicted detection frame, determine that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target and add them to the detection frame set corresponding to that target; update the preset parameters based on the pairing result; end the loop if the current image has no next frame; and determine the next frame as the current image if the current image has a next frame.
Optionally, the preset parameters include the parameters of a Kalman filter, and the target tracking unit 502 is used to: use the Kalman filter to predict the detection frames in the current image; compute preset detection frame image features for the image contained in each second real detection frame obtained through target detection in the current image and for the images contained in the obtained predicted detection frames, the computational complexity of the preset detection frame image features being less than a preset complexity threshold; and use the Hungarian algorithm to pair the second real detection frames with the predicted detection frames according to the similarity of the preset detection frame image features and the degree of overlap of the detection frames.
Optionally, the screening unit 503 is used to: input the images contained in the detection frames in each detection frame set into a preset quality analysis model used to determine image quality, the preset quality analysis model containing a regression prediction branch and a classification prediction branch for image quality; for any detection frame set, the outputs of the regression prediction branch and the classification prediction branch in the preset quality analysis model are combined to determine the image quality.
Optionally, among the first neural network model for performing target detection on the video data to be detected, the second neural network model for determining image quality and the third neural network model for determining the category of the target corresponding to a detection frame set, at least one neural network model is obtained through the following quantization method: the initial neural network model, trained at a first parameter precision, is quantized to a second parameter precision to obtain an intermediate neural network model, the second parameter precision being lower than the first parameter precision; the following steps are performed in a loop until the accuracy and calculation speed of the current intermediate neural network model meet the preset quantization requirements: determine the accuracy and calculation speed of the current intermediate neural network model; determine the output error of each layer between the initial neural network model and the current intermediate neural network model; and select the layer with the largest output error and raise its parameter precision to obtain a new intermediate neural network model as the current intermediate neural network model.
The embodiment of the present invention also provides a computer device, which at least includes a memory, a processor and a computer program stored in the memory and executable on the processor, where the processor implements any one of the above method embodiments when executing the program.
The embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute any one of the above method embodiments.
Figure 6 is a schematic diagram of the hardware structure of a computer device configured with a method of an embodiment of the present invention. The device can include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040 and a bus 1050, where the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 communicate with one another inside the device through the bus 1050.
The processor 1010 can be implemented using a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of the present invention.
The memory 1020 can be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, and so on. The memory 1020 can store an operating system and other application programs; when the technical solutions provided by the embodiments of the present invention are implemented through software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
The input/output interface 1030 is used to connect an input/output module to realize information input and output. The input/output module can be configured in the device as a component (not shown in the figure) or externally connected to the device to provide corresponding functions. Input devices can include a keyboard, a mouse, a touch screen, a microphone, various sensors and so on, and output devices can include a display, a speaker, a vibrator, indicator lights and so on.
The communication interface 1040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module can realize communication through wired means (such as USB or a network cable) or wireless means (such as a mobile network, WIFI or Bluetooth).
The bus 1050 includes a path that carries information between the various components of the device (for example the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040).
It should be noted that although the above device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may also include only the components necessary to implement the solutions of the embodiments of the present invention, without necessarily including all the components shown in the figure.
The embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, where the program implements any one of the above method embodiments when executed by a processor.
The embodiment of the present invention also provides a computer-readable storage medium storing a computer program, where the computer program implements any one of the above method embodiments when executed by a processor.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of the present invention can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solutions of the embodiments of the present invention, or the part that makes a contribution, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments of the present invention or in certain parts of the embodiments.
The systems, devices, modules or units described in the above embodiments can specifically be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Each embodiment in this specification is described in a progressive manner; for the same or similar parts between the various embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiment is basically similar to the method embodiment, its description is relatively simple, and reference may be made to the relevant parts of the description of the method embodiment. The device embodiments described above are only illustrative; the modules described as separate components may or may not be physically separated, and when implementing the solutions of the embodiments of the present invention the functions of the modules may be implemented in the same piece or several pieces of software and/or hardware. Some or all of the modules can also be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement the solution without creative effort.
The above are only specific implementations of the embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the embodiments of the present invention, and these improvements and refinements should also be regarded as falling within the protection of the embodiments of the present invention.
In the present invention, the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance. The term "plurality" refers to two or more, unless expressly limited otherwise.
Those skilled in the art will readily conceive of other implementations of the present invention after considering the specification and practicing the disclosure herein. The present invention is intended to cover any variations, uses or adaptations of the present invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed by the present invention. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims (12)

  1. A dynamic target analysis method, characterized by including:
    performing target detection on video data to be detected to obtain a target detection frame for each frame of image;
    determining one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm, where each detection frame set contains detection frames belonging to the same target across different images and different detection frame sets correspond to different targets;
    determining the image quality of the images contained in the detection frames in each detection frame set, the image quality being positively correlated with the classification accuracy of the target corresponding to the detection frame set, and selecting the detection frame sets whose image quality is greater than a preset quality;
    determining, based on the images contained in the detection frames in each selected detection frame set, the category of the target corresponding to that detection frame set.
  2. The method according to claim 1, characterized in that determining the category of the target corresponding to the detection frame set based on the images contained in the detection frames in each selected detection frame set includes:
    extracting the features of the target based on the images contained in the detection frames in each selected detection frame set, comparing the features of the target with the features in a preset category feature set, and determining the category corresponding to the feature whose similarity is greater than a preset similarity threshold as the category of the target corresponding to that detection frame set.
  3. The method according to claim 2, characterized in that performing target detection on the video data to be detected includes:
    detecting the first category of targets in the video data to be detected;
    the preset category feature set includes features of second categories, a second category being a subdivision category of the first category;
    the method further includes: presetting a feature extraction model corresponding to each first category;
    extracting the features of the target based on the images contained in the detection frames in each selected detection frame set includes:
    determining, for each selected detection frame set, the first category of the target corresponding to the detection frame set;
    using the feature extraction model corresponding to the determined first category to extract the features of the target from the images contained in the detection frames in each selected detection frame set.
  4. The method according to claim 1, characterized in that performing target detection on the video data to be detected includes:
    inputting the video data to be detected into a preset target detection model used to determine the third category of a target, the preset target detection model containing a detection frame prediction branch corresponding to each third category;
    when the third category of any target is determined, determining the detection frame output by the detection frame prediction branch corresponding to that third category as the detection frame of the target.
  5. The method according to claim 1, characterized in that performing target detection on the video data to be detected includes:
    inputting the video data to be detected into a preset target detection model used to determine the detection frame position, size and rotation angle of a target;
    the preset target detection model containing a regression prediction branch and a classification prediction branch for the detection frame rotation angle;
    for any target, combining the outputs of the regression prediction branch and the classification prediction branch in the preset target detection model to determine a comprehensive rotation angle result.
  6. The method according to claim 1, characterized in that determining one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm includes:
    determining the frame following the first frame of the video data to be detected as the current image and performing the following steps in a loop:
    based on the first real detection frames obtained through target detection in the previous frame, using preset parameters to predict the detection frames in the current image to obtain predicted detection frames, different predicted detection frames corresponding to different first real detection frames;
    pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames, and, when the second real detection frame is determined to be successfully paired with a predicted detection frame, determining that the second real detection frame and the first real detection frame corresponding to that predicted detection frame belong to the same target and adding them to the detection frame set corresponding to that target;
    updating the preset parameters based on the pairing result;
    ending the loop when the current image has no next frame;
    determining the next frame as the current image when the current image has a next frame.
  7. The method according to claim 6, characterized in that the preset parameters include parameters of a Kalman filter;
    using the preset parameters to predict the detection frames in the current image includes: using the Kalman filter to predict the detection frames in the current image;
    pairing each second real detection frame obtained through target detection in the current image with the obtained predicted detection frames includes:
    computing preset detection frame image features for the image contained in each second real detection frame obtained through target detection in the current image and for the images contained in the obtained predicted detection frames, the computational complexity of the preset detection frame image features being less than a preset complexity threshold;
    using the Hungarian algorithm to pair the second real detection frames with the predicted detection frames according to the similarity of the preset detection frame image features and the degree of overlap of the detection frames.
  8. The method according to claim 1, characterized in that determining the image quality of the images contained in the detection frames in each detection frame set includes:
    inputting the images contained in the detection frames in each detection frame set into a preset quality analysis model used to determine image quality, the preset quality analysis model containing a regression prediction branch and a classification prediction branch for image quality;
    for any detection frame set, combining the outputs of the regression prediction branch and the classification prediction branch in the preset quality analysis model to determine the image quality.
  9. The method according to claim 1, characterized in that, among a first neural network model for performing target detection on the video data to be detected, a second neural network model for determining image quality and a third neural network model for determining the category of the target corresponding to a detection frame set, at least one neural network model is obtained through the following quantization method:
    quantizing the initial neural network model, trained at a first parameter precision, to a second parameter precision to obtain an intermediate neural network model, the second parameter precision being lower than the first parameter precision;
    performing the following steps in a loop until the accuracy and calculation speed of the current intermediate neural network model meet preset quantization requirements:
    determining the accuracy and calculation speed of the current intermediate neural network model;
    determining the output error of each layer between the initial neural network model and the current intermediate neural network model;
    selecting the layer with the largest output error and raising its parameter precision to obtain a new intermediate neural network model as the current intermediate neural network model.
  10. A dynamic target analysis device, characterized by including:
    a target detection unit, used to perform target detection on video data to be detected and obtain a target detection frame for each frame of image;
    a target tracking unit, used to determine one or more detection frame sets based on the obtained detection frames using a preset target tracking algorithm, where each detection frame set contains detection frames belonging to the same target across different images and different detection frame sets correspond to different targets;
    a screening unit, used to determine the image quality of the images contained in the detection frames in each detection frame set, the image quality being positively correlated with the classification accuracy of the target corresponding to the detection frame set, and to select the detection frame sets whose image quality is greater than a preset quality;
    a classification unit, used to determine, based on the images contained in the detection frames in each selected detection frame set, the category of the target corresponding to that detection frame set.
  11. An electronic device, characterized by including:
    at least one processor; and,
    a memory communicatively connected to the at least one processor; where
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the method according to any one of claims 1 to 9.
  12. A computer-readable storage medium storing a computer program, characterized in that the computer program implements the method according to any one of claims 1 to 9 when executed by a processor.
PCT/CN2023/091884 2022-05-16 2023-04-28 Dynamic target analysis method, apparatus, device and storage medium WO2023221770A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210531336.5 2022-05-16
CN202210531336.5A CN114782494A (zh) 2022-05-16 2022-07-22 Dynamic target analysis method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2023221770A1 true WO2023221770A1 (zh) 2023-11-23

Family

ID=82436779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/091884 WO2023221770A1 (zh) 2022-05-16 2023-04-28 一种动态目标分析方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN114782494A (zh)
WO (1) WO2023221770A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782494A (zh) 2022-05-16 2022-07-22 BOE Technology Group Co., Ltd. Dynamic target analysis method, apparatus, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190180146A1 (en) * 2017-12-13 2019-06-13 Microsoft Technology Licensing, Llc Ensemble model for image recognition processing
CN110610510A (zh) * 2019-08-29 2019-12-24 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Target tracking method and apparatus, electronic device, and storage medium
CN112417970A (zh) * 2020-10-22 2021-02-26 Beijing Megvii Technology Co., Ltd. Target object recognition method and apparatus, and electronic system
CN113158909A (zh) * 2021-04-25 2021-07-23 Institute of Automation, Chinese Academy of Sciences Lightweight behavior recognition method, system and device based on multi-object tracking
CN114782494A (zh) 2022-05-16 2022-07-22 BOE Technology Group Co., Ltd. Dynamic target analysis method, apparatus, device and storage medium


Also Published As

Publication number Publication date
CN114782494A (zh) 2022-07-22

Similar Documents

Publication Publication Date Title
US11335092B2 (en) Item identification method, system and electronic device
US20190130580A1 (en) Methods and systems for applying complex object detection in a video analytics system
Bertini et al. Multi-scale and real-time non-parametric approach for anomaly detection and localization
  • WO2019057168A1 Goods order processing method and apparatus, server, shopping terminal and system
TWI578272B (zh) Shelf detection system and method
US20170255830A1 (en) Method, apparatus, and system for identifying objects in video images and displaying information of same
US20140169639A1 (en) Image Detection Method and Device
  • CN109858552B Target detection method and device for fine-grained classification
  • CN111263224B Video processing method and apparatus, and electronic device
  • WO2023221770A1 Dynamic target analysis method, apparatus, device and storage medium
US20150154455A1 (en) Face recognition with parallel detection and tracking, and/or grouped feature motion shift tracking
  • CN113766330A Method and apparatus for generating recommendation information based on video
WO2018078408A1 (en) Reducing scale estimate errors in shelf images
  • CN110335313A Audio collection device positioning method and apparatus, and speaker recognition method and system
  • CN110060278A Method and apparatus for detecting moving targets based on background subtraction
  • CN111310531B Image classification method and apparatus, computer device and storage medium
  • CN111260685B Video processing method and apparatus, and electronic device
  • KR102427690B1 Apparatus and method for class classification based on deep learning
US11853881B2 (en) Image analysis-based classification and visualization of events
  • CN114332602A Product recognition method for smart containers
  • CN107665495B Object tracking method and object tracking apparatus
  • CN115601686B Method, apparatus and system for confirming item delivery
  • CN113496513A Target object detection method and apparatus
Achakir et al. An automated AI-based solution for out-of-stock detection in retail environments
  • CN111967403B Method and apparatus for determining moving regions in video, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23806722

Country of ref document: EP

Kind code of ref document: A1