CN116630947A - Foreign matter detection method and device, and non-transient computer readable storage medium - Google Patents


Info

Publication number
CN116630947A
CN116630947A (application CN202310220476.5A)
Authority
CN
China
Prior art keywords
target detection, model, detection frame, image, student
Prior art date
Legal status
Pending
Application number
CN202310220476.5A
Other languages
Chinese (zh)
Inventor
王婷婷 (Wang Tingting)
黄光伟 (Huang Guangwei)
Current Assignee
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd
Priority to CN202310220476.5A
Publication of CN116630947A
Legal status: Pending


Classifications

    All classifications fall under G (Physics), G06 (Computing; calculating or counting), G06V (Image or video recognition or understanding):
    • G06V20/60: Scenes; scene-specific elements; type of objects
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/778: Active pattern-learning, e.g. online learning of image or video features
    • G06V10/84: Recognition using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks
    • G06V2201/07: Indexing scheme; target detection

Abstract

A foreign matter detection method and apparatus, and a non-transitory computer-readable storage medium. The method includes: inputting a first image into a target detection model to obtain one or more target detection frames, the target detection model being obtained by training as follows: pre-training a teacher model by a self-supervised method, fine-tuning the teacher model by a supervised method, and performing knowledge distillation on a student model using the teacher model to obtain the target detection model; extracting a feature vector of each target detection frame; and processing the extracted feature vectors to determine whether the object in each target detection frame is a foreign object.

Description

Foreign matter detection method and device, and non-transitory computer readable storage medium
Technical Field
Embodiments of the present disclosure relate to, but are not limited to, the field of target detection technology, and in particular to a foreign object detection method and apparatus, and a non-transitory computer readable storage medium.
Background
Convolutional neural networks have strong learning ability and efficient feature expression, and play a major role in computer vision tasks such as segmentation, detection, recognition and tracking. In intelligent retail, mounting cameras at store refrigerators, vending cabinets and the like and using computer vision to automatically identify the quantity and types of commodities on the shelves can greatly reduce labor costs and improve the customer shopping experience.
However, in some cases consumers may put items on the shelf that do not belong there, such as garbage or commodities from elsewhere. This situation is currently difficult to supervise: merchants can only determine whether foreign objects are present through manual comparison, which is cumbersome and inefficient.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the disclosure provides a foreign matter detection method, which comprises the following steps:
inputting the first image into a target detection model to obtain one or more target detection frames; the target detection model is obtained through training by the following method: pre-training a teacher model by a self-supervision method, and fine-tuning the teacher model by a supervision method; carrying out knowledge distillation on the student model by adopting the teacher model to obtain the target detection model;
extracting the characteristic vector of each target detection frame;
and processing the extracted characteristic vectors to determine whether the object in each target detection frame is a foreign object.
The embodiments of the present disclosure also provide a foreign matter detection apparatus, including a memory and a processor connected to the memory, the memory being configured to store instructions, and the processor being configured to perform, based on the instructions stored in the memory, the steps of the foreign object detection method of any embodiment of the present disclosure.
The embodiments of the present disclosure also provide a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the foreign object detection method of any of the embodiments of the present disclosure.
Other aspects will become apparent upon reading and understanding the accompanying drawings and detailed description.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosed embodiments and are incorporated in and constitute a part of this specification; they illustrate embodiments of the disclosure and together with the description serve to explain, without limitation, the embodiments of the disclosure. The shapes and sizes of the components in the drawings are not drawn to scale and are intended only to illustrate the present disclosure.
Fig. 1 is a flowchart illustrating a foreign object detection method according to an exemplary embodiment of the present disclosure;
FIG. 2 illustrates an exemplary application scenario diagram of an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a target detection model according to an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a knowledge distillation process provided by an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic illustration of a merchandise image provided by an exemplary embodiment of the present disclosure;
FIG. 6 is a schematic view of a partial template image of the merchandise image shown in FIG. 5;
FIG. 7 is a schematic diagram of a foreign object detection system according to an exemplary embodiment of the present disclosure;
fig. 8 is a schematic structural view of a foreign matter detection device according to an exemplary embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail hereinafter with reference to the accompanying drawings. It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be arbitrarily combined with each other.
Unless otherwise defined, technical or scientific terms used in the disclosure of the embodiments of the present disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in embodiments of the present disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, is intended to mean that elements or items preceding the word encompass the elements or items listed thereafter and equivalents thereof without precluding other elements or items.
As shown in fig. 1, an embodiment of the present disclosure provides a foreign matter detection method, including the steps of:
step 101, inputting a first image into a target detection model to obtain one or more target detection frames; the target detection model is obtained by training the following method: pre-training a teacher model by a self-supervision method, and fine-tuning the teacher model by a supervision method; carrying out knowledge distillation on the student model by adopting a teacher model to obtain a target detection model;
step 102, extracting a feature vector of each target detection frame;
step 103, processing the extracted feature vectors to determine whether the object in each target detection frame is a foreign object.
In the foreign matter detection method provided by the embodiments of the present disclosure, the teacher model is pre-trained by a self-supervised method and fine-tuned by a supervised method, and the teacher model is then used to distill knowledge into the student model to obtain the target detection model. During detection, the feature vectors of the target detection frames output by the target detection model are processed to obtain the foreign matter detection result. In this way sufficient visual feature representations can be learned, foreign matter can be detected while commodity categories are still correctly recognized, and the method can be deployed in low-compute settings such as terminals.
The embodiments of the present disclosure have a wide variety of application scenarios, which may be, but are not limited to, store refrigerators, vending cabinets, and the like. For example, in the refrigerator scenario shown in fig. 2, the foreign matter detection method of the embodiments of the present disclosure can detect a commodity that does not belong in the refrigerator (for example, assuming the foreign object in fig. 2 is a herbal tea drink, the foreign object may be framed by a dotted rectangle and the other commodities by solid rectangles). The types of goods in the refrigerator are limited and their categories are known, but foreign objects come in many kinds and their categories are unknown; the present disclosure therefore performs foreign matter detection using the idea of open-set recognition.
In the embodiments of the present disclosure, articles or commodities that are not foreign objects may all be identified under a single article or commodity label, or may be identified as a number of different classes; this disclosure does not limit this.
In the embodiments of the present disclosure, before the teacher model is used to distill knowledge into the student model, the student model may also be pre-trained by a self-supervised method, further improving the student model's capability.
In the application scenario shown in fig. 2, an angle sensor and an image acquisition device (for example a camera aimed into the refrigerator) may be arranged on the refrigerator door, and the target detection model and a foreign matter detection apparatus are deployed on a terminal. The foreign matter detection apparatus acquires the rotation signal of the angle sensor and computes a rotation angle from it. When the rotation angle is within a preset threshold range (i.e., after the refrigerator door has opened to a certain angle), the foreign matter detection apparatus notifies the image acquisition device to capture images, receives the captured images, selects one or more of them, inputs the selected images into the target detection model to obtain one or more target detection frames, and extracts the feature vector of each target detection frame; the extracted feature vectors are processed to determine whether the object within each target detection frame is a foreign object. In fig. 2, a solid box indicates a commodity belonging to the refrigerator, while a dotted box labeled "other" indicates a foreign object; the foreign matter detection method of the present disclosure can detect foreign matter while still correctly recognizing commodity categories.
In the embodiment of the disclosure, the target detection model is trained by the following method:
pre-training a teacher model by a self-supervision method;
finely adjusting the teacher model by a supervision method;
and carrying out knowledge distillation on the student model by adopting a teacher model to obtain a target detection model.
The embodiments of the present disclosure pre-train the teacher model with self-supervised learning, which yields good visual feature representations for downstream tasks without annotation and improves the capability of the pre-trained model. The data set used during pre-training requires no labeled image data, and the images may show any articles, not necessarily the commodities of the final application scenario. Taking the scenario of fig. 2 as an example, when the teacher model is pre-trained by the self-supervised method, the unlabeled images may depict any item, not necessarily a commodity in the refrigerator.
In the commodity recognition scenario, the data set contains many similar commodities. The self-supervised method takes an augmented view of an image as its positive sample and any other image as a negative sample, thereby reducing intra-class differences and increasing inter-class differences, so fine-grained commodity features can be learned and recognition accuracy improved.
In addition, the data sets collected so far contain a large number of commodity images that can be cropped out by detection, but relatively little manpower is available to label commodity categories. Pre-training the teacher model with self-supervised learning and then fine-tuning with the labeled data greatly reduces the number of images that need labeling while substantially improving prediction accuracy.
In some exemplary embodiments, pre-training the teacher model by a self-supervision method includes:
performing multiple transformations on each original image in the unlabeled image data to obtain a first preset number of images, where the first preset number is the number of transformations;
inputting the first preset number of images into the teacher model before pre-training to obtain a first preset number of first predicted values of the teacher model;
performing contrastive learning on the first preset number of first predicted values to train the teacher model in a self-supervised manner.
In the embodiments of the present disclosure, a contrastive learning approach is used to learn visual representations without supervision while the teacher model is pre-trained by the self-supervised method. The purpose of contrastive learning is to distinguish homogeneous samples from heterogeneous samples, and the self-supervised contrastive loss uses the information noise contrastive estimation (InfoNCE) loss function, expressed as follows:

L_q = -log( exp(q·k₊/τ) / Σᵢ exp(q·kᵢ/τ) )

where q is the feature vector of the query sample, k₊ is the positive-sample feature vector, kᵢ ranges over the positive and negative sample feature vectors, and τ is a temperature coefficient. Here q and k₊ are generated from the same sample through data augmentation and thus belong to the same class; the contrastive loss pulls q closer to the positive sample and pushes the negative samples away.
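As an illustration, the InfoNCE loss described above can be sketched in plain NumPy (a minimal sketch for a single query; the temperature value of 0.07 is an assumption, not a value specified in this disclosure):

```python
import numpy as np

def info_nce_loss(q, k_pos, k_negs, tau=0.07):
    """InfoNCE loss for one query sample.

    q      : (d,) feature vector of the query sample
    k_pos  : (d,) feature vector of the positive sample
             (an augmented view of the same image as q)
    k_negs : (n, d) feature vectors of negative samples
    tau    : temperature coefficient (assumed value)
    """
    # Similarity logits: positive sample first, then negatives
    logits = np.concatenate(([q @ k_pos], k_negs @ q)) / tau
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    # Low when q is close to k_pos and far from the negatives
    return float(-np.log(probs[0]))
```

Minimizing this loss pulls the query toward its positive and away from the negatives, which is the behavior the paragraph above describes.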
In some exemplary embodiments, as shown in fig. 3, the object detection model includes a Backbone network (Backbone) for extracting features of an input image, a full connection layer for predicting a position of an object and a category score condition, and an output layer for outputting a category label.
In some exemplary embodiments, the teacher model includes a first backbone network and the student model includes a second backbone network, the first backbone network may be a ReXNet network, the second backbone network may be a MobileNeXt network, and an hourglass-type bottleneck (SandGlass Bottleneck) structure specific to the MobileNeXt network may substantially reduce parameters of the student model. However, the embodiment of the present disclosure is not limited thereto, and the structures of the first and second backbone networks may be set as needed.
In the disclosed embodiments, the fully-connected layer may include one or more layers, and illustratively, the fully-connected layer may include two layers, however, the disclosed embodiments are not limited in this regard.
In some exemplary embodiments, the knowledge distillation is feature distillation, and the total loss function of the student model is: Loss = Loss1 + lamda × Loss2, where lamda is a preset weight coefficient; Loss1 = CosFaceLoss(Ŷ, Y), where Ŷ is the predicted value of the student model, Y is the true value, and CosFaceLoss denotes the CosFace recognition loss function; Loss2 = MSE(Feature_student, Feature_teacher), where Feature_student is the feature vector of the student model, Feature_teacher is the feature vector of the teacher model, and MSE denotes the mean square error loss function.
In the embodiments of the present disclosure, to improve the model's detection capability, a large model (the teacher model) is pre-trained and fine-tuned, and a small model (the student model) is then distilled from it; only features are distilled during the distillation process, so that the small model fully learns the capability of the large model.
As shown in fig. 4, the present disclosure uses feature distillation. On the one hand, the student model fits the soft information output by the teacher model, so that it can learn some latent semantic information and inherit the teacher model's experience: during distillation, a training sample X (with true label Y) is input into the teacher model to obtain the teacher feature vector Feature_teacher and into the student model to obtain the student feature vector Feature_student, and the MSE loss between Feature_teacher and Feature_student is computed. On the other hand, the student model's prediction is compared against the real hard label, and the CosFace loss function is used as the classification and recognition loss on the true data. The two losses are combined through the weight lamda into the total loss:
Loss1 = CosFaceLoss(Ŷ, Y)
Loss2 = MSE(Feature_student, Feature_teacher)
The total training loss is Loss = Loss1 + lamda × Loss2.
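A minimal sketch of this combined objective follows. Note the assumptions: the CosFace classification term is stood in for by a plain softmax cross-entropy on the student logits (the actual CosFace margin and scale are not reproduced here), and the lamda value is illustrative:

```python
import numpy as np

def mse_feature_loss(feat_student, feat_teacher):
    """Loss2: mean squared error between student and teacher feature vectors."""
    return float(np.mean((feat_student - feat_teacher) ** 2))

def softmax_ce(logits, label):
    """Cross-entropy stand-in for the CosFace classification term (Loss1)."""
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[label]))

def distillation_loss(logits_student, label, feat_student, feat_teacher, lamda=1.0):
    """Total loss = Loss1 + lamda * Loss2, following the formulation above."""
    loss1 = softmax_ce(logits_student, label)
    loss2 = mse_feature_loss(feat_student, feat_teacher)
    return loss1 + lamda * loss2
```

When the student features exactly match the teacher features, Loss2 vanishes and only the classification term remains, which is the intended behavior of feature distillation.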
When performing feature distillation, the embodiments of the present disclosure only need to adjust one parameter, the weight lamda; no temperature parameter T needs to be tuned. During training, adaptive sharpness-aware minimization (Adaptive Sharpness Aware Minimization, ASAM) can be used to improve generalization. Although an ordinary training method can converge to a good local optimum, the neighborhood of that optimum is often far from flat; that is, adding a small perturbation ε to the weights ω (evaluating at ω + ε) may produce a much worse result. The embodiments of the present disclosure therefore use ASAM to adaptively adjust the perturbation range of the weights and minimize the worst-case loss in that neighborhood, yielding a more robust model:

min_ω max_{‖ε‖ ≤ ρ} L_S(ω + ε)

where L_S is the loss on the training set S, ρ bounds the floating range of the weights ω, and ε is the floating (perturbation) value.
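A toy sketch of one sharpness-aware update step (simplified to non-adaptive SAM for clarity; ASAM additionally scales the perturbation element-wise by the weight magnitudes, and the ρ and learning-rate values are illustrative assumptions):

```python
import numpy as np

def sam_step(w, loss_grad, rho=0.05, lr=0.1):
    """One sharpness-aware minimization update.

    1. Ascend: perturb the weights by eps = rho * g / ||g||,
       moving to the (approximately) worst nearby point.
    2. Descend: apply the gradient taken at w + eps to the original w.
    """
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_adv = loss_grad(w + eps)   # gradient at the perturbed point
    return w - lr * g_adv
```

Because the descent direction is computed at the perturbed point, the update favors flat minima whose whole neighborhood has low loss, which is the robustness property described above.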
In some exemplary embodiments, processing the extracted feature vectors to determine whether the object within each target detection frame is a foreign object includes:
extracting the feature vector of each known class of object according to the trained object detection model;
for each target detection box, the following operations are performed:
and calculating the similarity between the extracted feature vector of the target detection frame and the feature vector of each known class of object, and determining whether the object in the target detection frame is a foreign object according to the similarity result.
The above provides a way to determine whether the object in each target detection frame is a foreign object through similarity calculation. In the embodiments of the present disclosure, feature similarity is used to decide which commodity category, or the foreign-object category, an object in a target detection frame belongs to. Template images for the M classes of commodities must be prepared in advance, where M is the number of commodity categories to be recognized; the more commodity classes there are, the stronger the foreign matter detection capability. During self-supervised pre-training, any article data set may be used; during fine-tuning, the collected commodity data set is used. In some exemplary embodiments, an N-class commodity data set is used in fine-tuning, where N is much greater than M.
Meaning of the template image: the M classes of commodity images to be recognized are known in advance. Taking one of them as an example, assume the commodity is the one shown in fig. 5. In actual detection, for each target detection frame we crop the image of the frame and extract its feature, obtaining F_query; F_query is compared with the feature F_gallery of that commodity's template image, and if the similarity of the two features is very high, the object detected by this target detection frame is that commodity, i.e., the image shown in fig. 5 serves as its template image. In practice, images of the commodity from various angles can be acquired as template images, as shown in fig. 6, so that the templates cover as many angles as possible. The feature similarity is measured by cosine similarity:

sim(F_query, F_gallery) = (F_query · F_gallery) / (‖F_query‖ · ‖F_gallery‖)
in some exemplary embodiments, determining whether the object within the target detection frame is a foreign object based on the similarity result includes:
determining a maximum value of the calculated similarities;
comparing the determined maximum value with a preset similarity threshold value;
when the determined maximum value is greater than or equal to a preset similarity threshold value, determining that the object in the target detection frame is a known class object corresponding to the determined maximum value;
and when the determined maximum value is smaller than a preset similarity threshold value, determining that the object in the target detection frame is a foreign object.
Suppose that among the M classes of commodities the i-th commodity has q_i commodity images taken as template images. According to the trained target detection model, the features of each class of commodity are extracted in advance, giving Σ_{i=1}^{M} q_i template features in total. At inference time, the feature F_query of a target detection frame to be predicted is extracted, the cosine distance between F_query and each of the template features is computed, and the nearest cosine distance is taken as the similarity calculation result. When this result is greater than or equal to the preset similarity threshold, the object in the target detection frame is judged to be the commodity category corresponding to the nearest feature; when it is smaller than the preset similarity threshold, the object in the target detection frame is judged to be a foreign object.
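The template-matching decision described above can be sketched as follows (a minimal sketch; the threshold value and class names are illustrative assumptions, not values from this disclosure):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_or_foreign(f_query, templates, threshold=0.5):
    """templates: dict mapping class name -> list of template feature vectors.

    Returns the best-matching known class, or "foreign object" when the
    highest similarity falls below the preset threshold.
    """
    best_class, best_sim = None, -1.0
    for cls, feats in templates.items():
        for f in feats:
            s = cosine_sim(f_query, f)
            if s > best_sim:
                best_class, best_sim = cls, s
    if best_sim >= threshold:
        return best_class
    return "foreign object"
```

Storing several template features per class (different viewing angles, as in fig. 6) only adds entries to the per-class lists; the decision rule stays the same.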
In other exemplary embodiments, processing the extracted feature vector (where the extracted feature vector is a softmax score for each known class) to determine if the object within each target detection box is a foreign object includes:
determining a Weibull distribution model for each known class of articles according to the trained target detection model;
for each target detection box, the following operations are performed:
acquiring softmax scores of the target detection boxes belonging to each known category; calculating the distance from the feature vector of the target detection frame to the centroid of each known category, determining the probability that the target detection frame belongs to each known category according to the calculated distance, and correcting each softmax score by using the calculated probability; and calculating the probability of the object in the target detection frame being the foreign object according to the corrected score.
The embodiments of the present disclosure also provide a way to determine whether the object in each target detection frame is a foreign object through softmax score correction. Through self-supervised learning on other commodity data, a pre-trained model that expresses visual representations well can be obtained; the pre-trained teacher model is then fine-tuned by a supervised method. Taking the application scenario shown in fig. 2 as an example, an image acquisition device is placed on the refrigerator door and captures images of the refrigerator interior, so as to determine whether object categories that do not belong to the refrigerator appear. Assuming M classes of commodities are to be stocked in the refrigerator, sample data of these M classes can be collected, and the teacher model is obtained by fine-tuning the model on the M classes of data. This foreign matter detection method improves foreign-object recognition without affecting the accuracy of normal commodity recognition.
The similarity-calculation approach and the softmax-score-correction approach extract different feature vectors: when determining whether the object in each target detection frame is a foreign object by similarity calculation, the extracted feature vector is the one from the layer before the softmax layer; when using softmax score correction, the extracted feature vector is the vector of softmax scores for each known class output by the softmax layer.
The usual target detection model directly uses the softmax result as the final prediction, but the present disclosure needs to detect whether there are other commodities not belonging to the M classes; the embodiments of the present disclosure therefore correct the softmax scores by the following method to obtain the probability that an article is a foreign object.
Step 1. The commodity classification model has been obtained through the self-supervised learning and fine-tuning process described above. Data of the M commodity classes (P1, P2, … Pi, … PM) are collected and the feature vectors before softmax are obtained. Taking the i-th commodity as an example: for samples of Pi predicted correctly, the penultimate-layer features F are retained; if Pi has s such samples in total, the retained feature set of Pi is Fi = {F1, F2, … Fs}. The mean of Fi is computed as the centroid, denoted mFi.
The distance from each element in Fi to the centroid mFi is computed, denoted Di, and the distribution of the extreme values in Di is fitted with the Weibull distribution, whose probability density is:

f(x; λ, k) = (k/λ) · (x/λ)^(k−1) · e^(−(x/λ)^k), x ≥ 0

where x is the random variable, λ > 0 is the scale parameter and k > 0 is the shape parameter. λ and k are obtained by fitting for each commodity category, yielding one Weibull distribution per class.
Step 2. For an input sample x at inference time, its feature Fx is computed; after Fx passes through softmax, the score for each class is obtained: score = {score_1, score_2, …, score_m}. The distances {d_1, d_2, … d_m} from Fx to each class centroid mF_i are computed. Evaluating the i-th class's Weibull distribution at distance d_i (the probability mass on (−∞, d_i]) gives a probability p_i, which represents the probability that the predicted sample does not belong to class i; then 1 − p_i is the probability that the sample belongs to class i. Thus the weight w_i = 1 − p_i is used as the correction weight for class i.
Step 3: the scores are corrected; the new score after correction for each class is new_score = {w_1·score_1, …, w_i·score_i, …, w_M·score_M}, and the probability mass removed from the known classes is assigned to the foreign matter class: score_foreignmatter = Σ_{i=1}^{M} (1 − w_i)·score_i.
score_foreignmatter is compared with all values in new_score, and the category with the highest score is taken as the category (foreign matter or a specific commodity) of the article.
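Steps 2 and 3 together amount to an OpenMax-style correction. A minimal NumPy sketch follows, with the Weibull cumulative distribution written out explicitly; the per-class parameters (k_i, λ_i) are assumed to come from the Step-1 fit, and all function names are illustrative:

```python
import numpy as np

def weibull_cdf(d, k, lam):
    """Cumulative probability over (-inf, d] under Weibull(k, lam)."""
    d = np.maximum(np.asarray(d, dtype=np.float64), 0.0)
    return 1.0 - np.exp(-(d / lam) ** k)

def corrected_scores(softmax_scores, dists, shapes, scales):
    """p_i = Weibull CDF of distance d_i  -> prob. sample is NOT class i;
    w_i = 1 - p_i; new_score_i = w_i * score_i; the mass removed from
    the known classes becomes the foreign-matter score."""
    s = np.asarray(softmax_scores, dtype=np.float64)
    p = weibull_cdf(dists, np.asarray(shapes), np.asarray(scales))
    w = 1.0 - p
    new_score = w * s
    score_foreign = np.sum((1.0 - w) * s)
    return new_score, score_foreign

def predict(softmax_scores, dists, shapes, scales):
    """Return the known-class index with the highest corrected score,
    or the string "foreign" if the foreign-matter score wins."""
    new_score, score_foreign = corrected_scores(
        softmax_scores, dists, shapes, scales)
    all_scores = np.append(new_score, score_foreign)
    idx = int(np.argmax(all_scores))
    return "foreign" if idx == len(new_score) else idx
```

Note that new_score plus score_foreignmatter sums to the original softmax total, so the correction only redistributes probability mass toward the foreign-matter class as distances grow.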
The above two ways of determining whether the object within each target detection frame is a foreign object (similarity calculation and softmax score correction) are generally used separately, but in some exemplary embodiments they may be used in combination. For example, when the first image is acquired: if both methods determine that the object in a target detection frame is a foreign object, the object is treated as a foreign object; if the two methods disagree (one determines that the object is a foreign object and the other does not), the image may be acquired several more times and detection performed again, or one of the two methods, such as the similarity calculation method, may be preferred to make the final determination.
In some exemplary embodiments, the method is preceded by:
acquiring a rotation signal of an angle sensor;
calculating a rotation angle according to a rotation signal of the angle sensor;
when the calculated rotation angle is within a preset range, acquiring an image acquired by the image acquisition device;
one or more images among the images acquired by the image acquisition device are selected as the first image.
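The angle-gated acquisition steps above can be sketched as follows; the sensor and camera interfaces and the threshold range are hypothetical placeholders, since the patent does not fix concrete values:

```python
def maybe_capture(angle_sensor, camera, min_angle=40.0, max_angle=90.0):
    """Acquire an image only while the rotation angle computed from the
    angle sensor's signal lies within the preset range; the thresholds
    here are illustrative, not taken from the patent."""
    angle = angle_sensor.read_angle()      # rotation signal -> angle
    if not (min_angle <= angle <= max_angle):
        return None                        # angle outside preset range
    frames = camera.capture()              # images from the acquisition device
    return frames[0] if frames else None   # select one image as the first image
```

In practice the device would poll the sensor continuously and only forward a frame to the target detection model when this gate passes.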
In some exemplary embodiments, the method is preceded by:
acquiring a rotation signal of an angle sensor;
calculating a rotation angle according to a rotation signal of the angle sensor;
when the calculated rotation angle is within a preset range, acquiring an image acquired by the image acquisition device;
and detecting the rotation target of the image acquired by the image acquisition device to obtain one or more rotation detection frames, and taking the rotation detection frames as a first image.
Taking the application scenario of fig. 2 as an example, the foreign matter detection device reads the value of the angle sensor to determine the door-opening angle of the refrigerator. Foreign matter detection needs a full view of the refrigerator interior: if the door-opening angle is too small, the acquired image may be distorted; if it is too large, part of a person's body may enter the refrigerator and occlude the commodities. Therefore, an image of the refrigerator is acquired only when the door-opening angle is within a certain threshold range. Rotated target detection is then performed on the acquired image to obtain one or more rotation detection frames, target detection is performed on the image of each rotation detection frame through the target detection model, foreign matter detection and identification are performed on the resulting target detection frames, and the final identification result is sent to a Web client for display, or a sound-light-electric alarm is triggered when a foreign object is detected.
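The end-to-end flow of this scenario can be sketched as follows; all five callables (sensor, camera, rotated detector, target detector, and foreign-matter classifier) are hypothetical stand-ins for the components described above:

```python
def detect_foreign(angle_sensor, camera, rotated_detector, detector, classify):
    """Gate on the door angle, find rotation detection frames, run target
    detection inside each, and classify every target detection frame as a
    commodity class or foreign matter. Threshold range is illustrative."""
    angle = angle_sensor.read_angle()
    if not (40.0 <= angle <= 90.0):            # door angle outside range
        return []
    image = camera.capture()[0]
    results = []
    for region in rotated_detector(image):     # rotation detection frames
        for box in detector(region):           # target detection frames
            results.append((box, classify(box)))  # "foreign" or commodity
    return results
```

The result list would then be forwarded to the Web client, or trigger the alarm when any entry is classified as foreign matter.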
In the embodiments of the disclosure, commodity identification accuracy can be improved by adding some strategies. For commodities of the same kind whose color and shape differ little (so that even a person finds them hard to distinguish) and that differ only in flavor, they can be identified as one class (correspondingly, when fine-tuning and making the template images, the data set and the template images can be labeled as one class). For commodities of the same kind whose shape differs little but whose colors differ, they are identified as two classes (correspondingly, when fine-tuning and making the template images, the data set and the template images can be labeled as two classes).
As shown in fig. 7, the embodiment of the present disclosure also provides a foreign matter detection system including a foreign matter detection device, an image acquisition apparatus, and an angle sensor, wherein,
the image acquisition device is used for acquiring images;
the angle sensor is used for detecting a rotation signal;
the foreign matter detection device is used for acquiring a rotation signal of the angle sensor, calculating a rotation angle according to the rotation signal of the angle sensor, acquiring an image acquired by the image acquisition device when the calculated rotation angle is within a preset range, and inputting the acquired image into the target detection model to obtain one or more target detection frames; extracting the characteristic vector of each target detection frame; and processing the extracted characteristic vectors to determine whether the object in each target detection frame is a foreign object.
In other exemplary embodiments, the foreign object detection device is configured to obtain a rotation signal of the angle sensor, calculate a rotation angle according to the rotation signal of the angle sensor, obtain an image acquired by the image acquisition device when the calculated rotation angle is within a preset range, perform rotation target detection on the image acquired by the image acquisition device to obtain one or more rotation detection frames, and input the rotation detection frames into the target detection model to obtain one or more target detection frames; extracting the characteristic vector of each target detection frame; and processing the extracted characteristic vectors to determine whether the object in each target detection frame is a foreign object.
The embodiment of the disclosure also provides a foreign matter detection device, which comprises a memory; and a processor connected to the memory, the memory for storing instructions, the processor configured to perform the steps of the foreign object detection method according to any of the embodiments of the present disclosure based on the instructions stored in the memory.
As shown in fig. 8, in one example, the foreign matter detection device may include: processor 810, memory 820, and bus system 830, wherein processor 810 and memory 820 are coupled via bus system 830, memory 820 is configured to store instructions, and processor 810 is configured to execute the instructions stored by memory 820 to input a first image into an object detection model to obtain one or more object detection boxes; the target detection model is obtained through training by the following method: pre-training a teacher model by a self-supervision method, and fine-tuning the teacher model by a supervision method; carrying out knowledge distillation on the student model by adopting the teacher model to obtain the target detection model; extracting the characteristic vector of each target detection frame; and processing the extracted characteristic vectors to determine whether the object in each target detection frame is a foreign object.
It should be appreciated that the processor 810 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Memory 820 may include read only memory and random access memory and provides instructions and data to processor 810. A portion of memory 820 may also include non-volatile random access memory. For example, memory 820 may also store information of device type.
The bus system 830 may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. But for clarity of illustration, the various buses are labeled in fig. 8 as bus system 830.
In implementation, the processing performed by the foreign matter detection device may be completed by integrated logic circuits in hardware or by software instructions in the processor 810. That is, the method steps of the embodiments of the present disclosure may be embodied as being executed by a hardware processor, or by a combination of hardware in the processor and software modules. The software modules may be located in a storage medium such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 820, and the processor 810 reads the information in the memory 820 and completes the steps of the above method in combination with its hardware. To avoid repetition, a detailed description is not provided herein.
The embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the foreign object detection method according to any of the embodiments of the present disclosure.
In some possible implementations, aspects of the foreign matter detection method provided by the present disclosure may also be implemented in the form of a program product, which includes program code. When the program product is run on a computer device, the program code causes the computer device to perform the steps of the foreign matter detection method according to the various exemplary embodiments of the present disclosure described above; for example, the computer device may perform the foreign matter detection method described in the examples of the present disclosure.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
While the embodiments of the present disclosure are described above, they are described only to facilitate understanding of the present disclosure and are not intended to limit it. Any person skilled in the art may make modifications and variations in form and detail without departing from the spirit and scope of the disclosure, the protection scope of which is defined by the appended claims.

Claims (10)

1. A foreign matter detection method, characterized by comprising:
inputting the first image into a target detection model to obtain one or more target detection frames; the target detection model is obtained through training by the following method: pre-training a teacher model by a self-supervision method, and fine-tuning the teacher model by a supervision method; carrying out knowledge distillation on the student model by adopting the teacher model to obtain the target detection model;
extracting the characteristic vector of each target detection frame;
and processing the extracted characteristic vectors to determine whether the object in each target detection frame is a foreign object.
2. The method of claim 1, wherein processing the extracted feature vectors to determine whether the object within each of the target detection frames is a foreign object comprises:
extracting the feature vector of each known class of object according to the trained object detection model;
for each target detection frame, executing the following operations:
and calculating the similarity between the extracted feature vector of the target detection frame and the feature vector of each known class of object, and determining whether the object in the target detection frame is a foreign object according to the similarity result.
3. The method of claim 2, wherein determining whether the object within the target detection frame is a foreign object based on the similarity result comprises:
determining a maximum value of the calculated similarities;
comparing the determined maximum value with a preset similarity threshold value;
when the determined maximum value is greater than or equal to a preset similarity threshold value, determining that the object in the target detection frame is a known class object corresponding to the determined maximum value;
and when the determined maximum value is smaller than a preset similarity threshold value, determining that the object in the target detection frame is a foreign object.
4. The method of claim 1, wherein processing the extracted feature vectors to determine whether the object within each of the target detection frames is a foreign object comprises:
determining a Weibull distribution model of each known class of articles according to the trained target detection model;
for each target detection frame, executing the following operations:
acquiring softmax scores of the target detection boxes belonging to each known category; calculating the distance from the feature vector of the target detection frame to the centroid of each known category, determining the probability that the target detection frame belongs to each known category according to the calculated distance, and correcting each softmax score by using the calculated probability; and calculating the probability of the object in the target detection frame being foreign matter according to the corrected score.
5. The method of claim 1, wherein the teacher model comprises a first backbone network and a first feature extraction layer, the student model comprises a second backbone network and a second feature extraction layer, the first backbone network is a ReXNet network, and the second backbone network is a MobileNeXt network.
6. The method according to claim 1, wherein the knowledge distillation is feature distillation, and a total loss function of the student model is: Loss = Loss1 + lamda × Loss2, where lamda is a preset weight coefficient; Loss1 = CosFaceLoss(ŷ, y), where CosFaceLoss represents a cosine face recognition loss function, ŷ is a predicted value of the student model, and y is a true value; Loss2 = MSE(Feature_student, Feature_teacher), where MSE represents a mean square error loss function, Feature_student is a feature vector of the student model, and Feature_teacher is a feature vector of the teacher model.
7. The method according to claim 1, characterized in that the method is preceded by:
acquiring a rotation signal of an angle sensor;
calculating a rotation angle according to the rotation signal of the angle sensor;
when the calculated rotation angle is within a preset range, acquiring an image acquired by the image acquisition device;
one or more images among the images acquired by the image acquisition device are selected as the first image.
8. The method according to claim 1, characterized in that the method is preceded by:
acquiring a rotation signal of an angle sensor;
calculating a rotation angle according to the rotation signal of the angle sensor;
when the calculated rotation angle is within a preset range, acquiring an image acquired by the image acquisition device;
and detecting a rotation target of the image acquired by the image acquisition device to obtain one or more rotation detection frames, wherein the rotation detection frames are used as the first image.
9. A foreign matter detection device, characterized by comprising a memory; and a processor connected to the memory, the memory for storing instructions, the processor configured to perform the steps of the foreign object detection method according to any one of claims 1 to 8 based on the instructions stored in the memory.
10. A non-transitory computer-readable storage medium, characterized in that a computer program is stored thereon, which when executed by a processor, implements the foreign matter detection method according to any one of claims 1 to 8.
CN202310220476.5A 2023-03-08 2023-03-08 Foreign matter detection method and device, and non-transient computer readable storage medium Pending CN116630947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310220476.5A CN116630947A (en) 2023-03-08 2023-03-08 Foreign matter detection method and device, and non-transient computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310220476.5A CN116630947A (en) 2023-03-08 2023-03-08 Foreign matter detection method and device, and non-transient computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116630947A true CN116630947A (en) 2023-08-22

Family

ID=87590815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310220476.5A Pending CN116630947A (en) 2023-03-08 2023-03-08 Foreign matter detection method and device, and non-transient computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116630947A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611592A (en) * 2024-01-24 2024-02-27 长沙隼眼软件科技有限公司 Foreign matter detection method, device, electronic equipment and storage medium
CN117611592B (en) * 2024-01-24 2024-04-05 长沙隼眼软件科技有限公司 Foreign matter detection method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
AU2014240213B2 (en) System and Method for object re-identification
US9336433B1 (en) Video face recognition
CN112131978B (en) Video classification method and device, electronic equipment and storage medium
US10169683B2 (en) Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium
US20180165552A1 (en) All-weather thermal-image pedestrian detection method
US20150110387A1 (en) Method for binary classification of a query image
CN104715023A (en) Commodity recommendation method and system based on video content
Lee et al. Place recognition using straight lines for vision-based SLAM
CN101142586A (en) Method of performing face recognition
CN110807434A (en) Pedestrian re-identification system and method based on combination of human body analysis and coarse and fine particle sizes
US20180286081A1 (en) Object re-identification with temporal context
CN116630947A (en) Foreign matter detection method and device, and non-transient computer readable storage medium
CN111241987B (en) Multi-target model visual tracking method based on cost-sensitive three-branch decision
Lee et al. Data labeling research for deep learning based fire detection system
Bardeh et al. New approach for human detection in images using histograms of oriented gradients
Li et al. Seatbelt detection based on cascade adaboost classifier
CN110472639B (en) Target extraction method based on significance prior information
CN109934147B (en) Target detection method, system and device based on deep neural network
US20220375202A1 (en) Hierarchical sampling for object identification
Sarkar et al. Universal skin detection without color information
Liu et al. Video retrieval based on object discovery
Dutra et al. Re-identifying people based on indexing structure and manifold appearance modeling
Merrad et al. A Real-time Mobile Notification System for Inventory Stock out Detection using SIFT and RANSAC.
CN113465251B (en) Intelligent refrigerator and food material identification method
Happold Structured forest edge detectors for improved eyelid and Iris segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination