WO2022083157A1 - Target detection method and apparatus, and electronic device - Google Patents


Info

Publication number
WO2022083157A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2021/101773
Other languages
French (fr)
Chinese (zh)
Inventor
王剑锋 (Wang Jianfeng)
Original Assignee
北京迈格威科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Application filed by 北京迈格威科技有限公司 (Beijing Megvii Technology Co., Ltd.)
Publication of WO2022083157A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Definitions

  • the present disclosure relates to the technical field of model training, and in particular, to a target detection method, device and electronic device.
  • Object detection is a basic task of computer vision. It finds objects of interest to users in a picture and outputs their categories and positions, which can be represented by bounding boxes.
  • the common target detection methods are all implemented based on neural networks. Each position on the feature map output by the neural network corresponds to an output result. Therefore, these methods include a process called label assignment in the training process. The process determines the learning target for each location on the neural network's feature map during training.
  • the label assignment process specifies whether each position on the feature map of the neural network is to learn a positive sample (foreground) or a negative sample (background); if it is to learn a positive sample, one of the n targets is selected as the positive sample for this position.
  • This label assignment process is usually based on manually designed rules. Since manually designed rules are limited to a certain degree, the performance of a network model trained with this label assignment method is not good, which affects the reliability of target detection.
  • the purpose of the present disclosure is to provide a target detection method, device and electronic device, which can improve at least one of the above problems.
  • An embodiment of the present disclosure provides a target detection method, which includes: acquiring an image to be detected; inputting the image to be detected into a target detection model to obtain a target detection result; the target detection result includes the position and score of a bounding box corresponding to the target; wherein the target detection model is trained as follows: input the image samples in the image sample set into the student network model to obtain the student model detection results corresponding to each pixel of the first feature map of the image samples, where the image samples are marked with target ground-truth frames, and the student model detection result includes the score of the first reference position corresponding to each pixel of the first feature map and the coordinate information corresponding to the first reference position; obtain the teacher model detection result of the image sample from the teacher network model, where the teacher network model is a pre-trained model, and the teacher model detection result includes the score of the second reference position corresponding to each pixel of the second feature map of the image sample and the coordinate information corresponding to the second reference position;
  • the step of determining the label assignment information according to the teacher model detection result includes: for each second reference position, respectively calculating the overlap ratio of the second reference position and each target ground-truth frame of the image sample to obtain a matrix IoU.
  • i takes a value in [1, N]
  • j takes a value in [1, A]
  • N is the number of marked ground-truth boxes
  • A is the number of second reference positions included in the second feature map; based on the overlap ratio between the second reference position and each target ground-truth frame and the score of the second reference position, the prediction quality of the second reference position for the target corresponding to each target ground-truth frame is determined; wherein the prediction quality is used to characterize the probability that what is detected at the second reference position is the target corresponding to the target ground-truth frame; the label assignment information of each first reference position is determined based on the prediction quality of each second reference position for the target corresponding to each target ground-truth frame.
  • the prediction quality of each second reference position for the target corresponding to each target ground-truth frame is calculated as q_ij = (s_j)^(1-α) × (IoU_ij)^α, obtaining the prediction quality matrix Q;
  • q_ij takes a value in [0,1]
  • α is a preset hyperparameter with a value in the [0,1] interval
  • s_j is the score of the j-th second reference position
  • IoU_ij is the overlap ratio of the second prediction frame corresponding to the j-th second reference position and the i-th target ground-truth frame, i.e. the element in the i-th row and j-th column of the matrix IoU;
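A minimal sketch of the prediction-quality computation described above, assuming the scores and the IoU matrix are already available as arrays (`alpha` stands for the hyperparameter α; the function name is illustrative):

```python
import numpy as np

def prediction_quality(scores, iou, alpha=0.5):
    """Q[i, j] = (s_j)^(1 - alpha) * (IoU_ij)^alpha.

    scores: length-A array of second-reference-position scores in [0, 1].
    iou:    N x A overlap-ratio matrix with entries in [0, 1].
    Returns the N x A prediction quality matrix Q with entries in [0, 1].
    """
    s = np.asarray(scores, dtype=float)
    return (s[None, :] ** (1.0 - alpha)) * (np.asarray(iou, dtype=float) ** alpha)
```

Because both factors lie in [0, 1] and the exponents sum to 1, every q_ij also lies in [0, 1], matching the value range stated above.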
  • the above-mentioned image samples are also marked with the target type corresponding to each target ground truth frame; based on the overlap ratio of the second prediction frame corresponding to the second reference position and each target ground truth frame and the score of the second reference position,
  • the above-mentioned step of determining the label assignment information of each first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth frame includes: for each second reference position, selecting the maximum prediction quality from the prediction qualities of the second reference position for the targets corresponding to the target ground-truth frames; judging whether the maximum prediction quality is greater than or equal to a first preset quality value; and if so, assigning to the second reference position the positive label of the target corresponding to the maximum prediction quality.
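The maximum-quality selection step above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; `quality_thresh` stands in for the first preset quality value, and `-1` is an arbitrary encoding for "no positive label":

```python
import numpy as np

def assign_positive_labels(Q, quality_thresh):
    """For each second reference position (column j of Q), pick the target
    (row index) with the highest prediction quality, and assign that
    target's positive label only when the quality clears the threshold.

    Returns a length-A array: the assigned target index, or -1 if none.
    """
    Q = np.asarray(Q, dtype=float)
    best_target = Q.argmax(axis=0)    # best ground-truth box per position
    best_quality = Q.max(axis=0)
    return np.where(best_quality >= quality_thresh, best_target, -1)
```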
  • the above-mentioned step of determining the label assignment information of the first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth frame includes: for the j-th column in the prediction quality matrix, selecting the element q_mj with the largest value from the elements of that column;
  • wherein t_p > t_n, t_p and t_n are preset thresholds, the first value represents positive samples, the second value represents negative samples, and the third value represents ignored samples.
  • the above-mentioned step of determining the label assignment information of each first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth frame includes: for the i-th row in the prediction quality matrix, selecting the target elements q_im greater than t_p from the elements of that row, and setting the element X_im corresponding to each target element in the i-th row of the initial label assignment matrix to the first value; wherein q_im is greater than the other, unselected elements q_iu in that row;
  • the above step of calculating the loss function value of the student network model according to the label assignment information and the student model detection result includes: performing the following steps for each first reference position corresponding to each pixel in the first feature map: determine the second reference position corresponding to the first reference position; determine the target ground-truth frame of the first reference position based on the label assignment information of the corresponding second reference position; calculate the classification loss function value and the regression loss function value based on the target ground-truth frame of the first reference position and the score of the first reference position; and determine the loss function value of the student network model based on the classification loss function value and the regression loss function value of each first reference position.
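The loss combination described above might be sketched like this. The patent does not fix particular loss functions here, so the binary cross-entropy classification loss and L1 regression loss below are illustrative stand-ins, and the array layout is an assumption:

```python
import numpy as np

def student_loss(cls_probs, box_preds, labels, gt_classes, gt_boxes):
    """Classification loss over all first reference positions plus a
    regression loss over positive positions only.

    cls_probs: (A, C) per-class scores in (0, 1)
    box_preds: (A, 4) predicted boxes
    labels:    (A,) assigned ground-truth index, or -1 for background
    gt_classes, gt_boxes: per-ground-truth class index and box
    """
    A, C = cls_probs.shape
    eps = 1e-9
    pos = np.flatnonzero(labels >= 0)
    targets = np.zeros((A, C))
    targets[pos, gt_classes[labels[pos]]] = 1.0
    # Binary cross-entropy over every class score (classification loss).
    cls_loss = -(targets * np.log(cls_probs + eps)
                 + (1 - targets) * np.log(1 - cls_probs + eps)).sum()
    # L1 regression loss, computed only for positive positions.
    reg_loss = np.abs(box_preds[pos] - gt_boxes[labels[pos]]).sum()
    return (cls_loss + reg_loss) / max(len(pos), 1)
```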
  • the embodiment of the present disclosure also provides a target detection device, wherein the device includes: an image acquisition module, configured to acquire an image to be detected; a target detection module, configured to input the to-be-detected image into a target detection model to obtain a target detection result;
  • the detection result includes the position and score of the bounding box corresponding to the target;
  • the target detection model is trained in the following way: input the image samples in the image sample set into the student network model to obtain the student model detection results corresponding to each pixel of the first feature map of the image sample; obtain the teacher model detection result of the image sample from the teacher network model, wherein the teacher network model is a pre-trained model and the teacher model detection result includes the score and position coordinates of the second reference position corresponding to each pixel of the second feature map of the image sample; wherein the reference positions of the first feature map and the second feature map are the same; determine the label assignment information of the image sample according to the teacher model detection result; calculate the loss function value of the student network model according to the label assignment information and the student model detection result; adjust the parameters of the student network model based on the loss function value and continue training until a trained student network model is obtained; and use the trained student network model as the target detection model.
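The training procedure above can be summarized as a loop. Every function below is a placeholder standing in for a component described in the text (models, label assignment, loss, parameter update), so this is a schematic sketch rather than the patent's actual implementation:

```python
# Schematic distillation-style training loop for the target detection model.
def train_detector(samples, student, teacher, assign_labels, loss_fn, update):
    for image, gt_boxes in samples:
        student_out = student(image)    # per-pixel scores + coordinate info
        teacher_out = teacher(image)    # pre-trained teacher's predictions
        # Label assignment is driven by the teacher's detection result.
        labels = assign_labels(teacher_out, gt_boxes)
        loss = loss_fn(student_out, labels, gt_boxes)
        update(student, loss)           # adjust the student's parameters
    return student                      # trained student = detection model
```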
  • the target detection module is further configured to: for each second reference position, respectively calculate the overlap ratio of the second prediction frame corresponding to the second reference position and each target ground-truth frame of the image sample, obtaining the matrix IoU:
  • i takes a value in [1, N]
  • j takes a value in [1, A]
  • N is the number of labeled ground-truth boxes
  • A is the number of second reference positions included in the second feature map
  • the prediction quality of each second reference position for the target corresponding to each target ground-truth frame is calculated as q_ij = (s_j)^(1-α) × (IoU_ij)^α, obtaining the prediction quality matrix Q;
  • q_ij takes a value in [0,1]
  • α is a preset hyperparameter with a value in the [0,1] interval
  • s_j is the score of the j-th second reference position
  • IoU_ij is the overlap ratio of the second prediction frame corresponding to the j-th second reference position and the i-th target ground-truth frame, i.e. the element in the i-th row and j-th column of the matrix IoU;
  • the prediction quality of each second reference position for the target corresponding to each target ground-truth frame is obtained, yielding the prediction quality matrix Q;
  • q_ij takes a value in [0,1]
  • α is a preset hyperparameter with a value in the [0,1] interval
  • s_j is the score of the j-th second reference position; IoU_ij is the overlap ratio of the second prediction frame corresponding to the j-th second reference position and the i-th target ground-truth frame, i.e. the element in the i-th row and j-th column of the matrix IoU;
  • the image sample is also marked with the target type corresponding to each target ground-truth frame;
  • α is a preset hyperparameter with a value in the [0,1] interval; s_ij is the score corresponding to the current target type among the scores of the j-th second reference position, where the current target type refers to the target type corresponding to the i-th target ground-truth frame; IoU_ij is the overlap ratio of the second prediction frame corresponding to the j-th second reference position and the i-th target ground-truth frame, i.e. the element in the i-th row and j-th column of the matrix IoU;
  • the target detection module is further configured to: for each of the second reference positions, select the maximum prediction quality from the prediction qualities of the second reference position for the targets corresponding to the target ground-truth frames; determine whether the maximum prediction quality is greater than or equal to a first preset quality value; and if so, assign the positive sample label of the target corresponding to the maximum prediction quality to the second reference position.
  • the target detection module is further configured to: for the j-th column in the prediction quality matrix, select the element q_mj with the largest value from the elements of the column; if q_mj is greater than t_p, set the element X_mj in the corresponding label assignment matrix equal to the first value; for the elements q_ij other than q_mj in the j-th column, if q_ij is less than t_n, set the element X_ij in the label assignment matrix corresponding to q_ij equal to the second value, and if q_ij is less than or equal to t_p and greater than or equal to t_n, set the X_ij corresponding to q_ij equal to the third value; if q_mj is less than t_n, set the elements X_ij in the j-th column of the label assignment matrix equal to the second value; and if q_mj is greater than or equal to t_n and less than or equal to t_p, set X_mj equal to the third value.
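The column-wise assignment rule just described can be sketched as follows. The encodings of the first, second, and third values are arbitrary here, and the handling of the case where the column maximum falls between the two thresholds is an assumption inferred from the surrounding rules (the original text is truncated at that point):

```python
import numpy as np

POS, NEG, IGN = 1, 0, -1   # first, second, third values (illustrative)

def assign_column(Q, t_p, t_n):
    """Column-wise label assignment over the N x A prediction quality
    matrix Q, with preset thresholds t_p > t_n."""
    Q = np.asarray(Q, dtype=float)
    N, A = Q.shape
    X = np.full((N, A), NEG)
    for j in range(A):
        col = Q[:, j]
        m = col.argmax()               # largest element q_mj in column j
        if col[m] > t_p:
            X[m, j] = POS              # positive sample
            others = np.arange(N) != m
            # Other elements between t_n and t_p become ignore samples;
            # elements below t_n stay negative.
            X[others & (col >= t_n) & (col <= t_p), j] = IGN
        elif col[m] >= t_n:
            # Assumed remaining case: best quality falls between the
            # thresholds, so mark in-between elements as ignore.
            X[col >= t_n, j] = IGN
        # If q_mj < t_n, the whole column stays negative.
    return X
```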
  • the target detection module is further configured to: for the i-th row in the prediction quality matrix, select the target elements q_im greater than t_p from the elements of the row, and set the element X_im corresponding to each target element in the i-th row of the initial label assignment matrix to the first value; wherein q_im is greater than the other, unselected elements q_iu in that row; for the elements q_iu other than q_im in the i-th row, if q_iu is less than or equal to t_p and greater than or equal to t_n, set the element X_iu of the initial label assignment matrix corresponding to q_iu equal to the third value, and if q_iu is less than t_n, set the element X_iu of the initial label assignment matrix corresponding to q_iu equal to the second value; and check whether the elements in the j-th column of the initial label assignment matrix conflict.
  • the target detection module is further configured to: for each first reference position corresponding to each pixel in the first feature map, perform the following steps: determine the target ground-truth frame of the first reference position based on the label assignment information of the first reference position; calculate the classification loss function value and the regression loss function value based on the target ground-truth frame of the first reference position and the score of the first reference position; and determine the loss function value of the student network model based on the classification loss function value and the regression loss function value of each first reference position.
  • Embodiments of the present disclosure also provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program executes the steps of the foregoing method when the computer program is run by a processor.
  • FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure
  • FIG. 2 is a flowchart of a target detection method according to an embodiment of the present disclosure
  • FIG. 3 is a flowchart of a training method for a target detection model provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a reference position in an anchor frame-based technology provided by an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of a reference position based on a non-anchor frame technology provided by an embodiment of the present disclosure
  • FIG. 6 is a flowchart of another method for training a target detection model according to an embodiment of the present disclosure
  • FIG. 7 is a flowchart of another method for training a target detection model provided by an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of another method for training a target detection model according to an embodiment of the present disclosure.
  • FIG. 9 is a flow chart of training a target detection model according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of a target detection apparatus according to an embodiment of the present disclosure.
  • Knowledge distillation refers to the method of using one (possibly deeper or more complex) neural network to guide another (possibly shallower or simpler) neural network during training.
  • the former is called the teacher network model and the latter is called the student network model.
  • the inventor found through research that if a certain position on the feature map of the teacher network model has a good detection result for a certain target, then the probability that the corresponding position of the student network model will also have a good detection result for that target is higher, so it is more reasonable to assign the label of that target to this position and then train the student network model.
  • the student network model obtained by this training method has higher reliability for target detection.
  • the embodiments of the present disclosure provide a target detection method, device, and electronic device.
  • a trained teacher network model is introduced to predict the training samples of the student network model, and the label assignment information of the samples is then determined; based on this information, the training of the student network model is completed to improve its performance, thereby improving the reliability of target detection using the student network model.
  • the following description will be given by way of examples.
  • Embodiments of the present disclosure first provide an exemplary illustration of an electronic device that can implement a target detection method and apparatus.
  • the electronic device 100 includes one or more processors 102, one or more memories 104, input devices 106, output devices 108, and one or more image capture devices 110, which are interconnected by a bus system 112 and/or other forms of connection mechanisms (not shown).
  • the components and structures of the electronic device 100 shown in FIG. 1 are only exemplary and not restrictive, and the electronic device may also have other components and structures as required.
  • the processor 102 can be a server, an intelligent terminal, or a device that includes a central processing unit (CPU) or other forms of processing units with data processing capabilities and/or instruction execution capabilities; it can process data from other components in the electronic device 100 and can also control other components in the electronic device 100 to perform object detection functions.
  • Memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory, among others.
  • Non-volatile memory may include, for example, read only memory (ROM), hard disk, flash memory, and the like.
  • One or more computer program instructions may be stored on a computer-readable storage medium, and the processor 102 may execute the program instructions to implement the functions described below (implemented by the processing device) in the disclosed embodiments and/or other desired functions.
  • Various application programs and various data such as various data used and/or generated by the application program, etc., may also be stored in the computer-readable storage medium.
  • Input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, mouse, microphone, touch screen, and the like.
  • the output device 108 may output various information (eg, images or sounds) to the outside (eg, a user), and may include one or more of a display, a speaker, and the like.
  • the image acquisition device 110 may acquire a training sample set and store the acquired training sample set in the memory 104 for use by other components.
  • each device in the electronic device configured to implement the target detection method and device according to the embodiments of the present disclosure may be integrated or distributed, such as the processor 102 , the memory 104 , the input device 106 and the output device 108 .
  • the image acquisition device 110 is set in a designated position where the sample can be acquired.
  • the electronic device can be implemented as a smart terminal such as a camera, a smart phone, a tablet computer, a computer, a vehicle-mounted terminal, and the like.
  • an electronic device configured to implement the target detection method and apparatus according to the embodiments of the present disclosure may include more or less components than the above-described exemplary electronic device, which is not limited herein.
  • This embodiment also provides a target detection method.
  • the method mainly includes the following steps S202 to S204:
  • Step S202: acquire an image to be detected.
  • the image to be detected may be an image acquired by an image acquisition device such as a camera; according to the detection needs, the image acquisition device may be installed in the waiting hall of a passenger station (such as a subway or high-speed rail station) to perform face image or human body image acquisition, or it may be set at traffic intersections or on both sides of the road to collect vehicle images.
  • the above image to be detected can also be obtained from a third-party device (such as a cloud server, etc.).
  • the image to be detected may also be an image corresponding to other types of target objects such as animals, designated objects, etc., which is not limited in this embodiment of the present disclosure.
  • Step S204: input the image to be detected into the target detection model to obtain a target detection result; the target detection result includes the position and score of the bounding box corresponding to the target.
  • the target detection model can be a detection model for a specific type of target, or a detection model for multiple different types of targets. After the image to be detected is input into the target detection model, the target detection model performs target detection on it. If the image to be detected contains targets of a type that the detection model can detect, the bounding box and the corresponding score for each target belonging to that type can be obtained; the score corresponding to a bounding box indicates the confidence that the target corresponding to the bounding box belongs to this type.
  • the above target detection model is mainly obtained by training the following steps S302 to S312:
  • Step S302 input the image samples in the image sample set into the student network model, and obtain the student model detection result corresponding to each pixel of the first feature map of the image sample;
  • the student model detection result includes the score of the first reference position corresponding to each pixel of the first feature map and the coordinate information corresponding to the first reference position; wherein the first reference position includes a first anchor frame or a first position point;
  • the reference positions (including the above-mentioned first reference position and the second reference position mentioned later) mentioned in the embodiments of the present disclosure may be based on anchor-based technology, in which each anchor point (that is, a pixel on the feature map) corresponds to an anchor box.
  • each anchor point corresponds to one or more anchor boxes
  • each anchor box corresponds to a prediction box.
  • the coordinate information corresponding to the first reference position is the coordinate offset between the first anchor frame and the first prediction frame corresponding to the first anchor frame, that is, the position of the first prediction frame relative to the first anchor frame; the coordinates of the first prediction frame can be determined from the coordinates of the first anchor frame and the coordinate offset.
  • the reference positions (including the above-mentioned first reference position and the second reference position mentioned later) mentioned in the embodiments of the present disclosure may also be based on anchor-free technology, in which each pixel on the feature map corresponds to one or more position points.
  • each position point corresponds to a prediction frame, and the position point can be regarded as a reference position.
  • the coordinate information corresponding to the first reference position is the coordinate offset of the first prediction frame of the position point relative to the position point; the coordinates of the first prediction frame can be determined from the coordinates of the position point and the coordinate offset.
  • the small line box represents the anchor box
  • the small dot in the middle of the anchor box represents the anchor point
  • the dotted box represents the prediction box corresponding to the anchor box.
  • the coordinate information corresponding to the anchor frame is the relative position of the dotted frame and the anchor frame.
  • the arrow in FIG. 4 indicates the relative positional relationship between the anchor frame and the prediction frame.
  • the dot in the middle of the dashed box in FIG. 5 represents a position point, which can be regarded as a reference position, and the dotted box represents the prediction frame corresponding to the position point; the coordinate information corresponding to the first reference position is then the position of the dotted box relative to the position point in FIG. 5.
  • the arrow in FIG. 5 indicates the relative positional relationship between the position point and the prediction frame.
  • the teacher network model and the student network model in the embodiments of the present disclosure may both be based on the anchor frame technology, or one may be based on the anchor frame technology and the other on the anchor-free technology, or both may be based on the anchor-free technology, as long as the number of reference positions (that is, the number of anchor boxes or the number of position points) is the same for both.
  • the coordinates of the anchor box, the prediction box and the bounding box can be represented by the coordinates of the upper left corner and the lower right corner of the box.
  • the first reference position is called the first anchor frame.
  • the first anchor frame is expressed as [a1(x_a1, y_a1), b1(x_b1, y_b1)], where a1 represents the coordinates of the upper left corner of the first anchor frame and b1 represents the coordinates of the lower right corner of the first anchor frame; the first prediction frame corresponding to the first anchor frame is represented by the coordinates of two points A1 and B1. Suppose Δx_a1 represents the abscissa offset between the upper left corner position point a1 and A1, Δy_a1 represents the ordinate offset between the upper left corner position point a1 and A1, Δx_b1 represents the abscissa offset between the lower right corner position point b1 and B1, and Δy_b1 represents the ordinate offset between the lower right corner position point b1 and B1.
  • the coordinate transformation based on the anchor-free technology shown in FIG. 5 is similar to that of FIG. 4, except that the first reference position is a position point and the first prediction frame determined based on the position point is represented by the coordinates of its upper left and lower right corner points; the coordinate information corresponding to the first reference position is then the coordinate offset between the upper left corner of the first prediction frame and the position point, and the coordinate offset between the lower right corner of the first prediction frame and the position point.
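The corner-offset decoding described above is a simple addition of offsets to the anchor corners. A minimal sketch, with names chosen for illustration:

```python
def decode_box(anchor, offsets):
    """Recover the prediction-frame corners from an anchor frame
    [a1=(x_a1, y_a1), b1=(x_b1, y_b1)] and the four corner offsets
    (dx_a1, dy_a1, dx_b1, dy_b1) described above."""
    (xa, ya), (xb, yb) = anchor
    dxa, dya, dxb, dyb = offsets
    # Each prediction-frame corner = anchor corner + its coordinate offset.
    return (xa + dxa, ya + dya), (xb + dxb, yb + dyb)
```

The same function covers the anchor-free case: pass the position point as both "corners" of a degenerate anchor and the offsets place the two prediction-frame corners relative to that point.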
  • the image sample set can be an image set obtained in advance from the network or other storage devices, or can be a sample set formed by manually labeling images collected by a collection device of an electronic device.
  • the image sample set includes multiple image samples; the specific number can be set according to demand.
  • the above-mentioned image sample set has already marked the ground-truth box of the target.
  • the purpose of marking the ground-truth box of the target is to frame the targets contained in the image sample; for example, an image sample may include targets such as pedestrians, motor vehicles, non-motor vehicles, or faces.
  • faces, pedestrians, motor vehicles, and non-motor vehicles are marked with target ground-truth boxes in the form of bounding boxes.
  • target ground-truth boxes of different colors can be used to label different types of targets, or different category labels can be used, such as label 1 for the face frame.
  • the above target detection model can also only detect targets of the same type.
  • the target detection model only detects one of the types of targets such as pedestrians, motor vehicles, non-motor vehicles or faces.
  • the target ground-truth box only frames the target of this type in the image sample.
  • through the target ground-truth box, it is possible not only to indicate which targets are included in the image sample and what type each target belongs to, but also to obtain the position coordinates of each target in the image sample.
  • after the image sample is input into the student network model, the first feature map is output.
  • the number of the first feature map is determined by the model design, and there can be multiple ones.
• each first feature map can include C×H×W pixels, where C is the number of feature-map channels, H is the height of the feature map, and W is the width of the feature map.
  • the reference position corresponding to each pixel in the first feature map (each pixel may correspond to one or more anchor boxes, or each pixel corresponds to one or more position points) is represented by the first reference position.
  • the student network model can obtain the score corresponding to the first reference position corresponding to each pixel point of the first feature map and the coordinate information corresponding to the first reference position by performing target detection on the image sample.
• the score of a certain first reference position consists of multiple scores corresponding to multiple types, and the score corresponding to a certain type among the multiple scores represents the classification probability that the target detected at the first reference position belongs to that type.
  • the student network model is trained to detect 4 types of targets.
• the student network model is a network model based on anchor-frame technology; Table 1 shows an example of the scores of the first anchor frames corresponding to pixels of the first feature map in the student detection results.
  • the first anchor frame 1 to the first anchor frame 4 are the anchor frames corresponding to X pixels (X is less than or equal to 4) in the first feature map respectively.
• the corresponding scores for face, human body, motor vehicle, and license plate (that is, the classification probability scores, or simply the scores or probability scores) are shown in Table 1.
• from Table 1 it can be seen that the target detected by the first anchor frame 1 is most likely to belong to the human-body type, the target detected by the first anchor frame 2 is most likely to belong to the motor-vehicle type, the target detected by the first anchor frame 3 is most likely to belong to the face type, and the target detected by the first anchor frame 4 is most likely to belong to the license-plate type.
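As an illustration of how such per-type scores are read (the values below are made up, not the actual entries of Table 1), the detected type of each anchor frame is simply the type with the highest classification probability:

```python
# Illustrative per-type scores for two first anchor boxes over the types
# face / human body / motor vehicle / license plate (values are examples).
scores = {
    "anchor_1": {"face": 0.10, "human": 0.70, "vehicle": 0.10, "plate": 0.10},
    "anchor_2": {"face": 0.05, "human": 0.15, "vehicle": 0.70, "plate": 0.10},
}

# The detected type of each anchor is the arg-max over its type scores.
detected = {a: max(s, key=s.get) for a, s in scores.items()}
```

With these example values, anchor 1 is read as detecting a human body and anchor 2 a motor vehicle, mirroring the reading of Table 1 described above.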
• Step S304 obtaining the teacher model detection result of the image sample by the teacher network model; wherein, the teacher network model is a pre-trained model, and the teacher model detection result includes the score of the second reference position corresponding to each pixel of the second feature map of the above-mentioned image sample and the coordinate information corresponding to the second reference position.
  • the above-mentioned second reference position includes a second anchor frame or a second position point.
• the score of the above-mentioned second reference position is likewise the classification probability value, output by the teacher network model, that the target detected at the second reference position belongs to each type; the larger the score, the more likely the target detected at the second reference position belongs to that type.
• the coordinate information corresponding to the second reference position is similar to that of the first reference position: it is the coordinate offset between the second reference position and the second prediction frame corresponding to the second reference position, and the coordinates of the second prediction frame can be determined based on the coordinates of the second reference position and this coordinate offset.
• the teacher network model is a neural network model pre-trained using the above-mentioned image sample set or another training image sample set; the teacher network model is used to predict the above-mentioned image samples and output a second feature map, the reference position corresponding to each pixel in the second feature map is represented by a second reference position, and the second reference positions are in one-to-one correspondence with the first reference positions.
• the teacher network model and the student network model may both be network models that perform target detection based on anchor frames; in this case, the first feature map obtained by the student network model and the second feature map obtained by the teacher network model have the same number of anchor frames.
• the teacher network model and the student network model may also both be network models that do not perform target detection based on anchor frames; in this case, the first feature map obtained by the student network model and the second feature map obtained by the teacher network model have the same number of position points.
• alternatively, one of the teacher network model and the student network model can be a network model that performs target detection based on anchor frames while the other does not; in this case, the number of anchor frames of the one equals the number of position points of the other.
  • Step S306 Determine the label assignment information of the image sample according to the teacher model detection result.
  • the score of the second reference position can reflect the probability that the target contained in the second reference position belongs to each target type.
  • the image samples are marked with target ground-truth boxes.
• it can be determined which target the second reference position most likely detects, and based on this information the target corresponding to the second reference position, that is, the label corresponding to the second reference position, can be determined.
• since the first reference positions correspond one-to-one to the second reference positions, the label corresponding to a second reference position is also the label corresponding to the matching first reference position.
• for example, suppose the image sample is marked with target ground-truth box 1 corresponding to target 1 and target ground-truth box 2 corresponding to target 2, and there are 100 second anchor boxes; it must then be determined which of the 100 second anchor boxes target 1 and target 2 should be allocated to, that is, which second anchor boxes have detected target 1 (positive samples of target 1) or target 2 (positive samples of target 2).
• the label assignment information of the first reference position can be determined based on whether a target is detected at the corresponding second reference position: if a target is detected at the second reference position corresponding to the first reference position, the label of the first reference position relative to that target is a positive sample; if no target is detected at the second reference position corresponding to the first reference position, the label of the first reference position relative to that target is a negative sample.
  • the above label assignment information is specifically used to represent the sample type of each target corresponding to the first reference position, and the sample type includes positive samples and negative samples.
  • sample types include positive samples, negative samples, and ignore samples.
• the above positive samples can be represented by 1, negative samples by 0, and ignored samples by -1; a positive sample indicates that the first reference position should detect the target, a negative sample indicates that the first reference position should not detect the target, and an ignored sample indicates that it is uncertain or irrelevant whether the first reference position should detect the target; for ignored samples, the resulting gradient is not backpropagated.
  • the image sample is a matrix of (3, H, W).
• M feature maps are generated, and each feature map can be represented as a matrix of shape (C, Hv, Wv), where v is the feature-map index and C is the number of channels of each feature map; when each position of a feature map corresponds to one anchor box, there are Hv×Wv anchor boxes on feature map v, and A is the total number of anchor boxes over all second feature maps, i.e., A = Σv Hv×Wv.
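A small sketch of the anchor count described above, under the one-anchor-per-position assumption; the feature-map sizes here are illustrative, not from the original:

```python
# Total number of second anchor boxes over M feature maps, one anchor per
# position: A = sum over v of Hv * Wv (the (Hv, Wv) shapes are examples).
feature_map_shapes = [(80, 80), (40, 40), (20, 20)]
A = sum(h * w for h, w in feature_map_shapes)
# For these shapes, A = 6400 + 1600 + 400 = 8400.
```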
• the label assignment information of the image sample determined based on the teacher model detection result can be expressed as an N×A matrix in which each column has at most one 1, that is, each second anchor box is assigned to at most one target or to no target at all (becoming a negative sample or an ignored sample), and each row can have any number of 1s, that is, a target can be assigned to one or more second anchor boxes, or not assigned to any second anchor box.
• X_ij takes the value 0, 1 or -1, where 1 is the value corresponding to the positive-sample label, 0 the value corresponding to the negative-sample label, and -1 the value corresponding to the ignored-sample label; the value of i is a positive integer between 1 and N, and the value of j is a positive integer between 1 and A.
  • the label assignment information corresponding to each position in the above matrix is obtained according to the detection result of the teacher network model.
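The structural constraints of the label assignment matrix can be checked mechanically; the sketch below uses a hypothetical 2×4 matrix (two targets, four second anchor boxes), with made-up entries:

```python
import numpy as np

# Hypothetical N x A label assignment matrix: rows are targets, columns are
# second anchor boxes; 1 = positive, 0 = negative, -1 = ignored sample.
X = np.array([[1, 0, 0,  0],
              [0, 0, 1, -1]])

# Each column may contain at most one 1 (an anchor is assigned to at most
# one target); a row may contain any number of 1s (a target may be assigned
# to several anchors, or to none).
assert all((X[:, j] == 1).sum() <= 1 for j in range(X.shape[1]))
```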
  • Step S308 Calculate the loss function value of the student network model according to the label assignment information and the student model detection result.
  • the value of the loss function in the training process of the student network model in this embodiment not only depends on the detection result output by the student network model itself, but also on the label assignment information determined based on the detection result of the teacher network model.
• the label assignment can be performed more accurately, which can effectively alleviate the influence of the subjectivity of manually designed label assignment rules on the training effect of the student network model, makes the calculation of the loss function value more accurate, and provides reliable data for the parameter adjustment of the student network model.
  • the loss function value of the student network model is a value calculated based on the loss function of the student network model, the score of the first reference position corresponding to each pixel in the first feature map, and the above label assignment information.
  • Step S310 adjust the parameters of the student network model based on the loss function value and continue training until the trained student network model is obtained;
  • Step S312 using the trained student network model as the target detection model.
• the training process of the target detection model configured to detect images is as follows: input the image samples in the image sample set into the student network model to obtain the student model detection result corresponding to each pixel of the first feature map of the image sample; obtain the teacher model detection result of the image sample by the teacher network model, where the teacher network model is a pre-trained model and the teacher model detection result includes the score of the second reference position corresponding to each pixel of the second feature map of the image sample and the coordinate information corresponding to the second reference position; determine the label assignment information of the image sample according to the teacher model detection result; calculate the loss function value of the student network model according to the label assignment information and the student model detection result; adjust the parameters of the student network model based on the loss function value and continue training until the trained student network model is obtained; and use the trained student network model as the target detection model.
• compared with manually designing label assignment rules, the label assignment method in this embodiment is more efficient and effectively alleviates the influence of the subjectivity of manually designed label assignment rules on the training effect of the student network model.
• this label assignment method can be adapted to both anchor-box-based and non-anchor-box-based networks, and is therefore more universal than a label assignment method designed for one particular network.
  • This embodiment also provides another method for training a target detection model, which is implemented on the basis of the above method, and focuses on the specific implementation of determining the label assignment information of the image samples.
• taking the case where the student network model and the teacher network model are both network models based on anchor-frame technology as an example, FIG. 6 is a flowchart of another target detection model training method, which illustrates the implementation of training the student network model, where the trained student network model is the target detection model; specifically, it can be implemented with reference to the following steps S602 to S616:
  • Step S602 input the image samples in the image sample set into the student network model, and obtain the student model detection result corresponding to each pixel of the first feature map of the image sample;
  • the model detection result includes the score of the first anchor frame corresponding to each pixel of the first feature map and the coordinate information corresponding to the first anchor frame;
• Step S604 obtaining the teacher model detection result of the image sample by the teacher network model; wherein, the teacher network model is a pre-trained model, and the teacher model detection result includes the score of the second anchor frame corresponding to each pixel of the second feature map of the above-mentioned image sample and the coordinate information corresponding to the second anchor frame.
• the score corresponding to the second anchor frame can be obtained by using the teacher network model, where the score is the probability value, output by the teacher network model when performing target detection, that each second anchor frame corresponds to each target.
  • Step S606 for each second anchor frame, calculate the overlap ratio of the second prediction frame corresponding to the second anchor frame and each target truth frame of the image sample to obtain the matrix IoU;
• where i takes values in [1, N], j takes values in [1, A], N is the number of marked ground-truth boxes, and A is the number of second anchor boxes included in the second feature map.
  • the above overlap ratio is IoU (Intersection over Union), which represents the degree of overlap between the two frame areas.
  • the value of the overlap ratio is [0, 1].
• when the second prediction frame and the manually marked target ground-truth frame do not overlap at all, the overlap ratio is 0; when the second prediction frame completely overlaps with the manually labeled target ground-truth frame, the overlap ratio is 1; in other cases, the overlap ratio is an arbitrary floating-point number between 0 and 1.
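The overlap ratio described above can be sketched as follows for boxes given as (x1, y1, x2, y2) corner coordinates; this is a standard IoU implementation, assumed rather than quoted from the original:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (clamped at zero).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give an overlap ratio of 1, disjoint boxes give 0, and partially overlapping boxes give a value strictly between 0 and 1, matching the cases listed above.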
  • Step S608 based on the overlap ratio of the second prediction frame corresponding to the second anchor frame and each target ground-truth frame and the score of the second anchor frame, determine the prediction quality of the second anchor frame for the target corresponding to each target ground-truth frame ; wherein, the prediction quality is used to characterize the probability that the second anchor frame detects the target corresponding to the target ground truth frame.
• where IoU_i is the overlap ratio between the second prediction frame corresponding to the second anchor frame and the i-th target ground-truth frame, i is a positive integer, N is the number of target ground-truth frames, pred box denotes the second prediction frame corresponding to the second anchor frame, and gt box_i denotes the i-th target ground-truth frame.
  • the image sample can contain one or more faces, and each face corresponds to a target ground-truth box.
  • the score s j of the second anchor box is a numerical value.
• the image sample can contain one or more targets of one or more of the types face, human body, motor vehicle, and non-motor vehicle, and each target corresponds to a target ground-truth frame; in this scenario, the image sample is also marked with the target type corresponding to each target ground-truth frame, and the score s_ij of the second anchor frame is a set of values corresponding to the target types that the student network can detect.
• the overlap ratio (with value in [0, 1]) is the element in the i-th row and the j-th column of the matrix IoU;
• the above prediction quality considers both the overlap ratio and the score s, that is, the confidence of the second anchor frame for the target; it is objective and reasonable, does not depend on manually designed anchor-frame rules, has good versatility, and is conducive to determining the label assignment information.
  • the above only provides two methods for calculating the prediction quality. In this embodiment, the method for calculating the prediction quality is not limited.
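Since the embodiment deliberately leaves the exact formula open, here is one plausible prediction-quality measure combining the score and the overlap ratio; the geometric-mean-style weighting and the parameter name `alpha` are assumptions for illustration, not the patent's formula:

```python
def prediction_quality(score, iou_value, alpha=0.5):
    """One possible prediction-quality measure in [0, 1] combining the
    classification score and the overlap ratio; alpha in [0, 1] balances
    the two terms. This specific form is an illustrative assumption."""
    return (score ** (1.0 - alpha)) * (iou_value ** alpha)
```

With `alpha = 0` the quality reduces to the score alone, with `alpha = 1` to the IoU alone, and intermediate values weight both, consistent with the statement above that the quality considers both the overlap ratio and the score.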
  • Step S610 Determine label assignment information of the second anchor frame based on the prediction quality of the second anchor frame for the target corresponding to each target ground-truth frame.
  • the above formula for calculating the prediction quality can also be used to obtain an N ⁇ A prediction quality matrix.
  • the specific implementation process of determining the label assignment information based on the predicted quality can be implemented by steps A1-A3:
• Step A1 for each second anchor frame, select the maximum prediction quality among its prediction qualities for the targets corresponding to the target ground-truth frames;
  • Step A2 judging whether the maximum predicted quality is greater than the first preset quality value
• Step A3 if yes, assign to the second anchor frame the positive-sample label of the target corresponding to the maximum prediction quality; the labels of the second anchor frame for the other targets may all be negative labels, or some may be negative labels and the others ignore labels.
  • the above-mentioned first preset quality value is set according to the actual situation, and is not limited here.
  • negative labels and ignore labels can be assigned based on the following ways.
• the ignored-sample labels and negative-sample labels here are both relative to the target corresponding to the maximum prediction quality; if the label for the target corresponding to the maximum prediction quality is negative, the second anchor frame is most likely background or another scene area, and its labels for the other targets are also negative.
  • the second anchor frame may be a negative label or an ignore label corresponding to other targets.
• the labels corresponding to the other targets can also be determined based on the prediction quality calculated between the second anchor frame and each of the other targets, together with the above-mentioned first preset quality value and second preset quality value.
  • the assigned sample labels may be represented by numerical values, letters or characters, which are not limited here.
• for example, suppose there are three second anchor boxes and four marked targets, and the prediction quality calculated between each second anchor box and each target ground-truth box is expressed in the form of a matrix, where the first row of the matrix represents the prediction qualities of target 1 (a target marked with a target ground-truth box) with respect to each second anchor box (second anchor box 1, second anchor box 2, second anchor box 3), the second row represents the prediction qualities of target 2 with respect to the three second anchor boxes, the third row represents those of target 3, and the fourth row represents those of target 4.
• suppose the first preset quality value is set to 0.7 and the second preset quality value to 0.4. Since the maximum prediction quality of the first column, 0.8, is greater than the first preset quality value, the target corresponding to second anchor box 1 is target 1, and second anchor box 1 can be assigned the positive-sample label of target 1 and negative-sample labels for the remaining targets; since the maximum prediction quality of the second column, 0.3, is less than the second preset quality value, second anchor box 2 can be assigned negative-sample labels for all targets; since the maximum prediction quality of the third column, 0.5, lies between the first and second preset quality values, second anchor box 3 can be assigned the ignored-sample label for target 4 and negative-sample labels for the other targets.
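A minimal sketch of this per-anchor threshold rule, assuming thresholds t_p = 0.7 and t_n = 0.4 as in the example (the matrix values other than the stated column maxima are made up):

```python
import numpy as np

def assign_labels_per_anchor(Q, tp=0.7, tn=0.4):
    """Column-wise label assignment over an N x A prediction-quality
    matrix Q (rows = targets, columns = second anchor boxes).
    Returns an N x A matrix with 1 = positive, 0 = negative, -1 = ignored."""
    N, A = Q.shape
    X = np.zeros((N, A), dtype=int)
    for j in range(A):
        m = int(np.argmax(Q[:, j]))
        if Q[m, j] > tp:
            X[m, j] = 1        # best target becomes the positive sample
        elif Q[m, j] >= tn:
            X[m, j] = -1       # ambiguous best target: ignored sample
        # every other element: ignored if within [tn, tp], else negative (0)
        for i in range(N):
            if i != m and tn <= Q[i, j] <= tp:
                X[i, j] = -1
    return X
```

Applied to a quality matrix whose column maxima are 0.8, 0.3, and 0.5, this yields a positive label at anchor 1, all-negative labels at anchor 2, and an ignored label at anchor 3, reproducing the behaviour of the example above.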
  • Step S612 calculate the loss function value of the student network model according to the label assignment information and the student model detection result
  • Step S614 adjust the parameters of the student network model based on the loss function value and continue training until the trained student network model is obtained;
• the training method for the above target detection model provided by this embodiment of the present disclosure calculates the overlap ratio between the second prediction frame corresponding to each second anchor frame and each target ground-truth frame, and determines, from this overlap ratio and the score of the second anchor frame, the prediction quality of the second anchor frame for the target corresponding to each target ground-truth frame; the label assignment information of the second anchor frame is then accurately obtained from the prediction quality, and label assignment is performed on the first feature map according to this label assignment information. This makes the label assignment objective and rational, which can effectively alleviate the influence of the subjectivity of manually designed label assignment rules on the training effect of the student network model, thereby improving the performance of the student network model.
• this embodiment also provides another method for training a target detection model, which is implemented on the basis of the above method and focuses on the specific implementation of determining the label assignment information of an image sample in application scenarios with multiple target types; for ease of description, the student network model and the teacher network model are both taken to be network models based on anchor-frame technology, and the flowchart of this target detection model training method mainly includes the following steps S702 to S716:
  • Step S702 input the image samples in the image sample set into the student network model, and obtain the student model detection result corresponding to each pixel of the first feature map of the image sample;
  • the model detection result includes the score of the first anchor frame corresponding to each pixel of the first feature map and the coordinate information corresponding to the first anchor frame;
• Step S704 obtaining the teacher model detection result of the image sample by the teacher network model; wherein, the teacher network model is a pre-trained model, and the teacher model detection result includes the score of the second anchor frame corresponding to each pixel of the second feature map of the above-mentioned image sample and the coordinate information corresponding to the second anchor frame.
  • Step S706 calculating the overlap ratio between each target ground-truth frame of the image sample and the second prediction frame corresponding to each second anchor frame to obtain the matrix IoU:
• where i takes values in [1, N], j takes values in [1, A], N is the number of marked ground-truth boxes, and A is the number of second anchor boxes included in the second feature map; in this embodiment, the image sample has N labeled ground-truth boxes and there are A second anchor boxes.
  • Step S708 calculate the prediction quality of each second anchor frame for the target corresponding to each target truth frame, and obtain the prediction quality matrix Q;
• where q_ij takes values in [0, 1] and represents the prediction quality of the j-th second anchor frame for the target corresponding to the i-th target ground-truth frame; the weighting parameter in the formula is a preset value in the interval [0, 1]; s_ij is the score corresponding to the current target type among the scores of the j-th second anchor frame, the current target type being the target type corresponding to the i-th target ground-truth frame; and IoU_ij, the overlap ratio between the second prediction frame corresponding to the j-th second anchor frame and the i-th target ground-truth frame, is the element in the i-th row and j-th column of the matrix IoU.
  • Step S710 convert the prediction quality matrix into the label assignment matrix X corresponding to the label assignment information:
• X_ij takes the value 0, 1 or -1, where 1 is the value corresponding to the positive-sample label, 0 the value corresponding to the negative-sample label, and -1 the value corresponding to the ignored-sample label.
  • the specific process of converting the predicted quality matrix into the label assignment matrix can be converted from two different angles: row and column.
• the following takes column-wise conversion of the prediction quality matrix as an example; the specific conversion process can be implemented through steps B1 to B4:
• Step B1 for the j-th column in the prediction quality matrix, select the element q_mj with the largest value among the elements of that column;
• take the prediction quality matrix obtained above, with the first preset quality value 0.7 and the second preset quality value 0.4, as an example for illustration; the largest element in the first column is q_31, the largest element in the second column is q_22, and the largest element in the third column is q_43.
• Step B2 if q_mj is greater than t_p, set the element X_mj in the label assignment matrix corresponding to q_mj equal to the first value; for the elements q_ij in the j-th column other than q_mj, if q_ij is less than t_n, set the X_ij corresponding to q_ij equal to the second value; if q_ij is less than or equal to t_p and greater than or equal to t_n, set the X_ij corresponding to q_ij equal to the third value;
• where the first value represents positive samples, the second value represents negative samples, and the third value represents ignored samples.
• for example, the value 1 can be used as the label of positive samples, the value 0 as the label of negative samples, and the value -1 as the label of ignored samples; other characters may also be used as the above labels, which is not limited here.
• Step B3 if q_mj is less than t_n, set every element X_ij in the j-th column of the label assignment matrix equal to the second value;
• for example, the maximum prediction quality of the second column of the above prediction quality matrix is 0.3, which is smaller than the second preset quality value 0.4, so every element of the second column of the converted label assignment matrix is the second value.
• Step B4 if q_mj is less than or equal to t_p and greater than or equal to t_n, set the X_mj in the label assignment matrix corresponding to q_mj equal to the third value; for the elements q_ij in the j-th column other than q_mj, if q_ij is less than t_n, set the X_ij in the label assignment matrix corresponding to q_ij equal to the second value; if q_ij is less than or equal to t_p and greater than or equal to t_n, set the X_ij in the label assignment matrix corresponding to q_ij equal to the third value; where t_p > t_n, and t_p and t_n are preset thresholds.
• for example, the maximum value of the third column is 0.5, which lies between 0.4 and 0.7, so the value at this position after conversion is -1; the values of the remaining elements in the third column are all less than 0.4, so they are all 0.
• in this way, each second anchor box is assigned to at most one target (becoming a positive sample), or to no target (becoming a negative sample or an ignored sample).
• Step C1 for the i-th row in the prediction quality matrix, select the target elements q_im greater than t_p from the elements in the row, and set the elements X_im corresponding to the target elements in the i-th row of the initial label assignment matrix to the first value; where each q_im is greater than the other, unselected elements q_iu of that row;
  • Positive sample labels for the same target can be assigned to one or more second anchor boxes, or none of them.
• Step C2 for the elements q_iu other than q_im in the i-th row, if q_iu is less than or equal to t_p and greater than or equal to t_n, set the element X_iu of the initial label assignment matrix corresponding to q_iu equal to the third value; if q_iu is less than t_n, set the element X_iu of the initial label assignment matrix corresponding to q_iu equal to the second value;
• the above initial label assignment matrix can be understood as an empty matrix whose elements are to be assigned values; after the above steps C1 and C2, each element X is assigned 0, 1 or -1.
• Step C3 check whether the elements in the j-th column of the initial label assignment matrix contain conflicting elements, where conflicting elements are two or more elements that are all the first value; if there are conflicting elements, obtain the prediction qualities corresponding to the conflicting elements in the prediction quality matrix, keep the conflicting element with the highest prediction quality as the first value, and modify the remaining conflicting elements to the third value, to obtain the label assignment matrix;
• in this way, each column of the final label assignment matrix has at most one first value, that is, each second anchor box corresponds to the positive-sample label of at most one target.
• where t_p > t_n, and t_p and t_n are preset thresholds, namely the first preset quality value and the second preset quality value respectively.
• that is, the X_im corresponding to each selected target element q_im in a row is set to the first value; the X_iu corresponding to a q_iu between the first preset quality value and the second preset quality value is set to the third value; and the X_iu corresponding to a q_iu smaller than the second preset quality value is set to the second value.
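The row-wise conversion (steps C1 to C3) with conflict resolution can be sketched as follows; this is a simplified reading of the steps above, with illustrative variable names:

```python
import numpy as np

def assign_labels_row_wise(Q, tp=0.7, tn=0.4):
    """Row-wise conversion of an N x A prediction-quality matrix Q into a
    label assignment matrix (1 = positive, 0 = negative, -1 = ignored).
    A target may be assigned to several anchors; column conflicts (two
    positives in one column) keep only the highest-quality positive.
    Sketch under the thresholds tp > tn described in the text."""
    N, A = Q.shape
    X = np.zeros((N, A), dtype=int)
    # C1/C2: per element, positive if > tp, ignored if in [tn, tp], else negative
    X[Q > tp] = 1
    X[(Q >= tn) & (Q <= tp)] = -1
    # C3: per column, keep only the best positive; demote the rest to ignored
    for j in range(A):
        pos = np.flatnonzero(X[:, j] == 1)
        if len(pos) > 1:
            best = pos[np.argmax(Q[pos, j])]
            X[pos, j] = -1
            X[best, j] = 1
    return X
```

For instance, if two targets both exceed t_p at the same anchor, only the one with the higher prediction quality keeps the positive label and the other becomes an ignored sample, so each column ends with at most one first value.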
  • Step S712 calculate the loss function value of the student network model according to the label assignment information and the student model detection result
  • Step S714 adjust the parameters of the student network model based on the loss function value and continue training until the trained student network model is obtained;
• the training method for the above target detection model provided by this embodiment of the present disclosure can accurately obtain label assignment information by comparing the prediction quality with preset thresholds, and performs label assignment on the first anchor frames corresponding to the first feature map according to the label assignment information; this makes the label assignment objective and rational, which can effectively alleviate the influence of the subjectivity of manually designed label assignment rules on the training effect of the student network model, thereby improving the performance of the student network model.
  • This embodiment also provides another method for training a target detection model, which is implemented on the basis of the above method, and focuses on the specific implementation of calculating the loss function value of the student network model.
  • taking the case where both the student network model and the teacher network model are network models based on anchor frame technology as an example, the flowchart of another target detection model training method shown in FIG. 8 mainly includes the following steps S802 to S820:
  • Step S802: input the image samples in the image sample set into the student network model, and obtain the student model detection result corresponding to each pixel of the first feature map of each image sample;
  • the model detection result includes the score of the first anchor frame corresponding to each pixel of the first feature map and the coordinate information corresponding to the first anchor frame;
  • Step S804: obtain the teacher model detection result of the image sample from the teacher network model; the teacher network model is a pre-trained model, and the teacher model detection result includes the score of the second anchor frame corresponding to each pixel of the second feature map of the above-mentioned image sample and the coordinate information corresponding to the second anchor frame;
  • Step S806: determine the label assignment information of the image sample according to the teacher model detection result;
  • Step S808: for each first anchor frame corresponding to each pixel in the first feature map, perform the following operations of steps S812 to S814;
  • Step S812: based on the label assignment information, determine the target ground-truth frame of the first anchor frame;
  • the target ground-truth box corresponding to a first anchor box can be determined based on the label assignment information. Taking first anchor frame 1 as an example, if its label assignment information is (0, 0, 1, 0), the ground-truth frame of the target whose positive sample label is assigned to second anchor frame 1 can be used as the target ground-truth box of the first anchor frame.
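As an illustration of how the one-hot label assignment information selects the target ground-truth box, a sketch under the assumption that the information is stored as a plain 0/1 vector over the N ground-truth boxes (the function name is hypothetical):

```python
def target_gt_box(assignment, gt_boxes):
    """Return the ground-truth box whose positive sample label the anchor carries.

    assignment: one-hot list such as [0, 0, 1, 0]; gt_boxes: the N annotated
    target ground-truth boxes of the image sample, in the same order."""
    idx = assignment.index(1)  # position of the positive sample label
    return gt_boxes[idx]
```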
  • Step S814: calculate the classification loss function value and the regression loss function value based on the target ground-truth frame of the first anchor frame and the score of the first anchor frame;
  • the above classification loss function value can be obtained through a classification loss function, and the classification loss function can be a cross-entropy function: when the target category is binary, a binary cross-entropy function (Binary Cross Entropy) can be used; when there are multiple target categories, a multi-category cross-entropy function (softmax_cross_entropy) can be used.
  • the classification loss function and the regression loss function can be selected according to actual needs, and the corresponding classification loss function value and regression loss function value can then be calculated according to the target ground-truth frame of the first anchor frame and the score of the first anchor frame; they are not limited or further described here.
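For the binary case, the cross-entropy loss mentioned above can be sketched as follows. This is a generic numpy implementation, not the exact loss used in the embodiment; the clipping constant eps is an assumption added for numerical stability.

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-7):
    """Mean binary cross-entropy between predicted scores p in (0, 1)
    and ground-truth labels y in {0, 1}."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).mean())
```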
  • Step S816: determine the loss function value of the student network model based on the classification loss function value and the regression loss function value of each first anchor frame;
  • the loss function value of the student network model is obtained by adding the calculated classification loss function values and regression loss function values;
  • Step S818: adjust the parameters of the student network model based on the loss function value and continue training until the trained student network model is obtained;
  • step S818 can be realized by steps D1 and D2:
  • Step D1: adjust the parameters of the student network model based on the loss function value to continue training;
  • Step D2: when the loss function value converges to a preset value or the number of training iterations reaches a preset number of times, stop the training to obtain the trained student network model.
  • if the loss function value is greater than the preset value, the currently trained student network model has not reached the preset degree of convergence, and the process of steps S802 to S816 is repeated until the obtained loss function value converges to the preset value, at which point the training of the student network model is stopped.
  • alternatively, the training of the student network model is stopped when the number of times steps S802 to S816 have been repeated reaches a preset number of times.
  • the preset value and the preset number of times can be set according to the actual situation, and are not limited here.
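The stopping rule of steps D1 and D2 amounts to a simple predicate. The threshold values below are placeholders, since the embodiment deliberately leaves the preset value and the preset number of times open:

```python
def should_stop(loss_value, iteration, preset_value=0.01, preset_times=10000):
    """Stop training once the loss converges to the preset value or the
    number of training iterations reaches the preset number of times."""
    return loss_value <= preset_value or iteration >= preset_times
```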
  • Step S820: use the trained student network model as the target detection model.
  • in the training method for the above target detection model provided by this embodiment of the present disclosure, the image samples in the image sample set are input into the student network model to obtain the first feature map corresponding to each image sample, and the teacher model detection result for the sample is obtained from the teacher network model. The label assignment information can be determined based on the detection result of the teacher network, the target ground-truth frame corresponding to each first anchor frame can then be determined from that label assignment information, and the loss function value of the student network model can be determined from the target ground-truth frame of the first anchor frame and the score of the first anchor frame. The loss function value generated during training therefore depends not only on the output of the student network model itself but also on the detection result of the trained teacher network model, which makes the calculation of the loss function value more accurate and provides reliable data for tuning the parameters of the student network model.
  • Figure 9 shows a training flow chart of a target detection model.
  • the student network model and the teacher network model are based on anchor frame technology.
  • the leftmost picture 900 is an image sample annotated with manually labeled target ground-truth frames.
  • from the second feature map 902, the scores scores2 of the second anchor frames corresponding to each pixel and their coordinate information can be obtained, yielding the second prediction frames pred_boxes2; the overlap ratio matrix IoU is computed from the second prediction frames pred_boxes2 and the target ground-truth frames of the image sample, and the predicted quality matrix qualities of the second feature map 902 is obtained from the overlap ratio matrix IoU and the second anchor frame scores scores2. The label assignment information can then be determined from the predicted quality matrix; this process corresponds to the assignment step in FIG. 9.
  • from the first feature map 904, the scores scores1 of the first anchor frames corresponding to each pixel and the corresponding first prediction frames pred_boxes1 can be obtained, and the label assignment information is used to assign target ground-truth boxes to the first feature map 904.
  • the classification loss function value (classification loss) and the regression loss function value (regression loss) are calculated from the assigned ground-truth boxes, the first prediction frames pred_boxes1 and the scores scores1, and the loss function value (loss) of the student network model is finally calculated from the classification loss function value and the regression loss function value; the student network model is trained based on this loss function value (loss).
  • in this way, the detection results obtained by the teacher network model are used to assign labels to the first feature map, so that the label assignment is objective and rational.
  • training the student network model with the first feature map labeled in this way optimizes the training process of the student network model, effectively alleviates the influence of the subjectivity of manually designed label assignment rules on the training effect of the student network model, and thereby improves the performance of the student network model.
  • FIG. 10 shows a schematic structural diagram of a target detection apparatus. As shown in FIG. 10 , the apparatus includes:
  • the target detection module 1004 is configured to input the image to be detected into the target detection model to obtain the target detection result, which includes the position and score of the bounding box corresponding to the target; the target detection model is trained as follows: the image sample is input into the student network model to obtain a student model detection result corresponding to each pixel of the first feature map of the image sample, where the image sample is annotated with target ground-truth frames and the student model detection result includes the score of the first reference position corresponding to each pixel of the first feature map and the coordinate information corresponding to the first reference position; the teacher model detection result includes the score of the second reference position corresponding to each pixel of the second feature map of the image sample and the coordinate information corresponding to the second reference position, the number of reference positions of the first feature map and the second feature map being the same;
  • the training process of the target detection model configured to detect images is as follows: input the image samples in the image sample set into the student network model to obtain the student model detection result corresponding to each pixel of the first feature map of each image sample; obtain the teacher model detection result of the image sample from the teacher network model, where the teacher network model is a pre-trained model and the teacher model detection result includes the score of the second reference position corresponding to each pixel in the second feature map of the image sample and the coordinate information corresponding to the second reference position; determine the label assignment information of the image sample from the teacher model detection result, calculate the loss function value of the student network model according to the label assignment information and the student model detection result, adjust the parameters of the student network model based on the loss function value and continue training until the trained student network model is obtained, and use the trained student network model as the target detection model.
  • the label assignment method of this training process is more objective and more efficient than a manually designed label allocation method, and it effectively alleviates the influence that the subjectivity of manually designing label assignment rules has on the training effect of the student network model.
  • this label assignment method can be adapted to both anchor-box-based and non-anchor-box-based networks, and is therefore more universal than a label assignment method designed for one particular network.
  • the above-mentioned target detection module 1004 is also configured to, for each second reference position, calculate the overlap ratio between the second prediction frame corresponding to that position and each target ground-truth frame of the image sample, obtaining the matrix IoU, where i takes values in [1, N], j takes values in [1, A], N is the number of annotated ground-truth boxes, and A is the number of second reference positions included in the second feature map; based on the overlap ratio between the second prediction frame corresponding to a second reference position and each target ground-truth frame, together with the score of that second reference position, the prediction quality of the second reference position for the target corresponding to each target ground-truth frame is determined, where the prediction quality characterizes the probability that what the second reference position detects is the target corresponding to that ground-truth frame; the label assignment information of each first reference position is then determined based on the prediction quality of each second reference position for the target corresponding to each target ground-truth frame.
  • the overlap ratio between the second prediction frame corresponding to the j-th second reference position and the i-th target ground-truth frame is the element in row i, column j of the matrix IoU;
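The N × A overlap ratio matrix can be computed in a vectorized way. The sketch below assumes boxes in (x1, y1, x2, y2) corner format and is only an illustration of the IoU computation, not the patented code:

```python
import numpy as np

def iou_matrix(gt_boxes, pred_boxes):
    """Matrix IoU with IoU[i, j] = overlap ratio between the i-th ground-truth
    box and the j-th predicted box; boxes are (x1, y1, x2, y2)."""
    gt = np.asarray(gt_boxes, dtype=float)[:, None, :]    # shape (N, 1, 4)
    pr = np.asarray(pred_boxes, dtype=float)[None, :, :]  # shape (1, A, 4)
    x1 = np.maximum(gt[..., 0], pr[..., 0])               # intersection corners
    y1 = np.maximum(gt[..., 1], pr[..., 1])
    x2 = np.minimum(gt[..., 2], pr[..., 2])
    y2 = np.minimum(gt[..., 3], pr[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_gt = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    area_pr = (pr[..., 2] - pr[..., 0]) * (pr[..., 3] - pr[..., 1])
    return inter / (area_gt + area_pr - inter)            # shape (N, A)
```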
  • the formula q_ij = (s_ij)^(1-α) · (IoU_ij)^α is used to calculate the prediction quality of each second reference position for the target corresponding to each target ground-truth frame, obtaining the prediction quality matrix Q;
  • q_ij takes values in [0, 1];
  • α is a preset hyperparameter with a value in the interval [0, 1];
  • s_ij is the score, among the scores of the j-th second reference position, that corresponds to the current target type;
  • the current target type is the target type corresponding to the i-th target ground-truth frame;
  • IoU_ij is the overlap ratio between the second prediction frame corresponding to the j-th second reference position and the i-th target ground-truth frame, that is, the element in row i, column j of the matrix IoU;
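The prediction quality formula translates directly into a few lines of numpy. This is a sketch in which scores[i, j] is assumed to already hold the class score of position j for the target type of ground-truth box i:

```python
import numpy as np

def prediction_quality(scores, iou, alpha=0.5):
    """Q[i, j] = scores[i, j]**(1 - alpha) * iou[i, j]**alpha.

    With scores and IoU both in [0, 1], every q_ij also lies in [0, 1];
    alpha in [0, 1] balances classification score against localization."""
    s = np.asarray(scores, dtype=float)
    u = np.asarray(iou, dtype=float)
    return s ** (1.0 - alpha) * u ** alpha
```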
  • the above-mentioned target detection module 1004 is also configured to, for each second reference position, select the maximum prediction quality from among that position's prediction qualities for the targets corresponding to the target ground-truth frames, determine whether the maximum prediction quality is greater than or equal to the first preset quality value, and, if so, assign the positive sample label of the target corresponding to the maximum prediction quality to the second reference position.
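The per-position rule just described — take the best ground truth for each second reference position and keep it only if its quality reaches the first preset quality value — can be sketched as follows (using -1 to mark "no positive label" is an assumption):

```python
import numpy as np

def assign_positive_labels(Q, t_p):
    """Q is the N x A prediction quality matrix (rows: ground-truth boxes,
    columns: second reference positions). Returns, per position, the index
    of the ground truth whose positive label it receives, or -1 if the
    maximum prediction quality is below the first preset quality value t_p."""
    Q = np.asarray(Q, dtype=float)
    best_gt = Q.argmax(axis=0)                     # best target per position
    best_q = Q[best_gt, np.arange(Q.shape[1])]     # its prediction quality
    return np.where(best_q >= t_p, best_gt, -1)
```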
  • the above-mentioned target detection module 1004 is further configured to, for the j-th column in the prediction quality matrix, select the element q_mj with the largest value from the elements of that column;
  • t_p > t_n, where t_p and t_n are preset thresholds; the first value represents a positive sample, the second value represents a negative sample, and the third value represents an ignored sample.
  • the above-mentioned target detection module 1004 is also configured to, for the i-th row in the prediction quality matrix, select the target elements q_im greater than t_p from the elements of that row and set the elements X_im corresponding to these target elements in the i-th row of the initial label assignment matrix to the first value, where each q_im is larger than the unselected elements q_iu of that row;
  • the above-mentioned target detection module 1004 is further configured to perform the following steps for each first reference position corresponding to each pixel in the first feature map: determine the second reference position corresponding to the first reference position; determine the target ground-truth frame of the first reference position based on the label assignment information of the first reference position; calculate the classification loss function value and the regression loss function value based on the target ground-truth frame of the first reference position and the score of the first reference position; and determine the loss function value of the student network model based on the classification loss function value and the regression loss function value of each first reference position.
  • the target detection device provided by the embodiment of the present disclosure has the same technical features as the above-mentioned target detection method, so it can also solve the same technical problem and achieve the same technical effect.
  • This embodiment further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processing device, the steps of the above target detection method are executed.
  • the computer program product of the target detection method, apparatus and electronic device provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code, and the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments.
  • the terms “installed”, “connected” and “coupled” should be understood in a broad sense: a connection may, for example, be a fixed connection, a detachable connection or an integral connection; it may be a mechanical connection or an electrical connection; and it may be a direct connection, an indirect connection through an intermediate medium, or internal communication between two components.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present disclosure.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
  • the label assignment method is more efficient, and the influence of the subjectivity of manually designing label assignment rules on the training effect of the student network model is effectively alleviated.
  • the label assignment method can be adapted to both anchor-frame-based and non-anchor-frame-based networks, and is more universal than a label assignment method designed for one particular network.


Abstract

A target detection method and apparatus, and an electronic device. The method comprises: acquiring an image to undergo detection (S202); and inputting the image into a target detection model, and obtaining a target detection result (S204), the target detection result comprising the position and a score of a bounding box corresponding to a target. A process of training the above target detection model comprises: inputting an image sample in an image sample set into a student network model, and obtaining a student model detection result corresponding to each pixel in a first feature map of the image sample; acquiring, by means of a teacher network model, a teacher model detection result corresponding to the image sample; determining label assignment information of the image sample according to the teacher model detection result, and calculating a loss function value of the student network model according to the label assignment information and the student model detection result; and adjusting parameters of the student network model on the basis of the loss function value, and continuing to perform training until a target detection model is obtained. The method improves the performance and training efficiency of a target detection model.

Description

Target detection method, device and electronic device
CROSS-REFERENCE TO RELATED APPLICATIONS
This disclosure claims priority to the Chinese Patent Application No. 2020111434527, entitled "Object Detection Method, Apparatus and Electronic Equipment", filed with the Chinese Patent Office on October 22, 2020, the entire contents of which are incorporated by reference in this disclosure.
Technical Field
The present disclosure relates to the technical field of model training, and in particular to a target detection method, device and electronic device.
Background
Object detection is a basic task in computer vision: it finds the objects of interest to a user in a picture and outputs their categories and positions, where a position can be represented by a bounding box. Common target detection methods are currently all implemented based on neural networks, and each position on the feature map output by the neural network corresponds to an output result; these methods therefore include in their training process a step called label assignment, which determines the learning target of each position on the neural network's feature map during training. In other words, during training there are n targets (objects) in a training sample (a picture), and the label assignment process specifies whether each position on the feature map of the neural network learns a positive sample (foreground) or a negative sample (background); if it learns a positive sample, one of the n targets is selected as the positive sample for that position. This label assignment process usually follows manually designed rules; since manually designed rules carry a certain subjectivity, the performance of a network model trained under such a label assignment scheme suffers, which affects the reliability of target detection.
Summary
In view of this, the purpose of the present disclosure is to provide a target detection method, device and electronic device, which can improve at least one of the above problems.
An embodiment of the present disclosure provides a target detection method, which includes: acquiring an image to be detected; inputting the image to be detected into a target detection model to obtain a target detection result, where the target detection result includes the position and score of a bounding box corresponding to the target. The target detection model is trained as follows: input the image samples in an image sample set into a student network model to obtain a student model detection result corresponding to each pixel of the first feature map of each image sample, where the image sample is annotated with target ground-truth frames and the student model detection result includes the score of the first reference position corresponding to each pixel of the first feature map and the coordinate information corresponding to the first reference position; obtain the teacher model detection result of the image sample from a teacher network model, where the teacher network model is a pre-trained model and the teacher model detection result includes the score of the second reference position corresponding to each pixel of the second feature map of the image sample and the coordinate information corresponding to the second reference position, the reference position numbers and/or position points of the first feature map and the second feature map being the same; determine the label assignment information of the image sample according to the teacher model detection result; calculate the loss function value of the student network model according to the label assignment information and the student model detection result; adjust the parameters of the student network model based on the loss function value and continue training until a trained student network model is obtained; and use the trained student network model as the target detection model.
Optionally, the step of determining the label assignment information according to the teacher model detection result includes: for each second reference position, respectively calculating the overlap ratio between that second reference position and each target ground-truth frame of the image sample to obtain the matrix IoU:
    IoU = [ IoU_11  IoU_12  …  IoU_1A
            IoU_21  IoU_22  …  IoU_2A
              ⋮       ⋮           ⋮
            IoU_N1  IoU_N2  …  IoU_NA ]
where i takes values in [1, N], j takes values in [1, A], N is the number of annotated ground-truth boxes, and A is the number of second reference positions included in the second feature map; based on the overlap ratio between the second reference position and each target ground-truth frame and the score of the second reference position, the prediction quality of the second reference position for the target corresponding to each target ground-truth frame is determined, where the prediction quality characterizes the probability that what the second reference position detects is the target corresponding to that ground-truth frame; the label assignment information of each first reference position is determined based on the prediction quality of each second reference position for the target corresponding to each target ground-truth frame.
The step of determining the prediction quality of each second reference position for the target corresponding to each target ground-truth frame, based on the overlap ratio between the second prediction frame corresponding to the second reference position and each target ground-truth frame and the score of the second reference position, includes:
Use the formula q_ij = (s_j)^(1-α) · (IoU_ij)^α to calculate the prediction quality of each second reference position for the target corresponding to each target ground-truth frame, obtaining the prediction quality matrix Q, where q_ij takes values in [0, 1], α is a preset hyperparameter with a value in the interval [0, 1], s_j is the score of the j-th second reference position, and IoU_ij is the overlap ratio between the second prediction frame corresponding to the j-th second reference position and the i-th target ground-truth frame, that is, the element in row i, column j of the matrix IoU:
    Q = [ q_11  q_12  …  q_1A
          q_21  q_22  …  q_2A
            ⋮     ⋮          ⋮
          q_N1  q_N2  …  q_NA ]
Optionally, the image samples are further annotated with the target type corresponding to each target ground-truth frame. The step of determining the prediction quality of each second reference position for the target corresponding to each target ground-truth frame, based on the overlap ratio between the second prediction frame corresponding to the second reference position and each target ground-truth frame and the score of the second reference position, includes: using the formula q_ij = (s_ij)^(1-α) · (IoU_ij)^α to calculate the prediction quality of each second reference position for the target corresponding to each target ground-truth frame, obtaining the prediction quality matrix Q, where q_ij takes values in [0, 1], α is a preset hyperparameter with a value in the interval [0, 1], s_ij is the score, among the scores of the j-th second reference position, that corresponds to the current target type, the current target type is the target type corresponding to the i-th target ground-truth frame, and IoU_ij is the overlap ratio between the second prediction frame corresponding to the j-th second reference position and the i-th target ground-truth frame, that is, the element in row i, column j of the matrix IoU.
Optionally, the step of determining the label assignment information of each first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth frame includes: for each second reference position, selecting the maximum prediction quality from among that position's prediction qualities for the targets corresponding to the target ground-truth frames; judging whether the maximum prediction quality is greater than or equal to the first preset quality value; and, if so, assigning the positive sample label of the target corresponding to the maximum prediction quality to the second reference position.
Optionally, the step of determining the label assignment information of a first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth frame includes: for the j-th column of the prediction quality matrix, selecting the element q_mj with the largest value from the elements of that column;
如果q mj大于t p,设置q mj对应的标签分配矩阵中的元素X mj等于第一值;对于第j列中除q mj以外的元素q ij,如果q ij小于t n,设置q ij对应的标签分配矩阵中的元素X ij等于第二值;如果q ij小于或等于t p,且大于或等于t n,设置q ij对应的X ij等于第三值; If q mj is greater than t p , set the element X mj in the label assignment matrix corresponding to q mj equal to the first value; for the elements q ij in the jth column except q mj , if q ij is less than t n , set q ij corresponding to The element X ij in the label assignment matrix of is equal to the second value; if q ij is less than or equal to t p and greater than or equal to t n , set the corresponding X ij of q ij equal to the third value;
如果q mj小于t n,设置标签分配矩阵中的第j列中的元素X ij等于第二值; If q mj is less than t n , set the element X ij in the j-th column of the label assignment matrix equal to the second value;
如果q mj小于或等于t p,且大于或等于t n,设置q mj对应标签分配矩阵中的元素X mj等于第三值;对于第j列中除q mj以外的元素q ij,如果q ij小于t n,设置q ij对应标签分配矩阵中的元素X ij等于第二值;如果q ij小于或等于t p,且大于或等于t n,设置q ij对应标签分配矩阵中的元素X ij等于第三值; If q mj is less than or equal to t p and greater than or equal to t n , set the element X mj in the label assignment matrix corresponding to q mj equal to the third value; for the elements q ij in the jth column other than q mj , if q ij less than t n , set the element X ij in the label assignment matrix corresponding to q ij equal to the second value; if q ij is less than or equal to t p and greater than or equal to t n , set the element X ij in the label assignment matrix corresponding to q ij equal to third value;
其中,t p>t n,t p和t n分别为预设阈值,第一值表示正样本,第二值表示负样本,第三值表示忽略样本。 Wherein, t p >t n , t p and t n are preset thresholds respectively, the first value represents positive samples, the second value represents negative samples, and the third value represents ignore samples.
The above-mentioned step of determining the label assignment information of each first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth box includes: for the i-th row of the prediction quality matrix, selecting from the elements of that row the target elements q_im that are greater than t_p, and setting the elements X_im of the i-th row of an initial label assignment matrix corresponding to the target elements to a first value; where each q_im is greater than the unselected other elements q_iu of that row;
for the elements q_iu of the i-th row other than q_im, if q_iu is less than or equal to t_p and greater than or equal to t_n, setting the element X_iu of the initial label assignment matrix corresponding to q_iu equal to a third value, and if q_iu is less than t_n, setting the element X_iu of the initial label assignment matrix corresponding to q_iu equal to a second value;
checking whether the elements of the j-th column of the initial label assignment matrix contain conflicting elements, where conflicting elements are two or more elements that all equal the first value; if conflicting elements exist, obtaining from the prediction quality matrix the prediction qualities corresponding to the conflicting elements, keeping the conflicting element with the largest prediction quality at the first value, and modifying the remaining elements to the third value, to obtain the label assignment matrix; where t_p > t_n, t_p and t_n are preset thresholds, the first value represents a positive sample, the second value represents a negative sample, and the third value represents an ignored sample.
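A minimal sketch of this row-wise variant with column conflict resolution (again with assumed 1/0/-1 encodings for the three values):

```python
import numpy as np

POS, NEG, IGNORE = 1, 0, -1   # assumed encodings of the first/second/third value

def assign_labels_rowwise(Q, t_p, t_n):
    """Each ground-truth box (row i) claims every position with q_ij > t_p as
    a positive; a position claimed by several boxes keeps only the claim with
    the highest prediction quality, the rest being changed to ignore."""
    X = np.where(Q < t_n, NEG, IGNORE)        # defaults from the thresholds
    X[Q > t_p] = POS                          # q_im > t_p -> positive candidates
    for j in range(Q.shape[1]):               # resolve conflicts column by column
        claims = np.flatnonzero(X[:, j] == POS)
        if len(claims) > 1:                   # two or more conflicting elements
            best = claims[np.argmax(Q[claims, j])]
            X[claims, j] = IGNORE
            X[best, j] = POS                  # keep the best-quality claim only
    return X
```

Unlike the column-wise scheme, one ground-truth box may here claim several positions as positives; the conflict check only guarantees that each position serves at most one box.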
Optionally, the above-mentioned step of calculating the loss function value of the student network model according to the label assignment information and the student model detection result includes: for each first reference position corresponding to each pixel of the first feature map, performing the following steps: determining the second reference position corresponding to the first reference position; determining the target ground-truth box of the first reference position based on the label assignment information of the second reference position corresponding to the first reference position; calculating a classification loss function value and a regression loss function value based on the target ground-truth box of the first reference position and the score of the first reference position; and determining the loss function value of the student network model based on the classification loss function values and regression loss function values of the respective first reference positions.
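As a hedged sketch of how the per-position losses might be combined (the weighting scheme and the normalisation by the positive count below are common conventions assumed for illustration, not taken from the disclosure):

```python
import numpy as np

def student_loss(labels, cls_loss, reg_loss):
    """Combine per-position losses under a label assignment: positives (1)
    contribute classification + regression loss, negatives (0) contribute
    only classification loss, and ignored positions (-1) contribute nothing.

    labels, cls_loss, reg_loss: (A,) arrays over the first reference positions.
    """
    pos = labels == 1
    neg = labels == 0
    n_pos = max(int(pos.sum()), 1)            # avoid division by zero
    total = cls_loss[pos].sum() + cls_loss[neg].sum() + reg_loss[pos].sum()
    return total / n_pos
```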
An embodiment of the present disclosure further provides a target detection apparatus, wherein the apparatus includes: an image acquisition module configured to acquire an image to be detected; and a target detection module configured to input the image to be detected into a target detection model to obtain a target detection result, the target detection result including the position and score of the bounding box corresponding to the target. The target detection model is trained as follows: inputting an image sample from an image sample set into a student network model to obtain a student model detection result corresponding to each pixel of a first feature map of the image sample, where the image sample is annotated with target ground-truth boxes, and the student model detection result includes the score and position coordinates of the first reference position corresponding to each pixel of the first feature map; obtaining a teacher model detection result of a teacher network model for the image sample, where the teacher network model is a pre-trained model, the teacher model detection result includes the score and position coordinates of the second reference position corresponding to each pixel of a second feature map of the image sample, and the first feature map and the second feature map have the same number of reference positions; determining label assignment information of the image sample according to the teacher model detection result; calculating a loss function value of the student network model according to the label assignment information and the student model detection result; adjusting the parameters of the student network model based on the loss function value and continuing training until a trained student network model is obtained; and taking the trained student network model as the target detection model.
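The training procedure above can be summarised as a schematic loop; every argument below is a caller-supplied callable, and the interfaces are assumptions made for illustration:

```python
def distillation_train(student, teacher, samples, assign_labels, loss_fn,
                       update, n_epochs=1):
    """Schematic of teacher-driven label assignment: the frozen teacher's
    detections determine the labels against which the student is optimised.

    student/teacher:  callables mapping an image to detection results
    samples:          iterable of (image, ground_truth_boxes) pairs
    assign_labels:    maps (teacher_result, ground_truth_boxes) -> labels
    loss_fn:          maps (student_result, labels) -> loss value
    update:           consumes the loss and adjusts the student's parameters
    """
    for _ in range(n_epochs):
        for image, gt_boxes in samples:
            student_out = student(image)                 # student detections
            teacher_out = teacher(image)                 # pre-trained teacher
            labels = assign_labels(teacher_out, gt_boxes)
            update(loss_fn(student_out, labels))
    return student
```

The key design point is that `assign_labels` consumes the teacher's output rather than the student's, so positions at which the teacher detects a target well are the ones assigned that target's label.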
Optionally, the target detection module is further configured to: for each second reference position, respectively calculate the overlap ratio between the second prediction box corresponding to that second reference position and each target ground-truth box of the image sample, to obtain the matrix IoU:

    IoU = | IoU_11   IoU_12   ...   IoU_1A |
          | IoU_21   IoU_22   ...   IoU_2A |
          |   ...      ...    ...     ...  |
          | IoU_N1   IoU_N2   ...   IoU_NA |

where i takes values in [1, N], j takes values in [1, A], N is the number of annotated ground-truth boxes, and A is the number of second reference positions included in the second feature map; based on the overlap ratio between the second reference position and each target ground-truth box and on the score of the second reference position, determine the prediction quality of the second reference position for the target corresponding to each target ground-truth box, where the prediction quality characterizes the probability that what the second reference position detects is the target corresponding to that target ground-truth box; and determine the label assignment information of each first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth box. The prediction quality of each second reference position for the target corresponding to each target ground-truth box is calculated using the formula q_ij = (s_j)^(1-α) * (IoU_ij)^α, to obtain the prediction quality matrix Q; where q_ij takes a value in [0, 1], α is a preset hyperparameter with a value in the interval [0, 1], s_j is the score of the j-th second reference position, and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e. the element in row i, column j of the matrix IoU:

    Q = | q_11   q_12   ...   q_1A |
        | q_21   q_22   ...   q_2A |
        |  ...    ...   ...    ... |
        | q_N1   q_N2   ...   q_NA |
Optionally, the target detection module is further configured to: calculate the prediction quality of each second reference position for the target corresponding to each target ground-truth box using the formula q_ij = (s_j)^(1-α) * (IoU_ij)^α, to obtain the prediction quality matrix Q; where q_ij takes a value in [0, 1], α is a preset hyperparameter with a value in the interval [0, 1], s_j is the score of the j-th second reference position, and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e. the element in row i, column j of the matrix IoU:

    Q = | q_11   q_12   ...   q_1A |
        | q_21   q_22   ...   q_2A |
        |  ...    ...   ...    ... |
        | q_N1   q_N2   ...   q_NA |
Optionally, the image sample is further annotated with the target type corresponding to each target ground-truth box; the target detection module is further configured to: calculate the prediction quality of each second reference position for the target corresponding to each target ground-truth box using the formula q_ij = (s_ij)^(1-α) * (IoU_ij)^α, to obtain the prediction quality matrix Q; where q_ij takes a value in [0, 1], α is a preset hyperparameter with a value in the interval [0, 1], s_ij is the score, among the scores of the j-th second reference position, corresponding to the current target type, the current target type being the target type corresponding to the i-th target ground-truth box, and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e. the element in row i, column j of the matrix IoU:

    Q = | q_11   q_12   ...   q_1A |
        | q_21   q_22   ...   q_2A |
        |  ...    ...   ...    ... |
        | q_N1   q_N2   ...   q_NA |
Optionally, the target detection module is further configured to: for each second reference position, select the maximum prediction quality from the prediction qualities of that second reference position for the targets corresponding to the respective target ground-truth boxes; judge whether the maximum prediction quality is greater than or equal to a first preset quality value; and if so, assign to the second reference position the positive sample label of the target corresponding to the maximum prediction quality.
Optionally, the target detection module is further configured to: for the j-th column of the prediction quality matrix, select the element q_mj with the largest value from the elements of that column; if q_mj is greater than t_p, set the element X_mj of the label assignment matrix corresponding to q_mj equal to a first value; for the elements q_ij of the j-th column other than q_mj, if q_ij is less than t_n, set the element X_ij of the label assignment matrix corresponding to q_ij equal to a second value, and if q_ij is less than or equal to t_p and greater than or equal to t_n, set the X_ij corresponding to q_ij equal to a third value; if q_mj is less than t_n, set the elements X_ij of the j-th column of the label assignment matrix equal to the second value; if q_mj is less than or equal to t_p and greater than or equal to t_n, set the element X_mj of the label assignment matrix corresponding to q_mj equal to the third value, and for the elements q_ij of the j-th column other than q_mj, if q_ij is less than t_n, set the element X_ij of the label assignment matrix corresponding to q_ij equal to the second value, and if q_ij is less than or equal to t_p and greater than or equal to t_n, set the element X_ij of the label assignment matrix corresponding to q_ij equal to the third value; where t_p > t_n, t_p and t_n are preset thresholds, the first value represents a positive sample, the second value represents a negative sample, and the third value represents an ignored sample.
Optionally, the target detection module is further configured to: for the i-th row of the prediction quality matrix, select from the elements of that row the target elements q_im that are greater than t_p, and set the elements X_im of the i-th row of an initial label assignment matrix corresponding to the target elements to a first value, where each q_im is greater than the unselected other elements q_iu of that row; for the elements q_iu of the i-th row other than q_im, if q_iu is less than or equal to t_p and greater than or equal to t_n, set the element X_iu of the initial label assignment matrix corresponding to q_iu equal to a third value, and if q_iu is less than t_n, set the element X_iu of the initial label assignment matrix corresponding to q_iu equal to a second value; check whether the elements of the j-th column of the initial label assignment matrix contain conflicting elements, where conflicting elements are two or more elements that all equal the first value; if conflicting elements exist, obtain from the prediction quality matrix the prediction qualities corresponding to the conflicting elements, keep the conflicting element with the largest prediction quality at the first value, and modify the remaining elements to the third value, to obtain the label assignment matrix; where t_p > t_n, t_p and t_n are preset thresholds, the first value represents a positive sample, the second value represents a negative sample, and the third value represents an ignored sample.
Optionally, the target detection module is further configured to: for each first reference position corresponding to each pixel of the first feature map, perform the following steps: determine the target ground-truth box of the first reference position based on the label assignment information of the first reference position; calculate a classification loss function value and a regression loss function value based on the target ground-truth box of the first reference position and the score of the first reference position; and determine the loss function value of the student network model based on the classification loss function values and regression loss function values of the respective first reference positions.
An embodiment of the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above method when executing the computer program.
An embodiment of the present disclosure further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program performs the steps of the above method when run by a processor.
Other features and advantages of the present disclosure will be set forth in the following description, and will in part become apparent from the description or be learned by practicing the present disclosure. The objectives and other advantages of the present disclosure are realized and attained by the structures particularly pointed out in the description and the accompanying drawings.
To make the above objectives, features, and advantages of the present disclosure more apparent and easier to understand, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
Description of the Drawings
To describe the specific embodiments of the present disclosure or the technical solutions in the prior art more clearly, the accompanying drawings required in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a target detection method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a training method for a target detection model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of reference positions in an anchor-based technique according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of reference positions in an anchor-free technique according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of another training method for a target detection model according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of another training method for a target detection model according to an embodiment of the present disclosure;
FIG. 8 is a flowchart of another training method for a target detection model according to an embodiment of the present disclosure;
FIG. 9 is a training flowchart of a target detection model according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a target detection apparatus according to an embodiment of the present disclosure.
Detailed Description of Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present disclosure.
Knowledge distillation refers to a method in which one (possibly deeper or more complex) neural network is used to guide the training of another (possibly shallower or simpler) neural network. The former is called the teacher network model, and the latter is called the student network model.
In the process of implementing the present disclosure, the inventor found through research that if a certain position on the feature map of the teacher network model yields a good detection result for a certain target, the corresponding position of the student network model will, with high probability, also yield a good detection result for that target. It is therefore more reasonable to assign the label of that target to this position and then train the student network model, and the student network model obtained with this training method performs target detection more reliably. On this basis, the embodiments of the present disclosure provide a target detection method, apparatus, and electronic device. In this technique, a trained teacher network model is introduced to make predictions on the training samples of the student network model, the label assignment information of each sample is then determined, and the training of the student network model is completed based on this information, so as to improve the performance of the student network model and thereby improve the reliability of target detection performed with the student network model. This is described below by way of embodiments.
The embodiments of the present disclosure first provide an exemplary description of an electronic device that can implement the target detection method and apparatus. As shown in the schematic structural diagram of an electronic device in FIG. 1, the electronic device 100 includes one or more processors 102, one or more memories 104, an input device 106, an output device 108, and one or more image acquisition devices 110, and these components are interconnected through a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic device 100 shown in FIG. 1 are only exemplary rather than limiting, and the electronic device may also have other components and structures as required.
The processor 102 may be a server, an intelligent terminal, or a device including a central processing unit (CPU) or another form of processing unit having data processing capabilities and/or instruction execution capabilities. It can process data from other components in the electronic device 100 and can also control other components in the electronic device 100 to perform the target detection function.
The memory 104 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the functions (implemented by the processing device) in the embodiments of the present disclosure described below and/or other desired functions. Various application programs and various data, such as various data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (for example, images or sounds) to the outside (for example, a user), and may include one or more of a display, a speaker, and the like.
The image acquisition device 110 may acquire a training sample set and store the acquired training sample set in the memory 104 for use by other components.
Exemplarily, the devices in the electronic device configured to implement the target detection method and apparatus according to the embodiments of the present disclosure may be arranged in an integrated manner or in a distributed manner, for example by integrating the processor 102, the memory 104, the input device 106, and the output device 108 into one unit while arranging the image acquisition device 110 at a designated position where samples can be acquired. When the devices in the above electronic device are arranged in an integrated manner, the electronic device may be implemented as an intelligent terminal such as a camera, a smartphone, a tablet computer, a computer, or a vehicle-mounted terminal.
In practical applications, the electronic device configured to implement the target detection method and apparatus according to the embodiments of the present disclosure may include more or fewer components than the above exemplary electronic device, which is not limited here.
This embodiment further provides a target detection method. Referring to the flowchart of a target detection method shown in FIG. 2, the method mainly includes the following steps S202 to S204:
Step S202: acquiring an image to be detected.
In this embodiment, the image to be detected may be an image acquired by an image acquisition device such as a camera or a video camera. Depending on the detection requirements, the image acquisition device may be installed in the waiting hall of a passenger station (such as a subway or high-speed rail station) to acquire face images or human body images, or may be installed at traffic intersections or on both sides of a road to acquire vehicle images. The above image to be detected may also be obtained from a third-party device (such as a cloud server).
In addition to being an image of a target object such as a human body, a face, or a vehicle, the above image to be detected may also be an image of another type of target object, such as an animal or a designated object, which is not limited in the embodiments of the present disclosure.
Step S204: inputting the image to be detected into a target detection model to obtain a target detection result; the target detection result includes the position and score of the bounding box corresponding to the target.
The target detection model may be a detection model for a specific type of target, or a detection model for multiple different types of targets. After the image to be detected is input into the target detection model, the target detection model performs target detection on the image to be detected. If the image to be detected contains targets of a type that the detection model can detect, the bounding box and corresponding score of each target belonging to that type can be obtained through the target detection model, where the score corresponding to a bounding box represents the confidence that the target corresponding to the bounding box belongs to that type.
Referring to the flowchart of the training method of the target detection model shown in FIG. 3, the above target detection model is mainly obtained by training through the following steps S302 to S312:
Step S302: inputting an image sample from an image sample set into the student network model to obtain a student model detection result corresponding to each pixel of the first feature map of the image sample; where the image sample is annotated with target ground-truth boxes, and the student model detection result includes the score of the first reference position corresponding to each pixel of the first feature map and the coordinate information corresponding to the first reference position; where the first reference position includes a first anchor box or a first position point.
The reference positions mentioned in the embodiments of the present disclosure (including the above first reference position and the second reference position mentioned later) may be the anchor boxes corresponding to the anchor points (that is, pixels) on the feature map in an anchor-based technique. Each anchor point corresponds to one or more anchor boxes, and each anchor box corresponds to one prediction box. In this case, the coordinate information corresponding to the first reference position is the coordinate offset between the first anchor box and the first prediction box corresponding to the first anchor box, that is, the position of the first prediction box relative to the first anchor box; the coordinates of the first prediction box can be determined from the coordinates of the first anchor box and this coordinate offset.
The reference positions mentioned in the embodiments of the present disclosure (including the above first reference position and the second reference position mentioned later) may also be based on an anchor-free technique, in which each pixel on the feature map corresponds to one or more position points. Each position point corresponds to one prediction box, and the position point itself can be regarded as a reference position. In this case, the coordinate information corresponding to the first reference position is the coordinate offset of the first prediction box of the position point relative to the position point; the coordinates of the first prediction box can be determined from the coordinates of the position point and this coordinate offset.
To further illustrate the above reference positions, refer to the schematic diagram of reference positions under the anchor-based technique shown in FIG. 4 and the schematic diagram of reference positions under the anchor-free technique shown in FIG. 5. In FIG. 4, the small solid-line box inside the dashed box represents an anchor box, the small dot at the center of the anchor box represents the anchor point, and the dashed box represents the prediction box corresponding to that anchor box. The coordinate information corresponding to the anchor box is the relative position of the dashed box with respect to the anchor box, and the arrows in FIG. 4 indicate the relative positional relationship between the anchor box and the prediction box. In FIG. 5, the small dot at the center of the dashed box represents a position point, which can be regarded as playing the role of a reference position analogous to the one in FIG. 4, and the dashed box represents the prediction box corresponding to that position point. The coordinate information corresponding to the first reference position is then the position of the dashed box in FIG. 5 relative to the position point, and the arrows in FIG. 5 indicate the relative positional relationship between the position point and the prediction box.
The teacher network model and the student network model in the embodiments of the present disclosure may both be anchor-based, may both be anchor-free, or one may be anchor-based while the other is anchor-free, as long as the two have the same number of reference positions (i.e., the same number of anchor boxes/position points).
The coordinates of an anchor box, a prediction box, or a bounding box can be represented by the coordinates of the box's top-left and bottom-right corners. For example, take the anchor box shown in FIG. 4 as the first reference position; for convenience of description, the first reference position is called the first anchor box. Let the top-left corner of the feature map be the coordinate origin, with the horizontal rightward direction as the positive direction of the abscissa axis (X axis) and the vertical downward direction as the positive direction of the ordinate axis (Y axis). The first anchor box is expressed as [a1(x_a1, y_a1), b1(x_b1, y_b1)], where a1 denotes the coordinates of the top-left corner of the first anchor box and b1 denotes the coordinates of its bottom-right corner, and the first prediction box corresponding to the first anchor box is represented by the coordinates of two points A1 and B1. Suppose Δx_a1 denotes the abscissa offset between the top-left corner points a1 and A1, Δy_a1 denotes the ordinate offset between a1 and A1, Δx_b1 denotes the abscissa offset between the bottom-right corner points b1 and B1, and Δy_b1 denotes the ordinate offset between b1 and B1. The coordinate information corresponding to the first anchor box is then expressed as [a1(Δx_a1, Δy_a1), b1(Δx_b1, Δy_b1)], and from the coordinates of the first anchor box itself together with this coordinate information, the coordinates of the first prediction box corresponding to the first anchor box are determined as [A1(x_a1 − Δx_a1, y_a1 − Δy_a1), B1(x_b1 + Δx_b1, y_b1 + Δy_b1)].
The coordinate transformation under the anchor-free technique shown in FIG. 5 is similar to that of FIG. 4 described above, except that the first reference position is a position point and the first prediction box determined from that position point is represented by the coordinates of its top-left and bottom-right corners. The coordinate information corresponding to the first reference position is then the coordinate offset between the top-left corner of the first prediction box and the position point, together with the coordinate offset between the bottom-right corner of the first prediction box and the position point.
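As an illustration of the two offset conventions described above, the following sketch decodes a prediction box from a first anchor box or from a position point. The sign convention for the anchor-free case is an assumption (offsets taken as distances from the point to each corner), since the text fixes the signs explicitly only for the anchor-based example:

```python
def decode_anchor_based(anchor, offsets):
    """Recover the prediction box from an anchor box and its regressed offsets.

    anchor:  (x_a1, y_a1, x_b1, y_b1) -- top-left and bottom-right corners.
    offsets: (dx_a1, dy_a1, dx_b1, dy_b1) -- per-corner coordinate offsets.
    Follows the sign convention stated above: the top-left corner moves by
    subtracting its offsets, the bottom-right corner by adding them.
    """
    x_a1, y_a1, x_b1, y_b1 = anchor
    dx_a1, dy_a1, dx_b1, dy_b1 = offsets
    return (x_a1 - dx_a1, y_a1 - dy_a1, x_b1 + dx_b1, y_b1 + dy_b1)


def decode_anchor_free(point, offsets):
    """Recover the prediction box from a position point, assuming the offsets
    are the (positive) distances from the point to the top-left and
    bottom-right corners of the prediction box."""
    x, y = point
    dx_tl, dy_tl, dx_br, dy_br = offsets
    return (x - dx_tl, y - dy_tl, x + dx_br, y + dy_br)
```

Either function returns corner coordinates directly comparable with a ground-truth box in the same corner representation.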
The image sample set may be a collection of images obtained in advance from a network or other storage device, or a sample set formed from images captured by a collection apparatus of the electronic device and then manually annotated. The image sample set contains multiple image samples, and the specific number can be set as required.
Before the image samples in the image sample set are input into the student network model, the image sample set has already been annotated with target ground-truth boxes. The purpose of annotating target ground-truth boxes is to frame the targets contained in each image sample; for example, an image sample may include targets such as pedestrians, motor vehicles, non-motor vehicles, or human faces. In this embodiment, faces, pedestrians, motor vehicles, and non-motor vehicles are each annotated with a target ground-truth box in the form of a bounding box. In practice, to distinguish different types of targets in an image sample, target ground-truth boxes of different colors may be used to annotate the different types, or different category labels may be used, for example 1 for a face box, 3 for a motor-vehicle box, 5 for a non-motor-vehicle box, and so on, which is not limited here.
Besides detecting targets of different types, the above target detection model may instead detect only targets of a single type. For example, if the target detection model detects only one of the target types such as pedestrians, motor vehicles, non-motor vehicles, or faces, then the target ground-truth boxes frame only the targets of that type in the image sample.
Through the above target ground-truth boxes, one can determine not only which targets an image sample contains and which type each target belongs to, but also the position coordinates of each target within the image sample.
After an image sample undergoes the target detection processing of the student network model, the above first feature map is output. The number of first feature maps is determined by the model design and may be more than one. Each first feature map may include C*H*W pixels, where C is the number of feature map channels, H is the height of the feature map, and W is its width. The reference position corresponding to each pixel point in the first feature map (each pixel point may correspond to one or more anchor boxes, or to one or more position points) is denoted a first reference position. By performing target detection on the image sample, the student network model obtains, for each pixel point of the first feature map, the score of the corresponding first reference position and the coordinate information corresponding to that first reference position.
If the student network model is trained to detect multiple types of targets, the score of a given first reference position consists of multiple scores corresponding to the multiple types; the score corresponding to a given type represents the classification probability that the target detected at that first reference position belongs to that type.
For example, suppose the student network model is trained to detect 4 types of targets and is an anchor-based network model. Table 1 gives an example of the scores, within the student detection results, of the first anchor box corresponding to each pixel point of the first feature map:
Table 1
[Table 1 lists, for each of first anchor boxes 1 to 4, its scores for the face, human body, motor vehicle, and license plate types; the numeric values appear only in the table image of the original document.]
Here, first anchor box 1 to first anchor box 4 are the anchor boxes corresponding to X pixel points (X less than or equal to 4) in the first feature map. For each anchor box, the scores for face, human body, motor vehicle, and license plate (i.e., classification probability scores, or simply scores or probability scores) are as shown in Table 1. From the scores in Table 1, the target detected by first anchor box 1 is most likely of the human body type, the target detected by first anchor box 2 is most likely of the motor vehicle type, the target detected by first anchor box 3 is most likely of the face type, and the target detected by first anchor box 4 is most likely of the license plate type.
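To illustrate how the per-type scores are read, the following sketch picks the most likely type for each first anchor box. The score values here are hypothetical and only qualitatively mirror the conclusions drawn from Table 1:

```python
types = ["face", "human body", "motor vehicle", "license plate"]

# Hypothetical per-type classification probability scores; the real values
# live in the Table 1 image of the original document.
scores = {
    "first anchor box 1": [0.10, 0.80, 0.05, 0.05],  # human body most likely
    "first anchor box 2": [0.05, 0.10, 0.75, 0.10],  # motor vehicle most likely
    "first anchor box 3": [0.70, 0.15, 0.10, 0.05],  # face most likely
    "first anchor box 4": [0.05, 0.05, 0.10, 0.80],  # license plate most likely
}

def most_likely_type(score_row):
    """Return the type with the highest classification probability score."""
    return types[max(range(len(score_row)), key=lambda k: score_row[k])]

for name, s in scores.items():
    print(f"{name}: most likely type = {most_likely_type(s)}")
```

The highest score per row determines which type the anchor box's detection most likely belongs to, exactly as argued from Table 1 above.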
Step S304: obtain the teacher model detection result of the teacher network model for the image sample; the teacher network model is a pre-trained model, and the teacher model detection result includes the score of the second reference position corresponding to each pixel point of a second feature map of the image sample and the coordinate information corresponding to that second reference position. The first feature map and the second feature map have the same number of reference positions, for example both C*H*W, where each feature map has H*W positions and the number of channels of the second feature map is C.
The above second reference position includes a second anchor box or a second position point. The score of the second reference position is likewise the classification probability value, output when the teacher network model performs prediction, that the target detected at the second reference position belongs to each type; a larger score indicates that the target detected at that second reference position is more likely to belong to the corresponding type.
The coordinate information corresponding to the second reference position is similar to that of the first reference position: it is the coordinate offset between the second reference position and the second prediction box corresponding to it, and the coordinates of the second prediction box corresponding to the second reference position can be determined from the coordinates of the second reference position and this coordinate offset.
The teacher network model is a neural network model pre-trained using the above image sample set or another training image sample set. The teacher network model predicts on the image sample and outputs a second feature map; the reference position corresponding to each pixel point in the second feature map is denoted a second reference position, and the second reference positions correspond one-to-one with the first reference positions.
The teacher network model and the student network model may both be network models that perform target detection based on anchor boxes; in this case, the first feature map obtained by the student network model and the second feature map obtained by the teacher network model have the same number of anchor boxes. They may instead both be network models that do not perform target detection based on anchor boxes; in this case, the first and second feature maps have the same number of position points. Alternatively, one of the teacher network model and the student network model may perform target detection based on anchor boxes while the other does not, in which case the number of anchor boxes of the one equals the number of position points of the other.
Step S306: determine the label assignment information of the image sample according to the teacher model detection result.
Because the teacher network model is a pre-trained model, the score of a second reference position can reflect the probability that the target contained at that second reference position belongs to each target type. Meanwhile, because the image sample is annotated with target ground-truth boxes, the overlap ratio between a target ground-truth box and the prediction box corresponding to the second reference position indicates how likely it is that the second reference position corresponds to that specific target. Based on this information, the target corresponding to the second reference position, i.e., the label corresponding to the second reference position, can be determined. Since the first reference positions correspond one-to-one with the second reference positions, the label corresponding to a second reference position is also the label corresponding to the matching first reference position.
For example, take the teacher network model being an anchor-based neural network model: suppose the image sample has 2 target ground-truth boxes, namely target ground-truth box 1 corresponding to target 1 and target ground-truth box 2 corresponding to target 2, and the second feature map has 100 second anchor boxes. Then, based on each second anchor box's scores for each target type and its coordinate information, together with target ground-truth boxes 1 and 2, it can be determined to which of the 100 second anchor boxes targets 1 and 2 should be assigned, i.e., which second anchor boxes detected target 1 (positive samples for target 1) or target 2 (positive samples for target 2).
Because the first feature map and the second feature map have the same number of reference positions and the second reference positions correspond one-to-one with the first reference positions, the label assignment information of a first reference position can be determined from whether targets are detected at the corresponding second reference position. If a target is detected at the second reference position corresponding to a first reference position, the label of that first reference position with respect to that target is a positive sample; if no target is detected there, the label is a negative sample; and if one does not care (because the loss contributed by this sample should not be used for gradient backpropagation) or it is uncertain whether the corresponding second reference position detected a target, the label of the first reference position with respect to that target is an ignored sample.
The above label assignment information is specifically used to characterize, for each target, the sample type of each first reference position. The sample types include positive and negative samples; alternatively, they include positive, negative, and ignored samples. A positive sample may be denoted by 1, a negative sample by 0, and an ignored sample by -1. A positive sample means the first reference position should detect the target; a negative sample means it should not; an ignored sample means one does not care or is uncertain whether it should, and the gradient it produces is not backpropagated.
Suppose the number of ground-truth boxes annotated on the image sample is N (i.e., there are N targets), and the total number of second reference positions on the second feature maps is A (i.e., there are A anchor boxes). For example, the image sample is a (3, H, W) matrix; after multiple convolutions of the teacher network model, M feature maps are generated, and each feature map can be represented as a (C, Hv, Wv) matrix, where v is the feature map index and C is the number of channels of each feature map. When each position of a feature map corresponds to one anchor box, each feature map carries Hv×Wv anchor boxes, and A is the total number of anchor boxes over all the second feature maps, i.e.:

A = Σ_{v=1}^{M} Hv × Wv
From the above analysis, the label assignment information of the image sample determined from the teacher model detection result can be expressed as an N×A matrix in which each column contains at most one 1, i.e., each second anchor box is assigned to at most one target or to no target at all (becoming a negative or ignored sample), while each row may contain any number of 1s, i.e., a target may be assigned to one or more second anchor boxes, or to none.
This can be represented by the label assignment information matrix X:

X = (X_ij), an N-row, A-column matrix whose element in row i and column j is X_ij.
Here, X_ij takes the value 0, 1, or -1, where 1 is the value corresponding to a positive sample label, 0 corresponds to a negative sample label, and -1 corresponds to an ignored sample label; i is a positive integer taking values from 1 to N, and j is a positive integer taking values from 1 to A.
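The structural constraints on the matrix X stated above (entries in {1, 0, -1}; at most one 1 per column; no constraint on rows) can be sketched as follows, with a hypothetical N = 2 and A = 5 and assuming NumPy:

```python
import numpy as np

# Hypothetical label assignment matrix for N=2 targets, A=5 second anchor boxes.
# 1 = positive sample, 0 = negative sample, -1 = ignored sample (its loss is
# not used for gradient backpropagation).
X = np.array([
    [1, 0, 0, 1, -1],   # target 1 assigned to anchor boxes 1 and 4
    [0, 1, 0, 0,  0],   # target 2 assigned to anchor box 2
])

def is_valid_assignment(X):
    """Check the constraints described above: every entry is 1, 0 or -1, and
    each column contains at most one 1 (each second anchor box serves at most
    one target); a row may contain any number of 1s."""
    if not np.isin(X, (1, 0, -1)).all():
        return False
    return bool(((X == 1).sum(axis=0) <= 1).all())
```

A matrix in which some column contained two 1s would assign one anchor box to two targets at once and would fail this check.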
In specific implementations, the label assignment information corresponding to each position of the above matrix is derived from the detection result of the teacher network model.
Step S308: calculate the loss function value of the student network model according to the label assignment information and the student model detection result.
In this embodiment, the loss function value used during training of the student network model depends not only on the detection result output by the student network model itself, but also on the label assignment information determined from the teacher network model's detection result. Since label assignment based on the teacher network model's detection result can be performed relatively accurately, it effectively mitigates the impact that the subjectivity of manually designed label assignment rules has on the training of the student network model, makes the calculation of the loss function value more accurate, and provides reliable data for adjusting the parameters of the student network model.
Specifically, the loss function value of the student network model is computed from the loss function of the student network model, the score of the first reference position corresponding to each pixel of the first feature map, and the above label assignment information.
Step S310: adjust the parameters of the student network model based on the loss function value and continue training until a trained student network model is obtained.
Step S312: use the trained student network model as the target detection model.
In the above target detection method provided by the embodiments of the present application, the training process of the target detection model configured to detect images is as follows: input the image samples of the image sample set into the student network model to obtain the student model detection result corresponding to each pixel point of the first feature map of each image sample; obtain the teacher model detection result of the teacher network model for the image sample, the teacher network model being a pre-trained model and its detection result including the score of the second reference position corresponding to each pixel of the second feature map of the image sample and the coordinate information corresponding to that second reference position; determine the label assignment information of the image sample according to the teacher model detection result; calculate the loss function value of the student network model according to the label assignment information and the student model detection result; and adjust the parameters of the student network model based on the loss function value, continuing training until a trained student network model is obtained, which is then used as the target detection model. The label assignment in this training process is more objective and rational, making the trained target detection model more reliable and thus improving target detection accuracy. Compared with manually designed label assignment methods, the label assignment of this embodiment is more efficient and effectively mitigates the impact of the subjectivity of manually designed label assignment rules on the training of the student network model. Moreover, this label assignment approach can be adapted to both anchor-based and anchor-free networks, and is therefore more universal than a label assignment method designed for one specific kind of network.
This embodiment further provides another training method for a target detection model, implemented on the basis of the above method, focusing on a specific implementation of determining the label assignment information of image samples, and taking as an example the case where both the student network model and the teacher network model are anchor-based network models. FIG. 6 shows a flowchart of this training method, illustrating how the student network model is trained, where the trained student network model is the target detection model. Specifically, it may be implemented with reference to the following steps S602 to S616:
Step S602: input the image samples of the image sample set into the student network model to obtain the student model detection result corresponding to each pixel point of the first feature map of each image sample; the image samples are annotated with target ground-truth boxes, and the student model detection result includes the score of the first anchor box corresponding to each pixel point of the first feature map and the coordinate information corresponding to that first anchor box.
Step S604: obtain the teacher model detection result of the teacher network model for the image sample; the teacher network model is a pre-trained model, and the teacher model detection result includes the score of the second anchor box corresponding to each pixel point of the second feature map of the image sample and the coordinate information corresponding to that second anchor box.
Using the teacher network model, the score corresponding to each second anchor box can be obtained, where the score is the probability value, output by the teacher network model when performing target detection, of each second anchor box with respect to each target.
Step S606: for each second anchor box, calculate the overlap ratio between the second prediction box corresponding to that second anchor box and each target ground-truth box of the image sample, obtaining the matrix IoU:

IoU = (IoU_ij), an N-row, A-column matrix whose element IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second anchor box and the i-th target ground-truth box,

where i takes values in [1, N], j takes values in [1, A], N is the number of annotated ground-truth boxes, and A is the number of second anchor boxes included in the second feature maps.
The above overlap ratio is the IoU (Intersection over Union), which represents the degree of overlap between the areas of two boxes. The overlap ratio takes values in [0, 1]: when the second prediction box and the manually annotated target ground-truth box do not overlap at all, the overlap ratio is 0; when the second prediction box coincides exactly with the manually annotated target ground-truth box, the overlap ratio is 1; in all other cases the overlap ratio is a floating-point number between 0 and 1.
Step S608: based on the overlap ratio between the second prediction box corresponding to each second anchor box and each target ground-truth box, together with the score of the second anchor box, determine the prediction quality of the second anchor box for the target corresponding to each target ground-truth box; the prediction quality characterizes the probability that the second anchor box has detected the target corresponding to that target ground-truth box.
For any one of the second anchor boxes obtained by the teacher network model, the overlap ratio between the second prediction box corresponding to that second anchor box and each target ground-truth box of the image sample is calculated, yielding an N×1 overlap ratio matrix, where the overlap ratio is computed as:

IoU_i = area(pred box ∩ gt box_i) / area(pred box ∪ gt box_i)

where IoU_i is the overlap ratio between the second prediction box corresponding to the second anchor box and the i-th target ground-truth box, i is a positive integer taking values in [1, N], and N is the number of targets corresponding to the target ground-truth boxes; pred box denotes the second prediction box corresponding to the second anchor box, and gt box_i denotes the i-th target ground-truth box.
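A minimal sketch of this overlap ratio computation for two corner-coded boxes, following the top-left/bottom-right representation used above (y increasing downwards):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as
    (x_top_left, y_top_left, x_bottom_right, y_bottom_right)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Coincident boxes yield 1.0 and fully disjoint boxes yield 0.0, matching the value range described above.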
对于学生网络被训练为只检测一种类型的目标如仅检测人脸这一类型的目标的场景,图像样本中可以包含一个或多个人脸,每个人脸对应一个目标真值框,这种场景下,第二锚框的分数s j为一个数值。 可以利用公式q ij=(s j) 1-α*(IoU ij) α计算每个第二锚框对于每个目标真值框对应目标的预测质量,得到预测质量矩阵Q;其中,q ij取值为[0,1],α为取值在[0,1]区间的预设超参数,s j为第j个第二锚框的分数(取值为[0,1]),IoU ij为第j个第二锚框对应的第二预测框与第i个目标真值框的交叠比(取值为[0,1]),为矩阵IoU中第i行第j列的元素; For scenarios where the student network is trained to detect only one type of target, such as only faces, the image sample can contain one or more faces, and each face corresponds to a target ground-truth box. This scenario Next, the score s j of the second anchor box is a numerical value. The prediction quality of each second anchor frame for the target corresponding to each target ground-truth frame can be calculated by using the formula q ij =(s j ) 1-α *(IoU ij ) α , and the prediction quality matrix Q is obtained; wherein, q ij takes The value is [0,1], α is a preset hyperparameter with a value in the [0,1] interval, s j is the score of the jth second anchor box (valued at [0,1]), IoU ij is the overlap ratio of the second prediction frame corresponding to the j-th second anchor frame and the i-th target ground-truth frame (the value is [0,1]), which is the element of the i-th row and the j-th column in the matrix IoU;
Q = [ q_11 … q_1A
       ⋮   ⋱   ⋮
      q_N1 … q_NA ]
For a scenario in which the student network is trained to detect multiple types of targets, e.g. faces, human bodies, motor vehicles and non-motor vehicles, the image sample may contain one or more targets of any one or more of these types, each target corresponding to one target ground-truth box. In this scenario the image sample is additionally annotated with the target type corresponding to each target ground-truth box, and the score s_ij of a second anchor box consists of multiple values in one-to-one correspondence with the target types the student network can detect. The prediction quality of each second anchor box for the target corresponding to each target ground-truth box can be calculated with the formula q_ij = (s_ij)^(1-α) * (IoU_ij)^α, giving the prediction quality matrix Q, where q_ij takes values in [0, 1]; α is a preset hyperparameter taking values in the interval [0, 1]; s_ij is, among the scores of the j-th second anchor box, the score corresponding to the current target type (taking values in [0, 1]), the current target type being the target type corresponding to the i-th target ground-truth box; and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second anchor box and the i-th target ground-truth box (taking values in [0, 1]), i.e. the element in row i, column j of the matrix IoU:
Q = [ q_11 … q_1A
       ⋮   ⋱   ⋮
      q_N1 … q_NA ]
Since the above prediction quality takes into account both the overlap ratio and the score s, i.e. the confidence score of the second anchor box for the target, it is objective and reasonable, does not depend on the anchor-box design, generalizes well, and facilitates the determination of the label assignment information.
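A minimal sketch of the single-class quality formula q_ij = (s_j)^(1-α) * (IoU_ij)^α, where the score vector, overlap-ratio matrix and α value are assumptions for the example:

```python
import numpy as np

def prediction_quality(scores, iou_matrix, alpha=0.5):
    """q_ij = s_j^(1-alpha) * IoU_ij^alpha for the single-class case.

    scores: length-A vector of second-anchor-box scores (one value each)
    iou_matrix: (N, A) overlap ratios between gt boxes and prediction boxes
    """
    s = np.asarray(scores)[None, :]                    # (1, A), broadcast over N rows
    return (s ** (1 - alpha)) * (np.asarray(iou_matrix) ** alpha)

iou_m = np.array([[0.9, 0.1],
                  [0.2, 0.8]])                         # N=2 gt boxes, A=2 anchors
q = prediction_quality([0.9, 0.5], iou_m, alpha=0.5)   # (2, 2) quality matrix Q
```

The multi-class variant differs only in that the score is indexed per ground-truth target type (s_ij instead of s_j).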
Besides obtaining the prediction quality by the overlap-ratio method, the second anchor box scores and coordinate information of the teacher network model can also be used to compute a loss function against each target ground-truth box (by the same method used to compute the loss function of the student network model), giving the prediction quality of the second anchor box corresponding to each pixel of the second feature map as q = e^(−loss_m), where loss_m denotes the loss function value at the m-th position of the second feature map. The above are only two methods of calculating the prediction quality; this embodiment does not limit the method used.
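The loss-based alternative is a one-line transform; a minimal sketch (the loss values are assumed placeholders):

```python
import math

def quality_from_loss(loss_m):
    """Map the teacher-side loss at feature-map position m to a quality
    in (0, 1]: q = e^(-loss_m); a smaller loss gives a higher quality."""
    return math.exp(-loss_m)

q_good = quality_from_loss(0.1)   # small loss -> quality near 1
q_bad = quality_from_loss(3.0)    # large loss -> quality near 0
```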
Step S610: determine the label assignment information of the second anchor box based on the prediction quality of the second anchor box for the target corresponding to each target ground-truth box.

Since computing overlap ratios between the second anchor boxes and the target ground-truth boxes yields an N×A overlap-ratio matrix, the above prediction-quality formula likewise yields an N×A prediction quality matrix. For each second anchor box, the specific process of determining the label assignment information based on the prediction quality can be implemented by steps A1 to A3:

Step A1: from the prediction qualities of the second anchor box for the targets corresponding to the target ground-truth boxes, select the maximum prediction quality;

Step A2: judge whether the maximum prediction quality is greater than a first preset quality value;

Step A3: if so, assign to the second anchor box the positive sample label of the target corresponding to the maximum prediction quality; the labels of this second anchor box for the other targets may then either all be negative labels, or be a mixture of negative labels and ignore labels.

The above first preset quality value is set according to the actual situation and is not limited here.
In actual use, negative labels and ignore labels can be assigned as follows. When the maximum prediction quality is less than the first preset quality value, judge whether the maximum prediction quality is greater than a second preset quality value, where the first preset quality value is greater than the second preset quality value; if so, assign an ignore sample label to the second anchor box; if not, assign a negative sample label to the second anchor box. The ignore sample label and negative sample label here both refer to the target corresponding to the maximum prediction quality. If the target corresponding to the maximum prediction quality receives a negative label, the second anchor box is most likely a scene region such as background, and its labels for the other targets are also negative. If the target corresponding to the maximum prediction quality receives an ignore label, the labels of the second anchor box for the other targets may be either negative labels or ignore labels; the labels for the other targets can further be determined from the prediction qualities computed between this second anchor box and those targets, together with the above first and second preset quality values.

The assigned sample labels may be represented by numerical values, letters or words, which is not limited here.
Take as an example a sample image containing 4 targets, where targets 1 to 4 are of the face, motor vehicle, pedestrian and non-motor vehicle types respectively. Suppose the teacher network model predicts three second anchor boxes for the image sample; the prediction qualities computed between the three second anchor boxes and each target ground-truth box are represented in the form of the following matrix:

Figure PCTCN2021101773-appb-000015

Here the first row of the matrix gives the prediction qualities of target 1 (a target annotated with a target ground-truth box) for each second anchor box (second anchor box 1, second anchor box 2 and second anchor box 3 respectively); the second row gives the prediction qualities of target 2 for the three second anchor boxes; the third row gives those of target 3; and the fourth row gives those of target 4.
In this embodiment the first preset quality value is set to 0.7 and the second preset quality value to 0.4. Since the maximum prediction quality of the first column, 0.8, is greater than the first preset quality value, the target corresponding to second anchor box 1 is target 1, so second anchor box 1 can be assigned the positive sample label of target 1 and the negative sample labels of the remaining targets. Since the maximum prediction quality of the second column, 0.3, is less than the second preset quality value, second anchor box 2 can be assigned the negative sample labels of all targets. Since the maximum prediction quality of the third column, 0.5, lies between the first and second preset quality values, second anchor box 3 can be assigned the ignore sample label of target 4 and the negative sample labels of the other targets.
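A minimal sketch of this per-anchor assignment rule in Python, using a hypothetical quality matrix whose column maxima (0.8, 0.3, 0.5) match the worked example above; the remaining entries are assumed values below 0.4:

```python
import numpy as np

T_POS, T_NEG = 0.7, 0.4   # first / second preset quality values

def assign_labels_per_anchor(q):
    """For each second anchor box (column): positive label (1) for the target
    with the maximum quality if it exceeds T_POS, ignore (-1) if the maximum
    lies between the thresholds, negative (0) otherwise."""
    labels = np.zeros(q.shape, dtype=int)
    for j in range(q.shape[1]):
        i = int(np.argmax(q[:, j]))
        if q[i, j] > T_POS:
            labels[i, j] = 1
        elif q[i, j] >= T_NEG:
            labels[i, j] = -1
    return labels

q = np.array([[0.8, 0.2, 0.1],    # target 1 (face)
              [0.1, 0.3, 0.2],    # target 2 (motor vehicle)
              [0.2, 0.1, 0.3],    # target 3 (pedestrian)
              [0.3, 0.2, 0.5]])   # target 4 (non-motor vehicle)
labels = assign_labels_per_anchor(q)
```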
Step S612: calculate the loss function value of the student network model according to the label assignment information and the student model detection result;

Step S614: adjust the parameters of the student network model based on the loss function value and continue training until a trained student network model is obtained;

Step S616: use the trained student network model as the target detection model.
With the above training method for a target detection model provided by this embodiment of the present disclosure, the overlap ratio can be computed between the second prediction box corresponding to a second anchor box and each target ground-truth box, and the prediction quality of the second anchor box for the target corresponding to each target ground-truth box can be determined from this overlap ratio and the score of the second anchor box. The label assignment information of the second anchor box is then obtained accurately from the prediction quality, and labels are assigned to the first feature map according to this label assignment information. This makes label assignment objective and rational, effectively mitigating the impact that the subjectivity of manually designed label assignment rules has on the training effect of the student network model, thereby improving the performance of the student network model.
This embodiment further provides another training method for a target detection model, implemented on the basis of the above method, focusing on the specific implementation of determining the label assignment information of an image sample in application scenarios with multiple target types. Taking the case where both the student network model and the teacher network model are network models based on the anchor-box technique as an example, the flowchart of this training method shown in Fig. 7 mainly includes the following steps S702 to S716:
Step S702: input an image sample from the image sample set into the student network model to obtain the student model detection result corresponding to each pixel of the first feature map of the image sample, where the image sample is annotated with target ground-truth boxes, and the student model detection result includes the score of the first anchor box corresponding to each pixel of the first feature map and the coordinate information corresponding to the first anchor box;

Step S704: obtain the teacher model detection result of the teacher network model for the image sample, where the teacher network model is a pre-trained model, and the teacher model detection result includes the score of the second anchor box corresponding to each pixel of the second feature map of the image sample and the coordinate information corresponding to the second anchor box;
Step S706: compute the overlap ratio between each target ground-truth box of the image sample and the second prediction box corresponding to each second anchor box, obtaining the matrix IoU:
IoU = [ IoU_11 … IoU_1A
          ⋮    ⋱    ⋮
        IoU_N1 … IoU_NA ]
Here i takes values in [1, N] and j takes values in [1, A], where N is the number of annotated ground-truth boxes and A is the number of second anchor boxes included in the second feature map. In this embodiment there are N annotated ground-truth boxes and A second anchor boxes for the image sample; by computing the overlap ratio between each annotated ground-truth box and the second prediction box corresponding to each second anchor box, the overlap-ratio matrix IoU is obtained.
Step S708: calculate the prediction quality of each second anchor box for the target corresponding to each target ground-truth box, obtaining the prediction quality matrix Q:
Q = [ q_11 … q_1A
       ⋮   ⋱   ⋮
      q_N1 … q_NA ]
Specifically, the formula q_ij = (s_ij)^(1-α) * (IoU_ij)^α can be used to calculate the prediction quality of each second anchor box for the target corresponding to each target ground-truth box, giving the prediction quality matrix Q, likewise an N×A matrix. Here q_ij takes values in [0, 1] and represents the prediction quality of the j-th second anchor box for the target corresponding to the i-th target ground-truth box; α is a preset hyperparameter taking values in the interval [0, 1]; s_ij is, among the scores of the j-th second anchor box, the score corresponding to the current target type, the current target type being the target type corresponding to the i-th target ground-truth box; and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second anchor box and the i-th target ground-truth box, i.e. the element in row i, column j of the matrix IoU.
Step S710: convert the prediction quality matrix into the label assignment matrix X corresponding to the label assignment information:
X = [ X_11 … X_1A
       ⋮   ⋱   ⋮
      X_N1 … X_NA ]
Here X_ij takes the value 0, 1 or -1, where 1 is the value corresponding to a positive sample label, 0 the value corresponding to a negative sample label, and -1 the value corresponding to an ignore sample label.
The prediction quality matrix can be converted into the label assignment matrix from two different perspectives: by rows or by columns. Taking column-wise conversion of the prediction quality matrix as an example, the conversion can be implemented by steps B1 to B4:

Step B1: for the j-th column of the prediction quality matrix, select the element q_mj with the largest value from the elements of that column.
Again take the prediction quality matrix obtained above,

Figure PCTCN2021101773-appb-000019

with a first preset quality value of 0.7 and a second preset quality value of 0.4 as an example for illustration. Here the largest element of the first column is q_31, the largest element of the second column is q_22, and the largest element of the third column is q_43.
Step B2: if q_mj is greater than t_p, set the element X_mj of the label assignment matrix corresponding to q_mj equal to a first value; for the elements q_ij of the j-th column other than q_mj, if q_ij is less than t_n, set the X_ij corresponding to q_ij equal to a second value, and if q_ij is less than or equal to t_p and greater than or equal to t_n, set the X_ij corresponding to q_ij equal to a third value.

Here the first value indicates a positive sample, the second value a negative sample, and the third value an ignore sample. In actual use, the value 1 can serve as the positive sample label, 0 as the negative sample label and -1 as the ignore sample label, or other characters may serve as these labels, which is not limited here.
Continuing the previous example, take t_p as the first preset quality value 0.7 and t_n as the second preset quality value 0.4. The largest element in the first column of the above prediction quality matrix is q_31 = 0.8, which is greater than the first preset quality value 0.7, so the element X_mj at the corresponding position of the label assignment matrix is set to the first value; the prediction qualities of the other elements of the first column, q_11, q_21 and q_41, are all less than the second preset quality value 0.4, so the elements X_11, X_21 and X_41 at the corresponding positions of the label assignment matrix can be set to the second value. If some q_ij were less than or equal to the first preset quality value 0.7 and greater than or equal to the second preset quality value 0.4, the X_ij corresponding to that q_ij would be set to the third value.

For the second and third columns, the process of determining the label assignment information at each position by comparing the prediction quality with the preset quality values is the same as above and is not repeated here.
Step B3: if q_mj is less than t_n, set the elements X_ij of the j-th column of the label assignment matrix equal to the second value.

If the largest element q_mj is less than t_n, then all other prediction qualities in that column are also less than t_n. For example, the maximum prediction quality of the second column of the above prediction quality matrix is 0.3, so all elements are less than the second preset quality value 0.4, and the second column of the converted label assignment matrix consists entirely of the second value.
Step B4: if q_mj is less than or equal to t_p and greater than or equal to t_n, set the X_mj of the label assignment matrix corresponding to q_mj equal to the third value; for the elements q_ij of the j-th column other than q_mj, if q_ij is less than t_n, set the X_ij of the label assignment matrix corresponding to q_ij equal to the second value, and if q_ij is less than or equal to t_p and greater than or equal to t_n, set the X_ij of the label assignment matrix corresponding to q_ij equal to the third value. Here t_p > t_n, and t_p and t_n are respectively preset thresholds.

Continuing the previous example, the maximum value of the third column of the above prediction quality matrix is 0.5, which lies between 0.7 and 0.4, so the value at the corresponding position after conversion is -1; the values of the remaining elements of the third column are all less than 0.4 and therefore all 0.
Therefore, for the above prediction quality matrix:

Figure PCTCN2021101773-appb-000020

the label assignment matrix obtained by converting the prediction quality matrix is:

X = [  0  0  0
       0  0  0
       1  0  0
       0  0 -1 ]
In the label assignment matrix obtained above, each column contains at most one 1; that is, the position of each second anchor box is assigned at most one target (becoming a positive sample), or may be assigned no target (becoming a negative sample or an ignore sample).
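The column-wise conversion of steps B1-B4 can be sketched as follows; the example matrix is a hypothetical reconstruction consistent with the column maxima q_31 = 0.8, q_22 = 0.3 and q_43 = 0.5 described above, and a non-maximum element above t_p (a case the steps leave unspecified) is mapped to ignore here:

```python
import numpy as np

T_P, T_N = 0.7, 0.4   # preset thresholds, t_p > t_n

def quality_to_labels_colwise(q):
    """Convert a prediction quality matrix Q into a label assignment matrix X
    column by column (1 = positive, 0 = negative, -1 = ignore)."""
    x = np.zeros(q.shape, dtype=int)
    for j in range(q.shape[1]):
        m = int(np.argmax(q[:, j]))
        if q[m, j] > T_P:
            x[m, j] = 1                  # step B2: clear positive
        elif q[m, j] >= T_N:
            x[m, j] = -1                 # step B4: ambiguous maximum
        # step B3: if the maximum is below T_N the whole column stays 0
        for i in range(q.shape[0]):
            if i != m and q[i, j] >= T_N:
                x[i, j] = -1             # non-maximum at or above T_N: ignore
    return x

q = np.array([[0.1, 0.2, 0.1],
              [0.2, 0.3, 0.2],
              [0.8, 0.1, 0.3],
              [0.3, 0.2, 0.5]])
x = quality_to_labels_colwise(q)
```

On this matrix the result matches the worked example: one positive in column 1 (row 3), an all-negative column 2, and an ignore in column 3 (row 4).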
Taking row-wise conversion of the prediction quality matrix as an example, the specific conversion process can be implemented by steps C1 to C3:

Step C1: for the i-th row of the prediction quality matrix, select from the elements of that row the target elements q_im that are greater than t_p, and set the elements X_im corresponding to the target elements in the i-th row of the initial label assignment matrix to the first value; every q_im is greater than the other, unselected elements q_iu of that row.

The positive sample label of a given target may be assigned to one or more second anchor boxes, or to no second anchor box at all.
Step C2: for the elements q_iu of the i-th row other than q_im, if q_iu is less than or equal to t_p and greater than or equal to t_n, set the element X_iu of the initial label assignment matrix corresponding to q_iu equal to the third value; if q_iu is less than t_n, set the element X_iu of the initial label assignment matrix corresponding to q_iu equal to the second value.

The above initial label assignment matrix can be understood as an empty matrix in which the elements X have not yet been assigned; after the above steps C1 and C2, each element X is assigned 0, 1 or -1.
Step C3: check whether the elements of the j-th column of the initial label assignment matrix contain conflicting elements, conflicting elements being two or more elements that all equal the first value. If conflicting elements exist, obtain the prediction qualities corresponding to the conflicting elements from the prediction quality matrix, keep the conflicting element with the largest prediction quality at the first value, and change the remaining ones to the third value, obtaining the label assignment matrix.

Checking for conflicting elements guarantees that each column of the final label assignment matrix contains at most one first value, i.e. each position corresponds to at most one positive sample label.

Here t_p > t_n, and t_p and t_n are respectively preset thresholds.

The above t_p and t_n are respectively the first preset quality value and the second preset quality value. In this embodiment, besides setting the X_im corresponding to the selected target elements q_im of each row to the first value, the other prediction qualities also need to be compared with the preset thresholds to obtain the corresponding sample types (positive sample, negative sample or ignore sample). For example, the element X_iu corresponding to an element q_iu that is less than or equal to the first preset quality value and greater than or equal to the second preset quality value can be set to the third value, and the X_iu corresponding to a q_iu smaller than the second preset quality value can be set to the second value.
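Steps C1-C3 can be sketched as follows; the quality values are assumptions for illustration, chosen so that one column contains a conflict:

```python
import numpy as np

T_P, T_N = 0.7, 0.4

def quality_to_labels_rowwise(q):
    """Row-wise conversion: mark every element above T_P as positive, then
    resolve conflicts so each column keeps at most one positive label."""
    x = np.zeros(q.shape, dtype=int)
    x[q > T_P] = 1                       # step C1: positives per row
    x[(q >= T_N) & (q <= T_P)] = -1      # step C2: ignore band
    for j in range(q.shape[1]):          # step C3: conflict check per column
        pos = np.flatnonzero(x[:, j] == 1)
        if len(pos) > 1:
            keep = pos[np.argmax(q[pos, j])]
            for i in pos:
                if i != keep:
                    x[i, j] = -1         # demote lower-quality positives to ignore
    return x

q = np.array([[0.9, 0.1],
              [0.8, 0.5],                # column 1 has two candidate positives
              [0.2, 0.3]])
x = quality_to_labels_rowwise(q)
```

In this example the conflict in the first column is resolved in favor of the 0.9 entry, and the 0.8 entry is demoted to ignore.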
Step S712: calculate the loss function value of the student network model according to the label assignment information and the student model detection result;

Step S714: adjust the parameters of the student network model based on the loss function value and continue training until a trained student network model is obtained;

Step S716: use the trained student network model as the target detection model.
With the above training method for a target detection model provided by this embodiment of the present disclosure, the label assignment information can be obtained accurately by comparing the prediction quality with the preset thresholds, and labels are assigned to the first anchor boxes corresponding to the first feature map according to this label assignment information. This makes label assignment objective and rational, effectively mitigating the impact that the subjectivity of manually designed label assignment rules has on the training effect of the student network model, thereby improving the performance of the student network model.
This embodiment further provides another training method for a target detection model, implemented on the basis of the above method, focusing on the specific implementation of calculating the loss function value of the student network model. Taking the case where both the student network model and the teacher network model are network models based on the anchor-box technique as an example, the flowchart of this training method shown in Fig. 8 mainly includes the following steps S802 to S820:
Step S802: input an image sample from the image sample set into the student network model to obtain the student model detection result corresponding to each pixel of the first feature map of the image sample, where the image sample is annotated with target ground-truth boxes, and the student model detection result includes the score of the first anchor box corresponding to each pixel of the first feature map and the coordinate information corresponding to the first anchor box;

Step S804: obtain the teacher model detection result of the teacher network model for the image sample, where the teacher network model is a pre-trained model, and the teacher model detection result includes the score of the second anchor box corresponding to each pixel of the second feature map of the image sample and the coordinate information corresponding to the second anchor box;

Step S806: determine the label assignment information of the image sample according to the teacher model detection result;
Step S808: for each first anchor box corresponding to each pixel of the first feature map, perform the operations in the following steps S812 to S818;

Step S812: determine the target ground-truth box of the first anchor box based on the label assignment information.

The target ground-truth box corresponding to the first anchor box can be determined from the label assignment information. Taking first anchor box 1 as an example, if its label assignment information is (0, 0, 1, 0), then the target ground-truth box of the target corresponding to the positive sample label of second anchor box 1 can be used as the target ground-truth box of the first anchor box.
Step S814: calculate a classification loss function value and a regression loss function value based on the target ground-truth box of the first anchor box and the score of the first anchor box.

The above classification loss function value can be obtained via a classification loss function, which may be a cross-entropy function; for example, with only two target categories it may be the binary cross-entropy function (Binary Cross Entropy), and with multiple target categories the multi-class cross-entropy function (softmax_cross_entropy) may be used.

The above regression loss function value can be obtained with the overlap-ratio loss function (IoU Loss); the regression loss function value is therefore loss2 = -log(IoU), where IoU is the overlap ratio between the first prediction box corresponding to the first anchor box and the target ground-truth box.

In actual use, the classification loss function and the regression loss function can be selected according to actual needs; the corresponding classification and regression loss function values can thus be calculated from the target ground-truth box of the first anchor box and the score of the first anchor box, which is not limited or elaborated here.
Step S816: determine the loss function value of the student network model based on the classification loss function values and regression loss function values of the individual first anchor boxes.

Typically, the loss function value of the student network model is obtained by adding the calculated classification loss function value and regression loss function value.
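A minimal numeric sketch of the per-anchor loss described above, using binary cross-entropy for classification and loss2 = -log(IoU) for regression; the score and IoU values are assumptions for the example:

```python
import math

def bce(score, label):
    """Binary cross-entropy for one anchor's classification score."""
    eps = 1e-7
    score = min(max(score, eps), 1 - eps)   # clamp to avoid log(0)
    return -(label * math.log(score) + (1 - label) * math.log(1 - score))

def iou_loss(iou):
    """Regression loss loss2 = -log(IoU) between the first prediction box
    and its assigned target ground-truth box."""
    return -math.log(max(iou, 1e-7))

# total loss for one positive anchor: classification + regression
total = bce(0.8, 1) + iou_loss(0.9)
```

Summing this per-anchor total over all first anchor boxes gives the student network model's loss function value.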
Step S818: adjust the parameters of the student network model based on the loss function value and continue training until a trained student network model is obtained.
The above step S818 can be implemented by steps D1-D2:
Step D1: adjust the parameters of the student network model based on the loss function value and continue training.
Step D2: when the loss function value converges to a preset value or the number of training iterations reaches a preset number, stop training to obtain the trained student network model.
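Steps D1-D2 amount to a loop with two stopping criteria. A minimal sketch follows, where `model_step` is a hypothetical stand-in for one full pass of steps S802-S816 and all names are illustrative:

```python
def train(model_step, preset_value, preset_iters):
    # Repeat training until the loss converges to the preset value (step D2,
    # first criterion) or the iteration count reaches the preset number
    # (second criterion).
    loss = float("inf")
    for it in range(1, preset_iters + 1):
        loss = model_step()  # one pass: forward, loss, parameter update
        if loss <= preset_value:
            break
    return it, loss

# Toy run: a loss sequence that decays across iterations.
losses = iter([0.9, 0.5, 0.2, 0.05, 0.04])
print(train(lambda: next(losses), preset_value=0.1, preset_iters=10))  # (4, 0.05)
```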
Usually, a loss function value greater than the preset value indicates that the currently trained student network model has not reached the preset degree of convergence; the process from step S802 to step S816 can be repeated until the obtained loss function value converges to the preset value, at which point training of the student network model stops.
Alternatively, training of the student network model stops when the number of repetitions of steps S802 to S816 reaches a preset number. In practice, the preset value and the preset number can be set according to the actual situation and are not limited here.
Step S820: use the trained student network model as the target detection model.
In the training method of the above target detection model provided by the embodiments of the present disclosure, the image samples in the image sample set are input into the student network model to obtain the first feature map corresponding to each image sample, and the teacher model detection result of the teacher network model for the sample is obtained. For each first anchor box corresponding to each pixel of the first feature map, label assignment information can be determined based on the teacher network detection result, the target ground-truth box corresponding to the first anchor box can be determined based on the label assignment information, and the loss function value of the student network model can be determined from the target ground-truth box of the first anchor box and the score of the first anchor box. The loss function value produced during training therefore does not depend solely on the output of the student network model itself, but also on the detection result of the already-trained teacher network model: label assignment information is determined from that detection result, and the loss function value of the student network model is then calculated from the label assignment information. This makes the calculation of the loss function value more accurate and provides reliable data for adjusting the parameters of the student network model.
Further, to aid understanding of the above training method, FIG. 9 shows a training flowchart of a target detection model. As shown in FIG. 9, taking the case where both the student network model and the teacher network model are anchor-box-based networks as an example, the leftmost picture 900 is an image sample annotated with manually labeled target ground-truth boxes. When the image sample is input into the teacher network model 901, the scores scores2 of the second anchor box corresponding to each pixel of the second feature map 902 and the coordinate information yielding the second prediction boxes pred boxes2 are obtained. The overlap-ratio matrix IoU is computed from the second prediction boxes pred boxes2 and the target ground-truth boxes of the image sample, the prediction quality matrix qualities of the second feature map 902 is obtained from this matrix IoU and the scores scores2, and the label assignment information is determined from the prediction quality matrix qualities; this process corresponds to "assignment" in FIG. 9. When the image sample is input into the student network model 903, the scores scores1 of the first anchor box corresponding to each pixel of the first feature map 904 and the first prediction boxes pred boxes1 corresponding to the first anchor boxes are obtained. Target ground-truth boxes are assigned to the first feature map 904 using the label assignment information, and the classification loss value (classification loss) and regression loss value (regression loss) are calculated from the assigned target ground-truth boxes, the first prediction boxes pred boxes1, and the scores scores1. Finally, the loss function value (loss) of the student network model is calculated from the classification loss and regression loss values, and the student network model is trained based on this loss function value.
During training of this target detection model, there is no need to manually label the first feature map; the detection results obtained by the teacher network model are used to assign labels to the first feature map, making label assignment objective and rational. Training the student network model on the first feature map with labels assigned in this way optimizes the training process and effectively mitigates the influence that the subjectivity of manually designed label assignment rules has on the training effect of the student network model, thereby improving the performance of the student network model, for example its target detection accuracy.
Corresponding to the foregoing method embodiments, an embodiment of the present disclosure provides a target detection apparatus. FIG. 10 shows a schematic structural diagram of a target detection apparatus. As shown in FIG. 10, the apparatus includes:
an image acquisition module 1002, configured to acquire an image to be detected;
a target detection module 1004, configured to input the image to be detected into a target detection model to obtain a target detection result, where the target detection result includes the position and score of the bounding box corresponding to a target, and the target detection model is trained as follows: inputting the image samples in an image sample set into a student network model to obtain a student model detection result corresponding to each pixel of the first feature map of the image sample, where the image sample is annotated with target ground-truth boxes, and the student model detection result includes the score of the first reference position corresponding to each pixel of the first feature map and the coordinate information corresponding to the first reference position; obtaining a teacher model detection result of the teacher network model for the image sample, where the teacher network model is a pre-trained model, the teacher model detection result includes the score of the second reference position corresponding to each pixel of the second feature map of the image sample and the coordinate information corresponding to the second reference position, and the first feature map and the second feature map have the same number of reference positions; determining label assignment information of the image sample according to the teacher model detection result; calculating the loss function value of the student network model according to the label assignment information and the student model detection result; adjusting the parameters of the student network model based on the loss function value and continuing training until a trained student network model is obtained; and using the trained student network model as the target detection model.
In the above target detection apparatus provided by the embodiments of the present application, the target detection model configured to detect images is trained as follows: the image samples in the image sample set are input into the student network model to obtain a student model detection result corresponding to each pixel of the first feature map of the image sample; the teacher model detection result of the teacher network model for the image sample is obtained, where the teacher network model is a pre-trained model and the teacher model detection result includes the score of the second reference position corresponding to each pixel of the second feature map of the image sample and the coordinate information corresponding to the second reference position; the label assignment information of the image sample is determined from the teacher model detection result; the loss function value of the student network model is calculated from the label assignment information and the student model detection result; and the parameters of the student network model are adjusted based on the loss function value and training continues until a trained student network model is obtained, which is then used as the target detection model. The label assignment in this training process is more objective and rational, making the trained target detection model more reliable and thus improving target detection accuracy. Compared with manually designed label assignment methods, the label assignment in this embodiment is more efficient and effectively mitigates the influence of the subjectivity of manually designed label assignment rules on the training effect of the student network model. This label assignment approach can be adapted to both anchor-box-based and non-anchor-box-based networks, and is therefore more universal than a label assignment method designed for a particular network.
The above target detection module 1004 is further configured to, for each second reference position, calculate the overlap ratio between the second prediction box corresponding to that second reference position and each target ground-truth box of the image sample, obtaining the matrix IoU:

$$\mathrm{IoU}=\begin{pmatrix}\mathrm{IoU}_{11}&\cdots&\mathrm{IoU}_{1A}\\\vdots&\ddots&\vdots\\\mathrm{IoU}_{N1}&\cdots&\mathrm{IoU}_{NA}\end{pmatrix}$$

where i takes values in [1, N], j takes values in [1, A], N is the number of annotated ground-truth boxes, and A is the number of second reference positions included in the second feature map. Based on the overlap ratio between the second prediction box corresponding to the second reference position and each target ground-truth box and the score of the second reference position, the prediction quality of the second reference position for the target corresponding to each target ground-truth box is determined, where the prediction quality characterizes the probability that what the second reference position detects is the target corresponding to that target ground-truth box. The label assignment information of each first reference position is determined based on the prediction quality of each second reference position for the target corresponding to each target ground-truth box.
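Assuming an axis-aligned (x1, y1, x2, y2) box representation, the N x A matrix IoU described above can be sketched as:

```python
def box_iou(box_a, box_b):
    # Overlap ratio (intersection over union) of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union > 0 else 0.0

def iou_matrix(gt_boxes, pred_boxes):
    # Element [i][j]: overlap ratio between the i-th target ground-truth box
    # (i in [1, N]) and the second prediction box of the j-th second
    # reference position (j in [1, A]).
    return [[box_iou(gt, pred) for pred in pred_boxes] for gt in gt_boxes]

m = iou_matrix([(0, 0, 2, 2)], [(0, 0, 2, 2), (1, 1, 3, 3)])
print(m)  # [[1.0, 1/7 ≈ 0.1429]]
```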
The above target detection module 1004 is further configured to calculate, using the formula $q_{ij}=(s_j)^{1-\alpha}\cdot(\mathrm{IoU}_{ij})^{\alpha}$, the prediction quality of each second reference position for the target corresponding to each target ground-truth box, obtaining the prediction quality matrix Q, where $q_{ij}$ takes values in [0, 1], α is a preset hyperparameter with a value in the interval [0, 1], $s_j$ is the score of the j-th second reference position, and $\mathrm{IoU}_{ij}$ is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e., the element in row i, column j of the matrix IoU:

$$Q=\begin{pmatrix}q_{11}&\cdots&q_{1A}\\\vdots&\ddots&\vdots\\q_{N1}&\cdots&q_{NA}\end{pmatrix}$$
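The prediction quality matrix Q follows mechanically from the formula; a sketch assuming the IoU matrix and the per-position scores are given as plain Python lists:

```python
def quality_matrix(iou_mat, scores, alpha=0.5):
    # q_ij = (s_j)^(1 - alpha) * (IoU_ij)^alpha: a blend of the j-th second
    # reference position's score with its overlap against the i-th target
    # ground-truth box; alpha in [0, 1] is a preset hyperparameter.
    return [[(scores[j] ** (1 - alpha)) * (row[j] ** alpha)
             for j in range(len(scores))]
            for row in iou_mat]

# One ground-truth box, two reference positions; with alpha = 0.5 the quality
# is the geometric mean of score and IoU.
print(quality_matrix([[0.25, 0.81]], [0.64, 0.01]))  # approximately [[0.4, 0.09]]
```

A high score alone is not enough: a position with score 0.01 but high overlap still gets low quality, which is exactly the filtering the assignment step relies on.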
The above image samples are further annotated with the target type corresponding to each target ground-truth box.
Using the formula $q_{ij}=(s_{ij})^{1-\alpha}\cdot(\mathrm{IoU}_{ij})^{\alpha}$, the prediction quality of each second reference position for the target corresponding to each target ground-truth box is calculated to obtain the prediction quality matrix Q, where $q_{ij}$ takes values in [0, 1], α is a preset hyperparameter with a value in the interval [0, 1], $s_{ij}$ is the score, among the scores of the j-th second reference position, that corresponds to the current target type, the current target type being the target type corresponding to the i-th target ground-truth box, and $\mathrm{IoU}_{ij}$ is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e., the element in row i, column j of the matrix IoU:

$$Q=\begin{pmatrix}q_{11}&\cdots&q_{1A}\\\vdots&\ddots&\vdots\\q_{N1}&\cdots&q_{NA}\end{pmatrix}$$
The above target detection module 1004 is further configured to, for each second reference position, select the maximum prediction quality from the prediction qualities of that second reference position for the targets corresponding to the target ground-truth boxes; determine whether the maximum prediction quality is greater than or equal to a first preset quality value; and if so, assign to the second reference position the positive-sample label of the target corresponding to the maximum prediction quality.
The above target detection module 1004 is further configured to, for the j-th column of the prediction quality matrix, select the element $q_{mj}$ with the largest value from the elements of that column;
if $q_{mj}$ is greater than $t_p$, set the element $X_{mj}$ of the label assignment matrix corresponding to $q_{mj}$ to the first value; for the elements $q_{ij}$ of the j-th column other than $q_{mj}$, if $q_{ij}$ is less than $t_n$, set the element $X_{ij}$ of the label assignment matrix corresponding to $q_{ij}$ to the second value, and if $q_{ij}$ is less than or equal to $t_p$ and greater than or equal to $t_n$, set the $X_{ij}$ corresponding to $q_{ij}$ to the third value;
if $q_{mj}$ is less than $t_n$, set the elements $X_{ij}$ of the j-th column of the label assignment matrix to the second value;
if $q_{mj}$ is less than or equal to $t_p$ and greater than or equal to $t_n$, set the element $X_{mj}$ of the label assignment matrix corresponding to $q_{mj}$ to the third value; for the elements $q_{ij}$ of the j-th column other than $q_{mj}$, if $q_{ij}$ is less than $t_n$, set the element $X_{ij}$ of the label assignment matrix corresponding to $q_{ij}$ to the second value, and if $q_{ij}$ is less than or equal to $t_p$ and greater than or equal to $t_n$, set the element $X_{ij}$ of the label assignment matrix corresponding to $q_{ij}$ to the third value;
where $t_p > t_n$, $t_p$ and $t_n$ are preset thresholds, the first value indicates a positive sample, the second value indicates a negative sample, and the third value indicates an ignored sample.
The above target detection module 1004 is further configured to, for the i-th row of the prediction quality matrix, select from the elements of that row the target elements $q_{im}$ that are greater than $t_p$, and set the corresponding elements $X_{im}$ of the i-th row of the initial label assignment matrix to the first value, where each $q_{im}$ is greater than every unselected element $q_{iu}$ of that row;
for the elements $q_{iu}$ of the i-th row other than $q_{im}$, if $q_{iu}$ is less than or equal to $t_p$ and greater than or equal to $t_n$, set the element $X_{iu}$ of the initial label assignment matrix corresponding to $q_{iu}$ to the third value, and if $q_{iu}$ is less than $t_n$, set the element $X_{iu}$ of the initial label assignment matrix corresponding to $q_{iu}$ to the second value;
check whether the elements of the j-th column of the initial label assignment matrix contain conflicting elements, where conflicting elements are two or more elements that all equal the first value; if conflicting elements exist, obtain the prediction qualities corresponding to the conflicting elements from the prediction quality matrix, keep the conflicting element with the largest prediction quality at the first value and change the remaining ones to the third value, obtaining the label assignment matrix; where $t_p > t_n$, $t_p$ and $t_n$ are preset thresholds, the first value indicates a positive sample, the second value indicates a negative sample, and the third value indicates an ignored sample.
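The row-wise assignment with column-wise conflict resolution described above can be sketched as follows; the concrete values 1 / 0 / -1 for the first (positive), second (negative), and third (ignore) values are an arbitrary choice made for illustration:

```python
POS, NEG, IGN = 1, 0, -1  # first value, second value, third value

def assign_labels(Q, t_p, t_n):
    # Q: N x A prediction quality matrix; t_p > t_n are the preset thresholds.
    # Returns the label assignment matrix X.
    N, A = len(Q), len(Q[0])
    X = [[NEG] * A for _ in range(N)]
    for i in range(N):            # row-wise pass over each ground-truth box
        for j in range(A):
            if Q[i][j] > t_p:
                X[i][j] = POS     # candidate positive sample
            elif Q[i][j] >= t_n:
                X[i][j] = IGN     # between the thresholds: ignored sample
    for j in range(A):            # column-wise conflict resolution
        pos_rows = [i for i in range(N) if X[i][j] == POS]
        if len(pos_rows) > 1:     # two or more positives in one column conflict
            best = max(pos_rows, key=lambda i: Q[i][j])
            for i in pos_rows:
                if i != best:
                    X[i][j] = IGN  # keep only the highest-quality positive
    return X

Q = [[0.9, 0.3],
     [0.8, 0.1]]
print(assign_labels(Q, t_p=0.7, t_n=0.2))  # [[1, -1], [-1, 0]]
```

Each second reference position (column) thus ends up positive for at most one target, which matches the single-target-per-position constraint implied by the conflict check.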
The above target detection module 1004 is further configured to perform the following steps for each first reference position corresponding to each pixel of the first feature map: determine the second reference position corresponding to the first reference position; determine the target ground-truth box of the first reference position based on the label assignment information of the first reference position; calculate the classification loss function value and the regression loss function value based on the target ground-truth box of the first reference position and the score of the first reference position; and determine the loss function value of the student network model based on the classification loss function values and regression loss function values of the individual first reference positions.
The target detection apparatus provided by the embodiments of the present disclosure has the same technical features as the above target detection method, and can therefore solve the same technical problems and achieve the same technical effects.
This embodiment further provides a computer-readable storage medium storing a computer program, where the computer program, when run by a processing device, executes the steps of the above target detection method.
The computer program product of the target detection method, apparatus, and electronic device provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code, where the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments; for the specific implementation, reference may be made to the method embodiments, which are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the electronic device and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In addition, in the description of the embodiments of the present disclosure, unless otherwise expressly specified and limited, the terms "installed", "connected", and "coupled" should be understood broadly; for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, indirect through an intermediate medium, or an internal communication between two elements. For those skilled in the art, the specific meanings of the above terms in the present disclosure can be understood according to the specific circumstances.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In the description of the present disclosure, it should be noted that orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings, are only for convenience of describing the present disclosure and simplifying the description, and do not indicate or imply that the indicated apparatus or element must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present disclosure. Furthermore, the terms "first", "second", and "third" are used for descriptive purposes only and cannot be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of their technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Industrial Applicability
In the technical solution proposed by the present disclosure, the label assignment is more efficient and effectively mitigates the influence of the subjectivity of manually designed label assignment rules on the training effect of the student network model; this label assignment approach can be adapted to both anchor-box-based and non-anchor-box-based networks, and is more universal than a label assignment method designed for a particular network.

Claims (18)

  1. A target detection method, characterized in that the method comprises:
    acquiring an image to be detected;
    inputting the image to be detected into a target detection model to obtain a target detection result, the target detection result comprising the position and score of a bounding box corresponding to a target, wherein the target detection model is trained as follows:
    inputting image samples in an image sample set into a student network model to obtain a student model detection result corresponding to each pixel of a first feature map of the image sample, wherein the image sample is annotated with target ground-truth boxes, and the student model detection result comprises the score of a first reference position corresponding to each pixel of the first feature map and the coordinate information corresponding to the first reference position;
    obtaining a teacher model detection result of a teacher network model for the image sample, wherein the teacher network model is a pre-trained model, the teacher model detection result comprises the score of a second reference position corresponding to each pixel of a second feature map of the image sample and the coordinate information corresponding to the second reference position, and the first feature map and the second feature map have the same number of reference positions;
    determining label assignment information of the image sample according to the teacher model detection result;
    calculating a loss function value of the student network model according to the label assignment information and the student model detection result;
    adjusting parameters of the student network model based on the loss function value and continuing training until a trained student network model is obtained; and
    using the trained student network model as the target detection model.
  2. The method according to claim 1, characterized in that the step of determining label assignment information according to the teacher model detection result comprises:
    for each second reference position, calculating the overlap ratio between the second prediction box corresponding to that second reference position and each target ground-truth box of the image sample, obtaining the matrix IoU:

    $$\mathrm{IoU}=\begin{pmatrix}\mathrm{IoU}_{11}&\cdots&\mathrm{IoU}_{1A}\\\vdots&\ddots&\vdots\\\mathrm{IoU}_{N1}&\cdots&\mathrm{IoU}_{NA}\end{pmatrix}$$

    wherein i takes values in [1, N], j takes values in [1, A], N is the number of annotated ground-truth boxes, and A is the number of second reference positions included in the second feature map;
    determining, based on the overlap ratio between the second reference position and each target ground-truth box and the score of the second reference position, the prediction quality of the second reference position for the target corresponding to each target ground-truth box, wherein the prediction quality characterizes the probability that what the second reference position detects is the target corresponding to the target ground-truth box; and
    determining label assignment information of each first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth box.
3. The method according to claim 2, wherein the step of determining the prediction quality of each second reference position for the target corresponding to each target ground-truth box, based on the overlap ratio between the second prediction box corresponding to the second reference position and each target ground-truth box and the score of the second reference position, comprises:
    calculating the prediction quality of each second reference position for the target corresponding to each target ground-truth box using the formula q_ij = (s_j)^(1-α) · (IoU_ij)^α, obtaining a prediction quality matrix Q; wherein q_ij takes values in [0, 1], α is a preset hyperparameter taking values in the interval [0, 1], s_j is the score of the j-th second reference position, and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e., the element in row i, column j of the matrix IoU;
    Q = [q_ij] (an N × A matrix whose element in row i, column j is q_ij)
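The quality q_ij = (s_j)^(1-α) · (IoU_ij)^α of claim 3 reduces to a single NumPy broadcast; the helper name and the default α = 0.5 are illustrative assumptions:

```python
import numpy as np

def prediction_quality(scores, iou, alpha=0.5):
    """Prediction quality matrix Q with q_ij = (s_j)**(1 - alpha) * (IoU_ij)**alpha.

    scores: length-A vector of second-reference-position scores s_j in [0, 1].
    iou:    N x A overlap-ratio matrix.
    alpha:  preset hyperparameter in [0, 1]; alpha = 0 trusts the scores only,
            alpha = 1 trusts the overlap ratios only."""
    s = np.asarray(scores, dtype=float)[None, :]  # broadcast s_j across rows i
    return (s ** (1.0 - alpha)) * (np.asarray(iou, dtype=float) ** alpha)
```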
4. The method according to claim 2, wherein the image sample is further annotated with the target type corresponding to each target ground-truth box; and
    the step of determining the prediction quality of each second reference position for the target corresponding to each target ground-truth box, based on the overlap ratio between the second prediction box corresponding to the second reference position and each target ground-truth box and the score of the second reference position, comprises:
    calculating the prediction quality of each second reference position for the target corresponding to each target ground-truth box using the formula q_ij = (s_ij)^(1-α) · (IoU_ij)^α, obtaining a prediction quality matrix Q; wherein q_ij takes values in [0, 1], α is a preset hyperparameter taking values in the interval [0, 1], s_ij is the component of the j-th second reference position's scores that corresponds to the current target type, the current target type being the target type corresponding to the i-th target ground-truth box, and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e., the element in row i, column j of the matrix IoU;
    Q = [q_ij] (an N × A matrix whose element in row i, column j is q_ij)
5. The method according to any one of claims 2-4, wherein the step of determining the label assignment information of each first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth box comprises:
    for each second reference position, selecting the maximum prediction quality among the prediction qualities of that second reference position for the targets corresponding to the target ground-truth boxes;
    judging whether the maximum prediction quality is greater than or equal to a first preset quality value; and
    if so, assigning to the second reference position the positive sample label of the target corresponding to the maximum prediction quality.
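Claim 5's per-position rule can be sketched in a few lines of Python; the function name and the None return for "no positive label" are assumptions:

```python
def positive_label_for(qualities, first_preset_quality):
    """For one second reference position, `qualities` holds its prediction
    quality for each of the N target ground-truth boxes. Returns the index of
    the target whose positive sample label is assigned, or None when even the
    maximum prediction quality falls below the first preset quality value."""
    best = max(range(len(qualities)), key=lambda i: qualities[i])
    return best if qualities[best] >= first_preset_quality else None
```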
6. The method according to claim 3 or 4, wherein the step of determining the label assignment information of the first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth box comprises:
    for the j-th column of the prediction quality matrix, selecting the element q_mj with the largest value among the elements of that column;
    if q_mj is greater than t_p, setting the element X_mj of the label assignment matrix corresponding to q_mj equal to a first value; and, for each element q_ij of the j-th column other than q_mj, setting the element X_ij of the label assignment matrix corresponding to q_ij equal to a second value if q_ij is less than t_n, or equal to a third value if q_ij is less than or equal to t_p and greater than or equal to t_n;
    if q_mj is less than t_n, setting the elements X_ij of the j-th column of the label assignment matrix equal to the second value; and
    if q_mj is less than or equal to t_p and greater than or equal to t_n, setting the element X_mj of the label assignment matrix corresponding to q_mj equal to the third value; and, for each element q_ij of the j-th column other than q_mj, setting the element X_ij of the label assignment matrix corresponding to q_ij equal to the second value if q_ij is less than t_n, or equal to the third value if q_ij is less than or equal to t_p and greater than or equal to t_n;
    wherein t_p > t_n, t_p and t_n are respectively preset thresholds, the first value denotes a positive sample, the second value denotes a negative sample, and the third value denotes an ignored sample.
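Claim 6's column-wise assignment can be sketched as below. Encoding the first, second, and third values as 1, 0, and -1 is an assumption; the claim only requires three distinguishable values:

```python
import numpy as np

POS, NEG, IGNORE = 1, 0, -1  # assumed encodings of the first/second/third values

def assign_by_column(Q, t_p, t_n):
    """Label assignment matrix X from the N x A prediction quality matrix Q.
    Each column j (one second reference position) is matched to the target m
    with the largest q_mj, then thresholded with t_p > t_n."""
    Q = np.asarray(Q, dtype=float)
    X = np.full(Q.shape, IGNORE, dtype=int)
    # Thresholds first: below t_n is negative, [t_n, t_p] stays ignored.
    X[Q < t_n] = NEG
    for j in range(Q.shape[1]):
        m = int(Q[:, j].argmax())
        if Q[m, j] > t_p:
            X[m, j] = POS  # the column's best match becomes a positive sample
    return X
```

For Q = [[0.9, 0.2], [0.4, 0.05]] with t_p = 0.6 and t_n = 0.1, the first position becomes a positive sample of target 0, while the second position is negative for target 1 and ignored for target 0.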
7. The method according to claim 3 or 4, wherein the step of determining the label assignment information of each first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth box comprises:
    for the i-th row of the prediction quality matrix, selecting from the elements of that row the target elements q_im that are greater than t_p, and setting the elements X_im of the i-th row of an initial label assignment matrix that correspond to the target elements to a first value; wherein each q_im is greater than every unselected element q_iu of that row;
    for each element q_iu of the i-th row other than q_im, setting the element X_iu of the initial label assignment matrix corresponding to q_iu equal to a third value if q_iu is less than or equal to t_p and greater than or equal to t_n, or equal to a second value if q_iu is less than t_n;
    checking whether the elements of the j-th column of the initial label assignment matrix contain conflicting elements; wherein conflicting elements are two or more elements all equal to the first value; and
    if conflicting elements exist, obtaining from the prediction quality matrix the prediction qualities corresponding to the conflicting elements, keeping the conflicting element with the largest prediction quality at the first value, and modifying the remaining conflicting elements to the third value, obtaining the label assignment matrix;
    wherein t_p > t_n, t_p and t_n are respectively preset thresholds, the first value denotes a positive sample, the second value denotes a negative sample, and the third value denotes an ignored sample.
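Claim 7's row-wise variant with conflict resolution can be sketched similarly; the 1/0/-1 encodings of the first/second/third values are again assumptions:

```python
import numpy as np

POS, NEG, IGNORE = 1, 0, -1  # assumed encodings of the first/second/third values

def assign_by_row(Q, t_p, t_n):
    """Each target (row i) first claims every second reference position whose
    quality q_ij exceeds t_p; a position claimed by several targets then keeps
    only the claim with the largest prediction quality."""
    Q = np.asarray(Q, dtype=float)
    X = np.full(Q.shape, IGNORE, dtype=int)  # initial label assignment matrix
    X[Q < t_n] = NEG
    X[Q > t_p] = POS
    for j in range(Q.shape[1]):              # resolve column conflicts
        pos_rows = np.flatnonzero(X[:, j] == POS)
        if len(pos_rows) > 1:
            keep = pos_rows[Q[pos_rows, j].argmax()]
            X[pos_rows, j] = IGNORE          # demote the losing claims
            X[keep, j] = POS
    return X
```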
8. The method according to any one of claims 1-7, wherein the step of calculating the loss function value of the student network model according to the label assignment information and the student model detection result comprises:
    for each first reference position corresponding to each pixel of the first feature map, performing the following steps:
    determining the target ground-truth box of the first reference position based on the label assignment information of the first reference position; and
    calculating a classification loss function value and a regression loss function value based on the target ground-truth box of the first reference position and the score of the first reference position; and
    determining the loss function value of the student network model based on the classification loss function values and the regression loss function values of the first reference positions.
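Claim 8 leaves the concrete classification and regression losses open; the sketch below assumes binary cross-entropy and an L1 box loss purely for illustration, with positive/negative/ignore labels encoded as 1/0/-1 by assumption:

```python
import numpy as np

POS, NEG, IGNORE = 1, 0, -1  # assumed label encodings

def student_loss(labels, scores, pred_boxes, assigned_gt_boxes):
    """Aggregate the per-position classification and regression losses.

    labels:            per first reference position, POS/NEG/IGNORE.
    scores:            per-position classification scores in (0, 1).
    pred_boxes:        per-position predicted boxes, shape (A, 4).
    assigned_gt_boxes: per-position assigned ground-truth boxes (only the
                       entries of positive positions are used)."""
    labels = np.asarray(labels)
    s = np.clip(np.asarray(scores, dtype=float), 1e-7, 1.0 - 1e-7)
    # Binary cross-entropy; ignore samples contribute nothing.
    cls = np.where(labels == POS, -np.log(s), -np.log(1.0 - s))
    cls_loss = cls[labels != IGNORE].sum()
    # L1 regression over positive positions only.
    pos = labels == POS
    reg_loss = np.abs(np.asarray(pred_boxes, dtype=float)[pos]
                      - np.asarray(assigned_gt_boxes, dtype=float)[pos]).sum()
    return cls_loss + reg_loss
```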
9. A target detection apparatus, characterized in that the apparatus comprises:
    an image acquisition module configured to acquire an image to be detected; and
    a target detection module configured to input the image to be detected into a target detection model to obtain a target detection result, the target detection result including the position and score of a bounding box corresponding to a target; wherein the target detection model is trained as follows:
    inputting an image sample from an image sample set into a student network model, obtaining a student model detection result corresponding to each pixel of a first feature map of the image sample; wherein the image sample is annotated with target ground-truth boxes, and the student model detection result includes the score of the first reference position corresponding to each pixel of the first feature map and coordinate information corresponding to the first reference position;
    obtaining a teacher model detection result of a teacher network model for the image sample; wherein the teacher network model is a pre-trained model, the teacher model detection result includes the score of the second reference position corresponding to each pixel of a second feature map of the image sample and coordinate information corresponding to the second reference position, and the first feature map and the second feature map are identical in the number of reference positions and/or in the position points;
    determining label assignment information of the image sample according to the teacher model detection result;
    calculating a loss function value of the student network model according to the label assignment information and the student model detection result;
    adjusting parameters of the student network model based on the loss function value and continuing training until a trained student network model is obtained; and
    using the trained student network model as the target detection model.
10. The apparatus according to claim 9, wherein the target detection module is further configured to:
    for each second reference position, calculate the overlap ratio between the second prediction box corresponding to that second reference position and each target ground-truth box of the image sample, obtaining the matrix IoU:
    IoU = [IoU_ij] (an N × A matrix whose element in row i, column j is the overlap ratio IoU_ij)
    wherein i takes values in [1, N], j takes values in [1, A], N is the number of annotated ground-truth boxes, and A is the number of second reference positions included in the second feature map;
    based on the overlap ratio between the second reference position and each target ground-truth box and the score of the second reference position, determine the prediction quality of the second reference position for the target corresponding to each target ground-truth box; wherein the prediction quality characterizes the probability that what the second reference position detects is the target corresponding to that target ground-truth box; and
    determine label assignment information for each first reference position based on the prediction quality of each second reference position for the target corresponding to each target ground-truth box,
    calculating the prediction quality of each second reference position for the target corresponding to each target ground-truth box using the formula q_ij = (s_j)^(1-α) · (IoU_ij)^α, obtaining a prediction quality matrix Q; wherein q_ij takes values in [0, 1], α is a preset hyperparameter taking values in the interval [0, 1], s_j is the score of the j-th second reference position, and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e., the element in row i, column j of the matrix IoU;
    Q = [q_ij] (an N × A matrix whose element in row i, column j is q_ij)
11. The apparatus according to claim 10, wherein the target detection module is further configured to:
    calculate the prediction quality of each second reference position for the target corresponding to each target ground-truth box using the formula q_ij = (s_j)^(1-α) · (IoU_ij)^α, obtaining a prediction quality matrix Q; wherein q_ij takes values in [0, 1], α is a preset hyperparameter taking values in the interval [0, 1], s_j is the score of the j-th second reference position, and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e., the element in row i, column j of the matrix IoU;
    Q = [q_ij] (an N × A matrix whose element in row i, column j is q_ij)
12. The apparatus according to claim 10, wherein the image sample is further annotated with the target type corresponding to each target ground-truth box, and the target detection module is further configured to:
    calculate the prediction quality of each second reference position for the target corresponding to each target ground-truth box using the formula q_ij = (s_ij)^(1-α) · (IoU_ij)^α, obtaining a prediction quality matrix Q; wherein q_ij takes values in [0, 1], α is a preset hyperparameter taking values in the interval [0, 1], s_ij is the component of the j-th second reference position's scores that corresponds to the current target type, the current target type being the target type corresponding to the i-th target ground-truth box, and IoU_ij is the overlap ratio between the second prediction box corresponding to the j-th second reference position and the i-th target ground-truth box, i.e., the element in row i, column j of the matrix IoU;
    Q = [q_ij] (an N × A matrix whose element in row i, column j is q_ij)
13. The apparatus according to any one of claims 10-12, wherein the target detection module is further configured to:
    for each second reference position, select the maximum prediction quality among the prediction qualities of that second reference position for the targets corresponding to the target ground-truth boxes;
    judge whether the maximum prediction quality is greater than or equal to a first preset quality value; and
    if so, assign to the second reference position the positive sample label of the target corresponding to the maximum prediction quality.
14. The apparatus according to claim 11 or 12, wherein the target detection module is further configured to:
    for the j-th column of the prediction quality matrix, select the element q_mj with the largest value among the elements of that column;
    if q_mj is greater than t_p, set the element X_mj of the label assignment matrix corresponding to q_mj equal to a first value; and, for each element q_ij of the j-th column other than q_mj, set the element X_ij of the label assignment matrix corresponding to q_ij equal to a second value if q_ij is less than t_n, or equal to a third value if q_ij is less than or equal to t_p and greater than or equal to t_n;
    if q_mj is less than t_n, set the elements X_ij of the j-th column of the label assignment matrix equal to the second value; and
    if q_mj is less than or equal to t_p and greater than or equal to t_n, set the element X_mj of the label assignment matrix corresponding to q_mj equal to the third value; and, for each element q_ij of the j-th column other than q_mj, set the element X_ij of the label assignment matrix corresponding to q_ij equal to the second value if q_ij is less than t_n, or equal to the third value if q_ij is less than or equal to t_p and greater than or equal to t_n;
    wherein t_p > t_n, t_p and t_n are respectively preset thresholds, the first value denotes a positive sample, the second value denotes a negative sample, and the third value denotes an ignored sample.
15. The apparatus according to claim 11 or 12, wherein the target detection module is further configured to:
    for the i-th row of the prediction quality matrix, select from the elements of that row the target elements q_im that are greater than t_p, and set the elements X_im of the i-th row of an initial label assignment matrix that correspond to the target elements to a first value; wherein each q_im is greater than every unselected element q_iu of that row;
    for each element q_iu of the i-th row other than q_im, set the element X_iu of the initial label assignment matrix corresponding to q_iu equal to a third value if q_iu is less than or equal to t_p and greater than or equal to t_n, or equal to a second value if q_iu is less than t_n;
    check whether the elements of the j-th column of the initial label assignment matrix contain conflicting elements; wherein conflicting elements are two or more elements all equal to the first value; and
    if conflicting elements exist, obtain from the prediction quality matrix the prediction qualities corresponding to the conflicting elements, keep the conflicting element with the largest prediction quality at the first value, and modify the remaining conflicting elements to the third value, obtaining the label assignment matrix;
    wherein t_p > t_n, t_p and t_n are respectively preset thresholds, the first value denotes a positive sample, the second value denotes a negative sample, and the third value denotes an ignored sample.
16. The apparatus according to any one of claims 9-15, wherein the target detection module is further configured to:
    perform the following steps for each first reference position corresponding to each pixel of the first feature map:
    determining the target ground-truth box of the first reference position based on the label assignment information of the first reference position; and
    calculating a classification loss function value and a regression loss function value based on the target ground-truth box of the first reference position and the score of the first reference position; and
    determine the loss function value of the student network model based on the classification loss function values and the regression loss function values of the first reference positions.
17. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1-8.
18. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when run by a processor, performs the steps of the method according to any one of claims 1-8.
PCT/CN2021/101773 2020-10-22 2021-06-23 Target detection method and apparatus, and electronic device WO2022083157A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011143452.7A CN112418268A (en) 2020-10-22 2020-10-22 Target detection method and device and electronic equipment
CN202011143452.7 2020-10-22

Publications (1)

Publication Number Publication Date
WO2022083157A1 (en)

Family

ID=74841060

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/101773 WO2022083157A1 (en) 2020-10-22 2021-06-23 Target detection method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN112418268A (en)
WO (1) WO2022083157A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898086A (en) * 2022-07-13 2022-08-12 山东圣点世纪科技有限公司 Target key point detection method based on cascade temperature control distillation
CN115631344A (en) * 2022-10-06 2023-01-20 中国人民解放军国防科技大学 Target detection method based on feature adaptive aggregation
CN115953605A (en) * 2023-03-14 2023-04-11 深圳中集智能科技有限公司 Machine vision multi-target image coordinate matching method
CN116071608A (en) * 2023-03-16 2023-05-05 浙江啄云智能科技有限公司 Target detection method, device, equipment and storage medium
CN116452912A (en) * 2023-03-28 2023-07-18 浙江大学 Training method, target detection method, medium and electronic equipment
CN116563278A (en) * 2023-07-06 2023-08-08 宁德时代新能源科技股份有限公司 Detection result display method, device, computer equipment and storage medium
CN116883390A (en) * 2023-09-04 2023-10-13 合肥中科类脑智能技术有限公司 Fuzzy-resistant semi-supervised defect detection method, device and storage medium
CN117115107A (en) * 2023-08-24 2023-11-24 哪吒港航智慧科技(上海)有限公司 Training method and device for appearance defect detection model based on long tail distribution probability
CN117784162A (en) * 2024-02-26 2024-03-29 安徽蔚来智驾科技有限公司 Target annotation data acquisition method, target tracking method, intelligent device and medium
WO2024012607A3 (en) * 2022-07-14 2024-04-04 顺丰科技有限公司 Personnel detection method and apparatus, device, and storage medium
CN117115107B (en) * 2023-08-24 2024-06-07 哪吒港航智慧科技(上海)有限公司 Training method and device for appearance defect detection model based on long tail distribution probability

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418268A (en) * 2020-10-22 2021-02-26 北京迈格威科技有限公司 Target detection method and device and electronic equipment
CN113139437B (en) * 2021-03-31 2022-09-20 成都飞机工业(集团)有限责任公司 Helmet wearing inspection method based on YOLOv3 algorithm
CN113239982A (en) * 2021-04-23 2021-08-10 北京旷视科技有限公司 Training method of detection model, target detection method, device and electronic system
CN113762051B (en) * 2021-05-13 2024-05-28 腾讯科技(深圳)有限公司 Model training method, image detection device, storage medium and equipment
CN113344213A (en) * 2021-05-25 2021-09-03 北京百度网讯科技有限公司 Knowledge distillation method, knowledge distillation device, electronic equipment and computer readable storage medium
CN113850012B (en) * 2021-06-11 2024-05-07 腾讯科技(深圳)有限公司 Data processing model generation method, device, medium and electronic equipment
CN113361710B (en) * 2021-06-29 2023-11-24 北京百度网讯科技有限公司 Student model training method, picture processing device and electronic equipment
CN113705362B (en) * 2021-08-03 2023-10-20 北京百度网讯科技有限公司 Training method and device of image detection model, electronic equipment and storage medium
CN113806387A (en) * 2021-09-17 2021-12-17 北京百度网讯科技有限公司 Model training method, high-precision map change detection method and device and electronic equipment
CN113610069B (en) * 2021-10-11 2022-02-08 北京文安智能技术股份有限公司 Knowledge distillation-based target detection model training method
CN115019060A (en) * 2022-07-12 2022-09-06 北京百度网讯科技有限公司 Target recognition method, and training method and device of target recognition model
CN115527083B (en) * 2022-09-27 2023-04-11 中电金信软件有限公司 Image annotation method and device and electronic equipment
CN117315237B (en) * 2023-11-23 2024-02-27 上海闪马智能科技有限公司 Method and device for determining target detection model and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN109711544A (en) * 2018-12-04 2019-05-03 北京市商汤科技开发有限公司 Method, apparatus, electronic equipment and the computer storage medium of model compression
CN110852285A (en) * 2019-11-14 2020-02-28 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
CN111507378A (en) * 2020-03-24 2020-08-07 华为技术有限公司 Method and apparatus for training image processing model
CN112418268A (en) * 2020-10-22 2021-02-26 北京迈格威科技有限公司 Target detection method and device and electronic equipment


Also Published As

Publication number Publication date
CN112418268A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
WO2022083157A1 (en) Target detection method and apparatus, and electronic device
WO2020192469A1 (en) Method and apparatus for training image semantic segmentation network, device, and storage medium
WO2021238281A1 (en) Neural network training method, image classification system, and related device
CN112150821B (en) Lightweight vehicle detection model construction method, system and device
US11734390B2 (en) Unsupervised domain adaptation method, device, system and storage medium of semantic segmentation based on uniform clustering
US10410354B1 (en) Method and apparatus for multi-model primitive fitting based on deep geometric boundary and instance aware segmentation
WO2019223586A1 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
CN111639524B (en) Automatic driving image semantic segmentation optimization method
CN102609724B (en) Method for prompting ambient environment information by using two cameras
CN112215171B (en) Target detection method, device, equipment and computer readable storage medium
WO2021233041A1 (en) Data annotation method and device, and fine granularity identification method and device
WO2021208617A1 (en) Method and apparatus for recognizing station entering and exiting, terminal, and storage medium
CN112766218B (en) Cross-domain pedestrian re-recognition method and device based on asymmetric combined teaching network
CN111898735A (en) Distillation learning method, distillation learning device, computer equipment and storage medium
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
WO2021243947A1 (en) Object re-identification method and apparatus, and terminal and storage medium
CN110807409A (en) Crowd density detection model training method and crowd density detection method
CN103065163A (en) Rapid target detection and recognition system and method based on static picture
CN109684990B (en) Video-based phone call behavior detection method
CN105631404A (en) Method and device for clustering pictures
CN112750128B (en) Image semantic segmentation method, device, terminal and readable storage medium
WO2023179593A1 (en) Data processing method and device
KR102014288B1 (en) Development pressure prediction method based on artificial intelligence using drone
CN116665390A (en) Fire detection system based on edge calculation and optimized YOLOv5
CN116433909A (en) Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21881577

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.08.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21881577

Country of ref document: EP

Kind code of ref document: A1