WO2022262757A1 - Model training method, image detection method, and detection device - Google Patents

Model training method, image detection method, and detection device

Info

Publication number
WO2022262757A1
WO2022262757A1 · PCT/CN2022/098880 · CN2022098880W
Authority
WO
WIPO (PCT)
Prior art keywords
loss function
model
value
product type
frame
Prior art date
Application number
PCT/CN2022/098880
Other languages
English (en)
French (fr)
Inventor
张旦
Original Assignee
上海齐感电子信息科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海齐感电子信息科技有限公司
Publication of WO2022262757A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent

Definitions

  • the invention relates to the technical field of image processing, and in particular to a model training method, an image detection method, and a detection device.
  • human figure detection refers to detecting whether there is a human figure in an image: features are extracted from the human figure image, and the human figure is detected through the extracted features.
  • human figure detection is an important research topic in computer vision and is widely used in intelligent video surveillance, vehicle-assisted driving, intelligent transportation, intelligent robots, and other fields.
  • mainstream human figure detection methods divide into statistical learning methods based on hand-crafted image features and deep learning methods based on artificial neural networks. Deep learning methods rely on loss functions, which, as a measure of the inconsistency between model predictions and real values, are crucial for automatic parameter adjustment during model training.
  • in deep learning methods, the amount of data is often large and the demand on computing power is high.
  • existing methods often use the cross-entropy loss function or the Focal loss function, but both contain log operation units, whose high computational complexity slows down the model convergence speed.
  • the Chinese patent application with publication number CN111860631A discloses a method of optimizing the loss function by reinforcing the causes of errors, which adjusts the influence of correlation on the cross-entropy loss function by adding a penalty term to the original cross-entropy loss function.
  • this improves the accuracy of the model for object recognition and can improve the recognition accuracy of the deep learning network model.
  • however, the optimized loss function still contains a log operation unit, so its computational complexity is high and its running speed slow.
  • the Chinese patent application with publication number CN112419269A discloses the construction and application of an improved Focal Loss function to improve the segmentation of road-surface defects, including: setting the weight w of the Focal Loss function; converting the weight w into a piecewise function w'; and optimizing the Focal Loss function with the piecewise function w' to obtain the improved Focal Loss function.
  • that invention has the advantages of accurate classification and suppression of interference caused by incorrect labeling, and has high practical and promotion value in the field of image processing technology.
  • however, the improved Focal Loss function in that patent still contains a log operation unit, which has high computational complexity and slows down the convergence speed of the model.
  • the purpose of the present invention is to provide a model training method, an image detection method, and a detection device, to solve the problem that the existing loss functions contain a log operation unit whose high computational complexity slows down the convergence speed of the model.
  • the model training method of the present invention constructs a product-type Focal loss function, uses the product-type Focal loss function to train a neural network model, and outputs the trained neural network model, to be applied in an image detection method performed on a human figure image dataset;
  • the construction of the product-type Focal loss function comprises the following steps, where:
  • W is the weight value
  • m is the adjustment parameter
  • P_i is the predicted probability value of the i-th pixel in the feature map output by the network model
  • γ is the sample loss adjustment factor
  • y_i is the effective value of the real sample
  • the product-type Focal loss function is constructed from W and the sample ratio balance factor α.
  • the beneficial effects of the model training method of the present invention are as follows. By setting the weight value W, the product-type Focal loss function contains no logarithm, which solves the problem that existing loss functions contain a log operation unit whose high computational complexity slows down model convergence.
  • the sample loss adjustment factor γ balances simple and difficult samples by reducing the loss of easy-to-classify samples, so that the product-type Focal loss function pays more attention to difficult and misclassified samples in the calculation. The sample ratio balance factor α balances the proportion of positive and negative samples, which addresses a weakness of the ordinary cross-entropy loss function: for positive samples, the larger the output probability, the smaller the loss, and for negative samples, the smaller the output probability, the smaller the loss, so over a large number of simple samples the cross-entropy loss iterates slowly and may never be optimized to the optimum. Constructing the product-type Focal loss function from W and α not only reduces computational complexity and improves running speed, but also accounts for the power-series growth of the contribution of misclassified target individuals to the loss function alongside the power-series decrease of the contribution of correctly classified individuals, so that the loss reflects the overall discrimination of the feature map.
  • the expression of the product-type Focal loss function is:
  • L_fl-new is the product-type Focal loss function
  • when y_i = 1, the obtained function is the product-type Focal loss function of positive samples
  • when y_i = 0, the obtained function is the product-type Focal loss function of negative samples. The beneficial effect is that removing the log operation unit in favor of a product-type operation unit reduces the complexity of the algorithm and improves the running speed, while the power-series growth of the contribution of misclassified target individuals to the loss function is accounted for together with the power-series decrease of the contribution of correctly classified individuals, so that the product-type Focal loss function reflects the overall discrimination of the feature map.
  • further, after the product-type Focal loss function is constructed from W and α, backpropagation calculation and weight coefficient adjustment are performed.
  • the beneficial effect is to improve the generalization ability of the model.
  • the value range of the adjustment parameter m is 0.5-1.2. Its beneficial effect is that a value of m greater than 1.2 is too large and exceeds the calculation limit in the multiplication operations, which increases the complexity of the algorithm, while a value of m less than 0.5 is too small and renders the result meaningless.
  • the adjustment parameter m takes a value of 1, and the expression of the product type Focal loss function is:
  • the adjustment parameter m takes a value of 1, which neither exceeds the boundary nor is too small, so the model is easy to train and more readily achieves the preset goal.
  • the value of γ is greater than 0, and the value of α is 0.1-0.9. Its beneficial effect is that a γ greater than 0 effectively reduces the loss of easy-to-classify samples, so that the product-type Focal loss function pays more attention to difficult and misclassified samples in the calculation, while an α of 0.1-0.9 balances the inherently uneven proportion of positive and negative samples. A value of α greater than 0.9 leads to an excessive proportion of positive samples, and a value of α less than 0.1 leads to an excessive proportion of negative samples.
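The exact expressions for the weight W and the loss are given as formula images in the original publication and are not reproduced in this text, so the following is only an illustrative sketch of the idea: the -log(P_i) unit of the standard Focal loss is replaced by a product-type weight W = (1 - P_i)^m, leaving a loss that is a pure product of powers with no log operation. The function name and the exact combination of γ, α, and m are assumptions, not the patent's formula.

```python
def product_focal_loss(p, y, alpha=0.25, gamma=2.0, m=1.0):
    """Illustrative log-free 'product-type' focal loss (assumed form).

    The -log(P_i) unit of the standard Focal loss is replaced by the
    product-type weight W = (1 - P_i)^m for positive pixels (P_i^m for
    negative ones), so each term is a pure product of powers.
    p: per-pixel predicted probabilities, y: 0/1 ground-truth values.
    """
    total = 0.0
    for pi, yi in zip(p, y):
        w_pos = (1.0 - pi) ** m          # assumed weight W for positives
        w_neg = pi ** m                  # mirrored weight for negatives
        total += alpha * yi * (1.0 - pi) ** gamma * w_pos
        total += (1.0 - alpha) * (1.0 - yi) * pi ** gamma * w_neg
    return total
```

With this assumed form, an easy positive (P_i near 1) contributes almost nothing while a hard positive dominates, and no log call appears anywhere, which is the computational point the text makes.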
  • the present invention also provides an image detection method, comprising the following steps:
  • S100 Label the human figure image dataset and divide it into a training set, a verification set, and a test set;
  • S200 Perform data preprocessing on the training set, the verification set, and the test set;
  • S300 Use the model training method described above to perform model training and output the trained neural network model.
  • the beneficial effects of the image detection method of the present invention are as follows: step S100 labels the human figure image dataset and divides it into a training set, a verification set, and a test set; step S200 performs data preprocessing on the training, verification, and test sets; and step S300 uses the described model training method to perform model training and output the trained neural network model. The log operation unit is thereby removed and a product-type operation unit used in its place.
  • this not only reduces the computational complexity and improves the computing speed, but also accounts for the power-series growth of the contribution of misclassified target individuals to the loss function together with the power-series decrease of the contribution of correctly classified individuals, so that the product-type Focal loss function reflects the overall discrimination of the feature map.
  • the step S300 specifically includes the following steps: after several generations of model training of the neural network model on the training set with the product-type Focal loss function, the verification set is input into the neural network model to obtain the first model output; the NMS strategy is then used to optimize the first model output, and the trained neural network model is obtained from the optimized first model output.
  • after step S300 is executed, the following steps are further included:
  • S400 Perform a standard evaluation test on the trained neural network model;
  • S500 Deploy the neural network model whose evaluation result reaches the expected effect on the chip, and output the result.
  • the method of optimizing with the NMS strategy includes the following steps:
  • S410 Provide a candidate frame set and a spare candidate frame set;
  • S420 Initialize the candidate frame set as an empty set, and initialize all candidate frames in the spare candidate frame set to obtain several frames to be processed;
  • S430 Sort the frames to be processed by confidence, and select the frame with the highest confidence as the first frame to be processed;
  • S440 Calculate the degree of coincidence between the first frame to be processed and each of the other frames to be processed to obtain several coincidence values, and compare these values with the preset threshold to obtain the frames to be deleted;
  • the step of comparing the several coincidence values with the preset threshold to obtain the frames to be deleted includes: selecting each frame to be processed whose coincidence value is higher than the preset threshold as a frame to be deleted. The beneficial effect is that a coincidence value above the preset threshold means the frame has a relatively high degree of coincidence with the first frame to be processed, so it needs to be deleted from the spare candidate frame set.
  • deleting the frames to be processed with the larger coincidence values solves the problem of one person being given two frames.
  • step S431 is performed after step S430 is completed, and step S440 is performed after step S431 is performed; step S431 includes: obtaining the preset threshold according to the confidence of the first frame to be processed.
  • the beneficial effect is that the preset threshold so obtained forms a more accurate basis for comparison, so the frames to be deleted are determined more accurately.
  • the formula for selecting the preset threshold is:
  • S_i is the preset threshold
  • S_0 is the preset initial value
  • conf is the confidence level
  • is the adjustment parameter.
  • the value range of the adjustment parameter is 0.5-0.75. Its beneficial effect is that a value of the adjustment parameter below 0.5 lets frames to be processed with too low a confidence be selected, a value of 0.5 or more suppresses the selection of frames with too low a confidence, and a value above 0.8 requires too high a confidence of the selected frames and leads to missed detections.
  • the value range of the preset initial value is 0.2-0.8. Its beneficial effect is that a preset initial value below 0.2 lets frames to be processed with too low a confidence be selected, a value of 0.2 or more suppresses the selection of frames with too low a confidence, and a preset initial value above 0.8 requires too high a confidence of the selected frames, which may lead to missed detections.
  • step S432 is performed after step S430 is performed, and step S440 is performed after step S432 is performed; step S432 includes: moving the first frame to be processed from the spare candidate frame set into the candidate frame set.
  • step S451 is performed after step S450 is completed, and step S460 is performed after step S451 is performed; step S451 includes: moving the first frame to be processed from the spare candidate frame set into the candidate frame set.
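The NMS procedure of steps S410 through S460 can be sketched as a standard greedy loop. The helper names and the (x1, y1, x2, y2) box format are assumptions, and a fixed overlap threshold is used here instead of the confidence-derived threshold of step S431:

```python
def iou(a, b):
    """Coincidence degree (IoU) of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedy NMS following steps S410-S460."""
    # S410/S420: the candidate set starts empty; every box is a spare frame.
    spare = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while spare:                          # S460: loop until the spare set is empty
        first = spare.pop(0)              # S430: highest-confidence frame
        kept.append(first)                # S432/S451: move it into the candidate set
        # S440/S450: delete spares whose coincidence exceeds the threshold.
        spare = [i for i in spare if iou(boxes[first], boxes[i]) <= threshold]
    return kept
```

Two heavily overlapping detections of the same person collapse to the higher-confidence one, which is exactly the one-person-two-frames problem the text describes.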
  • the present invention also provides a detection device, comprising:
  • a processor adapted to load and execute instructions of a software program
  • a memory adapted to store a software program comprising instructions for performing the following steps:
  • Construct a product-type Focal loss function, use the product-type Focal loss function to train a neural network model, and output the trained neural network model, to be applied in an image detection method performed on a human figure image dataset;
  • the construction of the product-type Focal loss function comprises the following steps, where:
  • W is the weight value
  • m is the adjustment parameter
  • P_i is the predicted probability value of the i-th pixel in the feature map output by the network model
  • γ is the sample loss adjustment factor
  • y_i is the effective value of the real sample
  • the product-type Focal loss function is constructed from W and the sample ratio balance factor α.
  • the beneficial effects of the detection device of the present invention are as follows: the processor loads and executes the instructions of the software program, and the software program is stored in the memory, so the detection device is equipped with the model training method. By setting the weight value, the product-type Focal loss function contains no logarithm, which solves the problem that existing loss functions contain a log operation unit whose high computational complexity slows down model convergence; the sample loss adjustment factor γ balances simple and difficult samples by reducing the loss of easy-to-classify samples, so that the product-type Focal loss function pays more attention to difficult and misclassified samples in the calculation.
  • setting the sample ratio balance factor α balances the uneven proportion of positive and negative samples, addressing a weakness of the ordinary cross-entropy loss function: the larger the output probability of positive samples, the smaller the loss, and the smaller the output probability of negative samples, the smaller the loss.
  • as a result, the cross-entropy loss function iterates slowly over a large number of simple samples and may never be optimized to the optimum.
  • constructing the product-type Focal loss function from W and α not only reduces computational complexity and improves running speed, but also accounts for the power-series growth of the contribution of misclassified target individuals to the loss function alongside the power-series decrease of the contribution of correctly classified individuals, so that the product-type Focal loss function reflects the overall discrimination of the feature map.
  • the software program also includes instructions for performing the following steps:
  • S100 Label the human figure image dataset and divide it into a training set, a verification set, and a test set;
  • S200 Perform data preprocessing on the training set, the verification set, and the test set;
  • S300 Use the model training method to perform model training and output the trained neural network model. The beneficial effect is that the log operation unit is removed and a product-type operation unit is used instead, which reduces the complexity of the calculation and improves the running speed, while the power-series growth of the contribution of misclassified target individuals to the loss function is accounted for together with the power-series decrease of the contribution of correctly classified individuals, so that the product-type Focal loss function reflects the overall discrimination of the feature map.
  • after the instructions of step S300 are executed, the software program also includes instructions for performing the following steps:
  • S400 Perform a standard evaluation test on the trained neural network model;
  • S500 Deploy the neural network model whose evaluation result reaches the expected effect on the chip, and output the result.
  • Fig. 1 is the flowchart of image detection method in some embodiments of the present invention.
  • Fig. 2 is the flowchart of the optimization method of NMS strategy in some embodiments of the present invention.
  • Fig. 3 is a structural block diagram of a detection device in some embodiments of the present invention.
  • FIG. 1 is a flowchart of an image detection method in some embodiments of the present invention. Referring to FIG. 1, the method includes the following steps:
  • S100 Label the human figure image dataset and divide it into a training set, a verification set, and a test set;
  • S200 Perform data preprocessing on the training set, the verification set, and the test set;
  • S300 Use the model training method to perform model training and output the trained neural network model;
  • S400 Perform a standard evaluation test on the trained neural network model;
  • S500 Deploy the neural network model whose evaluation result reaches the expected effect on the chip, and output the result.
  • in the prior art, step S300 generally uses a cross-entropy loss function or a Focal loss function for model training and outputs the training results.
  • for the cross-entropy loss function, taking binary classification as an example, the original classification loss is the direct sum of the cross-entropy of each training sample, as shown in formula (1): L_ce = -Σ_i [ y_i·log(P_i) + (1 - y_i)·log(1 - P_i) ], where:
  • L_ce is the cross-entropy loss function
  • P_i is the predicted probability value of the i-th pixel in the feature map output by the network model
  • y_i is the effective value of the real sample
  • when y_i = 1, the obtained cross-entropy loss function is the cross-entropy loss function of positive samples.
  • when y_i = 0, the obtained cross-entropy loss function is the cross-entropy loss function of negative samples.
  • for positive samples, the larger the output probability, the smaller the loss; for negative samples, the smaller the output probability, the smaller the loss. As a result, the cross-entropy loss function iterates slowly over a large number of simple samples and may never be optimized to the optimum.
  • the Focal loss function builds on the cross-entropy loss function by adding a sample loss adjustment factor γ that balances simple and difficult samples and a sample ratio balance factor α that balances positive and negative samples, as shown in formula (2): L_fl = -Σ_i [ α·y_i·(1 - P_i)^γ·log(P_i) + (1 - α)·(1 - y_i)·P_i^γ·log(1 - P_i) ], where:
  • L_fl is the Focal loss function
  • P_i is the predicted probability value of the i-th pixel in the feature map output by the network model
  • γ is the sample loss adjustment factor
  • α is the sample proportion balance factor
  • y_i is the effective value of the real sample
  • both the cross-entropy loss function and the focal loss function contain a log operation unit, which has high computational complexity and slows down the model convergence speed.
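The two prior-art losses just described match the standard binary cross-entropy and Focal loss; the sketch below writes them out using the conventional symbols γ (loss adjustment factor) and α (ratio balance factor). The function names are illustrative:

```python
import math

def cross_entropy(p, y):
    """Standard binary cross-entropy, summed over pixels (formula (1)):
    L_ce = -sum_i [ y_i*log(P_i) + (1 - y_i)*log(1 - P_i) ]."""
    return -sum(yi * math.log(pi) + (1.0 - yi) * math.log(1.0 - pi)
                for pi, yi in zip(p, y))

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Standard Focal loss (formula (2)): cross-entropy reweighted by the
    loss adjustment factor gamma and the ratio balance factor alpha:
    L_fl = -sum_i [ alpha*y_i*(1-P_i)^gamma*log(P_i)
                    + (1-alpha)*(1-y_i)*P_i^gamma*log(1-P_i) ]."""
    return -sum(alpha * yi * (1.0 - pi) ** gamma * math.log(pi)
                + (1.0 - alpha) * (1.0 - yi) * pi ** gamma * math.log(1.0 - pi)
                for pi, yi in zip(p, y))
```

Both functions call math.log for every pixel; removing that per-pixel log is exactly what the invention's product-type loss targets.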
  • to address this, an embodiment of the present invention provides a model training method:
  • Construct a product-type Focal loss function, use the product-type Focal loss function to train a neural network model, and output the trained neural network model, to be applied in an image detection method performed on a human figure image dataset;
  • the construction of the product-type Focal loss function comprises the following steps, where:
  • W is the weight value
  • m is the adjustment parameter
  • P_i is the predicted probability value of the i-th pixel in the feature map output by the network model
  • γ is the sample loss adjustment factor
  • y_i is the effective value of the real sample
  • the product-type Focal loss function is constructed from W and the sample ratio balance factor α.
  • the expression of the product-type Focal loss function is shown in formula (7):
  • L_fl-new is the product-type Focal loss function
  • when y_i = 1, the obtained function is the product-type Focal loss function of positive samples
  • when y_i = 0, the obtained function is the product-type Focal loss function of negative samples.
  • the backpropagation algorithm is used for calculation.
  • the backpropagation algorithm (BP algorithm) is a supervised learning algorithm often used to train multi-layer perceptrons. It consists of two phases, excitation propagation and weight update, which are iterated repeatedly until the network's response to the input reaches the predetermined target range.
  • the excitation propagation phase of each iteration consists of two stages: in the forward propagation stage, the training input is fed to the network to obtain the stimulus response; in the backpropagation stage, the stimulus response is subtracted from the target output corresponding to the training input to obtain the response errors of the hidden layer and the output layer.
  • the weight update phase updates the weight on each synapse as follows: multiply the input stimulus by the response error to obtain the gradient of the weight; multiply this gradient by a ratio, invert it, and add it to the weight. This ratio affects the speed and effectiveness of the training process and is therefore called the "training factor".
  • the direction of the gradient indicates the direction in which the error grows, so the gradient must be reversed when updating the weight in order to reduce the error caused by the weight.
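The two-phase loop described above can be illustrated on a single sigmoid neuron. This is a generic sketch of the algorithm, not the patent's network; the learning rate plays the role of the "training factor":

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_neuron(samples, lr=0.5, epochs=200):
    """Minimal backpropagation loop for one sigmoid neuron.

    Each iteration has the two phases described in the text: forward
    propagation to obtain the stimulus response, then error computation
    and a weight update equal to the gradient times the training factor
    (lr), with the sign reversed so the error shrinks.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = sigmoid(w * x + b)       # phase 1: forward propagation
            err = y - t                  # response error (squared-error loss)
            grad = err * y * (1.0 - y)   # gradient at the neuron's input
            w -= lr * grad * x           # phase 2: reversed-gradient update
            b -= lr * grad
    return w, b

w, b = train_neuron([(0.0, 0.0), (1.0, 1.0)])
```

After training, the neuron's response moves toward the targets: the output for input 1 rises above 0.5 and the output for input 0 falls below it.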
  • the value range of the adjustment parameter m is 0.5-1.2: a value of m greater than 1.2 is too large and exceeds the calculation limit in the multiplication operations, increasing the complexity of the algorithm, while a value of m less than 0.5 is too small and renders the result meaningless.
  • the value of the adjustment parameter m is 1, and the expression of the product type Focal loss function is as shown in formula (9):
  • P i is the predicted probability value of the i-th pixel in the feature map output by the network model
  • γ is the sample loss adjustment factor
  • α is the sample ratio balance factor
  • y i is the effective value of the real sample
  • the adjustment parameter m takes a value of 1, which will not exceed the limit or be too small, making it easier to train and easier to achieve the preset goal.
  • the value of γ is greater than 0, and the value of α is 0.1-0.9.
  • a γ greater than 0 effectively reduces the loss of easy-to-classify samples, making the product-type Focal loss function pay more attention to difficult and misclassified samples in the calculation, while an α of 0.1-0.9 balances the inherently uneven proportion of positive and negative samples. A value of α greater than 0.9 leads to an excessive proportion of positive samples, and a value of α less than 0.1 leads to an excessive proportion of negative samples.
  • labeling the human figure image dataset and dividing it into a training set, a verification set, and a test set includes: collecting images captured by a camera in real environments, covering different environments, backgrounds, postures, and positions, to form the human figure image dataset; using a labeling tool to generate the label-frame position and label information of each corresponding target, the label information for the human category being "human"; dividing the labeled human figure image dataset into a training set, a verification set, and a test set; generating a list of the training-set images and shuffling their order; and clustering the target boxes corresponding to the labels in all images to generate 12 cluster points.
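The clustering of labeled target boxes into 12 cluster points can be sketched with plain k-means on (width, height) pairs. The Euclidean distance metric and the helper names are assumptions; the text does not specify the clustering algorithm:

```python
import random

def cluster_boxes(whs, k=12, iters=50, seed=0):
    """Cluster (width, height) pairs of labeled target boxes into k
    cluster points with plain k-means. Euclidean distance is assumed;
    the text only states that 12 cluster points are generated."""
    rng = random.Random(seed)
    centers = rng.sample(whs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for w, h in whs:
            # Assign each box to the nearest current cluster point.
            j = min(range(k),
                    key=lambda c: (w - centers[c][0]) ** 2 + (h - centers[c][1]) ** 2)
            groups[j].append((w, h))
        # Recompute each center as the mean of its group (keep empty ones).
        centers = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
                   if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers
```

The 12 resulting cluster points summarize the typical box sizes in the dataset, in the spirit of anchor-size selection for detection networks.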
  • performing data preprocessing on the training set, the verification set, and the test set in step S200 includes a normalization operation over the image RGB channels, performing on each channel the operation shown in formula (11):
  • X i is R, G, B
  • R, G, and B represent red, green, and blue channels respectively.
  • the RGB color mode is an industry color standard: by varying the red, green, and blue channels and superimposing them on one another, a wide variety of colors is obtained; RGB stands for the colors of those three channels. The standard covers almost all colors that human vision can perceive and is currently one of the most widely used color systems. Performing a normalization operation on the RGB channels of an image is a conventional technique in the art, so the details are not repeated here.
  • after the normalization operation in step S200, the following augmentations are also applied: the image is randomly flipped horizontally; the image is cropped around the minimum image target area; the saturation multiple is randomly adjusted within [1/1.5, 1.5]; the exposure multiple is randomly adjusted within [1/1.5, 1.5]; the tone multiple is randomly adjusted within [1/1.2, 1.2]; and the image is randomly rotated about its center point by an angle in [-30, 30]. Each of these random operations is applied with probability 50%.
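A sketch of the preprocessing and augmentation sampling described above. The division by 255 is an assumed form of formula (11), which is not reproduced in this text, and the function names are illustrative:

```python
import random

def normalize_pixel(pixel):
    """Assumed form of the per-channel normalization of formula (11):
    scale each channel value X_i from [0, 255] down to [0, 1]."""
    r, g, b = pixel
    return (r / 255.0, g / 255.0, b / 255.0)

def sample_augmentation(rng):
    """Draw the augmentation parameters listed in the text; each
    transform fires independently with probability 50%."""
    params = {}
    if rng.random() < 0.5:
        params["hflip"] = True                              # horizontal flip
    if rng.random() < 0.5:
        params["saturation"] = rng.uniform(1 / 1.5, 1.5)    # saturation multiple
    if rng.random() < 0.5:
        params["exposure"] = rng.uniform(1 / 1.5, 1.5)      # exposure multiple
    if rng.random() < 0.5:
        params["hue"] = rng.uniform(1 / 1.2, 1.2)           # tone multiple
    if rng.random() < 0.5:
        params["rotate_deg"] = rng.uniform(-30.0, 30.0)     # rotation about center
    return params
```

Separating parameter sampling from image transformation keeps the 50% firing probabilities and the value ranges in one auditable place.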
  • the step S300 specifically includes the following: after several generations of model training of the neural network model on the training set with the product-type Focal loss function, the verification set is input into the neural network model to obtain the first model output; the NMS strategy is then used to optimize the first model output, and the trained neural network model is obtained from the optimized output.
  • the NMS strategy optimizes the first model output so that the performance of the neural network model can be evaluated; when the performance is inadequate, model optimization is performed, and the trained neural network model is finally obtained.
  • the model optimization includes adjusting the neural network structure of the human figure training model by expanding the number of convolution kernels by a factor of 1.25 and expanding the training set with additional image data of the scene, after which model training is performed again, finally yielding the trained neural network model.
  • the model optimization described above is common knowledge in the field, and will not be repeated here.
  • the standard evaluation test in step S400 uses a camera to perform scene measurements on the test set and compares the results with selected products on the market. Referring to Figure 1, if the effect is better than the average effect of the selected products on the market, the evaluation result is judged to have reached the expected effect and step S500 is performed, deploying the neural network model whose evaluation result reached the expected effect on the chip and outputting the result; otherwise, steps S100-S400 are repeated.
  • Fig. 2 is a flowchart of the optimization method of the NMS strategy in some embodiments of the present invention. In some embodiments, referring to Fig. 2, in step S300 and step S400 the method of optimizing with the NMS strategy includes the following steps:
  • S410 Provide a candidate frame set and a spare candidate frame set;
  • S420 Initialize the candidate frame set as an empty set, and initialize all candidate frames in the spare candidate frame set to obtain several frames to be processed;
  • S430 Sort the frames to be processed by confidence, and select the frame with the highest confidence as the first frame to be processed;
  • S440 Calculate the degree of coincidence between the first frame to be processed and each of the other frames to be processed to obtain several coincidence values, and compare these values with the preset threshold to obtain the frames to be deleted;
  • S450 Delete the frames to be deleted from the spare candidate frame set;
  • S460 Repeat steps S430 to S450 until the spare candidate frame set is empty.
  • in this way, no frame to be processed remaining in the candidate frame set has a high degree of coincidence with another, thereby solving the problem that the trained model gives one person two frames.
  • step S432 is performed after step S430 is completed, and step S440 is performed after step S432 is completed; step S432 includes: transferring the first frame to be processed from the standby The candidate frame set is moved to the candidate frame set, that is, the first frame to be processed is first moved from the spare candidate frame set to the candidate frame set, and then subsequent coincidence degree calculation and comparison processing is performed.
  • step S451 is performed after step S450 is completed, and step S460 is performed after step S451 is completed; step S451 includes: moving the first frame to be processed from the spare candidate frame set to the candidate frame set. That is, the coincidence calculation and comparison are performed first, and the first frame to be processed is then moved from the spare candidate frame set to the candidate frame set.
  • in step S440, the step of comparing the several coincidence values with the preset threshold to obtain the frames to be deleted includes: selecting the frames whose coincidence value is higher than the preset threshold as the frames to be deleted. A coincidence value higher than the preset threshold means that the frame has a relatively high degree of coincidence with the first frame to be processed; such frames therefore need to be deleted from the spare candidate frame set, which solves the problem of one person with two frames.
  • step S431 is performed after step S430 is completed, and step S440 is performed after step S431 is completed; step S431 includes: obtaining the preset threshold according to the confidence of the first frame to be processed.
  • the preset threshold obtained in this way forms a more accurate basis for comparison, so the resulting frames to be deleted are more accurate and free of error.
  • the selection formula of the preset threshold is shown in formula (10):
  • S i is the preset threshold
  • S 0 is the preset initial value
  • conf is the confidence level
  • λ is the adjustment parameter.
  • the value range of the adjustment parameter is 0.5-0.75. If the adjustment parameter is lower than 0.5, the confidence of the selected frames to be processed will be too low; a value greater than 0.5 suppresses the selection of frames with too low confidence; a value higher than 0.8 makes the confidence of the selected frames too high, so missed detections will occur.
  • the value range of the preset initial value S 0 is 0.2-0.8;
  • the preset initial value is a manually set hyperparameter. If the preset initial value is lower than 0.2, the confidence of the selected frames to be processed will be too low; a value greater than 0.2 suppresses the selection of frames with too low confidence; a value higher than 0.8 makes the confidence of the selected frames too high, so missed detections will occur.
  • the preset initial value is selected as 0.5.
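Based on the surrounding description of formula (10), the threshold selection can be sketched as a piecewise rule. The published formula itself appears only as an image in the source, so the exact functional form below (scaling the confidence by the adjustment parameter λ below the preset initial value, and clamping to the preset initial value at or above it) is an assumption.

```python
def select_threshold(conf, s0=0.5, lam=0.6):
    """Assumed piecewise form of formula (10): s0 is the preset initial
    value (a hyperparameter), lam the adjustment parameter in [0.5, 0.75],
    conf the confidence of the first frame to be processed."""
    if conf >= s0:
        # High-confidence first frame: use the preset initial value directly,
        # keeping more high-confidence frames with relatively large coincidence.
        return s0
    # Low-confidence first frame: manually damp the influence of conf via lam.
    return lam * conf
```

Usage: the returned value is the preset threshold compared against the coincidence values in step S440.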
  • Hyperparameters are parameters whose values are set before the learning process begins, rather than parameter data obtained through training. Usually, hyperparameters need to be optimized so that an optimal set of hyperparameters is selected for the learning machine to improve learning performance and effect; the selection and setting of hyperparameters is a conventional technique in the art and is not repeated here.
  • the neural network model is any one of YOLOv4, YOLOv3 and YOLOv5s.
  • the product-type Focal loss function shown in formula (9) is used to train a proprietary data set on YOLOv5s, as the classification-loss part of the training process, with α set to 0.25 and γ set to 2; training for 20 epochs achieves 85.1% AP on 10,000 test images, and training time on the same platform is shortened by 27%.
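As a hedged illustration of the log-free loss, the product-type form with m = 1 might be sketched as below. The published formula (9) appears only as an image in the source; the exponent (γ + 1) and the α / (1 - α) weighting are assumptions chosen to match the stated properties: no log operation unit, and a power-series decay of the contribution of correctly classified pixels.

```python
def product_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Assumed single-pixel sketch of the product-type Focal loss (m = 1).
    p: predicted probability of the pixel; y: effective value of the real
    sample (1 positive, 0 negative); alpha: sample proportion balance
    factor; gamma: sample loss adjustment factor. No log term appears,
    so the per-sample cost is a plain product of factors."""
    if y == 1:
        return alpha * (1.0 - p) ** (gamma + 1.0)
    return (1.0 - alpha) * p ** (gamma + 1.0)
```

With α = 0.25 and γ = 2, as in the experiment above, a well-classified positive pixel contributes almost nothing while a misclassified one dominates, mirroring the behavior of the standard Focal loss without the log unit.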
  • Fig. 3 is a structural block diagram of a detection device in some embodiments of the present invention.
  • a detection device is also provided. Referring to FIG. 3 , it includes: a storage 1 and a processor 2, the processor 2 is suitable for loading and executing instructions of a software program, and the storage 1 is suitable for storing a software program comprising instructions for performing the following steps:
  • Construct a product-type Focal loss function, use the product-type Focal loss function to perform model training on a neural network model and output a trained neural network model, for application in an image detection method based on a humanoid image data set;
  • the construction method of the product-type Focal loss function comprises the following steps: setting a weight value, where in the expression of the weight value:
  • W is the weight value
  • m is the adjustment parameter
  • P i is the predicted probability value of the i-th pixel in the feature map output by the network model
  • γ is the sample loss adjustment factor
  • y i is the effective value of the real sample
  • setting a sample proportion balance factor α; the product-type Focal loss function is constructed from W and α.
  • the software program also includes instructions for performing the following steps: S100: mark the humanoid image data set and divide it into a training set, a verification set and a test set;
  • S200 Perform data preprocessing on the training set, the verification set, and the test set;
  • S300 Use the model training method to perform model training and output the trained neural network model.
  • after the software program executes the instructions of step S300, it further includes instructions for performing the following steps:
  • S400 Input the test set into the trained neural network model to obtain a second model output result, optimize the second model output result using the NMS strategy to obtain a final effect, then perform a compliance evaluation test on the final effect and judge whether the obtained evaluation result reaches the expected effect;
  • S500 Deploy the neural network model whose evaluation result reaches the expected effect on the chip, and output the effect.
  • the software program further includes instructions for performing the optimization method of the NMS strategy used in step S300 (optimizing the model output result with the NMS strategy) and in step S400 (optimizing the model output result with the NMS strategy to obtain the final effect):
  • S410 Provide a candidate frame set and a spare candidate frame set;
  • S420 Initialize the candidate frame set as an empty set, and initialize all candidate frames in the spare candidate frame set to obtain several frames to be processed;
  • S430 Sort the frames to be processed by confidence, and select the frame with the highest confidence as the first frame to be processed;
  • S440 Calculate the degree of coincidence between the first frame to be processed and each of the other frames to be processed to obtain several coincidence values, and compare these coincidence values with a preset threshold to obtain the frames to be deleted;
  • S450 Delete the frames to be deleted from the spare candidate frame set;
  • S460 Repeat the processing from step S430 to step S450 until the spare candidate frame set is an empty set.


Abstract

A model training method: a product-type Focal loss function is constructed, and the product-type Focal loss function is used to perform model training on a neural network model and output a trained neural network model. The construction method of the product-type Focal loss function comprises the following steps: setting a weight value, to solve the problem that existing loss functions all contain a log operation unit, whose high computational complexity slows model convergence; setting a sample proportion balance factor α; and constructing the product-type Focal loss function from W and α. This reduces computational complexity and increases operation speed; while the contribution of misclassified target individuals to the loss function grows as a power series, the contribution of correctly classified target individuals decays as a power series, so that the product-type Focal loss function reflects the discrimination of the feature map as a whole. An image detection method and a detection device are also provided.

Description

模型训练方法、图像检测方法及检测装置
交叉引用
本申请要求2021年6月16日提交的申请号为202110663586X的中国专利申请的优先权。上述申请的内容以引用方式被包含于此。
技术领域
本发明涉及图像处理技术领域,尤其涉及一种模型训练方法、图像检测方法及检测装置。
技术背景
人形检测是指在图像中检测是否有人形,对人形图像进行特征提取,通过提取的特征来对人形进行检测。人形检测是计算机视觉中的重要研究课题,被广泛应用于智能视频监控、车辆辅助驾驶、智能交通、智能机器人等领域。主流的人形检测方法分为基于人工图像特征的统计学习方法和基于人工神经网络的深度学习方法。深度学习方法包括损失函数,损失函数作为衡量模型预测值与真实值间不一致性的一种手段,对于模型训练过程中的自动参数调节至关重要。在神经网络训练过程,数据量往往比较庞大,对算力要求较高,而现有损失函数中往往采用交叉熵损失函数和Focal loss函数,但其均含有log运算单元,计算复杂度较高,拖慢了模型收敛速度。
公开号为CN111860631A的中国专利申请公开了一种采用错因强化方式优化损失函数的方法,其通过在原来的交叉熵损失函数的基础上加入惩罚项,调节相关性对交叉熵损失函数影响的强弱,提高了模型对物品识别的精度,能够提高深度学习网络模型的识别准确程度。但是该优化后的损失函数依然含有log运算单元,计算复杂度较高,运行速度较慢。
公开号为CN112419269A的中国专利申请公开了一种提高道面病害分割效果的改进型Focal Loss函数的构建方法及应用,包括:设定Focal Loss函数的权值w;预设阈值β,并将权值w转换成分段函数w';利用分段函数w'对Focal Loss函数进行优化,得到改进型Focal Loss函数。通过上述方案,该发明具有分类准确、抑制错误标注带来的干扰等优点,在图像处理技术领域具有很高的实用价值和推广价值。但是该专利中的改进型Focal Loss函数依然含有log运算单元,计算复杂度较高,拖慢了模型收敛速度。
因此,有必要提供一种新型的模型训练方法、图像检测方法及检测装置以解决现有技术中存在的上述问题。
发明概要
本发明的目的在于提供一种模型训练方法、图像检测方法及检测装置,以解决现有的损失函数含有log运算单元,计算复杂度较高,拖慢了模型收敛速度的问题。
为实现上述目的,本发明的所述模型训练方法,构建乘积型Focal loss函数,使用所述乘积型Focal loss函数对神经网络模型进行模型训练并输出训练好的神经网络模型,以应用于基于人形图像数据集进行的图像检测方法;
所述乘积型Focal loss函数的构建方法包括以下步骤:
设定权重值,所述权重值的表达式为:
Figure PCTCN2022098880-appb-000001
其中,W为所述权重值,m为调整参数,P i是网络模型输出的特征图中第i个像素点的预测概率值,γ为样本损失调整因子,y i是真实样本的有效值,当y i=1,得到的权重值为正样本的权重值,当y i=0,得到的权重值为负样本的权重值;
设定样本比例平衡因子α;
通过W和α构建所述乘积型Focal loss函数。
本发明的模型训练方法的有益效果在于:通过设定权重值,所述权重值的表达式为:
Figure PCTCN2022098880-appb-000002
,使得乘积型Focal loss函数中不含对数,解决现有的损失函数含有log运算单元,计算复杂度较高,拖慢了模型收敛速度的问题,而且通过采用样本损失调整因子γ,以平衡简单与困难样本,减少易分类样本的损失,使得乘积型Focal loss函数在计算中更关注于困难的、错分的样本;通过设定样本比例平衡因子α,以平衡正负样本本身的比例不均,解决了普通的交叉熵损失函数中正样本的输出概率越大损失越小,负样本的输出概率越小则损失越小,导致交叉熵损失函数在大量简单样本的迭代过程中比较缓慢且可能无法优化至最优的问题;通过W和α构建所述乘积型Focal loss函数,不仅降低了计算的复杂度,提高了运算速度,而且解决了分类错误的目标个体对损失函数的贡献乘幂级数增大的同时,也考虑到了分类正确的目标个体对损失函数贡献呈现幂级数降低,使得所述乘积型Focal loss函数反应的是特征图整体的判别情况。
优选的,所述乘积型Focal loss函数的表达式为:
Figure PCTCN2022098880-appb-000003
其中,L fl-new为所述乘积型Focal loss函数,当y i=1,得到的乘积型Focal loss函数为正样本的乘积型Focal loss函数,当y i=0,得到的乘积型Focal loss函数为负样本的乘积型Focal loss函数。其有益效果在于:通过将log运算单元去掉,使用乘积型的运算单元,减少了算法复杂度,提高了运算速度,而且解决了分类错误的目标个体对损失函数的贡献乘幂级数增大的同时,也考虑到了分类正确的目标个体对损失函数贡献呈现幂级数降低,使得所述乘积型Focal loss函数反应的是特征图整体的判别情况。
优选的,通过W和α构建所述乘积型Focal loss函数之后,再进行反向传播计算和权重系数调整。其有益效果在于:以提高模型的泛化能力。
优选的,所述调整参数的取值范围为0.5-1.2。其有益效果在于:m大于1.2,会使得m取值过大,在进行连乘运算时超出计算极限,增加算法复杂度,m小于0.5,会使得m取值过小,导致得出的结果没有意义。
优选的,所述调整参数m取值为1,所述乘积型Focal loss函数的表达式为:
Figure PCTCN2022098880-appb-000004
。其有益效果在于:所述调整参数m取值为1,不会越界,也不会过小,使得模型容易训练,更容易达到预设目标。
优选的,所述γ的取值大于0,所述α的取值为0.1-0.9。其有益效果在于:γ大于0能有效减少了易分类样本的损失,使得乘积型Focal loss函数在计算中更关注于困难的、错分的样本,所述α的取值为0.1-0.9,使得平衡了正负样本本身的比例不均,α取值大于0.9会导致正样本的比例过多,α取值小于0.1会导致负样本的比例过多。
优选的,本发明还提供一种图像检测方法,包括执行以下步骤:
S100:对人形图像数据集进行标注并分为训练集、验证集和测试集;
S200:对所述训练集、所述验证集和所述测试集进行数据预处理;
S300:使用所述的模型训练方法进行模型训练并输出训练好的神经网络模型。
本发明的图像检测方法的有益效果在于:通过步骤S100:对人形图像数据集进行标注并分为训练集、验证集和测试集,步骤S200:对所述训练集、所述验证集和所述测试集进行数据预处理,以对人形图像数据集进行预处理,通过步骤S300:使用所述的模型训练方法进行模型训练并输出训练 好的神经网络模型,使得可以将log运算单元去掉,同时使用乘积型的运算单元,不仅降低了计算的复杂度,提高了运算速度,而且解决了分类错误的目标个体对损失函数的贡献乘幂级数增大的同时,也考虑到了分类正确的目标个体对损失函数贡献呈现幂级数降低,使得所述乘积型Focal loss函数反应的是特征图整体的判别情况。
优选的,所述步骤S300具体包括以下步骤:采用所述乘积型Focal loss函数在所述训练集上对所述神经网络模型进行若干代模型训练后,将所述验证集输入到所述神经网络模型得到第一模型输出结果,然后使用NMS策略优化所述第一模型输出结果,再根据优化后的所述第一模型输出结果得到训练好的神经网络模型。
优选的,所述步骤S300执行完毕后,还包括执行以下步骤:
S400:将所述测试集输入到所述训练好的神经网络模型中得到第二模型输出结果后,采用NMS策略优化所述第二模型输出结果得到最终效果,然后对最终效果进行达标评估测试,并判断得到的评估结果是否达到预期效果;
S500:将评估结果达到预期效果的神经网络模型部署在芯片上,进行效果输出。
优选的,所述步骤S300和所述步骤S400中,使用NMS策略进行优化的方法包括以下步骤:
S410:提供候选框集合和备用候选框集合;
S420:将所述候选框集合初始化为空集合,对所述备用候选框集合中的所有候选框进行初始化得到若干待处理框;
S430:对所述若干待处理框按照置信度进行排序，选取置信度最高的待处理框为第一待处理框；
S440:对所述第一待处理框与所述若干待处理框中除所述第一待处理框外的其它待处理框进行重合度计算以得到若干重合度值,将所述若干重合度值与预设阈值进行比对得到待删除处理框;
S450:将所述待删除处理框从所述备用候选框集合中删除;
S460:重复所述步骤S430至所述步骤S450的处理,直至所述备用候选框集合为空集合。其有益效果在于:使得所述候选框集合中得到的待处理框都没有重合度相同的待处理框,从而解决了训练后的模型存在一人两框的问题。
优选的,所述步骤S440中,将所述若干重合度值与预设阈值进行比对得到待删除处理框的步骤包括:将重合度值高于所述预设阈值的待处理框选取为所述待删除处理框。其有益效果在于:待处理框的重合度值高于所述预设阈值说明其与所述第一待处理框的重合度比较高,因此需要从所述备用候选框集合中删除与所述第一待处理框相比重合度值较大的待处理框,以解决一人两框的问题。
优选的,所述步骤S430执行完毕后进行步骤S431,所述步骤S431执行完毕后进行所述步骤S440;所述步骤S431包括:根据所述第一待处理框的置信度获得所述预设阈值。其有益效果在于:使得获得的所述预设阈值能形成更精准的对比依据,由此获得的待删除处理框更精准,无误差。
优选的,所述预设阈值的选取公式为:
Figure PCTCN2022098880-appb-000005
其中,S i为预设阈值,S 0为预设初始值,conf为置信度,λ为调节参数。其有益效果在于:当所述第一待处理框的置信度大于零且小于所述预设初始值,引进调节参数,进行人工干预调节置信度的强度,避免在所述第一待处理框的置信度过低时,只依靠所述第一待处理框的置信度而导致影响所述预设阈值的选取结果,使得得到的所述预设阈值更可靠;当所述第一待处理框的置信度大于等于所述预设初始值,说明所述第一待处理框的置信度比较大,本身可信度比较高,此时将所述预设初始值设为所述预设阈值,能相对保留更多置信度较高的相对较大重合度值的待处理框。
优选的,所述调节参数的取值范围为0.5-0.75。其有益效果在于:所述调节参数值低于0.5会导致选取的待处理框置信度过低,所述调节参数值的取值大于0.5可以抑制选取过低的置信度的待处理框,所述调节参数值高于0.8会导致选取的待处理框置信度太高,会出现漏检的情况。
优选的,所述预设初始值的取值范围为0.2-0.8。其有益效果在于:所述预设初始值低于0.2会导致选取的待处理框置信度过低,所述预设初始值的取值大于0.2可以抑制选取过低的置信度的待处理框,所述预设初始值高于0.8会导致选取的待处理框置信度太高,会出现漏检的情况。
优选的,所述步骤S430执行完毕后进行步骤S432,所述步骤S432执行完毕后进行所述步骤S440;所述步骤S432包括:将所述第一待处理框从所述备用候选框集合移动到所述候选框集合中。
优选的,所述步骤S450执行完毕后进行步骤S451,所述步骤S451执行完毕后进行所述步骤S460;所述步骤S451包括:将所述第一待处理框从所述备用候选框集合移动到所述候选框集合中。
优选的,本发明还提供一种检测装置,包括:
处理器,适于加载并执行软件程序的指令;
储存器,适于存储软件程序,所述软件程序包括用于执行以下步骤的指令:
构建乘积型Focal loss函数,使用所述乘积型Focal loss函数对神经网络模型进行模型训练并输出训练好的神经网络模型,以应用于基于人形图像数据集进行的图像检测方法;
所述乘积型Focal loss函数的构建方法包括以下步骤:
设定权重值,所述权重值的表达式为:
Figure PCTCN2022098880-appb-000006
其中,W为所述权重值,m为调整参数,P i是网络模型输出的特征图中第i个像素点的预测概率值,γ为样本损失调整因子,y i是真实样本的有效 值,当y i=1,得到的权重值为正样本的权重值,当y i=0,得到的权重值为负样本的权重值;
设定样本比例平衡因子α;
通过W和α构建所述乘积型Focal loss函数。
本发明的检测装置的有益效果在于:通过处理器加载并执行软件程序的指令,储存器存储软件程序,使得检测装置能配备进行模型训练方法,通过设定设定权重值,使得乘积型Focal loss函数中不含对数,解决现有的损失函数含有log运算单元,计算复杂度较高,拖慢了模型收敛速度的问题;而且通过采用样本损失调整因子γ,以平衡简单与困难样本,以减少易分类样本的损失,使得乘积型Focal loss函数在计算中更关注于困难的、错分的样本,通过设定样本比例平衡因子α,以平衡正负样本本身的比例不均,解决了普通的交叉熵损失函数中正样本的输出概率越大损失越小,负样本的输出概率越小则损失越小,导致交叉熵损失函数在大量简单样本的迭代过程中比较缓慢且可能无法优化至最优的问题;通过W和α构建所述乘积型Focal loss函数,不仅降低了计算的复杂度,提高了运算速度,而且解决了分类错误的目标个体对损失函数的贡献乘幂级数增大的同时,也考虑到了分类正确的目标个体对损失函数贡献呈现幂级数降低,使得所述乘积型Focal loss函数反应的是特征图整体的判别情况。
优选的,所述软件程序还包括用于执行以下步骤的指令:
S100:对人形图像数据集进行标注并分为训练集、验证集和测试集;
S200:对所述训练集、所述验证集和所述测试集进行数据预处理;
S300:使用所述的模型训练方法进行模型训练并输出训练好的神经网络模型。其有益效果在于:以将log运算单元去掉,同时使用乘积型的运算单元,不仅降低了计算的复杂度,提高了运算速度,而且解决了分类错误的目标个体对损失函数的贡献乘幂级数增大的同时,也考虑到了分类正确的目标个体对损失函数贡献呈现幂级数降低,使得所述乘积型Focal loss函数反应的是特征图整体的判别情况。
优选的,所述软件程序执行所述步骤S300的指令之后,还包括用于执行以下步骤的指令:
S400:将所述测试集输入到所述训练好的神经网络模型中得到第二模型输出结果后,采用NMS策略优化所述第二模型输出结果得到最终效果,然后对最终效果进行达标评估测试,并判断得到的评估结果是否达到预期效果;
S500:将评估结果达到预期效果的神经网络模型部署在芯片上,进行效果输出。
附图说明
图1为本发明的一些实施例中图像检测方法的流程图;
图2为本发明的一些实施例中NMS策略的优化方法的流程图;
图3为本发明的一些实施例中检测装置的结构框图。
发明内容
为使本发明的目的、技术方案和优点更加清楚,下面将结合本发明的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明的一部分实施例,而不是全部的实施例。基于本发明中的实 施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。除非另外定义,此处使用的技术术语或者科学术语应当为本发明所属领域内具有一般技能的人士所理解的通常意义。本文中使用的“包括”等类似的词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或者物件及其等同,而不排除其他元件或者物件。
针对现有技术存在的问题,本发明的实施例提供了一种图像检测方法,图1为本发明的一些实施例中图像检测方法的流程图,参照图1,包括以下步骤:
S100:对人形图像数据集进行标注并分为训练集、验证集和测试集;
S200:对所述训练集、所述验证集和所述测试集进行数据预处理;
S300:使用所述的模型训练方法进行模型训练并输出训练好的神经网络模型;
S400:将所述测试集输入到所述训练好的神经网络模型中得到第二模型输出结果后,采用NMS策略优化所述第二模型输出结果得到最终效果,然后对最终效果进行达标评估测试,并判断得到的评估结果是否达到预期效果;
S500:将评估结果达到预期效果的神经网络模型部署在芯片上,进行效果输出。
在神经网络训练过程,数据量往往比较庞大,对算力要求较高,而现有技术中所述步骤S300一般是采用交叉熵损失函数和Focal loss函数进行模型训练,并输出训练结果。
交叉熵损失函数,以二分类为例,原始的分类loss是各个训练样本交叉熵的直接求和,如公式(1)所示:
Figure PCTCN2022098880-appb-000007
其中,L ce为交叉熵损失函数,P i是网络模型输出的特征图中第i个像素点的预测概率值,y i是真实样本的有效值,当y i=1,得到的交叉熵损失函数为正样本的交叉熵损失函数,当y i=0,得到的交叉熵损失函数为负样本的交叉熵损失函数。
交叉熵损失函数对于正样本而言,输出概率越大损失越小,对于负样本而言,输出概率越小则损失越小,而且所述交叉熵损失函数在大量简单样本的迭代过程中比较缓慢且可能无法优化至最优。
Focal loss函数是在所述交叉熵损失函数基础上加了一个平衡简单与困难样本的样本损失调整因子γ和平衡正负样本的样本比例平衡因子α,如公式(2)所示:
Figure PCTCN2022098880-appb-000008
其中,L fl为Focal loss函数,P i是网络模型输出的特征图中第i个像素点的预测概率值,γ为样本损失调整因子,α为样本比例平衡因子,y i是真实样本的有效值,当y i=1,得到的Focal loss函数为正样本的Focal loss函数,当y i=0,得到的Focal loss函数为负样本的Focal loss函数。
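For reference, the standard Focal loss of formula (2) for a single pixel can be written directly in code; the log term shown here is exactly what the product-type variant later removes.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Standard single-pixel Focal loss of formula (2).
    p: predicted probability P_i; y: effective value y_i of the real sample;
    alpha: sample proportion balance factor; gamma: sample loss adjustment
    factor. With alpha = 1 and gamma = 0 this reduces to cross entropy."""
    if y == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)
```

Each evaluation performs one log computation, which is the source of the O(k²n) complexity analyzed below.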
对于Focal loss函数,从公式(2)可以知道每计算一个样本的loss,就需要进行一次log计算,由于现存计算机中,逻辑运算单元(ALU)只包含 加法器和乘法器,因此除法和对数运算必须转换成相应的形式。
计算对数ln(x)的传统方法是利用能量系数无限接近其值。ln(x)的能量系数展开式如公式(3)所示:
Figure PCTCN2022098880-appb-000009
在满足计算误差ε(ε>0)的前提下,展开式的前k+1项将被用于计算ln(x)。正整数k的选取直接关系到能量系数的截断误差,k的取值如公式(4)所示:
Figure PCTCN2022098880-appb-000010
由此可以看到,计算ln(x)的时间消耗转换成计算二项式的时间消耗,所述二项式为:
Figure PCTCN2022098880-appb-000011
假设计算机执行一次加法或减法花费t 1s,执行一次乘法或除法花费t 2s,满足条件t 1<t 2,那么交叉熵的计算复杂度可详述如下:
计算函数ln(x)的时间如公式(5)所示:
T[ln]=(4k+3)t 1+(k 2+3k+3)t 2
计算公式(2)Focal loss函数的时间如公式(6)所示:
T[L fl]=((4k+3+γ)t 1+(k 2+3k+5+γ)t 2)n
因此所述Focal loss函数L fl的计算复杂度为O(k 2n)。
从上述分析可知,所述交叉熵损失函数和所述Focal loss函数均含有log运算单元,计算复杂度较高,拖慢了模型收敛速度。
针对现有技术存在的问题,本发明的实施例提供了一种模型训练方法,
构建乘积型Focal loss函数,使用所述乘积型Focal loss函数对神经网络模型进行模型训练并输出训练好的神经网络模型,以应用于基于人形图像数据集进行的图像检测方法;
所述乘积型Focal loss函数的构建方法包括以下步骤:
设定权重值,所述权重值的表达式为:
Figure PCTCN2022098880-appb-000012
其中,W为所述权重值,m为调整参数,P i是网络模型输出的特征图中第i个像素点的预测概率值,γ为样本损失调整因子,y i是真实样本的有效值,当y i=1,得到的权重值为正样本的权重值,当y i=0,得到的权重值为负样本的权重值;
设定样本比例平衡因子α;
通过W和α构建所述乘积型Focal loss函数。
本发明的一些实施例中,所述乘积型Focal loss函数的表达式如公式(7)所示:
Figure PCTCN2022098880-appb-000013
其中,L fl-new为所述乘积型Focal loss函数,当y i=1,得到的乘积型Focal loss函数为正样本的乘积型Focal loss函数,当y i=0,得到的乘积型Focal loss函数为负样本的乘积型Focal loss函数。通过将log运算单元去掉,使用乘积型的运算单元,减少了算法复杂度,提高了运算速度,而且解决了分类错误的目标个体对损失函数的贡献乘幂级数增大的同时,也考虑到了分类正确的目标个体对损失函数贡献呈现幂级数降低,使得所述乘积型Focal loss函数反应的是特征图整体的判别情况。
本发明的一些实施例中,通过W和α构建所述乘积型Focal loss函数之后,再进行反向传播计算和权重系数调整,以提高模型的泛化能力。即采用反向传播算法进行计算,反向传播算法(Backpropagation algorithm,简称BP算法)是一种监督学习算法,常被用来训练多层感知机,主要由两个环节(激励传播、权重更新)反复循环迭代,直到网络的对输入的响应达到预定的目标范围为止。每次迭代中的激励传播环节包含两步:第一阶段、前向传播阶段,将训练输入送入网络以获得激励响应;第二阶段、反向传播阶段,将激励响应同训练输入对应的目标输出求差,从而获得隐层和输出层的响应误差。权重更新环节主要对于每个突触上的权重,按照以下步骤进行更新: 将输入激励和响应误差相乘,从而获得权重的梯度;将这个梯度乘上一个比例并取反后加到权重上。这个比例将会影响到训练过程的速度和效果,因此称为“训练因子”。梯度的方向指明了误差扩大的方向,因此在更新权重的时候需要对其取反,从而减小权重引起的误差。
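The per-synapse weight update described above (gradient = input activation × response error, scaled by the training factor and negated) reduces to a single rule. The function name and scalar formulation are illustrative only, not part of the patent.

```python
def update_weight(weight, input_activation, error, lr=0.1):
    """One backpropagation weight update as described above: multiply the
    input activation by the response error to obtain the gradient, scale it
    by the training factor lr, negate it, and add it to the weight, moving
    against the direction in which the error grows."""
    grad = input_activation * error
    return weight + (-lr * grad)
```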
所述Focal loss函数和所述乘积型Focal loss函数都同样刻画出了图像分类的损失,但是所述乘积型Focal loss函数在解决了分类错误的目标个体对损失函数的贡献乘幂级数增大的同时,也考虑到了分类正确对损失函数贡献呈现幂级数降低,这样的损失函数反应特征图整体的判别情况,再通过反向传播和权重系数的调整,最终提高模型的泛化能力。而从计算量上看所述乘积型Focal loss函数不包含对数项,计算所述乘积型Focal loss函数的时间消耗如公式(8)所示:
T[L fl-new]=(γ+1)t 1+(γ+3+n)t 2
因此所述乘积型Focal loss函数Lfl-new的计算复杂度为O(n)。
本发明的一些实施例中,所述调整参数的取值范围为0.5-1.2,m大于1.2,会使得m取值过大,在进行连乘运算时超出计算极限,增加算法复杂度,m小于0.5,会使得m取值过小,导致得出的结果没有意义。
本发明的另一些实施例中,所述调整参数m取值为1,所述乘积型Focal loss函数的表达式如公式(9)所示:
Figure PCTCN2022098880-appb-000014
其中,P i是网络模型输出的特征图中第i个像素点的预测概率值,γ为样本损失调整因子,α为样本比例平衡因子,y i是真实样本的有效值,L fl-new为所述乘积型Focal loss函数,当y i=1,得到的乘积型Focal loss函数为正样本的乘积型Focal loss函数,当y i=0,得到的乘积型Focal loss函数为负样本的乘积型Focal loss函数,所述调整参数m取值为1,不会越界,也不会过小,使得容易训练,更容易达到预设目标。
本发明的一些实施例中,所述γ的取值大于0,所述α的取值为0.1-0.9。γ大于0能有效减少了易分类样本的损失,使得乘积型Focal loss函数在计算中更关注于困难的、错分的样本,所述α的取值为0.1-0.9,使得平衡了正负样本本身的比例不均,α取值大于0.9会导致正样本的比例过多,α取值小于0.1会导致负样本的比例过多。
本发明的一些实施例中,所述步骤S100中对人形图像数据集进行标注并分为训练集、验证集和测试集包括:采集现实环境中在摄像头下拍摄的不同环境、不同背景、不同姿态、不同位置的图像形成人形图像数据集,并使用标注工具生成对应目标的标注框位置和标签信息,类别为人的标签信息为human;将标注后的人形图像数据集分为训练集、验证集和测试集;对训练 集图像生成列表并打乱排列顺序;对所有图像中的标签对应的目标框进行聚类,生成12个聚类点。
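The clustering of label target boxes into 12 cluster points in step S100 might be sketched with plain k-means over (width, height) pairs. The patent does not name the clustering algorithm or distance; Euclidean k-means is an assumption (YOLO-family implementations often use an IoU-based distance instead).

```python
import random

def kmeans_anchors(sizes, k=12, iters=50, seed=0):
    """Cluster (width, height) pairs of label boxes into k anchor
    'cluster points'. sizes: list of (w, h) tuples. Returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(sizes, k)  # initialize centers from the data
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for w, h in sizes:
            # assign each box to its nearest center (squared Euclidean distance)
            j = min(range(k),
                    key=lambda c: (w - centers[c][0]) ** 2 + (h - centers[c][1]) ** 2)
            groups[j].append((w, h))
        for j, g in enumerate(groups):
            if g:  # move each center to the mean of its assigned boxes
                centers[j] = (sum(p[0] for p in g) / len(g),
                              sum(p[1] for p in g) / len(g))
    return centers
```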
本发明的一些实施例中,所述步骤S200中对所述训练集、所述验证集和所述测试集进行数据预处理包括采用图像RGB通道进行归一化操作,且对每个通道进行如公式(11)所示的操作:
Figure PCTCN2022098880-appb-000015
X i为R,G,B
R、G、B分别表示红色、绿色、蓝色的通道,RGB色彩模式是工业界的一种颜色标准,是通过对红(R)、绿(G)、蓝(B)三个颜色通道的变化以及它们相互之间的叠加来得到各式各样的颜色的,RGB即是代表红、绿、蓝三个通道的颜色,这个标准几乎包括了人类视力所能感知的所有颜色,是目前运用最广的颜色系统之一。采用图像RGB通道进行归一化操作为本领域的常规技术,在此不再赘述。
本发明的一些实施例中,所述步骤S200中在进行归一化操作后还包括将图像随机进行图像水平翻转、最小包含图像目标区域的图像裁剪、饱和度倍数随机在[1/1.5,1.5]调整、曝光度倍数随机在[1/1.5,1.5]调整、色调倍数随机在[1/1.2,1.2]调整、图像随机在[-30,30]的角度按照中心点旋转;以上所有的随机概率均为50%。
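The augmentation parameters listed above can be sampled as follows, each applied with probability 50%; applying them to pixel data is left to an image library and is not shown. The dictionary keys and the omission of the crop step are illustrative choices, not part of the patent.

```python
import random

def sample_augmentation(rng=None):
    """Sample one set of the listed augmentation parameters: horizontal
    flip, saturation/exposure multipliers in [1/1.5, 1.5], hue multiplier
    in [1/1.2, 1.2], rotation about the centre in [-30, 30] degrees.
    Each augmentation fires with probability 50%; a multiplier of 1.0
    or a rotation of 0.0 means 'not applied'."""
    rng = rng or random.Random()

    def maybe(lo, hi):
        return rng.uniform(lo, hi) if rng.random() < 0.5 else 1.0

    return {
        "flip": rng.random() < 0.5,
        "saturation": maybe(1 / 1.5, 1.5),
        "exposure": maybe(1 / 1.5, 1.5),
        "hue": maybe(1 / 1.2, 1.2),
        "rotation": rng.uniform(-30.0, 30.0) if rng.random() < 0.5 else 0.0,
    }
```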
本发明的一些实施例中,所述步骤S300具体包括以下步骤:采用所述乘积型Focal loss函数在所述训练集上对所述神经网络模型进行若干代模型 训练后,将所述验证集输入到所述神经网络模型得到第一模型输出结果,然后使用NMS策略优化所述第一模型输出结果,再根据优化后的所述第一模型输出结果得到训练好的神经网络模型。本发明的一些优选实施例中,使用NMS策略优化所述第一模型输出结果,以此评估所述神经网络模型的性能,并在所述神经网络模型的性能表现不佳时,进行模型优化,最终得到训练好的神经网络模型。所述模型优化包括针对人形网络训练模型进行神经网络结构调整,该调整指按照1.25的倍数扩充卷积核的个数,以及通过添加该场景的图像数据扩充训练集,再进行模型训练,以达到优化模型的目的,最终得到训练好的神经网络模型。所述进行模型优化为本领域的公知常识,在此不再赘述。
本发明的一些实施例中,所述步骤S400中进行达标评估测试是在训练所用测试集上采用摄像头进行场景实测,并和选择的市面上的产品进行对比,参考图1,效果优于所选择的市面上的产品的效果的平均值,判断为评估结果达到预期效果,则进行所述步骤S500将评估结果达到预期效果的神经网络模型部署在芯片上,进行效果输出,否则重复进行所述步骤S100-S400的处理。
图2为本发明的一些实施例中NMS策略的优化方法的流程图;本发明的一些实施例中,参考图2,所述步骤S300和所述步骤S400中,使用NMS策略进行优化的方法包括以下步骤:
S410:提供候选框集合和备用候选框集合;
S420:将所述候选框集合初始化为空集合,对所述备用候选框集合中的所有候选框进行初始化得到若干待处理框;
S430:对所述若干待处理框按照置信度进行排序,选取置信度最高的待处理框为第一待处理框;
S440:对所述第一待处理框与所述若干待处理框中除所述第一待处理框外的其它待处理框进行重合度计算以得到若干重合度值,将所述若干重合度值与预设阈值进行比对得到待删除处理框;
S450:将所述待删除处理框从所述备用候选框集合中删除;
S460:重复所述步骤S430至所述步骤S450的处理,直至所述备用候选框集合为空集合。使得所述候选框集合中得到的待处理框都没有重合度相同的待处理框,从而解决了训练后的模型存在一人两框的问题。
本发明的一些实施例中,所述步骤S430执行完毕后进行步骤S432,所述步骤S432执行完毕后进行所述步骤S440;所述步骤S432包括:将所述第一待处理框从所述备用候选框集合移动到所述候选框集合中,即先将所述第一待处理框从所述备用候选框集合移动到所述候选框集合中,再进行后续的重合度计算和比对处理。
本发明的另一些实施例中,所述步骤S450执行完毕后进行步骤S451,所述步骤S451执行完毕后进行所述步骤S460;所述步骤S451包括:将所述第一待处理框从所述备用候选框集合移动到所述候选框集合中,即先进行重合度计算和比对处理,再将所述第一待处理框从所述备用候选框集合移动到所述候选框集合中。
本发明的一些实施例中,所述步骤S440中,将所述若干重合度值与预设阈值进行比对得到待删除处理框的步骤包括:将重合度值高于所述预设阈值的待处理框选取为所述待删除处理框。待处理框的重合度值高于所述预设 阈值说明其与所述第一待处理框的重合度比较高,因此需要从所述备用候选框集合中删除与所述第一待处理框相比重合度值较大的待处理框,以解决一人两框的问题。
本发明的一些实施例中,所述步骤S430执行完毕后进行步骤S431,所述步骤S431执行完毕后进行所述步骤S440;所述步骤S431包括:根据所述第一待处理框的置信度获得所述预设阈值。使得获得的所述预设阈值能形成更精准的对比依据,由此获得的待删除处理框更精准,无误差。
本发明的一些实施例中,所述预设阈值的选取公式如公式(10)所示:
Figure PCTCN2022098880-appb-000016
其中,S i为预设阈值,S 0为预设初始值,conf为置信度,λ为调节参数。当所述第一待处理框的置信度大于零且小于所述预设初始值,引进调节参数,进行人工干预调节置信度的强度,避免在所述第一待处理框的置信度过低时,只依靠所述第一待处理框的置信度而导致影响所述预设阈值的选取结果,使得得到的所述预设阈值更可靠;所述第一待处理框的置信度大于等于所述预设初始值,说明所述第一待处理框的置信度比较大,本身可信度比较高,此时将所述预设初始值设为所述预设阈值,能相对保留更多置信度较高的相对较大重合度值的待处理框。
本发明的一些实施例中,所述调节参数的取值范围为0.5-0.75。所述调节参数值低于0.5会导致选取的待处理框置信度过低,所述调节参数值的取值大于0.5可以抑制选取过低的置信度的待处理框,所述调节参数值高于0.8 会导致选取的待处理框置信度太高,会出现漏检的情况。
本发明的一些实施例中,所述预设初始值S的取值范围为0.2-0.8,所述预设初始值为人工设定的超参数。所述预设初始值低于0.2会导致选取的待处理框置信度过低,所述预设初始值的取值大于0.2可以抑制选取过低的置信度的待处理框,所述预设初始值高于0.8会导致选取的待处理框置信度太高,会出现漏检的情况。本发明的一些具体实施例中,所述预设初始值选取为0.5。超参数是在开始学习过程之前设置值的参数,而不是通过训练得到的参数数据,通常情况下,需要对超参数进行优化,给学习机选择一组最优超参数,以提高学习的性能和效果,对于超参数的选取和设定为本领域的常规技术,在此不再赘述。
本发明的一些实施例中,所述神经网络模型为YOLOv4,YOLOv3和YOLOv5s中的任意一种。
本发明的一些具体实施例中,采用如公式(9)所示乘积型Focal loss函数在YOLOv5s训练自有数据集,作为训练过程中的分类损失函数部分,α为0.25,γ为2,训练20个epoch,在1W张测试图片能达到85.1%AP,同一平台上训练时间缩短了27%。
图3为本发明的一些实施例中检测装置的结构框图。本发明的一些实施例中,还提供一种检测装置,参考图3,包括:储存器1和处理器2,所述处理器2适于加载并执行软件程序的指令,所述储存器1适于存储软件程序,所述软件程序包括用于执行以下步骤的指令:
构建乘积型Focal loss函数,使用所述乘积型Focal loss函数对神经网络模型进行模型训练并输出训练好的神经网络模型,以应用于基于人形图像数据集进行的图像检测方法;
所述乘积型Focal loss函数的构建方法包括以下步骤:
设定权重值,所述权重值的表达式为:
Figure PCTCN2022098880-appb-000017
其中,W为所述权重值,m为调整参数,P i是网络模型输出的特征图中第i个像素点的预测概率值,γ为样本损失调整因子,y i是真实样本的有效值,当y i=1,得到的权重值为正样本的权重值,当y i=0,得到的权重值为负样本的权重值;
设定样本比例平衡因子α;
通过W和α构建所述乘积型Focal loss函数。
本发明的一些实施例中,所述软件程序还包括用于执行以下步骤的指令:S100:对人形图像数据集进行标注并分为训练集、验证集和测试集;
S200:对所述训练集、所述验证集和所述测试集进行数据预处理;
S300:使用所述的模型训练方法进行模型训练并输出训练好的神经网络模型。
本发明的一些实施例中,所述软件程序执行所述步骤S300的指令之后,还包括用于执行以下步骤的指令:
S400:将所述测试集输入到所述训练好的神经网络模型中得到第二模型输出结果后,采用NMS策略优化所述第二模型输出结果得到最终效果,然后对最终效果进行达标评估测试,并判断得到的评估结果是否达到预期效果;
S500:将评估结果达到预期效果的神经网络模型部署在芯片上,进行效果输出。
本发明的一些实施例中,所述软件程序还包括用于执行所述步骤S300使用NMS策略优化所述模型输出结果和所述步骤S400采用NMS策略优化所述模型输出结果得到最终效果中的所述NMS策略的优化方法的指令:
S410:提供候选框集合和备用候选框集合;
S420:将所述候选框集合初始化为空集合,对所述备用候选框集合中的所有候选框进行初始化得到若干待处理框;
S430:对所述若干待处理框按照置信度进行排序,选取置信度最高的待处理框为第一待处理框;
S440:对所述第一待处理框与所述若干待处理框中除所述第一待处理框外的其它待处理框进行重合度计算以得到若干重合度值,将所述若干重合度值与预设阈值进行比对得到待删除处理框;
S450:将所述待删除处理框从所述备用候选框集合中删除;
S460:重复所述步骤S430到所述步骤S450的处理,直至所述备用候选框集合为空集合。
虽然在上文中详细说明了本发明的实施方式,但是对于本领域的技术人员来说显而易见的是,能够对这些实施方式进行各种修改和变化。但是,应理解,这种修改和变化都属于权利要求书中所述的本发明的范围和精神之内。而且,在此说明的本发明可有其它的实施方式,并且可通过多种方式实施或实现。

Claims (20)

  1. 一种模型训练方法,其特征在于,构建乘积型Focal loss函数,使用所述乘积型Focal loss函数对神经网络模型进行模型训练并输出训练好的神经网络模型,以应用于基于人形图像数据集进行的图像检测方法;
    所述乘积型Focal loss函数的构建方法包括以下步骤:
    设定权重值,所述权重值的表达式为:
    Figure PCTCN2022098880-appb-100001
    其中,W为所述权重值,m为调整参数,P i是网络模型输出的特征图中第i个像素点的预测概率值,γ为样本损失调整因子,y i是真实样本的有效值,当y i=1,得到的权重值为正样本的权重值,当y i=0,得到的权重值为负样本的权重值;
    设定样本比例平衡因子α;
    通过W和α构建所述乘积型Focal loss函数。
  2. 根据权利要求1所述的模型训练方法,其特征在于,所述乘积型Focal loss函数的表达式为:
    Figure PCTCN2022098880-appb-100002
    其中,L fl-new为所述乘积型Focal loss函数,当y i=1,得到的乘积型Focal loss函数为正样本的乘积型Focal loss函数,当y i=0,得到的乘积型Focal loss函数为负样本的乘积型Focal loss函数。
  3. 根据权利要求1所述的模型训练方法,其特征在于,通过W和α构建所述乘积型Focal loss函数之后,再进行反向传播计算和权重系数调整。
  4. 根据权利要求2所述的模型训练方法,其特征在于,所述调整参数的取值范围为0.5-1.2。
  5. 根据权利要求4所述的模型训练方法,其特征在于,所述调整参数m取值为1,所述乘积型Focal loss函数的表达式为:
    Figure PCTCN2022098880-appb-100003
  6. 根据权利要求1所述的模型训练方法,其特征在于,所述γ的取值大于0,所述α的取值为0.1-0.9。
  7. 一种图像检测方法,其特征在于,包括执行以下步骤:
    S100:对人形图像数据集进行标注并分为训练集、验证集和测试集;
    S200:对所述训练集、所述验证集和所述测试集进行数据预处理;
    S300:使用如权利要求1-6任一项所述的模型训练方法进行模型训练并输出训练好的神经网络模型。
  8. 根据权利要求7所述的图像检测方法,其特征在于,所述步骤S300具体包括以下步骤:
    采用所述乘积型Focal loss函数在所述训练集上对所述神经网络模型进行若干代模型训练后,将所述验证集输入到所述神经网络模型得到第一模型输出结果,然后使用NMS策略优化所述第一模型输出结果,再根据优化后的所述第一模型输出结果得到训练好的神经网络模型。
  9. 根据权利要求8所述的图像检测方法,其特征在于,所述步骤S300执行完毕后,还包括执行以下步骤:
    S400:将所述测试集输入到所述训练好的神经网络模型中得到第二模型输出结果后,采用NMS策略优化所述第二模型输出结果得到最终效果,然后对最终效果进行达标评估测试,并判断得到的评估结果是否达到预期效果;
    S500:将评估结果达到预期效果的神经网络模型部署在芯片上,进行效果输出。
  10. 根据权利要求9所述的图像检测方法,其特征在于,所述步骤S300和所述步骤S400中,使用NMS策略进行优化的方法包括以下步骤:
    S410:提供候选框集合和备用候选框集合;
    S420:将所述候选框集合初始化为空集合,对所述备用候选框集合中的所有候选框进行初始化得到若干待处理框;
    S430:对所述若干待处理框按照置信度进行排序,选取置信度最高的待处理框为第一待处理框;
    S440:对所述第一待处理框与所述若干待处理框中除所述第一待处理框外的其它待处理框进行重合度计算以得到若干重合度值,将所述若干重合度值与预设阈值进行比对得到待删除处理框;
    S450:将所述待删除处理框从所述备用候选框集合中删除;
    S460:重复所述步骤S430至所述步骤S450的处理,直至所述备用候选框集合为空集合。
  11. 根据权利要求10所述的图像检测方法,其特征在于,所述步骤S440中,将所述若干重合度值与预设阈值进行比对得到待删除处理框的步骤包括:
    将重合度值高于所述预设阈值的待处理框选取为所述待删除处理框。
  12. 根据权利要求10所述的图像检测方法,其特征在于,所述步骤S430执行完毕后进行步骤S431,所述步骤S431执行完毕后进行所述步骤S440;
    所述步骤S431包括:根据所述第一待处理框的置信度获得所述预设阈值。
  13. 根据权利要求12所述的图像检测方法,其特征在于,所述预设阈值的选取公式为:
    Figure PCTCN2022098880-appb-100004
    其中,S i为预设阈值,S 0为预设初始值,conf为置信度,λ为调节参数。
  14. 根据权利要求13所述的图像检测方法,其特征在于,所述调节参数的取值范围为0.5-0.75。
  15. 根据权利要求13所述的图像检测方法,其特征在于,所述预设初始值的取值范围为0.2-0.8。
  16. 根据权利要求10所述的图像检测方法,其特征在于,所述步骤S430执行完毕后进行步骤S432,所述步骤S432执行完毕后进行所述步骤S440;
    所述步骤S432包括:将所述第一待处理框从所述备用候选框集合移动到所述候选框集合中。
  17. 根据权利要求10所述的图像检测方法,其特征在于,所述步骤S450执行完毕后进行步骤S451,所述步骤S451执行完毕后进行所述步骤S460;
    所述步骤S451包括:将所述第一待处理框从所述备用候选框集合移动到所述候选框集合中。
  18. 一种检测装置,其特征在于,包括:
    处理器,适于加载并执行软件程序的指令;
    储存器,适于存储软件程序,所述软件程序包括用于执行以下步骤的指令:
    构建乘积型Focal loss函数,使用所述乘积型Focal loss函数对神经网络模型进行模型训练并输出训练好的神经网络模型,以应用于基于人形图像数据集进行的图像检测方法;
    所述乘积型Focal loss函数的构建方法包括以下步骤:
    设定权重值,所述权重值的表达式为:
    Figure PCTCN2022098880-appb-100005
    其中,W为所述权重值,m为调整参数,P i是网络模型输出的特征图中第i个像素点的预测概率值,γ为样本损失调整因子,y i是真实样本的有效值,当y i=1,得到的权重值为正样本的权重值,当y i=0,得到的权重值为负样本的权重值;
    设定样本比例平衡因子α;
    通过W和α构建所述乘积型Focal loss函数。
  19. 根据权利要求18所述的检测装置,其特征在于,所述软件程序还包括用于执行以下步骤的指令:
    S100:对人形图像数据集进行标注并分为训练集、验证集和测试集;
    S200:对所述训练集、所述验证集和所述测试集进行数据预处理;
    S300:使用如权利要求1-6任一项所述的模型训练方法进行模型训练并输出训练好的神经网络模型。
  20. 根据权利要求19所述的检测装置,其特征在于,所述软件程序执行所述步骤S300的指令之后,还包括用于执行以下步骤的指令:
    S400:将所述测试集输入到所述训练好的神经网络模型中得到第二模型输出结果后,采用NMS策略优化所述第二模型输出结果得到最终效果,然后对最终效果进行达标评估测试,并判断得到的评估结果是否达到预期效果;
    S500:将评估结果达到预期效果的神经网络模型部署在芯片上,进行效果输出。
PCT/CN2022/098880 2021-06-16 2022-06-15 模型训练方法、图像检测方法及检测装置 WO2022262757A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110663586.X 2021-06-16
CN202110663586.XA CN113111979B (zh) 2021-06-16 2021-06-16 模型训练方法、图像检测方法及检测装置

Publications (1)

Publication Number Publication Date
WO2022262757A1 true WO2022262757A1 (zh) 2022-12-22

Family

ID=76723552

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098880 WO2022262757A1 (zh) 2021-06-16 2022-06-15 模型训练方法、图像检测方法及检测装置

Country Status (2)

Country Link
CN (1) CN113111979B (zh)
WO (1) WO2022262757A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863278A (zh) * 2023-08-25 2023-10-10 摩尔线程智能科技(北京)有限责任公司 模型训练方法、图像分类方法、装置、设备及存储介质
CN117633456A (zh) * 2023-11-17 2024-03-01 国网江苏省电力有限公司 基于自适应焦点损失的海上风电天气事件辨识方法和装置
CN117633456B (zh) * 2023-11-17 2024-05-31 国网江苏省电力有限公司 基于自适应焦点损失的海上风电天气事件辨识方法和装置

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111979B (zh) * 2021-06-16 2021-09-07 上海齐感电子信息科技有限公司 模型训练方法、图像检测方法及检测装置
CN114220016B (zh) * 2022-02-22 2022-06-03 山东融瓴科技集团有限公司 面向开放场景下的无人机航拍图像的域自适应识别方法
CN115827880B (zh) * 2023-02-10 2023-05-16 之江实验室 一种基于情感分类的业务执行方法及装置
CN115965823B (zh) * 2023-02-13 2023-07-25 山东锋士信息技术有限公司 一种基于Focal损失函数的在线困难样本挖掘方法及系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991652A (zh) * 2019-12-02 2020-04-10 北京迈格威科技有限公司 神经网络模型训练方法、装置及电子设备
CN111967305A (zh) * 2020-07-01 2020-11-20 华南理工大学 一种基于轻量级卷积神经网络的实时多尺度目标检测方法
US20210042580A1 (en) * 2018-10-10 2021-02-11 Tencent Technology (Shenzhen) Company Limited Model training method and apparatus for image recognition, network device, and storage medium
CN112819063A (zh) * 2021-01-28 2021-05-18 南京邮电大学 一种基于改进的Focal损失函数的图像识别方法
CN112861871A (zh) * 2021-02-07 2021-05-28 天津理工大学 一种基于目标边界定位的红外目标检测方法
CN113111979A (zh) * 2021-06-16 2021-07-13 上海齐感电子信息科技有限公司 模型训练方法、图像检测方法及检测装置

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409252A (zh) * 2018-10-09 2019-03-01 杭州电子科技大学 一种基于改进型ssd网络的车辆行人多目标检测方法
US10810707B2 (en) * 2018-11-29 2020-10-20 Adobe Inc. Depth-of-field blur effects generating techniques
US11010641B2 (en) * 2019-03-14 2021-05-18 Mapbox, Inc. Low power consumption deep neural network for simultaneous object detection and semantic segmentation in images on a mobile computing device
CN112163520B (zh) * 2020-09-29 2022-02-15 广西科技大学 一种基于改进损失函数的mdssd人脸检测方法
CN112560614A (zh) * 2020-12-04 2021-03-26 中国电子科技集团公司第十五研究所 一种基于候选框特征修正的遥感图像目标检测方法及系统
CN112507861A (zh) * 2020-12-04 2021-03-16 江苏科技大学 一种多层卷积特征融合的行人检测方法
CN112580785B (zh) * 2020-12-18 2022-04-05 河北工业大学 基于三支决策的神经网络拓扑结构优化方法
CN112541483B (zh) * 2020-12-25 2024-05-17 深圳市富浩鹏电子有限公司 Yolo和分块-融合策略结合的稠密人脸检测方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210042580A1 (en) * 2018-10-10 2021-02-11 Tencent Technology (Shenzhen) Company Limited Model training method and apparatus for image recognition, network device, and storage medium
CN110991652A (zh) * 2019-12-02 2020-04-10 北京迈格威科技有限公司 神经网络模型训练方法、装置及电子设备
CN111967305A (zh) * 2020-07-01 2020-11-20 华南理工大学 一种基于轻量级卷积神经网络的实时多尺度目标检测方法
CN112819063A (zh) * 2021-01-28 2021-05-18 南京邮电大学 一种基于改进的Focal损失函数的图像识别方法
CN112861871A (zh) * 2021-02-07 2021-05-28 天津理工大学 一种基于目标边界定位的红外目标检测方法
CN113111979A (zh) * 2021-06-16 2021-07-13 上海齐感电子信息科技有限公司 模型训练方法、图像检测方法及检测装置

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863278A (zh) * 2023-08-25 2023-10-10 摩尔线程智能科技(北京)有限责任公司 模型训练方法、图像分类方法、装置、设备及存储介质
CN116863278B (zh) * 2023-08-25 2024-01-26 摩尔线程智能科技(北京)有限责任公司 模型训练方法、图像分类方法、装置、设备及存储介质
CN117633456A (zh) * 2023-11-17 2024-03-01 国网江苏省电力有限公司 基于自适应焦点损失的海上风电天气事件辨识方法和装置
CN117633456B (zh) * 2023-11-17 2024-05-31 国网江苏省电力有限公司 基于自适应焦点损失的海上风电天气事件辨识方法和装置

Also Published As

Publication number Publication date
CN113111979B (zh) 2021-09-07
CN113111979A (zh) 2021-07-13

Similar Documents

Publication Publication Date Title
WO2022262757A1 (zh) 模型训练方法、图像检测方法及检测装置
CN109949255B (zh) 图像重建方法及设备
WO2021155706A1 (zh) 利用不平衡正负样本对业务预测模型训练的方法及装置
CN112150821B (zh) 轻量化车辆检测模型构建方法、系统及装置
CN111553193A (zh) 一种基于轻量级深层神经网络的视觉slam闭环检测方法
CN109117793B (zh) 基于深度迁移学习的直推式雷达高分辨距离像识别方法
EP4080416A1 (en) Adaptive search method and apparatus for neural network
CN111967480A (zh) 基于权重共享的多尺度自注意力目标检测方法
CN112115973A (zh) 一种基于卷积神经网络图像识别方法
CN113052006B (zh) 一种基于卷积神经网络的图像目标检测方法,系统及可读存储介质
WO2021051987A1 (zh) 神经网络模型训练的方法和装置
CN115661943A (zh) 一种基于轻量级姿态评估网络的跌倒检测方法
WO2024032010A1 (zh) 一种基于迁移学习策略的少样本目标实时检测方法
CN113436174A (zh) 一种人脸质量评估模型的构建方法及应用
CN116385879A (zh) 一种半监督海面目标检测方法、系统、设备及存储介质
CN112883931A (zh) 基于长短期记忆网络的实时真假运动判断方法
CN114333040B (zh) 一种多层级目标检测方法及系统
CN111310827A (zh) 一种基于双阶段卷积模型的目标区域检测方法
CN116486166A (zh) 一种基于边缘计算的输电线异物识别检测方法
CN111860601A (zh) 预测大型真菌种类的方法及装置
CN114724175B (zh) 行人图像的检测网络、检测方法、训练方法、电子设备和介质
CN116824212A (zh) 基于小样本学习的眼底照片分类方法
CN114973350B (zh) 一种源域数据无关的跨域人脸表情识别方法
CN114387484B (zh) 一种基于yolov4改进的口罩佩戴检测方法及系统
CN113537228B (zh) 一种基于深度特征的实时图像语义分割方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22824233

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE