CN113033719A - Target detection processing method, device, medium and electronic equipment - Google Patents


Publication number
CN113033719A
CN113033719A
Authority
CN
China
Prior art keywords
difficult
basic
target detection
loss
expansion
Prior art date
Legal status
Granted
Application number
CN202110581939.1A
Other languages
Chinese (zh)
Other versions
CN113033719B (en)
Inventor
Wang Wei (王威)
Current Assignee
Zhejiang Zhuoyun Intelligent Technology Co ltd
Original Assignee
Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority to CN202110581939.1A
Publication of CN113033719A
Application granted
Publication of CN113033719B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a target detection processing method, apparatus, medium, and electronic device. The method comprises the following steps: determining a candidate region set of a sample image and the loss values of the candidate regions in the candidate region set; determining the expansion difficult-example number of the sample image according to the loss values of the candidate regions and the basic difficult-example number of the target detection model; selecting the expansion difficult-example number of candidate regions with relatively high loss values from the candidate region set to obtain an expansion difficult-example set of the sample image, and selecting the basic difficult-example number of candidate regions from the expansion difficult-example set to obtain a basic difficult-example set of the sample image, wherein the basic difficult-example set of the sample image is used for training the target detection model. The scheme can improve the stability of the model.

Description

Target detection processing method, device, medium and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of target detection, in particular to a target detection processing method, a target detection processing device, a target detection processing medium and electronic equipment.
Background
With the development of computer technology and the wide application of computer vision, detecting targets with computer image processing techniques has become increasingly common.
In a conventional target detection algorithm, to balance the proportion of positive and negative samples during training, the positive and negative samples are often set to a fixed ratio, the required numbers of positive and negative samples are then calculated from that ratio, and random sampling is performed over the candidate detection regions. To make the sampled examples more representative, difficult examples are generally selected through hard example mining; training the model on these difficult examples yields a better training effect. The OHEM (Online Hard Example Mining) algorithm is a typical hard example mining method: during model training it ranks the candidate regions by their loss values and selects the candidate boxes with relatively large loss values for training. However, the OHEM algorithm suffers from poor model stability.
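The ranking-and-selection step of OHEM described above can be sketched as follows (a minimal illustration with invented names, not the algorithm's reference implementation):

```python
# Minimal sketch of the OHEM selection step: rank candidate regions by
# their loss values and keep those with the largest losses for training.
def ohem_select(losses, num_hard):
    """Return the indices of the `num_hard` candidates with the highest loss."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:num_hard]

# Candidate 1 (loss 2.3) and candidate 3 (loss 1.5) are the hardest two.
print(ohem_select([0.1, 2.3, 0.7, 1.5, 0.05], 2))  # prints [1, 3]
```

Because selection is purely loss-ranked, a noisy sample with a persistently large loss is chosen in every batch, which is the instability the scheme below addresses.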
Disclosure of Invention
The embodiment of the application provides a target detection processing method, a target detection processing device, a target detection processing medium and electronic equipment, and the stability of a model can be improved.
In a first aspect, an embodiment of the present application provides a target detection processing method, including:
determining a set of candidate regions of a sample image and loss values of the candidate regions in the set of candidate regions;
determining the expansion difficulty number of the sample image according to the loss value of the candidate region and the basic difficulty number of the target detection model;
selecting the expansion difficult-example number of candidate regions with relatively high loss values from the candidate region set to obtain an expansion difficult-example set of the sample image, and selecting the basic difficult-example number of candidate regions from the expansion difficult-example set to obtain a basic difficult-example set of the sample image, wherein the basic difficult-example set of the sample image is used to train the target detection model.
In a second aspect, an embodiment of the present application provides an object detection processing apparatus, where the apparatus includes:
a loss determination module, configured to determine a candidate region set of a sample image and a loss value of a candidate region in the candidate region set;
the expansion quantity determining module is used for determining the expansion difficult example quantity of the sample image according to the loss value of the candidate region and the basic difficult example quantity of the target detection model;
and a basic difficult-example set determining module, configured to select the expansion difficult-example number of candidate regions with relatively high loss values from the candidate region set to obtain an expansion difficult-example set of the sample image, to select the basic difficult-example number of candidate regions from the expansion difficult-example set to obtain a basic difficult-example set of the sample image, and to train the target detection model with the basic difficult-example set of the sample image.
In a third aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a target detection processing method according to the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable by the processor, where the processor executes the computer program to implement an object detection processing method according to an embodiment of the present application.
According to the technical scheme provided by the embodiment of the application, the difficult case range in the training process of the target detection model is dynamically adjusted by dynamically determining the expansion difficult case number of the sample images, so that the model training is more stable, and the robustness of the model is improved.
Drawings
Fig. 1 is a flowchart of a target detection processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a model network structure of a target detection process according to an embodiment of the present application;
fig. 3 is a flowchart of another target detection processing method provided in the second embodiment of the present application;
fig. 4 is a schematic diagram of model indexes of dynamic difficult case mining compared with original difficult case mining provided in the second embodiment of the present application;
FIG. 5 is a schematic diagram of a model index for dynamically adjusting a sampling frequency of a hard sample compared to a fixed sampling frequency according to a second embodiment of the present application;
fig. 6 is a schematic structural diagram of an object detection processing apparatus according to a third embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a flowchart of a target detection processing method according to an embodiment of the present application, applicable to dynamic online hard example mining. The method can be executed by the target detection processing apparatus provided by the embodiments of the present application; the apparatus can be implemented in software and/or hardware and can be configured in an electronic device.
As shown in fig. 1, the target detection processing method includes:
s110, determining a candidate region set of the sample image and loss values of candidate regions in the candidate region set.
And S120, determining the expansion difficult case number of the sample image according to the loss value of the candidate region and the basic difficult case number of the target detection model.
S130, selecting the expansion difficult-example number of candidate regions with relatively high loss values from the candidate region set to obtain an expansion difficult-example set of the sample image, selecting the basic difficult-example number of candidate regions from the expansion difficult-example set to obtain a basic difficult-example set of the sample image, and using the basic difficult-example set of the sample image to train the target detection model.
A difficult example refers to a sample with a relatively large loss value during model training; such samples are used to retrain the target detection model so as to improve its learning effect. Fig. 2 is a schematic diagram of a model network structure for target detection according to an embodiment of the present application. Taking a target detection model with a Faster R-CNN (Faster Region-based Convolutional Neural Network) structure as an example: feature extraction is performed on a sample image to obtain a feature map, and all candidate regions in the sample image are extracted through a Region Proposal Network (RPN) to obtain the candidate region set of the sample image; a basic difficult-example set of the sample image is then mined from the candidate region set by a dynamic hard example mining (OHEM) module, and the target detection model is trained with the basic difficult-example set, for example by applying region-of-interest pooling (ROI Pooling) to the basic difficult examples and performing region convolution processing on the pooling result to obtain the output result.
The original candidate regions of the sample image constitute the candidate region set; they can be obtained by determining a feature map of the sample image and processing the feature map. The loss value of a candidate region can be its classification loss value, determined by the following cross-entropy formula:

loss_i = -log(p_i)

where p_i is the predicted probability value of the i-th candidate region, and loss_i is the loss value of the i-th candidate region.
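A minimal numeric sketch of this per-candidate cross-entropy loss (assuming loss_i = -log(p_i), with p_i the predicted probability for the candidate's true class; the function name is invented):

```python
import math

def candidate_loss(p_i):
    """Cross-entropy loss of the i-th candidate region: loss_i = -log(p_i)."""
    return -math.log(p_i)

# A confident correct prediction yields a small loss; an uncertain one, a large loss.
print(candidate_loss(0.9) < candidate_loss(0.1))  # prints True
```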
The basic difficult-example number of the target detection model is a hyperparameter of the target detection model's base network, representing the number of candidate regions that actually need to participate in training. The expansion difficult-example number of the sample image is larger than the basic difficult-example number and is obtained by expanding the basic difficult-example number.
In the embodiments of the present application, the candidate regions in the candidate region set may be sorted by loss value, and the expansion difficult-example number of candidate regions with relatively high loss values selected to obtain the expansion difficult-example set of the sample image. The basic difficult-example number of candidate regions is then randomly selected from the expansion difficult-example set to obtain the basic difficult-example set of the sample image. During training of a target detection model, noise samples generally have large loss values. By taking the expansion difficult-example number of highest-loss candidate regions as the expansion difficult-example set and then randomly drawing the basic difficult-example number of candidate regions from it, a noise sample in the expansion difficult-example set is not drawn into the basic difficult-example set every time, unlike directly taking the basic difficult-example number of highest-loss candidate regions from the candidate region set as the basic difficult-example set. This reduces the model's sensitivity to noise, reduces the interference of noise with the model, and improves the robustness of the target detection model.
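The two-stage selection just described — take the highest-loss candidates first, then draw the basic set at random from them — might be sketched as follows (function names and the fixed random seed are illustrative, not from the patent):

```python
import random

def select_basic_set(losses, num_expanded, num_basic, seed=0):
    """Stage 1: take the `num_expanded` highest-loss candidate indices
    (the expansion difficult-example set). Stage 2: randomly draw
    `num_basic` of them (the basic difficult-example set)."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    expanded = order[:num_expanded]
    rng = random.Random(seed)
    return rng.sample(expanded, num_basic)
```

Because the basic set is drawn at random from the expansion set, a noisy high-loss candidate is not guaranteed to enter every training batch, which is the noise-suppression effect described above.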
According to the technical scheme provided by the embodiments of the present application, the expansion difficult-example set of the sample image is determined, and the basic difficult-example number of candidate regions is selected from it to obtain the basic difficult-example set. Compared with directly taking the basic difficult-example number of highest-loss candidate regions from the candidate region set as the basic difficult-example set, this suppresses sensitivity to noise and improves the stability and robustness of the target detection model.
In an alternative embodiment, determining the number of expansion hard cases of the sample image according to the loss value of the candidate region and the number of basic hard cases of the target detection model includes: determining an average of a first number of relatively high loss values in the set of candidate regions; determining a coefficient of expansion difficulty by using the average value; and determining the expansion difficulty number of the sample image according to the expansion difficulty coefficient and the basic difficulty number.
Wherein the first number may be an empirical value and may be set manually. The first number may be determined by an initial loss distribution of the target detection model, and the first number is required to satisfy that the expansion difficulty number is smaller than a difficulty number threshold, for example, the difficulty number threshold is 1000, and the first number may be 100 or 200.
Specifically, the first number of relatively high loss values in the candidate region set may be obtained and their average value determined. The expansion difficult-example coefficient of the sample image is then determined from this average value, as an increasing function of the average loss value; since the average loss value is always greater than 0, the expansion difficult-example coefficient is always greater than 1. The product of the expansion difficult-example coefficient and the basic difficult-example number is used as the expansion difficult-example number of the sample image.
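A sketch of computing the expansion difficult-example number from the average of the top loss values. The patent's exact coefficient formula is not recoverable from this text; the `1 + average loss` form used below is an assumed stand-in that satisfies the stated constraint (the coefficient exceeds 1 whenever the average loss is positive and shrinks toward 1 as training converges):

```python
import math

def expansion_count(losses, first_number, num_basic):
    """Expansion difficult-example number = floor(coefficient * num_basic),
    where the coefficient is derived from the average of the `first_number`
    highest loss values. The `1 + avg` form is an illustrative assumption."""
    top = sorted(losses, reverse=True)[:first_number]
    avg = sum(top) / len(top)
    coeff = 1.0 + avg  # assumed form: always > 1 for avg > 0
    return math.floor(coeff * num_basic)

# Early in training (large losses) the expansion set is much larger than the
# basic number; as losses shrink it approaches the basic number.
print(expansion_count([2.0, 1.0, 0.5, 0.1], 2, 100))  # prints 250
```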
At the start of training the average loss value is large, so the expansion difficult-example coefficient is large and the expansion difficult-example number is close to the total number of candidate regions in the candidate region set; selecting the basic difficult-example set from the expansion difficult-example set is then similar to selecting it from the whole candidate region set. In the later stage of training, the expansion difficult-example number is closer to the basic difficult-example number, so each candidate region in the expansion difficult-example set (an expansion difficult example) has a higher probability of being selected into the basic difficult-example set (whose candidate regions may be called basic difficult examples). That is, the difficult-example range is adjusted dynamically: at the start of training the expansion difficult-example set is larger and each basic difficult example is drawn with lower probability, while in the later stage the set is smaller and each candidate is drawn with higher probability, which improves the stability of model training. The probability of an expansion difficult example being selected as a basic difficult example therefore rises gradually from the initial to the later stage of training, improving the learning effect of the model, i.e., its stability and robustness.
Example two
Fig. 3 is a flowchart of another target detection processing method according to the second embodiment of the present application. The present embodiment is further optimized on the basis of the above-described embodiments. As shown in fig. 3, the target detection processing method includes:
s210, determining a candidate region set of the sample image and loss values of candidate regions in the candidate region set.
S220, determining the expansion difficult case number of the sample image according to the loss value of the candidate region and the basic difficult case number of the target detection model.
S230, selecting the expansion difficult-example number of candidate regions with relatively high loss values from the candidate region set to obtain an expansion difficult-example set of the sample image, and selecting the basic difficult-example number of candidate regions from the expansion difficult-example set to obtain a basic difficult-example set of the sample image.
S240, determining the difficult-example loss threshold of the target detection model in the current iteration according to the loss values of the target detection model in historical iterations.
And S250, determining the repeated sampling times of the candidate regions in the basic difficult-example set according to the difficult-example loss threshold of the current iteration and the loss values of the candidate regions in the basic difficult-example set.
And S260, adding the repeatedly sampled candidate regions into the basic difficult-example set according to their repeated sampling times to obtain a new basic difficult-example set, and performing this iteration of target detection model training with the new basic difficult-example set.
The basic difficult case set comprises a number of candidate regions of basic difficult cases, the candidate regions in the basic difficult case set can be called basic difficult cases, the expansion difficult case set comprises a number of candidate regions of expansion difficult cases, and the candidate regions in the expansion difficult case set can be called expansion difficult cases.
A historical iteration precedes the current iteration, and the loss value of the target detection model in a given iteration is the overall loss value of the model in that iteration. The number of sample images input in a single iteration is not specifically limited in the embodiments of the present application. The repeated sampling times of a candidate region in the basic difficult-example set are determined from the difficult-example loss threshold of the current iteration and the loss value of that candidate region; the difficult-example loss thresholds of different iterations can therefore differ, and different candidate regions in the same iteration can have different repeated sampling times. Applying different repeated sampling times to different basic difficult examples in the basic difficult-example set, i.e., distinguishing difficult examples of different difficulty, further improves the model's learning ability compared with using the same repeated sampling times for all of them.
In an optional implementation, determining the difficult-example loss threshold of the target detection model in the current iteration according to the loss values of the target detection model in historical iterations includes: taking the average of the current loss value of the target detection model in the current iteration and its historical loss values in the preceding second number of historical iterations as the difficult-example loss threshold of the target detection model in the current iteration.
Specifically, the difficult-example loss threshold of the target detection model in the current iteration can be determined by the following formula:

thr = (1/t) · Σ_{i=c−t+1}^{c} loss_i

where thr is the difficult-example loss threshold of the current iteration, the current iteration is the c-th iteration, t is the iteration cycle length of the threshold (i.e., the threshold is updated every t iterations; this empirical parameter can be set manually), and loss_i is the overall loss value of the i-th iteration. Taking the mean loss of t iterations as the difficult-example loss threshold ties the threshold to the model's maturity: as the model matures and the overall loss falls, the threshold falls with it.
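The running-mean threshold might be sketched as follows (a minimal illustration assuming the threshold is the mean of the model's overall losses over the most recent t iterations; the helper name is invented):

```python
def hard_loss_threshold(iteration_losses, t):
    """Difficult-example loss threshold of the current iteration: the mean
    overall loss of the model over the most recent t iterations."""
    window = iteration_losses[-t:]
    return sum(window) / len(window)

# As training losses fall, the threshold falls with them.
print(hard_loss_threshold([4.0, 2.0, 1.0, 1.0], 2))  # prints 1.0
```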
In an optional implementation, determining the repeated sampling times of the candidate regions in the basic difficult-example set according to the difficult-example loss threshold of the current iteration and the loss values of the candidate regions in the basic difficult-example set includes obtaining the repeated sampling times of a candidate region by the following formula: s = ⌊loss/thr⌋;
where s is the repeated sampling times, ⌊ ⌋ is the floor operator, loss is the loss value of the candidate region in the basic difficult-example set, and thr is the difficult-example loss threshold of the current iteration.
Specifically, for each candidate region in the basic difficult-example set, the ratio of its loss value to the difficult-example loss threshold may be determined and rounded down to obtain its repeated sampling times; the maximum repeated sampling times is an empirical value, for example 3. If loss/thr of a candidate region is less than 1, i.e., its repeated sampling times s is 0, the candidate region is not repeatedly sampled; if s is not 0, the candidate region is repeatedly sampled s times, so that the new basic difficult-example set contains s + 1 copies of it. Because a candidate region's repeated sampling times are positively correlated with its loss value, more difficult candidate regions are sampled more often, which strengthens the model's learning of the more difficult candidate regions and improves its learning effect.
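The repeated-sampling rule s = ⌊loss/thr⌋, capped at an empirical maximum, might be sketched as follows (names and example values are illustrative):

```python
import math

def build_resampled_set(regions, losses, thr, max_repeat=3):
    """For each basic difficult example, add s = floor(loss / thr) extra
    copies (capped at `max_repeat`), so the new set holds s + 1 copies."""
    new_set = []
    for region, loss in zip(regions, losses):
        s = min(math.floor(loss / thr), max_repeat)
        new_set.extend([region] * (s + 1))
    return new_set

# 'a' (loss below thr) is kept once; 'b' (loss 2.5x thr) gets 2 extra copies.
print(build_resampled_set(["a", "b"], [0.5, 2.5], thr=1.0))  # prints ['a', 'b', 'b', 'b']
```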
And after a new basic difficult case set is obtained, the candidate regions in the new basic difficult case set can be sent to the region-of-interest pooling layer, and then further classification and regression are carried out through the RCNN to obtain an output result.
Fig. 4 is a schematic diagram of model indexes comparing dynamic difficult-example mining with the original difficult-example mining according to the second embodiment of the present application. Referring to fig. 4, compared with the original OHEM, the mAP (Mean Average Precision) and AP50 (average precision at an intersection-over-union threshold of 0.5) indexes increase and the false positives (FP) decrease, so model training is more stable and robust. Fig. 5 is a schematic diagram of model indexes comparing a dynamically adjusted difficult-sample sampling frequency with a fixed sampling frequency. Referring to fig. 5, with the dynamically adjusted sampling frequency, AP50 remains stable, the mAP increases by about 1%, and the false positives decrease, so the model training effect is better.
According to the embodiment of the application, the number of the difficult cases is dynamically expanded, the range of the difficult cases and the difficult case loss threshold are dynamically adjusted, and the repeated sampling times of the difficult case samples are controlled, so that the influence of noise on the model is reduced, the model training is more stable, the robustness of the model is improved, and the learning of the model to the more difficult cases is enhanced.
Example three
Fig. 6 is a schematic structural diagram of a target detection processing apparatus according to a third embodiment of the present application, which is applicable to a case of dynamically performing online hard sample mining. The apparatus may be implemented by software and/or hardware, and may be configured in an electronic device. As shown in fig. 6, the apparatus may include:
a loss determining module 301, configured to determine a candidate region set of a sample image and a loss value of a candidate region in the candidate region set;
an expansion quantity determining module 302, configured to determine an expansion difficult-case quantity of the sample image according to the loss value of the candidate region and a basic difficult-case quantity of a target detection model;
a basic difficult-example set determining module 303, configured to select the expansion difficult-example number of candidate regions with relatively high loss values from the candidate region set to obtain an expansion difficult-example set of the sample image, and to select the basic difficult-example number of candidate regions from the expansion difficult-example set to obtain a basic difficult-example set of the sample image, the basic difficult-example set being used to train the target detection model.
In an alternative embodiment, the dilation number determination module 302 comprises:
an averaging unit for determining an average of a first number of relatively high loss values in the set of candidate regions;
the expansion coefficient unit is used for determining an expansion difficult example coefficient by adopting the average value;
and the expansion number unit is used for determining the expansion difficult example number of the sample image according to the expansion difficult example coefficient and the basic difficult example number.
In an alternative embodiment, the apparatus further comprises an oversampling module comprising:
the difficult case loss threshold unit is used for determining a difficult case loss threshold of the target detection model in the current iteration process according to the loss value of the target detection model in the historical iteration process;
the repeated frequency unit is used for determining repeated sampling frequency of the candidate region in the basic difficult case set according to a difficult case loss threshold value in the iteration process of the round and the loss value of the candidate region in the basic difficult case set;
and the repeated sampling unit is used for adding the repeatedly sampled candidate region into the basic difficult case set according to the repeated sampling times of the candidate region to obtain a new basic difficult case set, and is used for performing the iterative training of the target detection model by adopting the new basic difficult case set.
In an alternative embodiment, the hard case loss threshold unit is specifically configured to:
and taking the loss average value between the current loss value of the target detection model in the current iteration process and the historical loss value of the target detection model in the previous second numerical value historical iteration process as a difficult case loss threshold value of the target detection model in the current iteration process.
In an alternative embodiment, the repetition number unit is specifically configured to:
obtaining the repeated sampling times of the candidate regions in the basic difficult case set by the following formula:
s=⌊loss/thr⌋
wherein s is the repeated sampling times, ⌊ ⌋ is the floor operator, loss is the loss value of the candidate region in the basic difficult-example set, and thr is the difficult-example loss threshold of the current iteration.
The target detection processing device provided by the embodiment of the invention can execute the target detection processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the target detection processing method.
Example four
A fourth embodiment of the present application further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for object detection processing, the method including:
determining a set of candidate regions of a sample image and loss values of the candidate regions in the set of candidate regions;
determining the expansion difficult case number of the sample image according to the loss values of the candidate regions and the basic difficult case number of the target detection model;
selecting, from the candidate region set, the expansion difficult case number of candidate regions with relatively high loss values to obtain an expansion difficult case set of the sample image, and selecting the basic difficult case number of candidate regions from the expansion difficult case set to obtain a basic difficult case set of the sample image, wherein the basic difficult case set of the sample image is used for training the target detection model.
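The three steps above can be sketched as follows. The list-of-pairs layout is an assumption, as is the random draw of the base set from the expanded set: this excerpt ranks by loss for the expanded set but does not fix the selection rule within it.

```python
import random

def build_hard_example_sets(candidates, expand_n, base_n, seed=0):
    """Rank candidate regions by loss, keep the expand_n highest-loss
    regions as the expanded hard-example set, then draw base_n regions
    from it as the base hard-example set used for training."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    expanded = ranked[:expand_n]
    base = random.Random(seed).sample(expanded, base_n)
    return expanded, base

candidates = [("a", 0.1), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
expanded, base = build_hard_example_sets(candidates, expand_n=3, base_n=2)
# expanded holds b, d, c (the three highest losses); base is 2 of those regions
```

Keeping an expanded pool larger than the base set gives the later resampling step a wider choice of high-loss regions to draw from.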
Storage media refers to any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory or magnetic media (e.g., a hard disk or optical storage); registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the computer system in which the program is executed, or may be located in a different, second computer system connected to the first computer system through a network (such as the Internet). The second computer system may provide the program instructions to the first computer for execution. The term "storage medium" may also include two or more storage media that may reside in different locations (e.g., in different computer systems connected by a network). The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, the storage medium provided in the embodiments of the present application and containing computer-executable instructions is not limited to the target detection processing operations described above, and may also perform related operations in the target detection processing method provided in any embodiment of the present application.
Example five
An embodiment of the present invention provides an electronic device, into which the target detection processing apparatus provided in the embodiments of the present invention may be integrated. The electronic device may be configured in a system, or may be a device that performs part or all of the functions in the system. Fig. 7 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application. As shown in fig. 7, this embodiment provides an electronic device 400, which includes: one or more processors 420; and a storage device 410 configured to store one or more programs. When the one or more programs are executed by the one or more processors 420, the one or more processors 420 implement the target detection processing method provided in an embodiment of the present application, the method including:
determining a set of candidate regions of a sample image and loss values of the candidate regions in the set of candidate regions;
determining the expansion difficult case number of the sample image according to the loss values of the candidate regions and the basic difficult case number of the target detection model;
selecting, from the candidate region set, the expansion difficult case number of candidate regions with relatively high loss values to obtain an expansion difficult case set of the sample image, and selecting the basic difficult case number of candidate regions from the expansion difficult case set to obtain a basic difficult case set of the sample image, wherein the basic difficult case set of the sample image is used for training the target detection model.
Of course, those skilled in the art can understand that the processor 420 also implements a technical solution of a target detection processing method provided in any embodiment of the present application.
The electronic device 400 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the electronic device 400 includes a processor 420, a storage device 410, an input device 430, and an output device 440; the number of the processors 420 in the electronic device may be one or more, and one processor 420 is taken as an example in fig. 7; the processor 420, the storage device 410, the input device 430, and the output device 440 in the electronic apparatus may be connected by a bus or other means, and are exemplified by a bus 450 in fig. 7.
The storage device 410 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and module units, such as program instructions corresponding to the target detection processing method in the embodiments of the present application.
The storage device 410 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the storage 410 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, storage 410 may further include memory located remotely from processor 420, which may be connected via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input means 430 may be used to receive input numbers, character information, or voice information, and to generate key signal inputs related to user settings and function control of the electronic device. The output device 440 may include a display screen, speakers, or other electronic equipment.
The target detection processing apparatus, medium and electronic device provided in the above embodiments may execute the target detection processing method provided in any embodiment of the present application, and have corresponding functional modules and beneficial effects for executing the method. For technical details not described in detail in the above embodiments, reference may be made to the target detection processing method provided in any embodiment of the present application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (10)

1. A target detection processing method, characterized by comprising:
determining a set of candidate regions of a sample image and loss values of the candidate regions in the set of candidate regions;
determining the expansion difficult case number of the sample image according to the loss values of the candidate regions and the basic difficult case number of the target detection model;
selecting, from the candidate region set, the expansion difficult case number of candidate regions with relatively high loss values to obtain an expansion difficult case set of the sample image, and selecting the basic difficult case number of candidate regions from the expansion difficult case set to obtain a basic difficult case set of the sample image, wherein the basic difficult case set of the sample image is used for training the target detection model.
2. The method according to claim 1, wherein determining the expansion difficult case number of the sample image according to the loss values of the candidate regions and the basic difficult case number of the target detection model comprises:
determining the average of a first number of relatively high loss values in the candidate region set;
determining an expansion difficult case coefficient by using the average value;
and determining the expansion difficult case number of the sample image according to the expansion difficult case coefficient and the basic difficult case number.
3. The method according to claim 1, wherein after selecting the basic difficult case number of candidate regions from the expansion difficult case set to obtain the basic difficult case set of the sample image, the method further comprises:
determining the difficult case loss threshold of the target detection model in the current iteration according to the loss values of the target detection model in historical iterations;
determining the number of times of repeated sampling for the candidate regions in the basic difficult case set according to the difficult case loss threshold in the current iteration and the loss values of the candidate regions in the basic difficult case set;
and adding the repeatedly sampled candidate regions to the basic difficult case set according to their numbers of times of repeated sampling to obtain a new basic difficult case set, and performing the iterative training of the target detection model by using the new basic difficult case set.
4. The method of claim 3, wherein determining the difficult case loss threshold of the target detection model in the current iteration according to the loss values of the target detection model in historical iterations comprises:
taking the average of the current loss value of the target detection model in the current iteration and the historical loss values of the target detection model in the preceding second number of historical iterations as the difficult case loss threshold of the target detection model in the current iteration.
5. The method of claim 3, wherein determining the number of times of repeated sampling for the candidate regions in the basic difficult case set according to the difficult case loss threshold in the current iteration and the loss values of the candidate regions in the basic difficult case set comprises:
obtaining the number of times of repeated sampling for a candidate region in the basic difficult case set by the following formula:
s=⌊loss/thr⌋
wherein s is the number of times of repeated sampling, ⌊·⌋ is the floor (round-down) operator, loss is the loss value of the candidate region in the basic difficult case set, and thr is the difficult case loss threshold in the current iteration.
6. A target detection processing apparatus, characterized by comprising:
a loss determination module, configured to determine a candidate region set of a sample image and a loss value of a candidate region in the candidate region set;
an expansion number determining module, configured to determine the expansion difficult case number of the sample image according to the loss values of the candidate regions and the basic difficult case number of the target detection model;
and a basic difficult case set determining module, configured to select, from the candidate region set, the expansion difficult case number of candidate regions with relatively high loss values to obtain an expansion difficult case set of the sample image, and to select the basic difficult case number of candidate regions from the expansion difficult case set to obtain a basic difficult case set of the sample image, wherein the basic difficult case set of the sample image is used for training the target detection model.
7. The apparatus of claim 6, wherein the expansion number determining module comprises:
an averaging unit, configured to determine the average of a first number of relatively high loss values in the candidate region set;
an expansion coefficient unit, configured to determine an expansion difficult case coefficient by using the average value;
and an expansion number unit, configured to determine the expansion difficult case number of the sample image according to the expansion difficult case coefficient and the basic difficult case number.
8. The apparatus of claim 6, further comprising an oversampling module comprising:
a difficult case loss threshold unit, configured to determine the difficult case loss threshold of the target detection model in the current iteration according to the loss values of the target detection model in historical iterations;
a repetition number unit, configured to determine the number of times of repeated sampling for the candidate regions in the basic difficult case set according to the difficult case loss threshold in the current iteration and the loss values of the candidate regions in the basic difficult case set;
and a repeated sampling unit, configured to add the repeatedly sampled candidate regions to the basic difficult case set according to their numbers of times of repeated sampling to obtain a new basic difficult case set, and to perform the iterative training of the target detection model by using the new basic difficult case set.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-5 when executing the computer program.
CN202110581939.1A 2021-05-27 2021-05-27 Target detection processing method, device, medium and electronic equipment Active CN113033719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110581939.1A CN113033719B (en) 2021-05-27 2021-05-27 Target detection processing method, device, medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN113033719A true CN113033719A (en) 2021-06-25
CN113033719B CN113033719B (en) 2021-08-24

Family

ID=76455989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110581939.1A Active CN113033719B (en) 2021-05-27 2021-05-27 Target detection processing method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113033719B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780277A (en) * 2021-09-08 2021-12-10 浙江啄云智能科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN115294536A (en) * 2022-08-10 2022-11-04 北京百度网讯科技有限公司 Violation detection method, device and equipment based on artificial intelligence and storage medium
CN115359308A (en) * 2022-04-06 2022-11-18 北京百度网讯科技有限公司 Model training method, apparatus, device, storage medium, and program for identifying difficult cases

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103489034A (en) * 2013-10-12 2014-01-01 山东省科学院海洋仪器仪表研究所 Method and device for predicting and diagnosing online ocean current monitoring data
CN107633232A (en) * 2017-09-26 2018-01-26 四川长虹电器股份有限公司 A kind of low-dimensional faceform's training method based on deep learning
CN110322438A (en) * 2019-06-26 2019-10-11 杭州上池科技有限公司 The training method and automatic checkout system of the automatic detection model of mycobacterium tuberculosis
US20190392242A1 (en) * 2018-06-20 2019-12-26 Zoox, Inc. Instance segmentation inferred from machine-learning model output
CN111598175A (en) * 2020-05-19 2020-08-28 南京甄视智能科技有限公司 Detector training optimization method based on online difficult case mining mode


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780277A (en) * 2021-09-08 2021-12-10 浙江啄云智能科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN113780277B (en) * 2021-09-08 2023-06-30 浙江啄云智能科技有限公司 Training method and device of target detection model, electronic equipment and storage medium
CN115359308A (en) * 2022-04-06 2022-11-18 北京百度网讯科技有限公司 Model training method, apparatus, device, storage medium, and program for identifying difficult cases
CN115359308B (en) * 2022-04-06 2024-02-13 北京百度网讯科技有限公司 Model training method, device, equipment, storage medium and program for identifying difficult cases
CN115294536A (en) * 2022-08-10 2022-11-04 北京百度网讯科技有限公司 Violation detection method, device and equipment based on artificial intelligence and storage medium
CN115294536B (en) * 2022-08-10 2023-07-28 北京百度网讯科技有限公司 Violation detection method, device, equipment and storage medium based on artificial intelligence

Also Published As

Publication number Publication date
CN113033719B (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN113033719B (en) Target detection processing method, device, medium and electronic equipment
US9025889B2 (en) Method, apparatus and computer program product for providing pattern detection with unknown noise levels
CN111476863B (en) Method and device for coloring black-and-white cartoon, electronic equipment and storage medium
CN110659658B (en) Target detection method and device
CN113408634A (en) Model recommendation method and device, equipment and computer storage medium
CN111753870B (en) Training method, device and storage medium of target detection model
CN111738319A (en) Clustering result evaluation method and device based on large-scale samples
CN110146855B (en) Radar intermittent interference suppression threshold calculation method and device
CN109978017B (en) Hard sample sampling method and system
CN114882307A (en) Classification model training and image feature extraction method and device
CN110197459B (en) Image stylization generation method and device and electronic equipment
US20230401670A1 (en) Multi-scale autoencoder generation method, electronic device and readable storage medium
CN111353822A (en) Image layout and model training method, device, equipment and storage medium
CN115049851A (en) Target detection method, device and equipment terminal based on YOLOv5 network
CN113962332B (en) Salient target identification method based on self-optimizing fusion feedback
CN113947154A (en) Target detection method, system, electronic equipment and storage medium
CN114529828A (en) Method, device and equipment for extracting residential area elements of remote sensing image
CN112016599B (en) Neural network training method and device for image retrieval and electronic equipment
CN114037772A (en) Training method of image generator, image generation method and device
CN113033500A (en) Motion segment detection method, model training method and device
CN112580689A (en) Training method and application method of neural network model, device and electronic equipment
CN116188834B (en) Full-slice image classification method and device based on self-adaptive training model
CN113822445B (en) Model integrated prediction method, system, electronic equipment and storage medium
CN117372286B (en) Python-based image noise optimization method and system
US11113564B2 (en) Performing distance-based feature suppression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant