WO2022116720A1 - Target detection method and apparatus, and electronic device

Info

Publication number: WO2022116720A1
Application number: PCT/CN2021/124860
Authority: WO
Inventors: 张辉; 高巍
Original assignee: 歌尔股份有限公司
Priority date: 2020-12-02
Filing date: 2021-10-20
Publication date: 2022-06-09
Also published as: CN112634201A; CN112634201B

Abstract

A target detection method and apparatus, and an electronic device. Said method comprises: inputting a detection image into a first subnetwork of a target detection model, to obtain a first detection result outputted by the first subnetwork (S110); if there is a non-significant target in the first detection result, inputting a corresponding part of the non-significant target in the detection image into a second subnetwork of the target detection model, to obtain a second detection result outputted by the second subnetwork (S120); and determining a final detection result according to the first detection result and the second detection result (S130). In the present invention, high detection accuracy can be achieved, and the second subnetwork is not required to perform secondary detection on all targets, and the efficiency is also very high.

Description

Target detection method, apparatus, and electronic device

Technical Field

The present application relates to the technical field of machine vision, and in particular, to a method, apparatus and electronic device for target detection.

Background of the Invention

In recent years, machine vision research has attracted growing attention worldwide. Machine vision technology based on image processing mainly uses computers to simulate or reproduce certain intelligent behaviors related to human vision: it extracts information from images of real objects, processes and interprets that information, and finally applies it to practical detection and control tasks such as industrial inspection, industrial flaw detection, precision measurement and control, and automated production lines. Machine vision inspection not only greatly improves production efficiency and the degree of automation, but also makes information integration easy, which meets the requirements of digital and automated production.

However, on industrial production lines for products such as offset printing plates, paper, and aluminum strips, as well as TFT (Thin Film Transistor) and LCD (Liquid Crystal Display) panels widely used in TVs, computers, mobile phones and other fields, the products sometimes have low-contrast defects that are not easy to detect.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a target detection method, apparatus, and electronic device, so as to improve the detection level of difficult targets such as screen defects.

The embodiment of the present application adopts the following technical solutions:

In a first aspect, an embodiment of the present application provides a target detection method, including: inputting a detection image into a first sub-network of a target detection model to obtain a first detection result output by the first sub-network; if there is a non-salient target in the first detection result, inputting the part of the detection image corresponding to the non-salient target into a second sub-network of the target detection model to obtain a second detection result output by the second sub-network; and determining a final detection result according to the first detection result and the second detection result.

In a second aspect, an embodiment of the present application further provides a target detection device, the device comprising:

The first detection unit is used to input the detection image into the first sub-network of the target detection model, and obtain the first detection result output by the first sub-network;

The second detection unit is configured to, if there is a non-salient target in the first detection result, input the part of the detection image corresponding to the non-salient target into the second sub-network of the target detection model, and obtain the second detection result output by the second sub-network;

A determination unit, configured to determine the final detection result according to the first detection result and the second detection result.

In a third aspect, embodiments of the present application further provide an electronic device, including: a processor; and a memory arranged to store computer-executable instructions, the executable instructions, when executed, cause the processor to execute the above target detection method.

In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, and when the one or more programs are executed by an electronic device including multiple application programs, the electronic device is caused to perform the above target detection method.

At least one of the above technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects. The first sub-network of the target detection model detects the targets in the detection image, but some of these targets may be non-salient, which means the first detection result output by the first sub-network may not be completely accurate. In this case, the image part corresponding to each non-salient target is input into the second sub-network to obtain the second detection result. Combining the first detection result and the second detection result yields a more accurate final detection result. For example, when a target detection model is used in industry to identify product defects, the defects may be large or small, and for small defects it is difficult to obtain accurate detection results with a single network. The solution of the present application achieves high detection accuracy and does not require the second sub-network to perform secondary detection on all targets, so it is also efficient.

Brief Description of Drawings

The drawings described herein are used to provide further understanding of the present application and constitute a part of the present application. The schematic embodiments and descriptions of the present application are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:

FIG. 1 shows a schematic flowchart of a target detection method according to an embodiment of the present application;

FIG. 2 shows a schematic diagram of a defect detection process in a diaphragm product according to an embodiment of the present application;

FIG. 3 shows a schematic structural diagram of a target detection apparatus according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.

Detailed Description

In order to make the objectives, technical solutions and advantages of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

The technical idea of the present application is to use two sub-networks to construct a target detection model. The first sub-network detects all the targets as much as possible, while the second sub-network performs secondary detection on the non-salient targets, taking into account the detection accuracy and efficiency.

The technical solutions provided by the embodiments of the present application will be described in detail below with reference to the accompanying drawings.

FIG. 1 shows a schematic flowchart of a target detection method according to an embodiment of the present application. As shown in FIG. 1 , the method includes:

Step S110, the detection image is input into the first sub-network of the target detection model, and the first detection result output by the first sub-network is obtained.

Step S120, if there is a non-salient target in the first detection result, input the corresponding part of the non-salient target in the detection image into the second sub-network of the target detection model to obtain the second detection result output by the second sub-network.

In this application, the target to be detected can be chosen according to actual needs; the method works well for targets that industry urgently needs to detect, such as screen defects.

Taking screen defects as an example, there are many types of defects such as dead pixels and stains. Some defects are large in area, and some defects are small in area. Smaller defects are generally difficult to detect. In the technical solution of the present application, targets that are difficult to detect can be regarded as non-salient targets, and these targets can usually be detected, but the probability of false detection (such as type judgment error) is high. Therefore, the present application uses the second sub-network to detect non-salient objects again to improve the detection accuracy.

Step S130, determining a final detection result according to the first detection result and the second detection result.

It can be seen that the method shown in FIG. 1 uses the first sub-network of the target detection model to detect the targets in the detection image, but some of these targets may be non-salient, which means the first detection result output by the first sub-network may not be completely accurate. In this case, the image part corresponding to each non-salient target is input into the second sub-network to obtain the second detection result. Combining the first detection result and the second detection result yields a more accurate final detection result. For example, when a target detection model is used in industry to identify product defects, the defects may be large or small, and for small defects it is difficult to obtain accurate detection results with a single network. The solution of the present application achieves high detection accuracy and does not require the second sub-network to perform secondary detection on all targets, so it is also efficient.

In some embodiments, the first sub-network is a target detection network, and the first detection result includes the position of the target and a first classification; the second sub-network is a target classification network, and the second detection result includes a second classification of the target. Determining the final detection result according to the first detection result and the second detection result includes: replacing the first classification of the non-salient target with its second classification, as the final classification of the non-salient target.

For example, a pixel coordinate system is established based on the pixel size of the detection image. In the first detection result, the position of a target can be represented by pixel coordinates, such as the pixel coordinates of the four vertices of a rectangular frame, or the pixel coordinates of two diagonally opposite vertices. The first classification includes the type of the target, and may specifically include the confidence score of that type.

Generally speaking, when a target can be detected, its position is usually accurate, but its classification may be wrong, that is, a false detection may occur. Therefore, to improve detection efficiency, the second sub-network does not re-detect the position and only re-detects the classification. The first classification of the non-salient target is then replaced with its second classification. In other words, for a non-salient target, the final detection result consists of the position of the target from the first detection result and the classification of the target from the second detection result.
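As an illustration of the position representation described above, the following minimal Python sketch stores a rectangular frame by two diagonal corners and derives the four vertices and the area from them. The names `Box`, `box_area` and `four_vertices` are hypothetical and do not appear in the application.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Rectangular frame in pixel coordinates, given by two diagonal corners.

    Hypothetical helper type for illustration only.
    """
    x_min: float  # upper-left corner, horizontal pixel coordinate
    y_min: float  # upper-left corner, vertical pixel coordinate
    x_max: float  # lower-right corner, horizontal pixel coordinate
    y_max: float  # lower-right corner, vertical pixel coordinate


def four_vertices(box: Box) -> List[Tuple[float, float]]:
    """Return the four vertices implied by the two diagonal corners."""
    return [
        (box.x_min, box.y_min),
        (box.x_max, box.y_min),
        (box.x_max, box.y_max),
        (box.x_min, box.y_max),
    ]


def box_area(box: Box) -> float:
    """Area of the frame in square pixels; degenerate frames give 0."""
    width = max(0.0, box.x_max - box.x_min)
    height = max(0.0, box.y_max - box.y_min)
    return width * height
```

The two-corner form carries the same information as the four-vertex form for axis-aligned frames, which is why either can represent the position.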
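The replacement rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Detection` record and the `merge_results` function are hypothetical names, and the second detection result is assumed to be a mapping from the index of a non-salient target to its new label and confidence score.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One target from the first detection result (hypothetical record)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    label: str                              # classification (type of target)
    score: float                            # confidence score of that type


def merge_results(
    first: List[Detection],
    second: Dict[int, Tuple[str, float]],
) -> List[Detection]:
    """Combine the two results into the final detection result.

    `second` maps the index of each non-salient target in `first` to the
    (label, score) produced by the second sub-network. Positions always come
    from the first result; only the classification is replaced.
    """
    final = []
    for index, detection in enumerate(first):
        if index in second:
            label, score = second[index]
            detection = replace(detection, label=label, score=score)
        final.append(detection)
    return final
```

Targets that were never sent to the second sub-network pass through unchanged, so the cost of the second stage scales only with the number of non-salient targets.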

This application gives two examples of how to determine non-salient objects.

For the first example, in some embodiments, the method further includes: calculating the area of each target according to the position of each target in the first detection result; if the area of a target is smaller than a first threshold, the target is a non-salient target.

For example, some defects in diaphragm products are small in size, which makes them difficult to classify. Therefore, a first threshold can be set, and if the area of a detected defect is smaller than the first threshold, the defect is considered to be at risk of misclassification and is regarded as a non-salient target.

For the second example, in some embodiments, an object is a non-salient object if its confidence score for the first classification is less than a second threshold.

When outputting the first detection result, the first sub-network actually predicts the probability that the target belongs to each type, and then outputs the type with the highest probability. This probability can be specifically expressed in the form of a confidence score. Therefore, if the confidence score of the first classification of a target is too low, for example, smaller than the second threshold, it means that there may be other types with similar confidence scores, and there may be misclassification.
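The two criteria above can be written as a single predicate. The sketch below is illustrative only; the function name `is_non_salient` and its parameters are hypothetical, and each threshold is optional so that an embodiment using only one criterion can leave the other unset.

```python
from typing import Optional


def is_non_salient(
    area: float,
    score: float,
    area_threshold: Optional[float] = None,   # the "first threshold"
    score_threshold: Optional[float] = None,  # the "second threshold"
) -> bool:
    """Return True if the target should be re-checked by the second sub-network.

    A target is non-salient if its area is below the first threshold or its
    first-classification confidence score is below the second threshold.
    A threshold left as None disables that criterion.
    """
    too_small = area_threshold is not None and area < area_threshold
    too_uncertain = score_threshold is not None and score < score_threshold
    return too_small or too_uncertain
```

For instance, `is_non_salient(50, 0.9, area_threshold=100)` flags a small but confidently classified defect, while the same call without any threshold flags nothing.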

To reduce the influence of manual experience on the first and second thresholds, the present application also gives an example of determining the thresholds during training of the target detection model. In some embodiments, the method further includes: in the training phase of the target detection model, determining the area of each target in the first detection result, sorting the targets by area, and using the area at a first sequence position in the area sequence as the first threshold; or, sorting the targets in the first detection result by classification confidence score, and using the confidence score at a second sequence position in the confidence score sequence as the second threshold.

For example, during training, the targets detected by the first sub-network are sorted by area or by confidence score, and the last third of the targets are regarded as non-salient targets. The area at the two-thirds position of the sorted sequence can then be used as the first threshold, and the confidence score at the two-thirds position as the second threshold, so that the thresholds are adapted to model training and are robust.
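The threshold selection described above can be sketched as picking the value at a given position in a sorted sequence. The function name `threshold_at_position` is hypothetical, and the exact index convention (how the two-thirds position is rounded, and whether the boundary target itself counts as non-salient) is an assumption, since the application does not specify it.

```python
from typing import Sequence


def threshold_at_position(
    values: Sequence[float],
    numerator: int = 2,
    denominator: int = 3,
) -> float:
    """Return the value found at the numerator/denominator position.

    The values (areas or confidence scores collected during training) are
    sorted from high to low, and the element at index len * 2 // 3 by default
    is returned. Integer arithmetic avoids floating-point rounding of the
    index. Assumption: the index is rounded down and clamped to the last
    element.
    """
    if not values:
        raise ValueError("at least one detected target is required")
    ordered = sorted(values, reverse=True)
    index = min(len(ordered) - 1, len(ordered) * numerator // denominator)
    return ordered[index]


# Usage with made-up numbers: areas of nine detected targets.
areas = [90, 10, 50, 30, 70, 20, 80, 40, 60]
first_threshold = threshold_at_position(areas)
```

The same function applies unchanged to confidence scores to obtain the second threshold.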

In some embodiments, the object detection network is implemented based on the Mask_RCNN algorithm, and the object classification network is implemented based on the EfficientNet algorithm.

Mask_RCNN integrates the two functions of target detection and instance segmentation, which can achieve the two effects of classifying the target and determining the position of the target in the detection image, and has the characteristics of simple training and significant detection effect.

EfficientNet is a multi-dimensional hybrid model scaling algorithm that combines the three dimensions of network depth, network width, and image resolution, which can take into account both speed and accuracy.

In the embodiments of the present application, the advantages of these two algorithms can be used to implement a target detection network and a target classification network respectively. Of course, in other embodiments, Faster_RCNN or the like may also be selected to implement the target detection network, and Resnet or the like may be used to implement the target classification network.

In some embodiments, the method further includes: counting the number of targets in the first detection result, and assigning that number to a control variable; if the value of the control variable is not 0, selecting a target in the first detection result that has not yet been selected, and judging whether the selected target is a non-salient target; if the selected target is a non-salient target, performing the step of inputting the part of the detection image corresponding to the non-salient target into the second sub-network of the target detection model to obtain the second detection result output by the second sub-network, and decrementing the value of the control variable by 1; if the value of the control variable is 0, performing the step of determining the final detection result according to the first detection result and the second detection result.

In the following, with reference to FIG. 2, an example of target detection with defects in diaphragm products as the targets is introduced.

Step 1: Determine the detection image to be input into the first sub-network. For example, a diaphragm image, image, can be captured with a camera and scaled to a preset size of 1778×1778 pixels (usually the same size as the sample images used in the training stage), and the scaled image, image_resize, is used as the detection image.

Step 2: The detection image (e.g., image_resize) is sent to the first sub-network detect_model for defect detection, and the detection result detect_result is obtained.

Step 3: Count the total number of defect instances in detect_result, instance_num.

If instance_num is equal to 0, the diaphragm product is free of defects; otherwise, traverse each defect instance and execute step 4 for it, until instance_num reaches 0. When instance_num is equal to 0, execute step 6 and output the final detection result.

Step 4: Determine whether the score (confidence score) of the instance is greater than the second threshold and whether its area is greater than the first threshold. If both are greater than the corresponding thresholds, the defect has been identified, and the judgment continues with the next defect; otherwise, go to step 5.

Step 5: A defect whose score or area is less than the corresponding threshold (a non-salient defect) is sent to the second sub-network classify_model for separate classification. The highest confidence score in the classification result, that is, the highest probability value, and the corresponding category number are assigned to the defect.

Step 6: Output the final detection result. That is, for the defects finalized by detection and those finalized by classification, mark each target defect in the image according to the coordinates of its upper-left and lower-right corners, and output the final detection result image_result.
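Steps 2 to 6 can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than the applicant's implementation: the image is assumed to be a list of pixel rows, `detect_model` and `classify_model` are assumed to be plain callables supplied by the caller, each defect instance is assumed to be a dictionary with the hypothetical keys `box`, `label` and `score`, and the counter is decremented once per processed instance. Drawing the frames on the image in step 6 is omitted.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Instance = Dict[str, Any]  # keys assumed here: "box", "label", "score"


def run_inspection(
    image: Sequence[Sequence[Any]],
    detect_model: Callable[[Sequence[Sequence[Any]]], List[Instance]],
    classify_model: Callable[[List[Sequence[Any]]], Tuple[str, float]],
    area_threshold: float,
    score_threshold: float,
) -> List[Instance]:
    """Two-stage inspection: detect everything, re-classify non-salient defects."""
    detect_result = detect_model(image)      # step 2
    instance_num = len(detect_result)        # step 3
    final_result: List[Instance] = []
    index = 0
    while instance_num != 0:
        instance = dict(detect_result[index])
        x_min, y_min, x_max, y_max = instance["box"]
        area = (x_max - x_min) * (y_max - y_min)
        # Step 4: a defect is settled only if both values reach the thresholds.
        if instance["score"] < score_threshold or area < area_threshold:
            # Step 5: crop the defect region and classify it separately.
            crop = [row[x_min:x_max] for row in image[y_min:y_max]]
            instance["label"], instance["score"] = classify_model(crop)
        final_result.append(instance)
        index += 1
        instance_num -= 1
    return final_result                      # step 6 (without drawing)
```

An empty `detect_result` skips the loop entirely and returns an empty list, which corresponds to a defect-free product.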

In some embodiments, the target detection model is trained in the following manner: inputting a first training image into the first sub-network to obtain a first detection result output by the first sub-network; determining the non-salient targets in the first detection result, and extracting the part of the first training image corresponding to each non-salient target as a second training image; inputting the second training image into the second sub-network to obtain a second detection result output by the second sub-network; determining a training loss value according to the first detection result, the second detection result and the labeling information of the first training image, and updating the parameters of the first sub-network and the second sub-network according to the training loss value.

For example, obtain sample images of a diaphragm product and mark the defect ranges to obtain a detection data set detect_data containing the first training images; input the detect_data data set into the first sub-network detect_model for preliminary target detection, and obtain the detection result detect_result; then, sort the detected defects of each type by score and by area from high to low, set the score and area thresholds at the boundary of the last third, and extract the hard-to-detect defect data classify_data from detect_result according to those thresholds to obtain the second training images; input the classify_data data set into the second sub-network to obtain the second detection result output by the second sub-network. Then, a preset loss function is used to calculate the training loss value, a back propagation algorithm is used to update the parameters, and the training of the target detection model is completed iteratively.

Embodiments of the present application further provide a target detection apparatus, which is used to implement the target detection method described in any of the above.

Specifically, FIG. 3 shows a schematic structural diagram of a target detection apparatus according to an embodiment of the present application. As shown in FIG. 3 , the target detection apparatus 300 includes:

The first detection unit 310 is configured to input the detection image into the first sub-network of the target detection model to obtain the first detection result output by the first sub-network.

The second detection unit 320 is configured to, if there is a non-salient target in the first detection result, input the part of the detection image corresponding to the non-salient target into the second sub-network of the target detection model, and obtain the second detection result output by the second sub-network.

The determining unit 330 is configured to determine the final detection result according to the first detection result and the second detection result.

In some embodiments, the first sub-network is a target detection network, and the first detection result includes the position of the target and a first classification; the second sub-network is a target classification network, and the second detection result includes a second classification of the target; the determining unit 330 is configured to replace the first classification of the non-salient target with its second classification, as the final classification of the non-salient target.

In some embodiments, the first detection unit 310 is configured to calculate the area of each target according to the position of each target in the first detection result; if the area of one target is smaller than the first threshold, the target is a non-salient target.

In some embodiments, an object is a non-salient object if its confidence score for the first classification is less than a second threshold.

In some embodiments, the apparatus further includes a training unit, configured to, in the training stage of the target detection model, determine the area of each target in the first detection result, sort the targets by area, and use the area at a first sequence position in the area sequence as the first threshold; or, sort the targets in the first detection result by classification confidence score, and use the confidence score at a second sequence position in the confidence score sequence as the second threshold.

In some embodiments, the first detection unit 310 is configured to count the number of targets in the first detection result, and assign that number to a control variable; if the value of the control variable is not 0, select a target in the first detection result that has not yet been selected, and judge whether the selected target is a non-salient target; if the selected target is a non-salient target, cause the second detection unit 320 to input the part of the detection image corresponding to the non-salient target into the second sub-network of the target detection model, obtain the second detection result output by the second sub-network, and decrement the value of the control variable by 1; if the value of the control variable is 0, cause the determining unit 330 to determine the final detection result according to the first detection result and the second detection result.

In some embodiments, the apparatus further includes a training unit, configured to: input a first training image into the first sub-network to obtain a first detection result output by the first sub-network; determine the non-salient targets in the first detection result, and extract the part of the first training image corresponding to each non-salient target as a second training image; input the second training image into the second sub-network to obtain a second detection result output by the second sub-network; determine a training loss value according to the first detection result, the second detection result and the labeling information of the first training image, and update the parameters of the first sub-network and the second sub-network according to the training loss value.

It can be understood that the above-mentioned target detection apparatus can implement each step of the target detection method provided in the foregoing embodiments, and relevant explanations about the target detection method are applicable to the target detection apparatus, and are not repeated here.

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to FIG. 4, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.

The processor, network interface and memory can be connected to each other through an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, etc. The bus can be divided into an address bus, a data bus, a control bus and so on. For ease of presentation, only one bidirectional arrow is used in FIG. 4, but this does not mean that there is only one bus or one type of bus.

The memory is used to store programs. Specifically, a program may include program code, and the program code includes computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile memory into the internal memory and runs it, forming a target detection apparatus at the logical level. The processor executes the program stored in the memory, and is specifically used to perform the following operations:

The detection image is input into the first sub-network of the target detection model, and the first detection result output by the first sub-network is obtained; if there is a non-salient target in the first detection result, the part of the detection image corresponding to the non-salient target is input into the second sub-network of the target detection model to obtain the second detection result output by the second sub-network; and the final detection result is determined according to the first detection result and the second detection result.

For other operations that the processor can perform by executing the program stored in the memory, reference is made to the relevant content of the method and apparatus embodiments of the present application, and details are not described herein again.

The above-mentioned method performed by the target detection apparatus disclosed in the embodiment shown in FIG. 1 of the present application may be applied to a processor, or implemented by a processor. A processor may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above-mentioned method can be completed by a hardware integrated logic circuit in the processor or by instructions in the form of software. The above-mentioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media that are mature in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

The electronic device can also execute the method performed by the target detection apparatus in FIG. 1 , and implement the functions of the target detection apparatus in the embodiment shown in FIG. 1 , and details are not described herein again in this embodiment of the present application.

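The embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, and the one or more programs include instructions. When the instructions are executed by an electronic device including multiple application programs, the electronic device is caused to execute the method executed by the target detection apparatus in the embodiment shown in FIG. 1, and is specifically used to execute: inputting the detection image into the first sub-network of the target detection model to obtain the first detection result output by the first sub-network; if there is a non-salient target in the first detection result, inputting the part of the detection image corresponding to the non-salient target into the second sub-network of the target detection model to obtain the second detection result output by the second sub-network; and determining the final detection result according to the first detection result and the second detection result.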

As will be appreciated by one skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

Memory may include non-persistent memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both persistent and non-persistent, removable and non-removable media, and storage of information may be implemented by any method or technology. Information may be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

The above descriptions are merely examples of the present application, and are not intended to limit the present application. Various modifications and variations of this application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall be included within the scope of the claims of this application.

Claims

A target detection method, comprising:

inputting a detection image into a first sub-network of a target detection model to obtain a first detection result output by the first sub-network;

if there is a non-salient target in the first detection result, inputting the corresponding part of the non-salient target in the detection image into a second sub-network of the target detection model to obtain a second detection result output by the second sub-network; and

determining a final detection result according to the first detection result and the second detection result.
The method of claim 1, wherein the first sub-network is a target detection network, and the first detection result includes a position and a first classification of each target;

the second sub-network is a target classification network, and the second detection result includes a second classification of the target; and

the determining of the final detection result according to the first detection result and the second detection result includes: replacing the first classification of the non-salient target with the second classification of the non-salient target as the final classification of the non-salient target.
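The two-stage flow of the method, together with the classification replacement described above, can be illustrated by the following minimal sketch. It is not the implementation of the present application: the names `Detection`, `crop`, `detect` and `is_non_salient`, the bounding-box convention, and the use of plain callables in place of trained sub-networks are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max): an assumed convention
Image = Sequence[Sequence[int]]  # rows of pixel values, standing in for a real image type


@dataclass(frozen=True)
class Detection:
    box: Box      # position reported by the first sub-network
    label: str    # classification currently attached to the target
    score: float  # confidence score of the first classification


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut out the part of the detection image that corresponds to one target."""
    x_min, y_min, x_max, y_max = box
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def detect(
    image: Image,
    first_subnetwork: Callable[[Image], List[Detection]],
    second_subnetwork: Callable[[List[List[int]]], str],
    is_non_salient: Callable[[Detection], bool],
) -> List[Detection]:
    """Run the first sub-network, then re-classify only the non-salient targets."""
    first_result = first_subnetwork(image)
    final_result = []
    for target in first_result:
        if is_non_salient(target):
            # Only the image region of the non-salient target reaches the second stage.
            second_label = second_subnetwork(crop(image, target.box))
            # The second classification replaces the first one for this target.
            target = replace(target, label=second_label)
        final_result.append(target)
    return final_result
```

In this sketch, salient targets keep the first classification unchanged, so the second sub-network is invoked only as often as there are non-salient targets.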
The method of claim 2, further comprising:

calculating the area of each target according to the position of each target in the first detection result; and

if the area of a target is smaller than a first threshold, determining that the target is a non-salient target.
The method of claim 2, wherein if the confidence score of the first classification of a target is less than a second threshold, the target is a non-salient target.
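The two criteria for a non-salient target, small area and low classification confidence, can be sketched as follows. The sketch assumes that the position of a target is an axis-aligned bounding box `(x_min, y_min, x_max, y_max)` and that the area is the box area; the function names are hypothetical, and an implementation could equally measure the area of a segmentation mask.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max): an assumed convention


def box_area(box: Box) -> int:
    """Area of a bounding box; degenerate boxes yield 0 rather than a negative value."""
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min)


def is_non_salient_by_area(box: Box, first_threshold: float) -> bool:
    """A target whose area is smaller than the first threshold is non-salient."""
    return box_area(box) < first_threshold


def is_non_salient_by_confidence(score: float, second_threshold: float) -> bool:
    """A target whose first-classification confidence is below the second threshold is non-salient."""
    return score < second_threshold
```

Both comparisons are strict, matching the wording "smaller than" and "less than"; a target exactly at a threshold is treated as salient in this sketch.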
The method of claim 3 or 4, wherein the method further comprises:

in the training phase of the target detection model,

determining the area of each target in the first detection result, sorting the targets by area, and using the area at a first sequence position in the resulting area sequence as the first threshold;

or,

sorting the targets in the first detection result by classification confidence score, and using the confidence score at a second sequence position in the resulting confidence score sequence as the second threshold.
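Selecting a threshold from a sorted sequence can be sketched as follows. The text does not fix the sort direction or how the sequence position is chosen, so ascending order and a zero-based `position` argument are assumptions of this sketch, and the function name is hypothetical. The same helper serves for both areas and confidence scores.

```python
from typing import List, Sequence


def threshold_at_position(values: Sequence[float], position: int) -> float:
    """Sort the values in ascending order and return the one at the given zero-based position."""
    ordered: List[float] = sorted(values)
    if not 0 <= position < len(ordered):
        raise IndexError("position lies outside the sorted sequence")
    return ordered[position]


# Hypothetical training-phase usage: areas and scores gathered from the first detection result.
areas = [30.0, 10.0, 20.0, 40.0]
scores = [0.9, 0.4, 0.7, 0.6]
first_threshold = threshold_at_position(areas, 1)    # second-smallest area
second_threshold = threshold_at_position(scores, 1)  # second-lowest confidence score
```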
The method of claim 2, wherein the target detection network is implemented based on the Mask_RCNN algorithm, and the target classification network is implemented based on the EfficientNet algorithm.
The method of claim 1, further comprising:

counting the number of targets in the first detection result, and assigning the number of targets to a control variable;

if the value of the control variable is not 0, selecting a target in the first detection result that has not yet been selected, and judging whether the target selected this time is a non-salient target; if the target selected this time is a non-salient target, performing the step of inputting the corresponding part of the non-salient target in the detection image into the second sub-network of the target detection model to obtain the second detection result output by the second sub-network, and decrementing the value of the control variable by 1; and

if the value of the control variable is 0, performing the step of determining the final detection result according to the first detection result and the second detection result.
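The control-variable loop can be sketched as follows. The sketch assumes that the control variable is decremented once for every selected target, whether or not the target is non-salient, since otherwise the loop would not reach 0 when salient targets are present; this reading and all names are assumptions rather than statements of the present application.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_control_variable(
    targets: List[T],
    is_non_salient: Callable[[T], bool],
    second_stage: Callable[[T], R],
) -> Dict[int, R]:
    """Visit each target once; return second-stage results keyed by target index."""
    control = len(targets)  # number of targets assigned to the control variable
    next_index = 0          # index of the next target that has not been selected
    second_results: Dict[int, R] = {}
    while control != 0:
        target = targets[next_index]
        if is_non_salient(target):
            second_results[next_index] = second_stage(target)
        next_index += 1
        control -= 1        # assumed: one decrement per selected target
    # control == 0: the caller can now determine the final detection result.
    return second_results
```

When the first detection result contains no targets, the control variable starts at 0 and the second stage is never invoked.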
The method of any one of claims 1-7, wherein the target detection model is trained in the following manner:

inputting a first training image into the first sub-network to obtain a first detection result output by the first sub-network;

determining the non-salient targets in the first detection result, and extracting the corresponding part of each non-salient target in the first training image as a second training image;

inputting the second training image into the second sub-network to obtain a second detection result output by the second sub-network; and

determining a training loss value according to the first detection result, the second detection result and the labeling information of the first training image, and updating the parameters of the first sub-network and the second sub-network according to the training loss value.
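One possible way to combine the two detection results into a single training loss value is sketched below. The text does not specify the form of the loss, so the weighted sum, the cross-entropy term for the second sub-network, and the names `cross_entropy`, `joint_training_loss` and `second_weight` are all assumptions made for illustration; the loss of the first sub-network is taken as an already computed number.

```python
import math
from typing import Sequence


def cross_entropy(probabilities: Sequence[float], true_index: int) -> float:
    """Negative log-probability assigned to the labeled class."""
    return -math.log(probabilities[true_index])


def joint_training_loss(
    first_loss: float,
    second_probabilities: Sequence[Sequence[float]],
    second_labels: Sequence[int],
    second_weight: float = 1.0,
) -> float:
    """Assumed form: first-stage loss plus a weighted mean classification loss of the second stage."""
    if not second_probabilities:
        # No non-salient target was found, so only the first sub-network contributes.
        return first_loss
    second_loss = sum(
        cross_entropy(p, y) for p, y in zip(second_probabilities, second_labels)
    ) / len(second_probabilities)
    return first_loss + second_weight * second_loss
```

Because a single value depends on the outputs of both sub-networks, a gradient-based optimizer could use it to update the parameters of both, which is the property the training step relies on.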
A target detection device, characterized in that the target detection device comprises:

a first detection unit, configured to input a detection image into a first sub-network of a target detection model to obtain a first detection result output by the first sub-network;

a second detection unit, configured to, if there is a non-salient target in the first detection result, input the corresponding part of the non-salient target in the detection image into a second sub-network of the target detection model to obtain a second detection result output by the second sub-network; and

a determination unit, configured to determine a final detection result according to the first detection result and the second detection result.
The apparatus of claim 9, wherein,

the determination unit is configured to replace the first classification of the non-salient target with the second classification of the non-salient target as the final classification of the non-salient target; and

the first detection unit is configured to calculate the area of each target according to the position of each target in the first detection result, and, if the area of a target is smaller than a first threshold, to determine that the target is a non-salient target.
The apparatus of claim 9, wherein the apparatus further comprises:

a training unit, configured to, in the training phase of the target detection model, determine the area of each target in the first detection result, sort the targets by area, and use the area at a first sequence position in the resulting area sequence as the first threshold; or, sort the targets in the first detection result by classification confidence score, and use the confidence score at a second sequence position in the resulting confidence score sequence as the second threshold.
The apparatus of claim 9, wherein,

the first detection unit is configured to count the number of targets in the first detection result and assign the number of targets to a control variable; if the value of the control variable is not 0, select a target in the first detection result that has not yet been selected, and judge whether the target selected this time is a non-salient target; if the target selected this time is a non-salient target, cause the second detection unit to input the corresponding part of the non-salient target in the detection image into the second sub-network of the target detection model to obtain the second detection result output by the second sub-network, and decrement the value of the control variable by 1; and if the value of the control variable is 0, cause the determination unit to determine the final detection result according to the first detection result and the second detection result.
An electronic device comprising:

a processor; and

a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the following target detection method:

inputting a detection image into a first sub-network of a target detection model to obtain a first detection result output by the first sub-network; if there is a non-salient target in the first detection result, inputting the corresponding part of the non-salient target in the detection image into a second sub-network of the target detection model to obtain a second detection result output by the second sub-network; and determining a final detection result according to the first detection result and the second detection result.