WO2021212482A1

WO2021212482A1 - Method and apparatus for mining difficult case during target detection

Info

Publication number: WO2021212482A1
Application number: PCT/CN2020/086742
Authority: WO
Inventors: 晋周南; 孙叠; 刘新春
Original assignee: 华为技术有限公司
Priority date: 2020-04-24
Filing date: 2020-04-24
Publication date: 2021-10-28
Also published as: CN112639872B; CN112639872A

Abstract

Disclosed are a method and apparatus for mining difficult cases during target detection, which relate to the fields of artificial intelligence and intelligent automobiles. By means of the present application, difficult cases in target detection can be mined more effectively, and difficult case data can be extracted more accurately. The method can comprise: analyzing, by using a preset target detection algorithm, an image to be detected, so as to obtain a detection result of said image; analyzing said image by using a preset single-target tracking algorithm, so as to obtain a tracking result of said image; acquiring a discrimination result of each object in said image according to the detection result, the tracking result, and a preset rule, wherein the discrimination result comprises successful matching, missed detection, false detection, new appearance, and end; and determining an object whose discrimination result is missed detection as a missed-detection difficult case, and determining an object whose discrimination result is false detection as a false-detection difficult case.

Description

Method and device for mining difficult cases in target detection

Technical field

This application relates to the field of artificial intelligence, and in particular to a method and device for mining difficult cases in target detection.

Background

Unmanned driving, smart healthcare, smart cities, and similar applications currently attract wide attention, and they rely on data-driven deep learning target detection algorithms to analyze image information.

A data-driven deep learning target detection algorithm analyzes acquired images to obtain information such as the location and category of objects such as vehicles and people. A target detection algorithm usually includes two phases: training and inference. The training phase is the process in which the algorithm learns from data, and the inference phase is the phase in which the output of the training phase is used to analyze the information contained in an image. The analysis result of the inference phase may be correct or incorrect. Errors in the analysis result fall into two cases: missed detection and false detection. An object that is missed is a missed-detection difficult case, and an object that is falsely detected is a false-detection difficult case. Exemplarily, (a), (b) and (c) of FIG. 1 show three detection results for traffic signs in one frame of a road image. The road image includes a traffic sign 101, a traffic sign 102, and a vehicle 103, and a car sticker 104 is affixed to the vehicle 103. In FIG. 1(a), detection frame 1 correctly marks the traffic sign 101, and detection frame 2 correctly marks the traffic sign 102. In FIG. 1(b), detection frame 1 correctly marks the traffic sign 101, but the traffic sign 102 is not detected, that is, a missed detection occurs; the traffic sign 102 in this image is a missed-detection difficult case. In FIG. 1(c), detection frame 1 correctly marks the traffic sign 101 and detection frame 2 correctly marks the traffic sign 102, but detection frame 3 marks the car sticker 104, that is, the car sticker 104 is falsely detected as a traffic sign; the car sticker 104 in this image is a false-detection difficult case.

The missed-detection difficult cases and the false-detection difficult cases together constitute the difficult case data. Adding difficult case data to the training phase is an effective way to reduce the probability of errors in the analysis results.

Difficult case mining is a method of extracting a difficult case data set. Its purpose is to extract missed-detection difficult cases and false-detection difficult cases. Commonly used difficult case mining methods fall into two categories: supervised and unsupervised. Supervised difficult case mining requires a large amount of labeled data, and data labeling requires a lot of manpower; the cost is very high, especially for large-scale data. For unsupervised difficult case mining, how to mine difficult cases effectively and extract missed-detection and false-detection difficult cases more accurately and efficiently is a problem that needs to be solved.

Summary of the invention

The embodiments of the present application provide a method and device for mining difficult cases in target detection, which can implement difficult case mining more effectively and extract difficult case data more accurately.

In order to achieve the foregoing objectives, the following technical solutions are adopted in the embodiments of the present application:

In the first aspect, this application provides a method and device for mining difficult cases in target detection.

In a possible design, the method may include: obtaining an image to be detected; analyzing the image to be detected using a preset target detection algorithm to obtain a detection result of the image to be detected; analyzing the image to be detected using a preset single target tracking algorithm to obtain a tracking result of the image to be detected; obtaining a discrimination result of each object in the image to be detected according to the detection result, the tracking result and preset rules; and determining an object whose discrimination result is missed detection as a missed-detection difficult case, and an object whose discrimination result is false detection as a false-detection difficult case. The detection result includes: the category of one or more detection objects, the detection position of one or more detection objects, and the classification accuracy value of one or more detection objects. The tracking result includes: the tracking position of one or more tracking objects and the tracking confidence of one or more tracking objects. The discrimination result includes: successful matching, missed detection, false detection, new appearance and end.

In this method, the single target tracking algorithm and the target detection algorithm are combined to mine difficult cases in target detection: the tracking results of the single target tracking algorithm are applied to difficult case mining, and missed-detection difficult cases are distinguished from false-detection difficult cases. Missed-detection and false-detection difficult cases can thus be extracted more accurately, and difficult cases can be mined more effectively.
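The inputs and outputs named in this design can be pictured with a minimal Python sketch. All type and function names here (`Detection`, `Track`, `JudgedObject`, `Verdict`, `extract_difficult_cases`) and the corner-coordinate box format are illustrative assumptions and do not come from the application; only the listed fields and the five discrimination results follow the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) pixel corners.
Box = Tuple[float, float, float, float]


class Verdict(Enum):
    """The five discrimination results listed in the text."""
    MATCHED = "successful matching"
    MISSED = "missed detection"
    FALSE = "false detection"
    NEW = "new appearance"
    END = "end"


@dataclass
class Detection:
    category: str   # category of the detection object
    box: Box        # detection position
    score: float    # classification accuracy value


@dataclass
class Track:
    box: Box            # tracking position
    confidence: float   # tracking confidence


@dataclass
class JudgedObject:
    object_id: int
    box: Box
    verdict: Verdict


def extract_difficult_cases(
    judged: List[JudgedObject],
) -> Tuple[List[JudgedObject], List[JudgedObject]]:
    """Split judged objects into missed-detection and false-detection cases."""
    missed = [o for o in judged if o.verdict is Verdict.MISSED]
    false = [o for o in judged if o.verdict is Verdict.FALSE]
    return missed, false
```

Objects judged as successful matching, new appearance or end are simply not returned, since only missed and false detections count as difficult cases.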

In a possible design, the method includes: obtaining an association result of a first object in the image to be detected according to the detection result, the tracking result and preset association rules; performing a preliminary discrimination on the first object in the image to be detected according to the association result of the first object, to obtain a preliminary discrimination result of the first object; and determining the discrimination result of the first object in the image to be detected according to the preliminary discrimination result of the first object. The association result includes: successful matching, a first association result, and a second association result. The preliminary discrimination result includes: successful matching, missed detection, false detection, possible missed detection, possible false detection, and end. The first association result is that the first object exists in the detection result but does not exist in the tracking result; the second association result is that the first object does not exist in the detection result but exists in the tracking result. Since the tracking result is used in target detection, the location information of the detected target can be effectively predicted, and the accuracy of difficult case mining is improved.

In a possible design, obtaining the association result of the first object in the image to be detected according to the detection result, the tracking result and the preset association rules includes: selecting objects of the same category in the detection result and the tracking result, and constructing a matrix with the tracking objects as rows and the detection objects as columns, or with the detection objects as rows and the tracking objects as columns; and using an association matching algorithm on the matrix to obtain the association result of the first object in the image to be detected.

In this way, the association matching algorithm is used to match the detection result against the tracking result, which effectively ensures that each detection object in the detection result corresponds to at most one tracking object in the tracking result, and improves the accuracy of difficult case mining.
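The matrix construction and matching step can be sketched as follows. The application names neither the matrix entries nor the association matching algorithm, so this sketch assumes intersection-over-union (IoU) as the entry and uses a simple greedy one-to-one matching as a stand-in; the names `iou`, `associate` and `min_iou`, and the threshold value, are hypothetical. The inputs are assumed to be already filtered to one category.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) corners


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(
    track_boxes: List[Box], det_boxes: List[Box], min_iou: float = 0.3
) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Match tracks (rows) to detections (columns), one-to-one.

    Returns (matches, detection_only, tracking_only):
      matches        -> successful matching, as (track index, detection index)
      detection_only -> first association result (detected, not tracked)
      tracking_only  -> second association result (tracked, not detected)
    """
    # Rows are tracking objects, columns are detection objects.
    matrix = [[iou(t, d) for d in det_boxes] for t in track_boxes]
    candidates = sorted(
        (
            (matrix[r][c], r, c)
            for r in range(len(track_boxes))
            for c in range(len(det_boxes))
            if matrix[r][c] >= min_iou
        ),
        reverse=True,
    )
    used_rows, used_cols, matches = set(), set(), []
    for _, r, c in candidates:  # greedy: best-overlapping pair first
        if r in used_rows or c in used_cols:
            continue
        used_rows.add(r)
        used_cols.add(c)
        matches.append((r, c))
    detection_only = [c for c in range(len(det_boxes)) if c not in used_cols]
    tracking_only = [r for r in range(len(track_boxes)) if r not in used_rows]
    return sorted(matches), detection_only, tracking_only
```

Because each row and each column is consumed at most once, every detection object ends up with at most one tracking object, which is the property the text relies on. An optimal assignment method could replace the greedy loop without changing the returned structure.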

In a possible design, the preliminary discrimination rules include: for a first object whose association result is successful matching, the preliminary discrimination result is successful matching; for a first object whose association result is the first association result, the preliminary discrimination result is possible false detection; for a first object whose association result is the second association result and whose tracking confidence is greater than or equal to a first confidence threshold, the preliminary discrimination result is missed detection; for a first object whose association result is the second association result and whose tracking confidence is less than or equal to a second confidence threshold, the preliminary discrimination result is end; and for a first object whose association result is the second association result and whose tracking confidence is less than the first confidence threshold and greater than the second confidence threshold, the preliminary discrimination result is possible missed detection. The second confidence threshold is less than the first confidence threshold.
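These rules map directly onto a small decision function. The sketch below is illustrative only: the string labels, the function name `preliminary_verdict`, and the default threshold values 0.8 and 0.3 are assumptions, since the application gives no concrete values.

```python
from typing import Optional


def preliminary_verdict(
    association: str,
    tracking_confidence: Optional[float] = None,
    first_threshold: float = 0.8,   # hypothetical value
    second_threshold: float = 0.3,  # hypothetical value
) -> str:
    """Apply the preliminary discrimination rules to one object.

    association is one of:
      "matched"        -> successful matching
      "detection_only" -> first association result
      "tracking_only"  -> second association result
    """
    if not second_threshold < first_threshold:
        raise ValueError("second threshold must be less than first threshold")
    if association == "matched":
        return "matched"
    if association == "detection_only":
        return "possible_false_detection"
    if association == "tracking_only":
        if tracking_confidence is None:
            raise ValueError("tracking confidence is required for tracked objects")
        if tracking_confidence >= first_threshold:
            return "missed_detection"
        if tracking_confidence <= second_threshold:
            return "end"
        return "possible_missed_detection"
    raise ValueError(f"unknown association result: {association!r}")
```

Only the two "possible" outcomes need further evidence from adjacent frames; the other three are already final.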

In a possible design, determining the discrimination result of the first object in the image to be detected according to the preliminary discrimination result of the first object includes: for a first object whose preliminary discrimination result is successful matching, determining that the discrimination result is successful matching; for a first object whose preliminary discrimination result is missed detection, determining that the discrimination result is missed detection; for a first object whose preliminary discrimination result is end, determining that the discrimination result is end; and for a first object whose preliminary discrimination result is possible false detection or possible missed detection, determining the discrimination result in combination with the discrimination results of the S frames of images adjacent to the image to be detected, where S>1.

For an object that is possibly falsely detected or possibly missed, the discrimination results of the S adjacent frames of images are combined to determine its discrimination result, which effectively reduces the probability of misjudging false-detection difficult cases and missed-detection difficult cases.

In a possible design, for a first object whose preliminary discrimination result is possible missed detection, the judgment proceeds in order from the 1st frame image to the S-th frame image adjacent to the image to be detected. If, in at least one frame among the 1st to S-th frame images adjacent to the image to be detected, the association result of the object corresponding to the first object is the second association result and the tracking confidence of that object is less than or equal to the second confidence threshold, the discrimination result of the first object is determined to be end. If, in each frame among the 1st to S-th frame images adjacent to the image to be detected, the association result of the object corresponding to the first object is the second association result and the tracking confidence of that object is less than the first confidence threshold and greater than the second confidence threshold, the discrimination result of the first object is determined to be end. Otherwise, the discrimination result of the first object is determined to be missed detection.

In a possible design, for a first object whose preliminary discrimination result is possible false detection, if, in each frame among the 1st to S-th frame images adjacent to the image to be detected, the association result of the object corresponding to the first object is successful matching and the tracking confidence of that object is greater than the first confidence threshold, the discrimination result of the first object is determined to be new appearance; otherwise, the discrimination result of the first object is determined to be false detection.
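The two multi-frame rules can be sketched together in Python. This is a minimal sketch under stated assumptions: each adjacent frame is summarized as an (association result, tracking confidence) pair for the corresponding object, the string labels and the name `final_verdict` are hypothetical, and the threshold defaults are placeholders. For the all-frames-in-between case of a possibly missed object, the sketch returns "end", following the device-side statement of the same design.

```python
from typing import List, Tuple

# One adjacent frame: (association result, tracking confidence) for the
# object corresponding to the one being judged. Both fields are assumed
# summaries; frames with no tracker output can carry 0.0 as confidence.
Observation = Tuple[str, float]


def final_verdict(
    preliminary: str,
    neighbours: List[Observation],
    first_threshold: float = 0.8,   # hypothetical value
    second_threshold: float = 0.3,  # hypothetical value
) -> str:
    """Resolve a preliminary result using S adjacent frames (S > 1)."""
    if preliminary in ("matched", "missed_detection", "end"):
        return preliminary  # already final, no neighbours needed
    if len(neighbours) < 2:
        raise ValueError("S must be greater than 1")

    if preliminary == "possible_missed_detection":
        # Tracker loses the object in at least one adjacent frame.
        if any(a == "tracking_only" and c <= second_threshold
               for a, c in neighbours):
            return "end"
        # Tracker stays uncertain in every adjacent frame.
        if all(a == "tracking_only" and second_threshold < c < first_threshold
               for a, c in neighbours):
            return "end"
        return "missed_detection"

    if preliminary == "possible_false_detection":
        # Detector and tracker agree confidently in every adjacent frame.
        if all(a == "matched" and c > first_threshold for a, c in neighbours):
            return "new_appearance"
        return "false_detection"

    raise ValueError(f"unknown preliminary result: {preliminary!r}")
```

For example, a detection with no matching track that is then confidently matched in every one of the S adjacent frames is treated as a newly appearing object rather than a false detection.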

In a possible design, if the number of remaining undetected images to be detected is greater than or equal to the preset frame number S, the S frames of images adjacent to the image to be detected are the S frames of images whose frame numbers are greater than the frame number of the image to be detected; if the number of remaining undetected images to be detected is less than the preset frame number S, the S frames of images adjacent to the image to be detected are the S frames of images whose frame numbers are smaller than the frame number of the image to be detected. In this way, when the number of remaining undetected image frames is insufficient, the information of the S frames of images before the image to be detected can be used to assist the determination for the current image to be detected, which effectively solves the problem of insufficient remaining frames. In a possible design, if the number of remaining undetected images to be detected is less than the preset frame number S, the judgment can be made in reverse order, from back to front.
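The choice of adjacent frames can be illustrated with a short helper. It is a sketch under assumptions: frames are identified by 0-based indices, the "remaining undetected" frames are taken to be those with a larger index than the current one, and the function name `adjacent_frames` is hypothetical. When looking backwards, indices are returned from back to front, and fewer than S indices are returned if the sequence start is reached.

```python
from typing import List


def adjacent_frames(index: int, total: int, s: int) -> List[int]:
    """Return the indices of the S frames used to judge frame `index`.

    Looks forward when at least S frames remain after `index`,
    otherwise looks backward in reverse order.
    """
    if s <= 1:
        raise ValueError("S must be greater than 1")
    if not 0 <= index < total:
        raise ValueError("index out of range")
    remaining = total - 1 - index  # frames after the current one
    if remaining >= s:
        return list(range(index + 1, index + s + 1))
    # Not enough frames left: use up to S earlier frames, back to front.
    return list(range(index - 1, max(index - s, 0) - 1, -1))
```

With 10 frames and S = 3, frame 2 is judged using frames 3, 4 and 5, while frame 8 is judged using frames 7, 6 and 5.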

In a possible design, the preset single target tracking algorithm is a single target tracking algorithm based on deep learning. Using a deep-learning-based single target tracking algorithm, which analyzes images based on deep semantic features, can effectively predict the location information of the matching result and improve the accuracy of extracting missed-detection difficult cases and false-detection difficult cases.

Correspondingly, the present application also provides a device for mining difficult cases in target detection, which can implement the method for mining difficult cases in target detection described in the first aspect. The device can implement the above method by software, hardware, or by hardware executing corresponding software.

In a possible design, the device may include: an image acquisition unit, a target detection unit, a target tracking unit, and a difficult case mining unit. The image acquisition unit is used to obtain an image to be detected. The target detection unit is used to analyze the image to be detected using a preset target detection algorithm to obtain a detection result of the image to be detected. The target tracking unit is used to analyze the image to be detected using a preset single target tracking algorithm to obtain a tracking result of the image to be detected. The detection result includes: the category of one or more detection objects, the detection position of one or more detection objects, and the classification accuracy value of one or more detection objects. The tracking result includes: the tracking position of one or more tracking objects and the tracking confidence of one or more tracking objects. The difficult case mining unit is used to obtain a discrimination result of each object in the image to be detected according to the detection result, the tracking result and preset rules; the discrimination result includes: successful matching, missed detection, false detection, new appearance and end. The difficult case mining unit is also used to determine an object whose discrimination result is missed detection as a missed-detection difficult case, and an object whose discrimination result is false detection as a false-detection difficult case.

In a possible design, the difficult case mining unit is specifically used to: obtain an association result of a first object in the image to be detected according to the detection result, the tracking result and preset association rules; perform a preliminary discrimination on the first object in the image to be detected according to the association result of the first object, to obtain a preliminary discrimination result of the first object; and determine the discrimination result of the first object in the image to be detected according to the preliminary discrimination result of the first object. The association result includes: successful matching, a first association result, and a second association result. The first association result is that the first object exists in the detection result but does not exist in the tracking result; the second association result is that the first object does not exist in the detection result but exists in the tracking result. The preliminary discrimination result includes: successful matching, missed detection, false detection, possible missed detection, possible false detection, and end.

In a possible design, the difficult case mining unit obtaining the association result of the first object in the image to be detected according to the detection result, the tracking result and the preset association rules specifically includes: selecting objects of the same category in the detection result and the tracking result, and constructing a matrix with the tracking objects as rows and the detection objects as columns, or with the detection objects as rows and the tracking objects as columns; and using an association matching algorithm on the matrix to obtain the association result of the first object in the image to be detected.

In a possible design, the difficult case mining unit determining the discrimination result of the first object in the image to be detected according to the preliminary discrimination result of the first object specifically includes: for a first object whose preliminary discrimination result is successful matching, determining that the discrimination result is successful matching; for a first object whose preliminary discrimination result is missed detection, determining that the discrimination result is missed detection; for a first object whose preliminary discrimination result is end, determining that the discrimination result is end; and for a first object whose preliminary discrimination result is possible false detection or possible missed detection, determining the discrimination result in combination with the discrimination results of the S frames of images adjacent to the image to be detected, where S>1.

In a possible design, the difficult case mining unit determining, for a first object whose preliminary discrimination result is possible false detection or possible missed detection, the discrimination result in combination with the discrimination results of the S frames of images adjacent to the image to be detected specifically includes: for a first object whose preliminary discrimination result is possible missed detection, judging in order from the 1st frame image to the S-th frame image adjacent to the image to be detected. If, in at least one frame among the 1st to S-th frame images adjacent to the image to be detected, the association result of the object corresponding to the first object is the second association result and the tracking confidence of that object is less than or equal to the second confidence threshold, the discrimination result of the first object is determined to be end. If, in each frame among the 1st to S-th frame images adjacent to the image to be detected, the association result of the object corresponding to the first object is the second association result and the tracking confidence of that object is less than the first confidence threshold and greater than the second confidence threshold, the discrimination result of the first object is determined to be end. Otherwise, the discrimination result of the first object is determined to be missed detection.

In a possible design, the difficult case mining unit determining, for a first object whose preliminary discrimination result is possible false detection or possible missed detection, the discrimination result in combination with the discrimination results of the S frames of images adjacent to the image to be detected specifically includes: for a first object whose preliminary discrimination result is possible false detection, if, in each frame among the 1st to S-th frame images adjacent to the image to be detected, the association result of the object corresponding to the first object is successful matching and the tracking confidence of that object is greater than the first confidence threshold, determining that the discrimination result of the first object is new appearance; otherwise, determining that the discrimination result of the first object is false detection.

In a possible design, if the number of remaining undetected images to be detected is greater than or equal to the preset frame number S, the S frames of images adjacent to the image to be detected are the S frames of images whose frame numbers are greater than the frame number of the image to be detected; if the number of remaining undetected images to be detected is less than the preset frame number S, the S frames of images adjacent to the image to be detected are the S frames of images whose frame numbers are smaller than the frame number of the image to be detected.

In a possible design, the preset single target tracking algorithm is a single target tracking algorithm based on deep learning.

In a second aspect, an embodiment of the present application provides a device that can implement the method for mining difficult cases in target detection described in the first aspect. For example, the device may be a server. In one possible design, the device may include a processor and a memory. The processor is configured to support the device in performing the corresponding functions in the method of the first aspect described above. The memory is coupled with the processor and stores the program instructions and data necessary for the device.

In a third aspect, embodiments of the present application provide a computer-readable storage medium that includes computer instructions which, when run on a device, cause the device to perform the method for mining difficult cases in target detection described in any of the above aspects and possible designs.

In a fourth aspect, the embodiments of the present application provide a computer program product. When the computer program product runs on a computer, the computer is caused to execute the method for mining difficult cases in target detection described in any of the above aspects and possible designs.

In a fifth aspect, the embodiments of the present application also provide a chip system, which includes a processor and may also include a memory, and which is used to implement the method for mining difficult cases in target detection described in any of the above aspects and possible designs.

Any device, computer-readable storage medium, computer program product, or chip system provided above is used to execute the corresponding method provided above. Therefore, for the beneficial effects that can be achieved, refer to the beneficial effects of the corresponding solution in the corresponding method provided above, which are not repeated here.

Description of the drawings

FIG. 1 is a schematic diagram of a scenario to which the technical solution provided by an embodiment of the application is applicable;

FIG. 2 is a schematic diagram of a device to which the technical solution provided in an embodiment of the application is applicable;

FIG. 3 is a first schematic diagram of a method for mining difficult cases in target detection according to an embodiment of this application;

FIG. 4 is a second schematic diagram of a method for mining difficult cases in target detection according to an embodiment of this application;

FIG. 5 is a first structural diagram of an apparatus provided by an embodiment of this application;

FIG. 6 is a second structural diagram of an apparatus provided by an embodiment of this application;

FIG. 7 is a third structural diagram of an apparatus provided by an embodiment of this application.

Detailed description

The term "plurality" herein refers to two or more. The terms "first" and "second" herein are used to distinguish different objects, rather than to describe a specific order of objects. For example, the first threshold and the second threshold are only used to distinguish different thresholds, and their order is not limited. The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships are possible. For example, A and/or B can mean three situations: A exists alone, A and B exist at the same time, or B exists alone.

In the embodiments of the present application, words such as "exemplary" or "for example" are used to present examples, instances, or illustrations. Any embodiment or design solution described as "exemplary" or "for example" in the embodiments of the present application should not be construed as being more preferable or advantageous than other embodiments or design solutions. Rather, words such as "exemplary" or "for example" are used to present related concepts in a specific manner.

The method and device for mining difficult cases in target detection provided by the embodiments of the present application will be described in detail below with reference to the accompanying drawings.

The technical solutions provided in this application can be applied to various hardware devices that support scientific computing, such as personal computers (PC), servers, laptops, tablet computers, vehicle-mounted computers, mobile phones, mobile terminals, smart cameras, smart watches, and embedded devices. The embodiments of this application do not impose special restrictions on the specific form of the hardware device.

Exemplarily, FIG. 2 is a schematic structural diagram of a device 100 provided in an embodiment of this application. The device 100 includes at least one processor 110, a communication line 120, a memory 130, and at least one communication interface 140.

The processor 110 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of this application.

The communication line 120 may include a path to transmit information between the aforementioned components.

The communication interface 140 uses any transceiver-like device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).

The memory 130 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 130 may exist independently and be connected to the processor 110 through the communication line 120. The memory 130 may also be integrated with the processor 110.

The memory 130 is used to store computer-executable instructions for executing the solution of the present application, and their execution is controlled by the processor 110. The processor 110 is configured to execute the computer-executable instructions stored in the memory 130, so as to implement the method for mining difficult cases in target detection provided in the following embodiments of the present application.

Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.

In a specific implementation, as an embodiment, the processor 110 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 2.

In a specific implementation, as an embodiment, the device 100 may include multiple processors, such as the processor 110 and the processor 111 in FIG. 2. Each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. The processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).

In a specific implementation, as an embodiment, the device 100 may further include an output device 150 and an input device 160. The output device 150 communicates with the processor 110 and can display information in a variety of ways. For example, the output device 150 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like. The input device 160 communicates with the processor 110 and can receive user input in a variety of ways. For example, the input device 160 may be a mouse, a keyboard, a touch screen device, a sensor device, or the like.

The above-mentioned device 100 may be a general-purpose device or a special-purpose device. In a specific implementation, the device 100 may be a vehicle-mounted device, a desktop computer, a portable computer, a network server, a palmtop computer (personal digital assistant, PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or the like in Figure 2 Structure of the equipment. The embodiment of the present application does not limit the type of the device 100. It should be noted that the structure of the device 100 shown in FIG. 2 is only used as an example, and is not used to limit the technical solution of the present application. Those skilled in the art should understand that, in a specific implementation process, the device 100 may also be in other forms and may also include other components.

The embodiment of the present application provides a method for mining difficult cases in target detection, which combines a target detection algorithm and a single-target tracking algorithm to mine difficult cases in target detection, so that difficult case mining can be carried out more effectively and difficult case data can be extracted more accurately. The method may be carried out by the above-mentioned hardware device or by an apparatus that supports the required computation; the apparatus may be a chip or a chip system, a computer-readable storage medium, or a computer program product. The embodiment of this application does not limit this.

The embodiment of the present application provides a method for mining difficult cases in target detection, which can be applied to the device shown in FIG. 2. As shown in Figure 3, the method may include:

S301: Acquire an image to be detected.

The method for mining difficult cases in target detection provided by the embodiments of the present application can be applied to difficult case mining in target detection on sequence images. Sequence images are a group of temporally continuous images obtained by extracting frames from a video. For example, a video may be captured by a camera, and frame extraction is performed on the captured video to obtain the original sequence images. Each frame of the original sequence images is analyzed to obtain information about one or more objects in that frame. An object whose analysis result is wrong constitutes difficult case data; the difficult case data includes missed-detection difficult cases and false-detection difficult cases.

When the t-th frame of the original sequence images is analyzed, the t-th frame is the image to be detected, where t > 0.

In an implementation manner, starting from the first frame (i.e., t = 1) of the sequence images, the images to be detected are acquired in order from front to back.

In one implementation, if the number of remaining undetected frames is less than the preset number of frames S (i.e., t > N − S, where N is the total number of frames of the sequence images), the images to be detected are acquired in order from back to front, starting from the last frame of the sequence images (i.e., t = N).
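The acquisition order described above can be sketched as follows. This is only one possible reading of the text: the function name `processing_order` and its parameters are hypothetical, and it assumes that frames 1 to N − S are taken front to back while the last S frames are taken back to front.

```python
def processing_order(total_frames: int, window: int) -> list[int]:
    """Return 1-based frame numbers in the order they become the image to be detected.

    Assumption: frames 1 .. N-S are visited front to back, and the last S
    frames (t > N-S) are visited back to front, starting from frame N.
    """
    if total_frames < 1 or window < 0:
        raise ValueError("total_frames must be >= 1 and window must be >= 0")
    split = max(total_frames - window, 0)             # last frame visited front to back
    forward = list(range(1, split + 1))               # t = 1 .. N-S
    backward = list(range(total_frames, split, -1))   # t = N .. N-S+1
    return forward + backward


print(processing_order(6, 2))
```

With N = 6 and S = 2, frames 1 to 4 are visited in ascending order and frames 6 and 5 in descending order.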

It should be noted that the embodiment of the present application takes difficult case mining in target detection on sequence images as an example. This does not constitute a limitation on the technical solution of this application. The method for mining difficult cases in target detection provided by the embodiments of the present application can also be applied to mining difficult cases in other sequence data, such as laser point clouds.

S302: Use a preset target detection algorithm to analyze the image to be detected, and obtain a detection result of the image to be detected.

Object detection is a method of identifying objects and their positions in an image.

Use the preset target detection algorithm to analyze the image to be detected (the t-th frame image) to obtain the detection result of the image to be detected. The preset target detection algorithm can be any target detection algorithm in conventional technologies; for example, the YOLO (you only look once: unified, real-time object detection) algorithm.

The detection result may include information such as the category of one or more detection objects, the position of each detection object (referred to as the detection position in this application), and the classification accuracy value of each detection object. A detection object is a recognized target. The category of a detection object distinguishes its type; for example, the category may be person, traffic sign, building, vehicle, etc. For example, in (a), (b) and (c) of FIG. 1, the category of the detection object is traffic sign. The position of a detection object may be its coordinates in the image. The classification accuracy value of a detection object is the probability, output by the target detection algorithm, that the detection object belongs to the given category; when the classification accuracy value is greater than a set first value, the target detection algorithm determines that a detection object of the given category has been recognized.

In one implementation, the device can save the detection result of the image to be detected. For example, the device saves a template information table that includes the detection result of the image to be detected and related information, such as the image frame number, object serial number, object category, object location, and classification accuracy value. For example, the template information table includes the information shown in Table 1.

Table 1

Image frame number	Object number	Object category	Object location	Classification accuracy value
1	1	Traffic sign	(x₁, y₁), (x₂, y₂)	0.87
1	2	Traffic sign	(x₃, y₃), (x₄, y₄)	0.88

The image frame number is the sequence number of the frame in which the detection object is located. The object serial number is the serial number of the detection object (optionally, the serial number may be manually labeled). The object category is the category of the detection object. The object location is the position information of the detection object; for example, (x₁, y₁) and (x₂, y₂) are the coordinates of the upper left corner and the lower right corner of object 1 in the first frame of image, and represent the position of object 1 in that frame. The classification accuracy value is the classification accuracy value of the detection object.

Further, each time the detection result of an image to be detected is obtained, the relevant information in the template information table can be updated. For example, the preset target detection algorithm is used to analyze the first frame of the sequence images to obtain the detection result of the first frame image, and the relevant information in the template information table is initialized according to this detection result; the information in the template information table is then as shown in Table 1. The first frame image is called the template image. After that, in order from front to back, the second frame of the sequence images is obtained as the image to be detected; the second frame image is analyzed using the preset target detection algorithm to obtain the detection result of the second frame image; and the relevant information in the template information table is updated according to this detection result, after which the information in the template information table is as shown in Table 2.

Table 2

Image frame number	Object number	Object category	Object location	Classification accuracy value
1	1	Traffic sign	(x₁, y₁), (x₂, y₂)	0.87
1	2	Traffic sign	(x₃, y₃), (x₄, y₄)	0.88
2	1	Traffic sign	(x₅, y₅), (x₆, y₆)	0.86
2	2	Traffic sign	(x₇, y₇), (x₈, y₈)	0.9

It should be noted that the template information table in the embodiment of the present application is only an exemplary description. In actual applications, the template information table may also be in other forms, which are not limited in the embodiment of the present application.
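As an illustration only, the template information table of Tables 1 and 2 could be held in memory as a list of rows. The class `TemplateRow`, the function `record_detections`, and the numeric coordinates below are hypothetical placeholders; the application gives the coordinates only symbolically, and numbering the objects by their order in the detection output is an assumption.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2): upper-left and lower-right corners


@dataclass
class TemplateRow:
    frame: int        # image frame number
    object_id: int    # object serial number
    category: str     # object category
    box: Box          # object location
    score: float      # classification accuracy value


def record_detections(table: list[TemplateRow], frame: int,
                      detections: list[tuple[str, Box, float]]) -> list[TemplateRow]:
    """Append one row per detection of the given frame (assumed numbering: output order)."""
    for object_id, (category, box, score) in enumerate(detections, start=1):
        table.append(TemplateRow(frame, object_id, category, box, score))
    return table


# Placeholder coordinates; the scores follow Tables 1 and 2.
table: list[TemplateRow] = []
record_detections(table, 1, [("traffic sign", (10, 20, 50, 60), 0.87),
                             ("traffic sign", (200, 30, 240, 70), 0.88)])
record_detections(table, 2, [("traffic sign", (12, 21, 53, 62), 0.86),
                             ("traffic sign", (203, 31, 244, 72), 0.9)])
```

After the first call the table corresponds to Table 1, and after the second call to Table 2.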

S303: Use a preset single-target tracking algorithm to analyze the image to be detected, and obtain a tracking result of the image to be detected.

The single-target tracking algorithm is an algorithm that predicts the size and position of the target in subsequent frames under the condition of a given target size and position in the initial frame.

In an implementation manner, the preset single-target tracking algorithm may be any deep-learning-based single-target tracking algorithm in conventional technologies; for example, a correlation filter (CF) algorithm. Using a deep-learning-based single-target tracking algorithm, which analyzes images based on deep semantic features, makes it possible to effectively predict the location information of the matching results, thereby improving the accuracy of difficult case mining.

The given initial frame can be the first frame of the sequence image. The target of the initial frame may be a detection object obtained by performing target detection on the first frame of image. The information of the first frame of the sequence image can be obtained from the template information table saved by the device.

The tracking result may include information such as the position of one or more tracking objects (referred to as the tracking position in this application) and the tracking confidence of each tracking object. The tracking confidence reflects the reliability of each tracking result: the higher the tracking confidence, the more reliable the tracking result. When the tracking confidence is greater than a set second value, the single-target tracking algorithm determines that the given target has been correctly tracked.

For example, the output of the deep-learning-based single-target tracking algorithm is the last-layer feature map f of the network part of the algorithm, together with the position information of each tracking object, where f is a matrix of size f_w × f_h. The tracking confidence is the maximum response score in f, which can be expressed as max(f(i,j)).
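A minimal sketch of the tracking confidence max(f(i,j)), assuming the feature map f is available as a nested list of response scores. The function names and the example values are hypothetical.

```python
def tracking_confidence(response_map: list[list[float]]) -> float:
    """Return the maximum response score max(f(i, j)) of a 2-D response map f."""
    if not response_map or not response_map[0]:
        raise ValueError("response map must be a non-empty 2-D grid")
    return max(max(row) for row in response_map)


def is_tracked(response_map: list[list[float]], second_value: float) -> bool:
    """Tracking counts as correct when the confidence exceeds the set second value."""
    return tracking_confidence(response_map) > second_value


f = [[0.1, 0.3],
     [0.8, 0.2]]          # made-up response scores
confidence = tracking_confidence(f)
```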

S304: Obtain a discrimination result of each object in the image to be detected according to the detection result of the image to be detected, the tracking result and the preset rule.

As shown in Fig. 4, step S304 may include:

S3041, according to the detection result of the image to be detected, the tracking result and the preset association rule, obtain the association result of each object in the image to be detected.

From the detection result of the image to be detected, select the objects whose category is the same as that of the tracking result. With the tracked objects as rows and the detected objects as columns (or the detected objects as rows and the tracked objects as columns), calculate the intersection over union (IOU) value of each pair to build a matrix. The IOU is the ratio of the intersection to the union of the frame (bounding box) of the tracking object and the detection frame of the detection object.

The preset association rule is to apply an association matching algorithm (for example, the Hungarian algorithm) to the matrix to obtain the association result of each object in the image to be detected. The association result is one of: a successful match, a first association result, and a second association result. A successful match indicates that the detection object in the detection result and the tracking object in the tracking result are the same object, that is, the pairing is successful. The first association result indicates that, after the matching, no tracking object in the tracking result matches the detection object; that is, the object exists in the detection result but not in the tracking result. The second association result indicates that, after the matching, no detection object in the detection result matches the tracking object; that is, the object exists in the tracking result but not in the detection result. Further, the classification accuracy value and the IOU value of each successfully paired object may also be checked. If the classification accuracy value is less than a first threshold or the IOU value is less than a second threshold, the association result of the corresponding detection object is determined to be the first association result, and the association result of the corresponding tracking object is determined to be the second association result.
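The IOU matrix can be sketched as below, with tracked objects as rows and detected objects as columns. Boxes are assumed to be (x1, y1, x2, y2) corner coordinates, and setting the entry to 0 for pairs of different categories is an assumption made here to express the same-category selection; the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_matrix(tracked, detected):
    """Rows: tracked objects, columns: detected objects; each item is (category, box).

    Pairs of different categories get 0.0 so that they are never preferred
    (assumption used to model the same-category selection).
    """
    return [[iou(t_box, d_box) if t_cat == d_cat else 0.0
             for d_cat, d_box in detected]
            for t_cat, t_box in tracked]


tracked = [("traffic sign", (0, 0, 2, 2))]
detected = [("traffic sign", (1, 1, 3, 3)), ("vehicle", (0, 0, 2, 2))]
matrix = iou_matrix(tracked, detected)
```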

In this way, the association result of each object in the image to be detected is obtained.

Using the association matching algorithm to match the detection result and the tracking result effectively ensures that each detection object in the detection result is matched with at most one tracking object in the tracking result.
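The association step can be sketched as follows. To keep the example free of external dependencies, the optimal one-to-one assignment is found by exhaustive search over permutations rather than by the Hungarian algorithm named above; this gives the same maximum-total-IOU assignment but is only practical for a handful of objects. The function name `associate` and the threshold values in the example are hypothetical.

```python
from itertools import permutations


def associate(iou_rows, det_scores, score_threshold, iou_threshold):
    """Match tracked objects (rows) to detected objects (columns).

    Returns (matches, first_results, second_results):
      matches        - list of (track_index, detection_index) pairs
      first_results  - detections without a tracked partner
      second_results - tracked objects without a detected partner
    """
    n_trk, n_det = len(iou_rows), len(det_scores)
    # Enumerate every one-to-one pairing of the smaller side with the larger side.
    if n_trk <= n_det:
        candidates = (list(zip(range(n_trk), cols))
                      for cols in permutations(range(n_det), n_trk))
    else:
        candidates = (list(zip(rows, range(n_det)))
                      for rows in permutations(range(n_trk), n_det))
    best = max(candidates, key=lambda pairs: sum(iou_rows[i][j] for i, j in pairs))
    # A pair survives only if both the classification score and the IOU are high enough.
    matches = [(i, j) for i, j in best
               if det_scores[j] >= score_threshold and iou_rows[i][j] >= iou_threshold]
    matched_trk = {i for i, _ in matches}
    matched_det = {j for _, j in matches}
    first_results = [j for j in range(n_det) if j not in matched_det]
    second_results = [i for i in range(n_trk) if i not in matched_trk]
    return matches, first_results, second_results


result = associate([[0.9, 0.1], [0.2, 0.05]], [0.9, 0.8],
                   score_threshold=0.5, iou_threshold=0.3)
```

For larger matrices, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` could replace the exhaustive search.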

S3042, according to the discrimination rule and the association result of each object in the image to be detected, obtain a discrimination result of each object in the image to be detected.

The discrimination rules include:

1. According to the preliminary discrimination rules and the association result of each object in the image to be detected, a preliminary discrimination is made for each object in the image to be detected, and the preliminary discrimination result is obtained.

Among them, the preliminary discrimination results include: successful match, missed detection, false detection, possible missed detection, possible false detection, and end.

The preliminary discrimination rules include:

1. For an object whose association result is a successful match, the preliminary discrimination result is a successful match.

2. For an object whose association result is the first association result, the preliminary discrimination result is a possible false detection.

3. For an object whose association result is the second association result and whose tracking confidence is greater than or equal to the first confidence threshold, the preliminary discrimination result is a missed detection.

4. For an object whose association result is the second association result and whose tracking confidence is less than or equal to the second confidence threshold, the preliminary discrimination result is end. The second confidence threshold is less than the first confidence threshold.

5. For an object whose association result is the second association result and whose tracking confidence is less than the first confidence threshold and greater than the second confidence threshold, the preliminary discrimination result is a possible missed detection.

2. Determine the discrimination result of each object in the image to be detected according to the preliminary discrimination result of each object in the image to be detected.

Among them, the discrimination results include: successful match, missed detection, false detection, new appearance, and end.

1. For an object whose preliminary discrimination result is a successful match, the discrimination result is determined to be a successful match.

2. For an object whose preliminary discrimination result is a missed detection, the discrimination result is determined to be a missed detection.

3. For an object whose preliminary discrimination result is end, the discrimination result is determined to be end.

4. For an object whose preliminary discrimination result is a possible false detection or a possible missed detection, the discrimination result is determined in combination with the discrimination results of the S frames of images adjacent to the image to be detected. The S frames of images adjacent to the image to be detected are either the S frames after the image to be detected (that is, frames whose frame numbers are greater than that of the image to be detected) or the S frames before it (that is, frames whose frame numbers are smaller than that of the image to be detected). For example, the image to be detected is the t-th frame image. If the number of remaining undetected frames is greater than or equal to the preset number of frames S (i.e., t ≤ N − S, where N is the total number of frames of the sequence images and S > 1), the adjacent S frames are the S frames after the image to be detected, and the discrimination order is from front to back. If the number of remaining undetected frames is less than the preset number of frames S (i.e., t > N − S), the adjacent S frames are the S frames before the image to be detected, and the discrimination order is from back to front.

①. Objects whose preliminary discrimination result is a possible missed detection

Proceeding in order from the 1st frame to the S-th frame adjacent to the image to be detected: if, in at least one of these frames, the association result of the corresponding object is the second association result and its tracking confidence is less than or equal to the second confidence threshold, the discrimination result is determined to be end; if, in every one of these frames, the association result of the corresponding object is the second association result and its tracking confidence is less than the first confidence threshold and greater than the second confidence threshold, the discrimination result is also determined to be end; in cases other than these two, the discrimination result is determined to be a missed detection.

②. Objects whose preliminary discrimination result is a possible false detection

If, in every frame from the 1st frame to the S-th frame adjacent to the image to be detected, the association result of the corresponding object is a successful match and its tracking confidence is greater than the first confidence threshold, the discrimination result is determined to be a new appearance; otherwise, the discrimination result is determined to be a false detection.
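The preliminary and final discrimination rules above can be sketched as plain functions. All function names, the string labels, and the threshold values used in the example are hypothetical; the association results are encoded as "matched", "first", and "second", and each neighbouring frame contributes one (association, tracking confidence) observation for the object in question.

```python
def preliminary_result(association, tracking_conf=None, *, first_conf, second_conf):
    """Preliminary discrimination from an association result ('matched', 'first', 'second')."""
    if association == "matched":
        return "matched"
    if association == "first":                 # detection without a tracked partner
        return "possible_false_detection"
    if tracking_conf >= first_conf:            # second association result from here on
        return "missed_detection"
    if tracking_conf <= second_conf:
        return "ended"
    return "possible_missed_detection"


def neighbor_frames(t, total_frames, window):
    """S frames after frame t when t <= N-S, otherwise the S frames before it (nearest first)."""
    if t <= total_frames - window:
        return list(range(t + 1, t + window + 1))
    return [k for k in range(t - 1, t - window - 1, -1) if k >= 1]


def resolve_possible_missed(observations, *, first_conf, second_conf):
    """observations: (association, tracking_conf) of the object in each neighbouring frame."""
    if not observations:
        raise ValueError("at least one neighbouring frame is required")
    if any(a == "second" and c <= second_conf for a, c in observations):
        return "ended"
    if all(a == "second" and second_conf < c < first_conf for a, c in observations):
        return "ended"
    return "missed_detection"


def resolve_possible_false(observations, *, first_conf):
    """New appearance only if matched with high tracking confidence in every neighbouring frame."""
    if not observations:
        raise ValueError("at least one neighbouring frame is required")
    if all(a == "matched" and c > first_conf for a, c in observations):
        return "new_appearance"
    return "false_detection"


# Example thresholds (made up): first confidence threshold 0.7, second 0.3.
label = preliminary_result("second", 0.5, first_conf=0.7, second_conf=0.3)
final = resolve_possible_missed([("second", 0.5), ("second", 0.2)],
                                first_conf=0.7, second_conf=0.3)
```

In the example, a tracked object with confidence 0.5 is first labelled a possible missed detection and is resolved to end because its confidence drops to 0.2 in the second neighbouring frame.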

In an implementation manner, the device may maintain a temporary template information table to store information used to discriminate objects that may be missed and objects that may be falsely detected. For example, the temporary template information table may include: image frame number, object serial number, image detection result (for example, object category, object location, classification accuracy value), missed detection flag, false detection flag, possible missed detection flag, possible false detection flag, matching success flag, etc.

After the image to be detected and each of the 1st to S-th frames adjacent to it are discriminated, the temporary template information table is updated. The update rules are:

For an object whose preliminary discrimination result is a successful match, use the detection result of the object to update the corresponding location information in the temporary template information table, and set the matching success flag;

For an object whose preliminary discrimination result is a missed detection, use the tracking result of the object to update the corresponding location information in the temporary template information table, and set the missed detection flag;

For an object whose preliminary discrimination result is a false detection, set the false detection flag;

For an object whose preliminary discrimination result is a new appearance, add the object information to the temporary template information table and set the new appearance flag;

For an object whose preliminary discrimination result is end, set the end flag;

For an object whose preliminary discrimination result is a possible missed detection, use the tracking result of the object to update the corresponding location information in the temporary template information table, and set the possible missed detection flag;

For an object whose preliminary discrimination result is a possible false detection, use the detection result of the object to update the corresponding location information in the temporary template information table, and set the possible false detection flag.

In this way, the discrimination result of each object in the image to be detected can be determined according to the temporary template information table.
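The update rules for the temporary template information table can be sketched with a dictionary keyed by object serial number. The layout (one "box" and one "flag" per object), the string labels, and the function name are assumptions made for illustration; the application does not prescribe a storage format.

```python
# Which result supplies the stored position for each preliminary discrimination result.
POSITION_SOURCE = {
    "matched": "detection",
    "missed_detection": "tracking",
    "new_appearance": "detection",
    "possible_missed_detection": "tracking",
    "possible_false_detection": "detection",
}


def update_temporary_table(table, object_id, result, det_box=None, trk_box=None):
    """Apply one update rule; table maps object_id -> {"box": ..., "flag": ...}."""
    entry = table.setdefault(object_id, {"box": None, "flag": None})
    source = POSITION_SOURCE.get(result)
    if source == "detection":
        entry["box"] = det_box
    elif source == "tracking":
        entry["box"] = trk_box
    # "false_detection" and "ended" leave the position untouched and only set the flag.
    entry["flag"] = result
    return table


temp_table = {}
update_temporary_table(temp_table, 6, "possible_missed_detection", trk_box=(1, 2, 3, 4))
update_temporary_table(temp_table, 7, "possible_false_detection", det_box=(5, 6, 7, 8))
update_temporary_table(temp_table, 6, "ended")
```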

Exemplarily, the image to be detected is the t-th frame image. After the t-th frame image is judged, the temporary template information table is updated. The temporary template information table includes the information shown in Table 3.

Table 3

According to the temporary template information table, it can be determined that, in the t-th frame image, the discrimination result of object 1 is a successful match, that of object 2 is a missed detection, that of object 3 is a false detection, that of object 4 is a new appearance, and that of object 5 is end; object 6 is a possible missed detection, and object 7 is a possible false detection. The discrimination results of object 6 and object 7 can be determined in combination with the S (for example, S = 2) frames of images adjacent to the t-th frame image.

Further, taking the order from front to back as an example, the image to be detected is the (t+1)th frame image. After the (t+1)th frame image is judged, the temporary template information table is updated. The temporary template information table includes the information shown in Table 4.

Table 4

In an implementation manner, after the (t+1)th frame image is discriminated, the information about successful matches, missed detections, false detections, and ends in the t-th frame may be deleted. It should be noted that Table 4 may also include information about other objects in the (t+1)th frame image besides object 6 and object 7; this content is omitted in this application.

After the (t+1)-th frame image is discriminated, it can be determined that object 7 in the t-th frame image is a false detection.

Further, taking the order from front to back as an example, the image to be detected is the (t+2)th frame image. After the (t+2)th frame image is judged, the temporary template information table is updated. The temporary template information table includes the information shown in Table 5.

Table 5

It should be noted that Table 5 may also include information about other objects in the (t+2)th frame image besides object 6 and object 7; this content is omitted in this application.

After the (t+2)-th frame image is discriminated, it can be determined that the object 6 in the t-th frame image is the end.

In this way, the discrimination result of each object in the t-th frame image is successfully determined.

S305. Obtain difficult case data according to the discrimination result of each object in the image to be detected.

An object in the image to be detected whose discrimination result is a missed detection is determined to be a missed-detection difficult case, and an object whose discrimination result is a false detection is determined to be a false-detection difficult case; in this way, the difficult case data is obtained. Optionally, the missed-detection difficult cases and/or false-detection difficult cases can be added to a difficult case data set.
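A minimal sketch of step S305, assuming the discrimination results of one frame are available as a mapping from object serial number to a result label. The function name and the labels are hypothetical; the example reuses the results of objects 1 to 5 described for the t-th frame, with a made-up frame number.

```python
def collect_difficult_cases(frame, results):
    """Split one frame's discrimination results into missed- and false-detection cases.

    results: dict mapping object serial number -> discrimination result label.
    Returns two lists of (frame, object_id) pairs.
    """
    missed = [(frame, oid) for oid, r in results.items() if r == "missed_detection"]
    false = [(frame, oid) for oid, r in results.items() if r == "false_detection"]
    return missed, false


results_t = {1: "matched", 2: "missed_detection", 3: "false_detection",
             4: "new_appearance", 5: "ended"}
missed_cases, false_cases = collect_difficult_cases(10, results_t)  # frame number 10 is made up
```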

In an implementation manner, the template information table is updated according to the discrimination result of each object in the image to be detected. In this way, the effective use of the detected results can be realized.

The template information table update rules are:

For an object whose discrimination result is a successful match, use the detection result of the object to update the corresponding position information in the template information table;

For an object whose discrimination result is a missed detection, use the tracking result of the object to update the corresponding position information in the template information table;

For an object whose discrimination result is a false detection, do not update the information in the template information table;

For an object whose discrimination result is a new appearance, add the detection result information of the object to the template information table;

For an object whose discrimination result is end, remove the information of the object from the template information table.
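The template information table update rules can be sketched as follows. For brevity the table is simplified to a dictionary that maps an object serial number to its position only; this layout, the labels, and the function name are assumptions rather than the format used in the application.

```python
def update_template_table(table, object_id, result, det_box=None, trk_box=None):
    """Apply the update rule for one object; table maps object_id -> position."""
    if result == "matched":
        table[object_id] = det_box        # refresh the position from the detection result
    elif result == "missed_detection":
        table[object_id] = trk_box        # the detector missed it, so trust the tracker
    elif result == "new_appearance":
        table[object_id] = det_box        # add the newly appearing object
    elif result == "ended":
        table.pop(object_id, None)        # the object has left the sequence
    # "false_detection": the table is deliberately left unchanged
    return table


template = {1: (0, 0, 2, 2), 2: (5, 5, 8, 8), 5: (9, 9, 12, 12)}
update_template_table(template, 1, "matched", det_box=(1, 1, 3, 3))
update_template_table(template, 2, "missed_detection", trk_box=(6, 5, 9, 8))
update_template_table(template, 3, "false_detection", det_box=(4, 4, 5, 5))
update_template_table(template, 4, "new_appearance", det_box=(7, 1, 8, 2))
update_template_table(template, 5, "ended")
```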

The method for mining difficult cases in target detection provided by the embodiments of the present application combines a single-target tracking algorithm and a target detection algorithm to mine difficult cases in target detection, and at the same time distinguishes missed-detection difficult cases from false-detection difficult cases, so that both kinds of difficult cases can be extracted more accurately and difficult case mining is realized more effectively. A deep-learning-based single-target tracking algorithm is adopted to analyze the image based on deep semantic features and effectively predict the location information of the matching result; and an association matching algorithm is used to match the detection result and the tracking result, effectively ensuring that each detection object in the detection result corresponds to at most one tracking object in the tracking result. The accuracy of difficult case mining is thereby improved.

The foregoing mainly introduces the solutions provided by the embodiments of the present application. It can be understood that, in order to realize the above-mentioned functions, the above-mentioned device includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should understand that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

The embodiments of the present application can divide the apparatus for mining difficult cases in target detection into functional modules according to the above method examples. For example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module. The above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiments of the present application is illustrative and is only a logical function division; there may be other division methods in actual implementation. The following takes the division of functional modules corresponding to each function as an example.

FIG. 5 is a schematic diagram of the logical structure of a device 500 provided by an embodiment of the present application. The device 500 may be a device for mining difficult cases in target detection, and can implement the method for mining difficult cases in target detection provided by an embodiment of the present application. The apparatus 500 may be a hardware structure, a software module, or a hardware structure plus a software module. As shown in FIG. 5, the device 500 includes an initialization module 501, a target detection module 502, an information update module 503, a single target tracking module 504, and a difficult case mining module 505.

Please refer to FIG. 6. In difficult case mining for target detection, the initialization module 501 is used to initialize various information, such as the frame number of the image to be detected; for example, the frame number is initialized to 1. The initialization module 501 can also be used to initialize the temporary template information table and the template information table to be empty. The target detection module 502 is configured to use a preset target detection algorithm to analyze the image to be detected and obtain the detection result of the image to be detected. The information update module 503 is used to update the temporary template information table and the template information table; for example, the detection result of the image to be detected is written to the temporary template information table and the template information table. The single target tracking module 504 is configured to use a preset single-target tracking algorithm to analyze the image to be detected and obtain the tracking result of the image to be detected. The difficult case mining module 505 obtains the discrimination result of each object in the image to be detected according to the detection result output by the target detection module 502, the tracking result output by the single target tracking module 504, and the preset association rules and discrimination rules. The missed-detection difficult cases and false-detection difficult cases output by the difficult case mining module 505 are added to the difficult case data set. After the difficult case mining module 505 obtains the discrimination result of each object in the image to be detected, the information update module 503 updates the template information table. The information update module 503 is also used to update the temporary template information table when the difficult case mining module 505 discriminates an object whose preliminary discrimination result is a possible false detection or a possible missed detection.
After obtaining the discrimination result of each object in the current image to be detected, the initialization module 501 adds 1 to the frame number of the current image to be detected to obtain the next image to be detected.

FIG. 7 is a schematic diagram of the logical structure of an apparatus 700 provided by an embodiment of the present application. The apparatus 700 may be a device for mining difficult cases in target detection, and can implement the method for mining difficult cases in target detection provided in the embodiments of the present application. The apparatus 700 may be a hardware structure, a software module, or a hardware structure plus a software module. As shown in FIG. 7, the apparatus 700 includes an image acquisition unit 701, a target detection unit 702, a target tracking unit 703, and a difficult case mining unit 704. Wherein, the image acquisition unit 701 may be used to perform S301 in FIG. 3, and/or perform other steps described in this application. The target detection unit 702 may be used to perform S302 in FIG. 3, and/or perform other steps described in this application. The target tracking unit 703 may be used to perform S303 in FIG. 3, and/or perform other steps described in this application. The hard case mining unit 704 may be used to perform S304 and S305 in FIG. 3, and/or perform other steps described in this application.

Among them, all relevant content of each step involved in the above method embodiment can be cited in the functional description of the corresponding functional unit, which will not be repeated here.

Those of ordinary skill in the art will know that all or part of the steps in the above method can be completed by a program instructing relevant hardware, and the program can be stored in a computer-readable storage medium such as a ROM, a RAM, or an optical disk.

The embodiment of the present application also provides a storage medium, and the storage medium may include a memory.

For explanations and beneficial effects of related content in any of the above-provided devices, reference may be made to the corresponding method embodiments provided above, which will not be repeated here.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, network equipment, user equipment, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Although this application has been described in conjunction with various embodiments, those skilled in the art, in the process of implementing the claimed application, can understand and implement other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "one" does not exclude a plurality. A single processor or other unit may implement several functions listed in the claims. Certain measures are described in mutually different dependent claims, but this does not mean that these measures cannot be combined to produce good results.

Although the application has been described in combination with specific features and embodiments, it is obvious that various modifications and combinations can be made without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely exemplary descriptions of the application as defined by the appended claims, and are deemed to cover any and all modifications, changes, combinations or equivalents within the scope of the application. Obviously, those skilled in the art can make various changes and modifications to the application without departing from the spirit and scope of the application. In this way, if these modifications and variations of this application fall within the scope of the claims of this application and their equivalent technologies, then this application is also intended to include these modifications and variations.

Claims

A method for mining difficult cases in target detection, which is characterized in that it includes:

Obtain the image to be detected;

Use a preset target detection algorithm to analyze the image to be detected to obtain a detection result of the image to be detected; the detection result includes: the categories of one or more detection objects, the detection positions of the one or more detection objects, and the classification accuracy values of the one or more detection objects;

Use a preset single-target tracking algorithm to analyze the image to be detected to obtain the tracking result of the image to be detected; the tracking result includes: the tracking positions of one or more tracking objects, and the tracking confidences of the one or more tracking objects;

According to the detection result, the tracking result and the preset rules, obtain the discrimination result of each object in the image to be detected; the discrimination result includes: successful matching, missed detection, false detection, new appearance and end;

The object whose discrimination result is a missed detection is determined as a missed-detection difficult case, and the object whose discrimination result is a false detection is determined as a false-detection difficult case.
The method for mining difficult cases in target detection according to claim 1, characterized in that it comprises:

According to the detection result, the tracking result and the preset association rules, obtain the association result of the first object in the image to be detected; the association result includes: a successful match, a first association result, and a second association result;

The first association result is that the first object exists in the detection result, and the first object does not exist in the tracking result;

The second association result is that the first object does not exist in the detection result, and the first object exists in the tracking result;

According to the preliminary discrimination rule and the association result of the first object in the image to be detected, the first object in the image to be detected is preliminarily discriminated, and the preliminary discrimination result of the first object in the image to be detected is obtained; The preliminary judgment results include: successful matching, missed detection, false detection, possible missed detection, possible misdetection, end;

The judgment result of the first object in the image to be detected is determined according to the preliminary judgment result of the first object in the image to be detected.
The method for mining difficult cases in target detection according to claim 2, characterized in that obtaining the association result of the first object in the image to be detected according to the detection result, the tracking result, and the preset association rules includes:

Selecting, from the detection result, objects of the same category as objects in the tracking result, and constructing a matrix with the tracking objects as rows and the detection objects as columns, or with the detection objects as rows and the tracking objects as columns;

An association matching algorithm is used to obtain an association result of the first object in the image to be detected by using the matrix.
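
By way of illustration only, the matrix construction and association step can be sketched in Python as follows. The claim does not name the matrix entries or the matching algorithm, so the choices here are assumptions: entries are intersection-over-union (IoU) values between boxes given as `(x1, y1, x2, y2)`, pairs of different categories get an entry of 0 instead of being filtered out beforehand, and a simple greedy matching with a hypothetical `min_iou` threshold stands in for the association matching algorithm.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.5):
    """Return (matches, detection_only, tracking_only) as index lists."""
    # Matrix with tracking objects as rows and detection objects as columns.
    matrix = [
        [iou(t["box"], d["box"]) if t["category"] == d["category"] else 0.0
         for d in detections]
        for t in tracks
    ]
    # Greedy matching: take the best-scoring free pair first.
    pairs = sorted(
        ((matrix[i][j], i, j)
         for i in range(len(tracks)) for j in range(len(detections))),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, i, j in pairs:
        if score < min_iou:
            break
        if i in used_tracks or j in used_dets:
            continue
        used_tracks.add(i)
        used_dets.add(j)
        matches.append((i, j))  # successful match
    # First association result: present in detection, absent from tracking.
    detection_only = [j for j in range(len(detections)) if j not in used_dets]
    # Second association result: absent from detection, present in tracking.
    tracking_only = [i for i in range(len(tracks)) if i not in used_tracks]
    return matches, detection_only, tracking_only
```

An optimal assignment method could replace the greedy loop without changing the three outputs.
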
The method for mining difficult cases in target detection according to claim 2, wherein the preliminary discrimination rule comprises:

For the first object whose association result is a successful match, the preliminary judgment result is a successful match;

For the first object whose association result is the first association result, the preliminary judgment result is a possible misdetection;

For the first object whose association result is the second association result, and the tracking confidence of the tracking object is greater than or equal to the first confidence threshold, the preliminary judgment result is a missed detection;

For the first object whose association result is the second association result, and the tracking confidence of the tracking object is less than or equal to the second confidence threshold, the preliminary judgment result is end; wherein, the second confidence threshold is less than the first confidence threshold;

For the first object whose association result is the second association result, and the tracking confidence of the tracking object is less than the first confidence threshold and greater than the second confidence threshold, the preliminary judgment result is a possible missed detection.
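
The preliminary discrimination rule above amounts to a small decision function. The following Python sketch is illustrative only; the function name, the string labels `"first"` and `"second"` for the two association results, and the threshold values 0.8 and 0.3 are assumptions, since the claim only requires that the second confidence threshold be less than the first.

```python
def preliminary_result(association, tracking_confidence=None,
                       first_threshold=0.8, second_threshold=0.3):
    """Map an association result to a preliminary discrimination result."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be less than first threshold")
    if association == "successful match":
        return "successful match"
    if association == "first":      # detected but not tracked
        return "possible misdetection"
    if association == "second":     # tracked but not detected
        if tracking_confidence >= first_threshold:
            return "missed detection"
        if tracking_confidence <= second_threshold:
            return "end"
        return "possible missed detection"
    raise ValueError(f"unknown association result: {association!r}")
```

For example, a tracked-only object with confidence 0.5 falls between the two thresholds and is deferred as a possible missed detection.
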
The method for mining difficult cases in target detection according to claim 2, wherein determining the discrimination result of the first object in the image to be detected according to the preliminary discrimination result of the first object in the image to be detected includes:

For the first object whose preliminary discrimination result is a successful match, determine that the discrimination result is a successful match;

For the first object whose preliminary discrimination result is a missed detection, determine that the discrimination result is a missed detection;

For the first object whose preliminary discrimination result is end, determine that the discrimination result is end;

For the first object whose preliminary discrimination result is a possible misdetection or a possible missed detection, determine the discrimination result in combination with the discrimination results of S frames of images adjacent to the image to be detected; where S>1.
The method for mining difficult cases in target detection according to claim 5, characterized in that, for the first object whose preliminary discrimination result is a possible misdetection or a possible missed detection, determining the discrimination result in combination with the discrimination results of the S frames of images adjacent to the image to be detected includes:

For the first object whose preliminary discrimination result is a possible missed detection, judging in order from the first frame image to the S-th frame image adjacent to the image to be detected:

If, in at least one frame of image from the first frame image to the S-th frame image adjacent to the image to be detected, the association result of the object corresponding to that first object is the second association result, and the tracking confidence of the object corresponding to that first object is less than or equal to the second confidence threshold, determine that the discrimination result of that first object is end;

If, in each frame of image from the first frame image to the S-th frame image adjacent to the image to be detected, the association result of the object corresponding to that first object is the second association result, and the tracking confidence of the object corresponding to that first object is less than the first confidence threshold and greater than the second confidence threshold, determine that the discrimination result of that first object is end;

Otherwise, determine that the discrimination result of that first object is a missed detection.
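
As a hedged illustration of how a possible missed detection is resolved over the S adjacent frames, the sketch below takes one `(association result, tracking confidence)` pair per adjacent frame, in order. The function name, the `"second"` label, and the threshold values are assumptions made for this example.

```python
def resolve_possible_missed(adjacent, first_threshold=0.8, second_threshold=0.3):
    """Resolve a possible missed detection from S adjacent-frame observations.

    `adjacent` holds (association_result, tracking_confidence) for the
    corresponding object in frames 1..S adjacent to the image to be detected.
    """
    if not adjacent:
        raise ValueError("at least one adjacent frame is required")
    # Tracking-only with very low confidence in any frame: the object has ended.
    if any(a == "second" and c <= second_threshold for a, c in adjacent):
        return "end"
    # Tracking-only with middling confidence in every frame: also treated as end.
    if all(a == "second" and second_threshold < c < first_threshold
           for a, c in adjacent):
        return "end"
    return "missed detection"
```

Because `any` scans the frames in order and stops at the first hit, the low-confidence case is decided as soon as it is seen.
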
The method for mining difficult cases in target detection according to claim 5, characterized in that, for the first object whose preliminary discrimination result is a possible misdetection or a possible missed detection, determining the discrimination result in combination with the discrimination results of the S frames of images adjacent to the image to be detected includes:

For the first object whose preliminary discrimination result is a possible misdetection:

If, in each frame of image from the first frame image to the S-th frame image adjacent to the image to be detected, the association result of the object corresponding to that first object is a successful match, and the tracking confidence of the object corresponding to that first object is greater than the first confidence threshold, determine that the discrimination result of that first object is new appearance;

Otherwise, determine that the discrimination result of that first object is a false detection.
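
The rule for a possible misdetection can be sketched in the same illustrative style. The function name and the threshold value 0.8 are assumptions; the input is again one `(association result, tracking confidence)` pair per adjacent frame.

```python
def resolve_possible_misdetection(adjacent, first_threshold=0.8):
    """Resolve a possible misdetection from S adjacent-frame observations."""
    if not adjacent:
        raise ValueError("at least one adjacent frame is required")
    # The object counts as newly appeared only if every adjacent frame
    # matches it successfully with confidence above the first threshold.
    confirmed = all(a == "successful match" and c > first_threshold
                    for a, c in adjacent)
    return "new appearance" if confirmed else "false detection"
```
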
The method for mining difficult cases in target detection according to claim 5, characterized in that,

If the number of remaining undetected images to be detected is greater than or equal to the preset frame number S, the S frames of images adjacent to the image to be detected are S frames of images whose frame numbers are greater than the frame number of the image to be detected;

If the number of remaining undetected images to be detected is less than the preset frame number S, the S frames of images adjacent to the image to be detected are S frames of images whose frame numbers are smaller than the frame number of the image to be detected.
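
A minimal sketch of this frame selection follows. It assumes frames are numbered from 1, that the remaining undetected images are exactly those with larger frame numbers, and that the backward window is clipped at frame 1; the function name and parameters are hypothetical.

```python
def adjacent_frames(frame_no, total_frames, s):
    """Return the frame numbers of the S frames adjacent to `frame_no`."""
    remaining = total_frames - frame_no   # frames not yet detected
    if remaining >= s:
        # Enough frames ahead: use the S following frames.
        return list(range(frame_no + 1, frame_no + s + 1))
    # Too few frames ahead: fall back to the S preceding frames,
    # clipped so that no frame number drops below 1 (assumption).
    return list(range(max(1, frame_no - s), frame_no))
```
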
The method for mining difficult cases in target detection according to any one of claims 1-8, wherein:

The preset single target tracking algorithm is a single target tracking algorithm based on deep learning.
A device for mining difficult cases in target detection, which is characterized in that it comprises:

The image acquisition unit is used to acquire the image to be detected;

The target detection unit is configured to analyze the image to be detected using a preset target detection algorithm to obtain a detection result of the image to be detected; the detection result includes: the categories of one or more detection objects, the detection positions of the one or more detection objects, and the classification accuracy values of the one or more detection objects;

The target tracking unit is configured to analyze the image to be detected using a preset single-target tracking algorithm to obtain the tracking result of the image to be detected; the tracking result includes: the tracking positions of one or more tracking objects, and the tracking confidences of the one or more tracking objects;

The difficult case mining unit is used to obtain the discrimination result of each object in the image to be detected according to the detection result, the tracking result and the preset rules; the discrimination result includes: successful matching, missed detection, false detection, new appearance and end;

The difficult case mining unit is also used to determine the object whose discrimination result is a missed detection as a missed-detection difficult case, and determine the object whose discrimination result is a false detection as a false-detection difficult case.
The device for mining difficult cases in target detection according to claim 10, wherein the difficult case mining unit is specifically configured to:

According to the detection result, the tracking result and the preset association rules, obtain the association result of the first object in the image to be detected; the association result includes: a successful match, a first association result, and a second association result;

The first association result is that the first object exists in the detection result, and the first object does not exist in the tracking result;

The second association result is that the first object does not exist in the detection result, and the first object exists in the tracking result;

According to the preliminary discrimination rule and the association result of the first object in the image to be detected, the first object in the image to be detected is preliminarily discriminated, and the preliminary discrimination result of the first object in the image to be detected is obtained; The preliminary judgment results include: successful matching, missed detection, false detection, possible missed detection, possible misdetection, end;

The judgment result of the first object in the image to be detected is determined according to the preliminary judgment result of the first object in the image to be detected.
The device for mining difficult cases in target detection according to claim 11, wherein the difficult case mining unit obtaining the association result of the first object in the image to be detected according to the detection result, the tracking result, and the preset association rules specifically includes:

Selecting, from the detection result, objects of the same category as objects in the tracking result, and constructing a matrix with the tracking objects as rows and the detection objects as columns, or with the detection objects as rows and the tracking objects as columns;

An association matching algorithm is used to obtain an association result of the first object in the image to be detected by using the matrix.
The device for mining difficult cases in target detection according to claim 11, wherein the preliminary discrimination rule comprises:

For the first object whose association result is a successful match, the preliminary judgment result is a successful match;

For the first object whose association result is the first association result, the preliminary judgment result is a possible misdetection;

For the first object whose association result is the second association result, and the tracking confidence of the tracking object is greater than or equal to the first confidence threshold, the preliminary judgment result is a missed detection;

For the first object whose association result is the second association result, and the tracking confidence of the tracking object is less than or equal to the second confidence threshold, the preliminary judgment result is end; wherein, the second confidence threshold is less than the first confidence threshold;

For the first object whose association result is the second association result, and the tracking confidence of the tracking object is less than the first confidence threshold and greater than the second confidence threshold, the preliminary judgment result is a possible missed detection.
The device for mining difficult cases in target detection according to claim 11, wherein the difficult case mining unit determining the discrimination result of the first object in the image to be detected according to the preliminary discrimination result of the first object in the image to be detected specifically includes:

For the first object whose preliminary discrimination result is a successful match, determine that the discrimination result is a successful match;

For the first object whose preliminary discrimination result is a missed detection, determine that the discrimination result is a missed detection;

For the first object whose preliminary discrimination result is end, determine that the discrimination result is end;

For the first object whose preliminary discrimination result is a possible misdetection or a possible missed detection, determine the discrimination result in combination with the discrimination results of S frames of images adjacent to the image to be detected; where S>1.
The device for mining difficult cases in target detection according to claim 14, wherein the difficult case mining unit, for the first object whose preliminary discrimination result is a possible misdetection or a possible missed detection, determining the discrimination result in combination with the discrimination results of the S frames of images adjacent to the image to be detected specifically includes:

For the first object whose preliminary discrimination result is a possible missed detection, judging in order from the first frame image to the S-th frame image adjacent to the image to be detected:

If, in at least one frame of image from the first frame image to the S-th frame image adjacent to the image to be detected, the association result of the object corresponding to that first object is the second association result, and the tracking confidence of the object corresponding to that first object is less than or equal to the second confidence threshold, determine that the discrimination result of that first object is end;

If, in each frame of image from the first frame image to the S-th frame image adjacent to the image to be detected, the association result of the object corresponding to that first object is the second association result, and the tracking confidence of the object corresponding to that first object is less than the first confidence threshold and greater than the second confidence threshold, determine that the discrimination result of that first object is end;

Otherwise, determine that the discrimination result of that first object is a missed detection.
The device for mining difficult cases in target detection according to claim 14, wherein the difficult case mining unit, for the first object whose preliminary discrimination result is a possible misdetection or a possible missed detection, determining the discrimination result in combination with the discrimination results of the S frames of images adjacent to the image to be detected specifically includes:

For the first object whose preliminary discrimination result is a possible misdetection:

If, in each frame of image from the first frame image to the S-th frame image adjacent to the image to be detected, the association result of the object corresponding to that first object is a successful match, and the tracking confidence of the object corresponding to that first object is greater than the first confidence threshold, determine that the discrimination result of that first object is new appearance;

Otherwise, determine that the discrimination result of that first object is a false detection.
The device for mining difficult cases in target detection according to claim 14, characterized in that:

If the number of remaining undetected images to be detected is greater than or equal to the preset frame number S, the S frames of images adjacent to the image to be detected are S frames of images whose frame numbers are greater than the frame number of the image to be detected;

If the number of remaining undetected images to be detected is less than the preset frame number S, the S frames of images adjacent to the image to be detected are S frames of images whose frame numbers are smaller than the frame number of the image to be detected.
The device for mining difficult cases in target detection according to any one of claims 10-17, wherein:

The preset single target tracking algorithm is a single target tracking algorithm based on deep learning.
A device, characterized in that the device comprises: a processor and a memory; the memory is coupled with the processor; the memory is used to store computer program code; the computer program code includes computer instructions; when the processor executes the computer instructions, the device is caused to execute the method for mining difficult cases in target detection according to any one of claims 1-9.
A computer-readable storage medium, characterized in that the computer-readable storage medium includes computer instructions; when the computer instructions run on a device, the device is caused to execute the method for mining difficult cases in target detection according to any one of claims 1-9.
A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to execute the method for mining difficult cases in target detection according to any one of claims 1-9.