CN112639872B - Method and device for difficult case mining in target detection


Info

Publication number: CN112639872B (application number CN202080004676.1A)
Authority: CN (China)
Original language: Chinese (zh); other version: CN112639872A
Prior art keywords: result, detection, image, detected, tracking
Inventors: 晋周南, 孙叠, 刘新春
Original and current assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Legal status: Active (granted)


Classifications

    • G06T 7/0002 - Inspection of images, e.g. flaw detection (G Physics; G06 Computing, calculating or counting; G06T Image data processing or generation, in general; G06T 7/00 Image analysis)
    • G06V 10/751 - Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching (G06V Image or video recognition or understanding; G06V 10/70 using pattern recognition or machine learning; G06V 10/74 Image or video pattern matching; G06V 10/75 Organisation of the matching processes)
    • G06V 20/00 - Scenes; scene-specific elements
    • G06V 2201/07 - Target detection (G06V 2201/00 Indexing scheme relating to image or video recognition or understanding)


Abstract

An embodiment of the application discloses a method and an apparatus for difficult case mining in target detection, relating to the fields of artificial intelligence and intelligent vehicles. The method realizes difficult case mining in target detection more effectively and extracts difficult case data more accurately. The method may include the following steps: analyzing an image to be detected by using a preset target detection algorithm to obtain a detection result of the image; analyzing the image by using a preset single-target tracking algorithm to obtain a tracking result of the image; obtaining a judgment result for each object in the image according to the detection result, the tracking result, and a preset rule, the judgment result being one of: matched successfully, missed detection, false detection, newly appeared, or ended; and determining each object judged as a missed detection to be a missed-detection difficult case, and each object judged as a false detection to be a false-detection difficult case.

Description

Method and device for difficult case mining in target detection
Technical Field
The application relates to the field of artificial intelligence, and in particular to a method and a device for difficult case mining in target detection.
Background
Autonomous driving, smart healthcare, smart cities, and similar applications are currently receiving wide attention, and all of them need to analyze images and extract information using data-driven deep learning target detection algorithms.
A data-driven deep learning target detection algorithm obtains information such as the position and category of objects such as vehicles or people by analyzing an acquired image. A target detection algorithm generally includes two phases: training and inference. The training phase is the process in which the algorithm learns from data; the inference phase uses the model output by the training phase to analyze the information contained in an image. The analysis result produced in the inference phase may be correct or incorrect. An incorrect result falls into two cases, missed detection and false detection: an object that is missed is a missed-detection case, and an object that is falsely detected is a false-detection case. Illustratively, (a), (b), and (c) of fig. 1 show three detection results of detecting traffic signs in one frame of a road image. The road image includes a traffic sign 101, a traffic sign 102, and a vehicle 103, and a vehicle sticker 104 is attached to the vehicle 103. In fig. 1 (a), detection frame 1 correctly marks the traffic sign 101 and detection frame 2 correctly marks the traffic sign 102. In fig. 1 (b), detection frame 1 correctly marks the traffic sign 101, but the traffic sign 102 is not detected; a missed detection occurs, and the traffic sign 102 in this image is a missed-detection difficult case. In fig. 1 (c), detection frame 1 correctly marks the traffic sign 101 and detection frame 2 correctly marks the traffic sign 102, but detection frame 3 marks the vehicle sticker 104; that is, the vehicle sticker 104 is erroneously detected as a traffic sign, a false detection occurs, and the vehicle sticker 104 in this image is a false-detection difficult case. The missed-detection difficult cases and the false-detection difficult cases together constitute the difficult case data.
Adding difficult case data to the training phase is an effective way to reduce the probability of erroneous analysis results.
Difficult case mining is a method for extracting a set of difficult case data; its goal is to extract the missed-detection difficult cases and the false-detection difficult cases. Common difficult case mining methods fall into two categories: supervised and unsupervised. A supervised method needs a large amount of labeled data, and data labeling requires substantial manpower; especially at large data scales, the cost is very high. For unsupervised methods, how to realize effective difficult case mining and extract missed-detection and false-detection difficult cases more accurately and efficiently remains a problem to be solved.
Disclosure of Invention
The embodiments of the application provide a method and an apparatus for difficult case mining in target detection, which realize difficult case mining more effectively and extract difficult case data more accurately.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
In a first aspect, the application provides a method for difficult case mining in target detection.
In one possible design, the method may include: acquiring an image to be detected; analyzing the image by using a preset target detection algorithm to obtain a detection result of the image; analyzing the image by using a preset single-target tracking algorithm to obtain a tracking result of the image; obtaining a judgment result for each object in the image according to the detection result, the tracking result, and a preset rule; and determining each object judged as a missed detection to be a missed-detection difficult case, and each object judged as a false detection to be a false-detection difficult case. The detection result includes: the category of one or more detected objects, the detection position of one or more detected objects, and the classification confidence of one or more detected objects. The tracking result includes: the tracking position of one or more tracked objects and the tracking confidence of one or more tracked objects. The judgment result is one of: matched successfully, missed detection, false detection, newly appeared, or ended.
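The per-image flow just described can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the detector, tracker, and judgment rule are passed in as hypothetical callables, and only the final step (keeping objects judged as missed or false detections as difficult cases) is made concrete.

```python
def mine_hard_examples(frames, detect, track, judge):
    """For each image to be detected, run the preset target detection and
    single-target tracking algorithms, judge every object, and collect the
    difficult cases: objects judged 'missed detection' (missed-detection
    difficult cases) and 'false detection' (false-detection difficult cases)."""
    missed, false = [], []
    for image in frames:
        det_result = detect(image)   # detection positions, categories, confidences
        trk_result = track(image)    # tracking positions, tracking confidences
        for obj, verdict in judge(det_result, trk_result).items():
            if verdict == "missed detection":
                missed.append((image, obj))
            elif verdict == "false detection":
                false.append((image, obj))
    return missed, false

# Toy run with stub algorithms (all names and values are illustrative).
frames = ["frame0"]
detect = lambda img: {"sign1": 0.9}
track = lambda img: {"sign1": 0.8, "sign2": 0.7}
judge = lambda det, trk: {"sign1": "matched successfully",
                          "sign2": "missed detection",
                          "sticker": "false detection"}
missed, false = mine_hard_examples(frames, detect, track, judge)
# missed -> [("frame0", "sign2")]; false -> [("frame0", "sticker")]
```

In practice `judge` would implement the association and discrimination rules detailed in the designs below.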
In this method, the single-target tracking algorithm and the target detection algorithm are combined to perform difficult case mining in target detection: the tracking result of the single-target tracking algorithm is applied to difficult case mining, and missed-detection and false-detection difficult cases are judged simultaneously, so that both kinds of difficult cases can be extracted more accurately and difficult case mining is realized more effectively.
In one possible design, the method includes: obtaining an association result of a first object in the image to be detected according to the detection result, the tracking result, and a preset association rule; preliminarily discriminating the first object according to a preliminary discrimination rule and its association result, to obtain a preliminary discrimination result of the first object; and determining the discrimination result of the first object according to its preliminary discrimination result. The association result is one of: matched successfully, a first association result, or a second association result. The preliminary discrimination result is one of: matched successfully, missed detection, false detection, possible missed detection, possible false detection, or ended. The first association result means that the first object exists in the detection result but not in the tracking result; the second association result means that the first object does not exist in the detection result but exists in the tracking result. Using the tracking result in target detection makes it possible to effectively predict the position of a detected target and improves the accuracy of difficult case mining.
In one possible design, obtaining the association result of the first object in the image to be detected according to the detection result, the tracking result, and a preset association rule includes: selecting, from the detection result, objects of the same category as the objects in the tracking result; constructing a matrix with the tracked objects as rows and the detected objects as columns (or the detected objects as rows and the tracked objects as columns); and obtaining the association result of the first object from the matrix by using an association matching algorithm.
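The association step can be sketched as follows, under stated assumptions: boxes are compared by intersection-over-union (the patent does not fix a similarity measure), all objects are assumed to share one category (the patent first filters detections to the categories present in the tracking result), and a greedy loop stands in for the association matching algorithm (the Hungarian algorithm is a common alternative). Each detection is paired with at most one track.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, min_iou=0.3):
    """tracks/detections: {object_id: box}, same category assumed.
    Returns (matched pairs, detection-only ids, track-only ids), i.e.
    matched successfully / first association result / second association result."""
    # Tracked-objects-as-rows, detected-objects-as-columns score matrix.
    matrix = {(t, d): iou(tb, db)
              for t, tb in tracks.items() for d, db in detections.items()}
    pairs, used_t, used_d = [], set(), set()
    # Greedy matching by decreasing overlap; at most one track per detection.
    for (t, d), score in sorted(matrix.items(), key=lambda kv: -kv[1]):
        if score >= min_iou and t not in used_t and d not in used_d:
            pairs.append((t, d))
            used_t.add(t)
            used_d.add(d)
    det_only = [d for d in detections if d not in used_d]
    trk_only = [t for t in tracks if t not in used_t]
    return pairs, det_only, trk_only

pairs, det_only, trk_only = associate({"t1": (0, 0, 10, 10)},
                                      {"d1": (1, 1, 10, 10),
                                       "d2": (50, 50, 60, 60)})
# pairs -> [("t1", "d1")]; det_only -> ["d2"]; trk_only -> []
```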
Matching the detection result against the tracking result with an association matching algorithm effectively ensures that each detected object in the detection result corresponds to at most one tracked object in the tracking result, which improves the accuracy of difficult case mining.
In one possible design, the preliminary discrimination rule includes: for a first object whose association result is matched successfully, the preliminary discrimination result is matched successfully; for a first object whose association result is the first association result, the preliminary discrimination result is possible false detection; for a first object whose association result is the second association result and whose tracking confidence is greater than or equal to a first confidence threshold, the preliminary discrimination result is missed detection; for a first object whose association result is the second association result and whose tracking confidence is less than or equal to a second confidence threshold, the preliminary discrimination result is ended; for a first object whose association result is the second association result and whose tracking confidence is less than the first confidence threshold and greater than the second confidence threshold, the preliminary discrimination result is possible missed detection. The second confidence threshold is less than the first confidence threshold.
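The preliminary discrimination rule reduces to a small decision table, encoded directly in the sketch below. The threshold values are illustrative placeholders, since the patent only requires the second confidence threshold to be less than the first.

```python
T_FIRST = 0.6    # first confidence threshold (illustrative value)
T_SECOND = 0.3   # second confidence threshold; must be < T_FIRST

def preliminary_discrimination(assoc, track_conf=None):
    """assoc: 'matched' | 'first' (in detection only) | 'second' (in tracking only).
    track_conf is the tracked object's confidence, needed only for 'second'."""
    if assoc == "matched":
        return "matched successfully"
    if assoc == "first":
        return "possible false detection"
    # Second association result: decide by the tracked object's confidence.
    if track_conf >= T_FIRST:
        return "missed detection"
    if track_conf <= T_SECOND:
        return "ended"
    return "possible missed detection"
```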
In one possible design, determining the discrimination result of the first object in the image to be detected according to its preliminary discrimination result includes: for a first object whose preliminary discrimination result is matched successfully, determining the discrimination result as matched successfully; for a first object whose preliminary discrimination result is missed detection, determining the discrimination result as missed detection; for a first object whose preliminary discrimination result is ended, determining the discrimination result as ended; and for a first object whose preliminary discrimination result is possible false detection or possible missed detection, determining the discrimination result in combination with the discrimination over the S frames of images adjacent to the image to be detected, where S > 1.
For an object that is possibly falsely detected or possibly missed, determining the discrimination result in combination with the S adjacent frames effectively reduces the probability of misjudging false-detection and missed-detection difficult cases.
In one possible design, for a first object whose preliminary discrimination result is possible missed detection, the judgment proceeds in order over the 1st to the Sth frame images adjacent to the image to be detected. If, in at least one of these S frames, the association result of the object corresponding to the first object is the second association result and its tracking confidence is less than or equal to the second confidence threshold, the discrimination result of the first object is determined as ended. If, in every one of these S frames, the association result of the corresponding object is the second association result and its tracking confidence is less than the first confidence threshold and greater than the second confidence threshold, the discrimination result is likewise determined as ended. Otherwise, the discrimination result of the first object is determined as missed detection.
In a possible design, for a first object whose preliminary discrimination result is possible false detection: if, in every one of the 1st to the Sth frame images adjacent to the image to be detected, the association result of the object corresponding to the first object is matched successfully and its tracking confidence is greater than the first confidence threshold, the discrimination result of the first object is determined as newly appeared; otherwise, it is determined as false detection.
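The two multi-frame confirmation rules above can be written down directly. In this sketch, `history` is a hypothetical list of (association result, tracking confidence) pairs for the object over the S adjacent frames, using the same association codes as before ('matched' for matched successfully, 'second' for the second association result).

```python
def confirm_possible_miss(history, t_first, t_second):
    """Resolve a 'possible missed detection' over the S adjacent frames."""
    # Ended if the track's confidence drops to the second threshold in any frame...
    if any(a == "second" and c <= t_second for a, c in history):
        return "ended"
    # ...or stays in the ambiguous band (second result, mid confidence) in every frame.
    if all(a == "second" and t_second < c < t_first for a, c in history):
        return "ended"
    # Otherwise (e.g. a later match, or confidence recovering) it was a real miss.
    return "missed detection"

def confirm_possible_false(history, t_first):
    """Resolve a 'possible false detection' over the S adjacent frames."""
    # A real, newly appeared object keeps matching with high tracking confidence.
    if all(a == "matched" and c > t_first for a, c in history):
        return "newly appeared"
    return "false detection"
```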
In one possible design, if the number of remaining undetected frames is greater than or equal to the preset number S, the S frames adjacent to the image to be detected are the S frames with frame numbers greater than that of the image; if the number of remaining undetected frames is less than S, the S adjacent frames are the S frames with frame numbers less than that of the image. Thus, when not enough undetected frames remain, the information of the S frames preceding the image to be detected can assist the judgment of the current image, effectively solving the problem of insufficient remaining frames. In a possible design, when fewer than S undetected frames remain, the frames may be judged in reverse order, from back to front.
In one possible design, the preset single-target tracking algorithm is a single-target tracking algorithm based on deep learning. Because such an algorithm analyzes images using deep semantic features, it can effectively predict the position of a matched object and improve the accuracy of extracting missed-detection and false-detection difficult cases.
Correspondingly, the application also provides an apparatus for difficult case mining in target detection, which can implement the method of the first aspect. The apparatus can implement the method through software, through hardware, or through hardware executing corresponding software.
In one possible design, the apparatus may include an image acquisition unit, a target detection unit, a target tracking unit, and a difficult case mining unit. The image acquisition unit is used to acquire an image to be detected. The target detection unit is used to analyze the image by using a preset target detection algorithm to obtain a detection result of the image; the detection result includes the category of one or more detected objects, the detection position of one or more detected objects, and the classification confidence of one or more detected objects. The target tracking unit is used to analyze the image by using a preset single-target tracking algorithm to obtain a tracking result of the image; the tracking result includes the tracking position of one or more tracked objects and the tracking confidence of one or more tracked objects. The difficult case mining unit is used to obtain a judgment result for each object in the image according to the detection result, the tracking result, and a preset rule; the judgment result is one of: matched successfully, missed detection, false detection, newly appeared, or ended. The difficult case mining unit is further used to determine each object judged as a missed detection to be a missed-detection difficult case, and each object judged as a false detection to be a false-detection difficult case.
In one possible design, the difficult case mining unit is specifically configured to: obtain an association result of a first object in the image to be detected according to the detection result, the tracking result, and a preset association rule; preliminarily discriminate the first object according to a preliminary discrimination rule and its association result, to obtain a preliminary discrimination result of the first object; and determine the discrimination result of the first object according to its preliminary discrimination result. The association result is one of: matched successfully, a first association result, or a second association result; the first association result means that the first object exists in the detection result but not in the tracking result, and the second association result means that the first object does not exist in the detection result but exists in the tracking result. The preliminary discrimination result is one of: matched successfully, missed detection, false detection, possible missed detection, possible false detection, or ended.
In a possible design, the obtaining, by the difficult case mining unit, of the association result of the first object according to the detection result, the tracking result, and a preset association rule specifically includes: selecting, from the detection result, objects of the same category as the objects in the tracking result; constructing a matrix with the tracked objects as rows and the detected objects as columns (or the detected objects as rows and the tracked objects as columns); and obtaining the association result of the first object from the matrix by using an association matching algorithm.
In one possible design, the preliminary discrimination rule includes: for a first object whose association result is matched successfully, the preliminary discrimination result is matched successfully; for a first object whose association result is the first association result, the preliminary discrimination result is possible false detection; for a first object whose association result is the second association result and whose tracking confidence is greater than or equal to a first confidence threshold, the preliminary discrimination result is missed detection; for a first object whose association result is the second association result and whose tracking confidence is less than or equal to a second confidence threshold, the preliminary discrimination result is ended; for a first object whose association result is the second association result and whose tracking confidence is less than the first confidence threshold and greater than the second confidence threshold, the preliminary discrimination result is possible missed detection. The second confidence threshold is less than the first confidence threshold.
In a possible design, the determining, by the difficult case mining unit, of the discrimination result of the first object according to its preliminary discrimination result specifically includes: for a first object whose preliminary discrimination result is matched successfully, determining the discrimination result as matched successfully; for a first object whose preliminary discrimination result is missed detection, determining the discrimination result as missed detection; for a first object whose preliminary discrimination result is ended, determining the discrimination result as ended; and for a first object whose preliminary discrimination result is possible false detection or possible missed detection, determining the discrimination result in combination with the discrimination over the S frames of images adjacent to the image to be detected, where S > 1.
In one possible design, the determining, by the difficult case mining unit, of the discrimination result for a first object whose preliminary discrimination result is possible false detection or possible missed detection, in combination with the S adjacent frames, specifically includes: for a first object whose preliminary discrimination result is possible missed detection, judging in order over the 1st to the Sth frame images adjacent to the image to be detected; if, in at least one of these S frames, the association result of the object corresponding to the first object is the second association result and its tracking confidence is less than or equal to the second confidence threshold, determining the discrimination result of the first object as ended; if, in every one of these S frames, the association result of the corresponding object is the second association result and its tracking confidence is less than the first confidence threshold and greater than the second confidence threshold, likewise determining the discrimination result as ended; otherwise, determining the discrimination result of the first object as missed detection.
In one possible design, the determining, by the difficult case mining unit, of the discrimination result for a first object whose preliminary discrimination result is possible false detection, in combination with the S adjacent frames, specifically includes: if, in every one of the 1st to the Sth frame images adjacent to the image to be detected, the association result of the object corresponding to the first object is matched successfully and its tracking confidence is greater than the first confidence threshold, determining the discrimination result of the first object as newly appeared; otherwise, determining it as false detection.
In one possible design, if the number of remaining undetected frames is greater than or equal to the preset number S, the S frames adjacent to the image to be detected are the S frames with frame numbers greater than that of the image; if the number of remaining undetected frames is less than S, the S adjacent frames are the S frames with frame numbers less than that of the image.
In one possible design, the preset single-target tracking algorithm is a deep learning-based single-target tracking algorithm.
In a second aspect, an embodiment of the present application provides an apparatus that can implement the method for difficult case mining in target detection described in the first aspect; for example, the apparatus may be a server. In one possible design, the apparatus may include a processor and a memory. The processor is configured to enable the apparatus to perform the corresponding functions of the method of the first aspect. The memory is coupled to the processor and holds the program instructions and data necessary for the apparatus.
In a third aspect, embodiments of the present application provide a computer-readable storage medium comprising computer instructions that, when executed on an apparatus, cause the apparatus to perform the method for difficult case mining in target detection described in any of the above aspects and possible designs.
In a fourth aspect, embodiments of the present application provide a computer program product that, when run on a computer, causes the computer to execute the method for difficult case mining in target detection described in any of the above aspects and possible designs.
In a fifth aspect, an embodiment of the present application further provides a chip system; the chip system includes a processor, may further include a memory, and is used to implement the method for difficult case mining in target detection described in any of the above aspects and possible designs.
Any of the apparatuses, devices, computer-readable storage media, computer program products, or chip systems provided above is configured to execute the corresponding method provided above; for the beneficial effects it can achieve, refer to the beneficial effects of the corresponding scheme in the corresponding method, which are not repeated here.
Drawings
Fig. 1 is a schematic view of a scene to which the technical solution provided by the embodiment of the present application is applied;
fig. 2 is a schematic diagram of an apparatus to which the technical solution provided by the embodiment of the present application is applied;
fig. 3 is a first schematic diagram of a method for difficult case mining in target detection according to an embodiment of the present application;
fig. 4 is a second schematic diagram of a method for difficult case mining in target detection according to an embodiment of the present application;
FIG. 5 is a first schematic structural diagram of an apparatus according to an embodiment of the present application;
fig. 6 is a second schematic structural diagram of an apparatus according to an embodiment of the present application;
fig. 7 is a third schematic structural diagram of an apparatus according to an embodiment of the present application.
Detailed Description
The term "plurality" herein means two or more. The terms "first" and "second" herein are used to distinguish different objects, not to describe a particular order of objects. For example, the first threshold and the second threshold are only used to distinguish different thresholds, and no ordering of the thresholds is implied. The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
The following describes in detail a method and an apparatus for hard mining in target detection provided by the embodiments of the present application with reference to the accompanying drawings.
The technical solution provided by this application can be applied to various hardware devices that support scientific computing, such as personal computers (PCs), servers, notebook computers, tablet computers, vehicle-mounted computers, mobile phones, mobile terminals, smart cameras, smart watches, embedded devices, and the like. The embodiments of the present application do not specifically limit the form of the hardware device.
Fig. 2 is a schematic structural diagram of an apparatus 100 according to an embodiment of the present disclosure. The device 100 includes at least one processor 110, communication lines 120, memory 130, and at least one communication interface 140.
The processor 110 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs of the solution of the present application.
The communication link 120 may include a path for transmitting information between the aforementioned components.
The communication interface 140 may be any transceiver or similar device, and is used for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory 130 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disk read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 130 may be separate and coupled to the processor 110 via the communication line 120. Memory 130 may also be integrated with processor 110.
The memory 130 is used for storing computer-executable instructions for executing the present application, and is controlled by the processor 110 to execute. The processor 110 is configured to execute the computer-executable instructions stored in the memory 130, so as to implement the method for hard mining in object detection provided by the following embodiments of the present application.
Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
In a specific implementation, as an example, the processor 110 may include one or more CPUs, such as CPU0 and CPU1 in fig. 2.
In a specific implementation, as an example, the device 100 may include multiple processors, such as the processor 110 and the processor 111 in fig. 2. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In particular implementations, device 100 may also include an output device 150 and an input device 160, as one embodiment. The output device 150 is in communication with the processor 110 and may display information in a variety of ways. For example, the output device 150 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 160 is in communication with the processor 110 and may receive user input in a variety of ways. For example, the input device 160 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
The device 100 may be a general purpose device or a special purpose device. In a specific implementation, the device 100 may be a vehicle-mounted device, a desktop computer, a laptop computer, a web server, a Personal Digital Assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or a device with a similar structure as in fig. 2. The embodiments of the present application do not limit the type of the apparatus 100. It should be noted that the structure of the apparatus 100 shown in fig. 2 is only for example and is not used to limit the technical solution of the present application. Those skilled in the art will appreciate that the device 100 may also take other forms and include other components in a particular implementation.
The embodiments of the present application provide a method for hard example mining in target detection, which mines hard examples by combining a target detection algorithm with a single-target tracking algorithm. The method can mine hard examples more effectively and extract hard example data more accurately. The method may be executed by the above hardware device or apparatus supporting scientific computing; the apparatus may be a chip or a chip system; or a computer-readable storage medium; or a computer program product; this is not limited in the embodiments of the present application.
The embodiment of the application provides a method for difficult mining in target detection, which can be applied to the equipment shown in FIG. 2. As shown in fig. 3, the method may include:
s301, acquiring an image to be detected.
The hard example mining method provided by the embodiments of this application can be applied to mining hard examples when performing target detection on a sequence of images. A sequence of images is a group of temporally continuous images obtained by splitting a video into frames. For example, a video may be captured by a camera, and frames are extracted from the captured video to obtain the original image sequence. Each frame of the original image sequence is analyzed to obtain information about one or more objects in the frame; an object whose analysis result is wrong constitutes hard example data, which includes missed-detection hard examples and false-detection hard examples.
The t-th frame image of the original image sequence is analyzed, where the t-th frame image is the image to be detected and t > 0.
In one implementation, the images to be detected are acquired in front-to-back order, starting from frame 1 of the sequence images (i.e., t = 1).
In one implementation, if the number of frames of the remaining undetected images is less than a preset number of frames S (i.e., t > N - S, where N is the total number of frames of the sequence images), the images to be detected are acquired in back-to-front order, starting from the last frame of the sequence images (i.e., t = N).
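As an illustration of this acquisition order, the following sketch (the helper name is hypothetical, not from the patent) enumerates the frame numbers in the order they would be taken as images to be detected, assuming the first N - S frames are processed front to back and the last S frames back to front:

```python
def frame_order(N, S):
    """Order in which frames 1..N are taken as images to be detected:
    front to back while at least S undetected frames remain, then the
    remaining S frames back to front (illustrative sketch)."""
    order = list(range(1, N - S + 1))   # t = 1 .. N-S, front to back
    order += list(range(N, N - S, -1))  # t = N .. N-S+1, back to front
    return order
```

For example, with N = 6 and S = 2 the frames would be visited as 1, 2, 3, 4, 6, 5.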
In the embodiments of the present application, hard example mining for target detection on image sequences is described as an example; this is not intended to limit the scope of the claims. The hard example mining method provided by the embodiments of the present application can also be applied to other sequence data, such as laser point clouds.
S302, analyzing the image to be detected by using a preset target detection algorithm to obtain a detection result of the image to be detected.
Object detection is a method of identifying objects and their positions in an image.
And analyzing the image to be detected (the t frame image) by using a preset target detection algorithm to obtain a detection result of the image to be detected. The preset target detection algorithm can be any one of the conventional technologies; such as the YOLO (you only look once: unified, real-time object detection) algorithm.
The detection result may include: the category of one or more detected objects, the position of each detected object (referred to in this application as the detection position), the classification accuracy value of each detected object, and the like. A detected object is a recognized target. The category of a detected object distinguishes what kind of object it is; for example, categories may include person, traffic sign, building, vehicle, and so on. Illustratively, in (a), (b), and (c) of fig. 1, the category of the detected object is traffic sign. The position of a detected object may be its coordinates in the image. The classification accuracy value of a detected object is the probability value, output by the target detection algorithm, that the object belongs to its class; when the classification accuracy value is greater than a set first value, the target detection algorithm judges that a detected object of the given class has been recognized.
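A minimal sketch of such a detection result and the first-value threshold check might look as follows; the `Detection` record and the function name are illustrative assumptions, since the real output format depends on the chosen detector (e.g. YOLO):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    category: str   # e.g. "traffic sign"
    box: tuple      # (x1, y1, x2, y2) coordinates in the image
    score: float    # classification accuracy value

def filter_detections(detections, first_value):
    # The detector reports an object of a given class only when its
    # classification accuracy value is greater than the set first value.
    return [d for d in detections if d.score > first_value]
```

For instance, with a first value of 0.5, a detection with score 0.87 is kept and one with score 0.30 is discarded.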
In one implementation, the device may save the detection results of the image to be detected. Illustratively, the equipment stores a template information table, and the template information table comprises the detection result and related information of the image to be detected; such as image frame number, object class, object location, classification accuracy value. For example, the template information table includes the information shown in table 1.
TABLE 1

| Image frame number | Object number | Object class | Object position | Classification accuracy value |
| 1 | 1 | Traffic sign | (x1, y1), (x2, y2) | 0.87 |
| 1 | 2 | Traffic sign | (x3, y3), (x4, y4) | 0.88 |
The image frame number is the number of the frame in which the detected object is located. The object number is the serial number of the detected object (optionally, the serial number may be marked manually). The object class is the class of the detected object. The object position is the position information of the detected object; for example, (x1, y1), (x2, y2) are the coordinates of the upper-left and lower-right corners of object 1 in the frame-1 image and represent the position of object 1 in that image. The classification accuracy value is the classification accuracy value of the detected object.
Furthermore, each time the detection result of an image to be detected is obtained, the related information in the template information table can be updated. For example, the 1st frame of the sequence images is analyzed with the preset target detection algorithm to obtain its detection result, and the related information in the template information table is initialized according to that result, as shown in Table 1. The 1st frame image is referred to as the template image. Then, the 2nd frame of the sequence images is acquired as the image to be detected in front-to-back order; the 2nd frame image is analyzed with the preset target detection algorithm to obtain its detection result; and the related information in the template information table is updated according to the detection result of the 2nd frame image, as shown in Table 2.
TABLE 2

| Image frame number | Object number | Object class | Object position | Classification accuracy value |
| 1 | 1 | Traffic sign | (x1, y1), (x2, y2) | 0.87 |
| 1 | 2 | Traffic sign | (x3, y3), (x4, y4) | 0.88 |
| 2 | 1 | Traffic sign | (x5, y5), (x6, y6) | 0.86 |
| 2 | 2 | Traffic sign | (x7, y7), (x8, y8) | 0.9 |
It should be noted that the template information table in the embodiment of the present application is only an exemplary illustration. In practical applications, the template information table may also be in other forms, which is not limited in this application.
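As one possible form of the per-frame update described above, the template information table can be modeled as a list of rows with the columns of Tables 1 and 2; the dict layout and function name here are illustrative assumptions:

```python
def update_template_table(table, frame_no, detections):
    """Append one row per detected object of the current frame to the
    template information table, modeled as a list of dicts with the
    columns of Table 1 / Table 2. Object numbers restart from 1 within
    each frame (sketch, not the patent's exact data structure)."""
    for idx, det in enumerate(detections, start=1):
        table.append({
            "frame": frame_no,
            "object": idx,
            "category": det["category"],
            "position": det["position"],
            "score": det["score"],
        })
    return table
```

Calling it once per frame reproduces the growth from Table 1 to Table 2.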
S303, analyzing the image to be detected by using a preset single-target tracking algorithm to obtain a tracking result of the image to be detected.
A single target tracking algorithm is an algorithm that predicts the size and position of a target in a subsequent frame given the size and position of the target in an initial frame.
In one implementation, the preset single-target tracking algorithm may be any deep-learning-based single-target tracking algorithm in the conventional technologies, such as a correlation filter (CF) class algorithm. A deep-learning-based single-target tracking algorithm analyzes the image based on deep semantic features and can effectively predict the position information of the matching result, thereby improving the accuracy of hard example mining.
The given initial frame may be the 1 st frame of the sequence image. The target of the initial frame may be a detection object obtained by performing target detection on the 1 st frame image. The information of the 1 st frame of the sequence image can be acquired from a template information table held by the device.
The tracking result may include: the position of one or more tracked objects (referred to in this application as the tracking position), the tracking confidence of each tracked object, and the like. The tracking confidence reflects the reliability of each tracking result; the higher the tracking confidence, the more reliable the result. When the tracking confidence is greater than a set second value, the single-target tracking algorithm judges that the given target has been correctly tracked.
Illustratively, the output of the deep-learning-based single-target tracking algorithm is the last-layer feature map f of the network part of the algorithm together with the position information of each tracked object, where f is a matrix of size fw × fh. The tracking confidence is the maximum response score in f, which may be denoted max(f(i, j)).
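Reading the tracking confidence off the response map is then a one-line reduction; the following sketch assumes f is supplied as a NumPy-compatible 2-D array:

```python
import numpy as np

def tracking_confidence(response_map):
    """Tracking confidence of a deep single-target tracker: the maximum
    response score max(f(i, j)) over the fw x fh response map f output
    by the last network layer."""
    return float(np.asarray(response_map).max())
```

For a 2 x 2 response map [[0.1, 0.3], [0.9, 0.2]], the confidence is 0.9.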
S304, acquiring a judgment result of each object in the image to be detected according to the detection result, the tracking result and a preset rule of the image to be detected.
As shown in fig. 4, step S304 may include:
s3041, obtaining a correlation result of each object in the image to be detected according to the detection result of the image to be detected, the tracking result and a preset correlation rule.
Among the detection results of the image to be detected, select the objects whose category is the same as that of the objects in the tracking result; then, taking tracked objects as rows and detected objects as columns (or detected objects as rows and tracked objects as columns), compute the intersection-over-union (IOU) values and construct a matrix. The IOU is the ratio of the intersection to the union of the bounding box of a tracked object and the bounding box of a detected object.
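The IOU entry of that matrix can be computed as follows; the box convention (x1, y1, x2, y2) matches the corner coordinates used in Table 1:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give an IOU of 1.0; disjoint boxes give 0.0.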
The preset association rule is to apply an association matching algorithm (such as the Hungarian algorithm) to this matrix to obtain the association result of each object in the image to be detected. The association result is one of: matching success, a first association result, or a second association result. Matching success indicates that a detected object in the detection result and a tracked object in the tracking result are the same object. The first association result indicates that, after matching, no tracked object in the tracking result matches the detected object; that is, the object exists in the detection result but not in the tracking result. The second association result indicates that, after matching, no detected object in the detection result matches the tracked object; that is, the object exists in the tracking result but not in the detection result. Further, for a successfully paired object, the classification accuracy value and the IOU value may also be checked: if the classification accuracy value is smaller than a first threshold or the IOU value is smaller than a second threshold, the association result of the corresponding detected object is determined to be the first association result, and the association result of the corresponding tracked object is determined to be the second association result.
Thus, the correlation result of each object in the image to be detected is obtained.
Matching the detection result with the tracking result using an association matching algorithm effectively ensures that each detected object in the detection result corresponds to at most one tracked object in the tracking result.
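The association step can be sketched with SciPy's Hungarian-algorithm implementation (`scipy.optimize.linear_sum_assignment`); the function name and the `iou_floor` parameter below are illustrative assumptions, not part of the patent:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(iou_matrix, iou_floor=0.0):
    """Match tracked objects (rows) to detected objects (columns) by
    maximizing total IOU with the Hungarian algorithm. Pairs whose IOU
    does not exceed iou_floor are discarded. Returns (pairs,
    unmatched_tracks, unmatched_detections)."""
    m = np.asarray(iou_matrix, dtype=float)
    rows, cols = linear_sum_assignment(-m)  # negate to maximize IOU
    pairs = [(int(r), int(c)) for r, c in zip(rows, cols) if m[r, c] > iou_floor]
    matched_r = {r for r, _ in pairs}
    matched_c = {c for _, c in pairs}
    unmatched_tracks = [r for r in range(m.shape[0]) if r not in matched_r]
    unmatched_dets = [c for c in range(m.shape[1]) if c not in matched_c]
    return pairs, unmatched_tracks, unmatched_dets
```

Unmatched tracked objects then carry the second association result, and unmatched detected objects the first association result.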
S3042, obtaining the discrimination result of each object in the image to be detected according to the discrimination rule and the correlation result of each object in the image to be detected.
The judgment rule comprises the following steps:
First, preliminarily discriminate each object in the image to be detected according to the preliminary discrimination rule and its association result, to obtain a preliminary discrimination result.
The preliminary discrimination result is one of: matching success, missed detection, false detection, possible missed detection, possible false detection, and end.
The preliminary discrimination rules include:
1. For an object whose association result is matching success, the preliminary discrimination result is matching success.
2. For an object whose association result is the first association result, the preliminary discrimination result is possible false detection.
3. For an object whose association result is the second association result and whose tracking confidence is greater than or equal to the first confidence threshold, the preliminary discrimination result is missed detection.
4. For an object whose association result is the second association result and whose tracking confidence is less than or equal to the second confidence threshold, the preliminary discrimination result is end. The second confidence threshold is smaller than the first confidence threshold.
5. For an object whose association result is the second association result and whose tracking confidence is smaller than the first confidence threshold and larger than the second confidence threshold, the preliminary discrimination result is possible missed detection.
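The five preliminary rules above amount to a small decision function; the threshold values 0.6 and 0.3 below are illustrative, and the association labels "matched"/"first"/"second" are shorthand for the three association results:

```python
def preliminary_label(assoc, confidence=None, th1=0.6, th2=0.3):
    """Preliminary discrimination rules 1-5. `assoc` is one of
    "matched", "first" (present only in the detection result), or
    "second" (present only in the tracking result); th1/th2 are the
    first/second confidence thresholds (illustrative values, th2 < th1)."""
    if assoc == "matched":
        return "matching success"                 # rule 1
    if assoc == "first":
        return "possible false detection"         # rule 2
    # second association result: decide by the tracking confidence
    if confidence >= th1:
        return "missed detection"                 # rule 3
    if confidence <= th2:
        return "end"                              # rule 4
    return "possible missed detection"            # rule 5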
Second, determine the discrimination result of each object in the image to be detected according to its preliminary discrimination result.
The discrimination result is one of: matching success, missed detection, false detection, newly appeared, and end.
1. For an object whose preliminary discrimination result is matching success, the discrimination result is determined to be matching success.
2. For an object whose preliminary discrimination result is missed detection, the discrimination result is determined to be missed detection.
3. For an object whose preliminary discrimination result is end, the discrimination result is determined to be end.
4. For an object whose preliminary discrimination result is possible false detection or possible missed detection, the discrimination result is determined in combination with the discrimination results of the S frame images adjacent to the image to be detected. The S adjacent frames are either the S frames after the image to be detected (i.e., with frame numbers greater than that of the image to be detected) or the S frames before it (i.e., with frame numbers smaller than that of the image to be detected). For example, let the image to be detected be the t-th frame. If the number of remaining undetected frames is greater than or equal to the preset number of frames S (i.e., t <= N - S, where N is the total number of frames of the sequence images and S > 1), the S adjacent frames are the S frames after the image to be detected, and the discrimination proceeds in front-to-back order; if the number of remaining undetected frames is less than S (i.e., t > N - S), the S adjacent frames are the S frames before the image to be detected, and the discrimination proceeds in back-to-front order.
① For an object whose preliminary discrimination result is possible missed detection:
Discriminate in order from the 1st to the S-th of the frames adjacent to the image to be detected. If, in at least one of these frames, the association result of the object corresponding to this object is the second association result and its tracking confidence is less than or equal to the second confidence threshold, the discrimination result is determined to be end. If, in every one of these frames, the association result of the corresponding object is the second association result and its tracking confidence is smaller than the first confidence threshold and larger than the second confidence threshold, the discrimination result is also determined to be end. In all other cases, the discrimination result is determined to be missed detection.
② For an object whose preliminary discrimination result is possible false detection:
If, in every one of the 1st to S-th frame images adjacent to the image to be detected, the association result of the object corresponding to this object is matching success and its tracking confidence is greater than the first confidence threshold, the discrimination result is determined to be newly appeared; otherwise, the discrimination result is determined to be false detection.
In one implementation, the device may maintain a temporary template information table that stores information used to distinguish between objects that may have been missed and objects that may have been missed. For example, the temporary template information table may include: image frame number, object serial number, image detection result (e.g., object category, object position, classification accuracy value), missed detection identifier, false detection identifier, possible missed detection identifier, possible false detection identifier, matching success identifier, etc.
And after distinguishing the image to be detected and each frame of image from the 1 st frame of image to the S frame of image adjacent to the image to be detected, updating the temporary template information table. The update rule is as follows:
for an object whose preliminary discrimination result is matching success, update the corresponding position information in the temporary template information table with the object's detection result, and set a matching success identifier;
for an object whose preliminary discrimination result is missed detection, update the corresponding position information in the temporary template information table with the object's tracking result, and set a missed detection identifier;
for an object whose preliminary discrimination result is false detection, set a false detection identifier;
for an object whose preliminary discrimination result is newly appeared, add the object's information to the temporary template information table, and set a newly appeared identifier;
for an object whose preliminary discrimination result is end, set an end identifier;
for an object whose preliminary discrimination result is possible missed detection, update the corresponding position information in the temporary template information table with the object's tracking result, and set a possible missed detection identifier;
for an object whose preliminary discrimination result is possible false detection, update the corresponding position information in the temporary template information table with the object's detection result, and set a possible false detection identifier.
Thus, the discrimination result of each object in the image to be detected can be determined based on the temporary template information table.
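The update rules above can be sketched over a temporary table modeled as a dict keyed by object number; the function name, field names, and label strings are illustrative assumptions:

```python
def update_temp_table(temp_table, obj_id, label, detection=None, tracking=None):
    """Temporary template information table update rules: labels that
    come from the detector (matching success, possible false detection,
    newly appeared) store the detection position; labels that come from
    the tracker (missed detection, possible missed detection) store the
    tracking position; every object records its identifier flag."""
    entry = temp_table.setdefault(obj_id, {})
    if label in ("matching success", "possible false detection", "newly appeared"):
        entry["position"] = detection
    elif label in ("missed detection", "possible missed detection"):
        entry["position"] = tracking
    # "false detection" and "end" only set the identifier flag
    entry["flag"] = label
    return temp_table
```

One call per object after discriminating a frame keeps the table in the state used by Tables 3-5.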
Illustratively, the image to be detected is the image of the t-th frame. And after the t frame image is judged, updating the temporary template information table. The temporary template information table includes the information shown in table 3.
TABLE 3
(Table 3 appears only as an image in the original publication.)
According to the temporary template information table, it is determined that, in the t-th frame image, the discrimination result of object 1 is matching success, object 2 is missed detection, object 3 is false detection, object 4 is newly appeared, and object 5 is end. Object 6 is possibly missed and object 7 is possibly falsely detected, so the discrimination results of object 6 and object 7 are determined in combination with the S (for example, S = 2) frame images adjacent to the t-th frame image.
Further, taking the sequence from front to back as an example, the image to be detected is the (t +1) th frame image. And after the (t +1) th frame image is judged, updating the temporary template information table. The temporary template information table includes the information shown in table 4.
TABLE 4
(Table 4 appears only as an image in the original publication.)
In one implementation, after the (t+1)-th frame image is discriminated, the information on matching success, missed detection, false detection, and end from the t-th frame can be deleted. Note that Table 4 may also contain information on objects of the (t+1)-th frame other than object 6 and object 7; this part is omitted in the present application.
After the (t +1) th frame image is discriminated, the object 7 in the t-th frame image can be determined to be false detection.
Further, taking the sequence from front to back as an example, the image to be detected is the (t +2) th frame image. And after the (t +2) th frame image is judged, updating the temporary template information table. The temporary template information table includes the information shown in table 5.
TABLE 5
(Table 5 appears only as an image in the original publication.)
Note that, information of other objects than the object 6 and the object 7 in the (t +2) th frame image may be included in table 5. This part is omitted in the present application.
After the (t +2) th frame image is discriminated, the object 6 in the t-th frame image may be determined to be the end.
Thus, the discrimination results of the respective objects in the t-th frame image are successfully determined.
S305, obtaining difficult case data according to the judgment result of each object in the image to be detected.
An object in the image to be detected whose discrimination result is missed detection is determined to be a missed-detection hard example; an object whose discrimination result is false detection is determined to be a false-detection hard example. Acquiring the missed-detection hard examples and/or the false-detection hard examples yields the hard example data. Optionally, the missed-detection and/or false-detection hard examples may be added to a hard example data set.
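Step S305 then reduces to filtering the per-object discrimination results; the function name and the dict representation of the results are illustrative assumptions:

```python
def collect_hard_examples(results):
    """Split one frame's per-object discrimination results into the
    missed-detection and false-detection hard examples (S305).
    `results` maps an object number to its discrimination result string."""
    missed = [o for o, r in results.items() if r == "missed detection"]
    false_pos = [o for o, r in results.items() if r == "false detection"]
    return missed, false_pos
```

Both lists can then be appended to the hard example data set.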
In one implementation, the template information table is updated based on the discrimination result of each object in the image to be detected. This allows efficient use of the detected results.
The template information table updating rule is as follows:
for the object with the discrimination result of successful matching, updating the corresponding position information in the template information table by using the detection result of the object;
for the object with the judgment result of missed detection, updating the corresponding position information in the template information table by using the tracking result of the object;
for an object whose discrimination result is false detection, the information in the template information table is not updated;
adding the information of the detection result of the object into a template information table for the object with the judgment result of new appearance;
and for the object with the judgment result of ending, removing the information of the object from the template information table.
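The five template-table update rules above can be sketched over a template modeled as a dict keyed by object number; the function name and label strings are illustrative assumptions:

```python
def update_template(template, obj_id, result, detection=None, tracking=None):
    """Template information table update rules of S305: matched and
    newly appeared objects take the detected position, missed objects
    take the tracked position, false detections leave the table
    unchanged, and ended objects are removed."""
    if result in ("matching success", "newly appeared"):
        template[obj_id] = detection
    elif result == "missed detection":
        template[obj_id] = tracking
    elif result == "end":
        template.pop(obj_id, None)
    # result == "false detection": the table is left unchanged
    return template
```

Applying the rules frame by frame keeps the template table in step with the surviving objects.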
According to the hard example mining method for target detection provided by the embodiments of this application, hard examples are mined by combining a single-target tracking algorithm with a target detection algorithm, and missed-detection and false-detection hard examples are discriminated at the same time, so that both kinds of hard examples can be extracted more accurately and the mining is more effective. A deep-learning-based single-target tracking algorithm analyzes the image based on deep semantic features and effectively predicts the position information of the matching result; matching the detection result with the tracking result using an association matching algorithm effectively ensures that each detected object in the detection result corresponds to at most one tracked object in the tracking result. Together, these improve the accuracy of hard example mining.
The foregoing mainly introduces the solutions provided by the embodiments of the present application. It is understood that, to implement the above functions, the above apparatus includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that, for the example units and algorithm steps described in connection with the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present application.
In the embodiments of the present application, functional modules of the apparatus for hard-case mining in target detection may be divided according to the method examples above; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module can be implemented in hardware or as a software functional module. It should be noted that the division of modules in the embodiments of the present application is schematic and is only one kind of logical function division; other divisions are possible in actual implementation. The following description takes the division of one functional module per function as an example.
Fig. 5 is a schematic logical structure diagram of the apparatus 500 provided in the embodiments of the present application. The apparatus 500 may be a device for mining hard cases in target detection and can implement the method for hard-case mining in target detection provided in the embodiments of the present application. The apparatus 500 may be a hardware structure, a software module, or a hardware structure plus a software module. As shown in Fig. 5, the apparatus 500 includes an initialization module 501, a target detection module 502, an information update module 503, a single-target tracking module 504, and a hard-case mining module 505.
Referring to Fig. 6, in hard-case mining for target detection, the initialization module 501 is used to initialize various information, for example the frame number of the image to be detected; when the 1st frame image is processed, the frame number of the current image to be detected is initialized to 1. The initialization module 501 may also be used to initialize the temporary template information table and the template information table to null. The target detection module 502 is configured to analyze the image to be detected by using a preset target detection algorithm to obtain a detection result of the image to be detected. The information updating module 503 is configured to update the temporary template information table and the template information table, for example by writing the detection result of the image to be detected into those tables. The single-target tracking module 504 is configured to analyze the image to be detected by using a preset single-target tracking algorithm to obtain a tracking result of the image to be detected. The hard-case mining module 505 obtains the discrimination result of each object in the image to be detected according to the detection result output by the target detection module 502, the tracking result output by the single-target tracking module 504, and the preset association rule and discrimination rule. The missed-detection and false-detection hard cases output by the hard-case mining module 505 are added to the hard-case data set. After the hard-case mining module 505 obtains the discrimination result of each object in the image to be detected, the information updating module 503 updates the template information table.
The information updating module 503 is further configured to update the temporary template information table when the hard-case mining module 505 determines that the preliminary discrimination result of an object is possible false detection or possible missed detection. After the discrimination result of each object in the current image to be detected is obtained, the initialization module 501 adds 1 to the frame number of the current image to be detected to move on to the next image to be detected.
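The per-frame cooperation of the modules just described can be summarized in a short sketch. This is only an illustrative reading of Figs. 5 and 6, not the patented implementation; every callable below (`detect`, `track`, `associate`, `discriminate`) is a hypothetical stand-in supplied by the caller:

```python
# Minimal sketch of the per-frame loop performed by the modules of Fig. 5.
# All injected callables are hypothetical stand-ins, not the patented code.

def mine_hard_cases(frames, detect, track, associate, discriminate):
    """Run detection and single-target tracking on each frame and collect
    missed-detection / false-detection hard cases."""
    template_table = {}   # template information table (confirmed objects)
    temp_table = {}       # temporary template table (possible miss / false)
    hard_cases = []
    for frame_no, image in enumerate(frames, start=1):  # frame numbers start at 1
        detections = detect(image)                      # target detection module 502
        tracks = track(image, template_table)           # single-target tracking module 504
        assoc = associate(detections, tracks)           # association matching
        for obj, verdict in discriminate(assoc):        # hard-case mining module 505
            if verdict in ("missed", "false"):          # add hard cases to the data set
                hard_cases.append((frame_no, obj, verdict))
            elif verdict in ("possible_missed", "possible_false"):
                temp_table[obj] = verdict               # information update module 503
    return hard_cases
```

In use, the caller would plug in a real detector and tracker; the loop itself only orchestrates the flow of Fig. 6.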
Fig. 7 is a schematic logical structure diagram of an apparatus 700 provided in the embodiments of the present application. The apparatus 700 may be a device for hard-case mining in target detection and can implement the method for hard-case mining in target detection provided in the embodiments of the present application. The apparatus 700 may be a hardware structure, a software module, or a hardware structure plus a software module. As shown in Fig. 7, the apparatus 700 includes an image acquisition unit 701, a target detection unit 702, a target tracking unit 703, and a hard-case mining unit 704. The image acquisition unit 701 may be configured to perform S301 in Fig. 3 and/or other steps described in this application. The target detection unit 702 may be configured to perform S302 in Fig. 3 and/or other steps described herein. The target tracking unit 703 may be configured to perform S303 in Fig. 3 and/or other steps described in this application. The hard-case mining unit 704 may be configured to perform S304 and S305 in Fig. 3 and/or other steps described herein.
All relevant contents of each step related to the above method embodiment may be referred to the functional description of the corresponding functional unit, and are not described herein again.
It will be apparent to those skilled in the art that all or part of the steps of the above method may be performed by hardware under the control of program instructions, and the program may be stored in a computer-readable storage medium such as a ROM, a RAM, or an optical disk.
Embodiments of the present application also provide a storage medium, which may include a memory.
For the explanation and beneficial effects of the related content in any one of the above-mentioned apparatuses, reference may be made to the corresponding method embodiments provided above, and details are not repeated here.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented using a software program, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network appliance, a user device, or another programmable apparatus. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another via a wired link (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless link (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Video Disc (DVD)), or a semiconductor medium (e.g., Solid State Drive (SSD)), among others.
While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
Although the present application has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and figures are merely exemplary of the present application as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the present application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (14)

1. A method for hard case mining in target detection is characterized by comprising the following steps:
acquiring an image to be detected;
analyzing the image to be detected by using a preset target detection algorithm to obtain a detection result of the image to be detected; the detection result comprises: a category of one or more detection objects, a detection location of the one or more detection objects, a classification accuracy value of the one or more detection objects;
analyzing the image to be detected by using a preset single-target tracking algorithm to obtain a tracking result of the image to be detected; the tracking result comprises: a tracking location of one or more tracked objects, a tracking confidence of the one or more tracked objects;
acquiring a correlation result of a first object in the image to be detected according to the detection result, the tracking result and a preset correlation rule; the correlation result comprises: successful match, a first correlation result, and a second correlation result;
the first correlation result is that the first object exists in the detection result, and the first object does not exist in the tracking result;
the second correlation result is that the first object does not exist in the detection result, and the first object exists in the tracking result;
according to a preliminary discrimination rule and the correlation result of the first object in the image to be detected, preliminarily discriminating the first object to obtain a preliminary discrimination result of the first object in the image to be detected; the preliminary discrimination result comprises: successful match, missed detection, false detection, possible missed detection, possible false detection, and end;
for a first object whose preliminary discrimination result is successful match, determining the discrimination result as successful match;
for a first object whose preliminary discrimination result is missed detection, determining the discrimination result as missed detection;
for a first object whose preliminary discrimination result is end, determining the discrimination result as end;
for a first object whose preliminary discrimination result is possible false detection or possible missed detection: if the number of frames of the remaining undetected images to be detected is greater than or equal to a preset number of frames S, determining the discrimination result by combining, in front-to-back order, the discrimination results of the S frames of images that are adjacent to the image to be detected and have frame numbers greater than that of the image to be detected; if the number of frames of the remaining undetected images to be detected is less than the preset number of frames S, determining the discrimination result by combining, in back-to-front order, the discrimination results of the S frames of images that are adjacent to the image to be detected and have frame numbers less than that of the image to be detected;
wherein S > 1; the discrimination result comprises: successful match, missed detection, false detection, newly appeared, and end;
and determining an object whose discrimination result is missed detection as a missed-detection hard case, and determining an object whose discrimination result is false detection as a false-detection hard case.
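As a reading aid, the detection result and tracking result enumerated in claim 1 can be pictured as two small record types. The field names below are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical record types mirroring the detection result and tracking
# result of claim 1; field names are illustrative only.

@dataclass
class Detection:
    category: str                            # category of the detection object
    box: Tuple[float, float, float, float]   # detection location (x1, y1, x2, y2)
    score: float                             # classification accuracy value

@dataclass
class Track:
    box: Tuple[float, float, float, float]   # tracking location
    confidence: float                        # tracking confidence
```

The claims only require that these fields exist in some form; any concrete layout satisfying them would do.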
2. The method of claim 1, wherein the obtaining of the correlation result of the first object in the image to be detected according to the detection result, the tracking result and a preset correlation rule comprises:
selecting, from the detection result, objects of the same category as the objects in the tracking result, and constructing a matrix with the tracked objects as rows and the detection objects as columns, or with the detection objects as rows and the tracked objects as columns;
and obtaining the correlation result of the first object in the image to be detected by using the matrix by using a correlation matching algorithm.
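Claim 2 leaves the concrete association matching algorithm open. The sketch below uses a greedy pass over an IoU matrix as a stand-in (a Hungarian solver such as `scipy.optimize.linear_sum_assignment` would equally satisfy the one-to-one requirement); the names `iou` and `associate` are hypothetical:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(det_boxes, trk_boxes, thresh=0.5):
    """Greedy one-to-one matching on an IoU cost matrix.

    Returns (matches, unmatched_dets, unmatched_trks); each detection is
    matched to at most one track, as the claim requires.
    """
    pairs = sorted(
        ((iou(d, t), i, j) for i, d in enumerate(det_boxes)
                           for j, t in enumerate(trk_boxes)),
        reverse=True)
    matches, used_d, used_t = [], set(), set()
    for score, i, j in pairs:
        if score >= thresh and i not in used_d and j not in used_t:
            matches.append((i, j))
            used_d.add(i)
            used_t.add(j)
    unmatched_d = [i for i in range(len(det_boxes)) if i not in used_d]
    unmatched_t = [j for j in range(len(trk_boxes)) if j not in used_t]
    return matches, unmatched_d, unmatched_t
```

An unmatched detection corresponds to the first correlation result (object in the detection result only), and an unmatched track to the second (object in the tracking result only).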
3. The method of claim 1, wherein the preliminary discriminant rules comprise:
for the first object with the correlation result of successful matching, the preliminary judgment result is successful matching;
for a first object with the correlation result being a first correlation result, the preliminary judgment result is possible false detection;
for a first object whose correlation result is the second correlation result and whose tracked object has a tracking confidence greater than or equal to the first confidence threshold, the preliminary discrimination result is missed detection;
for the first object of which the association result is the second association result and the tracking confidence of the tracked object is less than or equal to the second confidence threshold, the preliminary judgment result is end; wherein the second confidence threshold is less than the first confidence threshold;
and for the first object of which the association result is the second association result and the tracking confidence of the tracked object is smaller than the first confidence threshold and larger than the second confidence threshold, the preliminary judgment result is possible missing detection.
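Claim 3 is a small decision table. A minimal sketch, assuming illustrative threshold values t1 > t2 (the patent fixes only their ordering, not their magnitudes):

```python
def preliminary_verdict(assoc, tracking_conf=None, t1=0.6, t2=0.3):
    """Preliminary discrimination rule of claim 3.

    assoc is 'matched', 'det_only' (first correlation result) or
    'trk_only' (second correlation result).  The thresholds t1 > t2
    are illustrative values, not taken from the patent.
    """
    if assoc == "matched":
        return "matched"
    if assoc == "det_only":            # in detection result, not in tracking result
        return "possible_false"
    # trk_only: in tracking result, not in detection result
    if tracking_conf >= t1:
        return "missed"
    if tracking_conf <= t2:
        return "end"                   # track has likely left the scene
    return "possible_missed"           # mid-band confidence: defer the decision
```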
4. The method of claim 1, wherein the determining the discrimination result by combining the discrimination results of the S-frame images adjacent to the image to be detected for the first object whose preliminary discrimination result is likely to be false-detected or likely to be missed-detected comprises:
for a first object whose preliminary discrimination result is possible missed detection, judging in order from the 1st frame image to the S-th frame image adjacent to the image to be detected:
if, in at least one of the frames from the 1st frame image to the S-th frame image adjacent to the image to be detected, the correlation result of the object corresponding to that first object is the second correlation result and its tracking confidence is less than or equal to the second confidence threshold, determining that the discrimination result of that first object is end;
if, in every frame from the 1st frame image to the S-th frame image adjacent to the image to be detected, the correlation result of the object corresponding to that first object is the second correlation result and its tracking confidence is less than the first confidence threshold and greater than the second confidence threshold, determining that the discrimination result of that first object is end;
otherwise, determining that the discrimination result of that first object is missed detection.
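The claim-4 refinement can be sketched as a single pass over the S adjacent frames; here `next_frames` holds the (correlation result, tracking confidence) pair of the corresponding object in each of those frames, and the threshold values are illustrative assumptions:

```python
def refine_possible_missed(next_frames, t1=0.6, t2=0.3):
    """Refine a 'possible missed detection' verdict (claim 4).

    next_frames: list of (assoc, conf) for the corresponding object in
    each of the S adjacent frames; assoc is 'matched', 'det_only' or
    'trk_only'.  Thresholds t1 > t2 are illustrative, not from the patent.
    """
    all_mid_band = True
    for assoc, conf in next_frames:
        if assoc == "trk_only" and conf is not None and conf <= t2:
            return "end"        # track died in at least one frame
        if not (assoc == "trk_only" and conf is not None and t2 < conf < t1):
            all_mid_band = False
    if all_mid_band:
        return "end"            # never confirmed by detection or high confidence
    return "missed"             # the detector genuinely missed the object
```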
5. The method of claim 1, wherein the determining the discrimination result by combining the discrimination results of the S-frame images adjacent to the image to be detected for the first object whose preliminary discrimination result is likely to be false-detected or likely to be missed-detected comprises:
for a first object whose preliminary discrimination result is possible false detection:
if, in every frame from the 1st frame image to the S-th frame image adjacent to the image to be detected, the correlation result of the object corresponding to that first object is successful match and its tracking confidence is greater than the first confidence threshold, determining that the discrimination result of that first object is newly appeared;
otherwise, determining that the discrimination result of that first object is false detection.
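The claim-5 refinement is the mirror case for possible false detections: the object is kept as newly appeared only if it is confirmed in every one of the S adjacent frames. A minimal sketch under the same illustrative threshold assumption:

```python
def refine_possible_false(next_frames, t1=0.6):
    """Refine a 'possible false detection' verdict (claim 5).

    next_frames: list of (assoc, conf) for the corresponding object in
    each of the S adjacent frames.  The threshold t1 is illustrative.
    """
    if all(assoc == "matched" and conf > t1 for assoc, conf in next_frames):
        return "new"        # consistently confirmed: a newly appeared object
    return "false"          # otherwise the detection was spurious
```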
6. The method of hard case mining in object detection according to any one of claims 1 to 5,
the preset single-target tracking algorithm is a single-target tracking algorithm based on deep learning.
7. An apparatus for hard case mining in target detection, comprising:
the image acquisition unit is used for acquiring an image to be detected;
the target detection unit is used for analyzing the image to be detected by using a preset target detection algorithm to obtain a detection result of the image to be detected; the detection result comprises: a category of one or more detection objects, a detection location of the one or more detection objects, a classification accuracy value of the one or more detection objects;
the target tracking unit is used for analyzing the image to be detected by using a preset single target tracking algorithm to obtain a tracking result of the image to be detected; the tracking result comprises: a tracking location of one or more tracked objects, a tracking confidence of the one or more tracked objects;
the hard-case mining unit is used for acquiring a correlation result of the first object in the image to be detected according to the detection result, the tracking result and a preset correlation rule; the correlation result comprises: successful match, a first correlation result, and a second correlation result;
the first correlation result is that the first object exists in the detection result, and the first object does not exist in the tracking result;
the second correlation result is that the first object does not exist in the detection result, and the first object exists in the tracking result;
the hard-case mining unit is further used for preliminarily discriminating the first object in the image to be detected according to a preliminary discrimination rule and the correlation result of the first object in the image to be detected, to obtain a preliminary discrimination result of the first object; the preliminary discrimination result comprises: successful match, missed detection, false detection, possible missed detection, possible false detection, and end;
for a first object whose preliminary discrimination result is successful match, determining the discrimination result as successful match;
for a first object whose preliminary discrimination result is missed detection, determining the discrimination result as missed detection;
for a first object whose preliminary discrimination result is end, determining the discrimination result as end;
for a first object whose preliminary discrimination result is possible false detection or possible missed detection: if the number of frames of the remaining undetected images to be detected is greater than or equal to a preset number of frames S, determining the discrimination result by combining, in front-to-back order, the discrimination results of the S frames of images that are adjacent to the image to be detected and have frame numbers greater than that of the image to be detected; if the number of frames of the remaining undetected images to be detected is less than the preset number of frames S, determining the discrimination result by combining, in back-to-front order, the discrimination results of the S frames of images that are adjacent to the image to be detected and have frame numbers less than that of the image to be detected; wherein S > 1; the discrimination result comprises: successful match, missed detection, false detection, newly appeared, and end;
and the hard-case mining unit is further used for determining an object whose discrimination result is missed detection as a missed-detection hard case, and determining an object whose discrimination result is false detection as a false-detection hard case.
8. The apparatus for hard-case mining in target detection according to claim 7, wherein the acquiring, by the hard-case mining unit, of the correlation result of the first object in the image to be detected according to the detection result, the tracking result and a preset correlation rule specifically comprises:
selecting, from the detection result, objects of the same category as the objects in the tracking result, and constructing a matrix with the tracked objects as rows and the detection objects as columns, or with the detection objects as rows and the tracked objects as columns;
and obtaining the correlation result of the first object in the image to be detected by using the matrix by using a correlation matching algorithm.
9. The apparatus of claim 7, wherein the preliminary discriminant rule comprises:
for the first object with the correlation result of successful matching, the preliminary judgment result is successful matching;
for a first object with the correlation result being a first correlation result, the preliminary judgment result is possible false detection;
for a first object whose correlation result is the second correlation result and whose tracked object has a tracking confidence greater than or equal to the first confidence threshold, the preliminary discrimination result is missed detection;
for the first object of which the association result is the second association result and the tracking confidence of the tracked object is less than or equal to the second confidence threshold, the preliminary judgment result is end; wherein the second confidence threshold is less than the first confidence threshold;
and for the first object of which the association result is the second association result and the tracking confidence of the tracked object is smaller than the first confidence threshold and larger than the second confidence threshold, the preliminary judgment result is possible missing detection.
10. The apparatus according to claim 7, wherein the determining, by the hard-case mining unit, the discrimination result by combining the discrimination result of the S-frame image adjacent to the image to be detected with respect to the first object whose preliminary discrimination result is likely to be false-detected or likely to be missed-detected specifically includes:
for a first object whose preliminary discrimination result is possible missed detection, judging in order from the 1st frame image to the S-th frame image adjacent to the image to be detected:
if, in at least one of the frames from the 1st frame image to the S-th frame image adjacent to the image to be detected, the correlation result of the object corresponding to that first object is the second correlation result and its tracking confidence is less than or equal to the second confidence threshold, determining that the discrimination result of that first object is end;
if, in every frame from the 1st frame image to the S-th frame image adjacent to the image to be detected, the correlation result of the object corresponding to that first object is the second correlation result and its tracking confidence is less than the first confidence threshold and greater than the second confidence threshold, determining that the discrimination result of that first object is end;
otherwise, determining that the discrimination result of that first object is missed detection.
11. The apparatus according to claim 7, wherein the determining, by the hard-case mining unit, the discrimination result by combining the discrimination result of the S-frame image adjacent to the image to be detected with respect to the first object whose preliminary discrimination result is likely to be false-detected or likely to be missed-detected specifically includes:
for a first object whose preliminary discrimination result is possible false detection:
if, in every frame from the 1st frame image to the S-th frame image adjacent to the image to be detected, the correlation result of the object corresponding to that first object is successful match and its tracking confidence is greater than the first confidence threshold, determining that the discrimination result of that first object is newly appeared;
otherwise, determining that the discrimination result of that first object is false detection.
12. The apparatus for hard mining in object detection according to any one of claims 7 to 11,
the preset single-target tracking algorithm is a single-target tracking algorithm based on deep learning.
13. An electronic device, characterized in that the electronic device comprises: a processor and a memory; the memory is coupled with the processor; the memory for storing computer program code; the computer program code comprises computer instructions which, when executed by the processor, cause the electronic device to perform the method of hard mining in object detection as claimed in any one of claims 1 to 6.
14. A computer-readable storage medium comprising computer instructions that, when executed on a device, cause the device to perform the method of hard mining in object detection as claimed in any one of claims 1-6.
CN202080004676.1A 2020-04-24 2020-04-24 Method and device for difficult mining in target detection Active CN112639872B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/086742 WO2021212482A1 (en) 2020-04-24 2020-04-24 Method and apparatus for mining difficult case during target detection

Publications (2)

Publication Number Publication Date
CN112639872A CN112639872A (en) 2021-04-09
CN112639872B true CN112639872B (en) 2022-02-11

Family

ID=75291201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080004676.1A Active CN112639872B (en) 2020-04-24 2020-04-24 Method and device for difficult mining in target detection

Country Status (2)

Country Link
CN (1) CN112639872B (en)
WO (1) WO2021212482A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361413A (en) * 2021-06-08 2021-09-07 南京三百云信息科技有限公司 Mileage display area detection method, device, equipment and storage medium
CN113468365B (en) * 2021-09-01 2022-01-25 北京达佳互联信息技术有限公司 Training method of image type recognition model, image retrieval method and device
CN115359308B (en) * 2022-04-06 2024-02-13 北京百度网讯科技有限公司 Model training method, device, equipment, storage medium and program for identifying difficult cases
CN117710944A (en) * 2024-02-05 2024-03-15 虹软科技股份有限公司 Model defect detection method, model training method, target detection method and target detection system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516303A (en) * 2017-09-01 2017-12-26 成都通甲优博科技有限责任公司 Multi-object tracking method and system
CN108053427A (en) * 2017-10-31 2018-05-18 深圳大学 A kind of modified multi-object tracking method, system and device based on KCF and Kalman
CN108647587A (en) * 2018-04-23 2018-10-12 腾讯科技(深圳)有限公司 Demographic method, device, terminal and storage medium
CN109460702A (en) * 2018-09-14 2019-03-12 华南理工大学 Passenger's abnormal behaviour recognition methods based on human skeleton sequence
CN110751096A (en) * 2019-10-21 2020-02-04 陕西师范大学 Multi-target tracking method based on KCF track confidence
CN110852283A (en) * 2019-11-14 2020-02-28 南京工程学院 Helmet wearing detection and tracking method based on improved YOLOv3

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9092841B2 (en) * 2004-06-09 2015-07-28 Cognex Technology And Investment Llc Method and apparatus for visual detection and inspection of objects
US8712096B2 (en) * 2010-03-05 2014-04-29 Sri International Method and apparatus for detecting and tracking vehicles
CN104123532B (en) * 2013-04-28 2017-05-10 浙江大华技术股份有限公司 Target object detection and target object quantity confirming method and device
CN105046220A (en) * 2015-07-10 2015-11-11 华为技术有限公司 Multi-target tracking method, apparatus and equipment
CN106874894B (en) * 2017-03-28 2020-04-14 电子科技大学 Human body target detection method based on regional full convolution neural network
EP3540634A1 (en) * 2018-03-13 2019-09-18 InterDigital CE Patent Holdings Method for audio-visual events classification and localization and corresponding apparatus computer readable program product and computer readable storage medium
CN108446622A (en) * 2018-03-14 2018-08-24 海信集团有限公司 Detecting and tracking method and device, the terminal of target object
CN108647577B (en) * 2018-04-10 2021-04-20 华中科技大学 Self-adaptive pedestrian re-identification method and system for difficult excavation
CN109635649B (en) * 2018-11-05 2022-04-22 航天时代飞鸿技术有限公司 High-speed detection method and system for unmanned aerial vehicle reconnaissance target


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SAR image ship target detection based on generative adversarial network and online hard example mining; Li Jianwei et al.; Journal of Electronics & Information Technology; 2019-01-31; Vol. 41, No. 1; pp. 143-149 *

Also Published As

Publication number Publication date
CN112639872A (en) 2021-04-09
WO2021212482A1 (en) 2021-10-28

Similar Documents

Publication Publication Date Title
CN112639872B (en) Method and device for difficult mining in target detection
US11854237B2 (en) Human body identification method, electronic device and storage medium
CN108229322A Video-based face identification method, device, electronic equipment and storage medium
US9886762B2 (en) Method for retrieving image and electronic device thereof
US11436447B2 (en) Target detection
CN111275011B (en) Mobile traffic light detection method and device, electronic equipment and storage medium
CN112560862B (en) Text recognition method and device and electronic equipment
CN112380981A (en) Face key point detection method and device, storage medium and electronic equipment
US11915500B2 (en) Neural network based scene text recognition
CN113222942A (en) Training method of multi-label classification model and method for predicting labels
CN111581423A (en) Target retrieval method and device
CN112818314A (en) Equipment detection method, device, equipment and storage medium
CN115049954B (en) Target identification method, device, electronic equipment and medium
CN111950345A (en) Camera identification method and device, electronic equipment and storage medium
CN115359308A (en) Model training method, apparatus, device, storage medium, and program for identifying difficult cases
CN112857746A (en) Tracking method and device of lamplight detector, electronic equipment and storage medium
CN112215271A (en) Anti-occlusion target detection method and device based on multi-head attention mechanism
CN115482436B (en) Training method and device for image screening model and image screening method
US20220392192A1 (en) Target re-recognition method, device and electronic device
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN112819859B (en) Multi-target tracking method and device applied to intelligent security
CN112560459B (en) Sample screening method, device, equipment and storage medium for model training
CN110826448B (en) Indoor positioning method with automatic updating function
CN108133206B (en) Static gesture recognition method and device and readable storage medium
CN113869317A (en) License plate recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant