WO2022183780A1 - Target labeling method and target labeling apparatus - Google Patents

Target labeling method and target labeling apparatus

Info

Publication number
WO2022183780A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
prediction
classification network
frame
prediction frame
Prior art date
Application number
PCT/CN2021/131971
Other languages
English (en)
French (fr)
Inventor
冯扬扬
张文超
刘杰
张一凡
Original Assignee
歌尔股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 歌尔股份有限公司 filed Critical 歌尔股份有限公司
Publication of WO2022183780A1 publication Critical patent/WO2022183780A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/60 - Analysis of geometric attributes
    • G06T7/62 - Analysis of geometric attributes of area, perimeter, diameter or volume
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 - Computing systems specially adapted for manufacturing

Definitions

  • The invention relates to the technical field of target labeling, and in particular to a target labeling method and a target labeling apparatus.
  • A target labeling method includes:
  • using a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled and unlabeled pictures; screening out, from the prediction boxes, those whose overlap with a labeled target is below a preset value or that lack a labeled target; and classifying the screened-out prediction boxes with a classification network trained on the labeled targets and their background images. If the classification result of a prediction box is consistent with its predicted category, the prediction box information is written into the annotation file.
  • A target labeling apparatus, including:
  • a processor; and
  • a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the target labeling method above.
  • FIG. 1 is a schematic flowchart of a target labeling method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram comparing a prediction box and a ground-truth box, provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a classification network structure of a target labeling method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a target labeling method provided by another embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a target labeling apparatus according to an embodiment of the present application.
  • The terms "installed", "connected", and "coupled" should be understood broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements.
  • The technical concept of the present application is: during labeling, a trained model labels a prediction picture set that contains some already-labeled pictures; prediction boxes whose overlap with a labeled target is below a preset value, or that lack a labeled target, are screened out and checked with a classification network. Target labeling and result verification are thus performed intelligently, replacing manual inspection and improving the efficiency and accuracy of target labeling.
  • FIG. 1 is a schematic flowchart of a target labeling method provided by an embodiment of the present application. As shown in FIG. 1, the target labeling method includes:
  • Step S110: use a trained target detection model to label a prediction picture set to obtain prediction box information, where the prediction picture set contains labeled and unlabeled pictures.
  • When the target detection model is used to label pictures, it labels the already-labeled pictures and the pictures to be labeled at the same time; the resulting prediction box information is, after further verification, written into the pictures' annotation files, achieving automatic labeling.
  • Step S120: from the prediction boxes, screen out those whose overlap with a labeled target is below a preset value or that lack a labeled target.
  • Such prediction boxes may be erroneous, i.e., FP (False Positive) information. In this embodiment they require further judgment to determine whether they represent targets that genuinely need to be labeled.
  • Step S130: classify the screened-out prediction boxes with a built and trained classification network, the classification network being trained on the labeled targets and their background images; if the classification result of a prediction box is consistent with the predicted category of the prediction box, write the prediction box information into the annotation file.
  • After the FP-class prediction boxes are screened out in step S120, the classification network trained on labeled targets and their background images classifies them, and comparing the classification result with the predicted category of each prediction box verifies whether the labeling prediction of the target detection model is correct.
  • When the classification result of a prediction box is consistent with its predicted category, the prediction box information can be considered accurate and written into the annotation file, achieving automatic labeling of picture targets.
  • The target detection model used in step S110 may be any existing detection model with good performance, such as the yolov4 model.
  • For screening the prediction box results in step S120, the well-known, simple IoU (Intersection over Union) calculation can be used; the IoU formula is:
  • IoU = (A ∩ B) / (A ∪ B)
  • where A is the area of the ground-truth box (in this application, the annotation box of a labeled picture can be understood as the ground-truth box), B is the area of the prediction box, IoU is the intersection-over-union of the ground-truth box and the prediction box, A ∩ B is the area of the intersection of A and B, and A ∪ B is the area of the union of A and B.
  • A schematic comparison of the prediction box and the ground-truth box is shown in FIG. 2.
  • Of course, other improved calculation methods that express the degree of overlap between the prediction box and the ground-truth box can also be used in this application.
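  • As a concrete illustration, a minimal IoU computation for axis-aligned boxes might look like the sketch below; the `(x1, y1, x2, y2)` corner format and the function name are assumptions for illustration, not part of the patent.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) corner coordinates; this format is an
    assumption made for illustration.
    """
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap at all.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```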
  • Before the classification network classifies the screened-out prediction boxes, the target labeling method of this embodiment further includes: calculating the area and minimum side length of each prediction box; if both are smaller than the minimum area and minimum side length of the target annotation boxes corresponding to its predicted category, the prediction box is sent to the classification network for classification; otherwise, the prediction box information is discarded.
  • Judging the area and side length of a prediction box provides a preliminary verification.
  • Only when the minimum area and minimum side length of the prediction box are both smaller than the minimum area and minimum side length of the target annotation boxes corresponding to its predicted category is the box tentatively considered a possible real target requiring labeling, in which case further classification verification is needed, as in the sketch below.
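  • A minimal sketch of this preliminary size check, assuming per-category minimum statistics have already been collected from the existing annotation boxes (the names `min_area` and `min_side` and the dict layout are illustrative assumptions):

```python
def prescreen(pred_box, pred_class, class_stats):
    """Return True if the box should go on to the classification network.

    pred_box is (x1, y1, x2, y2); class_stats maps each category to the
    minimum area and minimum side length observed among its annotation
    boxes (both the box format and the dict layout are assumptions).
    """
    w = pred_box[2] - pred_box[0]
    h = pred_box[3] - pred_box[1]
    area, min_side = w * h, min(w, h)
    stats = class_stats[pred_class]
    # Per the method: only boxes smaller than the smallest known
    # annotation box of their predicted category proceed to classification.
    return area < stats["min_area"] and min_side < stats["min_side"]
```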
  • The training process of the classification network used in step S130 is as follows: for each class of target, crop the annotation boxes of the labeled targets as well as background images of a preset area outside the target annotation boxes, and feed them into the classification network for training.
  • The targets to be detected usually comprise many classes, and samples must be collected for each class for training.
  • In one embodiment, manually labeled target pictures (such as product defects) can be cropped from the original pictures and sorted by class according to the annotation box information and annotation categories in the annotation files; outside the annotation boxes, patches of the unlabeled parts of the labeled pictures, 2-3 times as large, are cropped as the background class. Together these form the training set on which the classification network is trained; one possible implementation is sketched below.
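  • A sketch of how such a training set could be assembled with OpenCV-style array slicing; here the background patch is taken as a region beside the annotation box, and the tuple layout, directory layout, and `bg_scale` margin are illustrative assumptions rather than details from the patent:

```python
import os
import cv2  # OpenCV is used here only for image I/O and array slicing

def crop_training_samples(image_path, boxes, out_dir, bg_scale=2.0):
    """Crop labeled targets plus nearby background patches for training.

    boxes is a list of (label, x1, y1, x2, y2) entries read from the
    annotation file; the tuple layout and bg_scale are assumptions.
    """
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    stem = os.path.splitext(os.path.basename(image_path))[0]
    for i, (label, x1, y1, x2, y2) in enumerate(boxes):
        # Target sample: the annotation box itself, saved under its class.
        target_dir = os.path.join(out_dir, label)
        os.makedirs(target_dir, exist_ok=True)
        cv2.imwrite(os.path.join(target_dir, f"{stem}_{i}.png"),
                    img[y1:y2, x1:x2])
        # Background sample: a patch bg_scale times the box size, taken
        # outside the box (here: to its right) and clipped to the image.
        bw, bh = int((x2 - x1) * bg_scale), int((y2 - y1) * bg_scale)
        bx2, by2 = min(w, x2 + bw), min(h, y1 + bh)
        if bx2 > x2 and by2 > y1:
            bg_dir = os.path.join(out_dir, "background")
            os.makedirs(bg_dir, exist_ok=True)
            cv2.imwrite(os.path.join(bg_dir, f"{stem}_{i}_bg.png"),
                        img[y1:by2, x2:bx2])
```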
  • In step S130, classifying the screened-out prediction boxes with the built and trained classification network includes: cropping each screened-out prediction box from its picture and recording the crop position and predicted category; sending the cropped prediction-box picture into the classification network for classification; and determining whether the prediction-box picture is a background image or a target image. If it is a target image, it is then determined whether the classification result of the prediction-box picture is consistent with the predicted category.
  • In an embodiment, the classification network is built with a residual module combined with a cross-stage partial connection module, the residual module being arranged inside the cross-stage partial connection module.
  • Combining the residual module and the cross-stage partial connection module lets the shallow network learn defect features to the greatest extent while shortening computation time.
  • Directly connected convolutional or fully connected layers inevitably lose or degrade some information during transfer.
  • The residual module can alleviate this to a certain extent: by routing the input information directly to the output, it preserves the integrity of the information, so that only the difference between input and output needs to be learned, simplifying the learning objective and its difficulty.
  • The cross-stage partial connection module splits the gradient flow so that it propagates along different network paths, then joins the paths with a concat operation, combining features of different gradients or receptive fields while also reducing computation.
  • FIG. 3 shows a schematic diagram of the classification network structure according to an embodiment of the target labeling method of the present application.
  • The classification network is built with a residual module combined with a cross-stage partial connection module: the first convolutional layer 310, the cross-stage partial connection module 320, the second convolutional layer 330, the first fully connected layer 340, the second fully connected layer 350, and the classifier layer 360 are connected in sequence.
  • Each convolutional layer uses a 3x3 convolution kernel (Conv 3x3); the input picture data of the classification network is 64*64 in size with 3 channels (64*64*3). The first fully connected layer 340 is denoted fc 128, the second fully connected layer 350 is denoted fc (where "fc" abbreviates "fully connected"), and the classifier layer 360 is denoted softmax.
  • The dashed box marks the cross-stage partial connection module 320, which splits the gradient flow so that it propagates along different network paths (namely the first branch B1 and the second branch B2) and then joins them through the concatenation layer 324 (denoted concat). The embodiment of the present application thus combines features of different gradients or receptive fields while also reducing computation.
  • Inside it, the solid box marks the residual module 322.
  • The residual module 322 has two channels with different numbers of convolutional layers (the first channel p1 and the second channel p2).
  • By routing the input information around to the output, the residual module 322 preserves the integrity of the information as much as possible, so that only the difference between input and output needs to be learned, simplifying the learning objective and its difficulty.
  • The specific structures of the cross-stage partial connection module 320 and the residual module 322 are as follows:
  • The cross-stage partial connection module 320 includes a first branch B1 and a second branch B2. The first branch B1 includes a third convolutional layer 321, the residual module 322, and a fourth convolutional layer 323 connected in sequence; the second branch B2 includes a fifth convolutional layer 325. After the two branches have processed the information, it is concatenated and output through the concatenation layer 324.
  • The residual module 322 in the first branch includes a first channel p1 and a second channel p2. The first channel p1 includes a sixth convolutional layer 322-3; the second channel p2 includes a seventh convolutional layer 322-1 and an eighth convolutional layer 322-2 connected in sequence.
  • The first channel p1 and the second channel p2 process the information, then add and output it, realizing the combination of different channel mappings.
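  • A minimal PyTorch sketch of the FIG. 3 topology is given below. The patent fixes only the kernel size, input size, and layer order, so the channel widths and the absence of pooling are assumptions; the softmax of the classifier layer 360 is left to the loss function (e.g. CrossEntropyLoss), as is idiomatic in PyTorch.

```python
import torch
import torch.nn as nn

def conv3x3(c_in, c_out):
    # 3x3 convolution + ReLU; channel widths are assumptions, since the
    # patent only fixes the kernel size and the order of the layers.
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class ResidualModule(nn.Module):
    """Two channels of different depth whose outputs are summed (322)."""
    def __init__(self, c):
        super().__init__()
        self.p1 = conv3x3(c, c)                    # sixth conv layer 322-3
        self.p2 = nn.Sequential(conv3x3(c, c),     # seventh conv layer 322-1
                                conv3x3(c, c))     # eighth conv layer 322-2
    def forward(self, x):
        return self.p1(x) + self.p2(x)             # add and output

class CSPModule(nn.Module):
    """Two branches joined by concatenation (320)."""
    def __init__(self, c):
        super().__init__()
        self.b1 = nn.Sequential(conv3x3(c, c),     # third conv layer 321
                                ResidualModule(c),
                                conv3x3(c, c))     # fourth conv layer 323
        self.b2 = conv3x3(c, c)                    # fifth conv layer 325
    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x)], dim=1)  # concat layer 324

class ClassificationNet(nn.Module):
    def __init__(self, num_classes, c=16):
        super().__init__()
        self.features = nn.Sequential(
            conv3x3(3, c),        # first conv layer 310, 64*64*3 input
            CSPModule(c),         # cross-stage partial connection module 320
            conv3x3(2 * c, c),    # second conv layer 330
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(c * 64 * 64, 128), nn.ReLU(),  # first fc layer, fc 128
            nn.Linear(128, num_classes),             # second fc layer
        )  # softmax (classifier layer 360) is applied by the loss/inference
    def forward(self, x):
        return self.head(self.features(x))

# Usage sketch:
# logits = ClassificationNet(num_classes=5)(torch.randn(1, 3, 64, 64))
```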
  • The classification network of the embodiment of FIG. 3 is only a preferred mode of the present application and does not mean that only this structure can be used. In actual use it can be adapted, for example according to the size of the targets to be recognized, to determine the depth of the classification network.
  • In other embodiments, the classification network may instead be built with a residual module combined with an image pyramid module, the residual module being arranged inside the image pyramid module.
  • The specific structure can be set by those skilled in the art based on existing network structures with reference to the example of FIG. 3 above, and is not repeated here.
  • In an embodiment, the target labeling method further includes: taking the pictures corresponding to prediction boxes whose classification result is consistent with the predicted category as samples and adding them to the training set of the classification network, to supplement its training, so that the classification network keeps evolving and learns new labeling and classification rules.
  • In an embodiment, the target labeling method further includes: when it is determined that the classification result of a prediction box is inconsistent with the predicted category of the prediction box, checking the number of training rounds of the classification network; if the number of rounds has not reached a preset number, performing the next round of training and classifying the prediction box with the retrained classification network.
  • In an embodiment, the target labeling method further includes: for prediction boxes whose overlap with a labeled target is not less than the preset value, writing the prediction box information directly into the annotation file.
  • Besides the possibly erroneous FP-class prediction boxes handled above, when the target detection model labels the prediction picture set there are also prediction boxes whose overlap with a labeled target is not less than the preset value, for example boxes with a high IoU against a labeled target.
  • Such boxes are called TP (True Positives) prediction boxes: they represent boxes that need to be labeled and have already been labeled successfully, so no further processing is required and they are simply written into the annotation file.
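  • The patent only says that box information is written into "the annotation file" without naming a format; as one common possibility, a Pascal VOC-style XML writer could look like the sketch below (the VOC layout and all names here are assumptions for illustration).

```python
import xml.etree.ElementTree as ET

def write_annotation(xml_path, image_name, boxes):
    """Write prediction box information into a Pascal VOC-style XML file.

    boxes holds (label, x1, y1, x2, y2) entries; the VOC format is an
    illustrative assumption, not specified by the patent.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_name
    for label, x1, y1, x2, y2 in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bnd = ET.SubElement(obj, "bndbox")
        for tag, val in zip(("xmin", "ymin", "xmax", "ymax"),
                            (x1, y1, x2, y2)):
            ET.SubElement(bnd, tag).text = str(val)
    ET.ElementTree(root).write(xml_path)
```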
  • FIG. 4 shows a schematic flowchart of a target labeling method provided by an embodiment of the present application.
  • The method starts from the FP-class prediction boxes output by the target detection model in step S410 and verifies them.
  • The target detection model used can be an existing high-performance model, such as yolov4.
  • Some preparation is needed in this embodiment: through steps S401, S402, and S403, annotation boxes and background are cropped from the pictures and the cropped content is kept as verification material, and through step S404 accurately labeled pictures are taken as the training set.
  • A classification network is built through step S405 and trained on the above training set to obtain a classification network usable for verification.
  • The specific FP-class prediction box verification process is as follows:
  • In step S420, a preliminary judgment is made on the screened-out FP-class prediction boxes: by comparing them with the annotation-box contents cropped for the various target classes A, B, ..., N, the prediction boxes whose area and minimum side length are both smaller than those of the annotation boxes of the predicted class are screened out.
  • These FP-class prediction boxes may be real targets requiring labeling; they are cropped from the pictures, numbered and recorded, and the source picture of each crop and the predicted category of each prediction box are recorded.
  • In step S430, the prediction boxes preliminarily judged to be possible real targets are input into the trained classification network for classification verification, which determines whether the classification category is consistent with the predicted category.
  • If they are consistent, the prediction box is proven to be labeled correctly, and in step S440 the prediction box information is written into the annotation file to complete the labeling.
  • In step S450, if, when the predicted category is inconsistent with the category predicted by the classification network, it is detected that the classification network has not reached the preset number of training rounds, the classification network is retrained to improve its accuracy.
  • Through the screening and classification verification of prediction boxes, the above embodiment achieves automatic and accurate labeling of targets, which can replace a large amount of manual labeling work and improve labeling efficiency and accuracy. A sketch of the whole loop follows.
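  • Putting the steps together, the FP-box verification loop of FIG. 4 could be orchestrated along these lines; this is a sketch under the assumption of the helper names used in the earlier snippets (`prescreen`, `write_annotation`) plus a trained `classifier` and a `retrain` routine, none of which are names from the patent.

```python
def verify_fp_boxes(fp_boxes, images, class_stats, classifier,
                    write_annotation, retrain, max_rounds=10):
    """Verify FP-class prediction boxes (steps S420-S450 of FIG. 4).

    fp_boxes holds (image_id, box, pred_class) triples; the helper
    callables and the max_rounds budget are illustrative assumptions.
    """
    rounds = 0
    for image_id, (x1, y1, x2, y2), pred_class in fp_boxes:
        # S420: preliminary size check against per-class annotation stats.
        if not prescreen((x1, y1, x2, y2), pred_class, class_stats):
            continue  # discard the prediction box information
        crop = images[image_id][y1:y2, x1:x2]  # record crop source and class
        # S430: classification check with the trained classification network;
        # classifier is assumed to return a class label or "background".
        result = classifier(crop)
        if result == pred_class:
            # S440: consistent -> write the box into the annotation file.
            write_annotation(image_id, (x1, y1, x2, y2), pred_class)
        elif rounds < max_rounds:
            # S450: inconsistent and training budget remains -> retrain the
            # classification network, then classify with the retrained model.
            retrain()
            rounds += 1
```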
  • The present application also discloses a target labeling apparatus, which is used to implement any of the above target labeling methods.
  • FIG. 5 is a schematic structural diagram of a target labeling apparatus according to an embodiment of the present application.
  • At the hardware level, the target labeling apparatus includes a processor, and optionally an internal bus, a network interface, and a memory.
  • The memory may include main memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
  • The target labeling apparatus may also include hardware required by other services.
  • The processor, network interface, and memory can be connected to each other through the internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, etc.
  • A bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one bidirectional arrow is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
  • The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operating instructions.
  • The memory may include main memory and non-volatile memory, and provides instructions and data to the processor.
  • The processor reads the corresponding computer program from the non-volatile memory into main memory and runs it, forming the target labeling apparatus at the logical level.
  • The processor executes the program stored in the memory and is specifically configured to perform the following operations:
  • using a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled and unlabeled pictures; screening out, from the prediction boxes, those whose overlap with a labeled target is below a preset value or that lack a labeled target; classifying the screened-out prediction boxes with a built and trained classification network, the classification network being trained on the labeled targets and their background images; and, if the classification result of a prediction box is consistent with the predicted category of the prediction box, writing the prediction box information into the annotation file.
  • The processor, executing the program stored in the memory, is further configured to perform: before the classification network classifies the screened-out prediction boxes, calculating the area and minimum side length of each prediction box; if both are smaller than the minimum area and minimum side length of the target annotation boxes corresponding to its predicted category, sending the prediction box into the classification network for classification; otherwise, discarding the prediction box information.
  • The processor, executing the program stored in the memory, trains the classification network as follows: for each class of target, the annotation boxes of the labeled targets and background images of a preset area outside the target annotation boxes are cropped and fed into the classification network for training.
  • The processor, executing the program stored in the memory, classifies the screened-out prediction boxes with the built and trained classification network by: cropping each screened-out prediction box from its picture and recording the crop position and predicted category; sending the cropped prediction-box picture into the classification network for classification; determining whether the prediction-box picture is a background image or a target image; and, if it is a target image, determining whether its classification result is consistent with the predicted category.
  • The processor, executing the program stored in the memory, builds the classification network as follows: the classification network is built with a residual module combined with a cross-stage partial connection module, the residual module being arranged inside the cross-stage partial connection module; or the classification network is built with a residual module combined with an image pyramid module, the residual module being arranged inside the image pyramid module.
  • The processor, executing the program stored in the memory, builds the classification network with a residual module combined with a cross-stage partial connection module, connecting in sequence the first convolutional layer, the cross-stage partial connection module, the second convolutional layer, the first fully connected layer, the second fully connected layer, and the classifier layer;
  • the cross-stage partial connection module includes a first branch and a second branch, the first branch including a third convolutional layer, a residual module, and a fourth convolutional layer connected in sequence, and the second branch including a fifth convolutional layer; after the two branches have processed the information, it is concatenated and output;
  • the residual module includes a first channel and a second channel, the first channel including a sixth convolutional layer and the second channel including a seventh convolutional layer and an eighth convolutional layer connected in sequence; after the two channels have processed the information, it is added and output.
  • The processor, executing the program stored in the memory, is further configured to: take the pictures corresponding to prediction boxes whose classification result is consistent with the predicted category as samples, and add them to the training set of the classification network to supplement its training.
  • The processor, executing the program stored in the memory, is further configured to: when it is determined that the classification result of a prediction box is inconsistent with the predicted category of the prediction box, check the number of training rounds of the classification network; if the number of rounds has not reached a preset number, perform the next round of training and classify the prediction box with the retrained classification network.
  • The processor, executing the program stored in the memory, is further configured to: for prediction boxes whose overlap with a labeled target is not less than the preset value, write the prediction box information directly into the annotation file.
  • The method performed by the target labeling apparatus disclosed in the above embodiments of the present application may be applied to, or implemented by, a processor.
  • The processor may be an integrated circuit chip with signal processing capability.
  • During implementation, each step of the above method can be completed by an integrated hardware logic circuit in the processor or by instructions in the form of software.
  • The above processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • An embodiment of the present application further provides a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by a target labeling apparatus that includes multiple application programs, cause the target labeling apparatus to perform the method performed by the target labeling apparatus in the above embodiments, and specifically to perform:
  • using a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled and unlabeled pictures; screening out, from the prediction boxes, those whose overlap with a labeled target is below a preset value or that lack a labeled target; classifying the screened-out prediction boxes with a built and trained classification network, the classification network being trained on the labeled targets and their background images; and, if the classification result of a prediction box is consistent with the predicted category of the prediction box, writing the prediction box information into the annotation file.
  • Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
  • The memory may include non-persistent storage in computer-readable media, in forms such as random-access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The present invention discloses a target labeling method and a target labeling apparatus. The method includes: using a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled pictures and unlabeled pictures; screening out, from the prediction boxes, the prediction boxes whose overlap with a labeled target is below a preset value or that lack a labeled target; and classifying the screened-out prediction boxes with a built and trained classification network, the classification network being trained on labeled targets and their background images; if the classification result of a prediction box is consistent with the predicted category of the prediction box, writing the prediction box information into the annotation file.

Description

Target labeling method and target labeling apparatus

Technical Field

The present invention relates to the technical field of target labeling, and in particular to a target labeling method and a target labeling apparatus.

Background of the Invention

In recent years, traditional manufacturing has been shifting toward intelligent manufacturing. Traditional manual labeling tends to lose accuracy and efficiency as annotators grow fatigued, and it is unstable because judgment criteria drift. First, the criteria of human judgment cannot be guaranteed to remain consistent; second, heavy workloads easily cause fatigue and declining motivation, which in turn reduce the accuracy of the work. Faced with large labeling workloads (thousands to tens of thousands of pictures to be labeled), intelligent methods are therefore required. Deep learning is an important part of intelligent manufacturing, so correct and fast labeling of the images used for deep learning is particularly important.

Summary of the Invention

In view of the errors and instability that afflict manual labeling of large batches of images in the prior art, the embodiments of the present application provide a target labeling method and a target labeling apparatus to overcome the above problems.

To achieve the above objective, the embodiments of the present application adopt the following technical solutions:

According to one aspect of the embodiments of the present application, a target labeling method is provided, the method including:

using a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled pictures and unlabeled pictures;

screening out, from the prediction boxes, the prediction boxes whose overlap with a labeled target is below a preset value or that lack a labeled target;

classifying the screened-out prediction boxes with a built and trained classification network, the classification network being trained on labeled targets and their background images; if the classification result of a prediction box is consistent with the predicted category of the prediction box, writing the prediction box information into the annotation file.

According to another aspect of the embodiments of the present application, a target labeling apparatus is provided, including:

a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the target labeling method above.

In summary, the beneficial effects of the present application are:

A trained model labels a prediction picture set that contains some already-labeled pictures; the prediction boxes whose overlap with a labeled target is below a preset value, or that lack a labeled target, are screened out and checked with a classification network. Target labeling and result verification are thus carried out intelligently, replacing manual inspection and improving the efficiency and accuracy of target labeling.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a target labeling method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram comparing a prediction box and a ground-truth box, provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of the classification network structure of a target labeling method provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of a target labeling method provided by another embodiment of the present application;

FIG. 5 is a schematic structural diagram of a target labeling apparatus provided by an embodiment of the present application.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.

In the description of the present application, it should be noted that the orientation or position relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientation or position relationships shown in the drawings; they are used only to facilitate and simplify the description of the present application, and do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a particular orientation, and therefore cannot be understood as limiting the present application. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.

In the description of the present application, it should be noted that, unless otherwise expressly specified and limited, the terms "installed", "connected", and "coupled" should be understood broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present application can be understood according to the specific circumstances.

The technical concept of the present application is: during labeling, a trained model labels a prediction picture set that contains some already-labeled pictures; the prediction boxes whose overlap with a labeled target is below a preset value, or that lack a labeled target, are screened out and checked with a classification network. Target labeling and result verification are thus carried out intelligently, replacing manual inspection and improving the efficiency and accuracy of target labeling.

FIG. 1 is a schematic flowchart of a target labeling method provided by an embodiment of the present application. As shown in FIG. 1, the target labeling method includes:

Step S110: use a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled pictures and unlabeled pictures.

In this embodiment, when the target detection model is used to label pictures, it labels the already-labeled pictures and the pictures to be labeled at the same time; the prediction box information thus obtained is, after further verification, written into the pictures' annotation files, achieving automatic labeling of pictures.

Step S120: from the prediction boxes, screen out the prediction boxes whose overlap with a labeled target is below a preset value or that lack a labeled target.

Such prediction boxes may be erroneous, i.e., FP (False Positive) information. In this embodiment, these prediction results require further judgment to determine whether the prediction box information represents targets that genuinely need to be labeled.

Step S130: classify the screened-out prediction boxes with a built and trained classification network, the classification network being trained on labeled targets and their background images; if the classification result of a prediction box is consistent with the predicted category of the prediction box, write the prediction box information into the annotation file.

After the FP-class prediction boxes are screened out in step S120, the classification network trained on labeled targets and their background images classifies them, and comparing the classification result with the predicted category of each prediction box verifies whether the labeling prediction of the target detection model is correct. When the classification result of a prediction box is consistent with its predicted category, the prediction box information can be considered accurate and written into the annotation file, achieving automatic labeling of picture targets.

In an embodiment of the present application, the target detection model used in step S110 may be any existing detection model with good performance, such as the yolov4 model. For screening the prediction box results in step S120, the well-known, simple IoU (Intersection over Union) calculation can be used; the IoU formula is:

IoU = (A ∩ B) / (A ∪ B)

where A is the area of the ground-truth box (in the present application, the annotation box of a labeled picture can be understood as the ground-truth box), B is the area of the prediction box, IoU is the intersection-over-union of the ground-truth box and the prediction box, A ∩ B is the area of the intersection of A and B, and A ∪ B is the area of the union of A and B; a schematic comparison of the prediction box and the ground-truth box is shown in FIG. 2. Of course, other improved calculation methods expressing the degree of overlap between the prediction box and the ground-truth box can also be used in the present application.

In an embodiment of the present application, before the classification network classifies the screened-out prediction boxes, the target labeling method of this embodiment further includes: calculating the area and minimum side length of each prediction box; if both the area and the minimum side length of a prediction box are smaller than the minimum area and minimum side length of the target annotation boxes corresponding to its predicted category, sending the prediction box into the classification network for classification; otherwise, discarding the prediction box information.

Judging the area and side-length dimensions of a prediction box provides a preliminary verification of the prediction box: only when the minimum area and minimum side length of the prediction box are both smaller than the minimum area and minimum side length of the target annotation boxes corresponding to its predicted category is the prediction box tentatively considered a possible real target requiring labeling, and further classification verification is needed.

In an embodiment of the present application, the training process of the classification network used in step S130 is: for each class of target, the annotation boxes of the labeled targets as well as background images of a preset area outside the target annotation boxes are cropped and input into the classification network for training.

The targets to be detected usually comprise many classes, and samples must be collected for every class for training. In one embodiment, the manually labeled target pictures (such as product defects) can be cropped from the original pictures and sorted by class according to the annotation box information and annotation categories in the annotation files; outside the annotation boxes, patches of the unlabeled parts of the labeled pictures, 2-3 times as large, are cropped as the background class. Together these form the training set of the classification network, on which the classification network is trained.

In an embodiment of the present application, in step S130, classifying the screened-out prediction boxes with the built and trained classification network includes: for each screened-out prediction box, cropping the prediction box from the picture and recording the crop position and predicted category; sending the cropped prediction-box picture into the classification network for classification, and determining whether the prediction-box picture is a background image or a target image; if it is a target image, determining whether the classification result of the prediction-box picture is consistent with the predicted category.

By further classifying the prediction boxes produced by the target detection model and verifying whether the classification category is consistent with the predicted category of the prediction box, the accuracy of the prediction box is verified.

In an embodiment of the present application, the classification network is built with a residual module combined with a cross-stage partial connection module, the residual module being arranged inside the cross-stage partial connection module.

Combining the residual module and the cross-stage partial connection module in the classification network lets the shallow network learn defect features to the greatest extent while shortening computation time. Directly connected convolutional or fully connected layers inevitably lose or degrade some information during information transfer; the residual module can alleviate this to a certain extent by routing the input information directly to the output, preserving the integrity of the information, so that only the difference between input and output needs to be learned, which simplifies the learning objective and its difficulty. The cross-stage partial connection module splits the gradient flow so that it propagates along different network paths, then joins the paths with a concat operation, combining features of different gradients or receptive fields while also reducing computation.

FIG. 3 shows a schematic diagram of the classification network structure of one target labeling method embodiment of the present application. As shown in FIG. 3, the classification network is built with a residual module combined with a cross-stage partial connection module: the first convolutional layer 310, the cross-stage partial connection module 320, the second convolutional layer 330, the first fully connected layer 340, the second fully connected layer 350, and the classifier layer 360 are connected in sequence to build the classification network. Each convolutional layer uses a 3x3 convolution kernel (Conv 3x3); the input picture data of the classification network is 64*64 in size with 3 channels (64*64*3); the first fully connected layer 340 is denoted fc 128, the second fully connected layer 350 is denoted fc (where "fc" abbreviates "fully connected"), and the classifier layer 360 is denoted softmax.

As shown in FIG. 3, the dashed box marks the cross-stage partial connection module 320, which splits the gradient flow so that it propagates along different network paths (namely the first branch B1 and the second branch B2) and then joins them through the concatenation layer 324 (denoted concat). The embodiment of the present application thus adopts the cross-stage partial connection module 320, combining features of different gradients or receptive fields while also reducing computation.

Inside the cross-stage partial connection module 320, the solid box marks the residual module 322, which has two channels with different numbers of convolutional layers (the first channel p1 and the second channel p2). By routing the input information around to the output, the residual module 322 preserves the integrity of the information as much as possible, so that only the difference between input and output needs to be learned, which simplifies the learning objective and its difficulty.

Referring to FIG. 3, in one embodiment of the present application, the specific structures of the cross-stage partial connection module 320 and the residual module 322 are as follows:

The cross-stage partial connection module 320 includes a first branch B1 and a second branch B2. The first branch B1 includes a third convolutional layer 321, the residual module 322, and a fourth convolutional layer 323 connected in sequence; the second branch B2 includes a fifth convolutional layer 325. After the first branch B1 and the second branch B2 have processed the information, it is concatenated and output through the concatenation layer 324.

The residual module 322 in the first branch includes a first channel p1 and a second channel p2. The first channel p1 includes a sixth convolutional layer 322-3; the second channel p2 includes a seventh convolutional layer 322-1 and an eighth convolutional layer 322-2 connected in sequence. The first channel p1 and the second channel p2 process the information, then add and output it, realizing the combination of different channel mappings.

The classification network built in the embodiment shown in FIG. 3 is only a preferred mode of the present application and does not mean that only this classification network structure can be used in the present application. In actual use, it can be adapted and modified according to, for example, the size of the targets to be recognized, to determine the depth of the classification network.

In other embodiments of the present application, the classification network may also be built with a residual module combined with an image pyramid module, the residual module being arranged inside the image pyramid module. Those skilled in the art can set the specific structure themselves based on existing network structures with reference to the example of FIG. 3 above, which will not be repeated here.

In an embodiment of the present application, the target labeling method of this embodiment further includes: taking the pictures corresponding to prediction boxes whose classification result is consistent with the predicted category as samples and adding them to the training set of the classification network to supplement the training of the classification network, so that the classification network keeps evolving through training and learns new labeling and classification rules.

In an embodiment of the present application, the target labeling method of this embodiment further includes: when it is determined that the classification result of a prediction box is inconsistent with the predicted category of the prediction box, checking the number of training rounds of the classification network; if the number of training rounds of the classification network has not reached a preset number of rounds, performing the next round of training on the classification network and classifying the prediction box with the retrained classification network. Setting a preset number of rounds for training the classification network helps avoid classification errors caused by insufficient training.

In an embodiment of the present application, the target labeling method of this embodiment further includes: for the prediction boxes whose overlap with a labeled target is not less than the preset value, writing the prediction box information directly into the annotation file. Besides the possibly erroneous FP-class prediction boxes handled by the above embodiments of the present application, when the target detection model labels the prediction picture set there are also prediction boxes whose overlap with a labeled target is not less than the preset value, for example boxes with a high IoU against a labeled target. Such prediction boxes are called TP (True Positives) prediction boxes; they represent boxes that need to be labeled and have already been labeled successfully, so no further processing is required and they can be written directly into the annotation file.

FIG. 4 shows a schematic flowchart of a target labeling method provided by an embodiment of the present application.

As shown in FIG. 4, the method starts from the FP-class prediction boxes output by the target detection model in step S410 and verifies them. The target detection model used can be an existing high-performance target detection model, such as yolov4. In addition, some preparation is needed in this embodiment: through steps S401, S402, and S403, annotation boxes and background are cropped from the pictures and the cropped content is kept as verification material; through step S404, accurately labeled pictures are taken as the training set; the classification network is built through step S405 and trained on the above training set to obtain a classification network usable for verification.

The specific FP-class prediction box verification process is as follows:

First, in step S420, a preliminary judgment is made on the screened-out FP-class prediction boxes: by comparing them with the annotation-box contents cropped for the various target classes A, B, ..., N, the prediction boxes whose area and minimum side length are both smaller than those of the annotation boxes of the predicted target class are screened out. These FP-class prediction boxes are tentatively judged to be possible real targets requiring labeling; they are cropped from the pictures, numbered and recorded, and the source picture of each crop and the predicted category of each prediction box are recorded.

Then, in step S430, the prediction boxes preliminarily judged to be possible real targets are input into the trained classification network for classification verification, which determines whether the classification category is consistent with the predicted category.

If the classification category is consistent with the predicted category, the prediction box is proven to be labeled correctly, i.e., it is a target to be labeled; in step S440, the prediction box information is written into the annotation file to complete the labeling.

In addition, in step S450, if, when the predicted category is inconsistent with the category predicted by the classification network, it is detected that the classification network has not reached the preset number of rounds, the classification network is retrained to improve the accuracy of the classification network.

Thus, through the screening and classification verification of prediction boxes, the above embodiments achieve automatic and accurate labeling of targets, which can replace a large amount of manual labeling work and improve labeling efficiency and accuracy.

The present application also discloses a target labeling apparatus, which is used to implement any of the above target labeling methods.

FIG. 5 is a schematic structural diagram of a target labeling apparatus according to an embodiment of the present application. Referring to FIG. 5, at the hardware level, the target labeling apparatus includes a processor, and optionally an internal bus, a network interface, and a memory. The memory may include main memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk memory. Of course, the target labeling apparatus may also include hardware required by other services.

The processor, network interface, and memory can be connected to each other through the internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, etc. A bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one bidirectional arrow is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.

The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operating instructions. The memory may include main memory and non-volatile memory, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile memory into main memory and runs it, forming the target labeling apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:

using a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled pictures and unlabeled pictures; screening out, from the prediction boxes, the prediction boxes whose overlap with a labeled target is below a preset value or that lack a labeled target; classifying the screened-out prediction boxes with a built and trained classification network, the classification network being trained on labeled targets and their background images; and, if the classification result of a prediction box is consistent with the predicted category of the prediction box, writing the prediction box information into the annotation file.

In an embodiment of the present application, the processor, executing the program stored in the memory, is further configured to perform: before the classification network classifies the screened-out prediction boxes, calculating the area and minimum side length of each prediction box; if both the area and the minimum side length of a prediction box are smaller than the minimum area and minimum side length of the target annotation boxes corresponding to its predicted category, sending the prediction box into the classification network for classification; otherwise, discarding the prediction box information.

In an embodiment of the present application, the processor, executing the program stored in the memory, trains the classification network as follows: for each class of target, the annotation boxes of the labeled targets and background images of a preset area outside the target annotation boxes are cropped and input into the classification network for training.

In an embodiment of the present application, the processor, executing the program stored in the memory, classifies the screened-out prediction boxes with the built and trained classification network by: for each screened-out prediction box, cropping the prediction box from the picture and recording the crop position and predicted category; sending the cropped prediction-box picture into the classification network for classification, and determining whether the prediction-box picture is a background image or a target image; if it is a target image, determining whether the classification result of the prediction-box picture is consistent with the predicted category.

In an embodiment of the present application, the processor, executing the program stored in the memory, builds the classification network as follows: the classification network is built with a residual module combined with a cross-stage partial connection module, the residual module being arranged inside the cross-stage partial connection module; or the classification network is built with a residual module combined with an image pyramid module, the residual module being arranged inside the image pyramid module.

In an embodiment of the present application, the processor, executing the program stored in the memory, builds the classification network with a residual module combined with a cross-stage partial connection module, connecting in sequence the first convolutional layer, the cross-stage partial connection module, the second convolutional layer, the first fully connected layer, the second fully connected layer, and the classifier layer; the cross-stage partial connection module includes a first branch and a second branch, the first branch including a third convolutional layer, a residual module, and a fourth convolutional layer connected in sequence, and the second branch including a fifth convolutional layer; after the first branch and the second branch have processed the information, it is concatenated and output; the residual module includes a first channel and a second channel, the first channel including a sixth convolutional layer, and the second channel including a seventh convolutional layer and an eighth convolutional layer connected in sequence; after the first channel and the second channel have processed the information, it is added and output.

In an embodiment of the present application, the processor, executing the program stored in the memory, is further configured to perform the following operation: taking the pictures corresponding to prediction boxes whose classification result is consistent with the predicted category as samples, and adding them to the training set of the classification network to supplement the training of the classification network.

In an embodiment of the present application, the processor, executing the program stored in the memory, is further configured to perform the following operation: when it is determined that the classification result of a prediction box is inconsistent with the predicted category of the prediction box, checking the number of training rounds of the classification network; if the number of training rounds of the classification network has not reached a preset number of rounds, performing the next round of training on the classification network and classifying the prediction box with the retrained classification network.

In an embodiment of the present application, the processor, executing the program stored in the memory, is further configured to perform the following operation: for the prediction boxes whose overlap with a labeled target is not less than the preset value, writing the prediction box information directly into the annotation file.

The method performed by the target labeling apparatus disclosed in the above embodiments of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by an integrated hardware logic circuit in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic block diagrams disclosed in the embodiments of the present application can thereby be implemented or executed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

An embodiment of the present application further provides a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by a target labeling apparatus that includes multiple application programs, cause the target labeling apparatus to perform the method performed by the target labeling apparatus in the embodiments shown above, and specifically to perform:

using a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled pictures and unlabeled pictures; screening out, from the prediction boxes, the prediction boxes whose overlap with a labeled target is below a preset value or that lack a labeled target; classifying the screened-out prediction boxes with a built and trained classification network, the classification network being trained on labeled targets and their background images; and, if the classification result of a prediction box is consistent with the predicted category of the prediction box, writing the prediction box information into the annotation file.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.

The memory may include non-persistent storage in computer-readable media, in forms such as random-access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

The above are only specific embodiments of the present invention; under the above teaching of the present invention, those skilled in the art can make other improvements or modifications on the basis of the above embodiments. Those skilled in the art should understand that the above specific description only serves to better explain the purpose of the present invention, and the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

  1. A target labeling method, wherein the method comprises:
    using a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled pictures and unlabeled pictures;
    screening out, from the prediction boxes, prediction boxes whose overlap with a labeled target is below a preset value or that lack a labeled target; and
    classifying the screened-out prediction boxes with a built and trained classification network, the classification network being trained on labeled targets and their background images; if the classification result of a prediction box is consistent with the predicted category of the prediction box, writing the prediction box information into the annotation file.
  2. The target labeling method according to claim 1, wherein, before the classification network classifies the screened-out prediction boxes, the method further comprises:
    calculating the area and minimum side length of each prediction box; if both the area and the minimum side length of a prediction box are smaller than the minimum area and minimum side length of the target annotation boxes corresponding to its predicted category, sending the prediction box into the classification network for classification; otherwise, discarding the prediction box information.
  3. The target labeling method according to claim 1, wherein the training process of the classification network is:
    for each class of target, cropping the annotation boxes of the labeled targets and background images of a preset area outside the target annotation boxes, and inputting them into the classification network for training.
  4. The target labeling method according to claim 3, wherein classifying the screened-out prediction boxes with the built and trained classification network comprises:
    for each screened-out prediction box, cropping the prediction box from the picture and recording the crop position and predicted category;
    sending the cropped prediction-box picture into the classification network for classification, and determining whether the prediction-box picture is a background image or a target image; if it is a target image, determining whether the classification result of the prediction-box picture is consistent with the predicted category.
  5. The target labeling method according to claim 1, wherein the classification network is built in the following manner:
    the classification network is built with a residual module combined with a cross-stage partial connection module, the residual module being arranged inside the cross-stage partial connection module;
    or the classification network is built with a residual module combined with an image pyramid module, the residual module being arranged inside the image pyramid module.
  6. The target labeling method according to claim 5, wherein the classification network is built with a residual module combined with a cross-stage partial connection module, with the first convolutional layer, the cross-stage partial connection module, the second convolutional layer, the first fully connected layer, the second fully connected layer, and the classifier layer connected in sequence;
    the cross-stage partial connection module comprises a first branch and a second branch, the first branch comprising a third convolutional layer, a residual module, and a fourth convolutional layer connected in sequence, and the second branch comprising a fifth convolutional layer; after the first branch and the second branch have processed the information, it is concatenated and output;
    the residual module comprises a first channel and a second channel, the first channel comprising a sixth convolutional layer, and the second channel comprising a seventh convolutional layer and an eighth convolutional layer connected in sequence; after the first channel and the second channel have processed the information, it is added and output.
  7. The target labeling method according to claim 3, wherein the method further comprises:
    taking the pictures corresponding to prediction boxes whose classification result is consistent with the predicted category as samples, and adding them to the training set of the classification network to supplement the training of the classification network.
  8. The target labeling method according to claim 1, wherein the method further comprises:
    when it is determined that the classification result of a prediction box is inconsistent with the predicted category of the prediction box, checking the number of training rounds of the classification network; if the number of training rounds of the classification network has not reached a preset number of rounds, performing the next round of training on the classification network, and classifying the prediction box with the retrained classification network.
  9. The target labeling method according to claim 1, wherein the method further comprises: for prediction boxes whose overlap with a labeled target is not less than the preset value, writing the prediction box information directly into the annotation file.
  10. A target labeling apparatus, comprising:
    a processor; and
    a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the following target labeling method:
    using a trained target detection model to label a prediction picture set to obtain prediction box information, the prediction picture set containing labeled pictures and unlabeled pictures;
    screening out, from the prediction boxes, prediction boxes whose overlap with a labeled target is below a preset value or that lack a labeled target; and
    classifying the screened-out prediction boxes with a built and trained classification network, the classification network being trained on labeled targets and their background images; if the classification result of a prediction box is consistent with the predicted category of the prediction box, writing the prediction box information into the annotation file.
  11. The target labeling apparatus according to claim 10, wherein
    the processor, executing the program stored in the memory, is further configured to perform: before the classification network classifies the screened-out prediction boxes, calculating the area and minimum side length of each prediction box; if both the area and the minimum side length of a prediction box are smaller than the minimum area and minimum side length of the target annotation boxes corresponding to its predicted category, sending the prediction box into the classification network for classification; otherwise, discarding the prediction box information.
  12. The target labeling apparatus according to claim 10, wherein
    the processor, executing the program stored in the memory, trains the classification network as follows: for each class of target, the annotation boxes of the labeled targets and background images of a preset area outside the target annotation boxes are cropped and input into the classification network for training.
  13. The target labeling apparatus according to claim 10, wherein
    the processor, executing the program stored in the memory, builds the classification network as follows: the classification network is built with a residual module combined with a cross-stage partial connection module, the residual module being arranged inside the cross-stage partial connection module; or the classification network is built with a residual module combined with an image pyramid module, the residual module being arranged inside the image pyramid module.
  14. The target labeling apparatus according to claim 10, wherein
    the processor, executing the program stored in the memory, is further configured to perform the following operation: when it is determined that the classification result of a prediction box is inconsistent with the predicted category of the prediction box, check the number of training rounds of the classification network; if the number of training rounds of the classification network has not reached a preset number of rounds, perform the next round of training on the classification network, and classify the prediction box with the retrained classification network.
PCT/CN2021/131971 2021-03-03 2021-11-22 Target labeling method and target labeling apparatus WO2022183780A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110236192.6 2021-03-03
CN202110236192.6A CN112884055B (zh) 2021-03-03 2021-03-03 Target labeling method and target labeling apparatus

Publications (1)

Publication Number Publication Date
WO2022183780A1 true WO2022183780A1 (zh) 2022-09-09

Family

ID=76055287

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131971 WO2022183780A1 (zh) 2021-03-03 2021-11-22 Target labeling method and target labeling apparatus

Country Status (2)

Country Link
CN (1) CN112884055B (zh)
WO (1) WO2022183780A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294505A (zh) * 2022-10-09 2022-11-04 平安银行股份有限公司 Risk object detection and model training method and apparatus, and electronic device
CN115713664A (zh) * 2022-12-06 2023-02-24 浙江中测新图地理信息技术有限公司 Intelligent labeling method and apparatus for fire-protection acceptance inspection
CN115827876A (zh) * 2023-01-10 2023-03-21 中国科学院自动化研究所 Method and apparatus for determining unlabeled text, and electronic device
CN117392527A (zh) * 2023-12-11 2024-01-12 中国海洋大学 High-precision underwater target classification and detection method and model building method therefor

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884055B (zh) * 2021-03-03 2023-02-03 歌尔股份有限公司 Target labeling method and target labeling apparatus
KR20220169373A (ko) * 2021-06-17 2022-12-27 센스타임 인터내셔널 피티이. 리미티드. Target detection methods, apparatuses, electronic devices, and computer-readable storage media
WO2022263904A1 (en) * 2021-06-17 2022-12-22 Sensetime International Pte. Ltd. Target detection methods, apparatuses, electronic devices and computer-readable storage media
CN114723940A (zh) * 2022-04-22 2022-07-08 广州文远知行科技有限公司 Rule-based picture data labeling method, apparatus, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163634A (zh) * 2020-10-14 2021-01-01 平安科技(深圳)有限公司 Instance segmentation model sample screening method and apparatus, computer device, and medium
CN112270379A (zh) * 2020-11-13 2021-01-26 北京百度网讯科技有限公司 Classification model training method, sample classification method, apparatus, and device
US20210035324A1 (en) * 2019-08-01 2021-02-04 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for identifying item
CN112884055A (zh) * 2021-03-03 2021-06-01 歌尔股份有限公司 Target labeling method and target labeling apparatus

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355188B (zh) * 2015-07-13 2020-01-21 阿里巴巴集团控股有限公司 Image detection method and apparatus
CN109558902A (zh) * 2018-11-20 2019-04-02 成都通甲优博科技有限责任公司 Fast target detection method
CN109614990A (zh) * 2018-11-20 2019-04-12 成都通甲优博科技有限责任公司 Target detection apparatus
JP7017718B2 (ja) * 2019-03-11 2022-02-09 省司 青山 Electric resistance welding electrode for cap nuts
CN110119710A (zh) * 2019-05-13 2019-08-13 广州锟元方青医疗科技有限公司 Cell classification method and apparatus, computer device, and storage medium
CN111028224B (zh) * 2019-12-12 2020-12-01 广西医准智能科技有限公司 Data annotation, model training, and image processing method and apparatus, and storage medium
CN111798417A (zh) * 2020-06-19 2020-10-20 中国资源卫星应用中心 SSD-based remote sensing image target detection method and apparatus
CN112052787B (zh) * 2020-09-03 2021-07-30 腾讯科技(深圳)有限公司 Artificial-intelligence-based target detection method and apparatus, and electronic device
CN112085126B (zh) * 2020-09-30 2023-12-12 浙江大学 One-shot target detection method focusing on the classification task
CN112287896A (zh) * 2020-11-26 2021-01-29 山东捷讯通信技术有限公司 Deep-learning-based target detection method and system for UAV aerial images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210035324A1 (en) * 2019-08-01 2021-02-04 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for identifying item
CN112163634A (zh) * 2020-10-14 2021-01-01 平安科技(深圳)有限公司 Instance segmentation model sample screening method and apparatus, computer device, and medium
CN112270379A (zh) * 2020-11-13 2021-01-26 北京百度网讯科技有限公司 Classification model training method, sample classification method, apparatus, and device
CN112884055A (zh) * 2021-03-03 2021-06-01 歌尔股份有限公司 Target labeling method and target labeling apparatus

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294505A (zh) * 2022-10-09 2022-11-04 平安银行股份有限公司 Risk object detection and model training method and apparatus, and electronic device
CN115713664A (zh) * 2022-12-06 2023-02-24 浙江中测新图地理信息技术有限公司 Intelligent labeling method and apparatus for fire-protection acceptance inspection
CN115827876A (zh) * 2023-01-10 2023-03-21 中国科学院自动化研究所 Method and apparatus for determining unlabeled text, and electronic device
CN117392527A (zh) * 2023-12-11 2024-01-12 中国海洋大学 High-precision underwater target classification and detection method and model building method therefor
CN117392527B (zh) * 2023-12-11 2024-02-06 中国海洋大学 High-precision underwater target classification and detection method and model building method therefor

Also Published As

Publication number Publication date
CN112884055B (zh) 2023-02-03
CN112884055A (zh) 2021-06-01

Similar Documents

Publication Publication Date Title
WO2022183780A1 (zh) Target labeling method and target labeling apparatus
CN107886074A (zh) Face detection method and face detection system
CN110348360B (zh) Inspection report recognition method and device
WO2022116720A1 (zh) Target detection method and apparatus, and electronic device
CN114240821A (zh) Weld defect detection method based on improved YOLOX
CN110378258B (zh) Image-based vehicle seat information detection method and device
WO2020151340A1 (zh) Target cell labeling method and apparatus, storage medium, and terminal device
TW202040511A (zh) Data processing method, defect detection method, computing device, and storage medium
CN109961107A (zh) Target detection model training method and apparatus, electronic device, and storage medium
CN110765963A (zh) Vehicle braking detection method, apparatus, and device, and computer-readable storage medium
CN113095444B (zh) Image annotation method and apparatus, and storage medium
WO2022166293A1 (zh) Target detection method and apparatus
CN110969600A (zh) Product defect detection method and apparatus, electronic device, and storage medium
CN109961030A (zh) Pavement repair information detection method, apparatus, device, and storage medium
CN109741296B (zh) Product quality inspection method and apparatus
CN110796078A (zh) Vehicle light detection method and apparatus, electronic device, and readable storage medium
CN111507332A (zh) Vehicle VIN code detection method and device
CN117173568A (zh) Target detection model training method and target detection method
CN112884054B (zh) Target labeling method and target labeling apparatus
CN116823793A (zh) Equipment defect detection method and apparatus, electronic device, and readable storage medium
CN116385770A (zh) PCB defect board labeling and storage method and system, electronic device, and storage medium
CN114445841A (zh) Tax return form recognition method and apparatus
CN113344079B (zh) Semi-automatic image label annotation method, system, terminal, and medium
WO2021184178A1 (zh) Labeling method and apparatus
CN113052244B (zh) Classification model training method and classification model training apparatus

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21928852

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 21928852

Country of ref document: EP

Kind code of ref document: A1