WO2022257614A1 - Training method and apparatus for object detection model, and image detection method and apparatus - Google Patents


Info

Publication number
WO2022257614A1
WO2022257614A1, PCT/CN2022/088005, CN2022088005W
Authority
WO
WIPO (PCT)
Prior art keywords
detection model
feature
difference
student
training
Prior art date
Application number
PCT/CN2022/088005
Other languages
French (fr)
Chinese (zh)
Inventor
邹智康 (ZOU Zhikang)
叶晓青 (YE Xiaoqing)
孙昊 (SUN Hao)
Original Assignee
北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.)
Priority to JP2023515610A (publication JP2023539934A)
Publication of WO2022257614A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Definitions

  • FIG. 7 is a schematic diagram according to a fifth embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram according to a seventh embodiment of the present disclosure.
  • Step 202: Input the training image into the teacher detection model to obtain the first feature map extracted from the training image by the teacher detection model and the first object distance map predicted from the first feature map.
  • Step 406: Determine the second loss term of the loss function according to the difference between the first feature map and the second feature map.
  • The cos (cosine) distance between the feature T output by the teacher detection model and the feature S output by the student detection model can be calculated, and the similarity between the features output by the teacher detection model and the features output by the student detection model can be judged from the cos distance; a similarity loss function is optimized to shorten the distance between the output features of the teacher detection model and the output features of the student detection model.
  • the cos distance can be defined by the following formula:
  • The object position labeled in the training sample can be compared with the object position predicted by the student detection model, and the difference between the labeled object position and the predicted object position can be determined as the third loss term of the loss function, to train the student detection model.
  • FIG. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure.
  • "Electronic device" is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • The components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • The device 1000 includes a computing unit 1001, which can execute various appropriate actions and processes according to a computer program loaded into a RAM (Random Access Memory) 1003.
  • Various programs and data necessary for the operation of the device 1000 can also be stored in the RAM 1003.
  • The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other through a bus 1004.
  • An I/O (Input/Output) interface 1005 is also connected to the bus 1004.
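A minimal sketch of the cosine-distance similarity measure mentioned in the snippets above. It assumes the standard definition, cos(T, S) = T·S / (‖T‖ ‖S‖), since the patent's own formula is not reproduced in this excerpt; the function names are illustrative, not from the patent.

```python
import numpy as np

def cos_distance(t, s):
    # Standard cosine distance: 1 - T.S / (||T|| ||S||).
    # (Assumed definition; the patent's exact formula is elided above.)
    t = t.ravel().astype(float)
    s = s.ravel().astype(float)
    cos_sim = np.dot(t, s) / (np.linalg.norm(t) * np.linalg.norm(s))
    return 1.0 - cos_sim

def similarity_loss(t, s):
    # Optimizing this loss pulls the student's output features
    # toward the teacher's output features.
    return cos_distance(t, s)

t = np.array([1.0, 2.0, 3.0])
print(similarity_loss(t, t))  # approximately 0 for identical features
```

Identical features give a distance of (approximately) zero, while orthogonal features give a distance of one, so minimizing this term shortens the gap between the two models' outputs.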

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the technical field of artificial intelligence, and specifically to the technical fields of computer vision and deep learning. Provided are a training method and apparatus for an object detection model, and an image detection method and apparatus, which can be applied to fields such as autonomous driving and smart robots. The specific implementation involves: training a student detection model according to the positions at which the distance map corresponding to the feature map output by a teacher detection model for a training image differs from the distance map corresponding to the feature map output by the student detection model for the same image, and according to the differences, at those positions, between the features in the corresponding feature maps. In this way, the student detection model's mining of the teacher detection model's detection information can be further improved, and the detection accuracy of the student detection model is improved, such that a simple student detection model can achieve detection accuracy similar to that of a complex teacher detection model, thereby reducing the computing resources occupied and the cost of deployment, and increasing inference speed.

Description

Training Method for Object Detection Model, Image Detection Method, and Apparatus Thereof
Cross-Reference to Related Applications
This disclosure claims priority to Chinese Patent Application No. 202110649762.4, entitled "Training Method for Object Detection Model, Image Detection Method and Apparatus Thereof", filed by Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司) on June 10, 2021.
Technical Field
The present disclosure relates to the technical field of artificial intelligence, and specifically to the technical fields of computer vision and deep learning. It can be applied to fields such as autonomous driving and intelligent robots, and in particular relates to a training method for an object detection model, an image detection method, and apparatuses thereof.
Background
At present, object detection (for example, 3D detection from monocular images) is mainly performed through deep learning and keypoint estimation. Object detection can provide information with seven degrees of freedom in total: the position of an object, its length, width, and height, and its orientation angle. It can be widely applied in scenarios such as intelligent robots and autonomous driving.
Summary
The present disclosure provides a training method for an object detection model, an image detection method, and apparatuses thereof.
According to one aspect of the present disclosure, a training method for an object detection model is provided, including: acquiring a trained teacher detection model and a student detection model to be trained; inputting a training image into the teacher detection model to obtain a first feature map extracted from the training image by the teacher detection model and a first object distance map predicted from the first feature map; inputting the training image into the student detection model to obtain a second feature map extracted from the training image by the student detection model and a second object distance map predicted from the second feature map; according to the difference positions between the second object distance map and the first object distance map, determining first local features corresponding to the difference positions in the first feature map and second local features corresponding to the difference positions in the second feature map; and training the student detection model according to the difference between the first local features and the second local features.
According to another aspect of the present disclosure, an image detection method is provided, including: acquiring a monocular image; and performing image detection on the monocular image with a trained student detection model to obtain object information of the objects in the monocular image, wherein the student detection model is trained with the training method described in the embodiments of the first aspect of the present disclosure.
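The image detection aspect can be sketched as follows. `detect_objects` and `stub_student` are hypothetical names, and the 7-DoF output format (position, length/width/height, orientation angle) follows the background section's description rather than any interface defined in the patent:

```python
import numpy as np

def detect_objects(image, student_model):
    # The trained (distilled) student model maps a monocular image to
    # per-object 7-DoF information: position (x, y, z), length, width,
    # height, and orientation angle.
    return student_model(image)

def stub_student(image):
    # Stand-in for a trained student detection model: returns one
    # fixed, made-up detection for illustration only.
    return [{"position": (1.0, 0.5, 10.0),
             "size": (4.2, 1.8, 1.5),
             "yaw": 0.1}]

image = np.zeros((375, 1242, 3))  # a monocular RGB frame
objs = detect_objects(image, stub_student)
print(len(objs))  # 1
```

The point of the disclosure is that this small student model, once trained as described below, stands in for the much heavier teacher at inference time.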
According to another aspect of the present disclosure, a training apparatus for an object detection model is provided, including: a first acquisition module configured to acquire a trained teacher detection model and a student detection model to be trained; a first processing module configured to input a training image into the teacher detection model to obtain a first feature map extracted from the training image by the teacher detection model and a first object distance map predicted from the first feature map; a second processing module configured to input the training image into the student detection model to obtain a second feature map extracted from the training image by the student detection model and a second object distance map predicted from the second feature map; a first determination module configured to determine, according to the difference positions between the second object distance map and the first object distance map, first local features corresponding to the difference positions in the first feature map and second local features corresponding to the difference positions in the second feature map; and a training module configured to train the student detection model according to the difference between the first local features and the second local features.
According to another aspect of the present disclosure, an image detection apparatus is provided, including: an acquisition module configured to acquire a monocular image; and a detection module configured to perform image detection on the monocular image with a trained student detection model to obtain object information of the objects in the monocular image, wherein the student detection model is trained by the training apparatus described in the embodiments of the present disclosure.
According to another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method described in the embodiments of the first aspect of the present disclosure, or execute the image detection method described in the embodiments of the present disclosure.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to execute the training method for an object detection model described in the embodiments of the first aspect of the present disclosure, or to execute the image detection method described in the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, including a computer program which, when executed by a processor, implements the training method for an object detection model described in the embodiments of the first aspect of the present disclosure, or executes the image detection method described in the embodiments of the present disclosure.
It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand through the following description.
Brief Description of the Drawings
The accompanying drawings are used for a better understanding of the present solution and do not constitute a limitation of the present disclosure. In the drawings:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of obtaining the difference between first local features and second local features according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of a training method for an object detection model according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 9 is a schematic diagram according to a seventh embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device used to implement the training method for an object detection model or the image detection method of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
At present, object detection (for example, 3D detection from monocular images) is mainly performed through deep learning and keypoint estimation. Object detection can provide information with seven degrees of freedom in total: the position of an object, its length, width, and height, and its orientation angle. It can be widely applied in scenarios such as intelligent robots and autonomous driving.
In the related art, object detection mainly focuses on designing novel modules to increase a network's ability to process input images, thereby improving detection accuracy; another mainstream approach is to introduce depth information to increase the network's ability to represent spatial distance, further improving detection accuracy.
However, in the above techniques, increasing a network's ability to process input images by designing novel modules mainly relies on a powerful and complex backbone network to improve the final detection accuracy. Such a complex network demands very large computing resources and is inconvenient to deploy in a service; in addition, a deep and complex network makes inference too slow.
In view of the above problems, the present disclosure proposes a training method for an object detection model, an image detection method, and apparatuses thereof.
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure. It should be noted that the training method for an object detection model of the embodiments of the present disclosure can be applied to the training apparatus for an object detection model of the embodiments of the present disclosure, and the apparatus can be configured in an electronic device. The electronic device may be a mobile terminal, for example, a mobile phone, a tablet computer, a personal digital assistant, or other hardware devices with various operating systems.
As shown in FIG. 1, the training method for an object detection model may include the following steps.
Step 101: Acquire a trained teacher detection model and a student detection model to be trained.
In the embodiments of the present disclosure, a complex neural network may be trained for object detection; the trained complex neural network is used as the teacher detection model, and an untrained simple neural network is used as the student detection model to be trained.
Step 102: Input a training image into the teacher detection model to obtain a first feature map extracted from the training image by the teacher detection model and a first object distance map predicted from the first feature map.
As a possible implementation of the embodiments of the present disclosure, the training image may be obtained through an image acquisition device and input into the teacher detection model. The teacher detection model may perform feature extraction on the training image to generate a feature map, and may further perform feature extraction on the feature map to generate the information required for object detection (for example, position information and size information). A distance map may be generated from this information; the distance map can represent the distance of objects in the coordinate system of the image acquisition device (such as a camera). The teacher detection model outputs the generated feature map and distance map; the output feature map is taken as the first feature map, and the output distance map is taken as the first object distance map.
Step 103: Input the training image into the student detection model to obtain a second feature map extracted from the training image by the student detection model and a second object distance map predicted from the second feature map.
Next, the training image may be input into the student detection model to be trained. The student detection model may perform feature extraction on the training image to generate a feature map, may further perform feature extraction on this feature map to generate the information required for object detection, and may generate a distance map from this information. The student detection model outputs the generated feature map and distance map; the output feature map is taken as the second feature map, and the output distance map is taken as the second object distance map. It should be noted that step 102 may be performed before step 103, after step 103, or simultaneously with step 103; the present disclosure imposes no specific limitation.
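The data flow of steps 102 and 103 can be followed with toy stand-ins for the two models. The random-projection "backbone" and per-pixel "head" below are illustrative assumptions (the patent's models are real detection networks); only the shapes of the two outputs matter here:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(image, weights):
    # Toy stand-in for a detection model's forward pass: a projection
    # produces a per-pixel "feature map", and a head on top of it
    # produces a per-pixel "object distance map".
    feat = np.tanh(image @ weights["backbone"])    # feature map (H, W, C')
    dist = np.abs(feat @ weights["head"]).sum(-1)  # distance map (H, W)
    return feat, dist

H, W, C = 8, 8, 3
image = rng.random((H, W, C))
teacher_w = {"backbone": rng.random((C, 16)), "head": rng.random((16, 1))}
student_w = {"backbone": rng.random((C, 16)), "head": rng.random((16, 1))}

feat_t, dist_t = forward(image, teacher_w)  # first feature map / distance map
feat_s, dist_s = forward(image, student_w)  # second feature map / distance map
print(feat_t.shape, dist_t.shape)  # (8, 8, 16) (8, 8)
```

Because steps 102 and 103 are independent given the training image, the two forward passes can run in either order or in parallel, as the text notes.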
Step 104: According to the difference positions between the second object distance map and the first object distance map, determine first local features corresponding to the difference positions in the first feature map and second local features corresponding to the difference positions in the second feature map.
It can be understood that the teacher detection model differs from the student detection model to be trained, so the output first object distance map also differs from the second object distance map. In the embodiments of the present disclosure, a distance measurement may be performed between the second object distance map and the first object distance map to obtain the positions at which they differ. Then, the features corresponding to the difference positions in the first feature map are obtained and taken as the first local features; likewise, the features corresponding to the difference positions in the second feature map are taken as the second local features.
Step 105: Train the student detection model according to the difference between the first local features and the second local features.
Further, by comparing the first local features with the second local features, the difference between them can be obtained, and the student detection model is trained according to that difference.
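As a sketch of step 105: the patent specifies only that the student is trained on the difference between the first and second local features, not the exact loss form, so the mean-squared error over the masked positions below is one plausible choice, not the patent's definitive loss:

```python
import numpy as np

def local_distill_loss(feat_t, feat_s, diff_mask):
    # Distillation term over the difference positions only:
    # compare teacher and student features where the two
    # distance maps disagree (assumed MSE form).
    f_t = feat_t[diff_mask]  # first local features
    f_s = feat_s[diff_mask]  # second local features
    return float(np.mean((f_t - f_s) ** 2))

feat_t = np.ones((4, 4, 8))    # teacher feature map
feat_s = np.zeros((4, 4, 8))   # student feature map
mask = np.zeros((4, 4), dtype=bool)
mask[1, 2] = True              # a single difference position
print(local_distill_loss(feat_t, feat_s, mask))  # 1.0
```

Restricting the loss to the difference positions focuses the gradient on exactly the regions where the student's distance prediction disagrees with the teacher's.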
In summary, a trained teacher detection model and a student detection model to be trained are acquired; a training image is input into the teacher detection model to obtain the first feature map extracted from the training image by the teacher detection model and the first object distance map predicted from the first feature map; the training image is input into the student detection model to obtain the second feature map extracted from the training image by the student detection model and the second object distance map predicted from the second feature map; according to the difference positions between the second object distance map and the first object distance map, the first local features corresponding to the difference positions are determined in the first feature map and the second local features corresponding to the difference positions are determined in the second feature map; and the student detection model is trained according to the difference between the first local features and the second local features. By training the student detection model on the positions at which the distance maps corresponding to the feature maps output by the teacher and student detection models differ, and on the differences between the features at those positions in the corresponding feature maps, this method further improves the student detection model's mining of the teacher detection model's detection information and improves the detection accuracy of the student detection model. In this way, a simple student detection model can achieve detection accuracy similar to that of a complex teacher detection model, while reducing the occupation of computing resources and the cost of deployment and increasing inference speed.
To better obtain the first local features and the second local features, FIG. 2 shows a schematic diagram according to a second embodiment of the present disclosure. In the embodiments of the present disclosure, the difference positions between the first object distance map and the second object distance map may be obtained first; the first local features corresponding to the difference positions are then obtained in the first feature map, and the second local features corresponding to the difference positions are obtained in the second feature map. The embodiment shown in FIG. 2 includes the following steps.
Step 201: Acquire a trained teacher detection model and a student detection model to be trained.
Step 202: Input a training image into the teacher detection model to obtain a first feature map extracted from the training image by the teacher detection model and a first object distance map predicted from the first feature map.
Step 203: Input the training image into the student detection model to obtain a second feature map extracted from the training image by the student detection model and a second object distance map predicted from the second feature map.
Step 204: Determine the positions at which the first object distance map output by the head network in the teacher detection model differs from the second object distance map output by the corresponding head network in the student detection model.
To effectively determine the difference positions between the first object distance map and the second object distance map, optionally, the distance values at the same positions in the first object distance map output by the head network in the teacher detection model and in the second object distance map output by the corresponding head network in the student detection model are compared, and the positions at which the difference between the distance values is greater than a threshold are taken as the difference positions.
That is, inputting the features output by the teacher detection model and by the student detection model into the different head networks of the respective models yields different predictions. For example, inputting those features into the category head networks of the teacher and student detection models outputs the corresponding object categories, and inputting them into the 2D-box head networks of the teacher and student detection models outputs the corresponding 2D boxes of the objects. In the embodiments of the present disclosure, the first object distance map output by inputting the teacher detection model's features into the 3D head network of the teacher detection model is compared, position by position, with the second object distance map output by inputting the student detection model's features into the corresponding head network of the student detection model; the difference between the distance values at each common position is obtained, and the positions at which this difference is greater than a preset threshold are taken as the difference positions.
步骤205,在第一特征图中,将从差异位置提取的特征作为第一局部特征。 Step 205, in the first feature map, use the feature extracted from the difference position as the first local feature.
进一步地，根据差异位置，获取该差异位置在第一特征图中的位置，并根据该位置在第一特征图中进行特征提取，将提取的特征作为第一局部特征。Further, according to the difference position, the location of the difference position in the first feature map is obtained, feature extraction is performed at that location in the first feature map, and the extracted feature is used as the first local feature.
步骤206,在第二特征图中,将从差异位置提取的特征作为第二局部特征。 Step 206, in the second feature map, use the features extracted from the difference positions as the second local features.
进一步地，根据差异位置，获取该差异位置在第二特征图中的位置，并根据该位置在第二特征图中进行特征提取，将提取的特征作为第二局部特征。Further, according to the difference position, the location of the difference position in the second feature map is obtained, feature extraction is performed at that location in the second feature map, and the extracted feature is used as the second local feature.
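Steps 205 and 206 above, extracting the first and second local features at the difference positions, can be sketched with boolean-mask indexing over channel-first feature maps. The (C, H, W) shapes, the random features, and the helper name are illustrative assumptions:

```python
import numpy as np

def extract_local_features(feature_map: np.ndarray,
                           diff_mask: np.ndarray) -> np.ndarray:
    """Select, for every channel, only the spatial positions marked in the
    difference mask; the result has shape (C, N) for N difference positions."""
    return feature_map[:, diff_mask]

rng = np.random.default_rng(0)
teacher_feat = rng.standard_normal((8, 4, 4))   # first feature map (C, H, W)
student_feat = rng.standard_normal((8, 4, 4))   # second feature map
diff_mask = np.zeros((4, 4), dtype=bool)
diff_mask[1, 2] = diff_mask[3, 0] = True        # two difference positions

first_local = extract_local_features(teacher_feat, diff_mask)
second_local = extract_local_features(student_feat, diff_mask)
# Both have shape (8, 2): 8 channels at the 2 difference positions.
```

Because the same mask indexes both feature maps, the two (C, N) arrays are position-aligned and can be compared directly when forming the training loss.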
步骤207,根据第一局部特征和第二局部特征之间的差异,对学生检测模型进行训练。 Step 207, train the student detection model according to the difference between the first local feature and the second local feature.
举例而言，如图3所示，训练图片经过教师检测模型可获取教师特征(第一特征图)，训练图片经过学生检测模型可获取学生特征(第二特征图)，教师特征经过教师检测模型中的3D head(头部网络)输出第一物体距离图，学生特征经过学生检测模型中的3D head(头部网络)输出第二物体距离图，将第一物体距离图与第二物体距离图进行距离度量，获取第一物体距离图与第二物体距离图之间的差异位置，确定教师特征中该差异位置对应的第一局部特征，确定学生特征中该差异位置对应的第二局部特征，根据第一局部特征与第二局部特征之间的差异，对学生检测模型进行训练。For example, as shown in FIG. 3, the training picture is passed through the teacher detection model to obtain the teacher feature (the first feature map), and through the student detection model to obtain the student feature (the second feature map). The teacher feature is passed through the 3D head (head network) of the teacher detection model to output the first object distance map, and the student feature is passed through the 3D head (head network) of the student detection model to output the second object distance map. A distance metric is computed between the first object distance map and the second object distance map to obtain the difference positions between them; the first local feature corresponding to a difference position is determined in the teacher feature, the second local feature corresponding to the difference position is determined in the student feature, and the student detection model is trained according to the difference between the first local feature and the second local feature.
在本公开实施例中,步骤201-203可以分别采用本公开的各实施例中的任一种方式实现,本公开实施例并不对此作出限定,也不再赘述。In the embodiment of the present disclosure, steps 201-203 may be implemented in any one of the embodiments of the present disclosure, which is not limited in the embodiment of the present disclosure, and will not be repeated here.
综上，通过获取经过训练的教师检测模型，以及待训练的学生检测模型；将训练图像输入教师检测模型，得到教师检测模型对训练图像提取的第一特征图，以及根据第一特征图预测的第一物体距离图；将训练图像输入学生检测模型，得到学生检测模型对训练图像提取的第二特征图，以及根据第二特征图预测的第二物体距离图；确定教师检测模型中头部网络输出的第一物体距离图，与学生检测模型中对应头部网络输出的第二物体距离图之间存在差异的差异位置；在第一特征图中，将从差异位置提取的特征作为第一局部特征；在第二特征图中，将从差异位置提取的特征作为第二局部特征；根据第一局部特征和第二局部特征之间的差异，对学生检测模型进行训练。该方法根据第一物体距离图与第二物体距离图之间的差异位置，可在第一特征图中获取差异位置对应的第一局部特征，在第二特征图中获取差异位置对应的第二局部特征，根据第一局部特征与第二局部特征之间的差异对学生检测模型进行训练，可进一步提高学生检测模型对教师检测模型的检测信息的挖掘，提高了学生检测模型的检测精度，这样，简单的学生检测模型可以达到跟复杂的教师检测模型相似的检测精度，而减少了对计算资源的占用与部署成本，并且提高了推算速度。To sum up, the trained teacher detection model and the student detection model to be trained are obtained; the training image is input into the teacher detection model to obtain the first feature map extracted from the training image by the teacher detection model and the first object distance map predicted according to the first feature map; the training image is input into the student detection model to obtain the second feature map extracted from the training image by the student detection model and the second object distance map predicted according to the second feature map; the difference positions where the first object distance map output by the head network of the teacher detection model differs from the second object distance map output by the corresponding head network of the student detection model are determined; in the first feature map, the features extracted from the difference positions are used as the first local features; in the second feature map, the features extracted from the difference positions are used as the second local features; and the student detection model is trained according to the difference between the first local features and the second local features. According to the difference positions between the first object distance map and the second object distance map, this method can obtain the first local feature corresponding to a difference position in the first feature map and the second local feature corresponding to the difference position in the second feature map, and train the student detection model according to the difference between the first local feature and the second local feature. This further improves the student detection model's mining of the detection information of the teacher detection model and improves the detection accuracy of the student detection model, so that a simple student detection model can achieve detection accuracy similar to that of a complex teacher detection model while reducing the occupation of computing resources and deployment costs and improving inference speed.
为了提高学生检测模型的检测精度,如图4所示,图4是根据本公开第三实施例的示意图。在本公开实施例中,可根据第一局部特征和第二局部特征之间的差异,对学生检测模型进行训练,图4所示实施例可包括如下步骤:In order to improve the detection accuracy of the student detection model, as shown in FIG. 4 , FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure. In the embodiment of the present disclosure, the student detection model may be trained according to the difference between the first local feature and the second local feature, and the embodiment shown in FIG. 4 may include the following steps:
步骤401,获取经过训练的教师检测模型,以及待训练的学生检测模型。 Step 401, acquire the trained teacher detection model and the student detection model to be trained.
步骤402,将训练图像输入教师检测模型,得到教师检测模型对训练图像提取的第一特征图,以及根据第一特征图预测的第一物体距离图。Step 402: Input the training image into the teacher detection model to obtain the first feature map extracted from the training image by the teacher detection model and the first object distance map predicted according to the first feature map.
步骤403,将训练图像输入学生检测模型,得到学生检测模型对训练图像提取的第二特征图,以及根据第二特征图预测的第二物体距离图。Step 403: Input the training image into the student detection model to obtain a second feature map extracted from the training image by the student detection model and a second object distance map predicted according to the second feature map.
步骤404，根据第二物体距离图与第一物体距离图中的差异位置，在第一特征图中确定差异位置对应的第一局部特征，以及在第二特征图中确定差异位置对应的第二局部特征。 Step 404, according to the difference positions between the second object distance map and the first object distance map, determine the first local feature corresponding to a difference position in the first feature map, and determine the second local feature corresponding to the difference position in the second feature map.
步骤405,根据第一局部特征和第二局部特征之间的差异,确定损失函数的第一损失项。Step 405: Determine the first loss item of the loss function according to the difference between the first local feature and the second local feature.
在本公开实施例中,将第一局部特征与第二局部特征进行比对,可确定第一局部特征与第二局部特征之间的差异,将该差异作为损失函数的第一损失项。In the embodiment of the present disclosure, the first local feature is compared with the second local feature, and the difference between the first local feature and the second local feature can be determined, and the difference is used as the first loss item of the loss function.
步骤406,根据第一特征图和第二特征图之间的差异,确定损失函数的第二损失项。Step 406: Determine the second loss item of the loss function according to the difference between the first feature map and the second feature map.
可选地,将第一特征图与第二特征图进行比对,可确定第一特征图与第二特征图之间的特征差异,将该特征差异作为损失函数的第二损失项。Optionally, by comparing the first feature map with the second feature map, a feature difference between the first feature map and the second feature map can be determined, and the feature difference can be used as a second loss item of the loss function.
作为本公开实施例的一种可能实现方式，教师检测模型和学生检测模型可分别包括对应的多个特征提取层，将教师检测模型各特征提取层输出的第一特征图，分别与学生检测模型中对应的特征提取层输出的第二特征图确定特征差异，根据确定出的特征差异，确定损失函数的第二损失项。As a possible implementation of the embodiment of the present disclosure, the teacher detection model and the student detection model may each include a plurality of corresponding feature extraction layers; the first feature map output by each feature extraction layer of the teacher detection model is compared with the second feature map output by the corresponding feature extraction layer of the student detection model to determine a feature difference, and the second loss item of the loss function is determined according to the determined feature differences.
也就是说，教师检测模型和学生检测模型分别包括对应的多个特征提取层，教师检测模型可根据多个特征提取层提取特征，并输出第一特征图，学生检测模型可根据对应的多个特征提取层提取特征，并输出第二特征图，将教师检测模型输出的第一特征图与学生检测模型输出的第二特征图进行距离计算，可确定教师检测模型的多个特征提取层提取的特征与学生检测模型的对应的多个特征提取层提取的特征之间的特征差异，将该特征差异作为损失函数的第二损失项。That is to say, the teacher detection model and the student detection model each include a plurality of corresponding feature extraction layers. The teacher detection model extracts features with its feature extraction layers and outputs the first feature map, and the student detection model extracts features with its corresponding feature extraction layers and outputs the second feature map. By computing a distance between the first feature map output by the teacher detection model and the second feature map output by the student detection model, the feature difference between the features extracted by the feature extraction layers of the teacher detection model and the features extracted by the corresponding feature extraction layers of the student detection model can be determined, and this feature difference is used as the second loss item of the loss function.
比如，教师检测模型根据多个特征提取层提取的特征为T={t1,t2,t3,t4,t5}，学生检测模型根据对应的多个特征提取层提取的特征为S={s1,s2,s3,s4,s5}，通过教师检测模型输出的特征T与学生检测模型输出的特征S进行cos(余弦)距离计算，并通过cos距离判断教师检测模型输出的特征与学生检测模型输出的特征之间的相似度，算出相似损失函数进行优化，以拉近教师检测模型输出的特征与学生检测模型输出的特征之间的距离。其中，cos距离可通过如下公式进行定义：For example, the features extracted by the teacher detection model with its feature extraction layers are T = {t1, t2, t3, t4, t5}, and the features extracted by the student detection model with its corresponding feature extraction layers are S = {s1, s2, s3, s4, s5}. The cos (cosine) distance between the feature T output by the teacher detection model and the feature S output by the student detection model is computed, the similarity between the two sets of features is judged by the cos distance, and a similarity loss function is computed and optimized so as to pull the features output by the teacher detection model and the features output by the student detection model closer together. The cos distance can be defined by the following formula:
D_i = (t_i · s_i) / (‖t_i‖ × ‖s_i‖)
此外，在教师检测模型输出的特征与学生检测模型输出的特征越相似时，cos距离就越大，因此，相似损失函数可为S_i=1-D_i，在本公开实施例中，可将相似损失函数作为损失函数的第二损失项。In addition, the more similar the features output by the teacher detection model and the features output by the student detection model are, the larger the cos distance D_i is; therefore, the similarity loss function may be S_i = 1 - D_i. In the embodiment of the present disclosure, this similarity loss function can be used as the second loss item of the loss function.
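The per-layer similarity loss S_i = 1 - D_i described above can be sketched as follows. Flattening each feature map before the dot product is an assumption, since the disclosure does not specify how the cos distance is computed over the spatial dimensions:

```python
import numpy as np

def similarity_loss(teacher_feats, student_feats):
    """For each pair of corresponding feature-extraction-layer outputs,
    compute the cosine distance D_i of the flattened features and return
    the per-layer similarity loss S_i = 1 - D_i."""
    losses = []
    for t, s in zip(teacher_feats, student_feats):
        t, s = t.ravel(), s.ravel()
        d = float(np.dot(t, s) / (np.linalg.norm(t) * np.linalg.norm(s)))
        losses.append(1.0 - d)
    return losses

# T = {t1, ..., t5} and S = {s1, ..., s5}; identical features give D_i = 1,
# so every similarity loss term is 0.
t_layers = [np.ones((2, 2)) for _ in range(5)]
s_layers = [np.ones((2, 2)) for _ in range(5)]
loss_items = similarity_loss(t_layers, s_layers)
```

Minimizing each S_i pulls the student layer's output toward the teacher layer's output, which is exactly the "shortening the distance" behavior the similarity loss is meant to produce.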
步骤407,根据损失函数的各损失项,对学生检测模型进行训练。 Step 407, train the student detection model according to each loss item of the loss function.
进一步地,可根据损失函数的第一损失项和第二损失项对学生检测模型进行训练。Further, the student detection model can be trained according to the first loss item and the second loss item of the loss function.
举例而言，如图5所示，将训练图片分别输入教师网络(教师检测模型)和学生网络(学生检测模型)，教师网络可输出教师特征(第一特征图)，学生网络可输出学生特征(第二特征图)，教师特征经过教师检测模型中的头部网络(如3D head)输出第一物体距离图，学生特征经过学生检测模型中的头部网络输出第二物体距离图，将第一物体距离图与第二物体距离图进行距离度量，获取第一物体距离图与第二物体距离图之间的差异位置，确定教师特征中该差异位置对应的第一局部特征，确定学生特征中该差异位置对应的第二局部特征，将第一局部特征与第二局部特征之间的差异作为损失函数的第一损失项，将教师特征和学生特征之间的差异作为损失函数的第二损失项，根据损失函数的第一损失项和第二损失项对学生网络进行训练。For example, as shown in FIG. 5, the training picture is input into the teacher network (teacher detection model) and the student network (student detection model) respectively; the teacher network outputs the teacher feature (the first feature map) and the student network outputs the student feature (the second feature map). The teacher feature is passed through the head network (such as a 3D head) of the teacher detection model to output the first object distance map, and the student feature is passed through the head network of the student detection model to output the second object distance map. A distance metric is computed between the first object distance map and the second object distance map to obtain the difference positions between them; the first local feature corresponding to a difference position is determined in the teacher feature and the second local feature corresponding to the difference position is determined in the student feature; the difference between the first local feature and the second local feature is used as the first loss item of the loss function, the difference between the teacher feature and the student feature is used as the second loss item of the loss function, and the student network is trained according to the first loss item and the second loss item of the loss function.
在本公开实施例中，步骤401-404可以分别采用本公开的各实施例中的任一种方式实现，本公开实施例并不对此作出限定，也不再赘述。In the embodiments of the present disclosure, steps 401-404 may each be implemented in any one of the manners described in the embodiments of the present disclosure, which is not limited in the embodiments of the present disclosure, and will not be repeated here.
综上，通过获取经过训练的教师检测模型，以及待训练的学生检测模型；将训练图像输入教师检测模型，得到教师检测模型对训练图像提取的第一特征图，以及根据第一特征图预测的第一物体距离图；将训练图像输入学生检测模型，得到学生检测模型对训练图像提取的第二特征图，以及根据第二特征图预测的第二物体距离图；根据第二物体距离图与第一物体距离图中的差异位置，在第一特征图中确定差异位置对应的第一局部特征，以及在第二特征图中确定所述差异位置对应的第二局部特征；根据第一局部特征和第二局部特征之间的差异，确定损失函数的第一损失项；根据第一特征图和第二特征图之间的差异，确定损失函数的第二损失项；根据损失函数的各损失项，对学生检测模型进行训练。该方法根据教师检测模型和学生检测模型输出的特征图对应的距离图的差异位置，将该差异位置在对应特征图中的特征之间的差异，作为损失函数的第一损失项，以及特征图之间的差异作为损失函数第二损失项，根据第一损失项和第二损失项对学生检测模型进行训练，可提高学生检测模型的检测精度，这样，简单的学生检测模型可以达到跟复杂的教师检测模型相似的检测精度，而减少了对计算资源的占用与部署成本，并且提高了推算速度。To sum up, the trained teacher detection model and the student detection model to be trained are obtained; the training image is input into the teacher detection model to obtain the first feature map extracted from the training image by the teacher detection model and the first object distance map predicted according to the first feature map; the training image is input into the student detection model to obtain the second feature map extracted from the training image by the student detection model and the second object distance map predicted according to the second feature map; according to the difference positions between the second object distance map and the first object distance map, the first local feature corresponding to a difference position is determined in the first feature map and the second local feature corresponding to the difference position is determined in the second feature map; the first loss item of the loss function is determined according to the difference between the first local feature and the second local feature; the second loss item of the loss function is determined according to the difference between the first feature map and the second feature map; and the student detection model is trained according to each loss item of the loss function. In this method, the difference between the features at the difference positions of the distance maps corresponding to the feature maps output by the teacher detection model and the student detection model is used as the first loss item of the loss function, and the difference between the feature maps is used as the second loss item; training the student detection model according to the first loss item and the second loss item can improve the detection accuracy of the student detection model, so that a simple student detection model can achieve detection accuracy similar to that of a complex teacher detection model while reducing the occupation of computing resources and deployment costs and improving inference speed.
为了进一步提高学生检测模型的检测精度,如图6所示,图6是根据本公开第四实施例的示意图。在本公开实施例中,对学生检测模型进行训练的损失函数还可包括第三损失项,图6所示实施例可包括如下步骤:In order to further improve the detection accuracy of the student detection model, as shown in FIG. 6 , FIG. 6 is a schematic diagram according to a fourth embodiment of the present disclosure. In the embodiment of the present disclosure, the loss function for training the student detection model may also include a third loss item, and the embodiment shown in FIG. 6 may include the following steps:
步骤601,获取经过训练的教师检测模型,以及待训练的学生检测模型。 Step 601, acquire the trained teacher detection model and the student detection model to be trained.
步骤602,将训练图像输入教师检测模型,得到教师检测模型对训练图像提取的第一特征图,以及根据第一特征图预测的第一物体距离图。Step 602: Input the training image into the teacher detection model to obtain the first feature map extracted from the training image by the teacher detection model and the first object distance map predicted according to the first feature map.
步骤603,将训练图像输入学生检测模型,得到学生检测模型对训练图像提取的第二特征图,以及根据第二特征图预测的第二物体距离图。Step 603: Input the training image into the student detection model to obtain the second feature map extracted from the training image by the student detection model and the second object distance map predicted according to the second feature map.
步骤604，根据第二物体距离图与第一物体距离图中的差异位置，在第一特征图中确定差异位置对应的第一局部特征，以及在第二特征图中确定差异位置对应的第二局部特征。 Step 604, according to the difference positions between the second object distance map and the first object distance map, determine the first local feature corresponding to a difference position in the first feature map, and determine the second local feature corresponding to the difference position in the second feature map.
步骤605,根据第一局部特征和第二局部特征之间的差异,确定损失函数的第一损失项。Step 605: Determine the first loss item of the loss function according to the difference between the first local feature and the second local feature.
步骤606,根据第一特征图和第二特征图之间的差异,确定损失函数的第二损失项。Step 606: Determine the second loss item of the loss function according to the difference between the first feature map and the second feature map.
步骤607,获取训练样本的标注。 Step 607, obtaining labels of training samples.
在本公开实施例中,可预先在训练样本上进行物体位置或者物体尺寸的标注。In the embodiment of the present disclosure, the object position or object size may be marked on the training samples in advance.
步骤608,根据训练样本标注的物体位置与学生检测模型预测的物体位置之间的差异,和/或根据训练样本标注的物体尺寸与学生检测模型预测的物体尺寸之间的差异,确定第三损失项。 Step 608, according to the difference between the object position marked by the training sample and the object position predicted by the student detection model, and/or according to the difference between the object size marked by the training sample and the object size predicted by the student detection model, determine the third loss item.
作为一种示例，可将训练样本标注的物体位置与学生检测模型预测的物体位置进行比对，确定训练样本标注的物体位置与学生检测模型预测的物体位置之间的差异，作为损失函数的第三损失项，对学生检测模型进行训练。As an example, the object position labelled in the training sample may be compared with the object position predicted by the student detection model, the difference between the labelled object position and the predicted object position is determined and used as the third loss item of the loss function, and the student detection model is trained accordingly.
作为另一种示例，将训练样本标注的物体尺寸与学生检测模型预测的物体尺寸进行比对，确定训练样本标注的物体尺寸与学生检测模型预测的物体尺寸之间的差异，作为损失函数的第三损失项，对学生检测模型进行训练。As another example, the object size labelled in the training sample is compared with the object size predicted by the student detection model, the difference between the labelled object size and the predicted object size is determined and used as the third loss item of the loss function, and the student detection model is trained accordingly.
作为另一种示例，可将训练样本标注的物体位置与学生检测模型预测的物体位置进行比对，确定训练样本标注的物体位置与学生检测模型预测的物体位置之间的差异，将训练样本标注的物体尺寸与学生检测模型预测的物体尺寸进行比对，确定训练样本标注的物体尺寸与学生检测模型预测的物体尺寸之间的差异，将物体位置之间的差异和物体尺寸之间的差异，作为损失函数的第三损失项，对学生检测模型进行训练。As another example, the object position labelled in the training sample may be compared with the object position predicted by the student detection model to determine the difference between them, and the object size labelled in the training sample may be compared with the object size predicted by the student detection model to determine the difference between them; the position difference and the size difference are together used as the third loss item of the loss function, and the student detection model is trained accordingly.
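The third loss item described in the three examples above can be sketched as a difference between the labelled and predicted object positions and/or sizes. The use of a mean absolute (L1) difference is an assumption for illustration; the disclosure only speaks of "the difference":

```python
import numpy as np

def third_loss_item(pred_pos, gt_pos, pred_size=None, gt_size=None):
    """Mean absolute difference between predicted and labelled object
    positions, optionally plus the same term for object sizes
    (illustrative sketch; L1 is an assumption)."""
    loss = float(np.abs(np.asarray(pred_pos) - np.asarray(gt_pos)).mean())
    if pred_size is not None and gt_size is not None:
        loss += float(np.abs(np.asarray(pred_size) - np.asarray(gt_size)).mean())
    return loss

pred_pos = np.array([1.0, 2.0, 3.0])   # predicted 3D object position
gt_pos   = np.array([1.0, 2.0, 4.0])   # labelled 3D object position
pred_sz  = np.array([2.0, 2.0, 2.0])   # predicted length/width/height
gt_sz    = np.array([2.0, 2.0, 5.0])   # labelled length/width/height
loss = third_loss_item(pred_pos, gt_pos, pred_sz, gt_sz)   # 1/3 + 1 = 4/3
```

Passing only the positions (or only the sizes) covers the "and/or" variants of step 608, while passing both covers the combined variant.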
步骤609,根据损失函数的各损失项,对学生检测模型进行训练。 Step 609, train the student detection model according to each loss item of the loss function.
进一步地,可根据损失函数的第一损失项、第二损失项和第三损失项对学生检测模型进行训练。Further, the student detection model can be trained according to the first loss item, the second loss item and the third loss item of the loss function.
在本公开实施例中,步骤601-606可以分别采用本公开的各实施例中的任一种方式实现,本公开实施例并不对此作出限定,也不再赘述。In the embodiment of the present disclosure, steps 601-606 may be implemented in any one of the embodiments of the present disclosure, which is not limited in the embodiment of the present disclosure, and will not be repeated here.
综上，通过获取经过训练的教师检测模型，以及待训练的学生检测模型；将训练图像输入教师检测模型，得到教师检测模型对训练图像提取的第一特征图，以及根据第一特征图预测的第一物体距离图；将训练图像输入学生检测模型，得到学生检测模型对训练图像提取的第二特征图，以及根据第二特征图预测的第二物体距离图；根据第二物体距离图与第一物体距离图中的差异位置，在第一特征图中确定差异位置对应的第一局部特征，以及在第二特征图中确定所述差异位置对应的第二局部特征；根据第一局部特征和第二局部特征之间的差异，确定损失函数的第一损失项；根据第一特征图和第二特征图之间的差异，确定损失函数的第二损失项；获取训练样本的标注；根据训练样本标注的物体位置与学生检测模型预测的物体位置之间的差异，和/或根据训练样本标注的物体尺寸与学生检测模型预测的物体尺寸之间的差异，确定第三损失项；根据损失函数的各损失项，对学生检测模型进行训练。该方法根据教师检测模型和学生检测模型输出的特征图对应的距离图的差异位置，将该差异位置在对应特征图中的特征之间的差异，作为损失函数的第一损失项，以及特征图之间的差异作为损失函数第二损失项，根据训练样本标注的物体位置与学生检测模型预测的物体位置之间的差异，和/或根据训练样本标注的物体尺寸与学生检测模型预测的物体尺寸之间的差异作为损失函数的第三损失项，可提高学生检测模型的检测精度，这样，简单的学生检测模型可以达到跟复杂的教师检测模型相似的检测精度，而减少了对计算资源的占用与部署成本，并且提高了推算速度。To sum up, the trained teacher detection model and the student detection model to be trained are obtained; the training image is input into the teacher detection model to obtain the first feature map extracted from the training image by the teacher detection model and the first object distance map predicted according to the first feature map; the training image is input into the student detection model to obtain the second feature map extracted from the training image by the student detection model and the second object distance map predicted according to the second feature map; according to the difference positions between the second object distance map and the first object distance map, the first local feature corresponding to a difference position is determined in the first feature map and the second local feature corresponding to the difference position is determined in the second feature map; the first loss item of the loss function is determined according to the difference between the first local feature and the second local feature; the second loss item of the loss function is determined according to the difference between the first feature map and the second feature map; the labels of the training samples are obtained; the third loss item is determined according to the difference between the object position labelled in the training sample and the object position predicted by the student detection model, and/or according to the difference between the object size labelled in the training sample and the object size predicted by the student detection model; and the student detection model is trained according to each loss item of the loss function. In this method, the difference between the features at the difference positions of the distance maps corresponding to the feature maps output by the teacher detection model and the student detection model is used as the first loss item of the loss function, the difference between the feature maps is used as the second loss item, and the difference between the labelled and predicted object positions and/or the difference between the labelled and predicted object sizes is used as the third loss item, which can improve the detection accuracy of the student detection model, so that a simple student detection model can achieve detection accuracy similar to that of a complex teacher detection model while reducing the occupation of computing resources and deployment costs and improving inference speed.
本公开实施例的物体检测模型的训练方法，通过获取经过训练的教师检测模型，以及待训练的学生检测模型；将训练图像输入教师检测模型，得到教师检测模型对训练图像提取的第一特征图，以及根据第一特征图预测的第一物体距离图；将训练图像输入学生检测模型，得到学生检测模型对训练图像提取的第二特征图，以及根据第二特征图预测的第二物体距离图；根据第二物体距离图与第一物体距离图中的差异位置，在第一特征图中确定差异位置对应的第一局部特征，以及在第二特征图中确定差异位置对应的第二局部特征；根据第一局部特征和第二局部特征之间的差异，对学生检测模型进行训练，该方法根据教师检测模型和学生检测模型输出的特征图对应的距离图的差异位置，该差异位置在对应特征图中的特征之间的差异，对学生检测模型进行训练，可进一步提高学生检测模型对教师检测模型的检测信息的挖掘，提高了学生检测模型的检测精度，这样，简单的学生检测模型可以达到跟复杂的教师检测模型相似的检测精度，而减少了对计算资源的占用，减少部署成本，并且提高了推算速度。In the training method for an object detection model of the embodiments of the present disclosure, the trained teacher detection model and the student detection model to be trained are obtained; the training image is input into the teacher detection model to obtain the first feature map extracted from the training image by the teacher detection model and the first object distance map predicted according to the first feature map; the training image is input into the student detection model to obtain the second feature map extracted from the training image by the student detection model and the second object distance map predicted according to the second feature map; according to the difference positions between the second object distance map and the first object distance map, the first local feature corresponding to a difference position is determined in the first feature map and the second local feature corresponding to the difference position is determined in the second feature map; and the student detection model is trained according to the difference between the first local feature and the second local feature. Training the student detection model according to the difference positions of the distance maps corresponding to the feature maps output by the teacher detection model and the student detection model, and the difference between the features at those positions in the corresponding feature maps, further improves the student detection model's mining of the detection information of the teacher detection model and improves the detection accuracy of the student detection model, so that a simple student detection model can achieve detection accuracy similar to that of a complex teacher detection model while reducing the occupation of computing resources, reducing deployment costs, and improving inference speed.
图7是根据本公开第五实施例的示意图,在本公开实施例中,可将经过训练的学生检测模型用于图像检测,基于此,本公开提出一种图像检测方法。本公开实施例的图像检测方法可应用于本公开实施例的图像检测装置,该装置可被配置于电子设备中。其中,该电子设备可以是移动终端,例如,手机、平板电脑、个人数字助理等具有各种操作系统的硬件设备。如图7所示,该图像检测方法包括:FIG. 7 is a schematic diagram according to a fifth embodiment of the present disclosure. In the embodiment of the present disclosure, a trained student detection model can be used for image detection. Based on this, the present disclosure proposes an image detection method. The image detection method of the embodiment of the present disclosure can be applied to the image detection device of the embodiment of the present disclosure, and the device can be configured in an electronic device. Wherein, the electronic device may be a mobile terminal, for example, a mobile phone, a tablet computer, a personal digital assistant, and other hardware devices with various operating systems. As shown in Figure 7, the image detection method includes:
步骤701,获取单目图像。 Step 701, acquire a monocular image.
在本公开实施例中,可通过图像采集设备获取单目图像。In the embodiment of the present disclosure, a monocular image may be acquired by an image acquisition device.
步骤702,采用经过训练的学生检测模型对单目图像进行图像检测,以得到单目图像中物体的物体信息;其中,学生检测模型,是采用图1至图6所述的训练方法训练得到。 Step 702, using the trained student detection model to perform image detection on the monocular image to obtain the object information of the object in the monocular image; wherein, the student detection model is obtained by training using the training methods described in FIGS. 1 to 6 .
可选地，将单目图像输入经过训练的学生检测模型，经过训练的学生检测模型可输出单目图像中物体的物体信息，比如，物体的3D位置信息、物体的长宽高以及物体的朝向角共七个自由度的信息。其中，需要说明的是，学生检测模型，是采用图1至图6所述的训练方法训练得到。Optionally, the monocular image is input into the trained student detection model, and the trained student detection model outputs the object information of the object in the monocular image, for example, the 3D position information of the object, the length, width, and height of the object, and the orientation angle of the object, i.e., seven degrees of freedom in total. It should be noted that the student detection model is trained using the training methods described in FIG. 1 to FIG. 6.
本公开实施例的图像检测方法，通过获取单目图像；采用经过训练的学生检测模型对单目图像进行图像检测，以得到单目图像中物体的物体信息，其中，学生检测模型，是采用图1至图6所述的训练方法训练得到。由此，采用经过训练的学生检测模型对单目图像进行检测，可提高图像的检测精度。In the image detection method of the embodiments of the present disclosure, a monocular image is acquired, and image detection is performed on the monocular image with the trained student detection model to obtain the object information of the object in the monocular image, wherein the student detection model is trained using the training methods described in FIG. 1 to FIG. 6. Therefore, detecting the monocular image with the trained student detection model can improve the detection accuracy of the image.
为了实现上述图1至图6实施例,本公开实施例还提出一种物体检测模型的训练装置。In order to realize the above-mentioned embodiments in FIG. 1 to FIG. 6 , an embodiment of the present disclosure further proposes an object detection model training device.
图8是根据本公开第六实施例的示意图，如图8所示，该物体检测模型的训练装置800包括：第一获取模块810、第一处理模块820、第二处理模块830、第一确定模块840、训练模块850。FIG. 8 is a schematic diagram according to a sixth embodiment of the present disclosure. As shown in FIG. 8, the training device 800 for an object detection model includes: a first acquisition module 810, a first processing module 820, a second processing module 830, a first determination module 840, and a training module 850.
其中，第一获取模块810，用于获取经过训练的教师检测模型，以及待训练的学生检测模型；第一处理模块820，用于将训练图像输入所述教师检测模型，得到教师检测模型对训练图像提取的第一特征图，以及根据第一特征图预测的第一物体距离图；第二处理模块830，用于将训练图像输入学生检测模型，得到学生检测模型对训练图像提取的第二特征图，以及根据第二特征图预测的第二物体距离图；第一确定模块840，用于根据第二物体距离图与第一物体距离图中的差异位置，在第一特征图中确定差异位置对应的第一局部特征，以及在第二特征图中确定差异位置对应的第二局部特征；训练模块850，用于根据第一局部特征和第二局部特征之间的差异，对学生检测模型进行训练。Among them, the first acquisition module 810 is configured to obtain the trained teacher detection model and the student detection model to be trained; the first processing module 820 is configured to input the training image into the teacher detection model to obtain the first feature map extracted from the training image by the teacher detection model and the first object distance map predicted according to the first feature map; the second processing module 830 is configured to input the training image into the student detection model to obtain the second feature map extracted from the training image by the student detection model and the second object distance map predicted according to the second feature map; the first determination module 840 is configured to determine, according to the difference positions between the second object distance map and the first object distance map, the first local feature corresponding to a difference position in the first feature map and the second local feature corresponding to the difference position in the second feature map; the training module 850 is configured to train the student detection model according to the difference between the first local feature and the second local feature.
As a possible implementation of an embodiment of the present disclosure, the first determination module 840 is configured to: determine a difference position at which the first object distance map output by a head network of the teacher detection model differs from the second object distance map output by the corresponding head network of the student detection model; in the first feature map, take the feature extracted at the difference position as the first local feature; and in the second feature map, take the feature extracted at the difference position as the second local feature.
As a possible implementation of an embodiment of the present disclosure, the first determination module 840 is further configured to: compare, at each position, the distance value in the first object distance map output by the head network of the teacher detection model with the distance value at the same position in the second object distance map output by the corresponding head network of the student detection model; and take a position where the difference between the distance values is greater than a threshold as the difference position.
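The thresholded comparison above can be sketched as a single mask computation; the function name and NumPy representation are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def difference_positions(teacher_dist, student_dist, threshold):
    """Boolean mask of positions where the teacher's and student's
    predicted object distance maps disagree by more than `threshold`."""
    teacher_dist = np.asarray(teacher_dist, dtype=float)
    student_dist = np.asarray(student_dist, dtype=float)
    return np.abs(teacher_dist - student_dist) > threshold
```

For example, with a threshold of 0.2, only the positions whose distance values differ by more than 0.2 are marked as difference positions.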
As a possible implementation of an embodiment of the present disclosure, the training module 850 is configured to: determine a first loss term of a loss function according to the difference between the first local feature and the second local feature; determine a second loss term of the loss function according to the difference between the first feature map and the second feature map; and train the student detection model according to the loss terms of the loss function.
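A minimal sketch of the two distillation loss terms follows. The mean-squared-error form and the function name are illustrative assumptions; the disclosure only specifies that the first term is computed from the local features at the difference positions and the second from the full feature maps.

```python
import numpy as np

def distillation_losses(t_feat, s_feat, diff_mask):
    """Return (first loss term, second loss term):
    - first: over the local features gathered at the difference positions,
    - second: over the full feature maps."""
    if diff_mask.any():
        first = float(np.mean((t_feat[..., diff_mask] - s_feat[..., diff_mask]) ** 2))
    else:
        first = 0.0  # no difference positions: the local term vanishes
    second = float(np.mean((t_feat - s_feat) ** 2))
    return first, second
```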
As a possible implementation of an embodiment of the present disclosure, the loss function further includes a third loss term, and the object detection model training apparatus 800 further includes a second acquisition module and a second determination module.
The second acquisition module is configured to acquire annotations of the training samples. The second determination module is configured to determine the third loss term according to the difference between the object position annotated for a training sample and the object position predicted by the student detection model, and/or according to the difference between the object size annotated for the training sample and the object size predicted by the student detection model.
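The third loss term can be sketched as a deviation between annotated and predicted position/size; the L1 form and the function name are illustrative assumptions (the disclosure does not mandate a particular distance).

```python
def third_loss_term(gt_position, pred_position, gt_size, pred_size):
    """Supervised regression term: deviation of the student model's
    predicted object position and size from the sample annotations."""
    position_term = sum(abs(g - p) for g, p in zip(gt_position, pred_position))
    size_term = sum(abs(g - p) for g, p in zip(gt_size, pred_size))
    return position_term + size_term
```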
As a possible implementation of an embodiment of the present disclosure, the teacher detection model and the student detection model each include a plurality of corresponding feature extraction layers, and the training module 850 is further configured to: determine a feature difference between the first feature map output by each feature extraction layer of the teacher detection model and the second feature map output by the corresponding feature extraction layer of the student detection model; and determine the second loss term of the loss function according to the determined feature differences.
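Accumulating the per-layer feature differences into the second loss term can be sketched as below; the per-layer MSE and the summation over layers are illustrative assumptions.

```python
import numpy as np

def layerwise_second_loss(teacher_maps, student_maps):
    """Second loss term accumulated over corresponding feature extraction
    layers: one feature-difference term per (teacher, student) layer pair."""
    return float(sum(np.mean((t - s) ** 2) for t, s in zip(teacher_maps, student_maps)))
```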
With the object detection model training apparatus of the embodiments of the present disclosure, a trained teacher detection model and a student detection model to be trained are acquired; a training image is input into the teacher detection model to obtain a first feature map extracted by the teacher detection model from the training image and a first object distance map predicted from the first feature map; the training image is input into the student detection model to obtain a second feature map extracted by the student detection model from the training image and a second object distance map predicted from the second feature map; according to a difference position between the second object distance map and the first object distance map, a first local feature corresponding to the difference position is determined in the first feature map and a second local feature corresponding to the difference position is determined in the second feature map; and the student detection model is trained according to the difference between the first local feature and the second local feature. The apparatus can thus train the student detection model using the difference between the features, in the corresponding feature maps, at the positions where the distance maps predicted from the teacher's and student's feature maps differ. This deepens the student detection model's mining of the teacher detection model's detection information and improves the detection accuracy of the student detection model, so that a simple student detection model can achieve detection accuracy similar to that of a complex teacher detection model, while reducing the occupation of computing resources and deployment costs and increasing inference speed.
In order to implement the embodiment described in FIG. 7, an embodiment of the present disclosure further provides an image detection apparatus.
FIG. 9 is a schematic diagram according to a seventh embodiment of the present disclosure. As shown in FIG. 9, the image detection apparatus 900 includes an acquisition module 910 and a detection module 920.
The acquisition module 910 is configured to acquire a monocular image. The detection module 920 is configured to perform image detection on the monocular image using a trained student detection model to obtain object information of objects in the monocular image, where the student detection model is trained by the training apparatus described in FIG. 8.
With the image detection apparatus of the embodiments of the present disclosure, a monocular image is acquired, and image detection is performed on the monocular image using a trained student detection model to obtain object information of objects in the monocular image, where the student detection model is trained by the training apparatus described in FIG. 8. Detecting the monocular image with the trained student detection model can thus improve the detection accuracy of the image.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
FIG. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 10, the device 1000 includes a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a ROM (Read-Only Memory) 1002 or a computer program loaded from a storage unit 1008 into a RAM (Random Access Memory) 1003. Various programs and data required for the operation of the device 1000 can also be stored in the RAM 1003. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other through a bus 1004. An I/O (Input/Output) interface 1005 is also connected to the bus 1004.
Multiple components in the device 1000 are connected to the I/O interface 1005, including: an input unit 1006, such as a keyboard or a mouse; an output unit 1007, such as various types of displays and speakers; a storage unit 1008, such as a magnetic disk or an optical disc; and a communication unit 1009, such as a network card, a modem, or a wireless communication transceiver. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1001 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any appropriate processor, controller, microcontroller, or the like. The computing unit 1001 executes the methods and processes described above, such as the object detection model training method or the image detection method. For example, in some embodiments, the object detection model training method or the image detection method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the object detection model training method described above can be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured in any other appropriate way (for example, by means of firmware) to execute the object detection model training method or the image detection method.
Various implementations of the systems and techniques described herein above may be realized in digital electronic circuit systems, integrated circuit systems, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application-Specific Standard Products), SOCs (Systems on Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an EPROM (Erasable Programmable Read-Only Memory) or flash memory, an optical fiber, a CD-ROM (Compact Disc Read-Only Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a CRT (Cathode-Ray Tube) or LCD (Liquid Crystal Display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, as a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, and blockchain networks.
A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the drawbacks of difficult management and weak business scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be noted that artificial intelligence is the discipline of making computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing. Artificial intelligence software technologies mainly include several major directions such as computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning, big data processing technology, and knowledge graph technology.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The specific implementations described above do not limit the protection scope of the present disclosure. It should be apparent to those skilled in the art that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (17)

  1. A method for training an object detection model, comprising:
    acquiring a trained teacher detection model and a student detection model to be trained;
    inputting a training image into the teacher detection model to obtain a first feature map extracted by the teacher detection model from the training image, and a first object distance map predicted from the first feature map;
    inputting the training image into the student detection model to obtain a second feature map extracted by the student detection model from the training image, and a second object distance map predicted from the second feature map;
    determining, according to a difference position between the second object distance map and the first object distance map, a first local feature corresponding to the difference position in the first feature map, and a second local feature corresponding to the difference position in the second feature map; and
    training the student detection model according to the difference between the first local feature and the second local feature.
  2. The training method according to claim 1, wherein determining, according to the difference position between the second object distance map and the first object distance map, the first local feature corresponding to the difference position in the first feature map and the second local feature corresponding to the difference position in the second feature map comprises:
    determining a difference position at which the first object distance map output by a head network of the teacher detection model differs from the second object distance map output by the corresponding head network of the student detection model;
    in the first feature map, taking the feature extracted at the difference position as the first local feature; and
    in the second feature map, taking the feature extracted at the difference position as the second local feature.
  3. The training method according to claim 2, wherein determining the difference position at which the first object distance map output by the head network of the teacher detection model differs from the second object distance map output by the corresponding head network of the student detection model comprises:
    comparing, at each position, the distance value in the first object distance map output by the head network of the teacher detection model with the distance value at the same position in the second object distance map output by the corresponding head network of the student detection model; and
    taking a position where the difference between the distance values is greater than a threshold as the difference position.
  4. The training method according to claim 1, wherein training the student detection model according to the difference between the first local feature and the second local feature comprises:
    determining a first loss term of a loss function according to the difference between the first local feature and the second local feature;
    determining a second loss term of the loss function according to the difference between the first feature map and the second feature map; and
    training the student detection model according to the loss terms of the loss function.
  5. The training method according to claim 4, wherein the loss function further comprises a third loss term, and the method further comprises:
    acquiring annotations of the training samples; and
    determining the third loss term according to the difference between the object position annotated for the training sample and the object position predicted by the student detection model, and/or according to the difference between the object size annotated for the training sample and the object size predicted by the student detection model.
  6. The training method according to claim 4, wherein the teacher detection model and the student detection model each comprise a plurality of corresponding feature extraction layers, and determining the second loss term of the loss function according to the difference between the first feature map and the second feature map comprises:
    determining a feature difference between the first feature map output by each feature extraction layer of the teacher detection model and the second feature map output by the corresponding feature extraction layer of the student detection model; and
    determining the second loss term of the loss function according to the determined feature differences.
  7. An image detection method, comprising:
    acquiring a monocular image; and
    performing image detection on the monocular image using a trained student detection model to obtain object information of objects in the monocular image, wherein the student detection model is trained by the training method according to any one of claims 1-6.
  8. An apparatus for training an object detection model, comprising:
    a first acquisition module configured to acquire a trained teacher detection model and a student detection model to be trained;
    a first processing module configured to input a training image into the teacher detection model to obtain a first feature map extracted by the teacher detection model from the training image, and a first object distance map predicted from the first feature map;
    a second processing module configured to input the training image into the student detection model to obtain a second feature map extracted by the student detection model from the training image, and a second object distance map predicted from the second feature map;
    a first determination module configured to determine, according to a difference position between the second object distance map and the first object distance map, a first local feature corresponding to the difference position in the first feature map, and a second local feature corresponding to the difference position in the second feature map; and
    a training module configured to train the student detection model according to the difference between the first local feature and the second local feature.
  9. The apparatus according to claim 8, wherein the first determination module is configured to:
    determine a difference position at which the first object distance map output by a head network of the teacher detection model differs from the second object distance map output by the corresponding head network of the student detection model;
    in the first feature map, take the feature extracted at the difference position as the first local feature; and
    in the second feature map, take the feature extracted at the difference position as the second local feature.
  10. The apparatus according to claim 9, wherein the first determination module is further configured to:
    compare, at each position, the distance value in the first object distance map output by the head network of the teacher detection model with the distance value at the same position in the second object distance map output by the corresponding head network of the student detection model; and
    take a position where the difference between the distance values is greater than a threshold as the difference position.
  11. The apparatus according to claim 8, wherein the training module is configured to:
    determine a first loss term of a loss function according to the difference between the first local feature and the second local feature;
    determine a second loss term of the loss function according to the difference between the first feature map and the second feature map; and
    train the student detection model according to the loss terms of the loss function.
  12. The apparatus according to claim 11, wherein the loss function further comprises a third loss term, and the apparatus further comprises:
    a second acquisition module configured to acquire annotations of the training samples; and
    a second determination module configured to determine the third loss term according to the difference between the object position annotated for the training sample and the object position predicted by the student detection model, and/or according to the difference between the object size annotated for the training sample and the object size predicted by the student detection model.
  13. The apparatus according to claim 11, wherein the teacher detection model and the student detection model each comprise a plurality of corresponding feature extraction layers, and the training module is further configured to:
    determine feature differences between the first feature map output by each feature extraction layer of the teacher detection model and the second feature map output by the corresponding feature extraction layer of the student detection model; and
    determine the second loss term of the loss function according to the determined feature differences.
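Claim 13's per-layer variant of the second loss term can be sketched as an aggregation over corresponding feature extraction layers. The per-layer MSE and the plain sum are assumed choices; the claim only requires that the term be determined from the per-layer feature differences.

```python
import numpy as np

def second_loss_term(teacher_fmaps, student_fmaps):
    """Aggregate per-layer differences between corresponding teacher and
    student feature maps into one loss term (claim 13 reading)."""
    assert len(teacher_fmaps) == len(student_fmaps)
    return sum(float(np.mean((t - s) ** 2))
               for t, s in zip(teacher_fmaps, student_fmaps))

# Two hypothetical layers: the first disagrees, the second matches.
t_maps = [np.ones((2, 2)), np.zeros((2, 2))]
s_maps = [np.zeros((2, 2)), np.zeros((2, 2))]
loss2 = second_loss_term(t_maps, s_maps)
```

Summing rather than averaging over layers weights every extraction stage equally; a weighted sum would be an equally valid reading of the claim.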
  14. An image detection apparatus, comprising:
    an acquisition module configured to acquire a monocular image; and
    a detection module configured to perform image detection on the monocular image using a trained student detection model to obtain object information of objects in the monocular image, wherein the student detection model is trained by the training apparatus according to any one of claims 8-13.
  15. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-6, or to perform the method according to claim 7.
  16. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are configured to cause a computer to perform the method according to any one of claims 1-6, or to perform the method according to claim 7.
  17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6, or performs the method according to claim 7.
PCT/CN2022/088005 2021-06-10 2022-04-20 Training method and apparatus for object detection model, and image detection method and apparatus WO2022257614A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023515610A JP2023539934A (en) 2021-06-10 2022-04-20 Object detection model training method, image detection method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110649762.4A CN113378712B (en) 2021-06-10 2021-06-10 Training method and apparatus for object detection model, and image detection method and apparatus
CN202110649762.4 2021-06-10

Publications (1)

Publication Number Publication Date
WO2022257614A1 true WO2022257614A1 (en) 2022-12-15

Family

ID=77573820

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088005 WO2022257614A1 (en) 2021-06-10 2022-04-20 Training method and apparatus for object detection model, and image detection method and apparatus

Country Status (3)

Country Link
JP (1) JP2023539934A (en)
CN (1) CN113378712B (en)
WO (1) WO2022257614A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378712B (en) * 2021-06-10 2023-07-04 北京百度网讯科技有限公司 Training method of object detection model, image detection method and device thereof
CN113806387A (en) * 2021-09-17 2021-12-17 北京百度网讯科技有限公司 Model training method, high-precision map change detection method and device and electronic equipment
CN113920307A (en) * 2021-09-29 2022-01-11 北京百度网讯科技有限公司 Model training method, device, equipment, storage medium and image detection method
CN115797736B (en) * 2023-01-19 2023-05-09 北京百度网讯科技有限公司 Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN111639744A (en) * 2020-04-15 2020-09-08 北京迈格威科技有限公司 Student model training method and device and electronic equipment
CN111709409A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Face living body detection method, device, equipment and medium
CN112257815A (en) * 2020-12-03 2021-01-22 北京沃东天骏信息技术有限公司 Model generation method, target detection method, device, electronic device, and medium
CN112508169A (en) * 2020-11-13 2021-03-16 华为技术有限公司 Knowledge distillation method and system
CN112561059A (en) * 2020-12-15 2021-03-26 北京百度网讯科技有限公司 Method and apparatus for model distillation
CN113378712A (en) * 2021-06-10 2021-09-10 北京百度网讯科技有限公司 Training method of object detection model, image detection method and device thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3367295B2 (en) * 1995-08-14 2003-01-14 ケイディーディーアイ株式会社 Multi-valued neural network learning method
CN111127432B (en) * 2019-12-24 2021-01-12 推想医疗科技股份有限公司 Medical image detection method, device, equipment and storage medium
KR102191351B1 (en) * 2020-04-28 2020-12-15 아주대학교산학협력단 Method for semantic segmentation based on knowledge distillation
CN111967597A (en) * 2020-08-18 2020-11-20 上海商汤临港智能科技有限公司 Neural network training and image classification method, device, storage medium and equipment
CN112801298B (en) * 2021-01-20 2023-09-01 北京百度网讯科技有限公司 Abnormal sample detection method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN113378712B (en) 2023-07-04
JP2023539934A (en) 2023-09-20
CN113378712A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
US20220147822A1 (en) Training method and apparatus for target detection model, device and storage medium
EP4040401A1 (en) Image processing method and apparatus, device and storage medium
WO2022257614A1 (en) Training method and apparatus for object detection model, and image detection method and apparatus
WO2022227769A1 (en) Training method and apparatus for lane line detection model, electronic device and storage medium
CN113033622B (en) Training method, device, equipment and storage medium for cross-modal retrieval model
US11810319B2 (en) Image detection method, device, storage medium and computer program product
US11861919B2 (en) Text recognition method and device, and electronic device
WO2022257487A1 (en) Method and apparatus for training depth estimation model, and electronic device and storage medium
WO2022227768A1 (en) Dynamic gesture recognition method and apparatus, and device and storage medium
CN113780098B (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN113204615B (en) Entity extraction method, device, equipment and storage medium
US20230306081A1 (en) Method for training a point cloud processing model, method for performing instance segmentation on point cloud, and electronic device
CN113205041B (en) Structured information extraction method, device, equipment and storage medium
WO2022227759A1 (en) Image category recognition method and apparatus and electronic device
CN113378770A (en) Gesture recognition method, device, equipment, storage medium and program product
WO2023273344A1 (en) Vehicle line crossing recognition method and apparatus, electronic device, and storage medium
CN115457329B (en) Training method of image classification model, image classification method and device
JP2022185143A (en) Text detection method, and text recognition method and device
CN114972910B (en) Training method and device for image-text recognition model, electronic equipment and storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
JP2022185144A (en) Object detection method and training method and device of object detection model
US20230245429A1 (en) Method and apparatus for training lane line detection model, electronic device and storage medium
WO2023232031A1 (en) Neural network model training method and apparatus, electronic device and medium
CN114842541A (en) Model training and face recognition method, device, equipment and storage medium
CN113378774A (en) Gesture recognition method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22819217

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023515610

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22819217

Country of ref document: EP

Kind code of ref document: A1