TWI798098B - Method for detecting three-dimensional target object, electronic device and storage medium - Google Patents


Info

Publication number: TWI798098B
Authority: TW (Taiwan)
Prior art keywords: dimensional, model, object model, point cloud, image
Application number: TW111120338A
Other languages: Chinese (zh)
Other versions: TW202349348A (en)
Inventors: 李潔, 盧志德, 郭錦斌
Original Assignee: 鴻海精密工業股份有限公司
Application filed by 鴻海精密工業股份有限公司
Priority to TW111120338A priority Critical patent/TWI798098B/en
Application granted
Publication of TWI798098B publication Critical patent/TWI798098B/en
Publication of TW202349348A publication Critical patent/TW202349348A/en


Abstract

The present application provides a method for detecting a three-dimensional (3D) target object, an electronic device, and a storage medium. The method includes: acquiring a detection image and a depth image corresponding to the detection image; inputting the detection image into a trained target detection model, and determining a category of an object in the detection image and a two-dimensional (2D) bounding box of the object; determining an object model and a 3D bounding box from a 3D object model library according to the object category; calculating point cloud data and a distance from a depth camera to the object model according to the depth image and the 2D bounding box; determining a rotation angle of the object model according to the object model and the point cloud data; and determining a position of the object model in 3D space according to the distance from the depth camera to the object model, the rotation angle, and the 3D bounding box. The present application can quickly determine the position of an object in 3D space from an image.

Description

Three-dimensional object detection method, electronic device, and computer-readable storage medium

The present application relates to computer vision and deep learning, and in particular to a three-dimensional object detection method, an electronic device, and a computer-readable storage medium.

In the field of autonomous driving, the driving system uses various kinds of sensors to detect objects in front of or near the vehicle and makes decisions accordingly. The system must therefore detect the category and position of objects quickly and accurately to ensure driving safety. Most current 3D object detection algorithms require a large number of labeled samples to recognize object categories; annotating the rotation angles of objects is difficult, and the regression operations involved make prediction slow. In addition, most current 3D object detection algorithms need an accurate distance between the vehicle and the object ahead, which is usually obtained with lidar or radar; however, lidar and radar are expensive and have a comparatively small field of view.

In view of the above, it is necessary to provide a three-dimensional object detection method, an electronic device, and a computer-readable storage medium that solve the problems of difficult rotation-angle annotation and excessive detection cost.

An embodiment of the present application provides a three-dimensional object detection method, including: acquiring a detection image and a depth image corresponding to the detection image, the depth image being acquired by a depth camera; inputting the detection image into a trained target detection model, and using the target detection model to determine the object category and the two-dimensional bounding box of an object in the detection image; determining, according to the object category, an object model corresponding to the object and a three-dimensional bounding box corresponding to the object model from a three-dimensional object model library; calculating, according to the depth image and the two-dimensional bounding box, the point cloud data of the object framed by the two-dimensional bounding box and the distance from the depth camera to the object model; determining the rotation angle of the object model according to the object model and the point cloud data; and determining the position of the object model in three-dimensional space according to the distance from the depth camera to the object model, the rotation angle, and the three-dimensional bounding box.

In an optional implementation, calculating the point cloud data of the object framed by the two-dimensional bounding box and the distance from the depth camera to the object model according to the depth image and the two-dimensional bounding box includes: acquiring, according to the depth image, the depth value and the coordinates of the object framed by the two-dimensional bounding box, and determining the distance from the depth camera to the object model according to the depth value; and obtaining the point cloud data from the coordinates and the intrinsic and extrinsic parameter matrix transformation formula of the depth camera.

In an optional implementation, determining the rotation angle of the object model according to the object model and the point cloud data includes: obtaining first point cloud data of the object's contour from the point cloud data; converting the object model into second point cloud data; and performing point cloud matching between the first point cloud data and the second point cloud data to obtain the rotation angle of the object model.

In an optional implementation, the method further includes: acquiring training images; constructing a target detection model based on a You Only Look Once (YOLO) network; inputting the training images into the target detection model for training, the model alternating convolution and mean pooling to obtain feature data of the training images; and processing the feature data with the fully connected layer of the target detection model to obtain the two-dimensional bounding boxes and object categories of the objects in the training images, the parameters of the target detection model being adjusted to minimize the loss function and obtain the trained target detection model.

In an optional implementation, processing the feature data with the fully connected layer of the target detection model to obtain the two-dimensional bounding boxes and object categories of the objects in the training images includes: processing the feature data with the fully connected layer to obtain multiple candidate two-dimensional bounding boxes for each object, and performing a non-maximum suppression operation on the candidate bounding boxes to obtain the two-dimensional bounding box and object category of each object.

In an optional implementation, the method further includes: establishing the three-dimensional object model library, which contains multiple object models corresponding to different object categories and a three-dimensional bounding box for each object model, each bounding box recording the length, width, and height of the corresponding object category.

In an optional implementation, determining the position of the object model in three-dimensional space according to the distance from the depth camera to the object model, the rotation angle, and the three-dimensional bounding box includes: determining the orientation of the object model in three-dimensional space from the rotation angle; and determining the position of the object model in three-dimensional space from that orientation, the distance from the depth camera to the object model, and the three-dimensional bounding box of the object model.

In an optional implementation, the method further includes: taking the position of the object model in three-dimensional space as the position of the object in three-dimensional space, and outputting the object category and the object's position in three-dimensional space.

An embodiment of the present application further provides an electronic device including a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the three-dimensional object detection method.

An embodiment of the present application further provides a computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the three-dimensional object detection method.

The technical solutions provided by the embodiments of the present application require neither complex computation nor annotation of object rotation angles, reducing labor cost while quickly obtaining the three-dimensional position of an object.

4: electronic device

401: memory

402: processor

403: computer program

404: communication bus

101-106: steps

21-24: steps

FIG. 1 is a flowchart of a three-dimensional object detection method provided by an embodiment of the present application.

FIG. 2 is a flowchart of the non-maximum suppression method provided by an embodiment of the present application.

FIG. 3 is a schematic diagram of determining a three-dimensional bounding box provided by an embodiment of the present application.

FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

For a clearer understanding of the above objects, features, and advantages of the present application, the application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the specific embodiments described here are intended only to explain the present application, not to limit it.

Many specific details are set forth in the following description to facilitate a full understanding of the application; the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.

Below, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or the number of the technical features indicated; a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of some embodiments of the present application, words such as "exemplary" or "for example" introduce an example, instance, or illustration; any embodiment or design so described should not be construed as preferred over or more advantageous than other embodiments or designs. Rather, these words are intended to present related concepts in a concrete manner.

Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field of the application. The terms used in this specification are for the purpose of describing specific embodiments only and are not intended to limit the application.

Referring to FIG. 1, FIG. 1 is a flowchart of a three-dimensional object detection method provided by an embodiment of the present application. The method is applied to an electronic device (for example, the electronic device 4 shown in FIG. 4), which may be any electronic product capable of human-computer interaction, such as a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), or a smart wearable device.

The electronic device is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), or an embedded device.

The network where the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, or a virtual private network (VPN).

The method specifically includes the following steps.

101: Acquire a detection image and a depth image corresponding to the detection image.

In at least one embodiment of the present application, a camera installed inside or outside the vehicle captures the scene in front of the vehicle, and the captured image is used as the detection image.

In at least one embodiment of the present application, acquiring the depth image corresponding to the detection image includes capturing the scene in front of the vehicle with a depth camera installed on the vehicle. It should be noted that when the camera installed inside or outside the vehicle captures the detection image, the depth camera simultaneously captures the scene in front of the vehicle as the depth image, so that the depth image corresponds to the detection image. For example, cameras of different types photograph the same object in front of the vehicle to obtain the detection image and the depth image respectively.

In the embodiments of the present application, the application scenarios of the three-dimensional object detection method include the field of autonomous driving: while the vehicle is moving, the method is applied to perform three-dimensional detection of objects in front of the vehicle.

102: Input the detection image into the trained target detection model, and use the target detection model to determine the object category and the two-dimensional bounding box of the object in the detection image.

In at least one embodiment of the present application, the trained target detection model is built on a You Only Look Once (YOLO) network, such as YOLOv3 or YOLOv5.

In at least one embodiment of the present application, the method of training the target detection model to obtain the trained model includes: acquiring training images; inputting the training images into the target detection model for training, the model alternating convolution and mean pooling to obtain feature data of the training images; and processing the feature data with the fully connected layer of the model to obtain the two-dimensional bounding boxes and object categories of the objects in the training images, the model parameters being adjusted to minimize the loss function and obtain the trained target detection model. In this embodiment, the parameters of the target detection model include, but are not limited to, the learning rate and the number of iterations, and the loss function includes the mean square error loss function.

In at least one embodiment of the present application, acquiring the training images further includes performing data augmentation on the training images to obtain more distinct training samples; the augmentation operations include, but are not limited to, flipping, rotating, scaling, and cropping the images. Data augmentation effectively expands the sample data, so that the target detection model can be trained and optimized with training images from more scenes (for example, images in front of a vehicle), making the model more robust.
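The augmentation operations named above can be sketched with plain NumPy; the specific transforms, crop ratio, and downscale factor chosen here are illustrative assumptions, not values from the patent.

```python
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Produce simple augmented variants of an H x W x C training image."""
    h, w = image.shape[:2]
    flipped = image[:, ::-1]                 # horizontal flip
    rotated = np.rot90(image)                # 90-degree rotation
    cropped = image[h // 10 : h - h // 10,   # central crop keeping 80% per side
                    w // 10 : w - w // 10]
    scaled = image[::2, ::2]                 # naive 2x downscale by striding
    return [flipped, rotated, cropped, scaled]
```

Each variant is added to the training set alongside the original, expanding the sample data as described.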

In at least one embodiment of the present application, processing the feature data with the fully connected layer to obtain the two-dimensional bounding boxes and object categories of the objects in the training images includes: processing the feature data with the fully connected layer to obtain multiple candidate two-dimensional bounding boxes and a score for each candidate. In this embodiment, the score of a candidate bounding box is the score assigned by the fully connected layer after predicting the category of the object inside the box, that is, the probability that the object category is contained within the candidate box. A non-maximum suppression (NMS) operation is then performed on the candidate boxes to obtain the two-dimensional bounding box and object category of each object.

In this embodiment, the non-maximum suppression (NMS) operation, shown in the flowchart of FIG. 2, specifically includes:

21: Sort the candidate two-dimensional bounding boxes by score and select the candidate with the highest score. Here, "candidate two-dimensional bounding box" refers to a candidate bounding box of an object in a training image.

22: Traverse the other candidate bounding boxes, compute the intersection over union (IOU) between each of them and the selected candidate, and delete every candidate whose IOU exceeds a preset threshold. In this embodiment, the IOU measures the degree of overlap between the selected (highest-scoring) candidate and each other candidate.

23: Determine whether any candidate bounding boxes remain besides the selected one. If so, the process returns to step 21. If not, step 24 is executed: output the selected candidate as the two-dimensional bounding box of the object in the training image.
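Steps 21-24 amount to standard greedy non-maximum suppression. A minimal sketch (the corner-coordinate box format and the 0.5 threshold are illustrative assumptions):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop overlapping boxes, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)            # step 21: select the highest-scoring candidate
        keep.append(best)
        order = [i for i in order      # step 22: delete candidates whose IOU
                 if iou(boxes[best], boxes[i]) <= iou_threshold]  # exceeds the threshold
    return keep                        # steps 23-24: repeat until none remain
```

The indices returned by `nms` identify the surviving boxes, one per detected object.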

In at least one embodiment of the present application, the above method completes the training of the target detection model and yields the trained model. The detection image is then input into the trained model, which outputs the object category and the two-dimensional bounding box of the object in the detection image.

103: According to the object category, determine the object model corresponding to the object and the three-dimensional bounding box corresponding to the object model from the three-dimensional object model library.

In at least one embodiment of the present application, the three-dimensional object detection method further includes establishing the three-dimensional object model library in advance, the library containing multiple object models corresponding to different object categories and a three-dimensional bounding box for each object model, each bounding box recording the length, width, and height of the corresponding object category.

In this embodiment, the object model is found by looking up the object category in the three-dimensional object model library, and the three-dimensional bounding box of the object model is determined from the object model. For example, as shown in FIG. 3, a schematic diagram of determining a three-dimensional bounding box provided by an embodiment of the present application: when the object category is a car, the car's object model is looked up in the library, and the car's three-dimensional bounding box is found from that model; when the object category is a small truck, an electric vehicle, or a large bus, the corresponding object model and its three-dimensional bounding box are looked up in the same way. In this embodiment, the object model includes, but is not limited to, a three-dimensional model.
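The category lookup in this step can be represented as a simple mapping from category to object model and bounding-box dimensions; the model identifiers and the length/width/height values below are placeholders, not values from the patent.

```python
# Hypothetical 3D object model library: category -> (model id, (length, width, height)).
# All names and dimensions are illustrative placeholders.
MODEL_LIBRARY = {
    "car":              ("car_model",   (4.5, 1.8, 1.5)),
    "small_truck":      ("truck_model", (5.5, 2.0, 2.2)),
    "electric_vehicle": ("ev_model",    (4.7, 1.9, 1.6)),
    "bus":              ("bus_model",   (12.0, 2.5, 3.2)),
}

def lookup_model(category: str) -> tuple[str, tuple[float, float, float]]:
    """Return the object model and its 3D bounding-box dimensions for a detected category."""
    return MODEL_LIBRARY[category]
```

A detection whose category is, say, "bus" therefore immediately yields both the model and the box dimensions needed in the later placement step.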

104: According to the depth image and the two-dimensional bounding box, calculate the point cloud data of the object framed by the two-dimensional bounding box and the distance from the depth camera to the object model.

In at least one embodiment of the present application, determining the distance from the depth camera to the object model includes: acquiring, from the depth image, the depth value of the object framed by the two-dimensional bounding box, and determining the distance from the depth camera to the object model according to that depth value. In this embodiment, the depth value is provided by the depth camera when it captures the depth image and equals the distance from the depth camera to the object; this distance is taken as the distance from the depth camera to the object model of the object.

In at least one embodiment of the present application, obtaining the point cloud data includes: acquiring, from the depth image, the coordinate set of the object framed by the two-dimensional bounding box, and obtaining the point cloud data from the coordinate set and the intrinsic and extrinsic parameter matrix transformation formula of the depth camera. In this embodiment, the coordinate set of the framed object is its set of pixel coordinates, and the point cloud data are the world coordinates corresponding to the coordinates in that set; the point cloud data characterize the contour of the object. Converting a coordinate in the set to its world coordinate uses the transformation formula:

x = x₁ · D / f

y = y₁ · D / f

z = D

where (x, y, z) are the world coordinates representing the point cloud of one pixel, f is the focal length, D is the depth value, and (x₁, y₁) are the pixel coordinates of any pixel in the coordinate set of the framed object. Applying this formula to every coordinate in the set, one by one, yields the point cloud data.
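Applying the transformation to every pixel of the framed object gives the point cloud. A minimal NumPy sketch (the principal-point offset is omitted, matching the simplified formula in the text):

```python
import numpy as np

def pixels_to_point_cloud(pixels: np.ndarray, depths: np.ndarray, f: float) -> np.ndarray:
    """Back-project N pixel coordinates (x1, y1) with depth values D into
    world points (x, y, z) using x = x1*D/f, y = y1*D/f, z = D."""
    x1, y1 = pixels[:, 0], pixels[:, 1]
    x = x1 * depths / f
    y = y1 * depths / f
    return np.stack([x, y, depths], axis=1)
```

Given the N x 2 pixel coordinates inside the 2D bounding box and their depth values, the result is an N x 3 point cloud of the object's visible surface.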

105: According to the object model of the object and the point cloud data, determine the rotation angle of the object model.

First point cloud data of the object's contour are obtained from the point cloud data; the object model of the object is converted into second point cloud data; and point cloud matching is performed between the first point cloud data and the second point cloud data to obtain the rotation angle of the object model.

In at least one embodiment of the present application, converting the object model into the second point cloud data includes using functions in the Point Cloud Library (PCL) to read the object model and generate its point cloud data as the second point cloud data.

In at least one embodiment of the present application, performing point cloud matching between the first and second point cloud data to obtain the rotation angle of the object model includes: fitting the contour points in the first point cloud data to a first plane and computing the curvature of the first plane; fitting the points of the second point cloud data to a second plane and computing the curvature of the second plane; and computing the difference between the two curvatures to obtain a curvature deviation value, from which the rotation angle of the object model is determined.
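The patent does not give the curvature formula. One common curvature-like measure for a point set is the surface-variation ratio, the smallest covariance eigenvalue over the eigenvalue sum, which is zero for perfectly planar points; the sketch below uses that proxy as an assumption for illustration, not as the patent's exact computation.

```python
import numpy as np

def surface_variation(points: np.ndarray) -> float:
    """Curvature proxy for an N x 3 point set: lambda_min / (sum of eigenvalues)
    of the covariance matrix. Evaluates to 0 when the points lie on a plane."""
    centered = points - points.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(centered.T))  # ascending order
    return float(eigvals[0] / eigvals.sum())

def curvature_deviation(first_cloud: np.ndarray, second_cloud: np.ndarray) -> float:
    """Difference between the two curvature proxies, per the matching step above."""
    return surface_variation(first_cloud) - surface_variation(second_cloud)
```

The model cloud would be rotated in steps and the deviation evaluated at each step; the rotation minimizing the deviation is taken as the rotation angle.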

106,確定物體在三維空間中的位置。 106. Determine the position of the object in the three-dimensional space.

在本申請的至少一個實施例中，根據所述旋轉角度確定所述物件模型在所述三維空間中的方向，根據所述物件模型在所述三維空間中的方向、所述深度相機到所述物件模型的距離及所述物件模型的三維邊線框，確定所述物件模型在三維空間中的位置。具體地，將所述物件模型在三維空間中的位置作為所述物體在三維空間中的位置，輸出所述物體類別及所述物體在三維空間中的位置。例如將所述物體類別及所述物體在三維空間中的位置以三維邊線框的方式顯示於一顯示屏。 In at least one embodiment of the present application, the orientation of the object model in the three-dimensional space is determined according to the rotation angle, and the position of the object model in the three-dimensional space is determined according to the orientation of the object model in the three-dimensional space, the distance from the depth camera to the object model, and the three-dimensional bounding box of the object model. Specifically, the position of the object model in the three-dimensional space is taken as the position of the object in the three-dimensional space, and the object category and the position of the object in the three-dimensional space are output. For example, the object category and the position of the object in the three-dimensional space are displayed on a display screen in the form of a three-dimensional bounding box.
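As an illustration of step 106, the sketch below composes the three inputs named above (camera-to-model distance, rotation angle, and the bounding box dimensions from the model library) into a posed box in camera space. The assumption that the distance is measured along the optical axis, the yaw-only rotation, and all names are hypothetical; the patent leaves these details unspecified.

```python
import numpy as np

def box_corners(distance, yaw, dims, ray_dir=(0.0, 0.0, 1.0)):
    """Place a 3D bounding box in camera space.

    distance: camera-to-model distance (from the depth image),
    yaw: rotation about the vertical axis (from point cloud matching),
    dims: (length, width, height) from the 3D object model library.
    ray_dir is an assumption: the text does not say along which ray the
    distance is measured, so the optical axis is used here.
    Returns the box centre and its 8 corners, each as (x, y, z).
    """
    centre = distance * np.asarray(ray_dir, dtype=float)
    l, w, h = dims
    # Axis-aligned corner offsets around the origin ...
    sx, sy, sz = np.meshgrid([-l / 2, l / 2], [-w / 2, w / 2], [-h / 2, h / 2])
    corners = np.c_[sx.ravel(), sy.ravel(), sz.ravel()]  # (8, 3)
    # ... rotated by yaw about the y (vertical) axis, then translated.
    c, s = np.cos(yaw), np.sin(yaw)
    rot_y = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return centre, corners @ rot_y.T + centre

centre, corners = box_corners(distance=5.0, yaw=np.deg2rad(45), dims=(2.0, 1.0, 1.5))
print(centre)         # [0. 0. 5.]
print(corners.shape)  # (8, 3)
```

Projecting these eight corners back through the camera intrinsics would yield the wireframe overlay that the embodiment describes showing on a display screen.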

以上所述，僅是本申請的具體實施方式，但本申請的保護範圍並不局限於此，對於本領域的普通技術人員來說，在不脫離本申請創造構思的前提下，還可以做出改進，但這些均屬本申請的保護範圍。 The above are only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Those of ordinary skill in the art may make improvements without departing from the inventive concept of the present application, and such improvements all fall within the scope of protection of the present application.

如圖4所示,圖4為本申請實施例提供的一種電子設備的結構示意圖。所述電子設備4包括記憶體401、至少一個處理器402、存儲在所述記憶體401中並可在所述至少一個處理器402上運行的計算機程式403及至少一條通訊匯流排404。 As shown in FIG. 4 , FIG. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 4 includes a memory 401 , at least one processor 402 , a computer program 403 stored in the memory 401 and executable on the at least one processor 402 , and at least one communication bus 404 .

本領域技術人員可以理解，圖4所示的示意圖僅僅是所述電子設備4的示例，並不構成對所述電子設備4的限定，可以包括比圖示更多或更少的部件，或者組合某些部件，或者不同的部件，例如所述電子設備4還可以包括輸入輸出設備、網路接入設備等。 Those skilled in the art can understand that the schematic diagram shown in FIG. 4 is only an example of the electronic device 4 and does not constitute a limitation on the electronic device 4; the device may include more or fewer components than shown, combine certain components, or have different components. For example, the electronic device 4 may also include input/output devices, network access devices, and the like.

所述至少一個處理器402可以是中央處理單元(Central Processing Unit，CPU)，還可以是其他通用處理器、數位訊號處理器(Digital Signal Processor，DSP)、專用集成電路(Application Specific Integrated Circuit，ASIC)、現場可編程門陣列(Field-Programmable Gate Array，FPGA)或者其他可編程邏輯器件、分立門或者晶體管邏輯器件、分立硬體組件等。該至少一個處理器402可以是微處理器或者該至少一個處理器402也可以是任何常規的處理器等，所述至少一個處理器402是所述電子設備4的控制中心，利用各種介面和線路連接整個電子設備4的各個部分。 The at least one processor 402 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The at least one processor 402 may be a microprocessor or any conventional processor. The at least one processor 402 is the control center of the electronic device 4 and connects the various parts of the entire electronic device 4 through various interfaces and lines.

所述記憶體401可用於存儲所述計算機程式403，所述至少一個處理器402藉由運行或執行存儲在所述記憶體401內的計算機程式403，以及調用存儲在記憶體401內的數據，實現所述電子設備4的各種功能。所述記憶體401可主要包括存儲程式區和存儲數據區，其中，存儲程式區可存儲操作系統、至少一個功能所需的應用程式(比如聲音播放功能、圖像播放功能等)等；存儲數據區可存儲根據電子設備4的使用所創建的數據(比如音頻數據)等。此外，記憶體401可以包括非易失性記憶體，例如硬碟、內存、插接式硬碟，智能存儲卡(Smart Media Card,SMC)，安全數位(Secure Digital,SD)卡，閃存卡(Flash Card)、至少一個磁盤記憶體件、閃存器件、或其他非易失性固態記憶體件。 The memory 401 can be used to store the computer program 403. The at least one processor 402 realizes the various functions of the electronic device 4 by running or executing the computer program 403 stored in the memory 401 and calling the data stored in the memory 401. The memory 401 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and applications required by at least one function (such as a sound playback function, an image playback function, etc.), and the data storage area may store data created according to the use of the electronic device 4 (such as audio data). In addition, the memory 401 may include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

所述電子設備4集成的模塊/單元如果以軟件功能單元的形式實現並作為獨立的產品銷售或使用時，可以存儲在一個計算機可讀取存儲媒體中。基於這樣的理解，本申請實現上述實施例方法中的全部或部分流程，也可以藉由計算機程式來指令相關的硬體來完成，所述的計算機程式可存儲於一計算機可讀存儲媒體中，該計算機程式在被處理器執行時，可實現上述各個方法實施例的步驟。其中，所述計算機程式包括計算機程式代碼，所述計算機程式代碼可以為源代碼形式、對象代碼形式、可執行文件或某些中間形式等。所述計算機可讀媒體可以包括：能夠攜帶所述計算機程式代碼的任何實體或裝置、記錄媒體、隨身碟、移動硬碟、磁碟、光盤、計算機記憶體以及唯讀記憶體(Read-Only Memory，ROM)。 If the integrated modules/units of the electronic device 4 are realized in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on such an understanding, all or part of the processes in the methods of the above embodiments of the present application can also be completed by instructing related hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, the computer program can realize the steps of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a Read-Only Memory (ROM).

對於本領域技術人員而言，顯然本申請不限於上述示範性實施例的細節，而且在不背離本申請的精神或基本特徵的情況下，能夠以其他的具體形式實現本申請。因此，無論從哪一點來看，均應將實施例看作是示範性的，而且是非限制性的，本申請的範圍由所附請求項而不是上述說明限定，因此旨在將落在請求項的等同要件的含義和範圍內的所有變化涵括在本申請內。不應將請求項中的任何附關聯圖標記視為限制所涉及的請求項。 It will be apparent to those skilled in the art that the present application is not limited to the details of the exemplary embodiments described above, and that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application. Therefore, in every respect the embodiments should be regarded as exemplary and non-restrictive. The scope of the present application is defined by the appended claims rather than the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced in the present application. Any reference sign in a claim shall not be construed as limiting the claim concerned.

最後應說明的是，以上實施例僅用以說明本申請的技術方案而非限制，儘管參照較佳實施例對本申請進行了詳細說明，本領域的普通技術人員應當理解，可以對本申請的技術方案進行修改或等同替換，而不脫離本申請技術方案的精神和範圍。 Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements can be made to the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application.

101-106:步驟 101-106: Steps

Claims (9)

一種三維目標檢測方法，應用於電子設備，其中，所述三維目標檢測方法包括：獲取檢測圖像及所述檢測圖像對應的深度圖像，其中，所述深度圖像藉由深度相機獲取；將所述檢測圖像輸入至訓練完成的目標檢測模型，利用所述目標檢測模型確定所述檢測圖像中物體的物體類別及物體的二維邊線框，包括：利用所述目標檢測模型的全連接層處理所述特徵值資料，得到所述訓練圖像中的物體的多個候選二維邊線框，將所述多個候選二維邊線框進行非極大值抑制運算，得到所述訓練圖像中的物體的二維邊線框和物體類別，所述進行非極大值抑制運算包括：按照候選二維邊線框的得分，對多個候選二維邊線框進行排序，選擇得分最高的候選二維邊線框；遍歷其他候選二維邊線框，計算其他候選二維邊線框與選擇的候選二維邊線框之間的交並比，刪除大於預設閾值的交並比對應的候選二維邊線框；判斷除了所述選擇的候選二維邊線框之外，是否還存在其他的候選二維邊線框；根據所述物體類別，從三維物體模型庫中確定與所述物體對應的物件模型及與所述物件模型對應的三維邊線框，包括：預先建立三維物體模型庫，其中，所述三維物體模型庫包括與不同物體類別對應的多個物件模型及與每個物件模型對應的三維邊線框；根據所述深度圖像及所述二維邊線框，計算所述二維邊線框所框選的物體的點雲資料及所述深度相機到所述物件模型的距離；根據所述物件模型與所述點雲資料，確定所述物件模型的旋轉角度；根據所述深度相機到所述物件模型的距離、所述旋轉角度及所述三維邊線框，確定所述物件模型在三維空間中的位置。 A three-dimensional object detection method, applied to an electronic device, the method comprising: acquiring a detection image and a depth image corresponding to the detection image, wherein the depth image is captured by a depth camera; inputting the detection image into a trained target detection model, and using the target detection model to determine the object category and the two-dimensional bounding box of an object in the detection image, comprising: processing feature value data with a fully connected layer of the target detection model to obtain a plurality of candidate two-dimensional bounding boxes of the object in the training image, and performing a non-maximum suppression operation on the plurality of candidate two-dimensional bounding boxes to obtain the two-dimensional bounding box and the object category of the object in the training image, the non-maximum suppression operation comprising: sorting the candidate two-dimensional bounding boxes by score and selecting the candidate bounding box with the highest score; traversing the other candidate bounding boxes, calculating the intersection-over-union between each other candidate bounding box and the selected candidate bounding box, and deleting any candidate bounding box whose intersection-over-union is greater than a preset threshold; and judging whether candidate bounding boxes other than the selected candidate bounding box remain; determining, according to the object category, an object model corresponding to the object and a three-dimensional bounding box corresponding to the object model from a three-dimensional object model library, comprising: pre-establishing the three-dimensional object model library, wherein the library comprises a plurality of object models corresponding to different object categories and a three-dimensional bounding box corresponding to each object model; calculating, according to the depth image and the two-dimensional bounding box, point cloud data of the object framed by the two-dimensional bounding box and a distance from the depth camera to the object model; determining a rotation angle of the object model according to the object model and the point cloud data; and determining a position of the object model in three-dimensional space according to the distance from the depth camera to the object model, the rotation angle and the three-dimensional bounding box.

如請求項1所述的三維目標檢測方法，其中，所述根據所述深度圖像及所述二維邊線框，計算所述二維邊線框所框選的物體的點雲資料及所述深度相機到所述物件模型的距離包括：根據所述深度圖像獲取所述二維邊線框所框選的物體的深度值及座標，根據所述深度值確定所述深度相機到所述物體的物件模型的距離；根據所述座標和所述深度相機的內外參矩陣變換公式得到所述點雲資料，所述內外參矩陣變換公式為：
Figure 111120338-A0305-02-0017-2
其中(x,y,z)為世界座標，用於表示一個圖元座標的點雲，f為焦距，D為深度值，(x1,y1)為所述二維邊線框所框選的物體的座標集中任意一圖元點的圖元座標。 The three-dimensional object detection method of claim 1, wherein calculating the point cloud data of the object framed by the two-dimensional bounding box and the distance from the depth camera to the object model according to the depth image and the two-dimensional bounding box comprises: obtaining, according to the depth image, depth values and coordinates of the object framed by the two-dimensional bounding box, and determining the distance from the depth camera to the object model of the object according to the depth values; and obtaining the point cloud data according to the coordinates and an intrinsic/extrinsic parameter matrix transformation formula of the depth camera, the transformation formula being the one shown in Figure 111120338-A0305-02-0017-2, where (x, y, z) are world coordinates representing the point cloud of a pixel, f is the focal length, D is the depth value, and (x1, y1) are the pixel coordinates of any pixel point in the coordinate set of the object framed by the two-dimensional bounding box.
如請求項1所述的三維目標檢測方法，其中，所述根據所述物件模型與所述點雲資料，確定所述物件模型的旋轉角度包括：根據所述點雲資料，得到所述物體輪廓的第一點雲資料；將所述物件模型轉化為第二點雲資料；將所述第一點雲資料與所述第二點雲資料進行點雲匹配，得到所述物件模型的旋轉角度。 The three-dimensional object detection method of claim 1, wherein determining the rotation angle of the object model according to the object model and the point cloud data comprises: obtaining first point cloud data of the object outline according to the point cloud data; converting the object model into second point cloud data; and performing point cloud matching between the first point cloud data and the second point cloud data to obtain the rotation angle of the object model.

如請求項1所述的三維目標檢測方法，其中，所述方法還包括：獲取訓練圖像；基於You Only Look Once(YOLO)網路構建目標檢測模型；將所述訓練圖像輸入所述目標檢測模型進行訓練，藉由所述目標檢測模型進行卷積和均值池化交替處理後得到所述訓練圖像的特徵值資料；利用所述目標檢測模型的全連接層處理所述特徵值資料，得到所述訓練圖像中物體的二維邊線框和物體類別，藉由調整所述目標檢測模型的參數，以最小化損失函數，得到所述訓練完成的目標檢測模型。 The three-dimensional object detection method of claim 1, further comprising: acquiring training images; building a target detection model based on a You Only Look Once (YOLO) network; inputting the training images into the target detection model for training, and obtaining feature value data of the training images after alternating convolution and mean pooling processing by the target detection model; and processing the feature value data with the fully connected layer of the target detection model to obtain the two-dimensional bounding boxes and object categories of objects in the training images, and adjusting parameters of the target detection model to minimize a loss function, thereby obtaining the trained target detection model.

如請求項1所述的三維目標檢測方法，其中，所述方法還包括：建立所述三維物體模型庫，其中，所述三維物體模型庫包括與不同物體類別對應的多個物件模型及與每個物件模型對應的三維邊線框，所述三維邊線框包括每個物體類別對應的長、寬、高。 The three-dimensional object detection method of claim 1, further comprising: establishing the three-dimensional object model library, wherein the library comprises a plurality of object models corresponding to different object categories and a three-dimensional bounding box corresponding to each object model, the three-dimensional bounding box comprising the length, width and height corresponding to each object category.

如請求項1所述的三維目標檢測方法，其中，所述根據所述深度相機到所述物件模型的距離、所述旋轉角度及所述三維邊線框，確定所述物件模型在三維空間中的位置包括：根據所述旋轉角度確定所述物件模型在所述三維空間中的方向；根據所述物件模型在所述三維空間中的方向、所述深度相機到所述物件模型的距離及所述物件模型的三維邊線框，確定所述物件模型在三維空間中的位置。 The three-dimensional object detection method of claim 1, wherein determining the position of the object model in three-dimensional space according to the distance from the depth camera to the object model, the rotation angle and the three-dimensional bounding box comprises: determining the orientation of the object model in the three-dimensional space according to the rotation angle; and determining the position of the object model in the three-dimensional space according to the orientation of the object model in the three-dimensional space, the distance from the depth camera to the object model, and the three-dimensional bounding box of the object model.

如請求項6所述的三維目標檢測方法，其中，所述方法還包括：將所述物件模型在三維空間中的位置作為所述物體在三維空間中的位置，輸出所述物體類別及所述物體在三維空間中的位置。 The three-dimensional object detection method of claim 6, further comprising: taking the position of the object model in the three-dimensional space as the position of the object in the three-dimensional space, and outputting the object category and the position of the object in the three-dimensional space.

一種電子設備，其中，所述電子設備包括處理器和記憶體，所述處理器用於執行記憶體中存儲的計算器程式以實現如請求項1至7中任意一項所述的三維目標檢測方法。 An electronic device, comprising a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the three-dimensional object detection method of any one of claims 1 to 7.

一種計算器可讀存儲媒體，其中，所述計算器可讀存儲媒體存儲有至少一個指令，所述至少一個指令被處理器執行時實現如請求項1至7任意一項所述的三維目標檢測方法。 A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the three-dimensional object detection method of any one of claims 1 to 7.
TW111120338A 2022-05-31 2022-05-31 Method for detecting three-dimensional target object, electronic device and storage medium TWI798098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111120338A TWI798098B (en) 2022-05-31 2022-05-31 Method for detecting three-dimensional target object, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW111120338A TWI798098B (en) 2022-05-31 2022-05-31 Method for detecting three-dimensional target object, electronic device and storage medium

Publications (2)

Publication Number Publication Date
TWI798098B true TWI798098B (en) 2023-04-01
TW202349348A TW202349348A (en) 2023-12-16

Family

ID=86945156

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111120338A TWI798098B (en) 2022-05-31 2022-05-31 Method for detecting three-dimensional target object, electronic device and storage medium

Country Status (1)

Country Link
TW (1) TWI798098B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210383096A1 (en) * 2020-06-08 2021-12-09 Bluhaptics, Inc. Techniques for training machine learning
CN114494404A (en) * 2022-02-14 2022-05-13 云从科技集团股份有限公司 Object volume measurement method, system, device and medium
CN114494456A (en) * 2022-01-11 2022-05-13 上海交通大学 Multiphase external parameter calibration method, system, medium and terminal based on mobile calibration board


Also Published As

Publication number Publication date
TW202349348A (en) 2023-12-16

Similar Documents

Publication Publication Date Title
US10810734B2 (en) Computer aided rebar measurement and inspection system
CN108229307B (en) Method, device and equipment for object detection
TWI485650B (en) Method and arrangement for multi-camera calibration
US9420265B2 (en) Tracking poses of 3D camera using points and planes
US9129435B2 (en) Method for creating 3-D models by stitching multiple partial 3-D models
US9811733B2 (en) Method, apparatus and system for selecting a frame
US20160275686A1 (en) Object pose recognition
JP5538868B2 (en) Image processing apparatus, image processing method and program
US10165168B2 (en) Model-based classification of ambiguous depth image data
US10122912B2 (en) Device and method for detecting regions in an image
EP4307219A1 (en) Three-dimensional target detection method and apparatus
WO2023138163A1 (en) Indoor mobile robot glass detection and map updating method based on depth image restoration
CN112883955A (en) Shelf layout detection method and device and computer readable storage medium
WO2019148923A1 (en) Method and apparatus for searching for images with image, electronic device, and storage medium
WO2024083006A1 (en) Three-dimensional imaging method and apparatus, device, and storage medium
TWI798098B (en) Method for detecting three-dimensional target object, electronic device and storage medium
JP2008026999A (en) Obstacle detection system and obstacle detection method
TWI658431B (en) Image processing method, image processing device and computer readable storage medium
CN117218364A (en) Three-dimensional object detection method, electronic device and storage medium
CN115210758A (en) Motion blur robust image feature matching
TW202349347A (en) Method for detecting three-dimensional object, electronic device and storage medium
TW202349264A (en) Method for detecting three-dimensional object, electronic device and storage medium
US20230386230A1 (en) Method for detection of three-dimensional objects and electronic device
CN112991179B (en) Method, apparatus, device and storage medium for outputting information
US20230386222A1 (en) Method for detecting three-dimensional objects in roadway and electronic device