TWI855330B - Method for detecting three-dimensional object, electronic device and storage medium - Google Patents


Info

Publication number
TWI855330B
Authority
TW
Taiwan
Prior art keywords
object model
point cloud
model
image
cloud data
Application number
TW111120335A
Other languages
Chinese (zh)
Other versions
TW202349264A (en)
Inventor
李潔
郭錦斌
Original Assignee
鴻海精密工業股份有限公司 (Hon Hai Precision Industry Co., Ltd.)
Application filed by 鴻海精密工業股份有限公司
Priority to TW111120335A
Publication of TW202349264A
Application granted
Publication of TWI855330B

Landscapes

  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The present application provides a method for detecting three-dimensional (3D) objects, an electronic device, and a storage medium. The method includes: acquiring training images; constructing and training a semantic segmentation model to obtain a trained semantic segmentation model; inputting detection images into the trained semantic segmentation model to obtain object categories and object positions; determining an object model from a 3D object model library; obtaining point cloud data of the object and the distance between a depth camera and the object model according to the depth image; and determining a rotation angle of the object model according to the point cloud data and the object model. The position of the object model in 3D space is then determined from the distance between the depth camera and the object model, the rotation angle, and the object position, thereby determining the position of the object in 3D space. The present application can quickly determine the position of an object in 3D space.

Description

Three-dimensional object detection method, electronic device, and computer-readable storage medium

The present application relates to computer vision and deep learning, and in particular to a three-dimensional object detection method, an electronic device, and a computer-readable storage medium.

In the field of autonomous driving, an autonomous driving system uses different kinds of sensors to detect objects in front of or near the vehicle and makes corresponding decisions. For example, the system must quickly and accurately detect the category and three-dimensional position of an object and then make the corresponding decisions to ensure driving safety. At present, three-dimensional object detection algorithms detect the category and position of an object and obtain its three-dimensional position by regression, but the regression computation takes a long time at prediction. In addition, when performing three-dimensional object detection, existing autonomous driving systems use lidar or radar to obtain depth information and measure the distance between the vehicle and the object ahead, but lidar and radar are costly and have a relatively small field of view.

In view of the above, it is necessary to provide a three-dimensional object detection method, an electronic device, and a computer-readable storage medium to solve the problems of slow object detection and high cost.

An embodiment of the present application provides a three-dimensional object detection method. The method includes: acquiring training images; constructing a semantic segmentation model based on a fully convolutional network; inputting the training images into the semantic segmentation model and performing multiple convolutions and pooling operations with the convolutional and pooling layers of the model to obtain a plurality of feature maps of different sizes; upsampling the plurality of feature maps of different sizes to obtain a first image of the same size as the training image, performing pixel classification on the first image and optimizing the classification loss, and outputting the object category and position of each object in the training image, thereby obtaining a trained semantic segmentation model; acquiring a detection image and a depth image corresponding to the detection image, the depth image being captured by a depth camera; inputting the detection image into the trained semantic segmentation model to obtain the object category and object position of an object in the detection image; determining, according to the object category, the object model corresponding to the object from a three-dimensional object model library; obtaining, according to the depth image, point cloud data of the object and the distance from the depth camera to the object model; determining a rotation angle of the object model according to the point cloud data and the object model; and determining the position of the object model in three-dimensional space according to the distance from the depth camera to the object model, the rotation angle, and the position of the object.

In an optional embodiment, upsampling the plurality of feature maps of different sizes to obtain a first image of the same size as the training image includes: after upsampling the plurality of feature maps of different sizes, performing a deconvolution operation to obtain operation results; and summing the operation results to obtain a first image of the same size as the training image.

In an optional embodiment, determining the object model corresponding to the object from the three-dimensional object model library according to the object category includes: searching the three-dimensional object model library according to the object category to determine the object model of the object, the three-dimensional object model library including object categories and the object models corresponding to the object categories.

In an optional embodiment, obtaining the point cloud data of the object and the distance from the depth camera to the object model according to the depth image includes: obtaining the depth value and coordinates of the object from the depth image, and determining the distance from the depth camera to the object model from the depth value; and obtaining the point cloud data from the coordinates and the intrinsic-extrinsic matrix transformation formula of the depth camera.

In an optional embodiment, determining the rotation angle of the object model according to the point cloud data and the object model includes: obtaining, from the point cloud data, first point cloud data of the object contour; converting the object model into second point cloud data; matching the first point cloud data against the second point cloud data, fitting the contour points in the first point cloud data to a first plane and computing the curvature of the first plane, and fitting the points of the second point cloud data to a second plane and computing the curvature of the second plane; and computing the difference between the curvature of the first plane and the curvature of the second plane to obtain a curvature deviation value, from which the rotation angle of the object model is determined.

In an optional embodiment, converting the object model into the second point cloud data includes: processing the object model of the object with a plurality of functions in a point cloud library to generate the second point cloud data.

In an optional embodiment, determining the position of the object model in three-dimensional space according to the distance from the depth camera to the object model, the rotation angle, and the position of the object includes: determining the orientation of the object model in three-dimensional space from the rotation angle; and determining the position of the object model in three-dimensional space from the orientation of the object model in three-dimensional space, the distance from the depth camera to the object model, and the position of the object.

In an optional embodiment, the method further includes: taking the position of the object model of the object in three-dimensional space as the position of the object in three-dimensional space, and outputting the object category and the position of the object in three-dimensional space.

An embodiment of the present application further provides an electronic device. The electronic device includes a processor and a memory, and the processor implements the three-dimensional object detection method when executing a computer program stored in the memory.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program, the computer program implementing the three-dimensional object detection method when executed by a processor.

The technical solution of the present application is low-cost, requires no complex computation, and can quickly obtain the three-dimensional position of an object.

4: Electronic device

401: Memory

402: Processor

403: Computer program

404: Communication bus

101-107: Steps

Figure 1 is a flowchart of a three-dimensional object detection method provided in an embodiment of the present application.

Figure 2 shows an image segmented by the trained semantic segmentation model provided in an embodiment of the present application.

Figure 3 is a schematic diagram of the three-dimensional object model library provided in an embodiment of the present application.

Figure 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

To more clearly understand the above objects, features, and advantages of the present application, the present application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the specific embodiments described here are only used to explain the present application and are not intended to limit it.

Many specific details are set forth in the following description to facilitate a full understanding of the present application. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the present application.

Referring to Figure 1, a flowchart of a three-dimensional object detection method provided in an embodiment of the present application is shown. The method is applied to an electronic device (for example, the electronic device 4 shown in Figure 4). The electronic device may be any electronic product capable of human-computer interaction, such as a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), or a smart wearable device.

The electronic device is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), and embedded devices.

The electronic device may also include network equipment and/or user equipment. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.

The network where the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, and a virtual private network (VPN).

The method specifically includes the following steps.

101: Acquire training images.

In at least one embodiment of the present application, the training images include, but are not limited to, scene images of various roads in cities, the countryside, and so on at different times of day. In this embodiment, the training images include images from the Pascal VOC and Cityscapes datasets.

In at least one embodiment of the present application, the method further includes: performing data augmentation on the training images to generate more varied training images. The data augmentation operations include, but are not limited to, flipping, rotating, scaling, and cropping the images. It should be noted that data augmentation yields more images of the scene ahead of the vehicle under different conditions as training samples, so the semantic segmentation model can be trained and optimized to be more robust. A minimal sketch of this step follows.
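As an illustration of the augmentation step, the sketch below assumes PyTorch's torchvision is available; the concrete transforms and parameter values are illustrative assumptions, not prescribed by this application.

```python
# A minimal data-augmentation sketch, assuming torchvision. The transform
# choices and parameters are illustrative assumptions.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),               # flip the image
    transforms.RandomRotation(degrees=10),                # rotate the image
    transforms.RandomResizedCrop(512, scale=(0.8, 1.0)),  # scale and crop
])
# augmented_image = augment(train_image)  # one extra training sample per call
```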

102: Construct a semantic segmentation model and train it on the training images to obtain a trained semantic segmentation model.

In at least one embodiment of the present application, constructing the semantic segmentation model includes: constructing the semantic segmentation model based on a fully convolutional network.

Based on the fully convolutional network, multiple targets in an input image can be segmented and recognized. For example, an image containing multiple targets, such as a car, a person, and a dog, is input into the semantic segmentation model. After the model segments and recognizes the targets in the image, it outputs the object category and position of each object in the image.

In at least one embodiment of the present application, training the semantic segmentation model to obtain the trained model includes: inputting the training images into the semantic segmentation model and performing multiple convolutions and pooling operations with its convolutional and pooling layers to obtain a plurality of feature maps of different sizes; upsampling the feature maps of different sizes and then performing a deconvolution operation to obtain operation results; summing the operation results to obtain a first image of the same size as the training image; and performing pixel classification on the first image, computing and optimizing the classification loss, and outputting the object category and position of each object in the training image to obtain the trained semantic segmentation model. In this embodiment, the softmax cross-entropy function is used as the loss function for training. Figure 2, for example, shows an image segmented by the trained semantic segmentation model provided in an embodiment of the present application. A minimal model sketch appears below.
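The sketch below is a minimal FCN-style model in PyTorch. The backbone depth, channel counts, and number of classes are assumptions; this application only fixes the overall scheme of convolution/pooling to multi-scale feature maps, deconvolution-based upsampling, summation, and a per-pixel softmax cross-entropy loss.

```python
# A minimal FCN-style sketch. Channel counts, depth, and num_classes are
# illustrative assumptions; only the scheme (conv/pool, deconvolution
# upsampling, summation, softmax cross-entropy) comes from the text.
import torch
import torch.nn as nn

class MiniFCN(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        def stage(cin, cout):  # conv + pool stage, halves the resolution
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(2))
        self.stage1 = stage(3, 64)     # 1/2 resolution
        self.stage2 = stage(64, 128)   # 1/4 resolution
        self.stage3 = stage(128, 256)  # 1/8 resolution
        self.score2 = nn.Conv2d(128, num_classes, 1)
        self.score3 = nn.Conv2d(256, num_classes, 1)
        # deconvolution (transposed convolution) layers for upsampling
        self.up2x = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1)
        self.up4x = nn.ConvTranspose2d(num_classes, num_classes, 8, stride=4, padding=2)

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)           # feature map at 1/4 size
        f3 = self.stage3(f2)           # feature map at 1/8 size
        # upsample the coarse score map and sum it with the finer one
        s = self.up2x(self.score3(f3)) + self.score2(f2)
        return self.up4x(s)            # "first image": input resolution

model = MiniFCN()
loss_fn = nn.CrossEntropyLoss()        # softmax cross entropy per pixel
logits = model(torch.randn(1, 3, 256, 256))
loss = loss_fn(logits, torch.zeros(1, 256, 256, dtype=torch.long))
```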

103: Acquire a detection image and a depth image corresponding to the detection image.

In at least one embodiment of the present application, a camera installed inside or outside the vehicle captures images, and the captured image of the scene in front of the vehicle is used as the detection image.

In at least one embodiment of the present application, acquiring the depth image corresponding to the detection image includes: capturing a depth image with a depth camera installed on the vehicle, using the captured image of the scene in front of the vehicle as the depth image. It should be noted that while the camera installed inside or outside the vehicle captures the detection image, the depth camera simultaneously captures an image in front of the vehicle as the depth image, so the depth image corresponds to the detection image. For example, cameras of different types capture the same object in front of the vehicle to obtain the detection image and the depth image respectively.

It should be noted that the depth camera is an existing depth camera, and the method further includes: obtaining the depth information of the depth image from the depth camera.

104: Determine, according to the object category, the object model corresponding to the object from the three-dimensional object model library.

In at least one embodiment of the present application, determining the object model corresponding to the object from the three-dimensional object model library according to the object category includes: building a three-dimensional object model library that maps object categories to object models, the library including object categories and the object models corresponding to them; and searching the library according to the object category to determine the object model of the object. Figure 3 is a schematic diagram of the three-dimensional object model library provided in an embodiment of the present application. When the object category is a car, the object model of the car is looked up in the library; when the category is a light truck, the object model of the light truck is looked up; when the category is an electric vehicle, the object model of the electric vehicle is looked up; when the category is a large bus, the object model of the large bus is looked up. The lookup can be pictured as in the sketch below.
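A minimal sketch of such a lookup table; the file names and storage format are hypothetical stand-ins, since this application only specifies that the library maps categories to models.

```python
# A hedged sketch of the category-to-model lookup. File names and the
# storage format are hypothetical stand-ins, not prescribed by the text.
MODEL_LIBRARY = {
    "car": "models/car.obj",
    "light truck": "models/light_truck.obj",
    "electric vehicle": "models/electric_vehicle.obj",
    "large bus": "models/large_bus.obj",
}

def find_object_model(category: str) -> str:
    """Return the 3D object model corresponding to a detected category."""
    return MODEL_LIBRARY[category]
```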

In this embodiment of the present application, the object model of an object includes a three-dimensional model of the object.

105: Obtain, from the depth image, the point cloud data and the distance from the depth camera to the object model.

In at least one embodiment of the present application, determining the distance from the depth camera to the object model includes: obtaining the depth value of the object from the depth image, and determining the distance from the depth camera to the object model from the depth value. In this embodiment, the depth value is obtained from the depth camera. Specifically, when the depth camera captures a depth image, it reports a depth value, which is the distance from the depth camera to the object; in this embodiment, that distance is taken as the distance from the depth camera to the object model of the object.

In at least one embodiment of the present application, obtaining the point cloud data includes: obtaining the coordinate set of the object from the depth image, and obtaining the point cloud data of the object from the coordinate set and the intrinsic-extrinsic matrix transformation formula of the depth camera. In this embodiment, the coordinate set of the object is the set of pixel coordinates of the object, and the point cloud data of the object are the world coordinates corresponding to the coordinates in that set; the point cloud data characterize the contour of the object. Converting the coordinates in the coordinate set into the corresponding world coordinates uses the intrinsic-extrinsic matrix transformation formula, which, with the principal point taken at the image origin, reads:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \frac{D}{f}\begin{bmatrix} x_1 \\ y_1 \\ f \end{bmatrix}, \qquad\text{i.e.}\quad x = \frac{x_1 D}{f},\quad y = \frac{y_1 D}{f},\quad z = D,$$

where $(x, y, z)$ is the world coordinate, the point cloud point corresponding to one pixel coordinate; $f$ is the focal length; $D$ is the depth value; and $(x_1, y_1)$ is the pixel coordinate of any pixel of the object within the two-dimensional bounding box. Converting every coordinate in the coordinate set to world coordinates one by one with this formula yields the point cloud data of the object. A minimal sketch of this back-projection appears below.
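The sketch below follows the formula above, assuming the principal point lies at the image origin and that `depth` is a 2D array of depth values; all names are illustrative.

```python
# A minimal sketch of the pixel-to-world conversion described above,
# assuming the principal point is at the image origin. `coords`, `depth`,
# and `f` are illustrative names, not prescribed by this application.
import numpy as np

def depth_to_point_cloud(coords, depth, f):
    """Convert pixel coordinates (x1, y1) with depth D into world
    coordinates (x, y, z) = (x1*D/f, y1*D/f, D), one by one."""
    points = []
    for x1, y1 in coords:              # pixel coordinates of the object
        D = depth[int(y1), int(x1)]    # depth value at that pixel
        points.append((x1 * D / f, y1 * D / f, D))
    return np.asarray(points)
```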

106: Determine the rotation angle of the object model according to the point cloud data and the object model.

In at least one embodiment of the present application, determining the rotation angle of the object model according to the point cloud data and the object model includes: obtaining, from the point cloud data, first point cloud data of the object contour; converting the object model into second point cloud data; matching the first point cloud data against the second point cloud data, fitting the contour points in the first point cloud data to a first plane and computing the curvature of the first plane, and fitting the points of the second point cloud data to a second plane and computing the curvature of the second plane; and computing the difference between the two curvatures to obtain a curvature deviation value, from which the rotation angle of the object model is determined. A sketch of the plane fitting and curvature comparison appears below.
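This application does not fix a particular curvature formula; the sketch below uses the common PCA "surface variation" estimate as one way to realize the plane fitting and curvature comparison. The random clouds are placeholders for the first (contour) and second (model) point cloud data.

```python
# A sketch of the plane-fitting and curvature-deviation step. The PCA
# "surface variation" measure is an assumption standing in for the
# unspecified curvature formula.
import numpy as np

def fit_plane_curvature(points):
    """Fit a plane to the points by PCA; return (plane normal, curvature).
    Curvature is estimated as surface variation l_min / (l0 + l1 + l2)."""
    centered = points - points.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))  # ascending order
    normal = eigvecs[:, 0]                 # direction of least variance
    curvature = eigvals[0] / eigvals.sum()
    return normal, curvature

rng = np.random.default_rng(0)
first_point_cloud = rng.normal(size=(100, 3))    # stand-in contour points
second_point_cloud = rng.normal(size=(100, 3))   # stand-in model points
_, c1 = fit_plane_curvature(first_point_cloud)
_, c2 = fit_plane_curvature(second_point_cloud)
deviation = c1 - c2   # curvature deviation used to set the rotation angle
```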

In at least one embodiment of the present application, converting the object model of the object into the second point cloud data includes: processing the object model with a plurality of functions in the Point Cloud Library (PCL) to generate the point cloud data of the object model as the second point cloud data.
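This application names PCL, a C++ library; as a Python-side illustration of the same conversion, Open3D can sample a mesh model into a point cloud. The mesh file name is a hypothetical stand-in.

```python
# Illustrative only: the application names PCL, but Open3D offers an
# analogous mesh-to-point-cloud conversion in Python. The mesh file is a
# hypothetical stand-in.
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("models/car.obj")
second_point_cloud = mesh.sample_points_uniformly(number_of_points=2048)
```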

107: Determine the position of the object in three-dimensional space.

In at least one embodiment of the present application, the orientation of the object model in three-dimensional space is determined from the rotation angle, and the position of the object model in three-dimensional space is determined from that orientation, the distance from the depth camera to the object model, and the position of the object. Specifically, the position of the object model in three-dimensional space is taken as the position of the object in three-dimensional space, and the object category and the position of the object in three-dimensional space are output, for example by displaying them on a display screen. A sketch of this placement step follows.
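A hedged sketch of the placement step, treating the rotation angle as a yaw rotation (an assumption; this application does not fix the rotation axis) and combining it with the back-projected image position and the camera-to-model distance.

```python
# A sketch of placing the object model in 3D space. Treating the rotation
# angle as yaw is an assumption; the text only states that the angle fixes
# the model's orientation. All names are illustrative.
import numpy as np

def place_model(yaw, distance, x1, y1, f):
    """Return a 4x4 pose matrix for the object model in 3D space."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[  c, 0.0,   s],
                  [0.0, 1.0, 0.0],
                  [ -s, 0.0,   c]])           # orientation from the angle
    t = np.array([x1 * distance / f,          # back-projected position
                  y1 * distance / f,
                  distance])                  # camera-to-model distance
    pose = np.eye(4)
    pose[:3, :3] = R
    pose[:3, 3] = t
    return pose
```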

The above are only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Those of ordinary skill in the art may make improvements without departing from the inventive concept of the present application, and these all fall within the scope of protection of the present application.

As shown in Figure 4, a schematic structural diagram of an electronic device provided in an embodiment of the present application. The electronic device 4 includes a memory 401, at least one processor 402, a computer program 403 stored in the memory 401 and executable on the at least one processor 402, and at least one communication bus 404.

Those skilled in the art will understand that the schematic diagram shown in Figure 4 is only an example of the electronic device 4 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the electronic device 4 may also include input/output devices, network access devices, and so on.

The at least one processor 402 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The at least one processor 402 may be a microprocessor or any conventional processor; it is the control center of the electronic device 4 and connects all parts of the device through various interfaces and lines.

The memory 401 may be used to store the computer program 403. The at least one processor 402 implements the various functions of the electronic device 4 by running or executing the computer program 403 stored in the memory 401 and calling the data stored in it. The memory 401 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the applications required for at least one function (such as sound playback and image playback), and the data storage area may store data created through the use of the electronic device 4 (such as audio data). In addition, the memory 401 may include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

If the modules/units integrated in the electronic device 4 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes of the above method embodiments by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, or read-only memory (ROM).

It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive, and the scope of the present application is defined by the appended claims rather than by the above description; all changes falling within the meaning and scope of equivalents of the claims are therefore intended to be embraced by the present application. No reference sign in a claim should be construed as limiting that claim.

Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present application. Although the present application has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may be modified or equivalently substituted without departing from its spirit and scope.


Claims (9)

1. A three-dimensional object detection method applied to an electronic device, the method comprising: acquiring training images; constructing a semantic segmentation model based on a fully convolutional network; inputting the training images into the semantic segmentation model and performing multiple convolutions and pooling operations with the convolutional and pooling layers of the model to obtain a plurality of feature maps of different sizes; upsampling the plurality of feature maps of different sizes to obtain a first image of the same size as the training image, performing pixel classification on the first image and optimizing the classification loss, and outputting the object category and position of each object in the training image to obtain a trained semantic segmentation model, wherein upsampling the plurality of feature maps of different sizes to obtain the first image of the same size as the training image comprises: after upsampling the plurality of feature maps of different sizes, performing a deconvolution operation to obtain operation results, and summing the operation results to obtain the first image of the same size as the training image; acquiring a detection image and a depth image corresponding to the detection image, the depth image being captured by a depth camera; inputting the detection image into the trained semantic segmentation model to obtain the object category and object position of an object in the detection image; determining, according to the object category, the object model corresponding to the object from a three-dimensional object model library; obtaining, according to the depth image, point cloud data of the object and the distance from the depth camera to the object model; determining a rotation angle of the object model according to the point cloud data and the object model; and determining the position of the object model in three-dimensional space according to the distance from the depth camera to the object model, the rotation angle, and the position of the object.

2. The three-dimensional object detection method of claim 1, wherein determining the object model corresponding to the object from the three-dimensional object model library according to the object category comprises: searching the three-dimensional object model library according to the object category to determine the object model of the object, the three-dimensional object model library comprising object categories and the object models corresponding to the object categories.
3. The three-dimensional object detection method of claim 1, wherein obtaining the point cloud data of the object and the distance from the depth camera to the object model according to the depth image comprises: obtaining the depth value and coordinates of the object from the depth image, and determining the distance from the depth camera to the object model from the depth value; and obtaining the point cloud data from the coordinates and the intrinsic-extrinsic matrix transformation formula of the depth camera.

4. The three-dimensional object detection method of claim 1, wherein determining the rotation angle of the object model according to the point cloud data and the object model comprises: obtaining, from the point cloud data, first point cloud data of the object contour; converting the object model into second point cloud data; matching the first point cloud data against the second point cloud data, fitting the contour points in the first point cloud data to a first plane and computing the curvature of the first plane, and fitting the points of the second point cloud data to a second plane and computing the curvature of the second plane; and computing the difference between the curvature of the first plane and the curvature of the second plane to obtain a curvature deviation value, and determining the rotation angle of the object model from the curvature deviation value.

5. The three-dimensional object detection method of claim 4, wherein converting the object model into the second point cloud data comprises: processing the object model of the object with a plurality of functions in a point cloud library to generate the second point cloud data.

6. The three-dimensional object detection method of any one of claims 1 to 5, wherein determining the position of the object model in three-dimensional space according to the distance from the depth camera to the object model, the rotation angle, and the position of the object comprises: determining the orientation of the object model in three-dimensional space from the rotation angle; and determining the position of the object model in three-dimensional space from the orientation of the object model in three-dimensional space, the distance from the depth camera to the object model, and the position of the object.

7. The three-dimensional object detection method of claim 6, further comprising: taking the position of the object model of the object in three-dimensional space as the position of the object in three-dimensional space, and outputting the object category and the position of the object in three-dimensional space.
8. An electronic device, comprising a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the three-dimensional object detection method of any one of claims 1 to 7.

9. A computer-readable storage medium storing at least one instruction which, when executed by a processor, implements the three-dimensional object detection method of any one of claims 1 to 7.
TW111120335A 2022-05-31 2022-05-31 Method for detecting three-dimensional object, electronic device and storage medium TWI855330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111120335A TWI855330B (en) 2022-05-31 2022-05-31 Method for detecting three-dimensional object, electronic device and storage medium

Publications (2)

Publication Number Publication Date
TW202349264A TW202349264A (en) 2023-12-16
TWI855330B true TWI855330B (en) 2024-09-11

Family

ID=90039181

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111120335A TWI855330B (en) 2022-05-31 2022-05-31 Method for detecting three-dimensional object, electronic device and storage medium

Country Status (1)

Country Link
TW (1) TWI855330B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201926249A (en) * 2017-11-30 2019-07-01 國家中山科學研究院 Optical radar pedestrian detection method capable of enhancing image data obtained by an optical radar to distinguish distant pedestrians and environmental blocks and improve the pedestrian recognition capability
US20200160033A1 (en) * 2018-11-15 2020-05-21 Toyota Research Institute, Inc. System and method for lifting 3d representations from monocular images
CN114078246A (en) * 2020-08-11 2022-02-22 华为技术有限公司 Method and device for determining three-dimensional information of detection object
CN114495038A (en) * 2022-01-12 2022-05-13 九识(苏州)智能科技有限公司 A post-processing method for automatic driving detection and labeling data

Also Published As

Publication number Publication date
TW202349264A (en) 2023-12-16
