TW202301276A - Depth detection method and device, electronic equipment and storage medium - Google Patents

Depth detection method and device, electronic equipment and storage medium

Info

Publication number
TW202301276A
Authority
TW
Taiwan
Prior art keywords
frame
target object
key point
detected
point detection
Prior art date
Application number
TW111122249A
Other languages
Chinese (zh)
Inventor
趙佳
謝符寶
劉文韜
錢晨
Original Assignee
大陸商北京市商湯科技開發有限公司
Priority date
Filing date
Publication date
Application filed by 大陸商北京市商湯科技開發有限公司
Publication of TW202301276A

Classifications

    • G06T7/50: Image analysis; depth or shape recovery
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06T7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T2207/10016: Image acquisition modality; video; image sequence
    • G06T2207/20132: Special algorithmic details; image segmentation details; image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention relates to a depth detection method and device, electronic equipment and a storage medium. The method comprises: acquiring multiple frames to be detected, wherein the multiple frames to be detected include image frames obtained by capturing images of a target object from at least two acquisition viewing angles; performing key point detection on a target region of the target object according to the frames to be detected, and determining multiple key point detection results corresponding to the multiple frames to be detected, wherein the target region includes a head region and/or a shoulder region; and determining depth information of the target object according to the multiple key point detection results.

Description

Depth detection method and device, electronic equipment, storage medium and program product

The present invention claims priority to the Chinese patent application filed with the China Patent Office on June 28, 2021, with application number 202110721270.1 and entitled "Depth detection method and device, electronic equipment and storage medium", the entire contents of which are incorporated herein by reference.

The present invention relates to the field of computer technology, and in particular to a depth detection method and device, electronic equipment, a storage medium and a program product.

Depth information can reflect the distance of a human body in an image relative to the image acquisition device, and based on the depth information, the human body in the image can be located in space. A binocular camera is a common and widely used image acquisition device. Based on at least two images captured by a binocular camera, the depth information of the human body in the images can be determined through matching between the images. However, such inter-image matching is computationally complex and its accuracy is easily affected. How to determine the depth information of a human body in an image conveniently and accurately has therefore become an urgent problem to be solved.

The present invention proposes a technical solution for depth detection.

According to an aspect of the present invention, a depth detection method is provided, comprising:

acquiring multiple frames to be detected, wherein the multiple frames to be detected include image frames obtained by capturing images of a target object from at least two acquisition viewing angles; performing key point detection on a target region of the target object according to the frames to be detected, and determining multiple key point detection results corresponding to the multiple frames to be detected, wherein the target region includes a head region and/or a shoulder region; and determining depth information of the target object according to the multiple key point detection results.
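
As a rough illustration only, the following Python/NumPy sketch shows how the three claimed steps could fit together; the function names `detect_keypoints` and `triangulate` are hypothetical placeholders (they stand for the key point detection and parallax-based computation discussed later), not parts of the actual implementation.

```python
import numpy as np

def depth_from_two_views(frame_left, frame_right, calib, detect_keypoints, triangulate):
    """Skeleton of the claimed flow: two views -> head/shoulder key points -> depth.

    `detect_keypoints` and `triangulate` are placeholders for the key point
    detection model and the parallax-based depth computation described below.
    """
    # Step S11: the frames to be detected, captured from two acquisition viewing angles.
    frames = [frame_left, frame_right]

    # Step S12: one key point detection result per frame (e.g. head, left-shoulder
    # and right-shoulder positions in pixel coordinates).
    keypoint_results = [detect_keypoints(f) for f in frames]

    # Step S13: combine the per-view results with the preset device parameters
    # (intrinsics / extrinsics in `calib`) to obtain the depth information.
    return triangulate(keypoint_results, calib)
```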

In a possible implementation, determining the depth information of the target object according to the multiple key point detection results includes: acquiring at least two preset device parameters respectively corresponding to at least two acquisition devices, where the at least two acquisition devices are used to capture images of the target object from at least two acquisition viewing angles; and determining the depth information of the target object in the frames to be detected according to the at least two preset device parameters and the multiple key point detection results.

In a possible implementation, the depth information includes a depth distance, where the depth distance includes the distance between the target object and the optical center of an acquisition device; and determining the depth information of the target object in the frames to be detected according to the at least two preset device parameters and the multiple key point detection results includes: obtaining the depth distance according to preset external parameters among the at least two preset device parameters and the coordinates of the multiple key point detection results in at least two forms, where the preset external parameters include relative parameters formed between the at least two acquisition devices.
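
A standard way to realize "depth from corresponding key point coordinates in two views plus camera parameters" is two-view linear triangulation. The sketch below is a generic DLT triangulation that additionally assumes projection matrices built from intrinsics and extrinsics are available; it is an illustrative stand-in, not the specific formulas (1) and (2) referenced later in this description.

```python
import numpy as np

def triangulate_point(P_left, P_right, uv_left, uv_right):
    """Linear (DLT) triangulation of one corresponding key point.

    P_left/P_right: 3x4 projection matrices K @ [R | t] of the two cameras.
    uv_left/uv_right: the same key point (e.g. the head key point) in pixel
    coordinates in the left and right frames to be detected.
    Returns the 3D point in the coordinate system the projection matrices are
    expressed in (the reference camera if P_left = K @ [I | 0]).
    """
    def rows(P, uv):
        u, v = uv
        return np.stack([u * P[2] - P[0], v * P[2] - P[1]])

    A = np.vstack([rows(P_left, uv_left), rows(P_right, uv_right)])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize

# The depth distance can then be taken as the distance from the triangulated
# key point to the reference camera's optical centre (at the origin):
# depth_distance = np.linalg.norm(triangulate_point(P_l, P_r, uv_l, uv_r))
```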

In a possible implementation, the depth information includes an offset angle, where the offset angle includes the spatial angle of the target object relative to the optical axis of the acquisition device; and determining the depth information of the target object in the frames to be detected according to the at least two preset device parameters and the multiple key point detection results includes: obtaining the offset angle according to preset internal parameters among the at least two preset device parameters and the coordinates of the multiple key point detection results in at least two forms, where the preset internal parameters include the device parameters respectively corresponding to the at least two devices.
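
One common way to obtain the angle of a pixel ray relative to the optical axis from the internal parameters is shown below; this assumes a simple pinhole model with focal lengths fx, fy and principal point (cx, cy), and is a generic illustration rather than the patent's exact definition of the offset angle.

```python
import numpy as np

def offset_angles(uv, fx, fy, cx, cy):
    """Horizontal and vertical angles of a pixel ray relative to the optical axis.

    uv: pixel coordinates of the key point (e.g. the head-and-shoulders centre).
    fx, fy, cx, cy: focal lengths and principal point from the intrinsic matrix.
    """
    u, v = uv
    yaw = np.arctan2(u - cx, fx)    # left/right offset from the optical axis
    pitch = np.arctan2(v - cy, fy)  # up/down offset from the optical axis
    return yaw, pitch
```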

In a possible implementation, performing key point detection on the target region of the target object according to the frame to be detected includes: performing key point detection on the target region of the target object in the frame to be detected according to position information of the target object in a reference frame, to obtain a key point detection result corresponding to the frame to be detected, where the reference frame is a video frame that precedes the frame to be detected in the target video to which the frame to be detected belongs.

In a possible implementation, performing key point detection on the target region of the target object in the frame to be detected according to the position information of the target object in the reference frame, to obtain the key point detection result corresponding to the frame to be detected, includes: cropping the frame to be detected according to a first position of the target object in the reference frame to obtain a cropping result; and performing key point detection on the target region of the target object in the cropping result to obtain the key point detection result corresponding to the frame to be detected.

In a possible implementation, performing key point detection on the target region of the target object in the frame to be detected according to the position information of the target object in the reference frame, to obtain the key point detection result corresponding to the frame to be detected, includes: acquiring a second position of the target region of the target object in the reference frame; cropping the frame to be detected according to the second position to obtain a cropping result; and performing key point detection on the target region of the target object in the cropping result to obtain the key point detection result corresponding to the frame to be detected.

In a possible implementation, acquiring the second position of the target region of the target object in the reference frame includes: recognizing the target region in the reference frame through a first neural network to obtain the second position output by the first neural network; and/or obtaining the second position of the target region in the reference frame according to the key point detection result corresponding to the reference frame.

In a possible implementation, the method further includes: determining the position of the target object in three-dimensional space according to the depth information of the target object.
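
Given a depth value and the pixel at which a key point was detected, a 3D position in the camera coordinate system can be recovered by back-projection, for example as in the hedged sketch below (pinhole model with intrinsic matrix K assumed; the depth here is taken along the optical axis).

```python
import numpy as np

def backproject(uv, depth_z, K):
    """Recover a 3D point in camera coordinates from a pixel and its depth.

    uv: pixel coordinates of the key point; depth_z: depth along the optical
    axis; K: 3x3 intrinsic matrix of the acquisition device.
    """
    uv_h = np.array([uv[0], uv[1], 1.0])   # homogeneous pixel coordinate
    ray = np.linalg.inv(K) @ uv_h          # ray direction on the normalized plane
    return depth_z * ray                   # [X, Y, Z] with Z == depth_z
```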

According to an aspect of the present invention, a depth detection device is provided, comprising:

an acquisition module, configured to acquire multiple frames to be detected, wherein the multiple frames to be detected include image frames obtained by capturing images of a target object from at least two acquisition viewing angles; a key point detection module, configured to perform key point detection on a target region of the target object according to the frames to be detected and determine multiple key point detection results corresponding to the multiple frames to be detected, wherein the target region includes a head region and/or a shoulder region; and a depth detection module, configured to determine depth information of the target object according to the multiple key point detection results.

In a possible implementation, the depth detection module is configured to: acquire at least two preset device parameters respectively corresponding to at least two acquisition devices, where the at least two acquisition devices are used to capture images of the target object from at least two acquisition viewing angles; and determine the depth information of the target object in the frames to be detected according to the at least two preset device parameters and the multiple key point detection results.

In a possible implementation, the depth information includes a depth distance, where the depth distance includes the distance between the target object and the optical center of an acquisition device; and the depth detection module is further configured to: obtain the depth distance according to preset external parameters among the at least two preset device parameters and the coordinates of the multiple key point detection results in at least two forms, where the preset external parameters include relative parameters formed between the at least two acquisition devices.

In a possible implementation, the depth information includes an offset angle, where the offset angle includes the spatial angle of the target object relative to the optical axis of the acquisition device; and the depth detection module is further configured to: obtain the offset angle according to preset internal parameters among the at least two preset device parameters and the coordinates of the multiple key point detection results in at least two forms, where the preset internal parameters include the device parameters respectively corresponding to the at least two devices.

In a possible implementation, the key point detection module is configured to: perform key point detection on the target region of the target object in the frame to be detected according to position information of the target object in a reference frame, to obtain a key point detection result corresponding to the frame to be detected, where the reference frame is a video frame that precedes the frame to be detected in the target video to which the frame to be detected belongs.

In a possible implementation, the key point detection module is further configured to: crop the frame to be detected according to a first position of the target object in the reference frame to obtain a cropping result; and perform key point detection on the target region of the target object in the cropping result to obtain the key point detection result corresponding to the frame to be detected.

In a possible implementation, the key point detection module is further configured to: acquire a second position of the target region of the target object in the reference frame; crop the frame to be detected according to the second position to obtain a cropping result; and perform key point detection on the target region of the target object in the cropping result to obtain the key point detection result corresponding to the frame to be detected.

In a possible implementation, the key point detection module is further configured to: recognize the target region in the reference frame through a first neural network to obtain the second position output by the first neural network; and/or obtain the second position of the target region in the reference frame according to the key point detection result corresponding to the reference frame.

In a possible implementation, the device is further configured to: determine the position of the target object in three-dimensional space according to the depth information of the target object.

According to an aspect of the present invention, an electronic device is provided, comprising: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.

According to an aspect of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored, wherein the above method is implemented when the computer program instructions are executed by a processor.

According to an aspect of the present invention, a computer program product is provided, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device performs the above method.

In the embodiments of the present invention, multiple frames to be detected captured from at least two acquisition viewing angles are acquired, key point detection of the target region is performed according to the frames to be detected, multiple key point detection results corresponding to the multiple frames to be detected are determined, and the depth information of the target object is determined based on the multiple key point detection results. Through the embodiments of the present invention, the parallax formed by the multiple frames to be detected captured from at least two acquisition viewing angles can be exploited, and the multiple key point detection results corresponding to the target region in the frames to be detected can be used to perform the parallax-based computation to obtain the depth information, which effectively reduces the amount of data processed in the parallax-based computation and improves the efficiency and accuracy of depth detection.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present invention. Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.

Various exemplary embodiments, features and aspects of the present invention will be described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.

The word "exemplary" as used herein means "serving as an example, embodiment or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.

The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B and C may mean including any one or more elements selected from the set consisting of A, B and C.

In addition, in order to better illustrate the present invention, numerous specific details are given in the following detailed description. Those skilled in the art should understand that the present invention can also be implemented without certain specific details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail in order to highlight the gist of the present invention.

Fig. 1 shows a flowchart of a depth detection method according to an embodiment of the present invention. The method may be performed by a depth detection device, which may be an electronic device such as a terminal device or a server; the terminal device may be a user equipment (UE), a mobile device, a user terminal, a terminal, a mobile phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device or the like. In some possible implementations, the method may be implemented by a processor invoking computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server. As shown in Fig. 1, the method may include:

Step S11: acquiring multiple frames to be detected, wherein the multiple frames to be detected include image frames obtained by capturing images of the target object from at least two acquisition viewing angles.

The frame to be detected may be any image frame for which depth detection is required, for example an image frame extracted from a captured video, or an image frame obtained by taking a picture. The number of frames to be detected is not limited in the embodiments of the present invention and may be two or more.

The acquisition viewing angle may be the angle at which images of the target object are captured. Different frames to be detected may be captured by image acquisition devices set at different acquisition viewing angles, or may be captured by the same image acquisition device at different acquisition viewing angles.

The frame to be detected contains the target object to be subjected to depth detection. The type of the target object is not limited in the embodiments of the present invention and may include various human objects, animal objects, or some mechanical objects such as robots. The subsequent disclosed embodiments are described by taking a human object as the target object as an example; implementations in which the target object is of another type can be flexibly extended with reference to the subsequent disclosed embodiments and are not described one by one.

The number of target objects contained in the frame to be detected is likewise not limited in the embodiments of the present invention; one or more target objects may be included, which can be flexibly determined according to the actual situation.

The manner of acquiring the multiple frames to be detected is also not limited in the embodiments of the present invention. In a possible implementation, frames may be extracted from one or more videos to obtain the multiple frames to be detected, where frame extraction may include one or more of frame-by-frame extraction, frame sampling at a certain interval, or random frame sampling. In a possible implementation, images of the target object may also be captured from multiple angles to obtain the multiple frames to be detected. In some possible implementations, the multiple frames to be detected at different acquisition viewing angles may also be read from a database.
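
As one hedged example of the "frame sampling at a certain interval" option, frames could be pulled from a video with OpenCV as follows; the interval of 5 frames is an arbitrary illustrative value, not something specified by this description.

```python
import cv2

def sample_frames(video_path, interval=5):
    """Extract every `interval`-th frame from a video as frames to be detected."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:                  # end of video
            break
        if index % interval == 0:   # frame sampling at a fixed interval
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```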

Step S12: performing key point detection on the target region of the target object according to the frames to be detected, and determining multiple key point detection results corresponding to the multiple frames to be detected.

The key point detection result may include the positions of the detected key points in the frame to be detected. The number and types of detected key points can be flexibly determined according to the actual situation; in some possible implementations, the number of detected key points may range from 2 to 150. In one example, the detected key points may include 14 limb key points of the human body (such as head, shoulder, neck, elbow, wrist, crotch, leg and foot key points), or 59 contour key points on the outer contour of the human body (such as key points on the periphery of the head or the shoulders). In a possible implementation, in order to reduce the amount of computation, the detected key points may also include only three key points: a head key point, a left-shoulder key point and a right-shoulder key point.

The multiple key point detection results may respectively correspond to the multiple frames to be detected. For example, if key point detection is performed on each of the multiple frames to be detected, each frame to be detected may correspond to one key point detection result, so that multiple key point detection results are obtained.

The target region may include a head region and/or a shoulder region. The head region of the target object may be the region where the head of the target object is located, such as the region formed between the head key point and the neck key point; the shoulder region may be the region where the shoulders and neck of the target object are located, such as the region formed between the neck key point and the shoulder key points.

Fig. 2 shows a schematic diagram of a target region according to an embodiment of the present invention. As shown in Fig. 2, in a possible implementation, when the target region includes the head region and the shoulder region, the head-and-shoulders box formed by connecting the head key point, the left-shoulder key point and the right-shoulder key point may be used as the target region. In one example, the head-and-shoulders box may be a rectangle as shown in Fig. 2; as can be seen from Fig. 2, the head-and-shoulders box can be obtained by connecting the head key point at the top of the target object's head, the left-shoulder key point at the left shoulder joint and the right-shoulder key point at the right shoulder joint. In one example, the head-and-shoulders box may also have other shapes, such as a polygon, a circle or other irregular shapes.
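
A minimal sketch of deriving a rectangular head-and-shoulders box from the three key points is given below; the margin factor that widens the box so it fully encloses the head and shoulders is an illustrative assumption, not a value specified by this description.

```python
def head_shoulder_box(head, left_shoulder, right_shoulder, margin=0.1):
    """Axis-aligned rectangle covering the head and shoulder key points.

    Each key point is an (x, y) pixel coordinate; `margin` widens the box by a
    fraction of its size so the head and shoulders are fully enclosed.
    """
    xs = [head[0], left_shoulder[0], right_shoulder[0]]
    ys = [head[1], left_shoulder[1], right_shoulder[1]]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    dx, dy = margin * (x1 - x0), margin * (y1 - y0)
    return (x0 - dx, y0 - dy, x1 + dx, y1 + dy)   # (left, top, right, bottom)
```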

The manner of key point detection can be flexibly determined according to the actual situation. In a possible implementation, the frame to be detected may be input into any neural network with a key point detection function to perform key point detection. In some possible implementations, key point recognition may also be performed on the frame to be detected through a related key point recognition algorithm to obtain the key point detection result. In some possible implementations, key point detection may also be performed on part of the image region of the frame to be detected according to the position, in the frame to be detected, of the target object or of the target region of the target object, so as to obtain the key point detection result. Some possible specific implementations of step S12 are described in detail in the following disclosed embodiments and are not expanded here.

Step S13: determining the depth information of the target object in the frame to be detected according to the multiple key point detection results.

The content contained in the depth information can be flexibly determined according to the actual situation; any information that can reflect the depth of the target object in three-dimensional space can serve as an implementation of the depth information. In a possible implementation, the depth information may include a depth distance and/or an offset angle.

The depth distance may be the distance between the target object and the acquisition device, and the acquisition device may be any device that captures images of the target object. In some possible implementations, the acquisition device may be a device for capturing still images, such as a camera; in some possible implementations, the acquisition device may also be a device for capturing dynamic images, such as a video camera or a camera lens.

As described in the above disclosed embodiments, different frames to be detected may be captured by image acquisition devices set at different acquisition viewing angles, or by the same image acquisition device at different acquisition viewing angles; therefore, the number of acquisition devices may be one or more. In a possible implementation, the depth detection method proposed in the embodiments of the present invention may be implemented based on at least two acquisition devices; in this case, the at least two acquisition devices may capture images of the target object from at least two acquisition viewing angles to obtain the multiple frames to be detected.

When at least two acquisition devices are included, the types of the different acquisition devices may be the same or different and can be flexibly selected according to the actual situation, which is not limited in the embodiments of the present invention.

The depth distance may be the distance between the target object and the acquisition device; this distance may be the distance between the target object and the acquisition device as a whole, or the distance between the target object and a certain component of the acquisition device. In some possible implementations, the distance between the target object and the optical center of the acquisition device may be taken as the depth distance.

The offset angle may be the offset angle of the target object relative to the acquisition device. In a possible implementation, the offset angle may be the spatial angle of the target object relative to the optical axis of the acquisition device.

Since the multiple key point detection results can correspond to the multiple frames to be detected, and the multiple frames to be detected can be obtained by capturing images of the target object from at least two acquisition viewing angles, the parallax formed between the multiple frames to be detected can be determined based on the multiple key point detection results, and then the parallax-based depth computation can be performed to obtain the depth information of the target object. The parallax-based computation based on the key point detection results can be flexibly determined according to the actual situation; any method of depth ranging based on parallax can be used in the implementation of step S13, as described in detail in the following disclosed embodiments and not expanded here.

In the embodiments of the present invention, multiple frames to be detected captured from at least two acquisition viewing angles are acquired, key point detection of the target region is performed according to the frames to be detected, multiple key point detection results corresponding to the multiple frames to be detected are determined, and the depth information of the target object is determined based on the multiple key point detection results. Through the embodiments of the present invention, the parallax formed by the multiple frames to be detected captured from at least two acquisition viewing angles can be exploited, and the multiple key point detection results corresponding to the target region in the frames to be detected can be used to perform the parallax-based computation to obtain the depth information, which effectively reduces the amount of data processed in the parallax-based computation and improves the efficiency and accuracy of depth detection.

In a possible implementation, step S12 may include:

performing key point detection on the target region of the target object in the frame to be detected according to the position information of the target object in a reference frame, to obtain a key point detection result corresponding to the frame to be detected.

The reference frame may be a video frame located before the frame to be detected in the target video, and the target video may be a video containing the frame to be detected. In some possible implementations, different frames to be detected may belong to different target videos; in this case, the reference frames corresponding to different frames to be detected may also differ.

In some possible implementations, the reference frame may be the frame immediately preceding the frame to be detected in the target video. In some possible implementations, the reference frame may also be a video frame in the target video that is located before the frame to be detected and whose distance from the frame to be detected does not exceed a preset distance; the preset distance can be flexibly determined according to the actual situation, for example an interval of one or more frames, which is not limited in the embodiments of the present invention.

Since the reference frame is located before the frame to be detected and its distance from the frame to be detected does not exceed the preset distance, the position of the target object in the reference frame may be relatively close to the position of the target object in the frame to be detected. In this case, the position information of the target object in the frame to be detected can be roughly determined from the position information of the target object in the reference frame, so that more targeted key point detection can be performed on the target region of the target object in the frame to be detected with a smaller amount of data to process, which yields a more accurate key point detection result and improves the efficiency of key point detection.

In some possible implementations, the way of performing key point detection on the target region of the target object in the frame to be detected according to the position information of the target object in the reference frame can be flexibly determined according to the actual situation. For example, the frame to be detected may be cropped according to the position information of the target object in the reference frame before key point detection is performed, or key point detection may be performed directly on the image region at the corresponding position in the frame to be detected according to the position information of the target object in the reference frame. The various possible implementations are described in detail in the following disclosed embodiments and are not expanded here.

Through the embodiments of the present invention, more targeted key point detection can be performed on the target region in the frame to be detected according to the position information of the target object in the reference frame, which improves the efficiency and accuracy of key point detection and thus the efficiency and accuracy of the depth detection method.

In a possible implementation, performing key point detection on the target region of the target object in the frame to be detected according to the position information of the target object in the reference frame, to obtain the key point detection result corresponding to the frame to be detected, includes:

cropping the frame to be detected according to a first position of the target object in the reference frame to obtain a cropping result; and

performing key point detection on the target region of the target object in the cropping result to obtain the key point detection result corresponding to the frame to be detected.

The first position may be the position coordinates of the target object as a whole in the reference frame. For example, when the target object is a human object, the first position may be the position coordinates of the target object's human body box in the reference frame.

The manner of cropping the frame to be detected according to the first position is likewise not limited in the embodiments of the present invention and is not restricted to the following disclosed embodiments. In a possible implementation, the first coordinates of the human body box in the reference frame may be determined according to the first position, and, combined with the position coordinate correspondence between the reference frame and the frame to be detected, the second coordinates of the human body box of the target object in the frame to be detected may be determined; the frame to be detected is then cropped based on the second coordinates to obtain the cropping result.

In some possible implementations, the first coordinates of the human body box in the reference frame and the border length of the human body box may also be determined according to the first position, and, combined with the position coordinate correspondence between the reference frame and the frame to be detected, the second coordinates of the human body box of the target object in the frame to be detected may be determined; the frame to be detected is then cropped based on the second coordinates and the border length to obtain the cropping result. Cropping based on the second coordinates and the border length may determine the positions of the cropping endpoints according to the second coordinates and determine the length of the cropping result according to the border length. In one example, the length of the cropping result may be the same as the border length; in another example, the length of the cropping result may be proportional to the border length, for example N times the border length, where N may be any value not less than 1.
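
The cropping described here could look like the following sketch, where `box` is the body (or head-and-shoulders) box carried over from the reference frame and `n` plays the role of the enlargement factor N (any value not less than 1); clamping to the image size is added for robustness and is an assumption of this sketch.

```python
def crop_around_box(frame, box, n=1.5):
    """Crop the frame to be detected around a box taken from the reference frame.

    frame: H x W x C image array; box: (left, top, right, bottom) in pixels;
    n: side length of the crop as a multiple of the box size (N >= 1).
    """
    h, w = frame.shape[:2]
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0   # box centre
    half_w = n * (box[2] - box[0]) / 2.0
    half_h = n * (box[3] - box[1]) / 2.0
    x0, x1 = max(0, int(cx - half_w)), min(w, int(cx + half_w))
    y0, y1 = max(0, int(cy - half_h)), min(h, int(cy + half_h))
    return frame[y0:y1, x0:x1]
```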

The manner of performing key point detection on the target object in the cropping result can be flexibly determined according to the actual situation, as described in detail in the following disclosed embodiments and not expanded here.

Through the embodiments of the present invention, the target object in the frame to be detected can be roughly located according to the first position of the target object in the reference frame to obtain the cropping result, and key point detection of the target region is performed based on the cropping result. On the one hand, this reduces the amount of data to be processed and improves detection efficiency; on the other hand, since the target object occupies a larger proportion of the cropping result after cropping, the accuracy of key point detection can be improved.

In a possible implementation, performing key point detection on the target region of the target object in the frame to be detected according to the position information of the target object in the reference frame, to obtain the key point detection result corresponding to the frame to be detected, includes:

acquiring a second position of the target region of the target object in the reference frame;

cropping the frame to be detected according to the second position to obtain a cropping result; and

performing key point detection on the target object in the cropping result to obtain the key point detection result.

The second position may be the position coordinates of the target region of the target object in the reference frame. As described in the above disclosed embodiments, the target region may include the head region and/or the shoulder region, so in a possible implementation the second position may be the position coordinates of the head-and-shoulders box of the target object in the reference frame.

The manner of determining the second position of the target region in the reference frame can be flexibly decided according to the actual situation, for example by recognizing the head-and-shoulders box and/or key points in the reference frame, as described in detail in the following disclosed embodiments and not expanded here.

For the manner of cropping the frame to be detected according to the second position, reference may be made to the manner of cropping the frame to be detected according to the first position, which is not repeated here.

The manner of performing key point detection on the target object in the cropping result may be the same as or different from the manner of performing key point detection on the cropping result obtained according to the first position, as described in detail in the following disclosed embodiments and not expanded here.

Through the embodiments of the present invention, the key point detection result can be obtained according to the second position of the target region of the target object in the reference frame. In this way, the target region can be focused on in a more targeted manner, which further reduces the amount of data to be processed and further improves the accuracy and efficiency of depth detection.

In a possible implementation, acquiring the second position of the target region of the target object in the reference frame may include:

recognizing the target region in the reference frame through a first neural network to obtain the second position output by the first neural network; and/or,

obtaining the second position of the target region in the reference frame according to the key point detection result corresponding to the reference frame.

The first neural network may be any network used to determine the second position, and its implementation form is not limited in the embodiments of the present invention. In some possible implementations, the first neural network may be a target region detection network used to recognize the second position of the target region directly from the reference frame; in one example, the target region detection network may be a Faster Region-based Convolutional Neural Network (Faster RCNN). In some possible implementations, the first neural network may also be a key point detection network used to recognize one or more key points in the reference frame, with the second position of the target region in the reference frame then determined according to the positions of the recognized key points.

In some possible implementations, the reference frame may itself have been used as a frame to be detected for depth detection; in this case, the reference frame may already have undergone key point detection and a corresponding key point detection result may have been obtained. Therefore, in some possible implementations, the second position of the target region in the reference frame can be obtained according to the key point detection result corresponding to the reference frame.

In some possible implementations, key point detection may also be performed directly on the reference frame to obtain the key point detection result; for the manner of key point detection, reference may be made to the other disclosed embodiments, which is not repeated here.

Through the embodiments of the present invention, the second position of the target region in the reference frame can be flexibly determined in multiple ways according to the actual situation of the reference frame, which improves the flexibility and versatility of depth detection. Moreover, in some possible implementations, when a reference frame located before the frame to be detected has participated in depth detection, the second position can be determined directly from the intermediate result obtained for the reference frame during depth detection, thereby reducing repeated computation and improving the efficiency and accuracy of depth detection.

In a possible implementation, performing key point detection on the target object in the cropping result to obtain the key point detection result may include:

performing key point detection on the target object in the cropping result through a second neural network to obtain the key point detection result.

The second neural network may be any neural network used to perform key point detection, and its implementation is not limited in the embodiments of the present invention. When the first neural network is a key point detection network, the second neural network may be implemented in the same way as the first neural network or in a different way.

In some possible implementations, key point detection may also be performed on the target object in the cropping result through a related key point recognition algorithm; which key point recognition algorithm is used is likewise not limited in the embodiments of the present invention.

Fig. 3 shows a flowchart of a depth detection method according to an embodiment of the present invention. As shown in Fig. 3, in a possible implementation, step S13 may include:

Step S131: acquiring at least two preset device parameters respectively corresponding to at least two acquisition devices, where the at least two acquisition devices are used to capture images of the target object from at least two acquisition viewing angles.

Step S132: determining the depth information of the target object in the frame to be detected according to the at least two preset device parameters and the multiple key point detection results.

For the implementation of the acquisition devices, reference may be made to the above disclosed embodiments, which is not repeated here.

In some possible implementations, the at least two preset device parameters may include preset internal parameters respectively corresponding to the at least two acquisition devices. The preset internal parameters may be calibration parameters of the acquisition device itself, and the types of parameters they contain can be flexibly determined according to the actual situation of the acquisition device. In some possible implementations, the preset internal parameters may include an intrinsic parameter matrix of the acquisition device, which may contain one or more focal length parameters of the camera and the principal point position of one or more cameras.

In some possible implementations, since at least two acquisition devices may be included, the at least two preset device parameters may also include preset external parameters, where the preset external parameters may be relative parameters formed between different acquisition devices and used to describe the relative positions of the different acquisition devices in the world coordinate system. In some possible implementations, the preset external parameters may include an extrinsic parameter matrix formed between different acquisition devices; in one example, the extrinsic parameter matrix may include a rotation matrix and/or a translation vector matrix.
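
For concreteness, the intrinsic matrix and the rotation/translation extrinsics mentioned here are conventionally arranged as in the sketch below; all numeric values are placeholders that would come from calibration, and reusing a single K for both cameras is a simplification made only for brevity (each device would normally have its own intrinsic matrix).

```python
import numpy as np

# Intrinsic matrix of one acquisition device: focal lengths (fx, fy) and
# principal point (cx, cy). The values are illustrative placeholders.
fx, fy, cx, cy = 1000.0, 1000.0, 640.0, 360.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Extrinsics between the two acquisition devices: rotation R and translation t
# of the second camera relative to the first (identical orientation and a
# 10 cm horizontal baseline are assumed here purely for illustration).
R = np.eye(3)
t = np.array([[0.10], [0.0], [0.0]])

# Projection matrices usable in the parallax-based computation, taking the
# first camera as the reference coordinate system.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
```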

獲取預設設備參數的方式在本發明實施例中不做限定,在一些可能的實現方式中,可以根據採集設備的實際情況直接獲取該預設設備參數,在一些可能的實現方式中,也可以通過對採集設備進行標定來獲得該預設設備參數。The method of obtaining preset device parameters is not limited in the embodiment of the present invention. In some possible implementations, the preset device parameters can be directly obtained according to the actual situation of the acquisition device. In some possible implementations, you can also The preset device parameters are obtained by calibrating the acquisition device.

根據多個關鍵點檢測結果之間的位置關係,結合至少兩個預設設備參數,可以確定在三維的世界坐標系下不同的待檢測幀之間所形成的視差。上述公開實施例中提到,深度訊息包含的訊息內容可以根據實際情況靈活決定,因此隨著深度訊息內容的不同,根據預設設備參數與多個關鍵點檢測結果確定深度訊息的過程也可以隨之發生變化,詳見下述各公開實施例,在此先不做展開。According to the positional relationship among the multiple key point detection results, combined with at least two preset device parameters, the parallax formed between different frames to be detected in the three-dimensional world coordinate system can be determined. As mentioned in the above-mentioned disclosed embodiments, the content of the information contained in the depth information can be flexibly determined according to the actual situation. Therefore, as the content of the depth information is different, the process of determining the depth information according to the preset device parameters and the detection results of multiple key points can also be determined at any time. For the changes, see the following disclosed embodiments for details, and will not be expanded here.

Through the embodiments of the present invention, the at least two preset device parameters and the multiple key point detection results can be used to determine the parallax formed between different frames to be detected, so that the depth information is determined simply and conveniently. This approach requires little computation and yields relatively accurate results, which can improve both the accuracy and the efficiency of depth detection.

In a possible implementation, step S132 may include:

obtaining the depth distance according to the preset extrinsic parameters among the at least two preset device parameters and the coordinates of the multiple key point detection results in at least two forms.

The implementation of the preset extrinsic parameters may refer to the foregoing disclosed embodiments and is not repeated here. The coordinates of a key point detection result in at least two forms may be the coordinates of that detection result in different coordinate systems, for example the pixel coordinates formed in the image coordinate system and/or the homogeneous coordinates formed separately in the different acquisition devices. Which forms of coordinates are chosen can be decided flexibly according to the actual situation and is not limited to the following disclosed embodiments.
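As a hedged sketch of the relationship between the two coordinate forms mentioned here (assuming a pinhole model; the function and variable names are illustrative, not from the patent), pixel coordinates can be mapped to the normalized homogeneous form by the inverse intrinsic matrix:

```python
# Illustrative conversion: pixel coordinates (u, v) -> normalized homogeneous
# coordinates (x/z, y/z, 1), assuming a pinhole camera with intrinsic matrix K.
import numpy as np

def pixel_to_normalized(K: np.ndarray, uv) -> np.ndarray:
    uv1 = np.array([uv[0], uv[1], 1.0])
    xyz = np.linalg.inv(K) @ uv1          # equals ((u - cx)/fx, (v - cy)/fy, 1)
    return xyz / xyz[2]

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])           # assumed intrinsics for illustration
print(pixel_to_normalized(K, (700.0, 400.0)))
```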

Which key point of the key point detection result is used when obtaining the depth distance is not limited in the embodiments of the present invention. In some possible implementations, one or more of the head key point, the left shoulder key point and the right shoulder key point may be used; in one example, the head key point may be used. In some possible implementations, the head-shoulder center point may also be used.

The head-shoulder center point may be the center point of the head-shoulder box mentioned in the foregoing disclosed embodiments. In some possible implementations, the overall position coordinates of the head-shoulder box may be determined from the position coordinates of the head key point, the left shoulder key point and the right shoulder key point, and the position coordinates of the head-shoulder center point are then determined from the overall position coordinates of that box. In some possible implementations, the head-shoulder center point may also be treated directly as a key point to be detected, so that its position coordinates are obtained directly from the key point detection result.
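A minimal sketch of one such derivation, assuming the head-shoulder box is simply the bounding rectangle of the three keypoints (helper names are illustrative):

```python
# Derive the head-shoulder box and its center from the head, left-shoulder and
# right-shoulder keypoints (pixel coordinates).
import numpy as np

def head_shoulder_box_and_center(head, left_shoulder, right_shoulder):
    pts = np.array([head, left_shoulder, right_shoulder], dtype=float)  # shape (3, 2)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    box = (x_min, y_min, x_max, y_max)
    center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    return box, center

box, center = head_shoulder_box_and_center((320, 120), (280, 200), (360, 205))
```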

As the number of acquisition devices varies, the way the depth distance is calculated can change flexibly and is not limited to the following disclosed embodiments. In one example, the acquisition devices may include two cameras, a left camera and a right camera. In this case, the process of obtaining the depth distance according to the extrinsic parameters among the at least two preset device parameters and the coordinates of the multiple key point detection results in at least two forms can be expressed by the following formulas (1) and (2):

Formulas (1) and (2) appear in the original publication only as images (placeholders Figure 02_image001 and Figure 02_image003) and are therefore not reproduced here. The quantities appearing in them are: the depth distance d; the original coordinates, in homogeneous form, of the key point in the frame to be detected captured by the left camera; the transformed coordinates obtained by applying a linear transformation to those original coordinates; the coordinates, in homogeneous form, of the key point in the frame to be detected captured by the right camera; the rotation matrix R of the right camera relative to the left camera in the preset extrinsic parameters; and the translation vector T of the right camera relative to the left camera in the preset extrinsic parameters.

Through the embodiments of the present invention, the homogeneous-form coordinates of a key point in the different camera coordinate systems and its coordinates after the linear transformation can be combined with the relative preset extrinsic parameters between the cameras to determine the depth distance accurately and with little computation, thereby improving the accuracy and efficiency of depth detection.
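Because formulas (1) and (2) are published as images, the sketch below is a conventional two-view linear triangulation over the same inputs (normalized homogeneous keypoint coordinates in both cameras plus R and T), offered as a stand-in under that assumption rather than as the disclosed computation; all names and the example numbers are illustrative.

```python
# Standard two-view triangulation: recover the key point's 3D position in the
# left camera frame and report its distance to the left camera's optical center.
import numpy as np

def depth_from_stereo(x_left, x_right, R, T):
    """x_left, x_right: normalized homogeneous coords (x/z, y/z, 1) of the same keypoint."""
    x_l = np.asarray(x_left, dtype=float)
    x_r = np.asarray(x_right, dtype=float)
    # Solve  z_l * (R @ x_l) - z_r * x_r = -T  for the two scale factors (least squares).
    A = np.column_stack([R @ x_l, -x_r])
    z_l, z_r = np.linalg.lstsq(A, -np.asarray(T, float).ravel(), rcond=None)[0]
    X = z_l * x_l                       # 3D point in the left camera frame
    return float(np.linalg.norm(X))     # distance to the left camera's optical center

# Example with an identity rotation and an assumed 10 cm baseline along x.
R = np.eye(3)
T = np.array([-0.1, 0.0, 0.0])
d = depth_from_stereo([0.05, 0.02, 1.0], [0.0, 0.02, 1.0], R, T)   # about 2 m
```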

In a possible implementation, step S132 may also include:

obtaining the offset angle according to the preset intrinsic parameters among the at least two preset device parameters and the coordinates of the multiple key point detection results in at least two forms.

The implementation of the preset intrinsic parameters and of the coordinates of the key point detection results in at least two forms may likewise refer to the foregoing disclosed embodiments and is not repeated here.

The way the offset angle is obtained from the preset intrinsic parameters and the coordinates of the key point detection results in at least two forms can also be chosen flexibly and is not limited to the following disclosed embodiments. The type of key point used when determining the offset angle can likewise be chosen flexibly according to the actual situation; reference may be made to the key point types used above for determining the depth distance, which are not repeated here.

Similar to the determination of the depth distance, as the number of acquisition devices varies, the way the offset angle is calculated can change flexibly and is not limited to the following disclosed embodiments. In one example, taking the case where the acquisition devices include a certain target camera, the process of obtaining the offset angle relative to that target camera can be expressed by the following formulas (3) to (5):

Formulas (3) to (5) appear in the original publication only as images (placeholders Figure 02_image021, Figure 02_image023 and Figure 02_image025) and are therefore not reproduced here. The quantities appearing in them are: the offset angle of the target object in the x-axis direction; the offset angle of the target object in the y-axis direction; the coordinates, in homogeneous form, of the key point in the frame to be detected captured by the target camera; the pixel coordinates of the key point in the frame to be detected captured by the target camera; the two focal length parameters in the intrinsic matrix of the target camera; and the principal point position in the intrinsic matrix of the target camera.

Through the embodiments of the present invention, the preset intrinsic parameters and the coordinates, in different forms, of the key point detection results obtained during depth detection can be used to determine the offset angle simply and conveniently. This determination requires no additional data and is easy to compute, which can improve the efficiency and convenience of depth detection.
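Since formulas (3) to (5) are published as images, the following is an assumed reading of them under the standard pinhole interpretation: the pixel coordinate is back-projected through the intrinsic matrix and the per-axis angles to the optical axis are taken as arctangents. It is a hedged sketch, not a verbatim reproduction, and the names are illustrative.

```python
# Offset angles of a pixel relative to the optical axis under a pinhole model.
import numpy as np

def offset_angles(K, uv):
    """Return (angle_x, angle_y) in radians for pixel (u, v)."""
    u, v = uv
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x_over_z = (u - cx) / fx          # first two entries of the homogeneous form (x/z, y/z, 1)
    y_over_z = (v - cy) / fy
    return np.arctan(x_over_z), np.arctan(y_over_z)

K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])       # assumed intrinsics for illustration
angle_x, angle_y = offset_angles(K, (700.0, 400.0))
```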

In a possible implementation, the method provided by the embodiments of the present invention may further include:

determining the position of the target object in three-dimensional space according to the depth information of the target object.

The position of the target object in three-dimensional space may be the three-dimensional coordinates of the target object. The way the position in three-dimensional space is determined from the depth information can be chosen flexibly according to the actual situation. In a possible implementation, the two-dimensional coordinates of the target object in the frame to be detected may be determined from its key point detection result and then combined with the depth distance and/or the offset angle in the depth information, so as to obtain the three-dimensional coordinates of the target object in three-dimensional space.
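One possible combination (an assumption for illustration, not the patent's prescribed method) treats the two offset angles as defining a ray through the camera's optical center and scales it by the depth distance, here interpreted as the Euclidean distance to the optical center:

```python
# Turn (offset angles, depth distance) into a 3D point in the camera coordinate system.
import numpy as np

def position_from_depth(angle_x, angle_y, depth_distance):
    ray = np.array([np.tan(angle_x), np.tan(angle_y), 1.0])
    ray /= np.linalg.norm(ray)              # unit ray through the optical center
    return depth_distance * ray

p = position_from_depth(0.06, 0.04, 2.0)    # roughly 2 m away, slightly off-axis
```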

After the position of the target object in three-dimensional space is determined, this three-dimensional position information can be used for face recognition, liveness recognition or route tracking of the target object, or applied to scenarios such as virtual reality (VR) or augmented reality (AR). Through the embodiments of the present invention, the depth information can be used to locate the target object in three dimensions, enabling various forms of interaction with the target object. For example, in some possible implementations, the distance and angle between the target object and a smart air conditioner can be determined from the target object's position in three-dimensional space, so that the wind direction and/or wind speed of the air conditioner is adjusted dynamically; in other possible implementations, an AR game platform can locate the target object in the game scene based on its position in three-dimensional space, so that human-computer interaction in the AR scene is realized more realistically and naturally.

It can be understood that the method embodiments mentioned above can be combined with one another to form combined embodiments without departing from the principles and logic of the present invention; owing to space limitations, such combinations are not described again. Those skilled in the art can understand that, in the above methods of the specific implementations, the specific execution order of the steps should be determined by their functions and possible internal logic.

In addition, the present invention further provides a depth detection device, an electronic device, a computer-readable storage medium and a program, all of which can be used to implement any of the depth detection methods provided by the present invention. For the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.

Fig. 4 shows a block diagram of a depth detection device according to an embodiment of the present invention. As shown in Fig. 4, the device 20 includes:

an obtaining module 21, configured to obtain multiple frames to be detected, where the multiple frames to be detected include image frames obtained by capturing images of a target object from at least two acquisition viewpoints;

a key point detection module 22, configured to perform key point detection of a target area of the target object according to the frames to be detected and determine multiple key point detection results corresponding to the multiple frames to be detected, where the target area includes a head area and/or a shoulder area; and

a depth detection module 23, configured to determine the depth information of the target object according to the multiple key point detection results.
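A hypothetical skeleton mirroring this module split is sketched below; the class and attribute names are illustrative only, and the callables stand in for whatever acquisition pipeline and networks a concrete implementation would use.

```python
# Illustrative composition of the three modules described above.
from dataclasses import dataclass
from typing import Callable, List, Sequence
import numpy as np

@dataclass
class DepthDetectionDevice:
    acquire: Callable[[], List[np.ndarray]]           # obtaining module: frames from >= 2 viewpoints
    detect_keypoints: Callable[[np.ndarray], dict]    # key point detection module (head/shoulder area)
    estimate_depth: Callable[[Sequence[dict]], dict]  # depth detection module

    def run(self) -> dict:
        frames = self.acquire()
        keypoints = [self.detect_keypoints(f) for f in frames]
        return self.estimate_depth(keypoints)
```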

In a possible implementation, the depth detection module is configured to: obtain at least two preset device parameters respectively corresponding to at least two acquisition devices, the at least two acquisition devices being used to capture images of the target object from at least two acquisition viewpoints; and determine the depth information of the target object in the frames to be detected according to the at least two preset device parameters and the multiple key point detection results.

In a possible implementation, the depth information includes a depth distance, and the depth distance includes the distance between the target object and the optical center of the acquisition device. The depth detection module is further configured to obtain the depth distance according to the preset extrinsic parameters among the at least two preset device parameters and the coordinates of the multiple key point detection results in at least two forms, where the preset extrinsic parameters include relative parameters formed between the at least two acquisition devices.

In a possible implementation, the depth information includes an offset angle, and the offset angle includes the spatial angle of the target object relative to the optical axis of the acquisition device. The depth detection module is further configured to obtain the offset angle according to the preset intrinsic parameters among the at least two preset device parameters and the coordinates of the multiple key point detection results in at least two forms, where the preset intrinsic parameters include the device parameters respectively corresponding to the at least two devices.

In a possible implementation, the key point detection module is configured to perform key point detection on the target area of the target object in the frame to be detected according to position information of the target object in a reference frame, to obtain the key point detection result corresponding to the frame to be detected, where the reference frame is a video frame that precedes the frame to be detected in the target video to which the frame to be detected belongs.

In a possible implementation, the key point detection module is further configured to: crop the frame to be detected according to a first position of the target object in the reference frame to obtain a cropping result; and perform key point detection on the target area of the target object in the cropping result to obtain the key point detection result corresponding to the frame to be detected.

In a possible implementation, the key point detection module is further configured to: obtain a second position of the target area of the target object in the reference frame; crop the frame to be detected according to the second position to obtain a cropping result; and perform key point detection on the target area of the target object in the cropping result to obtain the key point detection result corresponding to the frame to be detected.

In a possible implementation, the key point detection module is further configured to: identify the target area in the reference frame through a first neural network to obtain the second position output by the first neural network; and/or obtain the second position of the target area in the reference frame according to the key point detection result corresponding to the reference frame.

In a possible implementation, the device is further configured to determine the position of the target object in three-dimensional space according to the depth information of the target object.

In some embodiments, the functions or modules of the device provided by the embodiments of the present invention can be used to execute the methods described in the method embodiments above; for their specific implementation, refer to the description of the method embodiments above, which is not repeated here for brevity.

Application Scenario Example

Fig. 5 shows a schematic diagram of an application example of the present invention. As shown in Fig. 5, this application example proposes a depth detection method that may include the following process:

Step S31: use a Faster RCNN neural network to perform head-shoulder box detection of the human body on the two frames to be detected captured by a binocular camera (comprising a left camera and a right camera), obtaining the position of the head-shoulder box in the first frame of the left camera and the position of the head-shoulder box in the first frame of the right camera.
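A rough stand-in for this step is sketched below using torchvision's off-the-shelf Faster R-CNN. Note the assumption: the pretrained COCO model returns whole-person boxes (label 1), whereas the detector described here would be trained to output head-shoulder boxes; the sketch only illustrates the surrounding plumbing.

```python
# Detection sketch with an off-the-shelf Faster R-CNN (assumed stand-in detector).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_boxes(image_rgb, score_thresh=0.7):
    """image_rgb: HxWx3 uint8 array; returns kept boxes as (x1, y1, x2, y2) tensors."""
    with torch.no_grad():
        out = model([to_tensor(image_rgb)])[0]
    keep = (out["scores"] > score_thresh) & (out["labels"] == 1)  # COCO label 1 = person
    return out["boxes"][keep]

# left_frame and right_frame are assumed to be the first frames from the two cameras:
# boxes_left, boxes_right = detect_boxes(left_frame), detect_boxes(right_frame)
```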

Step S32: obtain the target videos corresponding to the left camera and the right camera respectively. Starting from the second frame of each target video, take that video frame as the frame to be detected and its previous frame as the reference frame. According to the second position of the head-shoulder box in the reference frame, perform key point detection on the frame to be detected through a key point detection network to obtain the position coordinates of three key points, namely the head key point, the left shoulder key point and the right shoulder key point, and take the bounding rectangle of these three key points as the head-shoulder box in the frame to be detected.

Step S33: calculate the offset angle of the target object relative to the camera according to the coordinates of the key point in the frame to be detected in at least two forms and the intrinsic matrix of the camera:

Specifically, according to the pixel coordinates (u, v, 1) of the head key point in the frame to be detected and the intrinsic matrix K of the camera, the homogeneous-form coordinates (x/z, y/z, 1) corresponding to the head key point, as well as the offset angles of the target object relative to the camera's optical axis in the x-axis and y-axis directions, can be calculated through formulas (3) to (5) mentioned in the foregoing disclosed embodiments.

Step S34: calculate the depth distance of the target object according to the homogeneous coordinates of the key point in the left camera and in the right camera, and the extrinsic matrix of the right camera relative to the left camera:

Specifically, the depth distance d of the target object can be calculated through formulas (1) and (2) mentioned in the foregoing disclosed embodiments, according to the homogeneous-form coordinates of the same key point in the left and right cameras and the extrinsic matrices R and T of the right camera relative to the left camera.

In one example, after the depth information of the target object in the frames to be detected has been determined through steps S33 and S34, the next frame in each of the target videos corresponding to the left camera and the right camera can be taken as the new frame to be detected, and the process returns to step S32 to perform depth detection again.

Through this application example of the present invention, the head-shoulder box of the human body and the key points within it can be used to calculate the parallax formed by frames to be detected that are captured from different viewpoints. Compared with parallax estimation methods based on image matching, this requires less computation and applies to a wider range of scenarios.

It can be understood that the method embodiments mentioned above can be combined with one another to form combined embodiments without departing from the principles and logic of the present invention; owing to space limitations, such combinations are not described again.

Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.

An embodiment of the present invention further provides a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the above method is implemented. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.

An embodiment of the present invention further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions, where the processor is configured to invoke the instructions stored in the memory to execute the above method.

An embodiment of the present invention further provides a computer program product, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the depth detection method provided by any of the above embodiments.

An embodiment of the present invention further provides another computer program product for storing computer-readable instructions; when the instructions are executed, the computer performs the operations of the depth detection method provided by any of the above embodiments.

The electronic device may be provided as a terminal, a server or a device in another form.

Fig. 6 shows a block diagram of an electronic device 800 according to an embodiment of the present invention. As shown in Fig. 6, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device or a personal digital assistant.

Referring to Fig. 6, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814 and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions of any application or method operated on the electronic device 800, contact data, phonebook data, messages, pictures, videos and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.

The power supply component 806 provides power to the various components of the electronic device 800. It may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel; the touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operation mode such as a call mode, a recording mode or a voice recognition mode. The received audio signal may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules such as a keyboard, a click wheel or buttons. These buttons may include, but are not limited to, a home button, volume buttons, a start button and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of components, for example the display and keypad of the electronic device 800; it can also detect a change in the position of the electronic device 800 or of one of its components, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and temperature changes of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. It may also include a light sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, second-generation (2G) or third-generation (3G) mobile communication technology, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components, for executing the above method.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the above method.

Fig. 7 shows a block diagram of an electronic device 1900 according to an embodiment of the present invention. As shown in Fig. 7, the electronic device 1900 may be provided as a server. Referring to Fig. 7, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as applications. An application stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions so as to perform the above method.

The electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system from Apple (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™) or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.

The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for causing a processor to implement various aspects of the present invention.

The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. It may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory card, a floppy disk, a mechanical encoding device such as a punched card or a raised structure in a groove on which instructions are stored, and any suitable combination of the above. A computer-readable storage medium as used here is not to be interpreted as a transient signal itself, such as a radio wave or another freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.

The computer program instructions used to perform the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++ as well as conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA), is personalized by using the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions so as to implement various aspects of the present invention.

Aspects of the present invention are described here with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing device to produce a machine, so that when the instructions are executed by the processor of the computer or other programmable data processing device, a device is produced that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing device and/or other equipment to work in a specific way, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, another programmable data processing device or other equipment, so that a series of operation steps is executed on the computer, the other programmable data processing device or the other equipment to produce a computer-implemented process, whereby the instructions executed on the computer, the other programmable data processing device or the other equipment implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions and operations of systems, methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings; for example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

The computer program product may be implemented by hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).

The embodiments of the present invention have been described above. The above description is exemplary rather than exhaustive, and it is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein are chosen to best explain the principles of the embodiments, their practical applications or their improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

S11~S13: steps
S31~S34: steps
S131~S132: steps
20: depth detection device
21: obtaining module
22: key point detection module
23: depth detection module
800: electronic device
802: processing component
804: memory
806: power supply component
808: multimedia component
810: audio component
812: input/output interface
814: sensor component
816: communication component
820: processor
1900: electronic device
1922: processing component
1926: power supply component
1932: memory
1950: network interface
1958: input/output interface

The accompanying drawings here are incorporated into and constitute a part of this specification; they show embodiments consistent with the present invention and, together with the specification, serve to explain the technical solutions of the present invention.
Fig. 1 shows a flowchart of a depth detection method according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of a target area according to an embodiment of the present invention;
Fig. 3 shows a flowchart of a depth detection method according to an embodiment of the present invention;
Fig. 4 shows a block diagram of a depth detection device according to an embodiment of the present invention;
Fig. 5 shows a schematic diagram of an application example of the present invention;
Fig. 6 shows a block diagram of an electronic device according to an embodiment of the present invention; and
Fig. 7 shows a block diagram of an electronic device according to an embodiment of the present invention.

S11~S13: steps

Claims (13)

1. A depth detection method, comprising:
obtaining multiple frames to be detected, wherein the multiple frames to be detected comprise image frames obtained by capturing images of a target object from at least two acquisition viewpoints;
performing key point detection of a target area of the target object according to the frames to be detected, and determining multiple key point detection results corresponding to the multiple frames to be detected, wherein the target area comprises a head area and/or a shoulder area; and
determining depth information of the target object according to the multiple key point detection results.

2. The method according to claim 1, wherein the determining the depth information of the target object according to the multiple key point detection results comprises:
obtaining at least two preset device parameters respectively corresponding to at least two acquisition devices, the at least two acquisition devices being used to capture images of the target object from at least two acquisition viewpoints; and
determining the depth information of the target object in the frames to be detected according to the at least two preset device parameters and the multiple key point detection results.

3. The method according to claim 2, wherein the depth information comprises a depth distance, and the depth distance comprises a distance between the target object and an optical center of the acquisition device; and
the determining the depth information of the target object in the frames to be detected according to the at least two preset device parameters and the multiple key point detection results comprises:
obtaining the depth distance according to preset extrinsic parameters among the at least two preset device parameters and coordinates of the multiple key point detection results in at least two forms, wherein the preset extrinsic parameters comprise relative parameters formed between the at least two acquisition devices.

4. The method according to claim 2, wherein the depth information comprises an offset angle, and the offset angle comprises a spatial angle of the target object relative to an optical axis of the acquisition device; and
the determining the depth information of the target object in the frames to be detected according to the at least two preset device parameters and the multiple key point detection results comprises:
obtaining the offset angle according to preset intrinsic parameters among the at least two preset device parameters and coordinates of the multiple key point detection results in at least two forms, wherein the preset intrinsic parameters comprise device parameters respectively corresponding to the at least two devices.
5. The method according to claim 1, wherein the performing key point detection of the target area of the target object according to the frame to be detected comprises:
performing key point detection on the target area of the target object in the frame to be detected according to position information of the target object in a reference frame, to obtain a key point detection result corresponding to the frame to be detected, wherein the reference frame is a video frame that precedes the frame to be detected in a target video to which the frame to be detected belongs.

6. The method according to claim 5, wherein the performing key point detection on the target area of the target object in the frame to be detected according to the position information of the target object in the reference frame, to obtain the key point detection result corresponding to the frame to be detected, comprises:
cropping the frame to be detected according to a first position of the target object in the reference frame to obtain a cropping result; and
performing key point detection on the target area of the target object in the cropping result to obtain the key point detection result corresponding to the frame to be detected.

7. The method according to claim 5, wherein the performing key point detection on the target area of the target object in the frame to be detected according to the position information of the target object in the reference frame, to obtain the key point detection result corresponding to the frame to be detected, comprises:
obtaining a second position of the target area of the target object in the reference frame;
cropping the frame to be detected according to the second position to obtain a cropping result; and
performing key point detection on the target area of the target object in the cropping result to obtain the key point detection result corresponding to the frame to be detected.

8. The method according to claim 7, wherein the obtaining the second position of the target area of the target object in the reference frame comprises:
identifying the target area in the reference frame through a first neural network to obtain the second position output by the first neural network; and/or
obtaining the second position of the target area in the reference frame according to a key point detection result corresponding to the reference frame.

9. The method according to any one of claims 1 to 8, further comprising:
determining a position of the target object in three-dimensional space according to the depth information of the target object.
10. A depth detection device, comprising:
an obtaining module, configured to obtain multiple frames to be detected, wherein the multiple frames to be detected comprise image frames obtained by capturing images of a target object from at least two acquisition viewpoints;
a key point detection module, configured to perform key point detection of a target area of the target object according to the frames to be detected and determine multiple key point detection results corresponding to the multiple frames to be detected, wherein the target area comprises a head area and/or a shoulder area; and
a depth detection module, configured to determine depth information of the target object according to the multiple key point detection results.

11. An electronic device, comprising:
a processor; and
a memory for storing processor-executable instructions,
wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 9.

12. A computer-readable storage medium on which computer program instructions are stored, wherein the method according to any one of claims 1 to 9 is implemented when the computer program instructions are executed by a processor.

13. A computer program product, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method according to any one of claims 1 to 9.
TW111122249A 2021-06-28 2022-06-15 Depth detection method and device, electronic equipment and storage medium TW202301276A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110721270.1 2021-06-28
CN202110721270.1A CN113345000A (en) 2021-06-28 2021-06-28 Depth detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
TW202301276A true TW202301276A (en) 2023-01-01

Family

ID=77479236

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111122249A TW202301276A (en) 2021-06-28 2022-06-15 Depth detection method and device, electronic equipment and storage medium

Country Status (3)

Country Link
CN (1) CN113345000A (en)
TW (1) TW202301276A (en)
WO (1) WO2023273499A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344999A (en) * 2021-06-28 2021-09-03 北京市商汤科技开发有限公司 Depth detection method and device, electronic equipment and storage medium
CN113345000A (en) * 2021-06-28 2021-09-03 北京市商汤科技开发有限公司 Depth detection method and device, electronic equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897675B (en) * 2017-01-24 2021-08-17 上海交通大学 Face living body detection method combining binocular vision depth characteristic and apparent characteristic
CN108876835A (en) * 2018-03-28 2018-11-23 北京旷视科技有限公司 Depth information detection method, device and system and storage medium
CN108764091B (en) * 2018-05-18 2020-11-17 北京市商汤科技开发有限公司 Living body detection method and apparatus, electronic device, and storage medium
US10319154B1 (en) * 2018-07-20 2019-06-11 The University Of North Carolina At Chapel Hill Methods, systems, and computer readable media for dynamic vision correction for in-focus viewing of real and virtual objects
CN110942032B (en) * 2019-11-27 2022-07-15 深圳市商汤科技有限公司 Living body detection method and device, and storage medium
CN111222509B (en) * 2020-01-17 2023-08-18 北京字节跳动网络技术有限公司 Target detection method and device and electronic equipment
CN111780673B (en) * 2020-06-17 2022-05-31 杭州海康威视数字技术股份有限公司 Distance measurement method, device and equipment
CN112419388A (en) * 2020-11-24 2021-02-26 深圳市商汤科技有限公司 Depth detection method and device, electronic equipment and computer readable storage medium
CN113344999A (en) * 2021-06-28 2021-09-03 北京市商汤科技开发有限公司 Depth detection method and device, electronic equipment and storage medium
CN113345000A (en) * 2021-06-28 2021-09-03 北京市商汤科技开发有限公司 Depth detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113345000A (en) 2021-09-03
WO2023273499A1 (en) 2023-01-05

Similar Documents

Publication Publication Date Title
KR102194094B1 (en) Synthesis method, apparatus, program and recording medium of virtual and real objects
TW202301276A (en) Depth detection method and device, electronic equipment and storage medium
CN114814872A (en) Pose determination method and device, electronic equipment and storage medium
TW202301275A (en) Depth detection method and device, electronic equipment and storage medium
CN111551191A (en) Sensor external parameter calibration method and device, electronic equipment and storage medium
WO2022134475A1 (en) Point cloud map construction method and apparatus, electronic device, storage medium and program
WO2023155532A1 (en) Pose detection method, apparatus, electronic device, and storage medium
WO2022022350A1 (en) Image processing method and apparatus, electronic device, storage medium, and computer program product
WO2022151686A1 (en) Scene image display method and apparatus, device, storage medium, program and product
US20200402321A1 (en) Method, electronic device and storage medium for image generation
WO2023051356A1 (en) Virtual object display method and apparatus, and electronic device and storage medium
KR102367648B1 (en) Method and apparatus for synthesizing omni-directional parallax view, and storage medium
WO2022110776A1 (en) Positioning method and apparatus, electronic device, storage medium, computer program product, and computer program
KR20220123218A (en) Target positioning method, apparatus, electronic device, storage medium and program
CN109218709B (en) Holographic content adjusting method and device and computer readable storage medium
CN114140536A (en) Pose data processing method and device, electronic equipment and storage medium
CN112270737A (en) Texture mapping method and device, electronic equipment and storage medium
CN112837372A (en) Data generation method and device, electronic equipment and storage medium
CN112767541A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
WO2023155350A1 (en) Crowd positioning method and apparatus, electronic device, and storage medium
WO2022151687A1 (en) Group photo image generation method and apparatus, device, storage medium, computer program, and product
WO2022110801A1 (en) Data processing method and apparatus, electronic device, and storage medium
WO2022110777A1 (en) Positioning method and apparatus, electronic device, storage medium, computer program product, and computer program
TWI759004B (en) Target object display method, electronic device and computer-readable storage medium
CN112837361B (en) Depth estimation method and device, electronic equipment and storage medium