TWI742690B - Method and apparatus for detecting a human body, computer device, and storage medium - Google Patents
- Publication number
- TWI742690B (application TW109117278A)
- Authority
- TW
- Taiwan
- Prior art keywords
- feature matrix
- contour
- feature
- bone
- target
Classifications
- G06N3/08 — Neural networks; Learning methods
- G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Fusion techniques of extracted features
- G06N3/045 — Neural network architectures; Combinations of networks
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/454 — Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/764 — Image or video recognition or understanding using classification, e.g. of video objects
- G06V10/806 — Fusion, i.e. combining data from various sources, at the level of extracted features
- G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06V40/23 — Recognition of whole body movements, e.g. for sport training
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30196 — Human being; Person
- G06V2201/033 — Recognition of patterns in medical or anatomical images of skeletal patterns
Abstract
Description
The present disclosure relates to the field of image processing technology, and in particular to a human body detection method and apparatus, a computer device, and a storage medium.
As neural networks are applied in fields such as image, video, speech, and text processing, users place ever higher demands on the accuracy of neural-network-based models. Human body detection in images is an important application scenario for neural networks, and it places high demands on both the fineness of the detection and the amount of computation involved.
The embodiments of the present disclosure aim to provide a human body detection method and apparatus, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a human body detection method, including: acquiring an image to be detected; based on the image to be detected, determining position information of skeleton keypoints that characterize the skeletal structure of a human body and position information of contour keypoints that characterize the contour of the human body; and generating a human body detection result based on the position information of the skeleton keypoints and the position information of the contour keypoints.
The embodiments of the present disclosure can determine, from the image to be detected, position information of skeleton keypoints characterizing the human skeletal structure and position information of contour keypoints characterizing the human contour, and generate a human body detection result from both, improving the fineness of the representation while keeping the amount of computation in check.
In addition, because the embodiments of the present disclosure obtain the human body detection result from both the position information of skeleton keypoints characterizing the skeletal structure and the position information of contour keypoints characterizing the contour, the information representing the human body is richer, enabling a wider range of application scenarios such as image editing and body shape adjustment.
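As a minimal illustrative sketch of the first aspect's output, the detection result can be thought of as a data group pairing the two sets of keypoint positions (the function and field names below are hypothetical, not taken from the patent):

```python
def build_detection_result(skeleton_kps, contour_kps):
    """Package skeleton and contour keypoint positions into a detection
    result; each keypoint is an (x, y) coordinate in the input image."""
    return {
        "skeleton_keypoints": list(skeleton_kps),
        "contour_keypoints": list(contour_kps),
    }
```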
In an optional implementation, the contour keypoints include main contour keypoints and auxiliary contour keypoints, where at least one auxiliary contour keypoint lies between every two adjacent main contour keypoints.
In this implementation, the human contour is represented by the position information of both the main contour keypoints and the auxiliary contour keypoints, so the contour is identified more accurately and the representation carries more information.
In an optional implementation, determining, based on the image to be detected, the position information of the contour keypoints that characterize the human contour includes: determining the position information of the main contour keypoints based on the image to be detected; determining human contour information based on the position information of the main contour keypoints; and determining the position information of a plurality of auxiliary contour keypoints based on the determined human contour information.
In this implementation, the position information of the main contour keypoints and of the auxiliary contour keypoints can be located more accurately.
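One simple way to realize "at least one auxiliary contour keypoint between adjacent main contour keypoints" is to interpolate evenly along the contour between each pair of neighboring main keypoints. This is an illustrative sketch under that assumption, not the patent's actual procedure:

```python
import numpy as np

def auxiliary_points(main_points, per_segment=1):
    """Place `per_segment` evenly spaced auxiliary contour keypoints between
    each pair of adjacent main contour keypoints on a closed contour."""
    main = np.asarray(main_points, dtype=float)
    n = len(main)
    aux = []
    for i in range(n):
        a, b = main[i], main[(i + 1) % n]  # adjacent main keypoints
        for k in range(1, per_segment + 1):
            t = k / (per_segment + 1)
            aux.append((1 - t) * a + t * b)
    return np.array(aux)
```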
In an optional implementation, the human body detection result includes one or more of the following: the image to be detected annotated with skeleton keypoint markers and contour keypoint markers; and a data group containing the position information of the skeleton keypoints and the position information of the contour keypoints.
In this implementation, the image annotated with skeleton keypoint markers and contour keypoint markers gives a more intuitive visual impression, while the data group containing the keypoint position information is easier to process downstream.
In an optional implementation, the method further includes performing, based on the human body detection result, one or more of the following operations: human action recognition, human posture detection, human contour adjustment, human body image editing, and human body mapping (e.g. applying stickers or textures to the body).
In this implementation, based on a human body detection result that is both finer-grained and cheaper to compute, more operations can be performed with higher accuracy and at higher speed.
In an optional implementation, determining, based on the image to be detected, the position information of the skeleton keypoints characterizing the human skeletal structure and the position information of the contour keypoints characterizing the human contour includes: performing feature extraction on the image to be detected to obtain skeleton features and contour features, and fusing the obtained skeleton features and contour features; and determining the position information of the skeleton keypoints and of the contour keypoints based on the feature fusion result.
In this implementation, feature extraction is performed on the image to be detected to obtain skeleton features and contour features, which are then fused to obtain the position information of the skeleton keypoints characterizing the skeletal structure and of the contour keypoints characterizing the contour. A human body detection result obtained this way represents the human body with a smaller amount of data while still capturing both the skeleton and contour features of the body, improving the fineness of the representation.
In an optional implementation, performing feature extraction on the image to be detected to obtain skeleton features and contour features and fusing them includes: performing feature extraction at least once based on the image to be detected, and fusing the skeleton features and contour features obtained from each extraction, where, when feature extraction is performed multiple times, the (i+1)-th feature extraction is performed on the feature fusion result of the i-th feature fusion, i being a positive integer. Determining the position information of the skeleton keypoints and of the contour keypoints based on the feature fusion result then includes: determining both from the feature fusion result of the last feature fusion.
In this implementation, performing feature extraction at least once on the image to be detected and fusing the skeleton and contour features obtained from each extraction allows skeleton feature points and contour feature points with positional relationships to correct each other, so the final position information of the skeleton keypoints and contour keypoints is more accurate.
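The cascaded extract/fuse loop described above can be sketched as follows, with `extract_first`, `extract_later`, and `fuse` standing in as placeholders for the pre-trained networks (the control flow is the point here, not the network internals):

```python
def detect(image, extract_first, extract_later, fuse, rounds=3):
    """Run `rounds` extract/fuse passes: the first extraction runs on the
    image, each later extraction runs on the previous fusion result, and
    the keypoints are decoded from the last fusion result."""
    skel, cont = extract_first(image)      # 1st extraction: from the image
    fused = fuse(skel, cont)
    for _ in range(rounds - 1):
        skel, cont = extract_later(fused)  # (i+1)-th extraction: from i-th fusion
        fused = fuse(skel, cont)
    return fused                           # last fusion result
```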
In an optional implementation, performing feature extraction at least once based on the image to be detected includes: in the first feature extraction, using a pre-trained first feature extraction network to extract from the image to be detected a first target skeleton feature matrix for the skeleton keypoints characterizing human skeleton features, and a first target contour feature matrix for the contour keypoints characterizing human contour features; and in the (i+1)-th feature extraction, using a pre-trained second feature extraction network to extract, from the feature fusion result of the i-th feature fusion, a first target skeleton feature matrix and a first target contour feature matrix. The network parameters of the first feature extraction network and the second feature extraction network differ, and different rounds of feature extraction use second feature extraction networks with different network parameters.
In this implementation, the skeleton features and contour features are extracted at least once and fused at least once, so the final position information of the skeleton keypoints and contour keypoints is more accurate.
In an optional implementation, fusing the extracted skeleton features and contour features includes: using a pre-trained feature fusion neural network to fuse the first target skeleton feature matrix and the first target contour feature matrix, obtaining a second target skeleton feature matrix and a second target contour feature matrix. The second target skeleton feature matrix is a three-dimensional skeleton feature matrix comprising a two-dimensional skeleton feature matrix for each skeleton keypoint; the value of each element of a two-dimensional skeleton feature matrix represents the probability that the pixel corresponding to that element belongs to the corresponding skeleton keypoint. The second target contour feature matrix is a three-dimensional contour feature matrix comprising a two-dimensional contour feature matrix for each contour keypoint; the value of each element of a two-dimensional contour feature matrix represents the probability that the pixel corresponding to that element belongs to the corresponding contour keypoint. Different rounds of feature fusion use feature fusion neural networks with different network parameters.
In this implementation, fusing the skeleton features and contour features with a pre-trained feature fusion network yields a better fusion result, so the final position information of the skeleton keypoints and contour keypoints is more accurate.
In an optional implementation, determining the position information of the skeleton keypoints and of the contour keypoints based on the feature fusion result of the last feature fusion includes: determining the position information of the skeleton keypoints based on the second target skeleton feature matrix obtained from the last feature fusion, and determining the position information of the contour keypoints based on the second target contour feature matrix obtained from the last feature fusion.
In this implementation, after at least one round of feature extraction and feature fusion, the final position information of the skeleton keypoints and contour keypoints has higher accuracy.
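Given the per-keypoint probability maps described above (a 3-D matrix whose 2-D slices each hold per-pixel probabilities for one keypoint), one natural way to read off a keypoint position is the argmax of its slice. A minimal sketch of that decoding step:

```python
import numpy as np

def decode_keypoints(heatmaps):
    """Decode (row, col) positions from a (K, H, W) stack of per-keypoint
    probability maps by taking the argmax of each 2-D slice."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1).argmax(axis=1)  # flat index per keypoint
    rows, cols = np.unravel_index(flat, (h, w))
    return np.stack([rows, cols], axis=1)          # shape (K, 2)
```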
In an optional implementation, the first feature extraction network includes a shared feature extraction network, a first skeleton feature extraction network, and a first contour feature extraction network. Using the first feature extraction network to extract, from the image to be detected, the first target skeleton feature matrix for the skeleton keypoints and the first target contour feature matrix for the contour keypoints includes: using the shared feature extraction network to perform convolution processing on the image to be detected, obtaining a basic feature matrix containing both skeleton features and contour features; using the first skeleton feature extraction network to perform convolution processing on the basic feature matrix, obtaining a first skeleton feature matrix, obtaining a second skeleton feature matrix from a first target convolutional layer of the first skeleton feature extraction network, and obtaining the first target skeleton feature matrix based on the first skeleton feature matrix and the second skeleton feature matrix, where the first target convolutional layer is any convolutional layer of the first skeleton feature extraction network other than the last; and using the first contour feature extraction network to perform convolution processing on the basic feature matrix, obtaining a first contour feature matrix, obtaining a second contour feature matrix from a second target convolutional layer of the first contour feature extraction network, and obtaining the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix, where the second target convolutional layer is any convolutional layer of the first contour feature extraction network other than the last.
In this implementation, the shared feature extraction network extracts the skeleton and contour features while discarding the other features of the image to be detected; the first skeleton feature extraction network and the first contour feature extraction network then extract the skeleton features and the contour features, respectively, in a targeted manner, which requires less computation.
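The "branch plus intermediate target layer" structure above can be sketched with plain functions standing in for the convolutional layers: run a branch on the basic feature matrix, and keep both its final output (the first feature matrix) and the output of a chosen non-final layer (the second feature matrix). This is an illustrative skeleton of the data flow only:

```python
def branch_with_tap(basic, layers, tap_index):
    """Run a stack of layer functions on `basic`, returning the final
    output and the output of the intermediate `tap_index` layer
    (which must not be the last layer, per the description above)."""
    assert tap_index < len(layers) - 1, "target layer must not be the last"
    x, tapped = basic, None
    for i, layer in enumerate(layers):
        x = layer(x)
        if i == tap_index:
            tapped = x  # the "second" feature matrix from the target layer
    return x, tapped    # (first feature matrix, second feature matrix)
```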
In an optional implementation, obtaining the first target skeleton feature matrix based on the first skeleton feature matrix and the second skeleton feature matrix includes: concatenating the first skeleton feature matrix and the second skeleton feature matrix to obtain a first concatenated skeleton feature matrix, and applying a dimension transformation to it to obtain the first target skeleton feature matrix. Obtaining the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix includes: concatenating the first contour feature matrix and the second contour feature matrix to obtain a first concatenated contour feature matrix, and applying a dimension transformation to it to obtain the first target contour feature matrix. The first target skeleton feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each dimension.
In this implementation, concatenating the first and second skeleton feature matrices gives the first target skeleton feature matrix richer skeleton feature information, and concatenating the first and second contour feature matrices gives the first target contour feature matrix richer contour feature information, so that in the subsequent feature fusion the position information of the skeleton keypoints and contour keypoints can be extracted with higher accuracy.
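The concatenate-then-transform step can be illustrated with numpy, assuming (channels, height, width) feature matrices and assuming the dimension transformation is a learned linear map over channels (i.e. a 1×1-convolution-style projection); the patent does not fix this choice, so treat the transform here as a stand-in:

```python
import numpy as np

def concat_and_transform(first, second, out_channels, rng=None):
    """Concatenate two (C, H, W) feature matrices along the channel axis,
    then project the stacked channels down to `out_channels` with a
    random (stand-in for learned) linear map."""
    rng = np.random.default_rng(0) if rng is None else rng
    stacked = np.concatenate([first, second], axis=0)        # (C1+C2, H, W)
    weights = rng.standard_normal((out_channels, stacked.shape[0]))
    return np.tensordot(weights, stacked, axes=([1], [0]))   # (out_channels, H, W)
```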
In an optional implementation, the feature fusion neural network includes a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network. Using the feature fusion neural network to fuse the first target skeleton feature matrix and the first target contour feature matrix into a second target skeleton feature matrix and a second target contour feature matrix includes: using the first convolutional neural network to perform convolution processing on the first target skeleton feature matrix, obtaining a first intermediate skeleton feature matrix, and using the second convolutional neural network to perform convolution processing on the first target contour feature matrix, obtaining a first intermediate contour feature matrix; concatenating the first intermediate contour feature matrix with the first target skeleton feature matrix to obtain a first concatenated feature matrix, and using the first transformation neural network to apply a dimension transformation to it, obtaining the second target skeleton feature matrix; and concatenating the first intermediate skeleton feature matrix with the first target contour feature matrix to obtain a second concatenated feature matrix, and using the second transformation neural network to apply a dimension transformation to it, obtaining the second target contour feature matrix.
該實施方式中,通過將所述第一中間輪廓特徵矩陣與所述第一目標骨骼特徵矩陣進行拼接處理,並基於拼接處理結果得到第二目標骨骼特徵矩陣的方式,將骨骼特徵和輪廓特徵進行融合,以實現使用輪廓特徵對提取得到的骨骼特徵進行矯正。另外,通過將所述第一中間骨骼特徵矩陣與所述第一目標輪廓特徵矩陣進行拼接處理,並基於拼接處理結果得到第二目標輪廓特徵矩陣的方式,以將骨骼特徵和輪廓特徵進行融合,以實現使用骨骼特徵對提取得到的輪廓特徵進行矯正。進而,能夠以更高的精度提取得到骨骼關鍵點的位置訊息、以及輪廓關鍵點的位置訊息。In this embodiment, the skeleton features and the contour features are fused by splicing the first intermediate contour feature matrix with the first target skeleton feature matrix and obtaining the second target skeleton feature matrix from the splicing result, so that the extracted skeleton features are corrected using the contour features. In addition, by splicing the first intermediate skeleton feature matrix with the first target contour feature matrix and obtaining the second target contour feature matrix from the splicing result, the skeleton features and the contour features are fused so that the extracted contour features are corrected using the skeleton features. As a result, the position information of the bone key points and the position information of the contour key points can be extracted with higher accuracy.
一種可選實施方式中,所述特徵融合神經網路包括:第一定向卷積神經網路、第二定向卷積神經網路、第三卷積神經網路、第四卷積神經網路、第三變換神經網路、以及第四變換神經網路;所述使用特徵融合神經網路對所述第一目標骨骼特徵矩陣、以及所述第一目標輪廓特徵矩陣進行特徵融合,得到第二目標骨骼特徵矩陣和第二目標輪廓特徵矩陣,包括:使用所述第一定向卷積神經網路對所述第一目標骨骼特徵矩陣進行定向卷積處理,得到第一定向骨骼特徵矩陣;並使用第三卷積神經網路對所述第一定向骨骼特徵矩陣進行卷積處理,得到第二中間骨骼特徵矩陣;以及使用所述第二定向卷積神經網路對所述第一目標輪廓特徵矩陣進行定向卷積處理,得到第一定向輪廓特徵矩陣;並使用第四卷積神經網路對所述第一定向輪廓特徵矩陣進行卷積處理,得到第二中間輪廓特徵矩陣;將所述第二中間輪廓特徵矩陣與所述第一目標骨骼特徵矩陣進行拼接處理,得到第三拼接特徵矩陣;並使用第三變換神經網路對所述第三拼接特徵矩陣進行維度變換,得到所述第二目標骨骼特徵矩陣;將所述第二中間骨骼特徵矩陣與所述第一目標輪廓特徵矩陣進行拼接處理,得到第四拼接特徵矩陣,並使用第四變換神經網路對所述第四拼接特徵矩陣進行維度變換,得到所述第二目標輪廓特徵矩陣。In an optional embodiment, the feature fusion neural network includes: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network. Using the feature fusion neural network to perform feature fusion on the first target skeleton feature matrix and the first target contour feature matrix to obtain the second target skeleton feature matrix and the second target contour feature matrix includes: using the first directional convolutional neural network to perform directional convolution processing on the first target skeleton feature matrix to obtain a first directional skeleton feature matrix, and using the third convolutional neural network to perform convolution processing on the first directional skeleton feature matrix to obtain a second intermediate skeleton feature matrix; using the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain a first directional contour feature matrix, and using the fourth convolutional neural network to perform convolution processing on the first directional contour feature matrix to obtain a second intermediate contour feature matrix; splicing the second intermediate contour feature matrix with the first target skeleton feature matrix to obtain a third spliced feature matrix, and using the third transformation neural network to perform dimensional transformation on the third spliced feature matrix to obtain the second target skeleton feature matrix; and splicing the second intermediate skeleton feature matrix with the first target contour feature matrix to obtain a fourth spliced feature matrix, and using the fourth transformation neural network to perform dimensional transformation on the fourth spliced feature matrix to obtain the second target contour feature matrix.
該實施方式中,通過定向卷積的方式對特徵進行融合處理,能夠以更高的精度提取得到骨骼關鍵點的位置訊息、以及輪廓關鍵點的位置訊息。In this embodiment, the features are fused by means of directional convolution, so that the position information of the bone key points and the position information of the contour key points can be extracted with higher accuracy.
一種可選實施方式中,所述特徵融合神經網路包括:位移估計神經網路、第五變換神經網路;所述使用特徵融合神經網路對所述第一目標骨骼特徵矩陣、以及所述第一目標輪廓特徵矩陣進行特徵融合,得到第二目標骨骼特徵矩陣和第二目標輪廓特徵矩陣,包括:對所述第一目標骨骼特徵矩陣和所述第一目標輪廓特徵矩陣進行拼接處理,得到第五拼接特徵矩陣;將所述第五拼接特徵矩陣輸入至所述位移估計神經網路中,對預先確定的多組關鍵點對進行位移估計,得到每組關鍵點對中的一個關鍵點移動至另一關鍵點的位移訊息;將每組關鍵點對中的每個關鍵點分別作為當前關鍵點,從與該當前關鍵點配對的另一關鍵點對應的三維特徵矩陣中,獲取與所述配對的另一關鍵點對應的二維特徵矩陣;根據從所述配對的另一關鍵點到所述當前關鍵點的位移訊息,對所述配對的另一關鍵點對應的二維特徵矩陣中的元素進行位置變換,得到與該當前關鍵點對應的位移特徵矩陣;針對每個骨骼關鍵點,將該骨骼關鍵點對應的二維特徵矩陣,與其對應的各個位移特徵矩陣進行拼接處理,得到該骨骼關鍵點的拼接二維特徵矩陣;並將該骨骼關鍵點的拼接二維特徵矩陣輸入至所述第五變換神經網路,得到與該骨骼關鍵點對應的目標二維特徵矩陣;基於各個骨骼關鍵點分別對應的目標二維特徵矩陣,生成所述第二目標骨骼特徵矩陣;針對每個輪廓關鍵點,將該輪廓關鍵點對應的二維特徵矩陣,與其對應的各個位移特徵矩陣進行拼接處理,得到該輪廓關鍵點的拼接二維特徵矩陣;並將該輪廓關鍵點的拼接二維特徵矩陣輸入至所述第五變換神經網路,得到與該輪廓關鍵點對應的目標二維特徵矩陣;基於各個輪廓關鍵點分別對應的目標二維特徵矩陣,生成所述第二目標輪廓特徵矩陣。In an optional embodiment, the feature fusion neural network includes a displacement estimation neural network and a fifth transformation neural network. Using the feature fusion neural network to perform feature fusion on the first target skeleton feature matrix and the first target contour feature matrix to obtain the second target skeleton feature matrix and the second target contour feature matrix includes: splicing the first target skeleton feature matrix with the first target contour feature matrix to obtain a fifth spliced feature matrix; inputting the fifth spliced feature matrix into the displacement estimation neural network and performing displacement estimation on multiple predetermined key point pairs to obtain, for each key point pair, the displacement information for moving one key point of the pair to the other key point; taking each key point in each key point pair in turn as the current key point, and obtaining, from the three-dimensional feature matrix corresponding to the other key point paired with the current key point, the two-dimensional feature matrix corresponding to that other key point; transforming the positions of the elements in the two-dimensional feature matrix corresponding to the other key point of the pair according to the displacement information from that other key point to the current key point, to obtain the displacement feature matrix corresponding to the current key point; for each bone key point, splicing the two-dimensional feature matrix corresponding to that bone key point with its corresponding displacement feature matrices to obtain the spliced two-dimensional feature matrix of that bone key point, and inputting the spliced two-dimensional feature matrix of that bone key point into the fifth transformation neural network to obtain the target two-dimensional feature matrix corresponding to that bone key point; generating the second target skeleton feature matrix based on the target two-dimensional feature matrices corresponding to the respective bone key points; for each contour key point, splicing the two-dimensional feature matrix corresponding to that contour key point with its corresponding displacement feature matrices to obtain the spliced two-dimensional feature matrix of that contour key point, and inputting the spliced two-dimensional feature matrix of that contour key point into the fifth transformation neural network to obtain the target two-dimensional feature matrix corresponding to that contour key point; and generating the second target contour feature matrix based on the target two-dimensional feature matrices corresponding to the respective contour key points.
該實施方式中,通過對骨骼關鍵點,以及輪廓關鍵點進行位移變換的方式實現特徵融合,能夠以更高的精度提取得到骨骼關鍵點的位置訊息、以及輪廓關鍵點的位置訊息。In this embodiment, feature fusion is realized by applying displacement transformations to the bone key points and the contour key points, so that the position information of the bone key points and the position information of the contour key points can be extracted with higher accuracy.
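The displacement step in the embodiment above can be illustrated in miniature: the 2-D feature matrix of a paired key point is translated by an estimated offset before being spliced with the current key point's matrix. The integer `(dy, dx)` offset and zero padding at the border are simplifying assumptions for illustration; the patent's displacement estimation network produces the offsets.

```python
# Hedged sketch: translate a 2-D feature map by an estimated displacement so
# that the paired key point's response aligns with the current key point.

def shift_map(fmap, dy, dx):
    """Return a copy of fmap with every element moved by (dy, dx); cells that
    would come from outside the map are zero-filled (an assumption)."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = fmap[r][c]
    return out

# A toy response map with a single peak for the paired key point:
paired_map = [[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]]
# estimated displacement: one pixel to the right
moved = shift_map(paired_map, 0, 1)
print(moved[1])  # [0.0, 0.0, 1.0]
```

The shifted map (`moved`) would then be spliced with the current key point's own 2-D feature matrix before the fifth transformation network.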
一種可選實施方式中,所述人體檢測方法通過人體檢測模型實現;所述人體檢測模型包括:所述第一特徵提取網路和/或所述特徵融合神經網路;所述人體檢測模型為利用訓練樣本集中的樣本圖像訓練得到的,所述樣本圖像標注有人體骨骼結構的骨骼關鍵點的實際位置訊息、以及人體輪廓的輪廓關鍵點的實際位置訊息。In an optional embodiment, the human body detection method is implemented by a human body detection model, and the human body detection model includes the first feature extraction network and/or the feature fusion neural network. The human body detection model is obtained by training on the sample images in a training sample set, and the sample images are annotated with the actual position information of the bone key points of the human skeletal structure and the actual position information of the contour key points of the human body contour.
該實施方式中,通過該訓練方法得到的人體檢測模型具有更高的檢測精度,並通過該人體檢測模型能夠得到兼顧表徵精細度以及計算數據量的人體檢測結果。In this embodiment, the human body detection model obtained through this training method has higher detection accuracy, and with this model a human body detection result can be obtained that balances representation fineness against the amount of calculation data.
第二方面,本公開實施例還提供一種人體檢測裝置,包括:獲取模組,用於獲取待檢測圖像;檢測模組,用於基於所述待檢測圖像,確定用於表徵人體骨骼結構的骨骼關鍵點的位置訊息、以及用於表徵人體輪廓的輪廓關鍵點的位置訊息;生成模組,用於基於所述骨骼關鍵點的位置訊息、以及所述輪廓關鍵點的位置訊息,生成人體檢測結果。In a second aspect, the embodiments of the present disclosure further provide a human body detection apparatus, including: an acquisition module configured to acquire an image to be detected; a detection module configured to determine, based on the image to be detected, the position information of bone key points used to characterize the human skeletal structure and the position information of contour key points used to characterize the human body contour; and a generation module configured to generate a human body detection result based on the position information of the bone key points and the position information of the contour key points.
第三方面,本公開實施例還提供一種電腦設備,包括:處理器、非暫時性儲存媒體和匯流排,所述非暫時性儲存媒體儲存有所述處理器可執行的機器可讀指令,當電腦設備運行的情況下,所述處理器與所述儲存媒體之間通過匯流排通信,所述機器可讀指令被所述處理器執行的情況下執行上述第一方面,或第一方面中任一種可能的實施方式中的步驟。In a third aspect, the embodiments of the present disclosure further provide a computer device, including a processor, a non-transitory storage medium, and a bus. The non-transitory storage medium stores machine-readable instructions executable by the processor; when the computer device runs, the processor communicates with the storage medium through the bus, and when the machine-readable instructions are executed by the processor, the steps in the first aspect, or in any possible implementation of the first aspect, are performed.
第四方面,本公開實施例還提供一種電腦可讀取儲存媒體,該電腦可讀取儲存媒體上儲存有電腦程式,該電腦程式被處理器運行的情況下執行上述第一方面,或第一方面中任一種可能的實施方式中的步驟。In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps in the first aspect, or in any possible implementation of the first aspect, are performed.
本公開實施例能夠從待檢測圖像中,確定用於表徵人體骨骼結構的骨骼關鍵點的位置訊息、以及用於表徵人體輪廓的輪廓關鍵點的位置訊息,並基於骨骼關鍵點的位置訊息、以及輪廓關鍵點的位置訊息,生成人體檢測結果,在提升表徵精細度的同時,兼顧計算數據量。The embodiments of the present disclosure can determine, from the image to be detected, the position information of bone key points used to characterize the human skeletal structure and the position information of contour key points used to characterize the human body contour, and generate a human body detection result based on the position information of the bone key points and the position information of the contour key points, thereby improving representation fineness while keeping the amount of calculation data in check.
為使本公開的上述目的、特徵和優點能更明顯易懂,下文特舉較佳實施例,並配合所附附圖,作詳細說明如下。In order to make the above objectives, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with accompanying drawings are described in detail as follows.
為使本公開實施例的目的、技術方案和優點更加清楚,下面將結合本公開實施例中附圖,對本公開實施例中的技術方案進行清楚、完整地描述,顯然,所描述的實施例僅僅是本公開一部分實施例,而不是全部的實施例。通常在附圖中描述和示出的本公開實施例的組件可以以各種不同的配置來佈置和設計。因此,以下結合附圖所提供的本公開的實施例的詳細描述並非旨在限制要求保護的本公開的範圍,而是僅僅表示本公開的實施例。基於本公開的實施例,本領域技術人員在沒有做出創造性勞動的前提下所獲得的所有其他實施例,都屬於本公開保護的範圍。In order to make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the accompanying drawings, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in conjunction with the accompanying drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present disclosure.
經研究發現,在進行人體檢測時,通常有下述兩種方式:骨骼關鍵點檢測法和語義分割法。Research has found that in human body detection, there are usually the following two methods: bone key point detection method and semantic segmentation method.
骨骼關鍵點檢測法;在該種方法中,通過神經網路模型從圖像中提取人體的骨骼關鍵點,並基於骨骼關鍵點得到對應的人體檢測結果;在該種人體檢測方法中,其採用了簡單的人體表示方法,具有更少的數據量,因而在基於該種方法得到的人體檢測結果進行其他後續處理時,所需要耗費的計算量也較少;其更多的被用於人體的姿勢、動作識別等領域;例如行為檢測、基於人體姿態的人機交互等領域;但由於該種方法並不能提取到人體的輪廓訊息,使得得到的人體檢測結果表徵精細度低。Bone key point detection method: in this method, the bone key points of the human body are extracted from the image through a neural network model, and the corresponding human body detection result is obtained based on the bone key points. This method adopts a simple representation of the human body with a small data volume, so subsequent processing based on its detection results requires relatively little computation; it is mostly used in fields such as human posture and action recognition, for example behavior detection and posture-based human-computer interaction. However, because this method cannot extract the contour information of the human body, the resulting human body detection result has low representation fineness.
語義分割法;在該種方法中,通過語義分割模型識別圖像中每一個像素點屬人體的概率,並基於圖像中各個像素點屬人體的概率,得到人體檢測結果;在該種人體檢測方法,雖然能夠完整的得到人體的輪廓訊息,但人體識別結果中所包含的計算數據量較大。Semantic segmentation method: in this method, a semantic segmentation model estimates, for every pixel in the image, the probability that the pixel belongs to a human body, and the human body detection result is obtained based on these probabilities. Although this method can obtain the complete contour information of the human body, the human body recognition result involves a large amount of calculation data.
因此,一種能夠兼顧表徵精細度和計算數據量的人體檢測方法成為當前亟待解決的問題。Therefore, a human body detection method that can take into account the fineness of representation and the amount of calculated data has become a problem that needs to be solved urgently.
基於上述研究,本公開提供了一種人體檢測方法、裝置、電腦設備及儲存媒體,能夠對待檢測圖像進行特徵提取以提取得到人體的骨骼特徵和輪廓特徵,並將提取得到的骨骼特徵及輪廓特徵進行特徵融合,進而得到用於表徵人體骨骼結構的骨骼關鍵點的位置訊息,以及用於表徵人體輪廓的輪廓關鍵點的位置訊息。基於該種方法得到的人體檢測結果,具有更少的數據量,而且反映了人體的骨骼特徵和輪廓特徵,兼顧提升表徵精細度。Based on the above research, the present disclosure provides a human body detection method and apparatus, a computer device, and a storage medium, which can perform feature extraction on an image to be detected to obtain the skeleton features and contour features of the human body, and fuse the extracted skeleton features and contour features to obtain the position information of bone key points used to characterize the human skeletal structure and the position information of contour key points used to characterize the human body contour. The human body detection result obtained by this method has a smaller data volume, yet reflects both the skeleton features and the contour features of the human body, improving representation fineness at the same time.
另外,本公開實施例中,由於是採用表徵人體骨骼結構的骨骼關鍵點的位置訊息,和表徵人體輪廓的輪廓關鍵點的位置訊息來得到人體檢測結果,表徵人體的訊息更加豐富,具有更廣闊的應用場景。In addition, in the embodiments of the present disclosure, since the human body detection result is obtained from both the position information of bone key points characterizing the human skeletal structure and the position information of contour key points characterizing the human body contour, the information characterizing the human body is richer, which enables broader application scenarios.
針對現有的人體檢測方式所存在的缺陷,需要經過反復實踐並仔細研究後才能確定,因此,對現有問題的發現過程以及本公開所提出的解決方案,都應該落入本公開的範圍之內。The defects of the existing human detection methods need to be determined after repeated practice and careful study. Therefore, the discovery process of the existing problems and the solutions proposed in this disclosure should all fall within the scope of this disclosure.
以下對根據本公開實施例的一種人體檢測方法進行詳細介紹,該人體檢測方法可適用於具有數據處理能力的任意設備,例如計算機。The following describes in detail a human body detection method according to an embodiment of the present disclosure. The human body detection method can be applied to any device with data processing capability, such as a computer.
參見圖1所示,為本公開實施例提供的人體檢測方法的流程圖,其中:Refer to Fig. 1, which is a flowchart of a human body detection method provided by an embodiment of the present disclosure, in which:
步驟S101:獲取待檢測圖像。Step S101: Obtain an image to be detected.
步驟S102:基於待檢測圖像,確定用於表徵人體骨骼結構的骨骼關鍵點的位置訊息、以及用於表徵人體輪廓的輪廓關鍵點的位置訊息。Step S102: Based on the image to be detected, determine the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the outline of the human body.
步驟S103:基於骨骼關鍵點的位置訊息、以及輪廓關鍵點的位置訊息,生成人體檢測結果。Step S103: Generate a human body detection result based on the position information of the bone key points and the position information of the contour key points.
下面分別對上述步驟S101~S103加以說明。The above steps S101 to S103 are respectively described below.
I:在上述步驟S101中,待檢測圖像可以是,例如安裝在目標位置的攝像頭所拍攝得到的待檢測圖像,其他電腦設備發送的待檢測圖像,從本地數據庫中讀取的預先保存的待檢測圖像等。待檢測圖像中可以包括人體圖像,也可以不包括人體圖像;若待檢測圖像中包括人體圖像,則能夠基於本公開實施例提供的人體檢測方法,得到最終的人體檢測結果;若待檢測圖像中不包括人體圖像,則得到的人體檢測結果例如為空。I: In the above step S101, the image to be detected may be, for example, an image captured by a camera installed at a target position, an image sent by another computer device, or a pre-saved image read from a local database. The image to be detected may or may not contain a human body image; if it contains a human body image, the final human body detection result can be obtained based on the human body detection method provided by the embodiments of the present disclosure; if it does not contain a human body image, the resulting human body detection result is, for example, empty.
II:在上述步驟S102中,如圖2 a所示,骨骼關鍵點可以用於表徵人體的骨骼特徵,該骨骼特徵包括人體的關節部位的特徵。關節例如為肘關節、手腕關節、肩關節、頸關節、胯關節、膝關節、踝關節等。示例性的,還可以在人體頭部設置骨骼關鍵點。II: In the above step S102, as shown in FIG. 2a, the bone key points can be used to characterize the bone features of the human body, and the bone features include the characteristics of the joint parts of the human body. The joints are, for example, elbow joints, wrist joints, shoulder joints, neck joints, hip joints, knee joints, ankle joints, and the like. Exemplarily, bone key points can also be set on the human head.
輪廓關鍵點可以用於表徵人體的輪廓特徵,其可以包括:主輪廓關鍵點,如圖2a所示,或者包括:主輪廓關鍵點和輔助輪廓關鍵點,如圖2b~圖2d所示;其中,圖2b~圖2d是圖2a中線框內的部位的局部圖。Contour key points can be used to characterize the contour features of the human body; they may include only main contour key points, as shown in Figure 2a, or both main contour key points and auxiliary contour key points, as shown in Figures 2b to 2d, where Figures 2b to 2d are partial views of the regions inside the line frames in Figure 2a.
其中,主輪廓關鍵點是表徵人體關節部位輪廓的輪廓關鍵點,如圖2a所示,例如肘關節的輪廓、腕關節的輪廓、肩關節的輪廓、頸關節的輪廓、胯關節的輪廓、膝關節的輪廓、踝關節的輪廓等,其一般與表徵對應關節部位的骨骼關鍵點對應出現。Among them, the main contour key points are contour key points that characterize the contours of the joint parts of the human body, as shown in Figure 2a, such as the contour of the elbow joint, the contour of the wrist joint, the contour of the shoulder joint, the contour of the neck joint, the contour of the hip joint, the contour of the knee joint, and the contour of the ankle joint; they generally appear in correspondence with the bone key points that characterize the corresponding joint parts.
輔助輪廓關鍵點是表徵人體關節部位之間輪廓的輪廓關鍵點,兩個相鄰主輪廓關鍵點之間的輔助輪廓關鍵點至少有一個;如圖2b示出示例中,兩個主輪廓關鍵點之間的輔助輪廓關鍵點有一個;如圖2c示出示例中,兩個主輪廓關鍵點之間的輔助輪廓關鍵點有兩個;如圖2d示出示例中,兩個主輪廓關鍵點之間的輔助輪廓關鍵點有三個。Auxiliary contour key points are contour key points that characterize the contour between the joint parts of the human body, and there is at least one auxiliary contour key point between two adjacent main contour key points. In the example shown in Figure 2b, there is one auxiliary contour key point between two main contour key points; in the example shown in Figure 2c, there are two; and in the example shown in Figure 2d, there are three.
以上附圖和文字描述中涉及到的骨骼關鍵點和輪廓關鍵點僅作為示例,以便於對本公開的理解。實際應用中,可以根據實際場景適當調整骨骼關鍵點和輪廓關鍵點的數量以及位置,本公開對此並不限定。The bone key points and outline key points involved in the above drawings and text descriptions are only examples to facilitate the understanding of the present disclosure. In actual applications, the number and positions of bone key points and contour key points can be appropriately adjusted according to the actual scene, which is not limited in the present disclosure.
針對輪廓關鍵點包括:主輪廓關鍵點和輔助輪廓關鍵點的情況,可以採用下述方式基於待檢測圖像,確定用於表徵人體輪廓的輪廓關鍵點的位置訊息:In the case where the contour key points include main contour key points and auxiliary contour key points, the position information of the contour key points used to characterize the human body contour can be determined based on the image to be detected in the following manner:
基於待檢測圖像,確定主輪廓關鍵點的位置訊息;基於主輪廓關鍵點的位置訊息,確定人體輪廓訊息;基於確定的人體輪廓訊息,確定多個輔助輪廓關鍵點的位置訊息。Based on the image to be detected, determine the position information of the main contour key points; determine the human contour information based on the position information of the main contour key points; determine the position information of multiple auxiliary contour key points based on the determined human contour information.
針對輪廓關鍵點包括主輪廓關鍵點的情況,則直接基於待檢測圖像,確定主輪廓關鍵點的位置訊息即可。For the case where the contour key points include the main contour key points, the position information of the main contour key points can be determined directly based on the image to be detected.
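One way to realize the auxiliary-key-point step above can be sketched as follows. The even spacing and the use of straight-line interpolation between two adjacent main contour key points are illustrative assumptions; in the method described above, the points would be placed along the determined human body contour.

```python
# Hedged sketch: place auxiliary contour key points by evenly sampling the
# segment between two adjacent main contour key points (linear interpolation
# stands in for following the actual contour curve).

def auxiliary_points(p_a, p_b, count):
    """Return `count` evenly spaced points strictly between p_a and p_b."""
    (xa, ya), (xb, yb) = p_a, p_b
    pts = []
    for i in range(1, count + 1):
        t = i / (count + 1)
        pts.append((xa + t * (xb - xa), ya + t * (yb - ya)))
    return pts

# One auxiliary point between two main points (cf. Fig. 2b), or three (cf. Fig. 2d):
print(auxiliary_points((0.0, 0.0), (4.0, 0.0), 1))  # [(2.0, 0.0)]
print(auxiliary_points((0.0, 0.0), (4.0, 0.0), 3))  # [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
```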
本公開實施例提供一種基於待檢測圖像,確定用於表徵人體骨骼結構的骨骼關鍵點的位置訊息、以及用於表徵人體輪廓的輪廓關鍵點的位置訊息的具體方法:The embodiments of the present disclosure provide a specific method for determining the position information of the key points of bones used to characterize the human skeletal structure and the position information of the key points of the outline used to characterize the outline of the human body based on the image to be detected:
基於待檢測圖像,進行特徵提取以獲得骨骼特徵及輪廓特徵,並將得到的骨骼特徵和輪廓特徵進行特徵融合;基於特徵融合結果,確定骨骼關鍵點的位置訊息、以及輪廓關鍵點的位置訊息。Based on the image to be detected, perform feature extraction to obtain bone features and contour features, and perform feature fusion on the obtained bone features and contour features; based on the feature fusion results, determine the position information of the bone key points and the position information of the contour key points .
基於待檢測圖像,進行骨骼特徵及輪廓特徵提取,可以採用但不限於下述A或B中任意一種。Based on the image to be detected, the skeleton features and contour features can be extracted in, but not limited to, either of the following ways A and B.
A:對待檢測圖像,進行一次特徵提取,並對該次特徵提取得到的骨骼特徵以及輪廓特徵進行特徵融合。A: Perform a feature extraction on the image to be detected, and perform feature fusion on the bone features and contour features obtained by the feature extraction.
B:對待檢測圖像,進行多次特徵提取,並在每次進行特徵提取後,對該次特徵提取得到的骨骼特徵及輪廓特徵進行特徵融合,並基於最後一次特徵融合的特徵融合結果,確定骨骼關鍵點的位置訊息、以及輪廓關鍵點的位置訊息。B: Perform feature extraction on the image to be detected multiple times; after each extraction, fuse the skeleton features and contour features obtained in that extraction, and determine the position information of the bone key points and the position information of the contour key points based on the feature fusion result of the last fusion.
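The multi-pass scheme in B can be summarized as a loop in which each extraction consumes the previous fusion result and the last fusion result is decoded into key points. The toy stand-ins below (integers for features, `extract` and `fuse` as trivial functions) are illustrative assumptions, not the patent's networks.

```python
# High-level sketch (assumed structure) of scheme B: repeated extraction and
# fusion, with the last fusion result used to determine key point positions.

def run_pipeline(image, stages):
    skel, cont = None, None
    for extract, fuse in stages:
        skel, cont = extract(image, skel, cont)  # extraction pass
        skel, cont = fuse(skel, cont)            # fusion pass
    return skel, cont  # decoded into key-point positions downstream

# Toy stand-ins: "features" are numbers, extraction adds 1, fusion averages.
def extract(image, skel, cont):
    return (0 if skel is None else skel) + 1, (0 if cont is None else cont) + 1

def fuse(skel, cont):
    m = (skel + cont) / 2
    return m, m

print(run_pipeline(None, [(extract, fuse)] * 2))  # (2.0, 2.0)
```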
以下將首先對A情況進行具體的描述。The following will firstly describe the situation A in detail.
在A情況下,基於該次特徵融合的特徵融合結果,確定用於表徵人體骨骼結構的骨骼關鍵點的位置訊息和用於表徵人體輪廓的輪廓關鍵點的位置訊息。In case A, based on the feature fusion result of this feature fusion, the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the contour of the human body are determined.
下面在a1和a2中分別對特徵提取過程和特徵融合過程加以說明。The following describes the feature extraction process and feature fusion process in a1 and a2 respectively.
a1:特徵提取過程:a1: Feature extraction process:
可以使用預先訓練的第一特徵提取網路從待檢測圖像中提取用於表徵人體骨骼特徵的骨骼關鍵點的第一目標骨骼特徵矩陣;並提取用於表徵人體輪廓特徵的輪廓關鍵點的第一目標輪廓特徵矩陣。A pre-trained first feature extraction network can be used to extract, from the image to be detected, the first target skeleton feature matrix of the bone key points used to characterize the skeleton features of the human body, and to extract the first target contour feature matrix of the contour key points used to characterize the contour features of the human body.
具體地,參見圖3所示,本公開實施例提供一種第一特徵提取網路的結構示意圖。第一特徵提取網路包括:共有特徵提取網路、第一骨骼特徵提取網路以及第一輪廓特徵提取網路。Specifically, referring to FIG. 3, an embodiment of the present disclosure provides a schematic structural diagram of a first feature extraction network. The first feature extraction network includes: a common feature extraction network, a first bone feature extraction network, and a first contour feature extraction network.
參見圖4所示,本公開實施例還提供一種基於圖3提供的第一特徵提取網路從待檢測圖像中提取第一目標骨骼特徵矩陣及第一目標輪廓特徵矩陣的具體過程,包括如下步驟。Referring to FIG. 4, an embodiment of the present disclosure further provides a specific process of extracting the first target skeleton feature matrix and the first target contour feature matrix from the image to be detected based on the first feature extraction network provided in FIG. 3, which includes the following steps.
步驟S401:使用共有特徵提取網路對待檢測圖像進行卷積處理,得到包含骨骼特徵以及輪廓特徵的基礎特徵矩陣。Step S401: Use the common feature extraction network to perform convolution processing on the image to be detected to obtain a basic feature matrix including bone features and contour features.
在具體實施中,待檢測圖像能夠被表示為一圖像矩陣;若待檢測圖像為單顏色通道圖像,例如灰度圖,則其能夠被表示為一個二維圖像矩陣;二維圖像矩陣中的各個元素,與待檢測圖像的像素點一一對應;二維圖像矩陣中各個元素的值,即為與各個元素對應的像素點的像素值。若待檢測圖像為多顏色通道圖像,例如RGB格式的圖像,則其能夠被表示為一個三維圖像矩陣;三維圖像矩陣中,包括了三個與不同顏色(例如,R、G、B)通道一一對應的二維圖像矩陣;任一二維圖像矩陣中各個元素的值,即為與各個元素對應的像素點,在對應顏色通道下的像素值。In a specific implementation, the image to be detected can be represented as an image matrix. If the image to be detected is a single-colour-channel image, such as a grayscale image, it can be represented as a two-dimensional image matrix; each element of the two-dimensional image matrix corresponds one-to-one to a pixel of the image to be detected, and the value of each element is the pixel value of the corresponding pixel. If the image to be detected is a multi-colour-channel image, such as an image in RGB format, it can be represented as a three-dimensional image matrix; the three-dimensional image matrix includes three two-dimensional image matrices corresponding one-to-one to the different colour channels (for example, R, G, and B), and the value of each element in any of these two-dimensional image matrices is the pixel value, under the corresponding colour channel, of the pixel corresponding to that element.
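The two image-matrix representations just described can be shown in miniature (the 2×2 pixel values are arbitrary illustrative data):

```python
# Minimal illustration: a grayscale image maps to a 2-D matrix of pixel values,
# an RGB image to a 3-D matrix holding one 2-D matrix per colour channel.

gray = [[0, 128],
        [255, 64]]                 # 2-D: gray[row][col] is the pixel value

rgb = [[[255, 0], [0, 0]],         # R channel
       [[0, 255], [0, 0]],         # G channel
       [[0, 0], [255, 0]]]         # B channel; rgb[channel][row][col]

print(gray[1][0], rgb[0][0][0])    # 255 255
```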
共有特徵提取網路中包括了至少一層卷積層;將待檢測圖像的圖像矩陣輸入至共有特徵提取網路後,使用共有特徵提取網路對待檢測圖像的圖像矩陣進行卷積處理,提取待檢測圖像中的特徵。該種情況下,所提取到的特徵既包含骨骼特徵,又包含輪廓特徵。The common feature extraction network includes at least one convolutional layer. After the image matrix of the image to be detected is input into the common feature extraction network, the common feature extraction network performs convolution processing on the image matrix to extract the features of the image to be detected. In this case, the extracted features include both skeleton features and contour features.
步驟S402:使用第一骨骼特徵提取網路對基礎特徵矩陣進行卷積處理,得到第一骨骼特徵矩陣,並從第一骨骼特徵提取網路中的第一目標卷積層獲取第二骨骼特徵矩陣;基於第一骨骼特徵矩陣以及第二骨骼特徵矩陣,得到第一目標骨骼特徵矩陣;第一目標卷積層為第一骨骼特徵提取網路中,除最後一層卷積層外的其他任一卷積層。Step S402: Use the first bone feature extraction network to perform convolution processing on the basic feature matrix to obtain a first bone feature matrix, and obtain a second bone feature matrix from the first target convolution layer in the first bone feature extraction network; Based on the first skeletal feature matrix and the second skeletal feature matrix, the first target skeletal feature matrix is obtained; the first target convolutional layer is any convolutional layer except the last convolutional layer in the first skeletal feature extraction network.
在具體實施中,第一骨骼特徵提取網路包括了多層卷積層。多層卷積層依次連接,下一層卷積層的輸入,為上一層卷積層的輸出。具有該種結構的第一骨骼特徵提取網路能夠對基礎特徵矩陣進行多次卷積處理,並從最後一層卷積層得到第一骨骼特徵矩陣。此處,第一骨骼特徵矩陣為三維特徵矩陣;在該三維特徵矩陣中,包括了多個二維特徵矩陣,且各個二維特徵矩陣與預先確定的多個骨骼關鍵點一一對應。與某個骨骼關鍵點對應的二維特徵矩陣中元素的值,表示與該元素對應的像素點屬該骨骼關鍵點的概率,且與一個元素對應的像素點一般有多個。In a specific implementation, the first skeleton feature extraction network includes multiple convolutional layers connected in sequence, where the input of each convolutional layer is the output of the preceding one. With this structure, the first skeleton feature extraction network can perform convolution processing on the basic feature matrix multiple times and obtain the first skeleton feature matrix from the last convolutional layer. Here, the first skeleton feature matrix is a three-dimensional feature matrix that includes multiple two-dimensional feature matrices, each corresponding one-to-one to one of the predetermined bone key points. The value of an element in the two-dimensional feature matrix corresponding to a certain bone key point represents the probability that the pixels corresponding to that element belong to that bone key point, and an element generally corresponds to multiple pixels.
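Given such a per-key-point probability matrix, a point estimate of the key point's location is simply the position of the maximum entry. The small heat map below is made-up illustrative data, not output of the patent's network:

```python
# Hedged sketch: each bone key point has its own 2-D feature matrix whose
# entries are probabilities, so a location estimate is the argmax position.

def argmax_2d(heatmap):
    """Return the (row, col) of the largest value in a 2-D matrix."""
    best, best_val = (0, 0), heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best_val:
                best, best_val = (r, c), v
    return best

wrist_heatmap = [[0.01, 0.02, 0.01],
                 [0.05, 0.90, 0.10],
                 [0.02, 0.08, 0.03]]
print(argmax_2d(wrist_heatmap))  # (1, 1): most probable location of this key point
```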
另外,通過多層卷積層對基礎特徵矩陣的多次卷積處理,雖然能夠從基礎特徵矩陣中提取到人體的骨骼特徵,但隨著卷積次數的增加,會丟失待檢測圖像中的一些訊息,這些訊息裡也可能包括人體的骨骼特徵的相關訊息;若待檢測圖像中的訊息丟失過多,就可能會造成最終得到的用於表徵人體骨骼特徵的骨骼關鍵點的第一目標骨骼特徵矩陣不夠精確。因此,在本公開實施例中,還會從第一骨骼特徵提取網路的第一目標卷積層獲取第二骨骼特徵矩陣,並基於第一骨骼特徵矩陣以及第二骨骼特徵矩陣,得到第一目標骨骼特徵矩陣。In addition, although multiple convolution passes over the basic feature matrix through the multi-layer convolutional layers can extract the skeleton features of the human body, some information in the image to be detected is lost as the number of convolutions increases, and this lost information may include information related to the skeleton features of the human body. If too much information in the image to be detected is lost, the resulting first target skeleton feature matrix used to characterize the skeleton features of the human body may not be precise enough. Therefore, in the embodiment of the present disclosure, a second skeleton feature matrix is also obtained from the first target convolutional layer of the first skeleton feature extraction network, and the first target skeleton feature matrix is obtained based on the first skeleton feature matrix and the second skeleton feature matrix.
這裡,第一目標卷積層,為第一骨骼特徵提取網路中,除最後一層卷積層外的其他任一卷積層。在圖3的示例中,第一骨骼特徵提取網路中的倒數第二層卷積層被選定作為第一目標卷積層。Here, the first target convolutional layer is any convolutional layer except the last convolutional layer in the first bone feature extraction network. In the example of FIG. 3, the penultimate convolutional layer in the first bone feature extraction network is selected as the first target convolutional layer.
例如可以採用下述方式基於第一骨骼特徵矩陣以及第二骨骼特徵矩陣,得到第一目標骨骼特徵矩陣:For example, the following method can be used to obtain the first target bone feature matrix based on the first bone feature matrix and the second bone feature matrix:
將第一骨骼特徵矩陣以及第二骨骼特徵矩陣進行拼接處理,得到第一拼接骨骼特徵矩陣;對第一拼接骨骼特徵矩陣進行維度變換處理,得到第一目標骨骼特徵矩陣。The first bone feature matrix and the second bone feature matrix are spliced to obtain the first spliced bone feature matrix; the first spliced bone feature matrix is subjected to dimensional transformation processing to obtain the first target bone feature matrix.
此處,對第一拼接骨骼特徵矩陣進行維度變換處理的情況下,可以將其輸入至維度變換神經網路,使用該維度變換神經網路對第一拼接骨骼特徵矩陣進行至少一次卷積處理,得到第一目標骨骼特徵矩陣。Here, in the case of performing dimensional transformation processing on the first spliced bone feature matrix, it can be input to the dimensional transformation neural network, and the dimensional transformation neural network is used to perform convolution processing on the first spliced bone feature matrix at least once, Obtain the first target bone feature matrix.
此處,維度變換神經網路可以將第一骨骼特徵矩陣及第二骨骼特徵矩陣中攜帶的特徵訊息進行融合,使得得到的第一目標骨骼特徵矩陣中,包含有更豐富的訊息。Here, the dimensional transformation neural network can merge the feature information carried in the first bone feature matrix and the second bone feature matrix, so that the obtained first target bone feature matrix contains richer information.
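A hedged sketch of the splice and dimension-transform steps just described, assuming the splice is a channel-wise concatenation and the dimension transformation is a 1x1 convolution (a per-pixel weighted sum over channels); the shapes and the uniform weights are illustrative stand-ins, not the trained network:

```python
C, H, W = 2, 2, 2
m1 = [[[1.0] * W for _ in range(H)] for _ in range(C)]  # first skeleton feature matrix
m2 = [[[2.0] * W for _ in range(H)] for _ in range(C)]  # matrix from the target conv layer

spliced = m1 + m2  # channel-wise concatenation: shape (2C, H, W)

def conv1x1(x, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    return [[[sum(w * x[ci][r][c] for ci, w in enumerate(ws))
              for c in range(W)] for r in range(H)] for ws in weights]

# One weight vector per output channel; uniform averaging weights as a stand-in
# for learned parameters. Restores the channel count from 2C back to C.
weights = [[0.25] * (2 * C) for _ in range(C)]
fused = conv1x1(spliced, weights)
```

The 1x1 convolution mixes information from both input matrices at every spatial position, which is one simple way the described fusion of feature information can occur.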
Step S403: Use the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, and obtain a second contour feature matrix from a second target convolutional layer of the first contour feature extraction network; obtain a first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix. The second target convolutional layer is any convolutional layer in the first contour feature extraction network other than the last convolutional layer. In the example of FIG. 3, the penultimate convolutional layer of the first contour feature extraction network is selected as the second target convolutional layer.
In a specific implementation, the first contour feature extraction network also includes multiple convolutional layers connected in sequence, where the input of each convolutional layer is the output of the preceding layer. A first contour feature extraction network with this structure can perform multiple rounds of convolution processing on the basic feature matrix and obtain the first contour feature matrix from the last convolutional layer. Here, the first contour feature matrix is a three-dimensional feature matrix; it includes a plurality of two-dimensional feature matrices, each in one-to-one correspondence with one of a plurality of predetermined contour key points. The value of an element in the two-dimensional feature matrix corresponding to a given contour key point represents the probability that the pixels corresponding to that element belong to that key point, and in general multiple pixels correspond to one element.
It should be noted here that the number of contour key points generally differs from the number of skeleton key points; therefore, the number of two-dimensional feature matrices included in the obtained first contour feature matrix may differ from the number included in the first skeleton feature matrix.
For example, if there are 14 skeleton key points and 25 contour key points, the first contour feature matrix includes 25 two-dimensional feature matrices and the first skeleton feature matrix includes 14.
In addition, so that the first target contour feature matrix also contains richer information, a second contour feature matrix can be obtained from the second target convolutional layer of the first contour feature extraction network in a manner similar to step S402 above, and the first target contour feature matrix is then obtained based on the first contour feature matrix and the second contour feature matrix.
Here, the manner of obtaining the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix includes, for example:
The first contour feature matrix and the second contour feature matrix are spliced to obtain a first spliced contour feature matrix, and dimension transformation processing is performed on the first spliced contour feature matrix to obtain the first target contour feature matrix.
It should be noted that in steps S402 and S403 above, the first target skeleton feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each dimension, so that subsequent feature fusion processing can be performed based on the first target skeleton feature matrix and the first target contour feature matrix.
For example, if the first target skeleton feature matrix has 3 dimensions with sizes 64, 32, and 14, its shape is expressed as 64*32*14; the shape of the first target contour feature matrix can likewise be expressed as 64*32*14.
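A small illustrative check of the compatibility condition stated above, using the example shapes from the text (the helper name and shapes are assumptions for illustration):

```python
skeleton_shape = (64, 32, 14)  # first target skeleton feature matrix
contour_shape = (64, 32, 14)   # first target contour feature matrix

def fusable(a, b):
    """Splicing-based fusion requires the same number of dimensions and the
    same size in each dimension."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))

ok = fusable(skeleton_shape, contour_shape)  # True for the example shapes
```

Note that the raw first skeleton and contour feature matrices may differ in channel count (14 vs. 25 in the earlier example); it is the dimension transformation that brings the two target matrices to matching shapes.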
In addition, in another embodiment, the first target skeleton feature matrix and the first target contour feature matrix can also be obtained in the following manner:
using a shared feature extraction network to perform convolution processing on the image to be detected to obtain a basic feature matrix containing both skeleton features and contour features;
using the first skeleton feature extraction network to perform convolution processing on the basic feature matrix to obtain the first skeleton feature matrix, and performing dimension transformation processing on the first skeleton feature matrix to obtain the first target skeleton feature matrix;
using the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain the first contour feature matrix, and performing dimension transformation processing on the first contour feature matrix to obtain the first target contour feature matrix.
In this manner as well, the skeleton features and contour features of the human body can be extracted from the image to be detected with relatively high accuracy.
In addition, the first feature extraction network provided in the embodiments of the present disclosure is obtained by pre-training.
Here, the human body detection method provided by the embodiments of the present disclosure is implemented by a human body detection model; the human body detection model includes a first feature extraction network and/or a feature fusion neural network.
The human body detection model is obtained by training on sample images in a training sample set; the sample images are annotated with the actual position information of the skeleton key points of the human skeletal structure and the actual position information of the contour key points of the human body contour.
Specifically, where the human body detection model includes the first feature extraction network, the first feature extraction network can be trained separately, trained jointly with the feature fusion neural network, or trained with a combination of separate and joint training.
The process of training the first feature extraction network includes, but is not limited to, (1) and (2) below.
(1) Separate training of the first feature extraction network includes, for example:
Step 1.1: Obtain multiple sample images and the annotation data of each sample image; the annotation data includes the actual position information of the skeleton key points characterizing the human skeletal structure and the actual position information of the contour key points characterizing the human body contour.
Step 1.2: Input the sample images into a first basic feature extraction network to obtain a first sample target skeleton feature matrix and a first sample target contour feature matrix.
Step 1.3: Determine first predicted position information of the skeleton key points based on the first sample target skeleton feature matrix, and determine first predicted position information of the contour key points based on the first sample target contour feature matrix.
Step 1.4: Determine a first loss based on the actual position information of the skeleton key points and the first predicted position information of the skeleton key points, and determine a second loss based on the actual position information of the contour key points and the first predicted position information of the contour key points.
Step 1.5: Perform the current round of training on the first basic feature extraction network based on the first loss and the second loss.
After multiple rounds of training of the first basic feature extraction network, the first feature extraction network is obtained.
As shown in FIG. 3, the first loss is LS1 in FIG. 3 and the second loss is LC1 in FIG. 3. The training of the first basic feature extraction network is supervised based on the first loss and the second loss, so as to obtain a first feature extraction network with higher accuracy.
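The two-loss supervision in steps 1.4 and 1.5 can be sketched as follows, assuming for illustration that each loss is a mean squared error between predicted and annotated key-point coordinates (the disclosure does not fix the loss function, so MSE and the coordinates below are stand-ins):

```python
def mse_loss(pred, actual):
    """Mean squared error over matched (x, y) key-point pairs."""
    n = len(pred)
    return sum((px - ax) ** 2 + (py - ay) ** 2
               for (px, py), (ax, ay) in zip(pred, actual)) / n

skeleton_pred = [(10.0, 20.0), (30.0, 42.0)]  # first predicted positions
skeleton_true = [(10.0, 22.0), (30.0, 40.0)]  # annotated actual positions
contour_pred = [(5.0, 5.0)]
contour_true = [(5.0, 8.0)]

ls1 = mse_loss(skeleton_pred, skeleton_true)  # first loss (LS1)
lc1 = mse_loss(contour_pred, contour_true)    # second loss (LC1)
total_loss = ls1 + lc1  # drives one round of training of the network
```

Summing the two losses lets a single backward pass supervise both the skeleton branch and the contour branch in each training round.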
(2) Joint training of the first feature extraction network and the feature fusion neural network includes, for example:
Step 2.1: Obtain multiple sample images and the annotation data of each sample image; the annotation data includes the actual position information of the skeleton key points characterizing the human skeletal structure and the actual position information of the contour key points characterizing the human body contour.
Step 2.2: Input the sample images into the first basic feature extraction network to obtain a first sample target skeleton feature matrix and a first sample target contour feature matrix.
Step 2.3: Use a basic feature fusion neural network to perform feature fusion on the first sample target skeleton feature matrix and the first sample target contour feature matrix to obtain a second sample target skeleton feature matrix and a second sample target contour feature matrix.
Step 2.4: Determine second predicted position information of the skeleton key points based on the second sample target skeleton feature matrix, and determine second predicted position information of the contour key points based on the second sample target contour feature matrix.
Step 2.5: Determine a third loss based on the actual position information of the skeleton key points and the second predicted position information of the skeleton key points, and determine a fourth loss based on the actual position information of the contour key points and the second predicted position information of the contour key points.
Step 2.6: Perform the current round of training on the first basic feature extraction network and the basic feature fusion neural network based on the third loss and the fourth loss.
After multiple rounds of training of the first basic feature extraction network and the basic feature fusion neural network, the first feature extraction network and the feature fusion neural network are obtained.
(3) In the process of combining separate training and joint training to obtain the first feature extraction network, the processes in (1) and (2) above can be performed synchronously.
Alternatively, the first feature extraction network can first be pre-trained using the process in (1), and the pre-trained first feature extraction network then undergoes the joint training in (2) with the feature fusion neural network.
It should be noted that the sample images used for the separate training and the joint training of the first feature extraction network may be the same or different.
Before the joint training of the first feature extraction network and the feature fusion neural network, the feature fusion neural network may also be pre-trained first, and the pre-trained feature fusion neural network is then jointly trained with the first feature extraction network.
For the detailed process of separately training the feature fusion neural network, refer to the description of the embodiment shown in a2 below.
a2: Feature fusion process:
After obtaining the first target skeleton feature matrix of the skeleton key points characterizing the human skeleton features and the first target contour feature matrix of the contour key points characterizing the human contour features, feature fusion processing can be performed based on the first target skeleton feature matrix and the first target contour feature matrix.
Specifically, in the process of extracting skeleton features and contour features from the image to be detected, although the same basic feature matrix is used, the first skeleton feature extraction network extracts skeleton features from the basic feature matrix while the first contour feature extraction network extracts contour features from it; the two processes exist independently of each other. For the same human body, however, the contour features and skeleton features are mutually related, and the purpose of fusing them is to exploit this mutual influence. For example, the position information of the finally extracted skeleton key points can be corrected based on the contour features, and the position information of the finally extracted contour key points can be corrected based on the skeleton features, thereby obtaining more accurate position information for both and, in turn, a higher-precision human body detection result.
The embodiments of the present disclosure provide a specific method for feature fusion of the extracted skeleton features and contour features, including: using a pre-trained feature fusion neural network to perform feature fusion on the first target skeleton feature matrix and the first target contour feature matrix to obtain a second target skeleton feature matrix and a second target contour feature matrix.
The second target skeleton feature matrix is a three-dimensional skeleton feature matrix that includes two-dimensional skeleton feature matrices each corresponding to one of the skeleton key points; the value of each element in a two-dimensional skeleton feature matrix represents the probability that the pixels corresponding to that element belong to the corresponding skeleton key point (that is, the skeleton key point corresponding to that two-dimensional skeleton feature matrix). The second target contour feature matrix is a three-dimensional contour feature matrix that includes two-dimensional contour feature matrices each corresponding to one of the contour key points; the value of each element in a two-dimensional contour feature matrix represents the probability that the pixels corresponding to that element belong to the corresponding contour key point.
The feature fusion neural network provided in the embodiments of the present disclosure can be trained separately, trained jointly with the first feature extraction network, or trained with a combination of separate and joint training.
For the process of jointly training the feature fusion neural network and the first feature extraction network, refer to (2) above; details are not repeated here.
For feature fusion neural networks of different structures trained separately, the training methods used differ; for the training methods corresponding to the different structures, refer to M1 to M3 below.
The process of feature fusion of skeleton features and contour features may include, but is not limited to, at least one of M1 to M3 below.
M1:
As shown in FIG. 5, an embodiment of the present disclosure provides a specific structure of a feature fusion neural network, including a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network.
As shown in FIG. 6, an embodiment of the present disclosure further provides a specific method for performing feature fusion on the first target skeleton feature matrix and the first target contour feature matrix based on the feature fusion neural network of FIG. 5 to obtain the second target skeleton feature matrix and the second target contour feature matrix, including the following steps.
Step S601: Use the first convolutional neural network to perform convolution processing on the first target skeleton feature matrix to obtain a first intermediate skeleton feature matrix. Proceed to step S603.
Here, the first convolutional neural network includes at least one convolutional layer. If the first convolutional neural network has multiple layers, the convolutional layers are connected in sequence, with each layer's input being the output of the preceding layer. The first target skeleton feature matrix is input to the first convolutional neural network, and each convolutional layer performs convolution processing on it to obtain the first intermediate skeleton feature matrix.
The purpose of this process is to further extract the skeleton features from the first target skeleton feature matrix.
Step S602: Use the second convolutional neural network to perform convolution processing on the first target contour feature matrix to obtain a first intermediate contour feature matrix. Proceed to step S604.
Here, this processing is similar to step S601 above and is not repeated.
It should be noted that steps S601 and S602 have no required order of execution; they can be executed synchronously or asynchronously.
Step S603: Splice the first intermediate contour feature matrix with the first target skeleton feature matrix to obtain a first spliced feature matrix, and use the first transformation neural network to perform dimension transformation on the first spliced feature matrix to obtain the second target skeleton feature matrix.
Here, the first intermediate contour feature matrix and the first target skeleton feature matrix are spliced to obtain the first spliced feature matrix, so that the resulting first spliced feature matrix includes both contour features and skeleton features.
Using the first transformation neural network to perform a further dimension transformation on the first spliced feature matrix is, in effect, using the first transformation neural network to extract skeleton features from the first spliced feature matrix once again. Because the process of obtaining the first spliced feature matrix removes the features of the image to be detected other than the skeleton features and contour features, leaving only these two, the skeleton features contained in the second target skeleton feature matrix obtained from the first spliced feature matrix are influenced by the contour features; this establishes the mutual relationship between skeleton features and contour features and realizes their fusion.
Step S604: Splice the first intermediate skeleton feature matrix with the first target contour feature matrix to obtain a second spliced feature matrix, and use the second transformation neural network to perform dimension transformation on the second spliced feature matrix to obtain the second target contour feature matrix.
Here, the process of splicing the first intermediate skeleton feature matrix with the first target contour feature matrix to obtain the second spliced feature matrix is similar to the process of obtaining the first spliced feature matrix in step S603 above and is not repeated.
Similarly, the contour features contained in the second target contour feature matrix are influenced by the skeleton features; this establishes the mutual relationship between skeleton features and contour features and realizes their fusion.
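The cross-branch data flow of steps S601 to S604 can be traced at the shape level with the following sketch; the shapes, the channel-preserving stand-ins for the convolutional networks, and the channel counts are assumptions for illustration, not the trained M1 networks:

```python
def conv(shape):
    """Stand-in for a channel-preserving convolutional network (S601/S602)."""
    return shape

def splice(a, b):
    """Concatenate along the channel (first) axis; spatial dims must match."""
    assert a[1:] == b[1:]
    return (a[0] + b[0],) + a[1:]

def transform(shape, out_channels):
    """Stand-in for a transformation network that restores the channel count."""
    return (out_channels,) + shape[1:]

skel = (14, 64, 32)  # first target skeleton feature matrix (channels, H, W)
cont = (14, 64, 32)  # first target contour feature matrix, same shape

mid_skel = conv(skel)                               # step S601
mid_cont = conv(cont)                               # step S602
fused_skel = transform(splice(mid_cont, skel), 14)  # step S603: contour -> skeleton
fused_cont = transform(splice(mid_skel, cont), 14)  # step S604: skeleton -> contour
```

Each branch's output is spliced into the other branch before transformation, which is how information from one feature type influences the other's final matrix.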
In another embodiment, the feature fusion neural network can be trained separately in the following manner.
Step 3.1: Obtain first sample target skeleton feature matrices and first sample target contour feature matrices of multiple sample images.
The manner of obtaining them is similar to the manner of obtaining the first target skeleton feature matrix and the first target contour feature matrix in the foregoing embodiments and is not repeated here. They can be obtained during joint training with the first feature extraction network, or obtained using a pre-trained first feature extraction network.
Step 3.2: Use a first basic convolutional neural network to perform convolution processing on the first sample target skeleton feature matrix to obtain a first sample intermediate skeleton feature matrix.
Step 3.3: Use a second basic convolutional neural network to perform convolution processing on the first sample target contour feature matrix to obtain a first sample intermediate contour feature matrix.
Step 3.4: Splice the first sample intermediate contour feature matrix with the first sample target skeleton feature matrix to obtain a first sample spliced feature matrix, and use a first basic transformation neural network to perform dimension transformation on the first sample spliced feature matrix to obtain a second sample target skeleton feature matrix.
Step 3.5: Splice the first sample intermediate skeleton feature matrix with the first sample target contour feature matrix to obtain a second sample spliced feature matrix, and use a second basic transformation neural network to perform dimension transformation on the second sample spliced feature matrix to obtain a second sample target contour feature matrix.
Step 3.6: Determine third predicted position information of the skeleton key points based on the second sample target skeleton feature matrix, and determine third predicted position information of the contour key points based on the second sample target contour feature matrix.
Step 3.7: Determine a fifth loss based on the actual position information of the skeleton key points and the third predicted position information of the skeleton key points, and determine a sixth loss based on the actual position information of the contour key points and the third predicted position information of the contour key points.
Step 3.8: Perform the current round of training on the first basic convolutional neural network, the second basic convolutional neural network, the first basic transformation neural network, and the second basic transformation neural network based on the fifth loss and the sixth loss.
After multiple rounds of training of the first basic convolutional neural network, the second basic convolutional neural network, the first basic transformation neural network, and the second basic transformation neural network, the feature fusion neural network is obtained.
Here, the fifth loss is LS2 in FIG. 5 and the sixth loss is LC2 in FIG. 5.
M2:
As shown in FIG. 7, another specific structure of the feature fusion neural network provided by an embodiment of the present disclosure includes a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, a fourth convolutional neural network, a third transformation neural network, and a fourth transformation neural network.
As shown in FIG. 8, an embodiment of the present disclosure further provides a specific method for performing feature fusion on the first target skeleton feature matrix and the first target contour feature matrix based on the feature fusion neural network of FIG. 7 to obtain the second target skeleton feature matrix and the second target contour feature matrix, including the following steps.
步驟S801:使用第一定向卷積神經網路對第一目標骨骼特徵矩陣進行定向卷積處理,得到第一定向骨骼特徵矩陣。使用第三卷積神經網路對第一定向骨骼特徵矩陣進行卷積處理,得到第二中間骨骼特徵矩陣。執行步驟S804。Step S801: Use the first directional convolutional neural network to perform directional convolution processing on the first target bone feature matrix to obtain the first directional bone feature matrix. Use the third convolutional neural network to perform convolution processing on the first directional skeleton feature matrix to obtain the second intermediate skeleton feature matrix. Step S804 is executed.
步驟S802:使用第二定向卷積神經網路對第一目標輪廓特徵矩陣進行定向卷積處理,得到第一定向輪廓特徵矩陣;並使用第四卷積神經網路對第一定向輪廓特徵矩陣進行卷積處理,得到第二中間輪廓特徵矩陣。執行步驟S803。Step S802: Use the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain the first directional contour feature matrix; and use the fourth convolutional neural network to analyze the first directional contour feature The matrix is subjected to convolution processing to obtain the second intermediate contour feature matrix. Step S803 is executed.
步驟S803:將第二中間輪廓特徵矩陣與第一目標骨骼特徵矩陣進行拼接處理,得到第三拼接特徵矩陣;並使用第三變換神經網路對第三拼接特徵矩陣進行維度變換,得到第二目標骨骼特徵矩陣。Step S803: Perform splicing processing on the second intermediate contour feature matrix and the first target bone feature matrix to obtain a third spliced feature matrix; and use the third transformation neural network to perform dimensional transformation on the third spliced feature matrix to obtain the second target Bone feature matrix.
步驟S804:將第二中間骨骼特徵矩陣與第一目標輪廓特徵矩陣進行拼接處理,得到第四拼接特徵矩陣,並使用第四變換神經網路對第四拼接特徵矩陣進行維度變換,得到第二目標輪廓特徵矩陣。Step S804: Perform splicing processing on the second intermediate skeleton feature matrix and the first target contour feature matrix to obtain a fourth spliced feature matrix, and use the fourth transform neural network to perform dimensional transformation on the fourth spliced feature matrix to obtain the second target Contour feature matrix.
In a specific implementation, when fusing skeleton features and contour features, the skeleton key points are typically concentrated on the skeleton of the human body, while the contour key points are concentrated on the outline of the body, i.e., distributed around the skeleton. A local spatial transformation therefore needs to be performed separately for the skeleton features and the contour features: for example, the skeleton features are shifted toward the positions the contour features occupy in the contour feature matrix, and the contour features are shifted toward the positions the skeleton features occupy in the skeleton feature matrix. This allows the skeleton and contour features to be extracted more effectively and fused with each other.
To this end, an embodiment of the present disclosure first uses the first directional convolutional neural network to perform directional convolution on the first target skeleton feature matrix; the directional convolution effectively realizes a directed spatial transformation of the skeleton features at the feature level. The third convolutional neural network then convolves the resulting first directional skeleton feature matrix to obtain the second intermediate skeleton feature matrix. At this point, because the skeleton features have already been spatially transformed by the first directional convolution layer, they have in effect been shifted toward the contour features. The second intermediate skeleton feature matrix is then spliced with the first target contour feature matrix to obtain the fourth spliced feature matrix, which contains both the contour features and the directionally transformed skeleton features. Finally, the fourth transformation neural network performs a dimension transformation on the fourth spliced feature matrix, that is, contour features are extracted from it once again. The second target contour feature matrix obtained in this way is influenced by the skeleton features, realizing the fusion of skeleton and contour features.
Likewise, the embodiment first uses the second directional convolutional neural network to perform directional convolution on the first target contour feature matrix; this directional convolution effectively realizes a directed spatial transformation of the contour features at the feature level. The fourth convolutional neural network then convolves the resulting first directional contour feature matrix to obtain the second intermediate contour feature matrix. Because the contour features have already been spatially transformed by the second directional convolution layer, they have in effect been shifted toward the skeleton features. The second intermediate contour feature matrix is then spliced with the first target skeleton feature matrix to obtain the third spliced feature matrix, which contains both the skeleton features and the directionally transformed contour features. Finally, the third transformation neural network performs a dimension transformation on the third spliced feature matrix, that is, skeleton features are extracted from it once again. The second target skeleton feature matrix obtained in this way is influenced by the contour features, realizing the fusion of skeleton and contour features.
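The splice-and-transform halves of steps S803/S804 can be sketched as follows. This is a minimal numpy sketch under assumptions not stated in the disclosure: 64-channel feature maps on a 32x32 grid, a single 1x1 channel-mixing matrix standing in for the transformation neural network, and the directional and ordinary convolution stages omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """A 1x1 convolution is per-pixel channel mixing; x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

# Hypothetical shapes: 64-channel skeleton/contour feature maps on a 32x32 grid.
intermediate_skel = rng.standard_normal((64, 32, 32))  # second intermediate skeleton feature matrix
target_contour = rng.standard_normal((64, 32, 32))     # first target contour feature matrix

# Step S804: splice along the channel dimension, then transform back to 64 channels.
fourth_spliced = np.concatenate([intermediate_skel, target_contour], axis=0)  # (128, 32, 32)
w4 = rng.standard_normal((64, 128))  # stand-in for the fourth transformation neural network
second_target_contour = conv1x1(fourth_spliced, w4)
assert second_target_contour.shape == (64, 32, 32)
```

The symmetric skeleton branch (step S803) is the same computation with the roles of the two inputs exchanged.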
Specifically, a directional convolution consists of multiple iterative convolution steps. A valid directional convolution satisfies the following requirements:
(1) In each iterative convolution step, only the element values of one group of elements in the feature matrix are updated.
(2) After the last iterative convolution step, every element's value has been updated exactly once.
Taking the directional convolution of the first target skeleton feature matrix as an example, a sequence of characteristic functions {F_k} can be defined to control the order in which elements are updated. The input of the function F_k is the position of an element in the first target skeleton feature matrix, and its output indicates whether that element is updated in the k-th iteration; the output is 1 or 0, where 1 means update and 0 means do not update. Specifically, during the k-th iteration, only the values of elements in the region where F_k = 1 are updated, while the values of elements in other regions remain unchanged. The update of the k-th iteration can be expressed as:
T_k(X) = F_k ⊙ (W * T_{k-1}(X) + b) + (1 - F_k) ⊙ T_{k-1}(X)

where T_0(X) = X, X denotes the input of the directional convolution, i.e., the first target skeleton feature matrix; W and b denote the weight and bias shared across the iterations; ⊙ denotes element-wise multiplication and * denotes convolution.
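As a concrete reading of this update rule, the masked iterative step can be sketched in numpy as below. The 3x3 shared kernel and zero padding are assumptions for illustration, not details specified by the disclosure.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Naive 'same' 2-D convolution with one shared kernel w and bias b (illustrative only)."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding keeps the output the same size
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w) + b
    return out

def directional_conv(x, masks, w, b):
    """T_k(X) = F_k * (W conv T_{k-1}(X) + b) + (1 - F_k) * T_{k-1}(X):
    at step k only elements where masks[k] == 1 are rewritten."""
    t = x.astype(float)
    for f in masks:  # f is the 0/1 mask F_k
        t = f * conv2d_same(t, w, b) + (1 - f) * t
    return t
```

With an identity kernel (all zeros except the centre) and zero bias, each step leaves the matrix unchanged, which is a quick sanity check of the masking logic.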
To realize the fusion of skeleton and contour features, a pair of symmetric directional convolution operators, i.e., two sequences of the characteristic functions described above, can be defined: a scatter convolution operator and a gather convolution operator. The scatter convolution operator updates the elements of the feature matrix in order from the inside outward, while the gather convolution operator updates them from the outside inward.
When the first directional convolutional neural network performs directional convolution on the first target skeleton feature matrix, the skeleton feature elements are to be spatially shifted toward the positions surrounding them (positions more relevant to the contour features), so the scatter convolution operator is used. When the second directional convolutional neural network performs directional convolution on the first target contour feature matrix, the contour feature elements are to be spatially shifted toward the middle of the contour feature matrix (positions more relevant to the skeleton features), so the gather convolution operator is used.
Specifically, the first directional convolutional neural network performs directional convolution on the first target skeleton feature matrix as follows.
The first target skeleton feature matrix is divided into multiple sub-matrices, each of which is called a grid. If the first target skeleton feature matrix is a three-dimensional matrix whose three dimensions are m, n, and s, its size is denoted m*n*s; if the grid size is 5, the size of each grid is 5*5*s.
Then, for each grid, multiple iterative convolutions are performed using the scatter convolution operator to obtain a target sub-matrix. FIG. 9a shows a process in which the scatter convolution operator updates the element values of a sub-matrix of grid size 5 in two iterations: in FIG. 9a, a denotes the original sub-matrix, b the sub-matrix after one iteration, and c the sub-matrix after two iterations, i.e., the target sub-matrix.
The target sub-matrices corresponding to the grids are spliced together to obtain the first directional skeleton feature matrix.
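One way to build the update-order functions F_k for a single grid is by Chebyshev rings around the grid centre. Treating each ring as one update group is an assumption about FIG. 9 (the figure groups the rings of a 5x5 grid into two iterations rather than one per ring), but either grouping satisfies requirements (1) and (2) above.

```python
import numpy as np

def ring_masks(n, inside_out=True):
    """0/1 masks F_1..F_K that update an n x n grid ring by ring.
    inside_out=True plays the role of the scatter operator,
    inside_out=False that of the gather operator."""
    c = n // 2
    i, j = np.indices((n, n))
    ring = np.maximum(np.abs(i - c), np.abs(j - c))  # Chebyshev distance to the centre
    order = range(ring.max() + 1) if inside_out else range(ring.max(), -1, -1)
    return [(ring == r).astype(float) for r in order]

scatter = ring_masks(5, inside_out=True)   # centre first, border last
gather = ring_masks(5, inside_out=False)   # border first, centre last
# requirement (2): across all iterations, every element is updated exactly once
assert np.array_equal(sum(scatter), np.ones((5, 5)))
assert np.array_equal(sum(gather), np.ones((5, 5)))
```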
Similarly, the second directional convolutional neural network performs directional convolution on the first target contour feature matrix as follows.
The first target contour feature matrix is divided into multiple sub-matrices, each of which is called a grid. If the first target contour feature matrix is a three-dimensional matrix whose three dimensions are m, n, and s, its size is denoted m*n*s; if the grid size is 5, the size of each grid is 5*5*s.
Then, for each grid, multiple iterative convolutions are performed using the gather convolution operator to obtain a target sub-matrix.
FIG. 9b shows a process in which the gather convolution operator updates the element values of a sub-matrix of grid size 5 in two iterations: in FIG. 9b, a denotes the original sub-matrix, b the sub-matrix after one iteration, and c the sub-matrix after two iterations, i.e., the target sub-matrix.
The target sub-matrices corresponding to the grids are spliced together to obtain the first directional contour feature matrix.
Note that the iterative convolutions of the individual sub-matrices can be processed in parallel.
The examples in FIG. 9a and FIG. 9b merely illustrate how the scatter and gather convolution operators iteratively update the element values of a sub-matrix.
In another embodiment, the feature fusion neural network can be trained separately as follows.
Step 4.1: Obtain first sample target skeleton feature matrices and first sample target contour feature matrices for multiple sample images.
They are obtained in a manner similar to that of the first target skeleton feature matrix and the first target contour feature matrix in the foregoing embodiments, which is not repeated here. They can be obtained during joint training with the first feature extraction network, or by using a pre-trained first feature extraction network.
Step 4.2: Use the first basic directional convolutional neural network to perform directional convolution on the first sample target skeleton feature matrix to obtain a first sample directional skeleton feature matrix; obtain a seventh loss from the first sample directional skeleton feature matrix and the actual position information of the contour key points; and train the first basic directional convolutional neural network for the current round based on the seventh loss.
Here, the seventh loss is LC3 in FIG. 7.
Using the first basic directional convolutional neural network to perform directional convolution on the first sample target skeleton feature matrix amounts to applying a directed spatial transformation to it. In this case, the position information of the key points represented by the resulting first sample directional skeleton feature matrix should match the position information of the contour key points as closely as possible. Therefore, the seventh loss is computed from the first sample directional skeleton feature matrix and the actual position information of the contour key points, and is used to supervise the training of the first basic directional convolutional neural network.
Step 4.3: Use the second basic directional convolutional neural network to perform directional convolution on the first sample target contour feature matrix to obtain a first sample directional contour feature matrix; obtain an eighth loss from the first sample directional contour feature matrix and the actual position information of the skeleton key points; and train the second basic directional convolutional neural network for the current round based on the eighth loss.
Here, the eighth loss is LS3 in FIG. 7.
Step 4.4: Use the fourth basic convolutional neural network to convolve the first sample directional contour feature matrix to obtain a second sample intermediate contour feature matrix; splice the second sample intermediate contour feature matrix with the first sample target skeleton feature matrix to obtain a third sample spliced feature matrix; and use the third basic transformation neural network to perform a dimension transformation on the third sample spliced feature matrix to obtain a second sample target skeleton feature matrix.
Step 4.5: Determine fourth predicted position information of the skeleton key points based on the second sample target skeleton feature matrix; determine a ninth loss based on the actual position information of the skeleton key points and their fourth predicted position information.
Here, the ninth loss is LS4 in FIG. 7.
Step 4.6: Use the third basic convolutional neural network to convolve the first sample directional skeleton feature matrix to obtain a second sample intermediate skeleton feature matrix; splice the second sample intermediate skeleton feature matrix with the first sample target contour feature matrix to obtain a fourth sample spliced feature matrix; and use the fourth basic transformation neural network to perform a dimension transformation on the fourth sample spliced feature matrix to obtain a second sample target contour feature matrix.
Step 4.7: Determine fourth predicted position information of the contour key points based on the second sample target contour feature matrix; determine a tenth loss based on the actual position information of the contour key points and their fourth predicted position information.
Here, the tenth loss is LC4 in FIG. 7.
Step 4.8: Train the third basic convolutional neural network, the fourth basic convolutional neural network, the third basic transformation neural network, and the fourth basic transformation neural network for the current round based on the ninth loss and the tenth loss.
After multiple rounds of training of the first basic directional convolutional neural network, the second basic directional convolutional neural network, the third basic convolutional neural network, the fourth basic convolutional neural network, the third basic transformation neural network, and the fourth basic transformation neural network, the trained feature fusion neural network is obtained.
M3:
Referring to FIG. 10, another specific structure of the feature fusion neural network provided by an embodiment of the present disclosure includes a displacement estimation neural network and a fifth transformation neural network.
Referring to FIG. 11, an embodiment of the present disclosure further provides a specific method, based on the feature fusion neural network provided in FIG. 10, for performing feature fusion on the first target skeleton feature matrix and the first target contour feature matrix to obtain the second target skeleton feature matrix and the second target contour feature matrix. The method includes the following steps.
Step S1101: Splice the first target skeleton feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix.
Step S1102: Input the fifth spliced feature matrix into the displacement estimation neural network, and perform displacement estimation on multiple predetermined key point pairs to obtain, for each pair, the displacement information for moving one key point of the pair to the other. The two key points of each pair are adjacent in position, and the pair consists of a skeleton key point and a contour key point, two skeleton key points, or two contour key points.
In a specific implementation, multiple skeleton key points and multiple contour key points are determined for the human body in advance. FIG. 12 shows an example of skeleton key points and contour key points predetermined for the human body. In this example there are 14 skeleton key points, shown as the larger dots in FIG. 12: the top of the head, the neck, the two shoulders, the two elbows, the two wrists, the two hips, the two knees, and the two ankles. There are 26 contour key points, shown as the smaller dots in FIG. 12. Except for the skeleton key point representing the top of the head, every skeleton key point corresponds to two contour key points; the two hip skeleton key points, however, correspond to the same contour key point.
Two key points adjacent in position form a key point pair. In FIG. 12, every two key points directly connected by a line segment form a key point pair. A pair can therefore take one of three forms: (skeleton key point, skeleton key point), (contour key point, contour key point), or (skeleton key point, contour key point).
The displacement estimation neural network includes multiple convolutional layers connected in sequence, which perform feature learning on the skeleton and contour features in the fifth spliced feature matrix to obtain, for each key point pair, the displacement information for moving one key point of the pair to the other. Each key point pair has two sets of displacement information.
For example, if a key point pair is (P, Q), where P and Q each denote one key point, the displacement information of the pair includes the displacement information for moving from P to Q and the displacement information for moving from Q to P.
Each set of displacement information includes a movement direction and a movement distance.
Step S1103: Take each key point of each pair in turn as the current key point, and obtain, from the three-dimensional feature matrix corresponding to the other key point paired with the current key point, the two-dimensional feature matrix corresponding to that other key point. If the other key point of the pair is a skeleton key point, its three-dimensional feature matrix is the first skeleton feature matrix; if the other key point of the pair is a contour key point, its three-dimensional feature matrix is the first contour feature matrix.
Step S1104: According to the displacement information from the other key point of the pair to the current key point, transform the positions of the elements in the two-dimensional feature matrix corresponding to the other key point to obtain the displacement feature matrix corresponding to the current key point.
Here, still taking the key point pair (P, Q) as an example, P is first taken as the current key point, and the two-dimensional feature matrix corresponding to Q is obtained from the three-dimensional feature matrix corresponding to Q.
If Q is a skeleton key point, its three-dimensional feature matrix is the first skeleton feature matrix (see step S402 above); if Q is a contour key point, its three-dimensional feature matrix is the first contour feature matrix (see step S403 above).
When Q is a skeleton key point, the first skeleton feature matrix is used as Q's three-dimensional feature matrix, and Q's two-dimensional feature matrix is obtained from it; because the first skeleton feature matrix contains only skeleton features, the skeleton features learned in subsequent processing are more targeted. Likewise, when Q is a contour key point, the first contour feature matrix is used as Q's three-dimensional feature matrix, and Q's two-dimensional feature matrix is obtained from it; because the first contour feature matrix contains only contour features, the contour features learned in subsequent processing are more targeted.
After Q's two-dimensional feature matrix is obtained, the positions of its elements are transformed based on the displacement information for moving from Q to P, yielding the displacement feature matrix corresponding to P.
For example, as shown in FIG. 13, suppose the displacement information for moving from Q to P is (2, 3), where 2 is the distance moved along the first dimension and 3 the distance moved along the second dimension. Q's two-dimensional feature matrix is shown at a in FIG. 13; after the positions of its elements are transformed, the resulting displacement feature matrix corresponding to P is shown at b in FIG. 13. The numbers here express the displacement only in relative terms; in an actual implementation, the displacement information should be interpreted in the context of the specific scheme, e.g., a displacement of "2" may mean 2 elements, 2 cells, and so on.
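The position transform of FIG. 13 amounts to shifting every element of Q's two-dimensional feature matrix by the estimated offset. A sketch, assuming vacated positions are zero-filled (the disclosure does not state the fill value):

```python
import numpy as np

def shift2d(m, di, dj):
    """Move every element of m by (di, dj), zero-filling vacated positions,
    as in the (2, 3) example of FIG. 13."""
    out = np.zeros_like(m)
    h, w = m.shape
    src_i = slice(max(0, -di), min(h, h - di))
    src_j = slice(max(0, -dj), min(w, w - dj))
    dst_i = slice(max(0, di), min(h, h + di))
    dst_j = slice(max(0, dj), min(w, w + dj))
    out[dst_i, dst_j] = m[src_i, src_j]
    return out

q = np.arange(36).reshape(6, 6)   # Q's 2-D feature matrix (toy values)
p_shifted = shift2d(q, 2, 3)      # displacement (2, 3): 2 along dim 1, 3 along dim 2
assert p_shifted[2, 3] == q[0, 0]
```

Negative offsets shift in the opposite direction with the same slicing logic, so the P-to-Q and Q-to-P transforms use the same helper.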
Then Q is taken as the current key point, and P's two-dimensional feature matrix is obtained from the three-dimensional feature matrix corresponding to P. Based on the displacement information for moving from P to Q, the positions of the elements in P's two-dimensional feature matrix are transformed to obtain the displacement feature matrix corresponding to Q.
In this way, a displacement feature matrix can be generated for each skeleton key point and for each contour key point.
Note that each skeleton key point may be paired with multiple key points, so it may have multiple displacement feature matrices; likewise, each contour key point may be paired with multiple key points and may also have multiple displacement feature matrices. Different contour key points may have different numbers of displacement feature matrices, and so may different skeleton key points.
Step S1105: For each skeleton key point, splice its two-dimensional feature matrix with each of its displacement feature matrices to obtain the spliced two-dimensional feature matrix of that skeleton key point; input the spliced two-dimensional feature matrix into the fifth transformation neural network to obtain the target two-dimensional feature matrix corresponding to that skeleton key point; and generate the second target skeleton feature matrix from the target two-dimensional feature matrices of the skeleton key points.
Step S1106: For each contour key point, splice its two-dimensional feature matrix with each of its displacement feature matrices to obtain the spliced two-dimensional feature matrix of that contour key point; input the spliced two-dimensional feature matrix into the fifth transformation neural network to obtain the target two-dimensional feature matrix corresponding to that contour key point; and generate the second target contour feature matrix from the target two-dimensional feature matrices of the contour key points.
For example, suppose P is a skeleton key point whose two-dimensional feature matrix is P', and P belongs to three key point pairs. The process above then yields three displacement feature matrices for P, denoted P1', P2', and P3', and P', P1', P2', and P3' are spliced to obtain P's spliced two-dimensional feature matrix. In this case, some of P's three displacement feature matrices may result from transforming the element positions of two-dimensional feature matrices corresponding to skeleton key points, and others from transforming those corresponding to contour key points. Splicing P', P1', P2', and P3' therefore fuses the features of the key points adjacent in position to P. The fifth transformation neural network then convolves P's spliced two-dimensional feature matrix, so that the resulting target two-dimensional feature matrix of P contains both skeleton and contour features, realizing the fusion of the two.
Likewise, if P is a contour key point, the fusion of skeleton and contour features can be realized by the same process.
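Steps S1105/S1106 for this single key point P can be sketched as follows, with a channel-mixing weight vector standing in for the fifth transformation neural network (the real network is convolutional; all shapes here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
h = w = 16

p_own = rng.standard_normal((h, w))                       # P': P's own 2-D feature matrix
p_disp = [rng.standard_normal((h, w)) for _ in range(3)]  # P1', P2', P3' from P's three pairs

spliced = np.stack([p_own, *p_disp])           # P's spliced 2-D feature matrix, shape (4, h, w)
w5 = rng.standard_normal(4)                    # stand-in for the fifth transformation neural network
target_2d = np.tensordot(w5, spliced, axes=1)  # fuse the four maps into P's target 2-D matrix
assert target_2d.shape == (h, w)
```

Stacking the target two-dimensional matrices of all key points along a new axis then yields the second target skeleton (or contour) feature matrix.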
In another embodiment, the feature fusion neural network can be trained separately as follows.
Step 5.1: Obtain first sample target skeleton feature matrices and first sample target contour feature matrices for multiple sample images.
They are obtained in a manner similar to that of the first target skeleton feature matrix and the first target contour feature matrix in the foregoing embodiments, which is not repeated here. They can be obtained during joint training with the first feature extraction network, or by using a pre-trained first feature extraction network.
Step 5.2: Splice the first sample target skeleton feature matrix and the first sample target contour feature matrix to obtain a fifth sample spliced feature matrix.
Step 5.3: Input the fifth sample spliced feature matrix into the basic displacement estimation neural network, and perform displacement estimation on the multiple predetermined key point pairs to obtain, for each pair, the predicted displacement information for moving one key point of the pair to the other. The two key points of each pair are adjacent in position, and the pair consists of a skeleton key point and a contour key point, two skeleton key points, or two contour key points.
步驟5.4:將每組關鍵點對中的每個關鍵點分別作為當前關鍵點,從與該當前關鍵點配對的另一關鍵點對應的樣本三維特徵矩陣中,獲取與配對的另一關鍵點對應的樣本二維特徵矩陣。Step 5.4: Take each key point in each key point pair in turn as the current key point, and obtain, from the sample three-dimensional feature matrix corresponding to the other key point paired with the current key point, the sample two-dimensional feature matrix corresponding to that paired key point.
步驟5.5:根據從配對的另一關鍵點到當前關鍵點的預測位移訊息,對配對的另一關鍵點對應的樣本二維特徵矩陣中的元素進行位置變換,得到與該當前關鍵點對應的樣本位移特徵矩陣。Step 5.5: According to the predicted displacement information from the paired key point to the current key point, perform position transformation on the elements in the sample two-dimensional feature matrix corresponding to the paired key point to obtain the sample displacement feature matrix corresponding to the current key point.
步驟5.6:根據當前關鍵點對應的樣本位移特徵矩陣,以及與當前關鍵點對應的樣本二維特徵矩陣,確定位移損失。Step 5.6: Determine the displacement loss according to the sample displacement feature matrix corresponding to the current key point and the sample two-dimensional feature matrix corresponding to the current key point.
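Steps 5.5 and 5.6 can be sketched together: the paired key point's sample 2D feature matrix is translated by the predicted displacement, then scored against the current key point's own matrix. The mean-squared form of the loss is an assumption for illustration; the patent does not fix the formula.

```python
def shift(matrix, dy, dx, fill=0.0):
    """Move every element by (dy, dx); elements pushed off the map are dropped."""
    h, w = len(matrix), len(matrix[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = matrix[y][x]
    return out

def displacement_loss(shifted, target):
    """Assumed loss: mean squared difference between the displaced neighbour
    map and the current key point's own sample 2D feature matrix."""
    n = len(target) * len(target[0])
    return sum((a - b) ** 2
               for ra, rb in zip(shifted, target)
               for a, b in zip(ra, rb)) / n

neighbour = [[1.0, 0.0], [0.0, 0.0]]   # peak at the paired key point
current = [[0.0, 1.0], [0.0, 0.0]]     # peak at the current key point
moved = shift(neighbour, 0, 1)         # predicted displacement: one step right
```

A correct displacement prediction moves the neighbour's peak exactly onto the current key point's peak, driving the loss to zero.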
步驟5.7:基於位移損失,對位移估計神經網路進行本輪訓練。Step 5.7: Based on the displacement loss, perform this round of training on the displacement estimation neural network.
步驟5.8:針對每個骨骼關鍵點,將該骨骼關鍵點對應的樣本二維特徵矩陣,與該骨骼關鍵點對應的各個樣本位移特徵矩陣進行拼接處理,得到該骨骼關鍵點的樣本拼接二維特徵矩陣;並將該骨骼關鍵點的樣本拼接二維特徵矩陣輸入至第五基礎變換神經網路,得到與該骨骼關鍵點對應的樣本目標二維特徵矩陣;基於各個骨骼關鍵點分別對應的樣本目標二維特徵矩陣,生成第二樣本目標骨骼特徵矩陣。Step 5.8: For each bone key point, splice the sample two-dimensional feature matrix corresponding to that bone key point with each sample displacement feature matrix corresponding to that bone key point to obtain the sample spliced two-dimensional feature matrix of the bone key point; input the sample spliced two-dimensional feature matrix of the bone key point into the fifth basic transformation neural network to obtain the sample target two-dimensional feature matrix corresponding to the bone key point; and generate the second sample target bone feature matrix based on the sample target two-dimensional feature matrices respectively corresponding to the bone key points.
步驟5.9:針對每個輪廓關鍵點,將該輪廓關鍵點對應的樣本二維特徵矩陣,與該輪廓關鍵點對應的各個樣本位移特徵矩陣進行拼接處理,得到該輪廓關鍵點的樣本拼接二維特徵矩陣;並將該輪廓關鍵點的樣本拼接二維特徵矩陣輸入至第五基礎變換神經網路,得到與該輪廓關鍵點對應的樣本目標二維特徵矩陣;基於各個輪廓關鍵點分別對應的樣本目標二維特徵矩陣,生成第二樣本目標輪廓特徵矩陣。Step 5.9: For each contour key point, splice the sample two-dimensional feature matrix corresponding to that contour key point with each sample displacement feature matrix corresponding to that contour key point to obtain the sample spliced two-dimensional feature matrix of the contour key point; input the sample spliced two-dimensional feature matrix of the contour key point into the fifth basic transformation neural network to obtain the sample target two-dimensional feature matrix corresponding to the contour key point; and generate the second sample target contour feature matrix based on the sample target two-dimensional feature matrices respectively corresponding to the contour key points.
步驟5.10:基於第二樣本目標骨骼特徵矩陣、第二樣本目標輪廓特徵矩陣、骨骼關鍵點的實際位置訊息、以及輪廓關鍵點的實際位置訊息,確定變換損失。例如,可以基於第二樣本目標骨骼特徵矩陣確定骨骼關鍵點的預測位置訊息,基於第二樣本目標輪廓特徵矩陣確定輪廓關鍵點的預測位置訊息。基於骨骼關鍵點的預測位置訊息、實際位置訊息,以及輪廓關鍵點的預測位置訊息、實際位置訊息,來確定變換損失。Step 5.10: Determine the transformation loss based on the second sample target skeleton feature matrix, the second sample target contour feature matrix, the actual position information of the bone key points, and the actual position information of the contour key points. For example, the predicted position information of the skeleton key points may be determined based on the second sample target skeleton feature matrix, and the predicted position information of the contour key points may be determined based on the second sample target contour feature matrix. Based on the predicted position information and actual position information of the bone key points, and the predicted position information and actual position information of the contour key points, the transformation loss is determined.
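One plausible reading of step 5.10 can be sketched as below: predicted key point positions are read out of the second sample target feature matrices as the argmax of each 2D channel and compared with the labelled positions. The squared-coordinate-error form of the loss is an assumption; the patent leaves the loss function open.

```python
def peak(heatmap):
    """Predicted key point position = coordinates of the maximal element."""
    best, pos = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best:
                best, pos = v, (y, x)
    return pos

def transformation_loss(channels, actual_positions):
    """Assumed transformation loss: mean squared coordinate error over all
    key points (bone and contour channels can be scored the same way)."""
    total = 0.0
    for heatmap, (ay, ax) in zip(channels, actual_positions):
        py, px = peak(heatmap)
        total += (py - ay) ** 2 + (px - ax) ** 2
    return total / len(channels)

channels = [[[0.1, 0.9], [0.0, 0.0]],   # predicted peak at (0, 1)
            [[0.0, 0.0], [0.8, 0.1]]]   # predicted peak at (1, 0)
loss = transformation_loss(channels, [(0, 1), (1, 1)])
```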
步驟5.11:基於變換損失,對第五基礎變換神經網路進行本輪訓練。Step 5.11: Based on the transformation loss, perform this round of training on the fifth basic transformation neural network.
步驟5.12:經過對基礎位移估計神經網路、第五基礎變換神經網路的多輪訓練,得到特徵融合神經網路。Step 5.12: After multiple rounds of training on the basic displacement estimation neural network and the fifth basic transformation neural network, a feature fusion neural network is obtained.
B:對待檢測圖像,進行多次特徵提取,並在每次進行特徵提取後,對該次特徵提取得到的骨骼特徵及輪廓特徵進行特徵融合,並基於最後一次特徵融合的特徵融合結果,確定骨骼關鍵點的位置訊息、以及輪廓關鍵點的位置訊息。B: Perform feature extraction on the image to be detected multiple times; after each feature extraction, perform feature fusion on the bone features and contour features obtained by that extraction; and, based on the feature fusion result of the last feature fusion, determine the position information of the bone key points and the position information of the contour key points.
在進行多次特徵提取的情況下,基於第i次特徵融合的特徵融合結果進行第i+1次特徵提取,i為正整數。In the case of multiple feature extractions, the i+1th feature extraction is performed based on the feature fusion result of the i-th feature fusion, and i is a positive integer.
在B中,進行第一次特徵提取的過程,與上述A中對待檢測圖像提取骨骼特徵和輪廓特徵的過程一致,在此不再贅述。In B, the process of performing the first feature extraction is consistent with the process of extracting bone features and contour features from the image to be detected in A, and will not be repeated here.
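The overall control flow of process B can be sketched abstractly. The stages are modelled here as distinct callables on plain numbers (real stages are neural networks operating on feature matrices), which also reflects that each extraction and fusion stage has its own network parameters.

```python
def detect(image, first_extract, later_extracts, fusions):
    """Stage 1 extracts from the image; stage i+1 extracts from the fusion
    result of stage i; key point positions are read from the last result."""
    fused = fusions[0](first_extract(image))
    for extract, fuse in zip(later_extracts, fusions[1:]):
        fused = fuse(extract(fused))   # stage i+1 consumes fusion result i
    return fused

# Toy stand-in stages, purely to show the wiring:
result = detect(
    image=1,
    first_extract=lambda x: x + 1,           # stands in for the first network
    later_extracts=[lambda x: x * 2] * 2,    # two refinement stages
    fusions=[lambda x: x + 10] * 3,
)
```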
在B中進行除第一次特徵提取外的其他各次特徵提取的具體過程,包括:The specific process of each feature extraction except the first feature extraction in B includes:
使用第二特徵提取網路從上一次特徵融合的特徵融合結果中,提取用於表徵人體骨骼特徵的骨骼關鍵點的第一目標骨骼特徵矩陣;並提取用於表徵人體輪廓特徵的輪廓關鍵點的第一目標輪廓特徵矩陣;Use the second feature extraction network to extract, from the feature fusion result of the previous feature fusion, the first target bone feature matrix of the bone key points used to characterize the human bone features, and to extract the first target contour feature matrix of the contour key points used to characterize the human contour features;
其中,第一特徵提取網路和第二特徵提取網路的網路參數不同,且不同次的特徵提取使用的第二特徵提取網路的網路參數不同。Wherein, the network parameters of the first feature extraction network and the second feature extraction network are different, and the network parameters of the second feature extraction network used for different times of feature extraction are different.
這裡,第一特徵提取網路和第二特徵提取網路均包括多層卷積層。第一特徵提取網路和第二特徵提取網路的網路參數例如包括但不限於:卷積層的數量、每一層卷積層使用的卷積核的大小、每一層卷積層使用的卷積核的數量等。Here, both the first feature extraction network and the second feature extraction network include multiple convolutional layers. The network parameters of the first and second feature extraction networks include, but are not limited to: the number of convolutional layers, the size of the convolution kernels used by each convolutional layer, and the number of convolution kernels used by each convolutional layer.
參見圖14所示,本公開實施例提供一種第二特徵提取網路的結構示意圖。第二特徵提取網路包括:第二骨骼特徵提取網路、以及第二輪廓特徵提取網路。Referring to FIG. 14, an embodiment of the present disclosure provides a schematic structural diagram of a second feature extraction network. The second feature extraction network includes: a second bone feature extraction network and a second contour feature extraction network.
使用該第二特徵提取網路進行本次特徵提取的上一次特徵融合的特徵融合結果包括:第二目標骨骼特徵矩陣和第二目標輪廓特徵矩陣;具體得到第二目標骨骼特徵矩陣和第二目標輪廓特徵矩陣的過程參見上述A所示,在此不再贅述。The feature fusion result of the previous feature fusion, on which this feature extraction by the second feature extraction network is performed, includes the second target bone feature matrix and the second target contour feature matrix; the specific process of obtaining the second target bone feature matrix and the second target contour feature matrix is described in A above and will not be repeated here.
使用該第二特徵提取網路從上一次特徵融合的特徵融合結果中,提取用於表徵人體骨骼特徵的骨骼關鍵點的第一目標骨骼特徵矩陣;並提取用於表徵人體輪廓特徵的輪廓關鍵點的第一目標輪廓特徵矩陣的具體過程例如為:A specific process of using the second feature extraction network to extract, from the feature fusion result of the previous feature fusion, the first target bone feature matrix of the bone key points used to characterize the human bone features, and to extract the first target contour feature matrix of the contour key points used to characterize the human contour features, is for example as follows:
使用第二骨骼特徵提取網路對上一次特徵融合得到的第二目標骨骼特徵矩陣進行卷積處理,得到第三骨骼特徵矩陣,並從第二骨骼特徵提取網路中的第三目標卷積層獲取第四骨骼特徵矩陣;基於第三骨骼特徵矩陣以及第四骨骼特徵矩陣,得到第五目標骨骼特徵矩陣。其中,第三目標卷積層為第二骨骼特徵提取網路中,除最後一層卷積層外的其他任一卷積層。Use the second bone feature extraction network to perform convolution processing on the second target bone feature matrix obtained by the previous feature fusion to obtain a third bone feature matrix, and obtain a fourth bone feature matrix from the third target convolutional layer in the second bone feature extraction network; based on the third bone feature matrix and the fourth bone feature matrix, obtain a fifth target bone feature matrix. The third target convolutional layer is any convolutional layer in the second bone feature extraction network other than the last convolutional layer.
使用第二輪廓特徵提取網路對上一次特徵融合得到的第二目標輪廓特徵矩陣進行卷積處理,得到第三輪廓特徵矩陣,並從第二輪廓特徵提取網路中的第四目標卷積層獲取第四輪廓特徵矩陣;基於第三輪廓特徵矩陣以及第四輪廓特徵矩陣,得到第六目標輪廓特徵矩陣。第四目標卷積層為第二輪廓特徵提取網路中,除最後一層卷積層外的其他任一卷積層。Use the second contour feature extraction network to perform convolution processing on the second target contour feature matrix obtained by the previous feature fusion to obtain a third contour feature matrix, and obtain a fourth contour feature matrix from the fourth target convolutional layer in the second contour feature extraction network; based on the third contour feature matrix and the fourth contour feature matrix, obtain a sixth target contour feature matrix. The fourth target convolutional layer is any convolutional layer in the second contour feature extraction network other than the last convolutional layer.
具體的處理方式與上述A中使用第一骨骼特徵提取網路從待檢測圖像中提取第一目標骨骼特徵矩陣及第一目標輪廓特徵矩陣的具體過程類似,在此不再贅述。The specific processing method is similar to the specific process of using the first skeleton feature extraction network to extract the first target skeleton feature matrix and the first target contour feature matrix from the image to be detected in A, and will not be repeated here.
以上實施例對於上述Ⅱ中確定骨骼關鍵點以及輪廓關鍵點的位置訊息的方式進行了描述。The above embodiments describe the method of determining the position information of the bone key points and the contour key points in the above II.
Ⅲ:在基於上述Ⅱ得到骨骼關鍵點的位置訊息和輪廓關鍵點的位置訊息後,可將各個骨骼關鍵點的位置,以及輪廓關鍵點的位置從待檢測圖像中確定出來。然後可以生成人體檢測結果。Ⅲ: After obtaining the position information of the bone key points and the position information of the contour key points based on the above Ⅱ, the position of each bone key point and the position of the contour key point can be determined from the image to be detected. The human body detection result can then be generated.
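Stage III can be sketched as follows: each key point's position is taken as the argmax of its 2D channel in the final fused 3D feature matrices, which matches the per-pixel probability reading of those matrices. The dict layout of the result is an illustrative choice, not a format the patent prescribes.

```python
def positions(channels):
    """One (y, x) position per key point, from the argmax of its channel."""
    out = []
    for m in channels:
        _, y, x = max((v, y, x)
                      for y, row in enumerate(m)
                      for x, v in enumerate(row))
        out.append((y, x))
    return out

def detection_result(bone_channels, contour_channels):
    """Assemble a detection-result data group from the fused matrices."""
    return {"bone": positions(bone_channels),
            "contour": positions(contour_channels)}

bone = [[[0.0, 0.9], [0.1, 0.0]]]      # one bone key point channel
contour = [[[0.0, 0.0], [0.7, 0.2]]]   # one contour key point channel
result = detection_result(bone, contour)
```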
人體檢測結果包括下述一種或者多種:包括骨骼關鍵點標記、以及輪廓關鍵點標記的待檢測圖像;包括骨骼關鍵點的位置訊息以及輪廓關鍵點的位置訊息的數據組。The human body detection result includes one or more of the following: a to-be-detected image including bone key point markers and contour key point markers; a data group including position information of the bone key points and position information of the contour key points.
後續,還可以基於人體檢測結果,執行下述操作中一種或者多種:人體動作識別、人體姿態檢測、人體輪廓調整、人體圖像編輯、以及人體貼圖。Subsequently, based on the human body detection result, one or more of the following operations may be performed: human body motion recognition, human body posture detection, human body contour adjustment, human body image editing, and human body mapping.
此處,動作識別例如識別人體當前所作的動作,如打架、跑步等;人體姿態識別例如識別人體當前姿態,如臥倒、是否作出指定動作等;人體輪廓調整例如對人體的體型、身高進行調整等;人體圖像編輯例如對人體進行縮放、旋轉、剪裁等;人體貼圖例如將圖像A中的人體檢測出來後,將對應人體圖像粘貼至圖像B中。Here, action recognition means, for example, recognizing the action the human body is currently performing, such as fighting or running; human posture recognition means, for example, recognizing the current posture of the human body, such as lying down or whether a specified action is performed; human contour adjustment means, for example, adjusting the body shape and height of the human body; human image editing means, for example, scaling, rotating, and cropping the human body; human body mapping means, for example, detecting the human body in image A and then pasting the corresponding human body image into image B.
本公開實施例能夠從待檢測圖像中,確定用於表徵人體骨骼結構的骨骼關鍵點的位置訊息、以及用於表徵人體輪廓的輪廓關鍵點的位置訊息,並基於骨骼關鍵點的位置訊息、以及輪廓關鍵點的位置訊息,生成人體檢測結果,在提升表徵精細度的同時,兼顧計算數據量。The embodiments of the present disclosure can determine, from the image to be detected, the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human contour, and generate the human body detection result based on the position information of the bone key points and the position information of the contour key points, which improves the fineness of the representation while keeping the amount of calculation data in check.
另外,本公開實施方式中,由於是採用表徵人體骨骼結構的骨骼關鍵點的位置訊息,和表徵人體輪廓的輪廓關鍵點的位置訊息來得到人體檢測結果,表徵人體的訊息更加豐富,具有更廣闊的應用場景,如圖像編輯、人體體型調整等。In addition, in the embodiments of the present disclosure, since the human body detection result is obtained using both the position information of the bone key points that characterize the human skeletal structure and the position information of the contour key points that characterize the human contour, the information characterizing the human body is richer, allowing broader application scenarios such as image editing and human body shape adjustment.
基於同一技術構思,本公開實施例中還提供了與人體檢測方法對應的人體檢測裝置,由於本公開實施例中的裝置解決問題的原理與本公開實施例上述人體檢測方法相似,因此裝置的實施可以參見方法的實施,重複之處不再贅述。Based on the same technical concept, the embodiments of the present disclosure also provide a human body detection apparatus corresponding to the human body detection method. Since the principle by which the apparatus in the embodiments of the present disclosure solves the problem is similar to that of the above human body detection method, for the implementation of the apparatus, reference may be made to the implementation of the method, and repeated description is omitted.
參照圖15所示,為本公開實施例提供的一種人體檢測裝置的示意圖,所述裝置包括:獲取模組151、檢測模組152、生成模組153;其中,獲取模組151,用於獲取待檢測圖像;檢測模組152,用於基於所述待檢測圖像,確定用於表徵人體骨骼結構的骨骼關鍵點的位置訊息、以及用於表徵人體輪廓的輪廓關鍵點的位置訊息;生成模組153,用於基於所述骨骼關鍵點的位置訊息、以及所述輪廓關鍵點的位置訊息,生成人體檢測結果。Referring to FIG. 15, which is a schematic diagram of a human body detection apparatus provided by an embodiment of the present disclosure, the apparatus includes: an acquisition module 151, a detection module 152, and a generation module 153. The acquisition module 151 is configured to acquire an image to be detected; the detection module 152 is configured to determine, based on the image to be detected, the position information of bone key points used to characterize the human skeletal structure and the position information of contour key points used to characterize the human contour; the generation module 153 is configured to generate a human body detection result based on the position information of the bone key points and the position information of the contour key points.
一種可能的實施方式中,所述輪廓關鍵點包括主輪廓關鍵點和輔助輪廓關鍵點;其中,兩個相鄰的所述主輪廓關鍵點之間存在至少一個輔助輪廓關鍵點。In a possible implementation manner, the contour key points include a main contour key point and an auxiliary contour key point; wherein there is at least one auxiliary contour key point between two adjacent main contour key points.
一種可能的實施方式中,所述檢測模組152,用於採用下述方式基於所述待檢測圖像,確定用於表徵人體輪廓的輪廓關鍵點的位置訊息:基於所述待檢測圖像,確定所述主輪廓關鍵點的位置訊息;基於所述主輪廓關鍵點的位置訊息,確定人體輪廓訊息;基於確定的所述人體輪廓訊息,確定多個所述輔助輪廓關鍵點的位置訊息。In a possible implementation, the detection module 152 is configured to determine, based on the image to be detected, the position information of the contour key points used to characterize the human contour in the following manner: determine the position information of the main contour key points based on the image to be detected; determine the human contour information based on the position information of the main contour key points; and determine the position information of a plurality of the auxiliary contour key points based on the determined human contour information.
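Placing auxiliary contour key points between two adjacent main contour key points can be sketched as below. Approximating the contour segment between the two main points by a straight line is a simplifying assumption; the patent derives the auxiliary points from the determined human contour information.

```python
def auxiliary_points(p, q, n):
    """Evenly place n auxiliary contour key points strictly between the
    adjacent main contour key points p and q (straight-segment assumption)."""
    return [(p[0] + (q[0] - p[0]) * k / (n + 1),
             p[1] + (q[1] - p[1]) * k / (n + 1))
            for k in range(1, n + 1)]
```

With at least one auxiliary point per gap, the contour is sampled more densely than the main points alone provide.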
一種可能的實施方式中,所述人體檢測結果包括下述一種或者多種:添加有骨骼關鍵點標記、以及輪廓關鍵點標記的待檢測圖像;包括所述骨骼關鍵點的位置訊息以及所述輪廓關鍵點的位置訊息的數據組。In a possible implementation, the human body detection result includes one or more of the following: the image to be detected with bone key point markers and contour key point markers added; a data group including the position information of the bone key points and the position information of the contour key points.
一種可能的實施方式中,該人體檢測裝置還包括:執行模組154,用於基於所述人體檢測結果,執行下述操作中一種或者多種:人體動作識別、人體姿態檢測、人體輪廓調整、人體圖像編輯、以及人體貼圖。In a possible implementation, the human body detection apparatus further includes an execution module 154, configured to perform, based on the human body detection result, one or more of the following operations: human action recognition, human posture detection, human contour adjustment, human image editing, and human body mapping.
一種可能的實施方式中,所述檢測模組152,用於採用下述方式基於所述待檢測圖像,確定用於表徵人體骨骼結構的骨骼關鍵點的位置訊息、以及用於表徵人體輪廓的輪廓關鍵點的位置訊息:基於所述待檢測圖像,進行特徵提取以獲得骨骼特徵及輪廓特徵,並將得到的骨骼特徵和輪廓特徵進行特徵融合;基於特徵融合結果,確定所述骨骼關鍵點的位置訊息、以及所述輪廓關鍵點的位置訊息。In a possible implementation, the detection module 152 is configured to determine, based on the image to be detected, the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human contour in the following manner: based on the image to be detected, perform feature extraction to obtain bone features and contour features, and perform feature fusion on the obtained bone features and contour features; based on the feature fusion result, determine the position information of the bone key points and the position information of the contour key points.
一種可能的實施方式中,所述檢測模組152,用於採用下述方式基於所述待檢測圖像進行特徵提取以獲得骨骼特徵及輪廓特徵,並將得到的骨骼特徵和輪廓特徵進行特徵融合:基於所述待檢測圖像,進行至少一次特徵提取,並將每次特徵提取得到的骨骼特徵以及輪廓特徵進行特徵融合,其中,在進行多次特徵提取的情況下,基於第i次特徵融合的特徵融合結果進行第i+1次特徵提取,i為正整數;所述檢測模組152,用於採用下述方式基於特徵融合結果,確定用於表徵人體骨骼結構的骨骼關鍵點的位置訊息、以及用於表徵人體輪廓的輪廓關鍵點的位置訊息:基於最後一次特徵融合的特徵融合結果,確定所述骨骼關鍵點的位置訊息、以及所述輪廓關鍵點的位置訊息。In a possible implementation, the detection module 152 is configured to perform feature extraction based on the image to be detected to obtain bone features and contour features, and perform feature fusion on the obtained bone features and contour features, in the following manner: perform feature extraction at least once based on the image to be detected, and perform feature fusion on the bone features and contour features obtained by each feature extraction, where, in the case of multiple feature extractions, the (i+1)-th feature extraction is performed based on the feature fusion result of the i-th feature fusion, i being a positive integer; the detection module 152 is configured to determine, based on the feature fusion result, the position information of the bone key points used to characterize the human skeletal structure and the position information of the contour key points used to characterize the human contour in the following manner: determine the position information of the bone key points and the position information of the contour key points based on the feature fusion result of the last feature fusion.
一種可能的實施方式中,所述檢測模組152,用於採用下述方式基於所述待檢測圖像,進行至少一次特徵提取:在第一次特徵提取中,使用預先訓練的第一特徵提取網路從待檢測圖像中提取用於表徵人體骨骼特徵的骨骼關鍵點的第一目標骨骼特徵矩陣,以及用於表徵人體輪廓特徵的輪廓關鍵點的第一目標輪廓特徵矩陣;在第i+1次特徵提取中,使用預先訓練的第二特徵提取網路從第i次特徵融合的特徵融合結果中,提取用於表徵人體骨骼特徵的骨骼關鍵點的第一目標骨骼特徵矩陣;並提取用於表徵人體輪廓特徵的輪廓關鍵點的第一目標輪廓特徵矩陣;其中,第一特徵提取網路和第二特徵提取網路的網路參數不同,且不同次的特徵提取使用的第二特徵提取網路的網路參數不同。In a possible implementation, the detection module 152 is configured to perform feature extraction at least once based on the image to be detected in the following manner: in the first feature extraction, use a pre-trained first feature extraction network to extract, from the image to be detected, the first target bone feature matrix of the bone key points used to characterize the human bone features and the first target contour feature matrix of the contour key points used to characterize the human contour features; in the (i+1)-th feature extraction, use a pre-trained second feature extraction network to extract, from the feature fusion result of the i-th feature fusion, the first target bone feature matrix of the bone key points used to characterize the human bone features, and extract the first target contour feature matrix of the contour key points used to characterize the human contour features; where the network parameters of the first feature extraction network and the second feature extraction network are different, and the network parameters of the second feature extraction networks used in different feature extractions are different.
一種可能的實施方式中,所述檢測模組152,用於採用下述方式將提取得到的骨骼特徵和輪廓特徵進行特徵融合:使用預先訓練的特徵融合神經網路對所述第一目標骨骼特徵矩陣、以及所述第一目標輪廓特徵矩陣進行特徵融合,得到第二目標骨骼特徵矩陣和第二目標輪廓特徵矩陣;其中,所述第二目標骨骼特徵矩陣為三維骨骼特徵矩陣,該三維骨骼特徵矩陣包括與各個骨骼關鍵點分別對應的二維骨骼特徵矩陣;所述二維骨骼特徵矩陣中每個元素的值,表徵與該元素對應的像素點屬對應骨骼關鍵點的概率;所述第二目標輪廓特徵矩陣為三維輪廓特徵矩陣,該三維輪廓特徵矩陣包括與各個輪廓關鍵點分別對應的二維輪廓特徵矩陣;所述二維輪廓特徵矩陣中每個元素的值,表徵與該元素對應的像素點屬對應輪廓關鍵點的概率;不同次特徵融合使用的特徵融合神經網路的網路參數不同。In a possible implementation, the detection module 152 is configured to perform feature fusion on the extracted bone features and contour features in the following manner: use a pre-trained feature fusion neural network to perform feature fusion on the first target bone feature matrix and the first target contour feature matrix to obtain a second target bone feature matrix and a second target contour feature matrix; the second target bone feature matrix is a three-dimensional bone feature matrix that includes a two-dimensional bone feature matrix corresponding to each bone key point, the value of each element in a two-dimensional bone feature matrix characterizing the probability that the pixel corresponding to that element belongs to the corresponding bone key point; the second target contour feature matrix is a three-dimensional contour feature matrix that includes a two-dimensional contour feature matrix corresponding to each contour key point, the value of each element in a two-dimensional contour feature matrix characterizing the probability that the pixel corresponding to that element belongs to the corresponding contour key point; and the network parameters of the feature fusion neural networks used in different feature fusions are different.
一種可能的實施方式中,所述檢測模組152,用於採用下述方式基於最後一次特徵融合的特徵融合結果,確定所述骨骼關鍵點的位置訊息、以及所述輪廓關鍵點的位置訊息:基於最後一次特徵融合得到的第二目標骨骼特徵矩陣,確定所述骨骼關鍵點的位置訊息;以及基於最後一次特徵融合得到的第二目標輪廓特徵矩陣,確定所述輪廓關鍵點的位置訊息。In a possible implementation manner, the detection module 152 is configured to determine the position information of the bone key points and the position information of the contour key points based on the feature fusion result of the last feature fusion in the following manner: Determine the position information of the skeleton key points based on the second target skeleton feature matrix obtained in the last feature fusion; and determine the position information of the contour key points based on the second target contour feature matrix obtained in the last feature fusion.
一種可能的實施方式中,第一特徵提取網路包括:共有特徵提取網路、第一骨骼特徵提取網路以及第一輪廓特徵提取網路;所述檢測模組152,用於採用下述方式使用第一特徵提取網路從待檢測圖像中提取用於表徵人體骨骼特徵的骨骼關鍵點的第一目標骨骼特徵矩陣;並提取用於表徵人體輪廓特徵的輪廓關鍵點的第一目標輪廓特徵矩陣:In a possible implementation, the first feature extraction network includes: a common feature extraction network, a first bone feature extraction network, and a first contour feature extraction network; the detection module 152 is configured to use the first feature extraction network in the following manner to extract, from the image to be detected, the first target bone feature matrix of the bone key points used to characterize the human bone features, and to extract the first target contour feature matrix of the contour key points used to characterize the human contour features:
使用所述共有特徵提取網路對所述待檢測圖像進行卷積處理,得到包含骨骼特徵以及輪廓特徵的基礎特徵矩陣;使用所述第一骨骼特徵提取網路對所述基礎特徵矩陣進行卷積處理,得到第一骨骼特徵矩陣,並從所述第一骨骼特徵提取網路中的第一目標卷積層獲取第二骨骼特徵矩陣;基於所述第一骨骼特徵矩陣以及所述第二骨骼特徵矩陣,得到所述第一目標骨骼特徵矩陣;所述第一目標卷積層為所述第一骨骼特徵提取網路中,除最後一層卷積層外的其他任一卷積層;使用所述第一輪廓特徵提取網路,對所述基礎特徵矩陣進行卷積處理,得到第一輪廓特徵矩陣,並從所述第一輪廓特徵提取網路中的第二目標卷積層獲取第二輪廓特徵矩陣;基於所述第一輪廓特徵矩陣以及所述第二輪廓特徵矩陣,得到所述第一目標輪廓特徵矩陣;所述第二目標卷積層為所述第一輪廓特徵提取網路中,除最後一層卷積層外的其他任一卷積層。Use the common feature extraction network to perform convolution processing on the image to be detected to obtain a basic feature matrix containing bone features and contour features; use the first bone feature extraction network to perform convolution processing on the basic feature matrix to obtain a first bone feature matrix, and obtain a second bone feature matrix from the first target convolutional layer in the first bone feature extraction network; based on the first bone feature matrix and the second bone feature matrix, obtain the first target bone feature matrix; the first target convolutional layer is any convolutional layer in the first bone feature extraction network other than the last convolutional layer; use the first contour feature extraction network to perform convolution processing on the basic feature matrix to obtain a first contour feature matrix, and obtain a second contour feature matrix from the second target convolutional layer in the first contour feature extraction network; based on the first contour feature matrix and the second contour feature matrix, obtain the first target contour feature matrix; the second target convolutional layer is any convolutional layer in the first contour feature extraction network other than the last convolutional layer.
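The two-source extraction above (a branch output from the last layer plus a second matrix tapped from an earlier target convolutional layer) can be sketched with layers modelled as simple callables; real layers are convolutions on feature matrices, and the numeric stand-ins here only show the wiring.

```python
def forward_with_tap(layers, x, tap_index):
    """Run a stack of layers and also return the output of the tap layer.
    Per the description above, the tap may be any layer except the last."""
    assert 0 <= tap_index < len(layers) - 1, "tap must not be the last layer"
    tapped = None
    for i, layer in enumerate(layers):
        x = layer(x)
        if i == tap_index:
            tapped = x   # e.g. the second bone feature matrix
    return x, tapped     # e.g. (first bone feature matrix, tapped matrix)

final, tapped = forward_with_tap(
    [lambda v: v * 2, lambda v: v + 3, lambda v: v * 10], 1, tap_index=1)
```

Both returned values then feed the splicing and dimensional transformation that produce the first target feature matrix.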
一種可能的實施方式中,所述檢測模組152,用於採用下述方式基於所述第一骨骼特徵矩陣以及所述第二骨骼特徵矩陣,得到所述第一目標骨骼特徵矩陣:將所述第一骨骼特徵矩陣以及所述第二骨骼特徵矩陣進行拼接處理,得到第一拼接骨骼特徵矩陣;In a possible implementation, the detection module 152 is configured to obtain the first target bone feature matrix based on the first bone feature matrix and the second bone feature matrix in the following manner: perform splicing processing on the first bone feature matrix and the second bone feature matrix to obtain a first spliced bone feature matrix;
對所述第一拼接骨骼特徵矩陣進行維度變換處理,得到所述第一目標骨骼特徵矩陣;perform dimensional transformation processing on the first spliced bone feature matrix to obtain the first target bone feature matrix;
所述基於所述第一輪廓特徵矩陣以及所述第二輪廓特徵矩陣,得到所述第一目標輪廓特徵矩陣,包括:將所述第一輪廓特徵矩陣以及所述第二輪廓特徵矩陣進行拼接處理,得到第一拼接輪廓特徵矩陣;對所述第一拼接輪廓特徵矩陣進行維度變換處理,得到所述第一目標輪廓特徵矩陣;其中,所述第一目標骨骼特徵矩陣的維度與所述第一目標輪廓特徵矩陣的維度相同、且所述第一目標骨骼特徵矩陣與所述第一目標輪廓特徵矩陣在相同維度上的維數相同。The obtaining of the first target contour feature matrix based on the first contour feature matrix and the second contour feature matrix includes: performing splicing processing on the first contour feature matrix and the second contour feature matrix to obtain a first spliced contour feature matrix; and performing dimensional transformation processing on the first spliced contour feature matrix to obtain the first target contour feature matrix; where the first target bone feature matrix and the first target contour feature matrix have the same number of dimensions and the same size in each corresponding dimension.
一種可能的實施方式中,所述特徵融合神經網路包括:第一卷積神經網路、第二卷積神經網路、第一變換神經網路、以及第二變換神經網路;In a possible implementation manner, the feature fusion neural network includes: a first convolutional neural network, a second convolutional neural network, a first transformation neural network, and a second transformation neural network;
所述檢測模組152,用於採用下述方式使用特徵融合神經網路對所述第一目標骨骼特徵矩陣、以及所述第一目標輪廓特徵矩陣進行特徵融合,得到第二目標骨骼特徵矩陣和第二目標輪廓特徵矩陣:使用所述第一卷積神經網路對所述第一目標骨骼特徵矩陣進行卷積處理,得到第一中間骨骼特徵矩陣;以及使用所述第二卷積神經網路對所述第一目標輪廓特徵矩陣進行卷積處理,得到第一中間輪廓特徵矩陣;將所述第一中間輪廓特徵矩陣與所述第一目標骨骼特徵矩陣進行拼接處理,得到第一拼接特徵矩陣;並使用所述第一變換神經網路對所述第一拼接特徵矩陣進行維度變換,得到所述第二目標骨骼特徵矩陣;將所述第一中間骨骼特徵矩陣與所述第一目標輪廓特徵矩陣進行拼接處理,得到第二拼接特徵矩陣,並使用所述第二變換神經網路對所述第二拼接特徵矩陣進行維度變換,得到所述第二目標輪廓特徵矩陣。The detection module 152 is configured to use the feature fusion neural network in the following manner to perform feature fusion on the first target bone feature matrix and the first target contour feature matrix to obtain the second target bone feature matrix and the second target contour feature matrix: use the first convolutional neural network to perform convolution processing on the first target bone feature matrix to obtain a first intermediate bone feature matrix, and use the second convolutional neural network to perform convolution processing on the first target contour feature matrix to obtain a first intermediate contour feature matrix; perform splicing processing on the first intermediate contour feature matrix and the first target bone feature matrix to obtain a first spliced feature matrix, and use the first transformation neural network to perform dimensional transformation on the first spliced feature matrix to obtain the second target bone feature matrix; perform splicing processing on the first intermediate bone feature matrix and the first target contour feature matrix to obtain a second spliced feature matrix, and use the second transformation neural network to perform dimensional transformation on the second spliced feature matrix to obtain the second target contour feature matrix.
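The cross-wiring of this fusion scheme can be sketched abstractly: each branch is convolved, and the *other* branch's intermediate result is spliced onto this branch before its transformation network. Every network is a stand-in callable here, so only the data flow, not the real convolution arithmetic, is shown.

```python
def cross_fuse(bone, contour, conv_b, conv_c, trans_b, trans_c, concat):
    mid_bone = conv_b(bone)        # first convolutional neural network
    mid_contour = conv_c(contour)  # second convolutional neural network
    bone_out = trans_b(concat(mid_contour, bone))     # contour feeds bone
    contour_out = trans_c(concat(mid_bone, contour))  # bone feeds contour
    return bone_out, contour_out

# Toy numeric stand-ins to trace the wiring:
bone_out, contour_out = cross_fuse(
    bone=1.0, contour=2.0,
    conv_b=lambda v: v * 10, conv_c=lambda v: v * 10,
    trans_b=lambda v: v, trans_c=lambda v: v,
    concat=lambda a, b: a + b)
```

The swap before splicing is the point of the scheme: each output matrix mixes its own target matrix with the other branch's convolved features.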
一種可能的實施方式中,所述特徵融合神經網路包括:第一定向卷積神經網路、第二定向卷積神經網路、第三卷積神經網路、第四卷積神經網路、第三變換神經網路、以及第四變換神經網路;In a possible implementation manner, the feature fusion neural network includes: a first directional convolutional neural network, a second directional convolutional neural network, a third convolutional neural network, and a fourth convolutional neural network , The third transformation neural network, and the fourth transformation neural network;
所述檢測模組152,用於採用下述方式使用特徵融合神經網路對所述第一目標骨骼特徵矩陣、以及所述第一目標輪廓特徵矩陣進行特徵融合,得到第二目標骨骼特徵矩陣和第二目標輪廓特徵矩陣:使用所述第一定向卷積神經網路對所述第一目標骨骼特徵矩陣進行定向卷積處理,得到第一定向骨骼特徵矩陣;並使用第三卷積神經網路對所述第一定向骨骼特徵矩陣進行卷積處理,得到第二中間骨骼特徵矩陣;以及使用所述第二定向卷積神經網路對所述第一目標輪廓特徵矩陣進行定向卷積處理,得到第一定向輪廓特徵矩陣;並使用第四卷積神經網路對所述第一定向輪廓特徵矩陣進行卷積處理,得到第二中間輪廓特徵矩陣;將所述第二中間輪廓特徵矩陣與所述第一目標骨骼特徵矩陣進行拼接處理,得到第三拼接特徵矩陣;並使用第三變換神經網路對所述第三拼接特徵矩陣進行維度變換,得到所述第二目標骨骼特徵矩陣;將所述第二中間骨骼特徵矩陣與所述第一目標輪廓特徵矩陣進行拼接處理,得到第四拼接特徵矩陣,並使用第四變換神經網路對所述第四拼接特徵矩陣進行維度變換,得到所述第二目標輪廓特徵矩陣。The detection module 152 is configured to use the feature fusion neural network in the following manner to perform feature fusion on the first target bone feature matrix and the first target contour feature matrix to obtain the second target bone feature matrix and the second target contour feature matrix: use the first directional convolutional neural network to perform directional convolution processing on the first target bone feature matrix to obtain a first directional bone feature matrix, and use the third convolutional neural network to perform convolution processing on the first directional bone feature matrix to obtain a second intermediate bone feature matrix; use the second directional convolutional neural network to perform directional convolution processing on the first target contour feature matrix to obtain a first directional contour feature matrix, and use the fourth convolutional neural network to perform convolution processing on the first directional contour feature matrix to obtain a second intermediate contour feature matrix; perform splicing processing on the second intermediate contour feature matrix and the first target bone feature matrix to obtain a third spliced feature matrix, and use the third transformation neural network to perform dimensional transformation on the third spliced feature matrix to obtain the second target bone feature matrix; perform splicing processing on the second intermediate bone feature matrix and the first target contour feature matrix to obtain a fourth spliced feature matrix, and use the fourth transformation neural network to perform dimensional transformation on the fourth spliced feature matrix to obtain the second target contour feature matrix.
一種可能的實施方式中,所述特徵融合神經網路包括:位移估計神經網路、第五變換神經網路;In a possible implementation manner, the feature fusion neural network includes: a displacement estimation neural network and a fifth transformation neural network;
The detection module 152 is configured to use the feature fusion neural network to perform feature fusion on the first target skeleton feature matrix and the first target contour feature matrix in the following manner, to obtain a second target skeleton feature matrix and a second target contour feature matrix: splicing the first target skeleton feature matrix and the first target contour feature matrix to obtain a fifth spliced feature matrix; inputting the fifth spliced feature matrix into the displacement estimation neural network to perform displacement estimation on multiple predetermined key point pairs, obtaining, for each pair, displacement information for moving one key point of the pair to the other key point; taking each key point of each pair in turn as the current key point and, from the three-dimensional feature matrix corresponding to the other key point paired with the current key point, obtaining the two-dimensional feature matrix corresponding to that paired key point; according to the displacement information from the paired key point to the current key point, performing a position transformation on the elements of the two-dimensional feature matrix corresponding to the paired key point to obtain a displacement feature matrix corresponding to the current key point; for each bone key point, splicing the two-dimensional feature matrix corresponding to that bone key point with each displacement feature matrix corresponding to that bone key point to obtain a spliced two-dimensional feature matrix of that bone key point, and inputting the spliced two-dimensional feature matrix into the fifth transformation neural network to obtain a target two-dimensional feature matrix corresponding to that bone key point; generating the second target skeleton feature matrix based on the target two-dimensional feature matrices corresponding to the respective bone key points; for each contour key point, splicing the two-dimensional feature matrix corresponding to that contour key point with each displacement feature matrix corresponding to the bone key point to obtain a spliced two-dimensional feature matrix of that contour key point, and inputting the spliced two-dimensional feature matrix into the fifth transformation neural network to obtain a target two-dimensional feature matrix corresponding to that contour key point; and generating the second target contour feature matrix based on the target two-dimensional feature matrices corresponding to the respective contour key points.
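The position transformation of a paired key point's two-dimensional feature matrix by a displacement can be illustrated as follows. This is a sketch under assumptions the text does not fix: each element is shifted on the grid by an integer displacement, and vacated positions are zero-filled.

```python
import numpy as np

def shift_feature(fm, dx, dy):
    """Move every element of a 2D feature matrix by displacement (dx, dy):
    the element at (row, col) lands at (row + dy, col + dx); positions that
    receive no element are zero-filled (elements shifted off the grid are lost)."""
    h, w = fm.shape
    out = np.zeros_like(fm)
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    out[dst_rows, dst_cols] = fm[src_rows, src_cols]
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)  # 2D feature matrix of a paired key point
displaced = shift_feature(fm, dx=1, dy=0)      # displacement feature matrix
print(displaced[0])                            # [0. 0. 1. 2.]
```

The resulting displacement feature matrix is what gets spliced with the current key point's own two-dimensional feature matrix before the fifth transformation neural network.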
In a possible implementation, the human body detection method is implemented by a human body detection model; the human body detection model includes the first feature extraction network and/or the feature fusion neural network. The human body detection model is obtained by training with sample images from a training sample set, where each sample image is annotated with the actual position information of the bone key points of the human skeletal structure and the actual position information of the contour key points of the human body contour.
For a description of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the foregoing method embodiments, which are not detailed again here.
An embodiment of the present disclosure further provides a computer device. As shown in FIG. 16, a schematic structural diagram of the computer device provided by an embodiment of the present disclosure, the device includes: a processor 11, a storage medium 12, and a bus 13. The storage medium 12 is configured to store execution instructions and includes a random access memory 121 and an external memory 122. The random access memory 121, also called internal memory, temporarily stores processing data for the processor 11 as well as data exchanged with the external memory 122, such as a hard disk; the processor 11 exchanges data with the external memory 122 through the random access memory 121. When the computer device 100 runs, the processor 11 communicates with the storage medium 12 through the bus 13, so that the processor 11 executes the following instructions: acquiring an image to be detected; based on the image to be detected, determining position information of bone key points characterizing the human skeletal structure and position information of contour key points characterizing the human body contour; and generating a human body detection result based on the position information of the bone key points and the position information of the contour key points.
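The instruction sequence the processor executes amounts to a three-step pipeline. A minimal sketch follows; every function name here is a placeholder standing in for the networks described above, not an API from the patent, and the returned coordinates are dummy values.

```python
def detect_human(image):
    """Three-step flow: extract both key point sets, then combine them."""
    bone_pts = extract_bone_keypoints(image)        # positions characterizing the skeletal structure
    contour_pts = extract_contour_keypoints(image)  # positions characterizing the body contour
    return generate_detection_result(bone_pts, contour_pts)

# Placeholder extractors returning dummy normalized (x, y) coordinates.
def extract_bone_keypoints(image):
    return [(0.50, 0.20), (0.48, 0.35)]

def extract_contour_keypoints(image):
    return [(0.40, 0.15), (0.60, 0.15)]

def generate_detection_result(bone_pts, contour_pts):
    return {"bone_keypoints": bone_pts, "contour_keypoints": contour_pts}

result = detect_human(object())  # any image-like input works for this stub
print(sorted(result))  # ['bone_keypoints', 'contour_keypoints']
```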
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the human body detection method described in the foregoing method embodiments.
The computer program product of the human body detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to perform the steps of the human body detection method described in the foregoing method embodiments. For details, reference may be made to the foregoing method embodiments, which are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may still, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
11: processor; 12: storage medium; 121: random access memory; 122: external memory; 13: bus; 151: acquisition module; 152: detection module; 153: generation module; 154: execution module; LC1: second loss; LC2: sixth loss; LC3: seventh loss; LS1: first loss; LS2: fifth loss; LS3: eighth loss; S101~S103, S401~S403, S601~S604, S801~S804, S1101~S1106: steps
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure for illustrative purposes and are not restrictive; those of ordinary skill in the art can derive other related drawings from them without creative effort. Identical or similar reference signs in the drawings denote identical or equivalent elements; once a reference sign is defined in one drawing, it need not be further defined or explained in subsequent drawings.
FIG. 1 is a flowchart of a human body detection method provided by an embodiment of the present disclosure.
FIG. 2a shows an example of the positions of contour key points and bone key points provided by an embodiment of the present disclosure.
FIG. 2b shows an example of the positions of main contour key points and auxiliary contour key points provided by an embodiment of the present disclosure.
FIG. 2c shows another example of the positions of main contour key points and auxiliary contour key points provided by an embodiment of the present disclosure.
FIG. 2d shows another example of the positions of main contour key points and auxiliary contour key points provided by an embodiment of the present disclosure.
FIG. 3 is a schematic structural diagram of a first feature extraction network provided by an embodiment of the present disclosure.
FIG. 4 is a flowchart of a feature extraction method provided by an embodiment of the present disclosure.
FIG. 5 is a schematic structural diagram of a feature fusion network provided by an embodiment of the present disclosure.
FIG. 6 is a flowchart of a feature fusion method provided by an embodiment of the present disclosure.
FIG. 7 is a schematic structural diagram of another feature fusion network provided by an embodiment of the present disclosure.
FIG. 8 is a flowchart of another feature fusion method provided by an embodiment of the present disclosure.
FIG. 9a is a schematic diagram of an iterative update process using a scattering convolution operator provided by an embodiment of the present disclosure.
FIG. 9b is a schematic diagram of an iterative update process using an aggregation convolution operator provided by an embodiment of the present disclosure.
FIG. 10 is a schematic structural diagram of another feature fusion network provided by an embodiment of the present disclosure.
FIG. 11 is a flowchart of another feature fusion method provided by an embodiment of the present disclosure.
FIG. 12 shows examples of bone key points and contour key points provided by an embodiment of the present disclosure.
FIG. 13 shows a specific example of performing displacement transformation on elements of a two-dimensional feature matrix provided by an embodiment of the present disclosure.
FIG. 14 is a schematic structural diagram of a second feature extraction network provided by an embodiment of the present disclosure.
FIG. 15 is a schematic diagram of a human body detection apparatus provided by an embodiment of the present disclosure.
FIG. 16 is a schematic diagram of a computer device provided by an embodiment of the present disclosure.
S101~S103: steps
Claims (15)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910926373.4 | 2019-09-27 | ||
CN201910926373.4A CN110705448B (en) | 2019-09-27 | 2019-09-27 | Human body detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202112306A TW202112306A (en) | 2021-04-01 |
TWI742690B true TWI742690B (en) | 2021-10-11 |
Family
ID=69196895
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109117278A TWI742690B (en) | 2019-09-27 | 2020-05-25 | Method and apparatus for detecting a human body, computer device, and storage medium |
Country Status (9)
Country | Link |
---|---|
US (1) | US20210174074A1 (en) |
EP (1) | EP3828765A4 (en) |
JP (1) | JP7101829B2 (en) |
KR (1) | KR20210038436A (en) |
CN (1) | CN110705448B (en) |
AU (1) | AU2020335016A1 (en) |
SG (1) | SG11202101794SA (en) |
TW (1) | TWI742690B (en) |
WO (1) | WO2021057027A1 (en) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110705448B (en) * | 2019-09-27 | 2023-01-20 | 北京市商汤科技开发有限公司 | Human body detection method and device |
CN111291793B (en) * | 2020-01-20 | 2023-11-14 | 北京大学口腔医学院 | Element classification method, device and storage medium for grid curved surface |
CN111476291B (en) * | 2020-04-03 | 2023-07-25 | 南京星火技术有限公司 | Data processing method, device and storage medium |
CN111640197A (en) * | 2020-06-09 | 2020-09-08 | 上海商汤智能科技有限公司 | Augmented reality AR special effect control method, device and equipment |
CN112633196A (en) * | 2020-12-28 | 2021-04-09 | 浙江大华技术股份有限公司 | Human body posture detection method and device and computer equipment |
CN113486751B (en) * | 2021-06-29 | 2023-07-04 | 西北大学 | Pedestrian feature extraction method based on graph convolution and edge weight attention |
CN113469018B (en) * | 2021-06-29 | 2024-02-23 | 中北大学 | Multi-modal interactive behavior recognition method based on RGB and three-dimensional skeleton |
CN113743257B (en) * | 2021-08-20 | 2024-05-14 | 江苏大学 | Construction overhead operation instability state detection method integrating space-time characteristics |
CN113837306B (en) * | 2021-09-29 | 2024-04-12 | 南京邮电大学 | Abnormal behavior detection method based on human body key point space-time diagram model |
CN114299288A (en) * | 2021-12-23 | 2022-04-08 | 广州方硅信息技术有限公司 | Image segmentation method, device, equipment and storage medium |
CN114519666B (en) * | 2022-02-18 | 2023-09-19 | 广州方硅信息技术有限公司 | Live image correction method, device, equipment and storage medium |
CN115019386B (en) * | 2022-04-15 | 2024-06-14 | 北京航空航天大学 | Exercise assisting training method based on deep learning |
CN114926610A (en) * | 2022-05-27 | 2022-08-19 | 北京达佳互联信息技术有限公司 | Position determination model training method, position determination method, device and medium |
CN115050101B (en) * | 2022-07-18 | 2024-03-22 | 四川大学 | Gait recognition method based on fusion of skeleton and contour features |
CN115273154B (en) * | 2022-09-26 | 2023-01-17 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Thermal infrared pedestrian detection method and system based on edge reconstruction and storage medium |
WO2024121900A1 (en) * | 2022-12-05 | 2024-06-13 | Nec Corporation | Key-point associating apparatus, key-point associating method, and non-transitory computer-readable storage medium |
CN115661138B (en) * | 2022-12-13 | 2023-03-21 | 北京大学第三医院(北京大学第三临床医学院) | Human skeleton contour detection method based on DR image |
WO2024143593A1 (en) * | 2022-12-27 | 2024-07-04 | 주식회사 엔씨소프트 | Electronic device, method, and computer-readable storage medium for acquiring information indicating shape of body from one or more images |
CN116137074A (en) * | 2023-02-22 | 2023-05-19 | 常熟理工学院 | Automatic detection method and system for passengers in elevator car |
CN116434335B (en) * | 2023-03-30 | 2024-04-30 | 东莞理工学院 | Method, device, equipment and storage medium for identifying action sequence and deducing intention |
CN117315791B (en) * | 2023-11-28 | 2024-02-20 | 杭州华橙软件技术有限公司 | Bone action recognition method, device and storage medium |
CN118068318B (en) * | 2024-04-17 | 2024-06-28 | 德心智能科技(常州)有限公司 | Multimode sensing method and system based on millimeter wave radar and environment sensor |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103733227A (en) * | 2012-06-14 | 2014-04-16 | 索弗特凯耐提克软件公司 | Three-dimensional object modelling fitting & tracking |
CN104537608A (en) * | 2014-12-31 | 2015-04-22 | 深圳市中兴移动通信有限公司 | Image processing method and device |
CN105550678A (en) * | 2016-02-03 | 2016-05-04 | 武汉大学 | Human body motion feature extraction method based on global remarkable edge area |
CN109255783A (en) * | 2018-10-19 | 2019-01-22 | 上海摩象网络科技有限公司 | A kind of position of skeleton key point on more people's images is arranged detection method |
CN110059522A (en) * | 2018-01-19 | 2019-07-26 | 北京市商汤科技开发有限公司 | Human body contour outline critical point detection method, image processing method, device and equipment |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4728795B2 (en) * | 2005-12-15 | 2011-07-20 | 日本放送協会 | Person object determination apparatus and person object determination program |
JP5253588B2 (en) * | 2009-02-25 | 2013-07-31 | 本田技研工業株式会社 | Capturing and recognizing hand postures using internal distance shape related methods |
CN102831380A (en) * | 2011-06-15 | 2012-12-19 | 康佳集团股份有限公司 | Body action identification method and system based on depth image induction |
US8786680B2 (en) * | 2011-06-21 | 2014-07-22 | Disney Enterprises, Inc. | Motion capture from body mounted cameras |
JP2014089665A (en) * | 2012-10-31 | 2014-05-15 | Toshiba Corp | Image processor, image processing method, and image processing program |
CN103679175B (en) * | 2013-12-13 | 2017-02-15 | 电子科技大学 | Fast 3D skeleton model detecting method based on depth camera |
CN103955680B (en) * | 2014-05-20 | 2017-05-31 | 深圳市赛为智能股份有限公司 | Action identification method and device based on Shape context |
CN108229468B (en) * | 2017-06-28 | 2020-02-21 | 北京市商汤科技开发有限公司 | Vehicle appearance feature recognition and vehicle retrieval method and device, storage medium and electronic equipment |
CN107705355A (en) * | 2017-09-08 | 2018-02-16 | 郭睿 | A kind of 3D human body modeling methods and device based on plurality of pictures |
CN108229308A (en) * | 2017-11-23 | 2018-06-29 | 北京市商汤科技开发有限公司 | Recongnition of objects method, apparatus, storage medium and electronic equipment |
CN108038469B (en) * | 2017-12-27 | 2019-10-25 | 百度在线网络技术(北京)有限公司 | Method and apparatus for detecting human body |
CN109508625A (en) * | 2018-09-07 | 2019-03-22 | 咪咕文化科技有限公司 | Emotional data analysis method and device |
CN109242868B (en) * | 2018-09-17 | 2021-05-04 | 北京旷视科技有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
US11335027B2 (en) * | 2018-09-28 | 2022-05-17 | Hewlett-Packard Development Company, L.P. | Generating spatial gradient maps for a person in an image |
CN109902659B (en) * | 2019-03-15 | 2021-08-20 | 北京字节跳动网络技术有限公司 | Method and apparatus for processing human body image |
CN110084161B (en) * | 2019-04-17 | 2023-04-18 | 中山大学 | Method and system for rapidly detecting key points of human skeleton |
CN110197117B (en) * | 2019-04-18 | 2021-07-06 | 北京奇艺世纪科技有限公司 | Human body contour point extraction method and device, terminal equipment and computer readable storage medium |
CN110111418B (en) * | 2019-05-15 | 2022-02-25 | 北京市商汤科技开发有限公司 | Method and device for creating face model and electronic equipment |
CN110135375B (en) * | 2019-05-20 | 2021-06-01 | 中国科学院宁波材料技术与工程研究所 | Multi-person attitude estimation method based on global information integration |
CN110705448B (en) * | 2019-09-27 | 2023-01-20 | 北京市商汤科技开发有限公司 | Human body detection method and device |
-
2019
- 2019-09-27 CN CN201910926373.4A patent/CN110705448B/en active Active
-
2020
- 2020-04-29 KR KR1020207037358A patent/KR20210038436A/en not_active Application Discontinuation
- 2020-04-29 WO PCT/CN2020/087826 patent/WO2021057027A1/en unknown
- 2020-04-29 AU AU2020335016A patent/AU2020335016A1/en not_active Abandoned
- 2020-04-29 EP EP20853555.9A patent/EP3828765A4/en not_active Withdrawn
- 2020-04-29 JP JP2020572391A patent/JP7101829B2/en active Active
- 2020-04-29 SG SG11202101794SA patent/SG11202101794SA/en unknown
- 2020-05-25 TW TW109117278A patent/TWI742690B/en active
-
2021
- 2021-02-22 US US17/181,376 patent/US20210174074A1/en not_active Abandoned
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103733227A (en) * | 2012-06-14 | 2014-04-16 | 索弗特凯耐提克软件公司 | Three-dimensional object modelling fitting & tracking |
CN104537608A (en) * | 2014-12-31 | 2015-04-22 | 深圳市中兴移动通信有限公司 | Image processing method and device |
CN105550678A (en) * | 2016-02-03 | 2016-05-04 | 武汉大学 | Human body motion feature extraction method based on global remarkable edge area |
CN110059522A (en) * | 2018-01-19 | 2019-07-26 | 北京市商汤科技开发有限公司 | Human body contour outline critical point detection method, image processing method, device and equipment |
CN109255783A (en) * | 2018-10-19 | 2019-01-22 | 上海摩象网络科技有限公司 | A kind of position of skeleton key point on more people's images is arranged detection method |
Also Published As
Publication number | Publication date |
---|---|
US20210174074A1 (en) | 2021-06-10 |
JP7101829B2 (en) | 2022-07-15 |
SG11202101794SA (en) | 2021-04-29 |
CN110705448B (en) | 2023-01-20 |
EP3828765A4 (en) | 2021-12-08 |
AU2020335016A1 (en) | 2021-04-15 |
KR20210038436A (en) | 2021-04-07 |
JP2022503426A (en) | 2022-01-12 |
CN110705448A (en) | 2020-01-17 |
WO2021057027A1 (en) | 2021-04-01 |
TW202112306A (en) | 2021-04-01 |
EP3828765A1 (en) | 2021-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI742690B (en) | Method and apparatus for detecting a human body, computer device, and storage medium | |
CN111275518B (en) | Video virtual fitting method and device based on mixed optical flow | |
Chen et al. | Fsrnet: End-to-end learning face super-resolution with facial priors | |
CN107103613B (en) | A kind of three-dimension gesture Attitude estimation method | |
CN108596974B (en) | Dynamic scene robot positioning and mapping system and method | |
US11417095B2 (en) | Image recognition method and apparatus, electronic device, and readable storage medium using an update on body extraction parameter and alignment parameter | |
WO2021135827A1 (en) | Line-of-sight direction determination method and apparatus, electronic device, and storage medium | |
EP3971841A1 (en) | Three-dimensional model generation method and apparatus, and computer device and storage medium | |
Liao et al. | Model-free distortion rectification framework bridged by distortion distribution map | |
CN110288614A (en) | Image processing method, device, equipment and storage medium | |
CN108921926A (en) | A kind of end-to-end three-dimensional facial reconstruction method based on single image | |
CN109063584B (en) | Facial feature point positioning method, device, equipment and medium based on cascade regression | |
CN107657664B (en) | Image optimization method and device after face expression synthesis, storage medium and computer equipment | |
CN109325995B (en) | Low-resolution multi-view hand reconstruction method based on hand parameter model | |
JP2019096113A (en) | Processing device, method and program relating to keypoint data | |
CN112734890B (en) | Face replacement method and device based on three-dimensional reconstruction | |
CN109948441B (en) | Model training method, image processing method, device, electronic equipment and computer readable storage medium | |
CN112560648B (en) | SLAM method based on RGB-D image | |
CN110188667A (en) | It is a kind of based on tripartite fight generate network face ajust method | |
CN112184886A (en) | Image processing method and device, computer equipment and storage medium | |
CN110321452A (en) | A kind of image search method based on direction selection mechanism | |
CN116863044A (en) | Face model generation method and device, electronic equipment and readable storage medium | |
CN114373040B (en) | Three-dimensional model reconstruction method and acquisition terminal | |
CN113592021B (en) | Stereo matching method based on deformable and depth separable convolution | |
CN115439309A (en) | Method for training clothes deformation model, virtual fitting method and related device |